livedash-node/docs/transcript-parsing-implementation.md
Kaj Kowalski 1e0ee37a39 fix: resolve all Biome linting errors and Prettier formatting issues
- Reduce cognitive complexity in lib/api/handler.ts (23 → 15)
- Reduce cognitive complexity in lib/config/provider.ts (38 → 15)
- Fix TypeScript any type violations in multiple files
- Remove unused variable in lib/batchSchedulerOptimized.ts
- Add prettier-ignore comments to documentation with intentional syntax errors
- Resolve Prettier/Biome formatting conflicts with targeted ignores
- Create .prettierignore for build artifacts and dependencies

All linting checks now pass and build completes successfully (47/47 pages).
2025-07-13 22:06:18 +02:00


Transcript Parsing Implementation

Overview

Added structured message parsing to the LiveDash system, allowing transcripts to be broken down into individual messages with timestamps, roles, and content. Conversations can now be read as a sequence of messages instead of a single block of raw transcript text.

Database Changes

New Message Table

CREATE TABLE Message (
  id        TEXT PRIMARY KEY DEFAULT (uuid()),
  sessionId TEXT NOT NULL,
  timestamp DATETIME NOT NULL,
  role      TEXT NOT NULL,
  content   TEXT NOT NULL,
  "order"   INTEGER NOT NULL, -- quoted because ORDER is a reserved SQL keyword
  createdAt DATETIME DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY (sessionId) REFERENCES Session(id) ON DELETE CASCADE
);

CREATE INDEX Message_sessionId_order_idx ON Message(sessionId, "order");

Updated Session Table

  • Added messages relation to Session model
  • Sessions can now have both raw transcript content AND parsed messages

New Components

1. Message Interface (lib/types.ts)

export interface Message {
  id: string;
  sessionId: string;
  timestamp: Date;
  role: string; // "User", "Assistant", "System", etc.
  content: string;
  order: number; // Order within the conversation (0, 1, 2, ...)
  createdAt: Date;
}

2. Transcript Parser (lib/transcriptParser.js)

  • parseChatLogToJSON(logString) - Parses raw transcript text into structured messages
  • storeMessagesForSession(sessionId, messages) - Stores parsed messages in database
  • processTranscriptForSession(sessionId, transcriptContent) - Complete processing for one session
  • processAllUnparsedTranscripts() - Batch process all unparsed transcripts
  • getMessagesForSession(sessionId) - Retrieve messages for a session
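The batch entry point could tie the other functions together roughly as shown below. This is a minimal TypeScript sketch, not the code in lib/transcriptParser.js. It makes three assumptions:

- A hypothetical in-memory `SessionStore` interface stands in for the database.
- The parser is passed in as a parameter, which keeps the sketch self-contained.
- A session counts as "unparsed" when it has transcript text but no stored messages.

```typescript
// Minimal shape of a parsed message, mirroring the Message fields the parser produces.
interface ParsedMessage {
  timestamp: Date;
  role: string;
  content: string;
  order: number;
}

// Hypothetical storage abstraction (assumption); the real module uses the database.
interface SessionStore {
  listSessions(): { id: string; transcriptContent: string | null }[];
  countMessages(sessionId: string): number;
  saveMessages(sessionId: string, messages: ParsedMessage[]): void;
}

// Parses every session that has transcript text but no stored messages yet.
// Returns counts so a caller (e.g. a status command) can report progress.
function processAllUnparsedTranscripts(
  store: SessionStore,
  parse: (log: string) => ParsedMessage[],
): { parsed: number; failed: number } {
  let parsed = 0;
  let failed = 0;
  for (const session of store.listSessions()) {
    const text = session.transcriptContent;
    // Skip sessions without a transcript, and sessions that were already parsed.
    if (!text || store.countMessages(session.id) > 0) continue;
    try {
      store.saveMessages(session.id, parse(text));
      parsed++;
    } catch {
      // Leave the raw transcript untouched; the UI can fall back to it.
      failed++;
    }
  }
  return { parsed, failed };
}
```

Because already-parsed sessions are skipped, the function can be re-run safely, and each run only picks up new or previously failed transcripts.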

3. MessageViewer Component (components/MessageViewer.tsx)

  • Chat-like interface for displaying parsed messages
  • Color-coded by role (User: blue, Assistant: gray, System: yellow)
  • Shows timestamps and message order
  • Scrollable with conversation metadata
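The role-to-color rule above can be sketched as a small lookup. The Tailwind-style class names and the neutral fallback for unknown roles are assumptions for illustration. Neither is taken from MessageViewer.tsx.

```typescript
// Bubble styling per role: User blue, Assistant gray, System yellow.
// Class names are illustrative Tailwind-style utilities (assumption).
// A Map avoids accidental hits on Object.prototype keys such as "constructor".
const ROLE_STYLES = new Map<string, string>([
  ["user", "bg-blue-100 text-blue-900"],
  ["assistant", "bg-gray-100 text-gray-900"],
  ["system", "bg-yellow-100 text-yellow-900"],
]);

// Roles are free-form strings, so match case-insensitively and fall back
// to a neutral style for roles the viewer does not know about.
function roleStyle(role: string): string {
  return ROLE_STYLES.get(role.trim().toLowerCase()) ?? "bg-white text-gray-700";
}

console.log(roleStyle("User")); // bg-blue-100 text-blue-900
```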

Updated Components

1. Session API (pages/api/dashboard/session/[id].ts)

  • Now includes parsed messages in session response
  • Messages are ordered by order field (ascending)

2. Session Details Page (app/dashboard/sessions/[id]/page.tsx)

  • Added MessageViewer component
  • Shows both parsed messages AND raw transcript
  • Prioritizes parsed messages when available

3. ChatSession Interface (lib/types.ts)

  • Added optional messages?: Message[] field
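The session page prefers parsed messages and falls back to the raw transcript. That rule could be written as a small helper over the optional `messages` field, deciding which view to show first.

This is a sketch under three assumptions:

- `pickTranscriptView` and `TranscriptView` are hypothetical names.
- `ChatSession` is trimmed to the fields the helper reads.
- Messages are sorted by `order` defensively, even though the API already returns them in that order.

```typescript
interface Message {
  role: string;
  content: string;
  order: number;
}

// Trimmed-down ChatSession: only the fields this helper reads.
interface ChatSession {
  transcriptContent?: string | null;
  messages?: Message[]; // absent or empty until the transcript has been parsed
}

type TranscriptView =
  | { kind: "parsed"; messages: Message[] }
  | { kind: "raw"; text: string }
  | { kind: "none" };

// Prefer structured messages; otherwise show the raw transcript if there is one.
function pickTranscriptView(session: ChatSession): TranscriptView {
  if (session.messages && session.messages.length > 0) {
    // Sort a copy by `order` so the UI does not depend on response ordering.
    const messages = [...session.messages].sort((a, b) => a.order - b.order);
    return { kind: "parsed", messages };
  }
  if (session.transcriptContent) {
    return { kind: "raw", text: session.transcriptContent };
  }
  return { kind: "none" };
}
```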

Parsing Logic

Supported Format

The parser expects transcripts in the following format, where each message starts with a timestamped line:

[DD.MM.YYYY HH:MM:SS] Role: Message content
[DD.MM.YYYY HH:MM:SS] User: Hello, I need help
[DD.MM.YYYY HH:MM:SS] Assistant: How can I help you today?
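The `DD.MM.YYYY HH:MM:SS` prefix can be converted to a `Date`, and from there to ISO 8601, as sketched below. This document does not say which time zone transcript timestamps use. The sketch assumes UTC so that results are deterministic, which may differ from the real parser. The function name is hypothetical.

```typescript
// Parses "DD.MM.YYYY HH:MM:SS" into a Date, reading the fields as UTC
// (assumption: the transcript format carries no time zone information).
// Returns null for malformed or out-of-range input instead of an Invalid Date.
function parseTranscriptTimestamp(value: string): Date | null {
  const m = /^(\d{2})\.(\d{2})\.(\d{4}) (\d{2}):(\d{2}):(\d{2})$/.exec(value.trim());
  if (!m) return null;
  const [day, month, year, hour, minute, second] = m.slice(1).map(Number);
  const date = new Date(Date.UTC(year, month - 1, day, hour, minute, second));
  // Date.UTC silently rolls over (31.02 becomes 03.03), so check that the fields round-trip.
  if (
    date.getUTCFullYear() !== year ||
    date.getUTCMonth() !== month - 1 ||
    date.getUTCDate() !== day ||
    date.getUTCHours() !== hour ||
    date.getUTCMinutes() !== minute ||
    date.getUTCSeconds() !== second
  ) {
    return null;
  }
  return date;
}

console.log(parseTranscriptTimestamp("13.07.2025 22:06:18")?.toISOString());
// 2025-07-13T22:06:18.000Z
```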

Features

  • Multi-line support - Messages can span multiple lines
  • Timestamp parsing - Converts DD.MM.YYYY HH:MM:SS to ISO format
  • Role detection - Extracts sender role from each message
  • Ordering - Maintains conversation order with explicit order field
  • Sorting - Messages sorted by timestamp, then by role (User before Assistant)
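The features above can be combined into one parsing pass. The sketch below illustrates the approach and is not the contents of `parseChatLogToJSON`. It makes four assumptions, all labeled here:

- `parseChatLog` is a hypothetical name.
- Timestamps are read as UTC.
- Roles other than User and Assistant sort after both when timestamps tie.
- Text before the first timestamped line is ignored.

```typescript
interface ParsedMessage {
  timestamp: Date;
  role: string;
  content: string;
  order: number;
}

// A line that starts a new message: "[DD.MM.YYYY HH:MM:SS] Role: content".
const HEADER = /^\[(\d{2})\.(\d{2})\.(\d{4}) (\d{2}):(\d{2}):(\d{2})\] ([^:]+):\s?(.*)$/;

// Tie-break for identical timestamps: User, then Assistant, then anything else (assumption).
function roleRank(role: string): number {
  if (role === "User") return 0;
  if (role === "Assistant") return 1;
  return 2;
}

type Draft = { timestamp: Date; role: string; lines: string[] };

function finish(d: Draft): ParsedMessage {
  // order is assigned after sorting; -1 marks "not yet assigned".
  return { timestamp: d.timestamp, role: d.role, content: d.lines.join("\n").trim(), order: -1 };
}

function parseChatLog(log: string): ParsedMessage[] {
  const messages: ParsedMessage[] = [];
  let current: Draft | null = null;

  for (const line of log.split(/\r?\n/)) {
    const m = HEADER.exec(line);
    if (m) {
      if (current) messages.push(finish(current));
      const [day, month, year, hour, minute, second] = m.slice(1, 7).map(Number);
      current = {
        timestamp: new Date(Date.UTC(year, month - 1, day, hour, minute, second)), // UTC assumed
        role: m[7].trim(),
        lines: [m[8]],
      };
    } else if (current) {
      // Continuation line: belongs to the message started above (multi-line support).
      current.lines.push(line);
    }
    // Lines before the first header are dropped in this sketch.
  }
  if (current) messages.push(finish(current));

  // Sort by timestamp, then role. Array.prototype.sort is stable, so any
  // remaining ties keep transcript order. Then assign the explicit order field.
  messages.sort(
    (a, b) => a.timestamp.getTime() - b.timestamp.getTime() || roleRank(a.role) - roleRank(b.role),
  );
  messages.forEach((msg, i) => {
    msg.order = i;
  });
  return messages;
}
```

Sorting before assigning `order` means the stored order reflects chronology, even when a transcript logs two messages out of sequence within the same second.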

Manual Commands

New Commands Added

# Parse transcripts into structured messages
node scripts/manual-triggers.js parse

# Complete workflow: refresh → parse → process
node scripts/manual-triggers.js all

# Check status (now shows parsing info)
node scripts/manual-triggers.js status

Updated Commands

  • status - Now shows transcript and parsing statistics

The new all command runs refresh → parse → process in sequence.

Workflow Integration

Complete Processing Pipeline

  1. Session Refresh - Fetch sessions from CSV, download transcripts
  2. Transcript Parsing - Parse raw transcripts into structured messages
  3. AI Processing - Process sessions with OpenAI for sentiment, categories, etc.
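The three stages run strictly in order, and each one consumes the previous stage's output. The sequencing used by the all command can be sketched as follows. The `PipelineStep` shape, and stopping at the first failing step, are assumptions rather than details taken from scripts/manual-triggers.js.

```typescript
// A pipeline step: a name for logging plus an async action.
interface PipelineStep {
  name: string;
  run: () => Promise<void>;
}

// Runs steps one after another. If a step throws, the returned promise rejects
// and later steps never run on incomplete data (assumed desired behavior).
async function runPipeline(
  steps: PipelineStep[],
  log: (msg: string) => void = console.log,
): Promise<string[]> {
  const completed: string[] = [];
  for (const step of steps) {
    log(`Starting ${step.name}...`);
    await step.run();
    completed.push(step.name);
    log(`Finished ${step.name}`);
  }
  return completed;
}
```

Wiring it up would mean passing three steps named refresh, parse, and process. Each step would wrap whatever function actually performs that stage; those function names are not given here.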

Database States

// After CSV fetch
{
  transcriptContent: "raw text...",
  messages: [], // Empty
  processed: null
}

// After parsing
{
  transcriptContent: "raw text...",
  messages: [Message, Message, ...], // Parsed
  processed: null
}

// After AI processing
{
  transcriptContent: "raw text...",
  messages: [Message, Message, ...], // Parsed
  processed: true,
  sentimentCategory: "positive",
  summary: "Brief summary...",
  // ... other AI fields
}
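The three snapshots above imply a simple way to tell where a session sits in the pipeline. The sketch below derives the stage from the same fields. The `PipelineStage` names are hypothetical. The "awaiting-transcript" case for sessions with no transcript yet is an assumption and does not appear in the states shown.

```typescript
interface SessionSnapshot {
  transcriptContent: string | null;
  messages: unknown[];
  processed: boolean | null;
}

type PipelineStage = "awaiting-transcript" | "fetched" | "parsed" | "processed";

// Mirrors the states above: raw text only -> fetched,
// messages present -> parsed, processed flag set -> processed.
function pipelineStage(s: SessionSnapshot): PipelineStage {
  if (s.processed === true) return "processed";
  if (s.messages.length > 0) return "parsed";
  if (s.transcriptContent) return "fetched";
  return "awaiting-transcript"; // assumed extra state: nothing downloaded yet
}
```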

User Experience Improvements

Before

  • Only raw transcript text in a text area
  • Difficult to follow conversation flow
  • No clear distinction between speakers

After

  • Chat-like interface with message bubbles
  • Color-coded roles for easy identification
  • Timestamps for each message
  • Conversation metadata (first/last message times)
  • Fallback to raw transcript if parsing fails
  • Both views available - structured AND raw
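The conversation metadata, meaning first and last message times, can be computed from the parsed messages in a single pass. In this sketch, `summarizeConversation` and the `ConversationSummary` shape are hypothetical. Returning null for an empty list is a convenient signal to fall back to the raw transcript.

```typescript
interface TimedMessage {
  timestamp: Date;
}

interface ConversationSummary {
  messageCount: number;
  firstMessageAt: Date;
  lastMessageAt: Date;
}

// Returns null for an empty conversation so the caller can show the raw transcript instead.
function summarizeConversation(messages: TimedMessage[]): ConversationSummary | null {
  if (messages.length === 0) return null;
  let first = messages[0].timestamp;
  let last = first;
  for (const m of messages) {
    // Compare epoch milliseconds rather than trusting array order.
    if (m.timestamp.getTime() < first.getTime()) first = m.timestamp;
    if (m.timestamp.getTime() > last.getTime()) last = m.timestamp;
  }
  return { messageCount: messages.length, firstMessageAt: first, lastMessageAt: last };
}
```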

Testing

Manual Testing Commands

# Check current status
node scripts/manual-triggers.js status

# Parse existing transcripts
node scripts/manual-triggers.js parse

# Full pipeline test
node scripts/manual-triggers.js all

Expected Results

  1. Sessions with transcript content get parsed into individual messages
  2. Session detail pages show chat-like interface
  3. Both parsed messages and raw transcript are available
  4. No data loss - original transcript content preserved

Technical Benefits

Performance

  • Indexed queries - Messages indexed by sessionId and order
  • Efficient loading - Only load messages when needed
  • Cascading deletes - Messages automatically deleted with sessions

Maintainability

  • Separation of concerns - Parsing logic isolated in dedicated module
  • Type safety - Full TypeScript support for Message interface
  • Error handling - Graceful fallbacks when parsing fails
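One way to get graceful fallbacks is to wrap the parser so that a thrown error produces a result value instead of an exception. The same applies to a transcript that yields no messages. Callers then keep showing the raw transcript. In this sketch, `safeParse` and `ParseResult` are hypothetical names. Treating an empty result as a failure is an assumption.

```typescript
interface ParsedMessage {
  role: string;
  content: string;
  order: number;
}

type ParseResult =
  | { ok: true; messages: ParsedMessage[] }
  | { ok: false; reason: string };

// Runs a parser defensively: errors and empty results both become { ok: false },
// which tells callers to keep using the raw transcript.
function safeParse(
  transcript: string,
  parse: (text: string) => ParsedMessage[],
): ParseResult {
  try {
    const messages = parse(transcript);
    if (messages.length === 0) return { ok: false, reason: "no messages found" };
    return { ok: true, messages };
  } catch (err) {
    return { ok: false, reason: err instanceof Error ? err.message : String(err) };
  }
}
```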

Extensibility

  • Role flexibility - Supports any role names (User, Assistant, System, etc.)
  • Content preservation - Multi-line messages fully supported
  • Metadata ready - Easy to add message-level metadata in future

Migration Notes

Existing Data

  • No data loss - Original transcript content preserved
  • Backward compatibility - Pages work with or without parsed messages
  • Gradual migration - Can parse transcripts incrementally

Database Migration

  • New Message table created with foreign key constraints
  • Existing Session table unchanged (only added relation)
  • Index created for efficient message queries

This implementation provides a solid foundation for enhanced conversation analysis and user experience while maintaining full backward compatibility.