DB refactor

This commit is contained in:
Max Kowalski
2025-06-27 23:05:46 +02:00
parent 185bb6da58
commit 2dfc49f840
20 changed files with 1607 additions and 339 deletions

View File

@ -1,11 +1,13 @@
# Transcript Parsing Implementation
## Overview
Added structured message parsing to the LiveDash system, allowing transcripts to be broken down into individual messages with timestamps, roles, and content. This provides a much better user experience for viewing conversations.
## Database Changes
### New Message Table
```sql
CREATE TABLE Message (
id TEXT PRIMARY KEY DEFAULT (uuid()),
@ -22,12 +24,14 @@ CREATE INDEX Message_sessionId_order_idx ON Message(sessionId, order);
```
### Updated Session Table
- Added `messages` relation to Session model
- Sessions can now have both raw transcript content AND parsed messages
- Added `messages` relation to Session model
- Sessions can now have both raw transcript content AND parsed messages
## New Components
### 1. Message Interface (`lib/types.ts`)
```typescript
export interface Message {
id: string;
@ -41,36 +45,43 @@ export interface Message {
```
### 2. Transcript Parser (`lib/transcriptParser.js`)
- **`parseChatLogToJSON(logString)`** - Parses raw transcript text into structured messages
- **`storeMessagesForSession(sessionId, messages)`** - Stores parsed messages in database
- **`processTranscriptForSession(sessionId, transcriptContent)`** - Complete processing for one session
- **`processAllUnparsedTranscripts()`** - Batch process all unparsed transcripts
- **`getMessagesForSession(sessionId)`** - Retrieve messages for a session
- **`parseChatLogToJSON(logString)`** - Parses raw transcript text into structured messages
- **`storeMessagesForSession(sessionId, messages)`** - Stores parsed messages in database
- **`processTranscriptForSession(sessionId, transcriptContent)`** - Complete processing for one session
- **`processAllUnparsedTranscripts()`** - Batch process all unparsed transcripts
- **`getMessagesForSession(sessionId)`** - Retrieve messages for a session
### 3. MessageViewer Component (`components/MessageViewer.tsx`)
- Chat-like interface for displaying parsed messages
- Color-coded by role (User: blue, Assistant: gray, System: yellow)
- Shows timestamps and message order
- Scrollable with conversation metadata
- Chat-like interface for displaying parsed messages
- Color-coded by role (User: blue, Assistant: gray, System: yellow)
- Shows timestamps and message order
- Scrollable with conversation metadata
## Updated Components
### 1. Session API (`pages/api/dashboard/session/[id].ts`)
- Now includes parsed messages in session response
- Messages are ordered by `order` field (ascending)
- Now includes parsed messages in session response
- Messages are ordered by `order` field (ascending)
### 2. Session Details Page (`app/dashboard/sessions/[id]/page.tsx`)
- Added MessageViewer component
- Shows both parsed messages AND raw transcript
- Prioritizes parsed messages when available
- Added MessageViewer component
- Shows both parsed messages AND raw transcript
- Prioritizes parsed messages when available
### 3. ChatSession Interface (`lib/types.ts`)
- Added optional `messages?: Message[]` field
- Added optional `messages?: Message[]` field
## Parsing Logic
### Supported Format
The parser expects transcript format:
```
[DD.MM.YYYY HH:MM:SS] Role: Message content
[DD.MM.YYYY HH:MM:SS] User: Hello, I need help
@ -78,15 +89,17 @@ The parser expects transcript format:
```
### Features
- **Multi-line support** - Messages can span multiple lines
- **Timestamp parsing** - Converts DD.MM.YYYY HH:MM:SS to ISO format
- **Role detection** - Extracts sender role from each message
- **Ordering** - Maintains conversation order with explicit order field
- **Sorting** - Messages sorted by timestamp, then by role (User before Assistant)
- **Multi-line support** - Messages can span multiple lines
- **Timestamp parsing** - Converts DD.MM.YYYY HH:MM:SS to ISO format
- **Role detection** - Extracts sender role from each message
- **Ordering** - Maintains conversation order with explicit order field
- **Sorting** - Messages sorted by timestamp, then by role (User before Assistant)
## Manual Commands
### New Commands Added
```bash
# Parse transcripts into structured messages
node scripts/manual-triggers.js parse
@ -99,17 +112,20 @@ node scripts/manual-triggers.js status
```
### Updated Commands
- **`status`** - Now shows transcript and parsing statistics
- **`all`** - New command that runs refresh → parse → process in sequence
- **`status`** - Now shows transcript and parsing statistics
- **`all`** - New command that runs refresh → parse → process in sequence
## Workflow Integration
### Complete Processing Pipeline
1. **Session Refresh** - Fetch sessions from CSV, download transcripts
2. **Transcript Parsing** - Parse raw transcripts into structured messages
3. **AI Processing** - Process sessions with OpenAI for sentiment, categories, etc.
1. **Session Refresh** - Fetch sessions from CSV, download transcripts
2. **Transcript Parsing** - Parse raw transcripts into structured messages
3. **AI Processing** - Process sessions with OpenAI for sentiment, categories, etc.
### Database States
```javascript
// After CSV fetch
{
@ -139,21 +155,24 @@ node scripts/manual-triggers.js status
## User Experience Improvements
### Before
- Only raw transcript text in a text area
- Difficult to follow conversation flow
- No clear distinction between speakers
- Only raw transcript text in a text area
- Difficult to follow conversation flow
- No clear distinction between speakers
### After
- **Chat-like interface** with message bubbles
- **Color-coded roles** for easy identification
- **Timestamps** for each message
- **Conversation metadata** (first/last message times)
- **Fallback to raw transcript** if parsing fails
- **Both views available** - structured AND raw
- **Chat-like interface** with message bubbles
- **Color-coded roles** for easy identification
- **Timestamps** for each message
- **Conversation metadata** (first/last message times)
- **Fallback to raw transcript** if parsing fails
- **Both views available** - structured AND raw
## Testing
### Manual Testing Commands
```bash
# Check current status
node scripts/manual-triggers.js status
@ -166,38 +185,44 @@ node scripts/manual-triggers.js all
```
### Expected Results
1. Sessions with transcript content get parsed into individual messages
2. Session detail pages show chat-like interface
3. Both parsed messages and raw transcript are available
4. No data loss - original transcript content preserved
1. Sessions with transcript content get parsed into individual messages
2. Session detail pages show chat-like interface
3. Both parsed messages and raw transcript are available
4. No data loss - original transcript content preserved
## Technical Benefits
### Performance
- **Indexed queries** - Messages indexed by sessionId and order
- **Efficient loading** - Only load messages when needed
- **Cascading deletes** - Messages automatically deleted with sessions
- **Indexed queries** - Messages indexed by sessionId and order
- **Efficient loading** - Only load messages when needed
- **Cascading deletes** - Messages automatically deleted with sessions
### Maintainability
- **Separation of concerns** - Parsing logic isolated in dedicated module
- **Type safety** - Full TypeScript support for Message interface
- **Error handling** - Graceful fallbacks when parsing fails
- **Separation of concerns** - Parsing logic isolated in dedicated module
- **Type safety** - Full TypeScript support for Message interface
- **Error handling** - Graceful fallbacks when parsing fails
### Extensibility
- **Role flexibility** - Supports any role names (User, Assistant, System, etc.)
- **Content preservation** - Multi-line messages fully supported
- **Metadata ready** - Easy to add message-level metadata in future
- **Role flexibility** - Supports any role names (User, Assistant, System, etc.)
- **Content preservation** - Multi-line messages fully supported
- **Metadata ready** - Easy to add message-level metadata in future
## Migration Notes
### Existing Data
- **No data loss** - Original transcript content preserved
- **Backward compatibility** - Pages work with or without parsed messages
- **Gradual migration** - Can parse transcripts incrementally
- **No data loss** - Original transcript content preserved
- **Backward compatibility** - Pages work with or without parsed messages
- **Gradual migration** - Can parse transcripts incrementally
### Database Migration
- New Message table created with foreign key constraints
- Existing Session table unchanged (only added relation)
- Index created for efficient message queries
- New Message table created with foreign key constraints
- Existing Session table unchanged (only added relation)
- Index created for efficient message queries
This implementation provides a solid foundation for enhanced conversation analysis and user experience while maintaining full backward compatibility.