feat: Implement structured message parsing and display in MessageViewer component

- Added MessageViewer component to display parsed messages in a chat-like format.
- Introduced new Message table in the database to store individual messages with timestamps, roles, and content.
- Updated Session model to include a relation to parsed messages.
- Created transcript parsing logic to convert raw transcripts into structured messages.
- Enhanced processing scheduler to handle sessions with parsed messages.
- Updated API endpoints to return parsed messages alongside session details.
- Added manual trigger commands for session refresh, transcript parsing, and processing.
- Improved user experience with color-coded message roles and timestamps in the UI.
- Documented the new scheduler workflow and transcript parsing implementation.
This commit is contained in:
Max Kowalski
2025-06-25 17:45:08 +02:00
parent 3196dabdf2
commit a9e4145001
20 changed files with 1043 additions and 90 deletions

185
docs/scheduler-workflow.md Normal file
View File

@ -0,0 +1,185 @@
# Scheduler Workflow Documentation
## Overview
The LiveDash system has two main schedulers that work together to fetch and process session data:
1. **Session Refresh Scheduler** - Fetches new sessions from CSV files
2. **Processing Scheduler** - Processes session transcripts with AI
## Current Status (as of latest check)
- **Total sessions**: 107
- **Processed sessions**: 0
- **Sessions with transcript**: 0
- **Ready for processing**: 0
## How the `processed` Field Works
The ProcessingScheduler picks up sessions where `processed` is **NOT** `true`, which includes:
- `processed = false`
- `processed = null`
**Query used:**
```javascript
{ processed: { not: true } } // Either false or null
```
## Complete Workflow
### Step 1: Session Refresh (CSV Fetching)
**What it does:**
- Fetches session data from company CSV URLs
- Creates session records in database with basic metadata
- Sets `transcriptContent = null` initially
- Sets `processed = null` initially
**Runs:** Every 30 minutes (cron: `*/30 * * * *`)
### Step 2: Transcript Fetching
**What it does:**
- Downloads full transcript content for sessions
- Updates `transcriptContent` field with actual conversation data
- Sessions remain `processed = null` until AI processing
**Runs:** As part of session refresh process
### Step 3: AI Processing
**What it does:**
- Finds sessions with transcript content where `processed != true`
- Sends transcripts to OpenAI for analysis
- Extracts: sentiment, category, questions, summary, etc.
- Updates session with processed data
- Sets `processed = true`
**Runs:** Every hour (cron: `0 * * * *`)
## Manual Trigger Commands
### Check Current Status
```bash
node scripts/manual-triggers.js status
```
### Trigger Session Refresh (Fetch new sessions from CSV)
```bash
node scripts/manual-triggers.js refresh
```
### Trigger AI Processing (Process unprocessed sessions)
```bash
node scripts/manual-triggers.js process
```
### Run Both Schedulers
```bash
node scripts/manual-triggers.js both
```
## Troubleshooting
### No Sessions Being Processed?
1. **Check if sessions have transcripts:**
```bash
node scripts/manual-triggers.js status
```
2. **If "Sessions with transcript" is 0:**
- Sessions exist but transcripts haven't been fetched yet
- Run session refresh: `node scripts/manual-triggers.js refresh`
3. **If "Ready for processing" is 0 but "Sessions with transcript" > 0:**
- All sessions with transcripts have already been processed
- Check if `OPENAI_API_KEY` is set in environment
### Common Issues
#### "No sessions found requiring processing"
- All sessions with transcripts have been processed (`processed = true`)
- Or no sessions have transcript content yet
#### "OPENAI_API_KEY environment variable is not set"
- Add OpenAI API key to `.env.development` file
- Restart the application
#### "Error fetching transcript: Unauthorized"
- CSV credentials are incorrect or expired
- Check company CSV username/password in database
## Database Field Mapping
### Before AI Processing
```javascript
{
id: "session-uuid",
transcriptContent: "full conversation text" | null,
processed: null,
sentimentCategory: null,
questions: null,
summary: null,
// ... other fields
}
```
### After AI Processing
```javascript
{
id: "session-uuid",
transcriptContent: "full conversation text",
processed: true,
sentimentCategory: "positive" | "neutral" | "negative",
questions: '["question 1", "question 2"]', // JSON string
summary: "Brief conversation summary",
language: "en", // ISO 639-1 code
messagesSent: 5,
sentiment: 0.8, // Float value (-1 to 1)
escalated: false,
forwardedHr: false,
category: "Schedule & Hours",
// ... other fields
}
```
## Scheduler Configuration
### Session Refresh Scheduler
- **File**: `lib/scheduler.js`
- **Frequency**: Every 30 minutes
- **Cron**: `*/30 * * * *`
### Processing Scheduler
- **File**: `lib/processingScheduler.js`
- **Frequency**: Every hour
- **Cron**: `0 * * * *`
- **Batch size**: 10 sessions per run
## Environment Variables Required
```bash
# Database
DATABASE_URL="postgresql://..."
# OpenAI (for processing)
OPENAI_API_KEY="sk-..."
# NextAuth
NEXTAUTH_SECRET="..."
NEXTAUTH_URL="http://localhost:3000"
```
## Next Steps for Testing
1. **Trigger session refresh** to fetch transcripts:
```bash
node scripts/manual-triggers.js refresh
```
2. **Check status** to see if transcripts were fetched:
```bash
node scripts/manual-triggers.js status
```
3. **Trigger processing** if transcripts are available:
```bash
node scripts/manual-triggers.js process
```
4. **View results** in the dashboard session details pages

View File

@ -0,0 +1,203 @@
# Transcript Parsing Implementation
## Overview
Added structured message parsing to the LiveDash system, allowing transcripts to be broken down into individual messages with timestamps, roles, and content. This provides a much better user experience for viewing conversations.
## Database Changes
### New Message Table
```sql
CREATE TABLE Message (
id TEXT PRIMARY KEY DEFAULT (uuid()),
sessionId TEXT NOT NULL,
timestamp DATETIME NOT NULL,
role TEXT NOT NULL,
content TEXT NOT NULL,
order INTEGER NOT NULL,
createdAt DATETIME DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (sessionId) REFERENCES Session(id) ON DELETE CASCADE
);
CREATE INDEX Message_sessionId_order_idx ON Message(sessionId, order);
```
### Updated Session Table
- Added `messages` relation to Session model
- Sessions can now have both raw transcript content AND parsed messages
## New Components
### 1. Message Interface (`lib/types.ts`)
```typescript
export interface Message {
id: string;
sessionId: string;
timestamp: Date;
role: string; // "User", "Assistant", "System", etc.
content: string;
order: number; // Order within the conversation (0, 1, 2, ...)
createdAt: Date;
}
```
### 2. Transcript Parser (`lib/transcriptParser.js`)
- **`parseChatLogToJSON(logString)`** - Parses raw transcript text into structured messages
- **`storeMessagesForSession(sessionId, messages)`** - Stores parsed messages in database
- **`processTranscriptForSession(sessionId, transcriptContent)`** - Complete processing for one session
- **`processAllUnparsedTranscripts()`** - Batch process all unparsed transcripts
- **`getMessagesForSession(sessionId)`** - Retrieve messages for a session
### 3. MessageViewer Component (`components/MessageViewer.tsx`)
- Chat-like interface for displaying parsed messages
- Color-coded by role (User: blue, Assistant: gray, System: yellow)
- Shows timestamps and message order
- Scrollable with conversation metadata
## Updated Components
### 1. Session API (`pages/api/dashboard/session/[id].ts`)
- Now includes parsed messages in session response
- Messages are ordered by `order` field (ascending)
### 2. Session Details Page (`app/dashboard/sessions/[id]/page.tsx`)
- Added MessageViewer component
- Shows both parsed messages AND raw transcript
- Prioritizes parsed messages when available
### 3. ChatSession Interface (`lib/types.ts`)
- Added optional `messages?: Message[]` field
## Parsing Logic
### Supported Format
The parser expects transcript format:
```
[DD.MM.YYYY HH:MM:SS] Role: Message content
[DD.MM.YYYY HH:MM:SS] User: Hello, I need help
[DD.MM.YYYY HH:MM:SS] Assistant: How can I help you today?
```
### Features
- **Multi-line support** - Messages can span multiple lines
- **Timestamp parsing** - Converts DD.MM.YYYY HH:MM:SS to ISO format
- **Role detection** - Extracts sender role from each message
- **Ordering** - Maintains conversation order with explicit order field
- **Sorting** - Messages sorted by timestamp, then by role (User before Assistant)
## Manual Commands
### New Commands Added
```bash
# Parse transcripts into structured messages
node scripts/manual-triggers.js parse
# Complete workflow: refresh → parse → process
node scripts/manual-triggers.js all
# Check status (now shows parsing info)
node scripts/manual-triggers.js status
```
### Updated Commands
- **`status`** - Now shows transcript and parsing statistics
- **`all`** - New command that runs refresh → parse → process in sequence
## Workflow Integration
### Complete Processing Pipeline
1. **Session Refresh** - Fetch sessions from CSV, download transcripts
2. **Transcript Parsing** - Parse raw transcripts into structured messages
3. **AI Processing** - Process sessions with OpenAI for sentiment, categories, etc.
### Database States
```javascript
// After CSV fetch
{
transcriptContent: "raw text...",
messages: [], // Empty
processed: null
}
// After parsing
{
transcriptContent: "raw text...",
messages: [Message, Message, ...], // Parsed
processed: null
}
// After AI processing
{
transcriptContent: "raw text...",
messages: [Message, Message, ...], // Parsed
processed: true,
sentimentCategory: "positive",
summary: "Brief summary...",
// ... other AI fields
}
```
## User Experience Improvements
### Before
- Only raw transcript text in a text area
- Difficult to follow conversation flow
- No clear distinction between speakers
### After
- **Chat-like interface** with message bubbles
- **Color-coded roles** for easy identification
- **Timestamps** for each message
- **Conversation metadata** (first/last message times)
- **Fallback to raw transcript** if parsing fails
- **Both views available** - structured AND raw
## Testing
### Manual Testing Commands
```bash
# Check current status
node scripts/manual-triggers.js status
# Parse existing transcripts
node scripts/manual-triggers.js parse
# Full pipeline test
node scripts/manual-triggers.js all
```
### Expected Results
1. Sessions with transcript content get parsed into individual messages
2. Session detail pages show chat-like interface
3. Both parsed messages and raw transcript are available
4. No data loss - original transcript content preserved
## Technical Benefits
### Performance
- **Indexed queries** - Messages indexed by sessionId and order
- **Efficient loading** - Only load messages when needed
- **Cascading deletes** - Messages automatically deleted with sessions
### Maintainability
- **Separation of concerns** - Parsing logic isolated in dedicated module
- **Type safety** - Full TypeScript support for Message interface
- **Error handling** - Graceful fallbacks when parsing fails
### Extensibility
- **Role flexibility** - Supports any role names (User, Assistant, System, etc.)
- **Content preservation** - Multi-line messages fully supported
- **Metadata ready** - Easy to add message-level metadata in future
## Migration Notes
### Existing Data
- **No data loss** - Original transcript content preserved
- **Backward compatibility** - Pages work with or without parsed messages
- **Gradual migration** - Can parse transcripts incrementally
### Database Migration
- New Message table created with foreign key constraints
- Existing Session table unchanged (only added relation)
- Index created for efficient message queries
This implementation provides a solid foundation for enhanced conversation analysis and user experience while maintaining full backward compatibility.