feat: Implement structured message parsing and display in MessageViewer component

- Added MessageViewer component to display parsed messages in a chat-like format.
- Introduced new Message table in the database to store individual messages with timestamps, roles, and content.
- Updated Session model to include a relation to parsed messages.
- Created transcript parsing logic to convert raw transcripts into structured messages.
- Enhanced processing scheduler to handle sessions with parsed messages.
- Updated API endpoints to return parsed messages alongside session details.
- Added manual trigger commands for session refresh, transcript parsing, and processing.
- Improved user experience with color-coded message roles and timestamps in the UI.
- Documented the new scheduler workflow and transcript parsing implementation.
2026-03-03 00:01:30 +01:00 · 2025-06-25 17:45:08 +02:00
parent 3196dabdf2
commit a9e4145001
20 changed files with 1043 additions and 90 deletions
--- a/docs/scheduler-workflow.md
+++ b/docs/scheduler-workflow.md
@@ -0,0 +1,185 @@
+# Scheduler Workflow Documentation
+
+## Overview
+The LiveDash system has two main schedulers that work together to fetch and process session data:
+
+1. **Session Refresh Scheduler** - Fetches new sessions from CSV files
+2. **Processing Scheduler** - Processes session transcripts with AI
+
+## Current Status (as of latest check)
+- **Total sessions**: 107
+- **Processed sessions**: 0  
+- **Sessions with transcript**: 0
+- **Ready for processing**: 0
+
+## How the `processed` Field Works
+
+The ProcessingScheduler picks up sessions where `processed` is **NOT** `true`, which includes:
+- `processed = false` 
+- `processed = null`
+
+**Query used:**
+```javascript
+{ processed: { not: true } } // Either false or null
+```
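As a rough illustration of that selection rule (not the scheduler's actual code), the in-memory equivalent below treats both `false` and `null` as unprocessed. The helper name `isUnprocessed` and the sample sessions are assumptions made for this sketch. Whether the database query itself matches `null` rows depends on how the query layer treats `NULL` in a `not` filter, so it is worth verifying against real data.

```javascript
// Hypothetical helper: mirrors the documented rule "processed is NOT true".
function isUnprocessed(session) {
  return session.processed !== true; // matches false, null, and undefined
}

// Made-up sample rows covering the three possible values.
const sessions = [
  { id: "a", processed: true },
  { id: "b", processed: false },
  { id: "c", processed: null },
];

const pending = sessions.filter(isUnprocessed).map((s) => s.id);
console.log(pending); // [ 'b', 'c' ]
```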
+
+## Complete Workflow
+
+### Step 1: Session Refresh (CSV Fetching)
+**What it does:**
+- Fetches session data from company CSV URLs
+- Creates session records in database with basic metadata
+- Sets `transcriptContent = null` initially
+- Sets `processed = null` initially
+
+**Runs:** Every 30 minutes (cron: `*/30 * * * *`)
+
+### Step 2: Transcript Fetching
+**What it does:**
+- Downloads full transcript content for sessions
+- Updates `transcriptContent` field with actual conversation data
+- Sessions remain `processed = null` until AI processing
+
+**Runs:** As part of session refresh process
+
+### Step 3: AI Processing
+**What it does:**
+- Finds sessions with transcript content where `processed != true`
+- Sends transcripts to OpenAI for analysis
+- Extracts: sentiment, category, questions, summary, etc.
+- Updates session with processed data
+- Sets `processed = true`
+
+**Runs:** Every hour (cron: `0 * * * *`)
+
+## Manual Trigger Commands
+
+### Check Current Status
+```bash
+node scripts/manual-triggers.js status
+```
+
+### Trigger Session Refresh (Fetch new sessions from CSV)
+```bash
+node scripts/manual-triggers.js refresh
+```
+
+### Trigger AI Processing (Process unprocessed sessions)
+```bash
+node scripts/manual-triggers.js process
+```
+
+### Run Both Schedulers
+```bash
+node scripts/manual-triggers.js both
+```
+
+## Troubleshooting
+
+### No Sessions Being Processed?
+1. **Check if sessions have transcripts:**
+   ```bash
+   node scripts/manual-triggers.js status
+   ```
+
+2. **If "Sessions with transcript" is 0:**
+   - Sessions exist but transcripts haven't been fetched yet
+   - Run session refresh: `node scripts/manual-triggers.js refresh`
+
+3. **If "Ready for processing" is 0 but "Sessions with transcript" > 0:**
+   - All sessions with transcripts have already been processed (`processed = true`)
+   - If sessions still look unprocessed in the dashboard, check that `OPENAI_API_KEY` is set in the environment
+
+### Common Issues
+
+#### "No sessions found requiring processing"
+- All sessions with transcripts have been processed (`processed = true`)
+- Or no sessions have transcript content yet
+
+#### "OPENAI_API_KEY environment variable is not set"
+- Add `OPENAI_API_KEY` to the `.env.development` file
+- Restart the application
+
+#### "Error fetching transcript: Unauthorized"
+- CSV credentials are incorrect or expired
+- Check company CSV username/password in database
+
+## Database Field Mapping
+
+### Before AI Processing
+```javascript
+{
+  id: "session-uuid",
+  transcriptContent: null, // or "full conversation text" once the transcript is fetched
+  processed: null,
+  sentimentCategory: null,
+  questions: null,
+  summary: null,
+  // ... other fields
+}
+```
+
+### After AI Processing
+```javascript
+{
+  id: "session-uuid", 
+  transcriptContent: "full conversation text",
+  processed: true,
+  sentimentCategory: "positive", // one of "positive", "neutral", "negative"
+  questions: '["question 1", "question 2"]', // JSON string
+  summary: "Brief conversation summary",
+  language: "en", // ISO 639-1 code
+  messagesSent: 5,
+  sentiment: 0.8, // Float value (-1 to 1)
+  escalated: false,
+  forwardedHr: false,
+  category: "Schedule & Hours",
+  // ... other fields
+}
+```
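Because `questions` is stored as a JSON string rather than an array, consumers need to decode it before use. A minimal sketch of a defensive decoder follows; the helper name `parseQuestions` and the choice to fall back to an empty array are assumptions, not part of the committed code.

```javascript
// Hypothetical helper: decode the `questions` column, which holds a JSON string.
function parseQuestions(raw) {
  if (raw == null) return []; // session not processed yet
  try {
    const parsed = JSON.parse(raw);
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    return []; // malformed JSON: fall back to an empty list
  }
}

console.log(parseQuestions('["question 1", "question 2"]')); // [ 'question 1', 'question 2' ]
console.log(parseQuestions(null)); // []
```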
+
+## Scheduler Configuration
+
+### Session Refresh Scheduler
+- **File**: `lib/scheduler.js`
+- **Frequency**: Every 30 minutes
+- **Cron**: `*/30 * * * *`
+
+### Processing Scheduler  
+- **File**: `lib/processingScheduler.js`
+- **Frequency**: Every hour
+- **Cron**: `0 * * * *`
+- **Batch size**: 10 sessions per run
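To make the batch size concrete, here is a sketch of how one hourly run might pick its work, assuming eligibility means "has transcript content and `processed` is not `true`". The helper `selectBatch` and the sample data are hypothetical; the real scheduler presumably does this in a database query.

```javascript
const BATCH_SIZE = 10; // value taken from the configuration above

// Hypothetical helper: choose the sessions a single processing run would handle.
function selectBatch(sessions, batchSize = BATCH_SIZE) {
  return sessions
    .filter((s) => s.transcriptContent != null && s.processed !== true)
    .slice(0, batchSize);
}

// 25 made-up sessions; every fifth one is already processed.
const sampleSessions = Array.from({ length: 25 }, (_, i) => ({
  id: `session-${i}`,
  transcriptContent: "raw text...",
  processed: i % 5 === 0 ? true : null,
}));

const batch = selectBatch(sampleSessions);
console.log(batch.length); // 10
console.log(batch[0].id); // session-1
```

With 20 eligible sessions and a batch size of 10, clearing the backlog in this sketch would take two runs.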
+
+## Environment Variables Required
+
+```bash
+# Database
+DATABASE_URL="postgresql://..."
+
+# OpenAI (for processing)
+OPENAI_API_KEY="sk-..."
+
+# NextAuth
+NEXTAUTH_SECRET="..."
+NEXTAUTH_URL="http://localhost:3000"
+```
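A startup check for these variables can surface configuration problems before a scheduler run fails. The sketch below uses the variable names listed above; the helper `missingEnv` and the sample values are assumptions.

```javascript
// Variable names come from the list above; the helper itself is hypothetical.
const REQUIRED_ENV = ["DATABASE_URL", "OPENAI_API_KEY", "NEXTAUTH_SECRET", "NEXTAUTH_URL"];

// Returns the names of required variables that are unset or empty.
function missingEnv(env, required = REQUIRED_ENV) {
  return required.filter((name) => !env[name]);
}

// Made-up environment with two variables missing.
const missing = missingEnv({
  DATABASE_URL: "postgresql://example",
  NEXTAUTH_URL: "http://localhost:3000",
});
console.log(missing); // [ 'OPENAI_API_KEY', 'NEXTAUTH_SECRET' ]
```

In a real script the argument would be `process.env`.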
+
+## Next Steps for Testing
+
+1. **Trigger session refresh** to fetch transcripts:
+   ```bash
+   node scripts/manual-triggers.js refresh
+   ```
+
+2. **Check status** to see if transcripts were fetched:
+   ```bash
+   node scripts/manual-triggers.js status
+   ```
+
+3. **Trigger processing** if transcripts are available:
+   ```bash
+   node scripts/manual-triggers.js process
+   ```
+
+4. **View results** in the dashboard session details pages
--- a/docs/transcript-parsing-implementation.md
+++ b/docs/transcript-parsing-implementation.md
@@ -0,0 +1,203 @@
+# Transcript Parsing Implementation
+
+## Overview
+Added structured message parsing to the LiveDash system, allowing transcripts to be broken down into individual messages with timestamps, roles, and content. The session details page can then render a conversation as a chat view instead of a single block of raw text.
+
+## Database Changes
+
+### New Message Table
+```sql
+CREATE TABLE Message (
+  id        TEXT PRIMARY KEY DEFAULT (uuid()),
+  sessionId TEXT NOT NULL,
+  timestamp DATETIME NOT NULL,
+  role      TEXT NOT NULL,
+  content   TEXT NOT NULL,
+  "order"   INTEGER NOT NULL, -- quoted because ORDER is a reserved word in SQL
+  createdAt DATETIME DEFAULT CURRENT_TIMESTAMP,
+  FOREIGN KEY (sessionId) REFERENCES Session(id) ON DELETE CASCADE
+);
+
+CREATE INDEX Message_sessionId_order_idx ON Message(sessionId, "order");
+```
+
+### Updated Session Table
+- Added `messages` relation to Session model
+- Sessions can now have both raw transcript content AND parsed messages
+
+## New Components
+
+### 1. Message Interface (`lib/types.ts`)
+```typescript
+export interface Message {
+  id: string;
+  sessionId: string;
+  timestamp: Date;
+  role: string; // "User", "Assistant", "System", etc.
+  content: string;
+  order: number; // Order within the conversation (0, 1, 2, ...)
+  createdAt: Date;
+}
+```
+
+### 2. Transcript Parser (`lib/transcriptParser.js`)
+- **`parseChatLogToJSON(logString)`** - Parses raw transcript text into structured messages
+- **`storeMessagesForSession(sessionId, messages)`** - Stores parsed messages in database
+- **`processTranscriptForSession(sessionId, transcriptContent)`** - Complete processing for one session
+- **`processAllUnparsedTranscripts()`** - Batch process all unparsed transcripts
+- **`getMessagesForSession(sessionId)`** - Retrieve messages for a session
+
+### 3. MessageViewer Component (`components/MessageViewer.tsx`)
+- Chat-like interface for displaying parsed messages
+- Color-coded by role (User: blue, Assistant: gray, System: yellow)
+- Shows timestamps and message order
+- Scrollable with conversation metadata
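The role-to-color mapping could look roughly like the following. The three colors come from the list above; the case-insensitive lookup, the helper name `colorForRole`, and the gray fallback for unknown roles are assumptions made for this sketch.

```javascript
// Colors for the three roles named above.
const ROLE_COLORS = new Map([
  ["user", "blue"],
  ["assistant", "gray"],
  ["system", "yellow"],
]);
const FALLBACK_COLOR = "gray"; // assumption: unknown roles use a neutral color

// Hypothetical helper: pick a display color for a message role.
function colorForRole(role) {
  const key = String(role).trim().toLowerCase();
  return ROLE_COLORS.get(key) ?? FALLBACK_COLOR;
}

console.log(colorForRole("User")); // blue
console.log(colorForRole("System")); // yellow
console.log(colorForRole("Bot")); // gray
```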
+
+## Updated Components
+
+### 1. Session API (`pages/api/dashboard/session/[id].ts`)
+- Now includes parsed messages in session response
+- Messages are ordered by `order` field (ascending)
+
+### 2. Session Details Page (`app/dashboard/sessions/[id]/page.tsx`)
+- Added MessageViewer component
+- Shows both parsed messages AND raw transcript
+- Prioritizes parsed messages when available
+
+### 3. ChatSession Interface (`lib/types.ts`)
+- Added optional `messages?: Message[]` field
+
+## Parsing Logic
+
+### Supported Format
+The parser expects transcript format:
+```
+[DD.MM.YYYY HH:MM:SS] Role: Message content
+[DD.MM.YYYY HH:MM:SS] User: Hello, I need help
+[DD.MM.YYYY HH:MM:SS] Assistant: How can I help you today?
+```
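A line-by-line parser for this format might be sketched as follows. This is not the committed `parseChatLogToJSON`; the function name `parseTranscriptSketch`, the regular expression, and the decision to emit ISO 8601 timestamps without a timezone are all assumptions. Lines that do not start with a bracketed timestamp are treated as continuations of the previous message.

```javascript
// Matches "[DD.MM.YYYY HH:MM:SS] Role: content" at the start of a line.
const LINE_RE = /^\[(\d{2})\.(\d{2})\.(\d{4}) (\d{2}):(\d{2}):(\d{2})\] ([^:]+): ?(.*)$/;

// Hypothetical parser: raw transcript text in, ordered message objects out.
function parseTranscriptSketch(logString) {
  const messages = [];
  for (const line of logString.split(/\r?\n/)) {
    const m = LINE_RE.exec(line);
    if (m) {
      const [, dd, mm, yyyy, hh, mi, ss, role, content] = m;
      messages.push({
        timestamp: `${yyyy}-${mm}-${dd}T${hh}:${mi}:${ss}`, // ISO 8601, no timezone
        role: role.trim(),
        content,
      });
    } else if (messages.length > 0) {
      // Continuation line: append it to the previous message.
      messages[messages.length - 1].content += "\n" + line;
    }
  }
  // Assign an explicit 0-based order and trim stray whitespace.
  return messages.map((msg, order) => ({ ...msg, content: msg.content.trim(), order }));
}

const sampleLog = [
  "[25.06.2025 17:45:08] User: Hello, I need help",
  "with my schedule",
  "[25.06.2025 17:45:10] Assistant: How can I help you today?",
].join("\n");

const parsed = parseTranscriptSketch(sampleLog);
console.log(parsed.length); // 2
console.log(parsed[0].timestamp); // 2025-06-25T17:45:08
```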
+
+### Features
+- **Multi-line support** - Messages can span multiple lines
+- **Timestamp parsing** - Converts DD.MM.YYYY HH:MM:SS to ISO format
+- **Role detection** - Extracts sender role from each message
+- **Ordering** - Maintains conversation order with explicit order field
+- **Sorting** - Messages sorted by timestamp, then by role (User before Assistant)
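The sorting and ordering rules can be sketched as a comparator plus a renumbering pass. The names `rankOf`, `compareMessages`, and `sortAndNumber` are hypothetical, as is the choice to place roles other than User and Assistant last on a timestamp tie. Timestamps are assumed to be same-format ISO strings, so plain string comparison orders them chronologically.

```javascript
// Tie-break rank: User before Assistant; any other role last (assumption).
const ROLE_RANK = new Map([["User", 0], ["Assistant", 1]]);
const rankOf = (role) => ROLE_RANK.get(role) ?? 2;

// Compare by timestamp first, then by role rank.
function compareMessages(a, b) {
  if (a.timestamp !== b.timestamp) return a.timestamp < b.timestamp ? -1 : 1;
  return rankOf(a.role) - rankOf(b.role);
}

// Sort a copy of the messages and assign the explicit 0-based `order` field.
function sortAndNumber(messages) {
  return [...messages].sort(compareMessages).map((msg, order) => ({ ...msg, order }));
}

const unsorted = [
  { timestamp: "2025-06-25T10:00:00", role: "Assistant", content: "Hi" },
  { timestamp: "2025-06-25T10:00:00", role: "User", content: "Hello" },
  { timestamp: "2025-06-25T09:59:59", role: "User", content: "Anyone there?" },
];

const ordered = sortAndNumber(unsorted);
console.log(ordered.map((m) => `${m.order}:${m.role}`).join(" ")); // 0:User 1:User 2:Assistant
```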
+
+## Manual Commands
+
+### New Commands Added
+```bash
+# Parse transcripts into structured messages
+node scripts/manual-triggers.js parse
+
+# Complete workflow: refresh → parse → process
+node scripts/manual-triggers.js all
+
+# Check status (now shows parsing info)
+node scripts/manual-triggers.js status
+```
+
+### Command Summary
+- **`status`** - Updated: now shows transcript and parsing statistics
+- **`parse`** - New: parses raw transcripts into structured messages
+- **`all`** - New: runs refresh → parse → process in sequence
+
+## Workflow Integration
+
+### Complete Processing Pipeline
+1. **Session Refresh** - Fetch sessions from CSV, download transcripts
+2. **Transcript Parsing** - Parse raw transcripts into structured messages
+3. **AI Processing** - Process sessions with OpenAI for sentiment, categories, etc.
+
+### Database States
+```javascript
+// After session refresh (CSV fetch and transcript download)
+{
+  transcriptContent: "raw text...",
+  messages: [], // Empty
+  processed: null
+}
+
+// After parsing
+{
+  transcriptContent: "raw text...",
+  messages: [Message, Message, ...], // Parsed
+  processed: null
+}
+
+// After AI processing
+{
+  transcriptContent: "raw text...",
+  messages: [Message, Message, ...], // Parsed
+  processed: true,
+  sentimentCategory: "positive",
+  summary: "Brief summary...",
+  // ... other AI fields
+}
+```
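The three states above can be told apart from the fields alone. The sketch below derives a stage label from a session object; the function name `pipelineStage` and the stage labels (including `"new"` for a session whose transcript has not been downloaded) are assumptions for illustration.

```javascript
// Hypothetical helper: infer where a session is in the pipeline from its fields.
function pipelineStage(session) {
  if (session.processed === true) return "processed";
  if (Array.isArray(session.messages) && session.messages.length > 0) return "parsed";
  if (session.transcriptContent != null) return "fetched";
  return "new"; // no transcript downloaded yet
}

console.log(pipelineStage({ transcriptContent: "raw text...", messages: [], processed: null })); // fetched
console.log(pipelineStage({ transcriptContent: "raw text...", messages: [{}], processed: null })); // parsed
console.log(pipelineStage({ transcriptContent: "raw text...", messages: [{}], processed: true })); // processed
```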
+
+## User Experience Improvements
+
+### Before
+- Only raw transcript text in a text area
+- Difficult to follow conversation flow
+- No clear distinction between speakers
+
+### After
+- **Chat-like interface** with message bubbles
+- **Color-coded roles** for easy identification
+- **Timestamps** for each message
+- **Conversation metadata** (first/last message times)
+- **Fallback to raw transcript** if parsing fails
+- **Both views available** - structured AND raw
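The fallback and metadata behavior listed above could be sketched like this. It assumes `messages` is already sorted by `order`, and the helper name `conversationView` and the shape of its return value are hypothetical rather than taken from the MessageViewer component.

```javascript
// Hypothetical helper: decide what the details page shows for a session.
function conversationView(session) {
  const messages = session.messages ?? [];
  if (messages.length === 0) {
    // No parsed messages: fall back to the raw transcript text.
    return { mode: "raw", text: session.transcriptContent ?? "" };
  }
  return {
    mode: "messages",
    count: messages.length,
    firstMessageAt: messages[0].timestamp,
    lastMessageAt: messages[messages.length - 1].timestamp,
  };
}

const view = conversationView({
  transcriptContent: "raw text...",
  messages: [
    { timestamp: "2025-06-25T17:45:08", role: "User", content: "Hello", order: 0 },
    { timestamp: "2025-06-25T17:45:10", role: "Assistant", content: "Hi", order: 1 },
  ],
});
console.log(view.mode); // messages
console.log(view.lastMessageAt); // 2025-06-25T17:45:10
```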
+
+## Testing
+
+### Manual Testing Commands
+```bash
+# Check current status
+node scripts/manual-triggers.js status
+
+# Parse existing transcripts
+node scripts/manual-triggers.js parse
+
+# Full pipeline test
+node scripts/manual-triggers.js all
+```
+
+### Expected Results
+1. Sessions with transcript content get parsed into individual messages
+2. Session detail pages show chat-like interface
+3. Both parsed messages and raw transcript are available
+4. No data loss - original transcript content preserved
+
+## Technical Benefits
+
+### Performance
+- **Indexed queries** - Messages indexed by sessionId and order
+- **Efficient loading** - Only load messages when needed
+- **Cascading deletes** - Messages automatically deleted with sessions
+
+### Maintainability
+- **Separation of concerns** - Parsing logic isolated in dedicated module
+- **Type safety** - Full TypeScript support for Message interface
+- **Error handling** - Graceful fallbacks when parsing fails
+
+### Extensibility
+- **Role flexibility** - Supports any role names (User, Assistant, System, etc.)
+- **Content preservation** - Multi-line messages fully supported
+- **Metadata ready** - Easy to add message-level metadata in future
+
+## Migration Notes
+
+### Existing Data
+- **No data loss** - Original transcript content preserved
+- **Backward compatibility** - Pages work with or without parsed messages
+- **Gradual migration** - Can parse transcripts incrementally
+
+### Database Migration
+- New Message table created with foreign key constraints
+- Existing Session table unchanged (only added relation)
+- Index created for efficient message queries
+
+This implementation adds structured messages alongside the original transcripts, so existing pages keep working while new views use the parsed data.