DB refactor

This commit is contained in:
Max Kowalski
2025-06-27 23:05:46 +02:00
parent 185bb6da58
commit 2dfc49f840
20 changed files with 1607 additions and 339 deletions

View File

@ -12,10 +12,10 @@ The WordCloud component visualizes categories or topics based on their frequency
**Features:**
- Dynamic sizing based on frequency
- Colorful display with a pleasing color palette
- Responsive design
- Interactive hover effects
- Dynamic sizing based on frequency
- Colorful display with a pleasing color palette
- Responsive design
- Interactive hover effects
### 2. GeographicMap
@ -25,10 +25,10 @@ This component displays a world map with circles representing the number of sess
**Features:**
- Interactive map using React Leaflet
- Circle sizes scaled by session count
- Tooltips showing country names and session counts
- Responsive design
- Interactive map using React Leaflet
- Circle sizes scaled by session count
- Tooltips showing country names and session counts
- Responsive design
### 3. MetricCard
@ -38,10 +38,10 @@ A modern, visually appealing card for displaying key metrics.
**Features:**
- Multiple design variants (default, primary, success, warning, danger)
- Support for trend indicators
- Icons and descriptions
- Clean, modern styling
- Multiple design variants (default, primary, success, warning, danger)
- Support for trend indicators
- Icons and descriptions
- Clean, modern styling
### 4. DonutChart
@ -51,10 +51,10 @@ An enhanced donut chart with better styling and a central text display capabilit
**Features:**
- Customizable colors
- Center text area for displaying summaries
- Interactive tooltips with percentages
- Well-balanced legend display
- Customizable colors
- Center text area for displaying summaries
- Interactive tooltips with percentages
- Well-balanced legend display
### 5. ResponseTimeDistribution
@ -64,28 +64,28 @@ Visualizes the distribution of response times as a histogram.
**Features:**
- Color-coded bars (green for fast, yellow for medium, red for slow)
- Target time indicator
- Automatic binning of response times
- Clear labeling and scales
- Color-coded bars (green for fast, yellow for medium, red for slow)
- Target time indicator
- Automatic binning of response times
- Clear labeling and scales
## Dashboard Enhancements
The dashboard has been enhanced with:
1. **Improved Layout**: Better use of space and responsive grid layouts
2. **Visual Hierarchies**: Clear heading styles and consistent spacing
3. **Color Coding**: Semantic use of colors to indicate statuses
4. **Interactive Elements**: Better button styles with loading indicators
5. **Data Context**: More complete view of metrics with additional visualizations
6. **Geographic Insights**: Map view of session distribution by country
7. **Language Analysis**: Improved language distribution visualization
8. **Category Analysis**: Word cloud for category popularity
9. **Performance Metrics**: Response time distribution for better insight into system performance
1. **Improved Layout**: Better use of space and responsive grid layouts
2. **Visual Hierarchies**: Clear heading styles and consistent spacing
3. **Color Coding**: Semantic use of colors to indicate statuses
4. **Interactive Elements**: Better button styles with loading indicators
5. **Data Context**: More complete view of metrics with additional visualizations
6. **Geographic Insights**: Map view of session distribution by country
7. **Language Analysis**: Improved language distribution visualization
8. **Category Analysis**: Word cloud for category popularity
9. **Performance Metrics**: Response time distribution for better insight into system performance
## Usage Notes
- The geographic map and response time distribution use simulated data where actual data is not available
- All components are responsive and will adjust to different screen sizes
- The dashboard automatically refreshes data when using the refresh button
- Admin users have access to additional controls at the bottom of the dashboard
- The geographic map and response time distribution use simulated data where actual data is not available
- All components are responsive and will adjust to different screen sizes
- The dashboard automatically refreshes data when using the refresh button
- Admin users have access to additional controls at the bottom of the dashboard

View File

@ -8,8 +8,8 @@
**Solution**:
- Added validation in `fetchAndStoreSessionsForAllCompanies()` to skip companies with example/invalid URLs
- Removed the invalid company record from the database using `fix_companies.js`
- Added validation in `fetchAndStoreSessionsForAllCompanies()` to skip companies with example/invalid URLs
- Removed the invalid company record from the database using `fix_companies.js`
### 2. Transcript Fetching Errors
@ -17,10 +17,10 @@
**Solution**:
- Improved error handling in `fetchTranscriptContent()` function
- Added probabilistic logging (only ~10% of errors logged) to prevent log spam
- Added timeout (10 seconds) for transcript fetching
- Made transcript fetching failures non-blocking (sessions are still created without transcript content)
- Improved error handling in `fetchTranscriptContent()` function
- Added probabilistic logging (only ~10% of errors logged) to prevent log spam
- Added timeout (10 seconds) for transcript fetching
- Made transcript fetching failures non-blocking (sessions are still created without transcript content)
### 3. CSV Fetching Errors
@ -28,8 +28,8 @@
**Solution**:
- Added URL validation to skip companies with `example.com` URLs
- Improved error logging to be more descriptive
- Added URL validation to skip companies with `example.com` URLs
- Improved error logging to be more descriptive
## Current Status
@ -42,22 +42,22 @@
After cleanup, only valid companies remain:
- **Demo Company** (`790b9233-d369-451f-b92c-f4dceb42b649`)
- CSV URL: `https://proto.notso.ai/jumbo/chats`
- Has valid authentication credentials
- 107 sessions in database
- **Demo Company** (`790b9233-d369-451f-b92c-f4dceb42b649`)
- CSV URL: `https://proto.notso.ai/jumbo/chats`
- Has valid authentication credentials
- 107 sessions in database
## Files Modified
1. **lib/csvFetcher.js**
1. **lib/csvFetcher.js**
- Added company URL validation
- Improved transcript fetching error handling
- Reduced error log verbosity
- Added company URL validation
- Improved transcript fetching error handling
- Reduced error log verbosity
2. **fix_companies.js** (cleanup script)
- Removes invalid company records
- Can be run again if needed
2. **fix_companies.js** (cleanup script)
- Removes invalid company records
- Can be run again if needed
## Monitoring
@ -73,7 +73,7 @@ node -e "import('./lib/csvFetcher.js').then(m => m.fetchAndStoreSessionsForAllCo
## Future Improvements
1. Add health check endpoint for scheduler status
2. Add metrics for successful/failed fetches
3. Consider retry logic for temporary failures
4. Add alerting for persistent failures
1. Add health check endpoint for scheduler status
2. Add metrics for successful/failed fetches
3. Consider retry logic for temporary failures
4. Add alerting for persistent failures

View File

@ -1,24 +1,28 @@
# Scheduler Workflow Documentation
## Overview
The LiveDash system has two main schedulers that work together to fetch and process session data:
1. **Session Refresh Scheduler** - Fetches new sessions from CSV files
2. **Processing Scheduler** - Processes session transcripts with AI
1. **Session Refresh Scheduler** - Fetches new sessions from CSV files
2. **Processing Scheduler** - Processes session transcripts with AI
## Current Status (as of latest check)
- **Total sessions**: 107
- **Processed sessions**: 0
- **Sessions with transcript**: 0
- **Ready for processing**: 0
- **Total sessions**: 107
- **Processed sessions**: 0
- **Sessions with transcript**: 0
- **Ready for processing**: 0
## How the `processed` Field Works
The ProcessingScheduler picks up sessions where `processed` is **NOT** `true`, which includes:
- `processed = false`
- `processed = null`
- `processed = false`
- `processed = null`
**Query used:**
```javascript
{ processed: { not: true } } // Either false or null
```
@ -26,50 +30,60 @@ The ProcessingScheduler picks up sessions where `processed` is **NOT** `true`, w
## Complete Workflow
### Step 1: Session Refresh (CSV Fetching)
**What it does:**
- Fetches session data from company CSV URLs
- Creates session records in database with basic metadata
- Sets `transcriptContent = null` initially
- Sets `processed = null` initially
- Fetches session data from company CSV URLs
- Creates session records in database with basic metadata
- Sets `transcriptContent = null` initially
- Sets `processed = null` initially
**Runs:** Every 30 minutes (cron: `*/30 * * * *`)
### Step 2: Transcript Fetching
**What it does:**
- Downloads full transcript content for sessions
- Updates `transcriptContent` field with actual conversation data
- Sessions remain `processed = null` until AI processing
- Downloads full transcript content for sessions
- Updates `transcriptContent` field with actual conversation data
- Sessions remain `processed = null` until AI processing
**Runs:** As part of session refresh process
### Step 3: AI Processing
**What it does:**
- Finds sessions with transcript content where `processed != true`
- Sends transcripts to OpenAI for analysis
- Extracts: sentiment, category, questions, summary, etc.
- Updates session with processed data
- Sets `processed = true`
- Finds sessions with transcript content where `processed != true`
- Sends transcripts to OpenAI for analysis
- Extracts: sentiment, category, questions, summary, etc.
- Updates session with processed data
- Sets `processed = true`
**Runs:** Every hour (cron: `0 * * * *`)
## Manual Trigger Commands
### Check Current Status
```bash
node scripts/manual-triggers.js status
```
### Trigger Session Refresh (Fetch new sessions from CSV)
```bash
node scripts/manual-triggers.js refresh
```
### Trigger AI Processing (Process unprocessed sessions)
```bash
node scripts/manual-triggers.js process
```
### Run Both Schedulers
```bash
node scripts/manual-triggers.js both
```
@ -77,36 +91,42 @@ node scripts/manual-triggers.js both
## Troubleshooting
### No Sessions Being Processed?
1. **Check if sessions have transcripts:**
1. **Check if sessions have transcripts:**
```bash
node scripts/manual-triggers.js status
```
2. **If "Sessions with transcript" is 0:**
- Sessions exist but transcripts haven't been fetched yet
- Run session refresh: `node scripts/manual-triggers.js refresh`
2. **If "Sessions with transcript" is 0:**
- Sessions exist but transcripts haven't been fetched yet
- Run session refresh: `node scripts/manual-triggers.js refresh`
3. **If "Ready for processing" is 0 but "Sessions with transcript" > 0:**
- All sessions with transcripts have already been processed
- Check if `OPENAI_API_KEY` is set in environment
3. **If "Ready for processing" is 0 but "Sessions with transcript" > 0:**
- All sessions with transcripts have already been processed
- Check if `OPENAI_API_KEY` is set in environment
### Common Issues
#### "No sessions found requiring processing"
- All sessions with transcripts have been processed (`processed = true`)
- Or no sessions have transcript content yet
- All sessions with transcripts have been processed (`processed = true`)
- Or no sessions have transcript content yet
#### "OPENAI_API_KEY environment variable is not set"
- Add OpenAI API key to `.env.development` file
- Restart the application
- Add OpenAI API key to `.env.development` file
- Restart the application
#### "Error fetching transcript: Unauthorized"
- CSV credentials are incorrect or expired
- Check company CSV username/password in database
- CSV credentials are incorrect or expired
- Check company CSV username/password in database
## Database Field Mapping
### Before AI Processing
```javascript
{
id: "session-uuid",
@ -120,6 +140,7 @@ node scripts/manual-triggers.js both
```
### After AI Processing
```javascript
{
id: "session-uuid",
@ -141,15 +162,17 @@ node scripts/manual-triggers.js both
## Scheduler Configuration
### Session Refresh Scheduler
- **File**: `lib/scheduler.js`
- **Frequency**: Every 30 minutes
- **Cron**: `*/30 * * * *`
- **File**: `lib/scheduler.js`
- **Frequency**: Every 30 minutes
- **Cron**: `*/30 * * * *`
### Processing Scheduler
- **File**: `lib/processingScheduler.js`
- **Frequency**: Every hour
- **Cron**: `0 * * * *`
- **Batch size**: 10 sessions per run
- **File**: `lib/processingScheduler.js`
- **Frequency**: Every hour
- **Cron**: `0 * * * *`
- **Batch size**: 10 sessions per run
## Environment Variables Required
@ -167,19 +190,22 @@ NEXTAUTH_URL="http://localhost:3000"
## Next Steps for Testing
1. **Trigger session refresh** to fetch transcripts:
1. **Trigger session refresh** to fetch transcripts:
```bash
node scripts/manual-triggers.js refresh
```
2. **Check status** to see if transcripts were fetched:
2. **Check status** to see if transcripts were fetched:
```bash
node scripts/manual-triggers.js status
```
3. **Trigger processing** if transcripts are available:
3. **Trigger processing** if transcripts are available:
```bash
node scripts/manual-triggers.js process
```
4. **View results** in the dashboard session details pages
4. **View results** in the dashboard session details pages

View File

@ -6,47 +6,47 @@ This document explains how the session processing system works in LiveDash-Node.
The system now includes an automated process for analyzing chat session transcripts using OpenAI's API. This process:
1. Fetches session data from CSV sources
2. Only adds new sessions that don't already exist in the database
3. Processes session transcripts with OpenAI to extract valuable insights
4. Updates the database with the processed information
1. Fetches session data from CSV sources
2. Only adds new sessions that don't already exist in the database
3. Processes session transcripts with OpenAI to extract valuable insights
4. Updates the database with the processed information
## How It Works
### Session Fetching
- The system fetches session data from configured CSV URLs for each company
- Unlike the previous implementation, it now only adds sessions that don't already exist in the database
- This prevents duplicate sessions and allows for incremental updates
- The system fetches session data from configured CSV URLs for each company
- Unlike the previous implementation, it now only adds sessions that don't already exist in the database
- This prevents duplicate sessions and allows for incremental updates
### Transcript Processing
- For sessions with transcript content that haven't been processed yet, the system calls OpenAI's API
- The API analyzes the transcript and extracts the following information:
- Primary language used (ISO 639-1 code)
- Number of messages sent by the user
- Overall sentiment (positive, neutral, negative)
- Whether the conversation was escalated
- Whether HR contact was mentioned or provided
- Best-fitting category for the conversation
- Up to 5 paraphrased questions asked by the user
- A brief summary of the conversation
- For sessions with transcript content that haven't been processed yet, the system calls OpenAI's API
- The API analyzes the transcript and extracts the following information:
- Primary language used (ISO 639-1 code)
- Number of messages sent by the user
- Overall sentiment (positive, neutral, negative)
- Whether the conversation was escalated
- Whether HR contact was mentioned or provided
- Best-fitting category for the conversation
- Up to 5 paraphrased questions asked by the user
- A brief summary of the conversation
### Scheduling
The system includes two schedulers:
1. **Session Refresh Scheduler**: Runs every 15 minutes to fetch new sessions from CSV sources
2. **Session Processing Scheduler**: Runs every hour to process unprocessed sessions with OpenAI
1. **Session Refresh Scheduler**: Runs every 15 minutes to fetch new sessions from CSV sources
2. **Session Processing Scheduler**: Runs every hour to process unprocessed sessions with OpenAI
## Database Schema
The Session model has been updated with new fields to store the processed data:
- `processed`: Boolean flag indicating whether the session has been processed
- `sentimentCategory`: String value ("positive", "neutral", "negative") from OpenAI
- `questions`: JSON array of questions asked by the user
- `summary`: Brief summary of the conversation
- `processed`: Boolean flag indicating whether the session has been processed
- `sentimentCategory`: String value ("positive", "neutral", "negative") from OpenAI
- `questions`: JSON array of questions asked by the user
- `summary`: Brief summary of the conversation
## Configuration
@ -62,9 +62,9 @@ OPENAI_API_KEY=your_api_key_here
To run the application with schedulers enabled:
- Development: `npm run dev`
- Development (with schedulers disabled): `npm run dev:no-schedulers`
- Production: `npm run start`
- Development: `npm run dev`
- Development (with schedulers disabled): `npm run dev:no-schedulers`
- Production: `npm run start`
Note: These commands will start a custom Next.js server with the schedulers enabled. You'll need to have an OpenAI API key set in your `.env.local` file for the session processing to work.
@ -82,5 +82,5 @@ This will process all unprocessed sessions that have transcript content.
The processing logic can be customized by modifying:
- `lib/processingScheduler.ts`: Contains the OpenAI processing logic
- `scripts/process_sessions.ts`: Standalone script for manual processing
- `lib/processingScheduler.ts`: Contains the OpenAI processing logic
- `scripts/process_sessions.ts`: Standalone script for manual processing

View File

@ -1,11 +1,13 @@
# Transcript Parsing Implementation
## Overview
Added structured message parsing to the LiveDash system, allowing transcripts to be broken down into individual messages with timestamps, roles, and content. This provides a much better user experience for viewing conversations.
## Database Changes
### New Message Table
```sql
CREATE TABLE Message (
id TEXT PRIMARY KEY DEFAULT (uuid()),
@ -22,12 +24,14 @@ CREATE INDEX Message_sessionId_order_idx ON Message(sessionId, order);
```
### Updated Session Table
- Added `messages` relation to Session model
- Sessions can now have both raw transcript content AND parsed messages
- Added `messages` relation to Session model
- Sessions can now have both raw transcript content AND parsed messages
## New Components
### 1. Message Interface (`lib/types.ts`)
```typescript
export interface Message {
id: string;
@ -41,36 +45,43 @@ export interface Message {
```
### 2. Transcript Parser (`lib/transcriptParser.js`)
- **`parseChatLogToJSON(logString)`** - Parses raw transcript text into structured messages
- **`storeMessagesForSession(sessionId, messages)`** - Stores parsed messages in database
- **`processTranscriptForSession(sessionId, transcriptContent)`** - Complete processing for one session
- **`processAllUnparsedTranscripts()`** - Batch process all unparsed transcripts
- **`getMessagesForSession(sessionId)`** - Retrieve messages for a session
- **`parseChatLogToJSON(logString)`** - Parses raw transcript text into structured messages
- **`storeMessagesForSession(sessionId, messages)`** - Stores parsed messages in database
- **`processTranscriptForSession(sessionId, transcriptContent)`** - Complete processing for one session
- **`processAllUnparsedTranscripts()`** - Batch process all unparsed transcripts
- **`getMessagesForSession(sessionId)`** - Retrieve messages for a session
### 3. MessageViewer Component (`components/MessageViewer.tsx`)
- Chat-like interface for displaying parsed messages
- Color-coded by role (User: blue, Assistant: gray, System: yellow)
- Shows timestamps and message order
- Scrollable with conversation metadata
- Chat-like interface for displaying parsed messages
- Color-coded by role (User: blue, Assistant: gray, System: yellow)
- Shows timestamps and message order
- Scrollable with conversation metadata
## Updated Components
### 1. Session API (`pages/api/dashboard/session/[id].ts`)
- Now includes parsed messages in session response
- Messages are ordered by `order` field (ascending)
- Now includes parsed messages in session response
- Messages are ordered by `order` field (ascending)
### 2. Session Details Page (`app/dashboard/sessions/[id]/page.tsx`)
- Added MessageViewer component
- Shows both parsed messages AND raw transcript
- Prioritizes parsed messages when available
- Added MessageViewer component
- Shows both parsed messages AND raw transcript
- Prioritizes parsed messages when available
### 3. ChatSession Interface (`lib/types.ts`)
- Added optional `messages?: Message[]` field
- Added optional `messages?: Message[]` field
## Parsing Logic
### Supported Format
The parser expects transcript format:
```
[DD.MM.YYYY HH:MM:SS] Role: Message content
[DD.MM.YYYY HH:MM:SS] User: Hello, I need help
@ -78,15 +89,17 @@ The parser expects transcript format:
```
### Features
- **Multi-line support** - Messages can span multiple lines
- **Timestamp parsing** - Converts DD.MM.YYYY HH:MM:SS to ISO format
- **Role detection** - Extracts sender role from each message
- **Ordering** - Maintains conversation order with explicit order field
- **Sorting** - Messages sorted by timestamp, then by role (User before Assistant)
- **Multi-line support** - Messages can span multiple lines
- **Timestamp parsing** - Converts DD.MM.YYYY HH:MM:SS to ISO format
- **Role detection** - Extracts sender role from each message
- **Ordering** - Maintains conversation order with explicit order field
- **Sorting** - Messages sorted by timestamp, then by role (User before Assistant)
## Manual Commands
### New Commands Added
```bash
# Parse transcripts into structured messages
node scripts/manual-triggers.js parse
@ -99,17 +112,20 @@ node scripts/manual-triggers.js status
```
### Updated Commands
- **`status`** - Now shows transcript and parsing statistics
- **`all`** - New command that runs refresh → parse → process in sequence
- **`status`** - Now shows transcript and parsing statistics
- **`all`** - New command that runs refresh → parse → process in sequence
## Workflow Integration
### Complete Processing Pipeline
1. **Session Refresh** - Fetch sessions from CSV, download transcripts
2. **Transcript Parsing** - Parse raw transcripts into structured messages
3. **AI Processing** - Process sessions with OpenAI for sentiment, categories, etc.
1. **Session Refresh** - Fetch sessions from CSV, download transcripts
2. **Transcript Parsing** - Parse raw transcripts into structured messages
3. **AI Processing** - Process sessions with OpenAI for sentiment, categories, etc.
### Database States
```javascript
// After CSV fetch
{
@ -139,21 +155,24 @@ node scripts/manual-triggers.js status
## User Experience Improvements
### Before
- Only raw transcript text in a text area
- Difficult to follow conversation flow
- No clear distinction between speakers
- Only raw transcript text in a text area
- Difficult to follow conversation flow
- No clear distinction between speakers
### After
- **Chat-like interface** with message bubbles
- **Color-coded roles** for easy identification
- **Timestamps** for each message
- **Conversation metadata** (first/last message times)
- **Fallback to raw transcript** if parsing fails
- **Both views available** - structured AND raw
- **Chat-like interface** with message bubbles
- **Color-coded roles** for easy identification
- **Timestamps** for each message
- **Conversation metadata** (first/last message times)
- **Fallback to raw transcript** if parsing fails
- **Both views available** - structured AND raw
## Testing
### Manual Testing Commands
```bash
# Check current status
node scripts/manual-triggers.js status
@ -166,38 +185,44 @@ node scripts/manual-triggers.js all
```
### Expected Results
1. Sessions with transcript content get parsed into individual messages
2. Session detail pages show chat-like interface
3. Both parsed messages and raw transcript are available
4. No data loss - original transcript content preserved
1. Sessions with transcript content get parsed into individual messages
2. Session detail pages show chat-like interface
3. Both parsed messages and raw transcript are available
4. No data loss - original transcript content preserved
## Technical Benefits
### Performance
- **Indexed queries** - Messages indexed by sessionId and order
- **Efficient loading** - Only load messages when needed
- **Cascading deletes** - Messages automatically deleted with sessions
- **Indexed queries** - Messages indexed by sessionId and order
- **Efficient loading** - Only load messages when needed
- **Cascading deletes** - Messages automatically deleted with sessions
### Maintainability
- **Separation of concerns** - Parsing logic isolated in dedicated module
- **Type safety** - Full TypeScript support for Message interface
- **Error handling** - Graceful fallbacks when parsing fails
- **Separation of concerns** - Parsing logic isolated in dedicated module
- **Type safety** - Full TypeScript support for Message interface
- **Error handling** - Graceful fallbacks when parsing fails
### Extensibility
- **Role flexibility** - Supports any role names (User, Assistant, System, etc.)
- **Content preservation** - Multi-line messages fully supported
- **Metadata ready** - Easy to add message-level metadata in future
- **Role flexibility** - Supports any role names (User, Assistant, System, etc.)
- **Content preservation** - Multi-line messages fully supported
- **Metadata ready** - Easy to add message-level metadata in future
## Migration Notes
### Existing Data
- **No data loss** - Original transcript content preserved
- **Backward compatibility** - Pages work with or without parsed messages
- **Gradual migration** - Can parse transcripts incrementally
- **No data loss** - Original transcript content preserved
- **Backward compatibility** - Pages work with or without parsed messages
- **Gradual migration** - Can parse transcripts incrementally
### Database Migration
- New Message table created with foreign key constraints
- Existing Session table unchanged (only added relation)
- Index created for efficient message queries
- New Message table created with foreign key constraints
- Existing Session table unchanged (only added relation)
- Index created for efficient message queries
This implementation provides a solid foundation for enhanced conversation analysis and user experience while maintaining full backward compatibility.