feat: Implement session processing and refresh schedulers

- Added processingScheduler.js and processingScheduler.ts to handle session transcript processing using OpenAI API.
- Implemented a new scheduler (scheduler.js and schedulers.ts) for refreshing sessions every 15 minutes.
- Updated Prisma migrations to add new fields for processed sessions, including questions, sentimentCategory, and summary.
- Created scripts (process_sessions.mjs and process_sessions.ts) for manual processing of unprocessed sessions.
- Enhanced server.js and server.mjs to initialize schedulers on server start.
This commit is contained in:
Max Kowalski
2025-06-25 16:14:01 +02:00
parent c9e24298cd
commit 3196dabdf2
23 changed files with 2267 additions and 49 deletions

71
docs/scheduler-fixes.md Normal file
View File

@ -0,0 +1,71 @@
# Scheduler Error Fixes
## Issues Identified and Resolved
### 1. Invalid Company Configuration
**Problem**: Company `26fc3d34-c074-4556-85bd-9a66fafc0e08` had an invalid CSV URL (`https://example.com/data.csv`) with no authentication credentials.
**Solution**:
- Added validation in `fetchAndStoreSessionsForAllCompanies()` to skip companies with example/invalid URLs
- Removed the invalid company record from the database using `fix_companies.js`
### 2. Transcript Fetching Errors
**Problem**: Multiple "Error fetching transcript: Unauthorized" messages were flooding the logs when individual transcript files couldn't be accessed.
**Solution**:
- Improved error handling in `fetchTranscriptContent()` function
- Added probabilistic logging (only ~10% of errors logged) to prevent log spam
- Added timeout (10 seconds) for transcript fetching
- Made transcript fetching failures non-blocking (sessions are still created without transcript content)
### 3. CSV Fetching Errors
**Problem**: "Failed to fetch CSV: Not Found" errors for companies with invalid URLs.
**Solution**:
- Added URL validation to skip companies with `example.com` URLs
- Improved error logging to be more descriptive
## Current Status
**Fixed**: No more "Unauthorized" error spam
**Fixed**: No more "Not Found" CSV errors
**Fixed**: Scheduler runs cleanly without errors
**Improved**: Better error handling and logging
## Remaining Companies
After cleanup, only valid companies remain:
- **Demo Company** (`790b9233-d369-451f-b92c-f4dceb42b649`)
- CSV URL: `https://proto.notso.ai/jumbo/chats`
- Has valid authentication credentials
- 107 sessions in database
## Files Modified
1. **lib/csvFetcher.js**
- Added company URL validation
- Improved transcript fetching error handling
- Reduced error log verbosity
2. **fix_companies.js** (cleanup script)
- Removes invalid company records
- Can be run again if needed
## Monitoring
The scheduler now runs cleanly every 15 minutes. To monitor:
```bash
# Check scheduler logs
node debug_db.js
# Test manual refresh
node -e "import('./lib/csvFetcher.js').then(m => m.fetchAndStoreSessionsForAllCompanies())"
```
## Future Improvements
1. Add health check endpoint for scheduler status
2. Add metrics for successful/failed fetches
3. Consider retry logic for temporary failures
4. Add alerting for persistent failures

View File

@ -0,0 +1,85 @@
# Session Processing with OpenAI
This document explains how the session processing system works in LiveDash-Node.
## Overview
The system now includes an automated process for analyzing chat session transcripts using OpenAI's API. This process:
1. Fetches session data from CSV sources
2. Only adds new sessions that don't already exist in the database
3. Processes session transcripts with OpenAI to extract valuable insights
4. Updates the database with the processed information
## How It Works
### Session Fetching
- The system fetches session data from configured CSV URLs for each company
- Unlike the previous implementation, it now only adds sessions that don't already exist in the database
- This prevents duplicate sessions and allows for incremental updates
### Transcript Processing
- For sessions with transcript content that haven't been processed yet, the system calls OpenAI's API
- The API analyzes the transcript and extracts the following information:
- Primary language used (ISO 639-1 code)
- Number of messages sent by the user
- Overall sentiment (positive, neutral, negative)
- Whether the conversation was escalated
- Whether HR contact was mentioned or provided
- Best-fitting category for the conversation
- Up to 5 paraphrased questions asked by the user
- A brief summary of the conversation
### Scheduling
The system includes two schedulers:
1. **Session Refresh Scheduler**: Runs every 15 minutes to fetch new sessions from CSV sources
2. **Session Processing Scheduler**: Runs every hour to process unprocessed sessions with OpenAI
## Database Schema
The Session model has been updated with new fields to store the processed data:
- `processed`: Boolean flag indicating whether the session has been processed
- `sentimentCategory`: String value ("positive", "neutral", "negative") from OpenAI
- `questions`: JSON array of questions asked by the user
- `summary`: Brief summary of the conversation
## Configuration
### OpenAI API Key
To use the session processing feature, you need to add your OpenAI API key to the `.env.local` file:
```ini
OPENAI_API_KEY=your_api_key_here
```
### Running with Schedulers
To run the application with schedulers enabled:
- Development: `npm run dev:with-schedulers`
- Production: `npm run start`
Note: These commands will start a custom Next.js server with the schedulers enabled. You'll need to have an OpenAI API key set in your `.env.local` file for the session processing to work.
## Manual Processing
You can also manually process sessions by running the script:
```
node scripts/process_sessions.mjs
```
This will process all unprocessed sessions that have transcript content.
## Customization
The processing logic can be customized by modifying:
- `lib/processingScheduler.ts`: Contains the OpenAI processing logic
- `scripts/process_sessions.ts`: Standalone script for manual processing