feat: Implement session processing and refresh schedulers

- Added processingScheduler.js and processingScheduler.ts to handle session transcript processing using OpenAI API. - Implemented a new scheduler (scheduler.js and schedulers.ts) for refreshing sessions every 15 minutes. - Updated Prisma migrations to add new fields for processed sessions, including questions, sentimentCategory, and summary. - Created scripts (process_sessions.mjs and process_sessions.ts) for manual processing of unprocessed sessions. - Enhanced server.js and server.mjs to initialize schedulers on server start.
2026-03-02 21:01:28 +01:00 · 2025-06-25 16:14:01 +02:00
parent c9e24298cd
commit 3196dabdf2
23 changed files with 2267 additions and 49 deletions
--- a/docs/scheduler-fixes.md
+++ b/docs/scheduler-fixes.md
@@ -0,0 +1,71 @@
+# Scheduler Error Fixes
+
+## Issues Identified and Resolved
+
+### 1. Invalid Company Configuration
+**Problem**: Company `26fc3d34-c074-4556-85bd-9a66fafc0e08` had an invalid CSV URL (`https://example.com/data.csv`) with no authentication credentials.
+
+**Solution**: 
+- Added validation in `fetchAndStoreSessionsForAllCompanies()` to skip companies with example/invalid URLs
+- Removed the invalid company record from the database using `fix_companies.js`
+
+### 2. Transcript Fetching Errors
+**Problem**: Multiple "Error fetching transcript: Unauthorized" messages were flooding the logs when individual transcript files couldn't be accessed.
+
+**Solution**:
+- Improved error handling in `fetchTranscriptContent()` function
+- Added probabilistic logging (only ~10% of errors logged) to prevent log spam
+- Added timeout (10 seconds) for transcript fetching
+- Made transcript fetching failures non-blocking (sessions are still created without transcript content)
+
+### 3. CSV Fetching Errors
+**Problem**: "Failed to fetch CSV: Not Found" errors for companies with invalid URLs.
+
+**Solution**:
+- Added URL validation to skip companies with `example.com` URLs
+- Improved error logging to be more descriptive
+
+## Current Status
+
+✅ **Fixed**: No more "Unauthorized" error spam
+✅ **Fixed**: No more "Not Found" CSV errors  
+✅ **Fixed**: Scheduler runs cleanly without errors
+✅ **Improved**: Better error handling and logging
+
+## Remaining Companies
+
+After cleanup, only valid companies remain:
+- **Demo Company** (`790b9233-d369-451f-b92c-f4dceb42b649`)
+  - CSV URL: `https://proto.notso.ai/jumbo/chats`
+  - Has valid authentication credentials
+  - 107 sessions in database
+
+## Files Modified
+
+1. **lib/csvFetcher.js**
+   - Added company URL validation
+   - Improved transcript fetching error handling
+   - Reduced error log verbosity
+
+2. **fix_companies.js** (cleanup script)
+   - Removes invalid company records
+   - Can be run again if needed
+
+## Monitoring
+
+The scheduler now runs cleanly every 15 minutes. To monitor:
+
+```bash
+# Check scheduler logs
+node debug_db.js
+
+# Test manual refresh
+node -e "import('./lib/csvFetcher.js').then(m => m.fetchAndStoreSessionsForAllCompanies())"
+```
+
+## Future Improvements
+
+1. Add health check endpoint for scheduler status
+2. Add metrics for successful/failed fetches
+3. Consider retry logic for temporary failures
+4. Add alerting for persistent failures
--- a/docs/session-processing.md
+++ b/docs/session-processing.md
@@ -0,0 +1,85 @@
+# Session Processing with OpenAI
+
+This document explains how the session processing system works in LiveDash-Node.
+
+## Overview
+
+The system now includes an automated process for analyzing chat session transcripts using OpenAI's API. This process:
+
+1. Fetches session data from CSV sources
+2. Only adds new sessions that don't already exist in the database
+3. Processes session transcripts with OpenAI to extract valuable insights
+4. Updates the database with the processed information
+
+## How It Works
+
+### Session Fetching
+
+- The system fetches session data from configured CSV URLs for each company
+- Unlike the previous implementation, it now only adds sessions that don't already exist in the database
+- This prevents duplicate sessions and allows for incremental updates
+
+### Transcript Processing
+
+- For sessions with transcript content that haven't been processed yet, the system calls OpenAI's API
+- The API analyzes the transcript and extracts the following information:
+  - Primary language used (ISO 639-1 code)
+  - Number of messages sent by the user
+  - Overall sentiment (positive, neutral, negative)
+  - Whether the conversation was escalated
+  - Whether HR contact was mentioned or provided
+  - Best-fitting category for the conversation
+  - Up to 5 paraphrased questions asked by the user
+  - A brief summary of the conversation
+
+### Scheduling
+
+The system includes two schedulers:
+
+1. **Session Refresh Scheduler**: Runs every 15 minutes to fetch new sessions from CSV sources
+2. **Session Processing Scheduler**: Runs every hour to process unprocessed sessions with OpenAI
+
+## Database Schema
+
+The Session model has been updated with new fields to store the processed data:
+
+- `processed`: Boolean flag indicating whether the session has been processed
+- `sentimentCategory`: String value ("positive", "neutral", "negative") from OpenAI
+- `questions`: JSON array of questions asked by the user
+- `summary`: Brief summary of the conversation
+
+## Configuration
+
+### OpenAI API Key
+
+To use the session processing feature, you need to add your OpenAI API key to the `.env.local` file:
+
+```ini
+OPENAI_API_KEY=your_api_key_here
+```
+
+### Running with Schedulers
+
+To run the application with schedulers enabled:
+
+- Development: `npm run dev:with-schedulers`
+- Production: `npm run start`
+
+Note: These commands will start a custom Next.js server with the schedulers enabled. You'll need to have an OpenAI API key set in your `.env.local` file for the session processing to work.
+
+## Manual Processing
+
+You can also manually process sessions by running the script:
+
+```
+node scripts/process_sessions.mjs
+```
+
+This will process all unprocessed sessions that have transcript content.
+
+## Customization
+
+The processing logic can be customized by modifying:
+
+- `lib/processingScheduler.ts`: Contains the OpenAI processing logic
+- `scripts/process_sessions.ts`: Standalone script for manual processing