Files
livedash-node/docs/scheduler-workflow.md
Kaj Kowalski e2301725a3 feat: complete development environment setup and code quality improvements
- Set up pre-commit hooks with husky and lint-staged for automated code quality
- Improved TypeScript type safety by replacing 'any' types with proper generics
- Fixed markdown linting violations (MD030 spacing) across all documentation
- Fixed compound adjective hyphenation in technical documentation
- Fixed invalid JSON union syntax in API documentation examples
- Automated code formatting and linting on commit
- Enhanced error handling with better type constraints
- Configured biome and markdownlint for consistent code style
- All changes verified with successful production build
2025-07-13 14:44:05 +02:00

214 lines
4.7 KiB
Markdown

# Scheduler Workflow Documentation
## Overview
The LiveDash system has two main schedulers that work together to fetch and process session data:
1. **Session Refresh Scheduler** - Fetches new sessions from CSV files
2. **Processing Scheduler** - Processes session transcripts with AI
## Current Status (as of latest check)
- **Total sessions**: 107
- **Processed sessions**: 0
- **Sessions with transcript**: 0
- **Ready for processing**: 0
## How the `processed` Field Works
The ProcessingScheduler picks up sessions where `processed` is **NOT** `true`, which includes:
- `processed = false`
- `processed = null`
**Query used:**
```javascript
{ processed: { not: true } } // Either false or null
```
## Complete Workflow
### Step 1: Session Refresh (CSV Fetching)
**What it does:**
- Fetches session data from company CSV URLs
- Creates session records in database with basic metadata
- Sets `transcriptContent = null` initially
- Sets `processed = null` initially
**Runs:** Every 30 minutes (cron: `*/30 * * * *`)
### Step 2: Transcript Fetching
**What it does:**
- Downloads full transcript content for sessions
- Updates `transcriptContent` field with actual conversation data
- Sessions remain `processed = null` until AI processing
**Runs:** As part of session refresh process
### Step 3: AI Processing
**What it does:**
- Finds sessions with transcript content where `processed != true`
- Sends transcripts to OpenAI for analysis
- Extracts: sentiment, category, questions, summary, etc.
- Updates session with processed data
- Sets `processed = true`
**Runs:** Every hour (cron: `0 * * * *`)
## Manual Trigger Commands
### Check Current Status
```bash
node scripts/manual-triggers.js status
```
### Trigger Session Refresh (Fetch new sessions from CSV)
```bash
node scripts/manual-triggers.js refresh
```
### Trigger AI Processing (Process unprocessed sessions)
```bash
node scripts/manual-triggers.js process
```
### Run Both Schedulers
```bash
node scripts/manual-triggers.js both
```
## Troubleshooting
### No Sessions Being Processed?
1. **Check if sessions have transcripts:**
```bash
node scripts/manual-triggers.js status
```
2. **If "Sessions with transcript" is 0:**
- Sessions exist but transcripts haven't been fetched yet
- Run session refresh: `node scripts/manual-triggers.js refresh`
3. **If "Ready for processing" is 0 but "Sessions with transcript" > 0:**
- All sessions with transcripts have already been processed
- Check if `OPENAI_API_KEY` is set in environment
### Common Issues
#### "No sessions found requiring processing"
- All sessions with transcripts have been processed (`processed = true`)
- Or no sessions have transcript content yet
#### "OPENAI_API_KEY environment variable is not set"
- Add OpenAI API key to `.env.development` file
- Restart the application
#### "Error fetching transcript: Unauthorized"
- CSV credentials are incorrect or expired
- Check company CSV username/password in database
## Database Field Mapping
### Before AI Processing
```javascript
{
id: "session-uuid",
transcriptContent: "full conversation text" | null,
processed: null,
sentimentCategory: null,
questions: null,
summary: null,
// ... other fields
}
```
### After AI Processing
```javascript
{
id: "session-uuid",
transcriptContent: "full conversation text",
processed: true,
sentimentCategory: "positive" | "neutral" | "negative",
questions: '["question 1", "question 2"]', // JSON string
summary: "Brief conversation summary",
language: "en", // ISO 639-1 code
messagesSent: 5,
sentiment: 0.8, // Float value (-1 to 1)
escalated: false,
forwardedHr: false,
category: "Schedule & Hours",
// ... other fields
}
```
## Scheduler Configuration
### Session Refresh Scheduler
- **File**: `lib/scheduler.js`
- **Frequency**: Every 30 minutes
- **Cron**: `*/30 * * * *`
### Processing Scheduler
- **File**: `lib/processingScheduler.js`
- **Frequency**: Every hour
- **Cron**: `0 * * * *`
- **Batch size**: 10 sessions per run
## Environment Variables Required
```bash
# Database
DATABASE_URL="postgresql://..."
# OpenAI (for processing)
OPENAI_API_KEY="sk-..."
# NextAuth
NEXTAUTH_SECRET="..."
NEXTAUTH_URL="http://localhost:3000"
```
## Next Steps for Testing
1. **Trigger session refresh** to fetch transcripts:
```bash
node scripts/manual-triggers.js refresh
```
2. **Check status** to see if transcripts were fetched:
```bash
node scripts/manual-triggers.js status
```
3. **Trigger processing** if transcripts are available:
```bash
node scripts/manual-triggers.js process
```
4. **View results** in the dashboard session details pages