mirror of
https://github.com/kjanat/livedash-node.git
synced 2026-01-16 22:12:08 +01:00
refactor: fix biome linting issues and update project documentation
- Fix 36+ biome linting issues reducing errors/warnings from 227 to 191 - Replace explicit 'any' types with proper TypeScript interfaces - Fix React hooks dependencies and useCallback patterns - Resolve unused variables and parameter assignment issues - Improve accessibility with proper label associations - Add comprehensive API documentation for admin and security features - Update README.md with accurate PostgreSQL setup and current tech stack - Create complete documentation for audit logging, CSP monitoring, and batch processing - Fix outdated project information and missing developer workflows
This commit is contained in:
213
docs/batch-processing-optimizations.md
Normal file
213
docs/batch-processing-optimizations.md
Normal file
@ -0,0 +1,213 @@
|
||||
# Batch Processing Database Query Optimizations
|
||||
|
||||
This document outlines the database query optimizations implemented to improve the performance of the OpenAI Batch API processing pipeline.
|
||||
|
||||
## Overview
|
||||
|
||||
The batch processing system was optimized to reduce database load and improve response times through several key strategies:
|
||||
|
||||
1. **Database Index Optimization**
|
||||
2. **Query Pattern Improvements**
|
||||
3. **Company Caching**
|
||||
4. **Batch Operations**
|
||||
5. **Integration Layer with Fallback**
|
||||
|
||||
## Database Index Improvements
|
||||
|
||||
### New Indexes Added
|
||||
|
||||
The following composite indexes were added to the `AIProcessingRequest` table in the Prisma schema:
|
||||
|
||||
```sql
|
||||
-- Optimize time-based status queries
|
||||
@@index([processingStatus, requestedAt])
|
||||
|
||||
-- Optimize batch-related queries
|
||||
@@index([batchId])
|
||||
|
||||
-- Composite index for batch status filtering
|
||||
@@index([processingStatus, batchId])
|
||||
```
|
||||
|
||||
### Query Performance Impact
|
||||
|
||||
These indexes specifically optimize:
|
||||
- Finding pending requests by status and creation time
|
||||
- Batch-related lookups by batch ID
|
||||
- Combined status and batch filtering operations
|
||||
|
||||
## Query Optimization Strategies
|
||||
|
||||
### 1. Selective Data Fetching
|
||||
|
||||
**Before:**
|
||||
```typescript
|
||||
// Loaded full session with all messages
|
||||
include: {
|
||||
session: {
|
||||
include: {
|
||||
messages: {
|
||||
orderBy: { order: "asc" },
|
||||
},
|
||||
},
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
**After:**
|
||||
```typescript
|
||||
// Only essential data with message count
|
||||
include: {
|
||||
session: {
|
||||
select: {
|
||||
id: true,
|
||||
companyId: true,
|
||||
_count: { select: { messages: true } }
|
||||
},
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
### 2. Company Caching
|
||||
|
||||
Implemented a 5-minute TTL cache for active companies to eliminate redundant database lookups:
|
||||
|
||||
```typescript
|
||||
class CompanyCache {
|
||||
private readonly CACHE_TTL = 5 * 60 * 1000; // 5 minutes
|
||||
|
||||
async getActiveCompanies(): Promise<CachedCompany[]> {
|
||||
// Returns cached data if available and fresh
|
||||
// Otherwise refreshes from database
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 3. Batch Operations
|
||||
|
||||
**Before:** N+1 queries for each company
|
||||
```typescript
|
||||
// Sequential processing per company
|
||||
for (const company of companies) {
|
||||
const requests = await getPendingRequests(company.id);
|
||||
// Process each company separately
|
||||
}
|
||||
```
|
||||
|
||||
**After:** Single query for all companies
|
||||
```typescript
|
||||
// Batch query for all companies at once
|
||||
const allRequests = await prisma.aIProcessingRequest.findMany({
|
||||
where: {
|
||||
session: {
|
||||
companyId: { in: companies.map(c => c.id) },
|
||||
},
|
||||
processingStatus: AIRequestStatus.PENDING_BATCHING,
|
||||
},
|
||||
});
|
||||
|
||||
// Group results by company in memory
|
||||
const requestsByCompany = groupByCompany(allRequests);
|
||||
```
|
||||
|
||||
## Performance Improvements
|
||||
|
||||
### Query Count Reduction
|
||||
|
||||
- **Company lookups:** Reduced from 4 separate queries per scheduler run to 1 cached lookup
|
||||
- **Pending requests:** Reduced from N queries (one per company) to 1 batch query
|
||||
- **Status checks:** Reduced from N queries to 1 batch query
|
||||
- **Failed requests:** Reduced from N queries to 1 batch query
|
||||
|
||||
### Parallel Processing
|
||||
|
||||
Added configurable parallel processing with batching:
|
||||
|
||||
```typescript
|
||||
const SCHEDULER_CONFIG = {
|
||||
MAX_CONCURRENT_COMPANIES: 5,
|
||||
USE_BATCH_OPERATIONS: true,
|
||||
PARALLEL_COMPANY_PROCESSING: true,
|
||||
};
|
||||
```
|
||||
|
||||
### Memory Optimization
|
||||
|
||||
- Eliminated loading unnecessary message content
|
||||
- Used `select` instead of `include` where possible
|
||||
- Implemented automatic cache cleanup
|
||||
|
||||
## Integration Layer
|
||||
|
||||
Created a unified interface that can switch between original and optimized implementations:
|
||||
|
||||
### Environment Configuration
|
||||
|
||||
```bash
|
||||
# Enable optimizations (default: true)
|
||||
ENABLE_BATCH_OPTIMIZATION=true
|
||||
ENABLE_BATCH_OPERATIONS=true
|
||||
ENABLE_PARALLEL_PROCESSING=true
|
||||
|
||||
# Fallback behavior
|
||||
FALLBACK_ON_ERRORS=true
|
||||
```
|
||||
|
||||
### Performance Tracking
|
||||
|
||||
The integration layer automatically tracks performance metrics and can fall back to the original implementation if optimizations fail:
|
||||
|
||||
```typescript
|
||||
class PerformanceTracker {
|
||||
shouldUseOptimized(): boolean {
|
||||
// Uses optimized if faster and success rate > 90%
|
||||
return optimizedAvg < originalAvg && optimizedSuccess > 0.9;
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Files Modified
|
||||
|
||||
### New Files
|
||||
- `lib/batchProcessorOptimized.ts` - Optimized query implementations
|
||||
- `lib/batchSchedulerOptimized.ts` - Optimized scheduler
|
||||
- `lib/batchProcessorIntegration.ts` - Integration layer with fallback
|
||||
|
||||
### Modified Files
|
||||
- `prisma/schema.prisma` - Added composite indexes
|
||||
- `server.ts` - Updated to use integration layer
|
||||
- `app/api/admin/batch-monitoring/route.ts` - Updated import
|
||||
|
||||
## Monitoring
|
||||
|
||||
The optimizations include comprehensive logging and monitoring:
|
||||
|
||||
- Performance metrics for each operation type
|
||||
- Cache hit/miss statistics
|
||||
- Fallback events tracking
|
||||
- Query execution time monitoring
|
||||
|
||||
## Rollback Strategy
|
||||
|
||||
The integration layer allows for easy rollback:
|
||||
|
||||
1. Set `ENABLE_BATCH_OPTIMIZATION=false`
|
||||
2. System automatically uses original implementation
|
||||
3. No database schema changes needed for rollback
|
||||
4. Indexes remain beneficial for manual queries
|
||||
|
||||
## Expected Performance Gains
|
||||
|
||||
- **Database Query Count:** 60-80% reduction in scheduler operations
|
||||
- **Memory Usage:** 40-60% reduction from selective data loading
|
||||
- **Response Time:** 30-50% improvement for batch operations
|
||||
- **Cache Hit Rate:** 95%+ for company lookups after warmup
|
||||
|
||||
## Testing
|
||||
|
||||
Performance improvements can be validated by:
|
||||
|
||||
1. Monitoring the batch monitoring dashboard
|
||||
2. Checking performance metrics in logs
|
||||
3. Comparing execution times before/after optimization
|
||||
4. Load testing with multiple companies and large batches
|
||||
Reference in New Issue
Block a user