Batch Processing Monitoring Dashboard
This document describes the batch processing monitoring dashboard and API endpoints for tracking OpenAI Batch API operations in the LiveDash application.
Overview
The Batch Monitoring Dashboard provides real-time visibility into the OpenAI Batch API processing pipeline, including job status tracking, cost analysis, and performance monitoring. This system enables a roughly 50% cost reduction on AI processing (via the Batch API's discounted pricing) while maintaining comprehensive oversight.
Features
Real-time Monitoring
- Job Status Tracking: Monitor batch jobs from creation to completion
- Queue Management: View pending, running, and completed batch queues
- Processing Metrics: Track throughput, success rates, and error patterns
- Cost Analysis: Monitor API costs and savings compared to individual requests
Performance Analytics
- Batch Efficiency: Analyze batch size optimization and processing times
- Success Rates: Track completion and failure rates across different job types
- Resource Utilization: Monitor API quota usage and rate limiting
- Historical Trends: View processing patterns over time
Administrative Controls
- Manual Intervention: Pause, resume, or cancel batch operations
- Priority Management: Adjust processing priorities for urgent requests
- Error Handling: Review and retry failed batch operations
- Configuration Management: Adjust batch parameters and thresholds
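The manual-intervention actions above could be driven from small admin helpers. The sketch below is illustrative: only the retry route appears later in this document, and the `pause`/`resume`/`cancel` routes are assumptions modeled on it.

```typescript
// Hypothetical manual-intervention helpers. Only the retry route is documented;
// pause/resume/cancel are assumed to follow the same URL pattern.
type BatchAction = "pause" | "resume" | "cancel" | "retry";

function buildActionUrl(jobId: string, action: BatchAction): string {
  return `/api/admin/batch-monitoring/${encodeURIComponent(jobId)}/${action}`;
}

async function controlBatchJob(jobId: string, action: BatchAction): Promise<boolean> {
  const response = await fetch(buildActionUrl(jobId, action), { method: "POST" });
  return response.ok;
}
```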
API Endpoints
Batch Monitoring API
Retrieve comprehensive batch processing metrics and status information.
```http
GET /api/admin/batch-monitoring
```
Query Parameters
| Parameter | Type | Description | Default | Example |
|---|---|---|---|---|
| `timeRange` | string | Time range for metrics | `24h` | `?timeRange=7d` |
| `status` | string | Filter by batch status | - | `?status=completed` |
| `jobType` | string | Filter by job type | - | `?jobType=ai_analysis` |
| `includeDetails` | boolean | Include detailed job information | `false` | `?includeDetails=true` |
| `page` | number | Page number for pagination | `1` | `?page=2` |
| `limit` | number | Records per page (max 100) | `50` | `?limit=25` |
Example Request
```typescript
const response = await fetch(
  "/api/admin/batch-monitoring?" +
    new URLSearchParams({
      timeRange: "24h",
      status: "completed",
      includeDetails: "true",
    })
);
const data = await response.json();
```
Response Format
```json
{
  "success": true,
  "data": {
    "summary": {
      "totalJobs": 156,
      "completedJobs": 142,
      "failedJobs": 8,
      "pendingJobs": 6,
      "totalRequests": 15600,
      "processedRequests": 14200,
      "costSavings": {
        "currentPeriod": 234.56,
        "projectedMonthly": 7038.45,
        "savingsPercentage": 48.2
      },
      "averageProcessingTime": 1800000,
      "successRate": 95.2
    },
    "queues": {
      "pending": 12,
      "processing": 3,
      "completed": 142,
      "failed": 8
    },
    "performance": {
      "throughput": {
        "requestsPerHour": 650,
        "jobsPerHour": 6.5,
        "averageBatchSize": 100
      },
      "efficiency": {
        "batchUtilization": 87.3,
        "processingEfficiency": 92.1,
        "errorRate": 4.8
      }
    },
    "jobs": [
      {
        "id": "batch-job-123",
        "batchId": "batch_abc123",
        "status": "completed",
        "jobType": "ai_analysis",
        "requestCount": 100,
        "completedCount": 98,
        "failedCount": 2,
        "createdAt": "2024-01-01T10:00:00Z",
        "startedAt": "2024-01-01T10:05:00Z",
        "completedAt": "2024-01-01T10:35:00Z",
        "processingTimeMs": 1800000,
        "costEstimate": 12.5,
        "errorSummary": [
          {
            "error": "token_limit_exceeded",
            "count": 2,
            "percentage": 2.0
          }
        ]
      }
    ]
  }
}
```
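For type-safe consumption, the job entries in the response can be described as TypeScript interfaces. This is a sketch derived from the example payload above; which fields are optional (e.g. timestamps on jobs that have not started) is an assumption.

```typescript
// Sketch of the job shape from the monitoring response payload.
interface BatchJobErrorEntry {
  error: string;
  count: number;
  percentage: number;
}

interface BatchJob {
  id: string;
  batchId: string;
  status: "pending" | "processing" | "completed" | "failed";
  jobType: string;
  requestCount: number;
  completedCount: number;
  failedCount: number;
  createdAt: string;
  startedAt?: string;     // assumed optional until the job starts
  completedAt?: string;   // assumed optional until the job finishes
  processingTimeMs?: number;
  costEstimate?: number;
  errorSummary?: BatchJobErrorEntry[];
}

// Per-job success rate as a percentage, e.g. 98 for the example job above.
function jobSuccessRate(job: Pick<BatchJob, "completedCount" | "requestCount">): number {
  return job.requestCount === 0 ? 0 : (job.completedCount / job.requestCount) * 100;
}
```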
Dashboard Components
BatchMonitoringDashboard Component
The main dashboard component (components/admin/BatchMonitoringDashboard.tsx) provides:
Key Metrics Cards
```tsx
// Real-time overview cards
<>
  <MetricCard
    title="Total Jobs"
    value={data.summary.totalJobs}
    change={"+12 from yesterday"}
    trend="up"
  />
  <MetricCard
    title="Success Rate"
    value={`${data.summary.successRate}%`}
    change={"+2.1% from last week"}
    trend="up"
  />
  <MetricCard
    title="Cost Savings"
    value={`$${data.summary.costSavings.currentPeriod}`}
    change={`${data.summary.costSavings.savingsPercentage}% vs individual API`}
    trend="up"
  />
</>
```
Queue Status Visualization
```tsx
// Visual representation of batch job queues
<QueueStatusChart
  pending={data.queues.pending}
  processing={data.queues.processing}
  completed={data.queues.completed}
  failed={data.queues.failed}
/>
```
Performance Charts
```tsx
// Processing throughput over time
<ThroughputChart
  data={data.performance.throughput}
  timeRange={timeRange}
/>

// Cost savings trend
<CostSavingsChart
  savings={data.summary.costSavings}
  historical={data.historical}
/>
```
Job Management Table
```tsx
// Detailed job listing with actions
<BatchJobTable
  jobs={data.jobs}
  onRetry={handleRetryJob}
  onCancel={handleCancelJob}
  onViewDetails={handleViewDetails}
/>
```
Usage Examples
Monitor Batch Performance
```typescript
async function monitorBatchPerformance() {
  const response = await fetch("/api/admin/batch-monitoring?timeRange=24h");
  const data = await response.json();
  const performance = data.data.performance;

  // Check if performance is within acceptable ranges
  if (performance.efficiency.errorRate > 10) {
    console.warn("High error rate detected:", performance.efficiency.errorRate + "%");

    // Get failed jobs for analysis
    const failedJobs = await fetch("/api/admin/batch-monitoring?status=failed");
    const failures = await failedJobs.json();

    // Analyze common failure patterns
    const errorSummary = failures.data.jobs.reduce((acc, job) => {
      job.errorSummary?.forEach((error) => {
        acc[error.error] = (acc[error.error] || 0) + error.count;
      });
      return acc;
    }, {});

    console.log("Error patterns:", errorSummary);
  }
}
```
Cost Savings Analysis
```typescript
async function analyzeCostSavings() {
  const response = await fetch("/api/admin/batch-monitoring?timeRange=30d&includeDetails=true");
  const data = await response.json();
  const savings = data.data.summary.costSavings;

  return {
    currentSavings: savings.currentPeriod,
    projectedAnnual: savings.projectedMonthly * 12,
    savingsRate: savings.savingsPercentage,
    totalProcessed: data.data.summary.processedRequests,
    // Average amount saved per processed request (not the per-request cost)
    averageSavingsPerRequest: savings.currentPeriod / data.data.summary.processedRequests,
  };
}
```
Retry Failed Jobs
```typescript
async function retryFailedJobs() {
  // Get failed jobs
  const response = await fetch("/api/admin/batch-monitoring?status=failed");
  const data = await response.json();

  const retryableJobs = data.data.jobs.filter((job) => {
    // Only retry jobs that failed due to temporary issues
    const hasRetryableErrors = job.errorSummary?.some((error) =>
      ["rate_limit_exceeded", "temporary_error", "timeout"].includes(error.error)
    );
    return hasRetryableErrors;
  });

  // Retry jobs individually
  for (const job of retryableJobs) {
    try {
      await fetch(`/api/admin/batch-monitoring/${job.id}/retry`, {
        method: "POST",
      });
      console.log(`Retried job ${job.id}`);
    } catch (error) {
      console.error(`Failed to retry job ${job.id}:`, error);
    }
  }
}
```
Real-time Dashboard Updates
```typescript
import { useEffect, useState } from "react";

function useRealtimeBatchMonitoring() {
  const [data, setData] = useState(null);
  const [isLoading, setIsLoading] = useState(true);

  useEffect(() => {
    const fetchData = async () => {
      try {
        const response = await fetch("/api/admin/batch-monitoring?timeRange=1h");
        const result = await response.json();
        setData(result.data);
      } catch (error) {
        console.error("Failed to fetch batch monitoring data:", error);
      } finally {
        setIsLoading(false);
      }
    };

    // Initial fetch
    fetchData();

    // Update every 30 seconds
    const interval = setInterval(fetchData, 30000);
    return () => clearInterval(interval);
  }, []);

  return { data, isLoading };
}
```
Configuration
Batch Processing Settings
Configure batch processing parameters in environment variables:
```bash
# Batch Processing Configuration
BATCH_PROCESSING_ENABLED="true"
BATCH_CREATE_INTERVAL="*/5 * * * *"            # Create batches every 5 minutes
BATCH_STATUS_CHECK_INTERVAL="*/2 * * * *"      # Check status every 2 minutes
BATCH_RESULT_PROCESSING_INTERVAL="*/1 * * * *" # Process results every minute

# Batch Size and Limits
BATCH_MAX_REQUESTS="1000"  # Maximum requests per batch
BATCH_TIMEOUT_HOURS="24"   # Batch timeout in hours
BATCH_MIN_SIZE="10"        # Minimum batch size

# Monitoring Configuration
BATCH_MONITORING_RETENTION_DAYS="30"         # How long to keep monitoring data
BATCH_ALERT_THRESHOLD_ERROR_RATE="10"        # Alert if error rate exceeds 10%
BATCH_ALERT_THRESHOLD_PROCESSING_TIME="3600" # Alert if processing takes >1 hour
```
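The numeric settings above can be read with sensible fallbacks. The loader below is a sketch: the variable names and defaults come from this document, but the helper itself is illustrative rather than the application's actual config code.

```typescript
// Illustrative config loader; defaults mirror the documented values.
type Env = Record<string, string | undefined>;

function intFromEnv(env: Env, name: string, fallback: number): number {
  const parsed = Number.parseInt(env[name] ?? "", 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

function loadBatchConfig(env: Env) {
  return {
    enabled: env.BATCH_PROCESSING_ENABLED !== "false",
    maxRequests: intFromEnv(env, "BATCH_MAX_REQUESTS", 1000),
    minSize: intFromEnv(env, "BATCH_MIN_SIZE", 10),
    timeoutHours: intFromEnv(env, "BATCH_TIMEOUT_HOURS", 24),
    retentionDays: intFromEnv(env, "BATCH_MONITORING_RETENTION_DAYS", 30),
  };
}
```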
Dashboard Refresh Settings
```typescript
// Configure dashboard update intervals
const DASHBOARD_CONFIG = {
  refreshInterval: 30000, // 30 seconds
  alertRefreshInterval: 10000, // 10 seconds for alerts
  detailRefreshInterval: 60000, // 1 minute for detailed views
  maxRetries: 3, // Maximum retry attempts
  retryDelay: 5000, // Delay between retries
};
```
Alerts and Notifications
Automated Alerts
The system automatically generates alerts for:
```typescript
const alertConditions = {
  highErrorRate: {
    threshold: 10, // Error rate > 10%
    severity: "high",
    notification: "immediate",
  },
  longProcessingTime: {
    threshold: 3600000, // > 1 hour
    severity: "medium",
    notification: "hourly",
  },
  lowThroughput: {
    threshold: 0.5, // < 0.5 jobs per hour
    severity: "medium",
    notification: "daily",
  },
  batchFailure: {
    threshold: 1, // Any complete batch failure
    severity: "critical",
    notification: "immediate",
  },
};
```
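The conditions above can be checked against a metrics snapshot. The evaluator below is a sketch: the snapshot field names are drawn from the monitoring response in this document, but the function itself is illustrative of how the thresholds combine, not the system's actual alerting code.

```typescript
// Illustrative alert evaluation against the documented thresholds.
interface MetricsSnapshot {
  errorRate: number;           // percent, cf. performance.efficiency.errorRate
  maxProcessingTimeMs: number; // longest job in the window
  jobsPerHour: number;         // cf. performance.throughput.jobsPerHour
  completeBatchFailures: number;
}

function firedAlerts(m: MetricsSnapshot): string[] {
  const fired: string[] = [];
  if (m.errorRate > 10) fired.push("highErrorRate");
  if (m.maxProcessingTimeMs > 3_600_000) fired.push("longProcessingTime");
  if (m.jobsPerHour < 0.5) fired.push("lowThroughput");
  if (m.completeBatchFailures >= 1) fired.push("batchFailure");
  return fired;
}
```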
Custom Alert Configuration
```typescript
// Configure custom alerts through the admin interface
async function configureAlerts(alertConfig) {
  const response = await fetch("/api/admin/batch-monitoring/alerts", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      errorRateThreshold: alertConfig.errorRate,
      processingTimeThreshold: alertConfig.processingTime,
      notificationChannels: alertConfig.channels,
      alertSuppression: alertConfig.suppression,
    }),
  });
  return response.json();
}
```
Troubleshooting
Common Issues
High Error Rates
```typescript
// Investigate high error rates
async function investigateErrors() {
  const response = await fetch("/api/admin/batch-monitoring?status=failed&includeDetails=true");
  const data = await response.json();

  // Group errors by type
  const errorAnalysis = data.data.jobs.reduce((acc, job) => {
    job.errorSummary?.forEach((error) => {
      if (!acc[error.error]) {
        acc[error.error] = { count: 0, jobs: [] };
      }
      acc[error.error].count += error.count;
      acc[error.error].jobs.push(job.id);
    });
    return acc;
  }, {});

  console.log("Error analysis:", errorAnalysis);
  return errorAnalysis;
}
```
Slow Processing
```typescript
// Analyze processing bottlenecks
async function analyzePerformance() {
  const response = await fetch("/api/admin/batch-monitoring?timeRange=24h&includeDetails=true");
  const data = await response.json();

  const slowJobs = data.data.jobs
    .filter((job) => job.processingTimeMs > 3600000) // > 1 hour
    .sort((a, b) => b.processingTimeMs - a.processingTimeMs);

  console.log("Slowest jobs:", slowJobs.slice(0, 5));

  // Analyze patterns
  const avgByType = slowJobs.reduce((acc, job) => {
    if (!acc[job.jobType]) {
      acc[job.jobType] = { total: 0, count: 0 };
    }
    acc[job.jobType].total += job.processingTimeMs;
    acc[job.jobType].count++;
    return acc;
  }, {});

  Object.keys(avgByType).forEach((type) => {
    avgByType[type].average = avgByType[type].total / avgByType[type].count;
  });

  return avgByType;
}
```
Performance Optimization
Batch Size Optimization
```typescript
// Analyze optimal batch sizes
async function optimizeBatchSizes() {
  const response = await fetch("/api/admin/batch-monitoring?timeRange=7d&includeDetails=true");
  const data = await response.json();

  // Group by batch size ranges
  const sizePerformance = data.data.jobs.reduce((acc, job) => {
    const sizeRange = Math.floor(job.requestCount / 50) * 50; // Group by 50s
    if (!acc[sizeRange]) {
      acc[sizeRange] = {
        jobs: 0,
        totalTime: 0,
        totalRequests: 0,
        successRate: 0,
      };
    }
    acc[sizeRange].jobs++;
    acc[sizeRange].totalTime += job.processingTimeMs;
    acc[sizeRange].totalRequests += job.requestCount;
    acc[sizeRange].successRate += job.completedCount / job.requestCount;
    return acc;
  }, {});

  // Calculate averages
  Object.keys(sizePerformance).forEach((range) => {
    const perf = sizePerformance[range];
    perf.avgTimePerRequest = perf.totalTime / perf.totalRequests;
    perf.avgSuccessRate = perf.successRate / perf.jobs;
  });

  return sizePerformance;
}
```
Integration with Existing Systems
Security Audit Integration
All batch monitoring activities are logged through the security audit system:
```typescript
// Automatic audit logging for monitoring activities
await securityAuditLogger.logPlatformAdmin(
  "batch_monitoring_access",
  AuditOutcome.SUCCESS,
  context,
  "Admin accessed batch monitoring dashboard"
);
```
Rate Limiting Integration
Monitoring API endpoints use the existing rate limiting system:
```typescript
// Protected by admin rate limiting
const rateLimitResult = await rateLimiter.check(
  `admin-batch-monitoring:${userId}`,
  60, // 60 requests
  60 * 1000 // per minute
);
```
Related Documentation
- Batch Processing Optimizations
- Security Monitoring
- Admin Audit Logs API
- OpenAI Batch API Integration
The batch monitoring dashboard provides comprehensive visibility into the AI processing pipeline, enabling administrators to optimize performance, monitor costs, and ensure reliable operation of the batch processing system.