mirror of https://github.com/kjanat/livedash-node.git synced 2026-01-16 18:32:10 +01:00

Files

Kaj Kowalski 1eea2cc3e4 refactor: fix biome linting issues and update project documentation

- Fix 36+ biome linting issues reducing errors/warnings from 227 to 191
- Replace explicit 'any' types with proper TypeScript interfaces
- Fix React hooks dependencies and useCallback patterns
- Resolve unused variables and parameter assignment issues
- Improve accessibility with proper label associations
- Add comprehensive API documentation for admin and security features
- Update README.md with accurate PostgreSQL setup and current tech stack
- Create complete documentation for audit logging, CSP monitoring, and batch processing
- Fix outdated project information and missing developer workflows

2025-07-12 00:28:09 +02:00

14 KiB

Raw Blame History

Batch Processing Monitoring Dashboard

This document describes the batch processing monitoring dashboard and API endpoints for tracking OpenAI Batch API operations in the LiveDash application.

Overview

The Batch Monitoring Dashboard provides real-time visibility into the OpenAI Batch API processing pipeline, including job status tracking, cost analysis, and performance monitoring. This system enables 50% cost reduction on AI processing while maintaining comprehensive oversight.

Features

Real-time Monitoring

Job Status Tracking: Monitor batch jobs from creation to completion
Queue Management: View pending, running, and completed batch queues
Processing Metrics: Track throughput, success rates, and error patterns
Cost Analysis: Monitor API costs and savings compared to individual requests

Performance Analytics

Batch Efficiency: Analyze batch size optimization and processing times
Success Rates: Track completion and failure rates across different job types
Resource Utilization: Monitor API quota usage and rate limiting
Historical Trends: View processing patterns over time

Administrative Controls

Manual Intervention: Pause, resume, or cancel batch operations
Priority Management: Adjust processing priorities for urgent requests
Error Handling: Review and retry failed batch operations
Configuration Management: Adjust batch parameters and thresholds

API Endpoints

Batch Monitoring API

Retrieve comprehensive batch processing metrics and status information.

GET /api/admin/batch-monitoring

Query Parameters

Parameter	Type	Description	Default	Example
`timeRange`	string	Time range for metrics	`24h`	`?timeRange=7d`
`status`	string	Filter by batch status	-	`?status=completed`
`jobType`	string	Filter by job type	-	`?jobType=ai_analysis`
`includeDetails`	boolean	Include detailed job information	`false`	`?includeDetails=true`
`page`	number	Page number for pagination	1	`?page=2`
`limit`	number	Records per page (max 100)	50	`?limit=25`

Example Request

const response = await fetch('/api/admin/batch-monitoring?' + new URLSearchParams({
  timeRange: '24h',
  status: 'completed',
  includeDetails: 'true'
}));

const data = await response.json();

Response Format

{
  "success": true,
  "data": {
    "summary": {
      "totalJobs": 156,
      "completedJobs": 142,
      "failedJobs": 8,
      "pendingJobs": 6,
      "totalRequests": 15600,
      "processedRequests": 14200,
      "costSavings": {
        "currentPeriod": 234.56,
        "projectedMonthly": 7038.45,
        "savingsPercentage": 48.2
      },
      "averageProcessingTime": 1800000,
      "successRate": 95.2
    },
    "queues": {
      "pending": 12,
      "processing": 3,
      "completed": 142,
      "failed": 8
    },
    "performance": {
      "throughput": {
        "requestsPerHour": 650,
        "jobsPerHour": 6.5,
        "averageBatchSize": 100
      },
      "efficiency": {
        "batchUtilization": 87.3,
        "processingEfficiency": 92.1,
        "errorRate": 4.8
      }
    },
    "jobs": [
      {
        "id": "batch-job-123",
        "batchId": "batch_abc123",
        "status": "completed",
        "jobType": "ai_analysis",
        "requestCount": 100,
        "completedCount": 98,
        "failedCount": 2,
        "createdAt": "2024-01-01T10:00:00Z",
        "startedAt": "2024-01-01T10:05:00Z",
        "completedAt": "2024-01-01T10:35:00Z",
        "processingTimeMs": 1800000,
        "costEstimate": 12.50,
        "errorSummary": [
          {
            "error": "token_limit_exceeded",
            "count": 2,
            "percentage": 2.0
          }
        ]
      }
    ]
  }
}

Dashboard Components

BatchMonitoringDashboard Component

The main dashboard component (components/admin/BatchMonitoringDashboard.tsx) provides:

Key Metrics Cards

// Real-time overview cards
<MetricCard
  title="Total Jobs"
  value={data.summary.totalJobs}
  change={"+12 from yesterday"}
  trend="up"
/>

<MetricCard
  title="Success Rate"
  value={`${data.summary.successRate}%`}
  change={"+2.1% from last week"}
  trend="up"
/>

<MetricCard
  title="Cost Savings"
  value={`$${data.summary.costSavings.currentPeriod}`}
  change={`${data.summary.costSavings.savingsPercentage}% vs individual API`}
  trend="up"
/>

Queue Status Visualization

// Visual representation of batch job queues
<QueueStatusChart
  pending={data.queues.pending}
  processing={data.queues.processing}
  completed={data.queues.completed}
  failed={data.queues.failed}
/>

Performance Charts

// Processing throughput over time
<ThroughputChart
  data={data.performance.throughput}
  timeRange={timeRange}
/>

// Cost savings trend
<CostSavingsChart
  savings={data.summary.costSavings}
  historical={data.historical}
/>

Job Management Table

// Detailed job listing with actions
<BatchJobTable
  jobs={data.jobs}
  onRetry={handleRetryJob}
  onCancel={handleCancelJob}
  onViewDetails={handleViewDetails}
/>

Usage Examples

Monitor Batch Performance

async function monitorBatchPerformance() {
  const response = await fetch('/api/admin/batch-monitoring?timeRange=24h');
  const data = await response.json();
  
  const performance = data.data.performance;
  
  // Check if performance is within acceptable ranges
  if (performance.efficiency.errorRate > 10) {
    console.warn('High error rate detected:', performance.efficiency.errorRate + '%');
    
    // Get failed jobs for analysis
    const failedJobs = await fetch('/api/admin/batch-monitoring?status=failed');
    const failures = await failedJobs.json();
    
    // Analyze common failure patterns
    const errorSummary = failures.data.jobs.reduce((acc, job) => {
      job.errorSummary?.forEach(error => {
        acc[error.error] = (acc[error.error] || 0) + error.count;
      });
      return acc;
    }, {});
    
    console.log('Error patterns:', errorSummary);
  }
}

Cost Savings Analysis

async function analyzeCostSavings() {
  const response = await fetch('/api/admin/batch-monitoring?timeRange=30d&includeDetails=true');
  const data = await response.json();
  
  const savings = data.data.summary.costSavings;
  
  return {
    currentSavings: savings.currentPeriod,
    projectedAnnual: savings.projectedMonthly * 12,
    savingsRate: savings.savingsPercentage,
    totalProcessed: data.data.summary.processedRequests,
    averageCostPerRequest: savings.currentPeriod / data.data.summary.processedRequests
  };
}

Retry Failed Jobs

async function retryFailedJobs() {
  // Get failed jobs
  const response = await fetch('/api/admin/batch-monitoring?status=failed');
  const data = await response.json();
  
  const retryableJobs = data.data.jobs.filter(job => {
    // Only retry jobs that failed due to temporary issues
    const hasRetryableErrors = job.errorSummary?.some(error => 
      ['rate_limit_exceeded', 'temporary_error', 'timeout'].includes(error.error)
    );
    return hasRetryableErrors;
  });
  
  // Retry jobs individually
  for (const job of retryableJobs) {
    try {
      await fetch(`/api/admin/batch-monitoring/${job.id}/retry`, {
        method: 'POST'
      });
      console.log(`Retried job ${job.id}`);
    } catch (error) {
      console.error(`Failed to retry job ${job.id}:`, error);
    }
  }
}

Real-time Dashboard Updates

function useRealtimeBatchMonitoring() {
  const [data, setData] = useState(null);
  const [isLoading, setIsLoading] = useState(true);
  
  useEffect(() => {
    const fetchData = async () => {
      try {
        const response = await fetch('/api/admin/batch-monitoring?timeRange=1h');
        const result = await response.json();
        setData(result.data);
      } catch (error) {
        console.error('Failed to fetch batch monitoring data:', error);
      } finally {
        setIsLoading(false);
      }
    };
    
    // Initial fetch
    fetchData();
    
    // Update every 30 seconds
    const interval = setInterval(fetchData, 30000);
    
    return () => clearInterval(interval);
  }, []);
  
  return { data, isLoading };
}

Configuration

Batch Processing Settings

Configure batch processing parameters in environment variables:

# Batch Processing Configuration
BATCH_PROCESSING_ENABLED="true"
BATCH_CREATE_INTERVAL="*/5 * * * *"          # Create batches every 5 minutes
BATCH_STATUS_CHECK_INTERVAL="*/2 * * * *"    # Check status every 2 minutes
BATCH_RESULT_PROCESSING_INTERVAL="*/1 * * * *" # Process results every minute

# Batch Size and Limits
BATCH_MAX_REQUESTS="1000"                     # Maximum requests per batch
BATCH_TIMEOUT_HOURS="24"                      # Batch timeout in hours
BATCH_MIN_SIZE="10"                           # Minimum batch size

# Monitoring Configuration
BATCH_MONITORING_RETENTION_DAYS="30"          # How long to keep monitoring data
BATCH_ALERT_THRESHOLD_ERROR_RATE="10"         # Alert if error rate exceeds 10%
BATCH_ALERT_THRESHOLD_PROCESSING_TIME="3600"  # Alert if processing takes >1 hour

Dashboard Refresh Settings

// Configure dashboard update intervals
const DASHBOARD_CONFIG = {
  refreshInterval: 30000,        // 30 seconds
  alertRefreshInterval: 10000,   // 10 seconds for alerts
  detailRefreshInterval: 60000,  // 1 minute for detailed views
  maxRetries: 3,                 // Maximum retry attempts
  retryDelay: 5000              // Delay between retries
};

Alerts and Notifications

Automated Alerts

The system automatically generates alerts for:

const alertConditions = {
  highErrorRate: {
    threshold: 10, // Error rate > 10%
    severity: 'high',
    notification: 'immediate'
  },
  longProcessingTime: {
    threshold: 3600000, // > 1 hour
    severity: 'medium',
    notification: 'hourly'
  },
  lowThroughput: {
    threshold: 0.5, // < 0.5 jobs per hour
    severity: 'medium',
    notification: 'daily'
  },
  batchFailure: {
    threshold: 1, // Any complete batch failure
    severity: 'critical',
    notification: 'immediate'
  }
};

Custom Alert Configuration

// Configure custom alerts through the admin interface
async function configureAlerts(alertConfig) {
  const response = await fetch('/api/admin/batch-monitoring/alerts', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      errorRateThreshold: alertConfig.errorRate,
      processingTimeThreshold: alertConfig.processingTime,
      notificationChannels: alertConfig.channels,
      alertSuppression: alertConfig.suppression
    })
  });
  
  return response.json();
}

Troubleshooting

Common Issues

High Error Rates

// Investigate high error rates
async function investigateErrors() {
  const response = await fetch('/api/admin/batch-monitoring?status=failed&includeDetails=true');
  const data = await response.json();
  
  // Group errors by type
  const errorAnalysis = data.data.jobs.reduce((acc, job) => {
    job.errorSummary?.forEach(error => {
      if (!acc[error.error]) {
        acc[error.error] = { count: 0, jobs: [] };
      }
      acc[error.error].count += error.count;
      acc[error.error].jobs.push(job.id);
    });
    return acc;
  }, {});
  
  console.log('Error analysis:', errorAnalysis);
  return errorAnalysis;
}

Slow Processing

// Analyze processing bottlenecks
async function analyzePerformance() {
  const response = await fetch('/api/admin/batch-monitoring?timeRange=24h&includeDetails=true');
  const data = await response.json();
  
  const slowJobs = data.data.jobs
    .filter(job => job.processingTimeMs > 3600000) // > 1 hour
    .sort((a, b) => b.processingTimeMs - a.processingTimeMs);
  
  console.log('Slowest jobs:', slowJobs.slice(0, 5));
  
  // Analyze patterns
  const avgByType = slowJobs.reduce((acc, job) => {
    if (!acc[job.jobType]) {
      acc[job.jobType] = { total: 0, count: 0 };
    }
    acc[job.jobType].total += job.processingTimeMs;
    acc[job.jobType].count++;
    return acc;
  }, {});
  
  Object.keys(avgByType).forEach(type => {
    avgByType[type].average = avgByType[type].total / avgByType[type].count;
  });
  
  return avgByType;
}

Performance Optimization

Batch Size Optimization

// Analyze optimal batch sizes
async function optimizeBatchSizes() {
  const response = await fetch('/api/admin/batch-monitoring?timeRange=7d&includeDetails=true');
  const data = await response.json();
  
  // Group by batch size ranges
  const sizePerformance = data.data.jobs.reduce((acc, job) => {
    const sizeRange = Math.floor(job.requestCount / 50) * 50; // Group by 50s
    if (!acc[sizeRange]) {
      acc[sizeRange] = {
        jobs: 0,
        totalTime: 0,
        totalRequests: 0,
        successRate: 0
      };
    }
    
    acc[sizeRange].jobs++;
    acc[sizeRange].totalTime += job.processingTimeMs;
    acc[sizeRange].totalRequests += job.requestCount;
    acc[sizeRange].successRate += job.completedCount / job.requestCount;
    
    return acc;
  }, {});
  
  // Calculate averages
  Object.keys(sizePerformance).forEach(range => {
    const perf = sizePerformance[range];
    perf.avgTimePerRequest = perf.totalTime / perf.totalRequests;
    perf.avgSuccessRate = perf.successRate / perf.jobs;
  });
  
  return sizePerformance;
}

Integration with Existing Systems

Security Audit Integration

All batch monitoring activities are logged through the security audit system:

// Automatic audit logging for monitoring activities
await securityAuditLogger.logPlatformAdmin(
  'batch_monitoring_access',
  AuditOutcome.SUCCESS,
  context,
  'Admin accessed batch monitoring dashboard'
);

Rate Limiting Integration

Monitoring API endpoints use the existing rate limiting system:

// Protected by admin rate limiting
const rateLimitResult = await rateLimiter.check(
  `admin-batch-monitoring:${userId}`,
  60,  // 60 requests
  60 * 1000  // per minute
);

The batch monitoring dashboard provides comprehensive visibility into the AI processing pipeline, enabling administrators to optimize performance, monitor costs, and ensure reliable operation of the batch processing system.

14 KiB Raw Blame History