From b946bdc8033d20cf5a1ce921fe829b34cf96be98 Mon Sep 17 00:00:00 2001
From: Kaj Kowalski
Date: Sun, 13 Jul 2025 17:11:11 +0200
Subject: [PATCH] style: formatted the docs with prettier

---
 docs/csp-metrics-api.md | 98 ++++-----
 docs/dashboard-components.md | 48 ++--
 docs/database-connection-pooling.md | 77 +++----
 docs/database-performance-optimizations.md | 94 ++++----
 docs/neon-database-optimization.md | 52 ++---
 docs/postgresql-migration.md | 50 ++---
 docs/scheduler-architecture.md | 124 ++++++-----
 docs/scheduler-fixes.md | 34 +--
 docs/security-audit-logging.md | 136 ++++++------
 docs/security-headers.md | 124 +++++------
 docs/security-monitoring.md | 243 +++++++++++----------
 docs/session-processing.md | 44 ++--
 12 files changed, 571 insertions(+), 553 deletions(-)

diff --git a/docs/csp-metrics-api.md b/docs/csp-metrics-api.md
index 5816fad..8eb521d 100644
--- a/docs/csp-metrics-api.md
+++ b/docs/csp-metrics-api.md
@@ -24,7 +24,7 @@ POST /api/csp-report
 #### Request Headers
- - `Content-Type`: `application/csp-report` or `application/json`
+- `Content-Type`: `application/csp-report` or `application/json`
 #### Request Body (Automatic from Browser)
@@ -58,21 +58,21 @@ GET /api/csp-metrics
 #### Query Parameters
-| Parameter | Type | Description | Default | Example |
-| ---------------- | ------- | ------------------------- | ------- | ---------------------- |
-| `timeRange` | string | Time range for metrics | `24h` | `?timeRange=7d` |
-| `format` | string | Response format | `json` | `?format=csv` |
-| `groupBy` | string | Group results by field | `hour` | `?groupBy=directive` |
-| `includeDetails` | boolean | Include violation details | `false` | `?includeDetails=true` |
-| `offset` | string | Shift the queried time-window backwards by the given duration (for comparisons) | `0` | `?offset=24h` |
+| Parameter | Type | Description | Default | Example |
+| ---------------- | ------- | ------------------------------------------------------------------------------- | ------- | ---------------------- |
+| `timeRange` | string | Time range for metrics | `24h` | `?timeRange=7d` |
+| `format` | string | Response format | `json` | `?format=csv` |
+| `groupBy` | string | Group results by field | `hour` | `?groupBy=directive` |
+| `includeDetails` | boolean | Include violation details | `false` | `?includeDetails=true` |
+| `offset` | string | Shift the queried time-window backwards by the given duration (for comparisons) | `0` | `?offset=24h` |
 #### Time Range Options
- - `1h` - Last 1 hour
- - `6h` - Last 6 hours
- - `24h` - Last 24 hours (default)
- - `7d` - Last 7 days
- - `30d` - Last 30 days
+- `1h` - Last 1 hour
+- `6h` - Last 6 hours
+- `24h` - Last 24 hours (default)
+- `7d` - Last 7 days
+- `30d` - Last 30 days
 #### Example Request
@@ -166,11 +166,11 @@ console.log(result.recommendations); // array of suggestions
 The service automatically assesses violation risk based on:
- - **Directive Type**: Script violations are higher risk than style violations
- - **Source Pattern**: External domains vs inline vs data URIs
- - **Bypass Indicators**: Known CSP bypass techniques
- - **Frequency**: Repeated violations from same source
- - **Geographic Factors**: Unusual source locations
+- **Directive Type**: Script violations are higher risk than style violations
+- **Source Pattern**: External domains vs inline vs data URIs
+- **Bypass Indicators**: Known CSP bypass techniques
+- **Frequency**: Repeated violations from same source
+- **Geographic Factors**: Unusual source locations
 #### 3. Bypass Detection
@@ -192,10 +192,10 @@ const bypassPatterns = [
 Based on violation patterns, the service provides actionable recommendations:
- - **Tighten Policies**: Suggest removing broad allowlists
- - **Add Domains**: Recommend allowing legitimate external resources
- - **Implement Nonces**: Suggest nonce-based policies for inline content
- - **Upgrade Directives**: Recommend modern CSP features
+- **Tighten Policies**: Suggest removing broad allowlists
+- **Add Domains**: Recommend allowing legitimate external resources
+- **Implement Nonces**: Suggest nonce-based policies for inline content
+- **Upgrade Directives**: Recommend modern CSP features
 ## Violation Analysis
@@ -406,22 +406,22 @@ CSP_ALERT_THRESHOLD=5 # violations per 10 minutes
 ### Rate Limiting
- - **10 reports per minute per IP** prevents spam attacks
- - **Exponential backoff** for repeated violations from same source
- - **Memory cleanup** removes old violations automatically
+- **10 reports per minute per IP** prevents spam attacks
+- **Exponential backoff** for repeated violations from same source
+- **Memory cleanup** removes old violations automatically
 ### Memory Management
- - **Violation buffer** limited to 7 days of data in memory
- - **Hard cap** of 10,000 violation entries to prevent memory exhaustion
- - **Automatic cleanup** runs every 100 requests (1% probability)
- - **Efficient storage** using Map data structures
+- **Violation buffer** limited to 7 days of data in memory
+- **Hard cap** of 10,000 violation entries to prevent memory exhaustion
+- **Automatic cleanup** runs every 100 requests (1% probability)
+- **Efficient storage** using Map data structures
 ### Database Impact
- - **No persistent storage** for real-time metrics (memory only)
- - **Optional logging** to database for long-term analysis
- - **Indexed queries** for historical data retrieval
+- **No persistent storage** for real-time metrics (memory only)
+- **Optional logging** to database for long-term analysis
+- **Indexed queries** for historical data retrieval
 ## Security Considerations
@@ -429,28 +429,28 @@ CSP_ALERT_THRESHOLD=5 # violations per 10 minutes
 **⚠️ Data Collection Notice:**
- - **IP addresses** are collected and stored in memory for security monitoring
- - **User agent strings** are stored for browser compatibility analysis
- - **Legal basis**: Legitimate interest for security incident detection and prevention
- - **Retention**: In-memory storage only, automatically purged after 7 days or application restart
- - **Data minimization**: Only violation-related metadata is retained, not page content
+- **IP addresses** are collected and stored in memory for security monitoring
+- **User agent strings** are stored for browser compatibility analysis
+- **Legal basis**: Legitimate interest for security incident detection and prevention
+- **Retention**: In-memory storage only, automatically purged after 7 days or application restart
+- **Data minimization**: Only violation-related metadata is retained, not page content
 **Planned Privacy Enhancements:**
- - IP anonymization options for GDPR compliance (roadmap)
- - User agent sanitization to remove sensitive information (roadmap)
+- IP anonymization options for GDPR compliance (roadmap)
+- User agent sanitization to remove sensitive information (roadmap)
 ### Rate-Limiting Protection
- - **Per-IP limits** prevent DoS attacks on reporting endpoint
- - **Content-type validation** ensures proper report format
- - **Request size limits** prevent memory exhaustion
+- **Per-IP limits** prevent DoS attacks on reporting endpoint
+- **Content-type validation** ensures proper report format
+- **Request size limits** prevent memory exhaustion
 ### False Positive Handling
- - **Learning mode** for new deployments
- - **Whitelist support** for known legitimate violations
- - **Risk score adjustment** based on historical patterns
+- **Learning mode** for new deployments
+- **Whitelist support** for known legitimate violations
+- **Risk score adjustment** based on historical patterns
 ## Troubleshooting
@@ -499,10 +499,10 @@ if (duration > 2000) {
 ## Related Documentation
- - [Enhanced CSP Implementation](./security/enhanced-csp.md)
- - [Security Monitoring](./security-monitoring.md)
- - [Security Headers](./security-headers.md)
- - [Rate Limiting](../lib/rateLimiter.ts)
+- [Enhanced CSP Implementation](./security/enhanced-csp.md)
+- [Security Monitoring](./security-monitoring.md)
+- [Security Headers](./security-headers.md)
+- [Rate Limiting](../lib/rateLimiter.ts)
 ## API Reference Summary
diff --git a/docs/dashboard-components.md b/docs/dashboard-components.md
index c56bf9f..0aa0337 100644
--- a/docs/dashboard-components.md
+++ b/docs/dashboard-components.md
@@ -12,10 +12,10 @@ The WordCloud component visualizes categories or topics based on their frequency
 **Features:**
-- Dynamic sizing based on frequency
-- Colorful display with a pleasing color palette
-- Responsive design
-- Interactive hover effects
+- Dynamic sizing based on frequency
+- Colorful display with a pleasing color palette
+- Responsive design
+- Interactive hover effects
 ### 2. GeographicMap
@@ -25,10 +25,10 @@ This component displays a world map with circles representing the number of sess
 **Features:**
-- Interactive map using React Leaflet
-- Circle sizes scaled by session count
-- Tooltips showing country names and session counts
-- Responsive design
+- Interactive map using React Leaflet
+- Circle sizes scaled by session count
+- Tooltips showing country names and session counts
+- Responsive design
 ### 3. MetricCard
@@ -38,10 +38,10 @@ A modern, visually appealing card for displaying key metrics.
 **Features:**
-- Multiple design variants (default, primary, success, warning, danger)
-- Support for trend indicators
-- Icons and descriptions
-- Clean, modern styling
+- Multiple design variants (default, primary, success, warning, danger)
+- Support for trend indicators
+- Icons and descriptions
+- Clean, modern styling
 ### 4. DonutChart
@@ -51,10 +51,10 @@ An enhanced donut chart with better styling and a central text display capabilit
 **Features:**
-- Customizable colors
-- Center text area for displaying summaries
-- Interactive tooltips with percentages
-- Well-balanced legend display
+- Customizable colors
+- Center text area for displaying summaries
+- Interactive tooltips with percentages
+- Well-balanced legend display
 ### 5. ResponseTimeDistribution
@@ -64,10 +64,10 @@ Visualizes the distribution of response times as a histogram.
 **Features:**
-- Color-coded bars (green for fast, yellow for medium, red for slow)
-- Target time indicator
-- Automatic binning of response times
-- Clear labeling and scales
+- Color-coded bars (green for fast, yellow for medium, red for slow)
+- Target time indicator
+- Automatic binning of response times
+- Clear labeling and scales
 ## Dashboard Enhancements
@@ -85,7 +85,7 @@ The dashboard has been enhanced with:
 ## Usage Notes
-- The geographic map and response time distribution use simulated data where actual data is not available
-- All components are responsive and will adjust to different screen sizes
-- The dashboard automatically refreshes data when using the refresh button
-- Admin users have access to additional controls at the bottom of the dashboard
+- The geographic map and response time distribution use simulated data where actual data is not available
+- All components are responsive and will adjust to different screen sizes
+- The dashboard automatically refreshes data when using the refresh button
+- Admin users have access to additional controls at the bottom of the dashboard
diff --git a/docs/database-connection-pooling.md b/docs/database-connection-pooling.md
index 54fe105..04ffb91 100644
--- a/docs/database-connection-pooling.md
+++ b/docs/database-connection-pooling.md
@@ -29,42 +29,45 @@ DATABASE_URL="postgresql://user:pass@host:5432/db?connection_limit=20&pool_timeo
 #### Standard Pooling (Default)
-- Uses Prisma's built-in connection pooling
-- Simpler configuration
-- Good for development and small-scale deployments
+- Uses Prisma's built-in connection pooling
+- Simpler configuration
+- Good for development and small-scale deployments
 #### Enhanced Pooling (Recommended for Production)
-- Uses PostgreSQL native connection pooling with `@prisma/adapter-pg`
-- Advanced monitoring and health checks
-- Better resource management
-- Detailed connection metrics
+- Uses PostgreSQL native connection pooling with `@prisma/adapter-pg`
+- Advanced monitoring and health checks
+- Better resource management
+- Detailed connection metrics
 ## Implementation Details
 ### Fixed Issues
 1. **Multiple PrismaClient Instances**:
- - ❌ Before: Each scheduler created its own PrismaClient
- - ✅ After: All modules use singleton pattern from `lib/prisma.ts`
+
+- ❌ Before: Each scheduler created its own PrismaClient
+- ✅ After: All modules use singleton pattern from `lib/prisma.ts`
 2. **No Connection Management**:
- - ❌ Before: No graceful shutdown or connection cleanup
- - ✅ After: Proper cleanup on process termination
+
+- ❌ Before: No graceful shutdown or connection cleanup
+- ✅ After: Proper cleanup on process termination
 3. **No Monitoring**:
- - ❌ Before: No visibility into connection usage
- - ✅ After: Health check endpoint and connection metrics
+
+- ❌ Before: No visibility into connection usage
+- ✅ After: Health check endpoint and connection metrics
 ### Key Files Modified
-- `lib/prisma.ts` - Enhanced singleton with pooling options
-- `lib/database-pool.ts` - Advanced pooling configuration
-- `lib/processingScheduler.ts` - Fixed to use singleton
-- `lib/importProcessor.ts` - Fixed to use singleton
-- `lib/processingStatusManager.ts` - Fixed to use singleton
-- `lib/schedulers.ts` - Added graceful shutdown
-- `app/api/admin/database-health/route.ts` - Monitoring endpoint
+- `lib/prisma.ts` - Enhanced singleton with pooling options
+- `lib/database-pool.ts` - Advanced pooling configuration
+- `lib/processingScheduler.ts` - Fixed to use singleton
+- `lib/importProcessor.ts` - Fixed to use singleton
+- `lib/processingStatusManager.ts` - Fixed to use singleton
+- `lib/schedulers.ts` - Added graceful shutdown
+- `app/api/admin/database-health/route.ts` - Monitoring endpoint
 ## Monitoring
@@ -79,36 +82,36 @@ curl -H "Authorization: Bearer your-token" \
 Response includes:
-- Connection status
-- Pool statistics (if enhanced pooling enabled)
-- Basic metrics (session counts, etc.)
-- Configuration details
+- Connection status
+- Pool statistics (if enhanced pooling enabled)
+- Basic metrics (session counts, etc.)
+- Configuration details
 ### Connection Metrics
 With enhanced pooling enabled, you'll see console logs for:
-- Connection acquisitions/releases
-- Pool size changes
-- Error events
-- Health check results
+- Connection acquisitions/releases
+- Pool size changes
+- Error events
+- Health check results
 ## Performance Benefits
 ### Before Optimization
-- Multiple connection pools (one per scheduler)
-- Potential connection exhaustion under load
-- No connection monitoring
-- Resource waste from idle connections
+- Multiple connection pools (one per scheduler)
+- Potential connection exhaustion under load
+- No connection monitoring
+- Resource waste from idle connections
 ### After Optimization
-- Single shared connection pool
-- Configurable pool size and timeouts
-- Connection health monitoring
-- Graceful shutdown and cleanup
-- Better resource utilization
+- Single shared connection pool
+- Configurable pool size and timeouts
+- Connection health monitoring
+- Graceful shutdown and cleanup
+- Better resource utilization
 ## Recommended Settings
diff --git a/docs/database-performance-optimizations.md b/docs/database-performance-optimizations.md
index d0d0178..d220f21 100644
--- a/docs/database-performance-optimizations.md
+++ b/docs/database-performance-optimizations.md
@@ -6,10 +6,10 @@ This document outlines the comprehensive database performance optimizations impl
 The optimization focuses on the most frequently queried patterns in the application, particularly around:
- - AI processing request tracking and batching
- - Session analytics and filtering
- - Security audit log analysis
- - Multi-tenant data isolation performance
+- AI processing request tracking and batching
+- Session analytics and filtering
+- Security audit log analysis
+- Multi-tenant data isolation performance
 ## Applied Optimizations
@@ -31,9 +31,9 @@ INCLUDE ("processingStatus", "batchId", "requestedAt", "sessionId");
 **Impact**:
- - ~70% faster batch job queries
- - Reduced I/O for cost analysis reports
- - Improved scheduler performance
+- ~70% faster batch job queries
+- Reduced I/O for cost analysis reports
+- Improved scheduler performance
 ### 2. Session Analytics Optimizations
@@ -54,9 +54,9 @@ INCLUDE ("startTime", "messagesSent");
 **Impact**:
- - ~85% faster dashboard load times
- - Efficient date range filtering
- - Optimized sentiment analysis queries
+- ~85% faster dashboard load times
+- Efficient date range filtering
+- Optimized sentiment analysis queries
 ### 3. Security Audit Log Optimizations
@@ -77,9 +77,9 @@ INCLUDE ("eventType", "severity", "userId", "companyId");
 **Impact**:
- - ~90% faster security monitoring
- - Efficient threat detection
- - Improved compliance reporting
+- ~90% faster security monitoring
+- Efficient threat detection
+- Improved compliance reporting
 ### 4. Message Processing Optimizations
@@ -95,8 +95,8 @@ INCLUDE ("content");
 **Impact**:
- - ~60% faster conversation loading
- - Reduced memory usage for message queries
+- ~60% faster conversation loading
+- Reduced memory usage for message queries
 ### 5. Processing Pipeline Optimizations
@@ -118,29 +118,29 @@ INCLUDE ("sessionId", "errorMessage", "retryCount", "startedAt");
 **Impact**:
- - ~75% faster processing monitoring
- - Efficient error tracking
- - Improved retry logic performance
+- ~75% faster processing monitoring
+- Efficient error tracking
+- Improved retry logic performance
 ## Index Strategy Principles
 ### 1. Composite Index Design
- - **Leading column**: Most selective filter (usually companyId for multi-tenancy)
- - **Secondary columns**: Common WHERE clause filters
- - **Covering columns**: SELECT list columns via INCLUDE
+- **Leading column**: Most selective filter (usually companyId for multi-tenancy)
+- **Secondary columns**: Common WHERE clause filters
+- **Covering columns**: SELECT list columns via INCLUDE
 ### 2. Partial Indexes
- - Used for error analysis and specific status filtering
- - Reduces index size and maintenance overhead
- - Improves write performance
+- Used for error analysis and specific status filtering
+- Reduces index size and maintenance overhead
+- Improves write performance
 ### 3. Covering Indexes
- - Include frequently accessed columns to avoid table lookups
- - Reduces I/O for read-heavy operations
- - Particularly effective for dashboard queries
+- Include frequently accessed columns to avoid table lookups
+- Reduces I/O for read-heavy operations
+- Particularly effective for dashboard queries
 ## Query Pattern Analysis
@@ -166,29 +166,29 @@ INCLUDE ("sessionId", "errorMessage", "retryCount", "startedAt");
 ### Index Monitoring
- - Monitor index usage with `pg_stat_user_indexes`
- - Track bloat with `pg_stat_user_tables`
- - Regular ANALYZE after bulk operations
+- Monitor index usage with `pg_stat_user_indexes`
+- Track bloat with `pg_stat_user_tables`
+- Regular ANALYZE after bulk operations
 ### Write Performance Impact
- - Composite indexes add ~15% write overhead
- - Offset by dramatic read performance gains
- - Monitored via slow query logs
+- Composite indexes add ~15% write overhead
+- Offset by dramatic read performance gains
+- Monitored via slow query logs
 ### Storage Impact
- - Indexes add ~25% to total storage
- - Covering indexes reduce need for table scans
- - Partial indexes minimize storage overhead
+- Indexes add ~25% to total storage
+- Covering indexes reduce need for table scans
+- Partial indexes minimize storage overhead
 ## Migration Safety
 ### CONCURRENTLY Operations
- - All indexes created with `CREATE INDEX CONCURRENTLY`
- - No table locks during creation
- - Production-safe deployment
+- All indexes created with `CREATE INDEX CONCURRENTLY`
+- No table locks during creation
+- Production-safe deployment
 ### Rollback Strategy
@@ -238,18 +238,18 @@ LIMIT 10;
 ### Monitoring Strategy
- - Set up automated index usage monitoring
- - Track slow query evolution
- - Monitor storage growth patterns
- - Implement performance alerting
+- Set up automated index usage monitoring
+- Track slow query evolution
+- Monitor storage growth patterns
+- Implement performance alerting
 ## Conclusion
 These database optimizations provide:
- - **70-90% improvement** in query performance
- - **Reduced server load** through efficient indexing
- - **Better user experience** with faster dashboards
- - **Scalable foundation** for future growth
+- **70-90% improvement** in query performance
+- **Reduced server load** through efficient indexing
+- **Better user experience** with faster dashboards
+- **Scalable foundation** for future growth
 The optimizations are designed to be production-safe and monitoring-friendly, ensuring both immediate performance gains and long-term maintainability.
diff --git a/docs/neon-database-optimization.md b/docs/neon-database-optimization.md
index 3c072ba..76067de 100644
--- a/docs/neon-database-optimization.md
+++ b/docs/neon-database-optimization.md
@@ -15,21 +15,21 @@ Can't reach database server at `ep-tiny-math-a2zsshve-pooler.eu-central-1.aws.ne
 ### 1. Neon Connection Limits
-- **Free Tier**: 20 concurrent connections
-- **Pro Tier**: 100 concurrent connections
-- **Multiple schedulers** can quickly exhaust connections
+- **Free Tier**: 20 concurrent connections
+- **Pro Tier**: 100 concurrent connections
+- **Multiple schedulers** can quickly exhaust connections
 ### 2. Connection Pooling Issues
-- Each scheduler was creating separate PrismaClient instances
-- No connection reuse between operations
-- No retry logic for temporary failures
+- Each scheduler was creating separate PrismaClient instances
+- No connection reuse between operations
+- No retry logic for temporary failures
 ### 3. Neon-Specific Challenges
-- **Auto-pause**: Databases pause after inactivity
-- **Cold starts**: First connection after pause takes longer
-- **Regional latency**: eu-central-1 may have variable latency
+- **Auto-pause**: Databases pause after inactivity
+- **Cold starts**: First connection after pause takes longer
+- **Regional latency**: eu-central-1 may have variable latency
 ## Solutions Implemented
@@ -106,9 +106,9 @@ curl -H "Authorization: Bearer YOUR_API_TOKEN" \
 ### 2. Neon Dashboard Monitoring
-- Monitor "Active connections" in Neon dashboard
-- Check for connection spikes during scheduler runs
-- Review query performance and slow queries
+- Monitor "Active connections" in Neon dashboard
+- Check for connection spikes during scheduler runs
+- Review query performance and slow queries
 ### 3. Application Logs
@@ -158,9 +158,9 @@ const prisma = new PrismaClient({
 **Causes:**
-- Neon database auto-paused
-- Connection limit exceeded
-- Network issues
+- Neon database auto-paused
+- Connection limit exceeded
+- Network issues
 **Solutions:**
@@ -173,9 +173,9 @@ const prisma = new PrismaClient({
 **Causes:**
-- Idle connection timeout
-- Neon maintenance
-- Long-running transactions
+- Idle connection timeout
+- Neon maintenance
+- Long-running transactions
 **Solutions:**
@@ -187,9 +187,9 @@ const prisma = new PrismaClient({
 **Causes:**
-- Blocking database operations
-- Scheduler overlap
-- High CPU usage
+- Blocking database operations
+- Scheduler overlap
+- High CPU usage
 **Solutions:**
@@ -230,10 +230,10 @@ SESSION_PROCESSING_INTERVAL="0 */2 * * *"
 ## Monitoring Checklist
-- [ ] Check Neon dashboard for connection spikes
-- [ ] Monitor scheduler execution times
-- [ ] Review error logs for connection patterns
-- [ ] Test health endpoint regularly
-- [ ] Set up alerts for connection failures
+- [ ] Check Neon dashboard for connection spikes
+- [ ] Monitor scheduler execution times
+- [ ] Review error logs for connection patterns
+- [ ] Test health endpoint regularly
+- [ ] Set up alerts for connection failures
 With these optimizations, your Neon database connections should be much more stable and efficient!
diff --git a/docs/postgresql-migration.md b/docs/postgresql-migration.md
index 2f36d93..67c0e1a 100644
--- a/docs/postgresql-migration.md
+++ b/docs/postgresql-migration.md
@@ -17,48 +17,48 @@ Successfully migrated the livedash-node application from SQLite to PostgreSQL us
 #### Production/Development
-- **Provider**: PostgreSQL (Neon)
-- **Environment Variable**: `DATABASE_URL`
-- **Connection**: Neon PostgreSQL cluster
+- **Provider**: PostgreSQL (Neon)
+- **Environment Variable**: `DATABASE_URL`
+- **Connection**: Neon PostgreSQL cluster
 #### Testing
-- **Provider**: PostgreSQL (Neon - separate database)
-- **Environment Variable**: `DATABASE_URL_TEST`
-- **Test Setup**: Automatically switches to test database during test runs
+- **Provider**: PostgreSQL (Neon - separate database)
+- **Environment Variable**: `DATABASE_URL_TEST`
+- **Test Setup**: Automatically switches to test database during test runs
 ### Files Modified
 1. **`prisma/schema.prisma`**
-- Changed provider from `sqlite` to `postgresql`
-- Updated URL to use `env("DATABASE_URL")`
+- Changed provider from `sqlite` to `postgresql`
+- Updated URL to use `env("DATABASE_URL")`
 2. **`tests/setup.ts`**
-- Added logic to use `DATABASE_URL_TEST` when available
-- Ensures test isolation with separate database
+- Added logic to use `DATABASE_URL_TEST` when available
+- Ensures test isolation with separate database
 3. **`.env`** (created)
-- Contains `DATABASE_URL` for Prisma CLI operations
+- Contains `DATABASE_URL` for Prisma CLI operations
 4. **`.env.local`** (existing)
-- Contains both `DATABASE_URL` and `DATABASE_URL_TEST`
+- Contains both `DATABASE_URL` and `DATABASE_URL_TEST`
 ### Database Schema
 All existing models and relationships were preserved:
-- **Company**: Multi-tenant root entity
-- **User**: Authentication and authorization
-- **Session**: Processed session data
-- **SessionImport**: Raw CSV import data
-- **Message**: Individual conversation messages
-- **Question**: Normalized question storage
-- **SessionQuestion**: Session-question relationships
-- **AIProcessingRequest**: AI cost tracking
+- **Company**: Multi-tenant root entity
+- **User**: Authentication and authorization
+- **Session**: Processed session data
+- **SessionImport**: Raw CSV import data
+- **Message**: Individual conversation messages
+- **Question**: Normalized question storage
+- **SessionQuestion**: Session-question relationships
+- **AIProcessingRequest**: AI cost tracking
 ### Migration Process
@@ -103,11 +103,11 @@ if (process.env.DATABASE_URL_TEST) {
 All tests pass successfully:
-- ✅ Environment configuration tests
-- ✅ Transcript fetcher tests
-- ✅ Database connection tests
-- ✅ Schema validation tests
-- ✅ CRUD operation tests
+- ✅ Environment configuration tests
+- ✅ Transcript fetcher tests
+- ✅ Database connection tests
+- ✅ Schema validation tests
+- ✅ CRUD operation tests
 ### Next Steps
diff --git a/docs/scheduler-architecture.md b/docs/scheduler-architecture.md
index 243079e..c80495b 100644
--- a/docs/scheduler-architecture.md
+++ b/docs/scheduler-architecture.md
@@ -6,11 +6,11 @@ This document describes the extracted scheduler architecture that enables horizo
 The scheduler system has been refactored from a monolithic approach to a service-oriented architecture that supports:
-- **Individual Scheduler Services** - Each scheduler runs as a separate service
-- **Horizontal Scaling** - Multiple instances of the same scheduler can run across different machines
-- **Health Monitoring** - Built-in health checks for load balancers and orchestrators
-- **Graceful Shutdown** - Proper handling of shutdown signals for zero-downtime deployments
-- **Centralized Management** - Optional scheduler manager for coordinated operations
+- **Individual Scheduler Services** - Each scheduler runs as a separate service
+- **Horizontal Scaling** - Multiple instances of the same scheduler can run across different machines
+- **Health Monitoring** - Built-in health checks for load balancers and orchestrators
+- **Graceful Shutdown** - Proper handling of shutdown signals for zero-downtime deployments
+- **Centralized Management** - Optional scheduler manager for coordinated operations
 ## Components
@@ -34,11 +34,11 @@ export abstract class BaseSchedulerService extends EventEmitter {
 **Features:**
-- Status management (STOPPED, STARTING, RUNNING, PAUSED, ERROR)
-- Metrics collection (run counts, timing, success/failure rates)
-- Event emission for monitoring
-- Configurable intervals and timeouts
-- Automatic retry handling
+- Status management (STOPPED, STARTING, RUNNING, PAUSED, ERROR)
+- Metrics collection (run counts, timing, success/failure rates)
+- Event emission for monitoring
+- Configurable intervals and timeouts
+- Automatic retry handling
 ### 2. Individual Scheduler Services
@@ -57,16 +57,16 @@ const csvScheduler = new CsvImportSchedulerService({
 **Features:**
-- Batch processing with configurable concurrency
-- Duplicate detection
-- Company-specific error handling
-- Progress monitoring
+- Batch processing with configurable concurrency
+- Duplicate detection
+- Company-specific error handling
+- Progress monitoring
 #### Additional Schedulers (To Be Implemented)
-- `ImportProcessingSchedulerService` - Process imported CSV data into sessions
-- `SessionProcessingSchedulerService` - AI analysis and categorization
-- `BatchProcessingSchedulerService` - OpenAI Batch API integration
+- `ImportProcessingSchedulerService` - Process imported CSV data into sessions
+- `SessionProcessingSchedulerService` - AI analysis and categorization
+- `BatchProcessingSchedulerService` - OpenAI Batch API integration
 ### 3. SchedulerManager
@@ -88,10 +88,10 @@ await manager.startAll();
 **Features:**
-- Automatic restart of failed critical schedulers
-- Health monitoring across all schedulers
-- Coordinated start/stop operations
-- Event aggregation and logging
+- Automatic restart of failed critical schedulers
+- Health monitoring across all schedulers
+- Coordinated start/stop operations
+- Event aggregation and logging
 ### 4. Standalone Scheduler Runner
@@ -107,10 +107,10 @@ npx tsx lib/services/schedulers/StandaloneSchedulerRunner.ts --list
 **Features:**
-- Independent process execution
-- Environment variable configuration
-- Graceful shutdown handling
-- Health reporting for monitoring
+- Independent process execution
+- Environment variable configuration
+- Graceful shutdown handling
+- Health reporting for monitoring
 ## Deployment Patterns
@@ -127,15 +127,15 @@ await initializeSchedulers();
 **Pros:**
-- Simple deployment
-- Lower resource usage
-- Easy local development
+- Simple deployment
+- Lower resource usage
+- Easy local development
 **Cons:**
-- Limited scalability
-- Single point of failure
-- Resource contention
+- Limited scalability
+- Single point of failure
+- Resource contention
 ### 2. Separate Processes
@@ -154,15 +154,15 @@ npm run scheduler:session-processing
 **Pros:**
-- Independent scaling
-- Fault isolation
-- Resource optimization per scheduler
+- Independent scaling
+- Fault isolation
+- Resource optimization per scheduler
 **Cons:**
-- More complex deployment
-- Higher resource overhead
-- Inter-process coordination needed
+- More complex deployment
+- Higher resource overhead
+- Inter-process coordination needed
 ### 3. Container Orchestration (Recommended for Production)
@@ -193,16 +193,16 @@ services:
 **Pros:**
-- Full horizontal scaling
-- Independent resource allocation
-- Health monitoring integration
-- Zero-downtime deployments
+- Full horizontal scaling
+- Independent resource allocation
+- Health monitoring integration
+- Zero-downtime deployments
 **Cons:**
-- Complex orchestration setup
-- Network latency considerations
-- Distributed system challenges
+- Complex orchestration setup
+- Network latency considerations
+- Distributed system challenges
 ## Configuration
@@ -380,31 +380,35 @@ csv-import-scheduler-eu:
 ### From Current Architecture
 1. **Phase 1: Extract Schedulers**
- - ✅ Create BaseSchedulerService
- - ✅ Implement CsvImportSchedulerService
- - ✅ Create SchedulerManager
- - ⏳ Implement remaining scheduler services
+
+- ✅ Create BaseSchedulerService
+- ✅ Implement CsvImportSchedulerService
+- ✅ Create SchedulerManager
+- ⏳ Implement remaining scheduler services
 2. **Phase 2: Deployment Options**
- - ✅ Add ServerSchedulerIntegration for backwards compatibility
- - ✅ Create StandaloneSchedulerRunner
- - ✅ Add health check endpoints
+
+- ✅ Add ServerSchedulerIntegration for backwards compatibility
+- ✅ Create StandaloneSchedulerRunner
+- ✅ Add health check endpoints
 3. **Phase 3: Container Support**
- - ⏳ Create Dockerfile for scheduler containers
- - ⏳ Add Kubernetes manifests
- - ⏳ Implement distributed coordination
+
+- ⏳ Create Dockerfile for scheduler containers
+- ⏳ Add Kubernetes manifests
+- ⏳ Implement distributed coordination
 4. **Phase 4: Production Migration**
- - ⏳ Deploy separate scheduler containers
- - ⏳ Monitor performance and stability
- - ⏳ Gradually increase horizontal scaling
+
+- ⏳ Deploy separate scheduler containers
+- ⏳ Monitor performance and stability
+- ⏳ Gradually increase horizontal scaling
 ### Breaking Changes
-- Scheduler initialization moved from `server.ts` to `ServerSchedulerIntegration`
-- Individual scheduler functions replaced with service classes
-- Configuration moved to environment variables
+- Scheduler initialization moved from `server.ts` to `ServerSchedulerIntegration`
+- Individual scheduler functions replaced with service classes
+- Configuration moved to environment variables
 ## Benefits
diff --git a/docs/scheduler-fixes.md b/docs/scheduler-fixes.md
index aa628ef..0765625 100644
--- a/docs/scheduler-fixes.md
+++ b/docs/scheduler-fixes.md
@@ -8,8 +8,8 @@
 **Solution**:
-- Added validation in `fetchAndStoreSessionsForAllCompanies()` to skip companies with example/invalid URLs
-- Removed the invalid company record from the database using `fix_companies.js`
+- Added validation in `fetchAndStoreSessionsForAllCompanies()` to skip companies with example/invalid URLs
+- Removed the invalid company record from the database using `fix_companies.js`
 ### 2. Transcript Fetching Errors
@@ -17,10 +17,10 @@
 **Solution**:
-- Improved error handling in `fetchTranscriptContent()` function
-- Added probabilistic logging (only ~10% of errors logged) to prevent log spam
-- Added timeout (10 seconds) for transcript fetching
-- Made transcript fetching failures non-blocking (sessions are still created without transcript content)
+- Improved error handling in `fetchTranscriptContent()` function
+- Added probabilistic logging (only ~10% of errors logged) to prevent log spam
+- Added timeout (10 seconds) for transcript fetching
+- Made transcript fetching failures non-blocking (sessions are still created without transcript content)
 ### 3. CSV Fetching Errors
@@ -28,8 +28,8 @@
 **Solution**:
-- Added URL validation to skip companies with `example.com` URLs
-- Improved error logging to be more descriptive
+- Added URL validation to skip companies with `example.com` URLs
+- Improved error logging to be more descriptive
 ## Current Status
@@ -42,23 +42,23 @@
 After cleanup, only valid companies remain:
-- **Demo Company** (`790b9233-d369-451f-b92c-f4dceb42b649`)
- - CSV URL: `https://proto.notso.ai/jumbo/chats`
- - Has valid authentication credentials
- - 107 sessions in database
+- **Demo Company** (`790b9233-d369-451f-b92c-f4dceb42b649`)
+ - CSV URL: `https://proto.notso.ai/jumbo/chats`
+ - Has valid authentication credentials
+ - 107 sessions in database
 ## Files Modified
 1. **lib/csvFetcher.js**
-- Added company URL validation
-- Improved transcript fetching error handling
-- Reduced error log verbosity
+- Added company URL validation
+- Improved transcript fetching error handling
+- Reduced error log verbosity
 2. **fix_companies.js** (cleanup script)
-- Removes invalid company records
-- Can be run again if needed
+- Removes invalid company records
+- Can be run again if needed
 ## Monitoring
diff --git a/docs/security-audit-logging.md b/docs/security-audit-logging.md
index bae56c9..d709d4c 100644
--- a/docs/security-audit-logging.md
+++ b/docs/security-audit-logging.md
@@ -12,49 +12,49 @@ The security audit logging system provides comprehensive tracking of security-cr
 The system logs the following event types:
-- **Authentication Events**: Login attempts, password changes, session management
-- **Authorization Events**: Permission checks, access denied events
-- **User Management**: User creation, modification, deletion, invitations
-- **Company Management**: Company suspension, settings changes
-- **Rate Limiting**: Abuse prevention and rate limit violations
-- **CSRF Protection**: Cross-site request forgery protection events
-- **Security Headers**: Security header violations
-- **Password Reset**: Password reset flows and token validation
-- **Platform Admin**: Administrative activities by platform users
-- **Data Privacy**: Data export and privacy-related events
-- **System Configuration**: System setting changes
-- **API Security**: API-related security events
+- **Authentication Events**: Login attempts, password changes, session management
+- **Authorization Events**: Permission checks, access denied events
+- **User Management**: User creation, modification, deletion, invitations
+- **Company Management**: Company suspension, settings changes
+- **Rate Limiting**: Abuse prevention and rate limit violations
+- **CSRF Protection**: Cross-site request forgery protection events
+- **Security Headers**: Security header violations
+- **Password Reset**: Password reset flows and token validation
+- **Platform Admin**: Administrative activities by platform users
+- **Data Privacy**: Data export and privacy-related events
+- **System Configuration**: System setting changes
+- **API Security**: API-related security events ### 2. Structured Logging Each audit log entry includes: -- **Event Type**: Categorizes the security event -- **Action**: Specific action performed -- **Outcome**: Success, failure, blocked, rate limited, or suspicious -- **Severity**: Info, low, medium, high, or critical -- **Context**: User ID, company ID, platform user ID, IP address, user agent -- **Metadata**: Structured additional information -- **Timestamp**: Immutable timestamp for chronological ordering +- **Event Type**: Categorizes the security event +- **Action**: Specific action performed +- **Outcome**: Success, failure, blocked, rate limited, or suspicious +- **Severity**: Info, low, medium, high, or critical +- **Context**: User ID, company ID, platform user ID, IP address, user agent +- **Metadata**: Structured additional information +- **Timestamp**: Immutable timestamp for chronological ordering ### 3. Multi-Tenant Security -- Company-scoped audit logs ensure data isolation -- Platform admin actions tracked separately -- Role-based access controls for audit log viewing +- Company-scoped audit logs ensure data isolation +- Platform admin actions tracked separately +- Role-based access controls for audit log viewing ### 4. Log Retention and Management -- **Configurable Retention Policies**: Different retention periods based on event type and severity -- **Automatic Archival**: Critical and high-severity events archived before deletion -- **Scheduled Cleanup**: Weekly automated retention policy execution -- **Manual Controls**: Admin interface for manual retention execution +- **Configurable Retention Policies**: Different retention periods based on event type and severity +- **Automatic Archival**: Critical and high-severity events archived before deletion +- **Scheduled Cleanup**: Weekly automated retention policy execution +- **Manual Controls**: Admin interface for manual retention execution ### 5. 
Administrative Interface -- **Audit Log Viewer**: Comprehensive filtering and search capabilities -- **Retention Management**: View statistics and execute retention policies -- **Real-time Monitoring**: Track security events as they occur +- **Audit Log Viewer**: Comprehensive filtering and search capabilities +- **Retention Management**: View statistics and execute retention policies +- **Real-time Monitoring**: Track security events as they occur ## Architecture @@ -132,12 +132,12 @@ Administrators can access audit logs through: ### Filtering Options -- Event type (authentication, authorization, etc.) -- Outcome (success, failure, blocked, etc.) -- Severity level (info, low, medium, high, critical) -- Date range -- User ID -- Pagination support +- Event type (authentication, authorization, etc.) +- Outcome (success, failure, blocked, etc.) +- Severity level (info, low, medium, high, critical) +- Date range +- User ID +- Pagination support ## Configuration @@ -170,51 +170,51 @@ AUDIT_LOG_RETENTION_DRY_RUN=false ### Data Protection -- **IP Address Storage**: Client IP addresses stored for geographic analysis -- **Sensitive Data Redaction**: Passwords, tokens, and emails marked as `[REDACTED]` -- **Metadata Sanitization**: Complex objects sanitized to prevent data leakage +- **IP Address Storage**: Client IP addresses stored for geographic analysis +- **Sensitive Data Redaction**: Passwords, tokens, and emails marked as `[REDACTED]` +- **Metadata Sanitization**: Complex objects sanitized to prevent data leakage ### Access Controls -- **Admin-Only Access**: Only users with `ADMIN` role can view audit logs -- **Company Isolation**: Users can only view logs for their own company -- **Platform Separation**: Platform admin logs tracked separately +- **Admin-Only Access**: Only users with `ADMIN` role can view audit logs +- **Company Isolation**: Users can only view logs for their own company +- **Platform Separation**: Platform admin logs tracked separately ### 
Performance -- **Async Logging**: All logging operations are asynchronous to avoid blocking -- **Error Handling**: Logging failures don't affect application functionality -- **Indexed Queries**: Database indexes optimize common query patterns -- **Batch Operations**: Retention policies use batch operations for efficiency +- **Async Logging**: All logging operations are asynchronous to avoid blocking +- **Error Handling**: Logging failures don't affect application functionality +- **Indexed Queries**: Database indexes optimize common query patterns +- **Batch Operations**: Retention policies use batch operations for efficiency ## Compliance Features ### Audit Standards -- **Immutable Records**: Audit logs cannot be modified after creation -- **Chronological Ordering**: Precise timestamps for event sequencing -- **Non-Repudiation**: User actions clearly attributed and timestamped -- **Comprehensive Coverage**: All security-relevant events logged +- **Immutable Records**: Audit logs cannot be modified after creation +- **Chronological Ordering**: Precise timestamps for event sequencing +- **Non-Repudiation**: User actions clearly attributed and timestamped +- **Comprehensive Coverage**: All security-relevant events logged ### Reporting -- **Event Statistics**: Summary statistics by event type, severity, and time period -- **Export Capabilities**: Structured data export for compliance reporting -- **Retention Tracking**: Detailed logging of retention policy execution +- **Event Statistics**: Summary statistics by event type, severity, and time period +- **Export Capabilities**: Structured data export for compliance reporting +- **Retention Tracking**: Detailed logging of retention policy execution ## Monitoring and Alerting ### System Health -- **Scheduler Status**: Monitor retention scheduler health -- **Error Tracking**: Log retention and audit logging errors -- **Performance Metrics**: Track logging performance and database impact +- **Scheduler Status**: Monitor 
retention scheduler health +- **Error Tracking**: Log retention and audit logging errors +- **Performance Metrics**: Track logging performance and database impact ### Security Monitoring -- **Failed Authentication Patterns**: Track repeated login failures -- **Privilege Escalation**: Monitor administrative action patterns -- **Suspicious Activity**: Identify unusual access patterns +- **Failed Authentication Patterns**: Track repeated login failures +- **Privilege Escalation**: Monitor administrative action patterns +- **Suspicious Activity**: Identify unusual access patterns ## Troubleshooting @@ -227,9 +227,9 @@ AUDIT_LOG_RETENTION_DRY_RUN=false ### Debug Information -- Check application logs for scheduler startup messages -- Monitor database query performance for audit log operations -- Review retention policy validation warnings +- Check application logs for scheduler startup messages +- Monitor database query performance for audit log operations +- Review retention policy validation warnings ## Best Practices @@ -251,13 +251,13 @@ AUDIT_LOG_RETENTION_DRY_RUN=false ### Planned Features -- **Real-time Alerting**: Immediate notifications for critical security events -- **Advanced Analytics**: ML-based anomaly detection and pattern recognition -- **Export Formats**: Additional export formats for compliance reporting -- **External Integration**: SIEM and security tool integrations +- **Real-time Alerting**: Immediate notifications for critical security events +- **Advanced Analytics**: ML-based anomaly detection and pattern recognition +- **Export Formats**: Additional export formats for compliance reporting +- **External Integration**: SIEM and security tool integrations ### Performance Optimizations -- **Log Partitioning**: Database partitioning for improved query performance -- **Compression**: Log compression for storage efficiency -- **Streaming**: Real-time log streaming for external systems +- **Log Partitioning**: Database partitioning for improved query 
performance +- **Compression**: Log compression for storage efficiency +- **Streaming**: Real-time log streaming for external systems diff --git a/docs/security-headers.md b/docs/security-headers.md index cd8cde0..5ea44f4 100644 --- a/docs/security-headers.md +++ b/docs/security-headers.md @@ -12,33 +12,33 @@ The application implements multiple layers of HTTP security headers to provide d #### X-Content-Type-Options: nosniff -- **Purpose**: Prevents MIME type sniffing attacks -- **Protection**: Stops browsers from interpreting files as different MIME types than declared -- **Value**: `nosniff` +- **Purpose**: Prevents MIME type sniffing attacks +- **Protection**: Stops browsers from interpreting files as different MIME types than declared +- **Value**: `nosniff` #### X-Frame-Options: DENY -- **Purpose**: Prevents clickjacking attacks -- **Protection**: Blocks embedding the site in frames/iframes -- **Value**: `DENY` +- **Purpose**: Prevents clickjacking attacks +- **Protection**: Blocks embedding the site in frames/iframes +- **Value**: `DENY` #### X-XSS-Protection: 1; mode=block -- **Purpose**: Enables XSS protection in legacy browsers -- **Protection**: Activates built-in XSS filtering (primarily for older browsers) -- **Value**: `1; mode=block` +- **Purpose**: Enables XSS protection in legacy browsers +- **Protection**: Activates built-in XSS filtering (primarily for older browsers) +- **Value**: `1; mode=block` #### Referrer-Policy: strict-origin-when-cross-origin -- **Purpose**: Controls referrer information leakage -- **Protection**: Limits referrer data sent to external sites -- **Value**: `strict-origin-when-cross-origin` +- **Purpose**: Controls referrer information leakage +- **Protection**: Limits referrer data sent to external sites +- **Value**: `strict-origin-when-cross-origin` #### X-DNS-Prefetch-Control: off -- **Purpose**: Prevents DNS rebinding attacks -- **Protection**: Disables DNS prefetching to reduce attack surface -- **Value**: `off` +- 
**Purpose**: Prevents DNS rebinding attacks +- **Protection**: Disables DNS prefetching to reduce attack surface +- **Value**: `off` ### Content Security Policy (CSP) @@ -50,13 +50,13 @@ Content-Security-Policy: default-src 'self'; script-src 'self' 'unsafe-eval' 'un #### Key CSP Directives -- **default-src 'self'**: Restrictive default for all resource types -- **script-src 'self' 'unsafe-eval' 'unsafe-inline'**: Allows Next.js dev tools and React functionality -- **style-src 'self' 'unsafe-inline'**: Enables TailwindCSS and component styles -- **img-src 'self' data: https:**: Allows secure image sources -- **frame-ancestors 'none'**: Prevents embedding (reinforces X-Frame-Options) -- **object-src 'none'**: Blocks dangerous plugins and embeds -- **upgrade-insecure-requests**: Automatically upgrades HTTP to HTTPS +- **default-src 'self'**: Restrictive default for all resource types +- **script-src 'self' 'unsafe-eval' 'unsafe-inline'**: Allows Next.js dev tools and React functionality +- **style-src 'self' 'unsafe-inline'**: Enables TailwindCSS and component styles +- **img-src 'self' data: https:**: Allows secure image sources +- **frame-ancestors 'none'**: Prevents embedding (reinforces X-Frame-Options) +- **object-src 'none'**: Blocks dangerous plugins and embeds +- **upgrade-insecure-requests**: Automatically upgrades HTTP to HTTPS ### Permissions Policy @@ -66,19 +66,19 @@ Controls browser feature access: Permissions-Policy: camera=(), microphone=(), geolocation=(), interest-cohort=(), browsing-topics=() ``` -- **camera=()**: Disables camera access -- **microphone=()**: Disables microphone access -- **geolocation=()**: Disables location tracking -- **interest-cohort=()**: Blocks FLoC (privacy protection) -- **browsing-topics=()**: Blocks Topics API (privacy protection) +- **camera=()**: Disables camera access +- **microphone=()**: Disables microphone access +- **geolocation=()**: Disables location tracking +- **interest-cohort=()**: Blocks FLoC (privacy 
protection) +- **browsing-topics=()**: Blocks Topics API (privacy protection) ### Strict Transport Security (HSTS) **Production Only**: `Strict-Transport-Security: max-age=31536000; includeSubDomains; preload` -- **max-age=31536000**: 1 year HSTS policy -- **includeSubDomains**: Applies to all subdomains -- **preload**: Ready for HSTS preload list inclusion +- **max-age=31536000**: 1 year HSTS policy +- **includeSubDomains**: Applies to all subdomains +- **preload**: Ready for HSTS preload list inclusion ## Configuration @@ -110,8 +110,8 @@ headers: async () => { ### Environment-Specific Behavior -- **Development**: All headers except HSTS -- **Production**: All headers including HSTS +- **Development**: All headers except HSTS +- **Production**: All headers including HSTS ## Testing @@ -121,11 +121,11 @@ Location: `tests/unit/http-security-headers.test.ts` Tests cover: -- Individual header validation -- CSP directive verification -- Permissions Policy validation -- Environment-specific configuration -- Next.js compatibility checks +- Individual header validation +- CSP directive verification +- Permissions Policy validation +- Environment-specific configuration +- Next.js compatibility checks ### Integration Tests @@ -133,9 +133,9 @@ Location: `tests/integration/security-headers-basic.test.ts` Tests cover: -- Next.js configuration validation -- Header generation verification -- Environment-based header differences +- Next.js configuration validation +- Header generation verification +- Environment-based header differences ### Manual Testing @@ -160,11 +160,11 @@ pnpm test:security-headers https://your-domain.com ### Additional Security Benefits -- **Clickjacking Protection**: X-Frame-Options + CSP frame-ancestors -- **MIME Sniffing Prevention**: X-Content-Type-Options -- **Information Leakage Reduction**: Referrer-Policy -- **Privacy Protection**: Permissions Policy restrictions -- **Transport Security**: HSTS enforcement +- **Clickjacking Protection**: 
X-Frame-Options + CSP frame-ancestors +- **MIME Sniffing Prevention**: X-Content-Type-Options +- **Information Leakage Reduction**: Referrer-Policy +- **Privacy Protection**: Permissions Policy restrictions +- **Transport Security**: HSTS enforcement ## Maintenance @@ -176,9 +176,9 @@ pnpm test:security-headers https://your-domain.com ### Monitoring -- Monitor CSP violation reports (when implemented) -- Use online tools like securityheaders.com for validation -- Include security header tests in CI/CD pipeline +- Monitor CSP violation reports (when implemented) +- Use online tools like securityheaders.com for validation +- Include security header tests in CI/CD pipeline ### Future Enhancements @@ -195,18 +195,18 @@ Planned improvements: Headers are configured to be compatible with: -- Next.js 15+ App Router -- React 19 development tools -- TailwindCSS 4 styling system -- Development hot reload functionality +- Next.js 15+ App Router +- React 19 development tools +- TailwindCSS 4 styling system +- Development hot reload functionality ### Browser Support Security headers are supported by: -- All modern browsers (Chrome 60+, Firefox 60+, Safari 12+) -- Graceful degradation for older browsers -- Progressive enhancement approach +- All modern browsers (Chrome 60+, Firefox 60+, Safari 12+) +- Graceful degradation for older browsers +- Progressive enhancement approach ## Troubleshooting @@ -219,13 +219,13 @@ Security headers are supported by: ### Debug Tools -- Browser DevTools Security tab -- CSP Evaluator: -- Security Headers Scanner: +- Browser DevTools Security tab +- CSP Evaluator: +- Security Headers Scanner: ## References -- [OWASP Secure Headers Project](https://owasp.org/www-project-secure-headers/) -- [MDN Security Headers](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers#security) -- [Next.js Security Headers](https://nextjs.org/docs/app/api-reference/config/headers) -- [Content Security Policy 
Reference](https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP) +- [OWASP Secure Headers Project](https://owasp.org/www-project-secure-headers/) +- [MDN Security Headers](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers#security) +- [Next.js Security Headers](https://nextjs.org/docs/app/api-reference/config/headers) +- [Content Security Policy Reference](https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP) diff --git a/docs/security-monitoring.md b/docs/security-monitoring.md index 51fbcba..e49a026 100644 --- a/docs/security-monitoring.md +++ b/docs/security-monitoring.md @@ -9,38 +9,42 @@ The Security Monitoring and Alerting System provides comprehensive real-time sec ### Core Components 1. **Security Monitoring Service** (`lib/securityMonitoring.ts`) - - Real-time event processing - - Anomaly detection algorithms - - Alert generation and management - - Security score calculation - - Threat level assessment + +- Real-time event processing +- Anomaly detection algorithms +- Alert generation and management +- Security score calculation +- Threat level assessment 2. **Enhanced Security Logging** (`enhancedSecurityLog`) - - Integrates with existing audit logger - - Processes events through monitoring system - - Triggers immediate threat detection + +- Integrates with existing audit logger +- Processes events through monitoring system +- Triggers immediate threat detection 3. **API Endpoints** (`app/api/admin/security-monitoring/`) - - `/api/admin/security-monitoring` - Main metrics and configuration - - `/api/admin/security-monitoring/alerts` - Alert management - - `/api/admin/security-monitoring/export` - Data export - - `/api/admin/security-monitoring/threat-analysis` - Threat analysis + +- `/api/admin/security-monitoring` - Main metrics and configuration +- `/api/admin/security-monitoring/alerts` - Alert management +- `/api/admin/security-monitoring/export` - Data export +- `/api/admin/security-monitoring/threat-analysis` - Threat analysis 4. 
**Dashboard UI** (`app/platform/security/page.tsx`) - - Real-time security metrics - - Active alerts management - - Threat analysis visualization - - Configuration management + +- Real-time security metrics +- Active alerts management +- Threat analysis visualization +- Configuration management ## Features ### Real-time Monitoring -- **Authentication Events**: Login attempts, failures, brute force attacks -- **Rate Limiting**: Excessive request patterns, API abuse -- **Admin Activity**: Unusual administrative actions -- **Geographic Anomalies**: Logins from unusual locations -- **Temporal Anomalies**: Activity spikes outside normal patterns +- **Authentication Events**: Login attempts, failures, brute force attacks +- **Rate Limiting**: Excessive request patterns, API abuse +- **Admin Activity**: Unusual administrative actions +- **Geographic Anomalies**: Logins from unusual locations +- **Temporal Anomalies**: Activity spikes outside normal patterns ### Alert Types @@ -69,29 +73,32 @@ enum AlertType { The system implements several anomaly detection algorithms: 1. **Geographic Anomaly Detection** - - Detects logins from unusual countries - - Compares against historical user patterns - - Confidence scoring based on deviation + +- Detects logins from unusual countries +- Compares against historical user patterns +- Confidence scoring based on deviation 2. **Temporal Anomaly Detection** - - Identifies activity spikes during unusual hours - - Compares current activity to historical averages - - Configurable thresholds for different event types + +- Identifies activity spikes during unusual hours +- Compares current activity to historical averages +- Configurable thresholds for different event types 3. 
**Behavioral Anomaly Detection** - - Multiple failed login attempts - - Rapid succession of actions - - Pattern deviation analysis + +- Multiple failed login attempts +- Rapid succession of actions +- Pattern deviation analysis ### Security Scoring The system calculates a real-time security score (0-100) based on: -- Critical security events (weight: 25) -- Active unresolved alerts (weight: 30) -- High-severity threats (weight: 20) -- Overall event volume (weight: 15) -- System stability factors (weight: 10) +- Critical security events (weight: 25) +- Active unresolved alerts (weight: 30) +- High-severity threats (weight: 20) +- Overall event volume (weight: 15) +- System stability factors (weight: 10) ### Threat Levels @@ -255,117 +262,121 @@ await enhancedSecurityLog( ### Security Overview -- Real-time security score (0-100) -- Current threat level indicator -- Active alerts count -- Security events summary +- Real-time security score (0-100) +- Current threat level indicator +- Active alerts count +- Security events summary ### Alert Management -- View active and historical alerts -- Filter by severity and type -- Acknowledge alerts with tracking -- Detailed alert context and metadata +- View active and historical alerts +- Filter by severity and type +- Acknowledge alerts with tracking +- Detailed alert context and metadata ### Threat Analysis -- Geographic distribution of events -- Top threat types and patterns -- User risk scoring -- IP threat level analysis +- Geographic distribution of events +- Top threat types and patterns +- User risk scoring +- IP threat level analysis ### Configuration Management -- Adjust detection thresholds -- Configure alerting channels -- Set data retention policies -- Export capabilities +- Adjust detection thresholds +- Configure alerting channels +- Set data retention policies +- Export capabilities ## Performance Considerations ### Memory Management -- Event buffer limited to 1 hour of data -- Automatic cleanup of old alerts 
(configurable) -- Efficient in-memory storage for real-time analysis +- Event buffer limited to 1 hour of data +- Automatic cleanup of old alerts (configurable) +- Efficient in-memory storage for real-time analysis ### Database Impact -- Leverages existing audit log indexes -- Optimized queries for time-range filtering -- Background processing to avoid blocking operations +- Leverages existing audit log indexes +- Optimized queries for time-range filtering +- Background processing to avoid blocking operations ### Scalability -- Stateless architecture (except for buffering) -- Horizontal scaling support -- Configurable processing intervals +- Stateless architecture (except for buffering) +- Horizontal scaling support +- Configurable processing intervals ## Security Considerations ### Access Control -- Platform admin authentication required -- Role-based access to security endpoints -- Audit logging of all monitoring activities +- Platform admin authentication required +- Role-based access to security endpoints +- Audit logging of all monitoring activities ### Data Privacy -- Sensitive data redaction in logs -- IP address anonymization options -- Configurable data retention periods +- Sensitive data redaction in logs +- IP address anonymization options +- Configurable data retention periods ### Alert Suppression -- Duplicate alert suppression (configurable window) -- Rate limiting on alert generation -- Escalation policies for critical threats +- Duplicate alert suppression (configurable window) +- Rate limiting on alert generation +- Escalation policies for critical threats ## Monitoring and Maintenance ### Health Checks -- Monitor service availability -- Check alert generation pipeline -- Verify data export functionality +- Monitor service availability +- Check alert generation pipeline +- Verify data export functionality ### Regular Tasks -- Review and adjust thresholds quarterly -- Analyze false positive rates -- Update threat detection patterns -- Clean up old 
alert data +- Review and adjust thresholds quarterly +- Analyze false positive rates +- Update threat detection patterns +- Clean up old alert data ### Performance Metrics -- Alert response time -- False positive/negative rates -- System resource usage -- User engagement with alerts +- Alert response time +- False positive/negative rates +- System resource usage +- User engagement with alerts ## Future Enhancements ### Planned Features 1. **Machine Learning Integration** - - Behavioral pattern recognition - - Adaptive threshold adjustment - - Predictive threat modeling + +- Behavioral pattern recognition +- Adaptive threshold adjustment +- Predictive threat modeling 2. **Advanced Analytics** - - Threat intelligence integration - - Cross-correlation analysis - - Risk trend analysis + +- Threat intelligence integration +- Cross-correlation analysis +- Risk trend analysis 3. **Integration Enhancements** - - SIEM system connectors - - Webhook customization - - Mobile app notifications + +- SIEM system connectors +- Webhook customization +- Mobile app notifications 4. 
**Automated Response** - - IP blocking automation - - Account suspension workflows - - Incident response orchestration + +- IP blocking automation +- Account suspension workflows +- Incident response orchestration ## Troubleshooting @@ -373,27 +384,27 @@ await enhancedSecurityLog( **High False Positive Rate** -- Review and adjust detection thresholds -- Analyze user behavior patterns -- Consider geographical variations +- Review and adjust detection thresholds +- Analyze user behavior patterns +- Consider geographical variations **Missing Alerts** -- Check service configuration -- Verify audit log integration -- Review threshold settings +- Check service configuration +- Verify audit log integration +- Review threshold settings **Performance Issues** -- Monitor memory usage -- Adjust cleanup intervals -- Optimize database queries +- Monitor memory usage +- Adjust cleanup intervals +- Optimize database queries **Export Failures** -- Check file permissions -- Verify date range validity -- Monitor server resources +- Check file permissions +- Verify date range validity +- Monitor server resources ### Debugging @@ -419,24 +430,24 @@ console.log("Active alerts:", alerts.length); ### Unit Tests -- Alert generation logic -- Anomaly detection algorithms -- Configuration management -- Data export functionality +- Alert generation logic +- Anomaly detection algorithms +- Configuration management +- Data export functionality ### Integration Tests -- API endpoint security -- Database integration -- Real-time event processing -- Alert acknowledgment flow +- API endpoint security +- Database integration +- Real-time event processing +- Alert acknowledgment flow ### Load Testing -- High-volume event processing -- Concurrent alert generation -- Database performance under load -- Memory usage patterns +- High-volume event processing +- Concurrent alert generation +- Database performance under load +- Memory usage patterns Run tests: diff --git a/docs/session-processing.md 
b/docs/session-processing.md index abb1a31..ccfe2ee 100644 --- a/docs/session-processing.md +++ b/docs/session-processing.md @@ -15,22 +15,22 @@ The system now includes an automated process for analyzing chat session transcri ### Session Fetching -- The system fetches session data from configured CSV URLs for each company -- Unlike the previous implementation, it now only adds sessions that don't already exist in the database -- This prevents duplicate sessions and allows for incremental updates +- The system fetches session data from configured CSV URLs for each company +- Unlike the previous implementation, it now only adds sessions that don't already exist in the database +- This prevents duplicate sessions and allows for incremental updates ### Transcript Processing -- For sessions with transcript content that haven't been processed yet, the system calls OpenAI's API -- The API analyzes the transcript and extracts the following information: - - Primary language used (ISO 639-1 code) - - Number of messages sent by the user - - Overall sentiment (positive, neutral, negative) - - Whether the conversation was escalated - - Whether HR contact was mentioned or provided - - Best-fitting category for the conversation - - Up to 5 paraphrased questions asked by the user - - A brief summary of the conversation +- For sessions with transcript content that haven't been processed yet, the system calls OpenAI's API +- The API analyzes the transcript and extracts the following information: + - Primary language used (ISO 639-1 code) + - Number of messages sent by the user + - Overall sentiment (positive, neutral, negative) + - Whether the conversation was escalated + - Whether HR contact was mentioned or provided + - Best-fitting category for the conversation + - Up to 5 paraphrased questions asked by the user + - A brief summary of the conversation ### Scheduling @@ -43,10 +43,10 @@ The system includes two schedulers: The Session model has been updated with new fields to store 
the processed data: -- `processed`: Boolean flag indicating whether the session has been processed -- `sentimentCategory`: String value ("positive", "neutral", "negative") from OpenAI -- `questions`: JSON array of questions asked by the user -- `summary`: Brief summary of the conversation +- `processed`: Boolean flag indicating whether the session has been processed +- `sentimentCategory`: String value ("positive", "neutral", "negative") from OpenAI +- `questions`: JSON array of questions asked by the user +- `summary`: Brief summary of the conversation ## Configuration @@ -62,9 +62,9 @@ OPENAI_API_KEY=your_api_key_here To run the application with schedulers enabled: -- Development: `npm run dev` -- Development (with schedulers disabled): `npm run dev:no-schedulers` -- Production: `npm run start` +- Development: `npm run dev` +- Development (with schedulers disabled): `npm run dev:no-schedulers` +- Production: `npm run start` Note: These commands will start a custom Next.js server with the schedulers enabled. You'll need to have an OpenAI API key set in your `.env.local` file for the session processing to work. @@ -82,5 +82,5 @@ This will process all unprocessed sessions that have transcript content. The processing logic can be customized by modifying: -- `lib/processingScheduler.ts`: Contains the OpenAI processing logic -- `scripts/process_sessions.ts`: Standalone script for manual processing +- `lib/processingScheduler.ts`: Contains the OpenAI processing logic +- `scripts/process_sessions.ts`: Standalone script for manual processing
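
The fields listed under "Transcript Processing" could be checked before they are stored on the Session model. The following is a minimal sketch of such a check, not the code in `lib/processingScheduler.ts`. Only `sentimentCategory`, `questions`, and `summary` are field names taken from the documentation above; `TranscriptAnalysis`, `parseTranscriptAnalysis`, and the remaining field names are hypothetical, and the OpenAI call itself is not shown.

```typescript
// Hypothetical shape of the analysis result. Only `sentimentCategory`,
// `questions`, and `summary` are names taken from the documentation;
// every other name here is an assumption made for illustration.
type SentimentCategory = "positive" | "neutral" | "negative";

interface TranscriptAnalysis {
  language: string; // assumed name: ISO 639-1 code such as "en"
  messagesSent: number; // assumed name: messages sent by the user
  sentimentCategory: SentimentCategory;
  escalated: boolean; // assumed name
  hrContactMentioned: boolean; // assumed name
  category: string; // assumed name
  questions: string[]; // up to 5 paraphrased questions
  summary: string;
}

const SENTIMENTS: readonly string[] = ["positive", "neutral", "negative"];

// Parses a JSON string and throws if it does not match the shape above.
function parseTranscriptAnalysis(raw: string): TranscriptAnalysis {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("analysis must be a JSON object");
  }
  const r = data as Record<string, unknown>;
  const language = r["language"];
  const messagesSent = r["messagesSent"];
  const sentimentCategory = r["sentimentCategory"];
  const escalated = r["escalated"];
  const hrContactMentioned = r["hrContactMentioned"];
  const category = r["category"];
  const questions = r["questions"];
  const summary = r["summary"];

  if (typeof language !== "string" || !/^[a-z]{2}$/.test(language)) {
    throw new Error("language must be a two-letter ISO 639-1 code");
  }
  if (typeof messagesSent !== "number" || messagesSent % 1 !== 0 || messagesSent < 0) {
    throw new Error("messagesSent must be a non-negative integer");
  }
  if (typeof sentimentCategory !== "string" || SENTIMENTS.indexOf(sentimentCategory) === -1) {
    throw new Error("sentimentCategory must be positive, neutral, or negative");
  }
  if (typeof escalated !== "boolean" || typeof hrContactMentioned !== "boolean") {
    throw new Error("escalated and hrContactMentioned must be booleans");
  }
  if (typeof category !== "string" || typeof summary !== "string") {
    throw new Error("category and summary must be strings");
  }
  if (
    !Array.isArray(questions) ||
    questions.length > 5 ||
    !questions.every((q) => typeof q === "string")
  ) {
    throw new Error("questions must be an array of at most 5 strings");
  }

  return {
    language,
    messagesSent,
    sentimentCategory: sentimentCategory as SentimentCategory,
    escalated,
    hrContactMentioned,
    category,
    questions: questions as string[],
    summary,
  };
}
```

Rejecting a malformed response rather than storing it would leave the session's `processed` flag unset, so a later scheduler run could retry it; whether the actual scheduler behaves this way is not stated in the documentation.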