Implement data integration tasks with Celery, including periodic fetching and manual refresh of chat data; add utility functions for data processing and transcript handling; create views and URLs for manual data refresh; establish Redis and Celery configuration; enhance error handling and logging; introduce scripts for data cleanup and fixing dashboard data; update documentation for Redis and Celery setup and troubleshooting.

This commit is contained in:
2025-05-18 13:33:11 +00:00
parent e8f2d2adc2
commit 8bbbb109bd
63 changed files with 4601 additions and 164 deletions

docs/CELERY_REDIS.md Normal file

@ -0,0 +1,172 @@
# Redis and Celery Configuration
This document explains how to set up and use Redis and Celery for background task processing in the LiveGraphs application.
## Overview
The data integration module uses Celery to handle:
- Periodic data fetching from external APIs
- Processing and storing CSV data
- Downloading and parsing transcript files
- Manual data refresh triggered by users
## Installation
### Redis (Recommended)
Redis is the recommended message broker for Celery because of its performance and reliability.
#### Ubuntu/Debian
```bash
sudo apt update
sudo apt install redis-server
sudo systemctl start redis-server
sudo systemctl enable redis-server
# Verify that Redis is running
redis-cli ping # Should output PONG
```
After installation, check if Redis is properly configured:
1. Open Redis configuration file:
```bash
sudo nano /etc/redis/redis.conf
```
2. Ensure the following settings:
```bash
# For development (localhost only)
bind 127.0.0.1
# For production (accept connections from specific IP)
# bind 127.0.0.1 your.server.ip.address
# Protected mode (recommended)
protected-mode yes
# Port
port 6379
```
3. Restart Redis after any changes:
```bash
sudo systemctl restart redis-server
```
#### macOS
```bash
brew install redis
brew services start redis
```
#### Windows
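Redis does not officially support Windows. The archived, unmaintained port at [microsoftarchive/redis](https://github.com/microsoftarchive/redis/releases) still installs, but running Redis under WSL 2 or Docker is the safer option.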
### SQLite Fallback
If Redis is not available, the application will automatically fall back to using SQLite for Celery tasks. This works well for development but is not recommended for production.
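A minimal sketch of how such a fallback could be decided at startup, assuming a plain TCP probe of the Redis port and Celery's SQLAlchemy transport for the SQLite broker. The function names, the SQLite file names, and the probe itself are illustrative assumptions, not the project's actual code. The SQLite URLs also require the `sqlalchemy` package to be installed.

```python
import socket


def redis_reachable(host="localhost", port=6379, timeout=1.0):
    """Return True if a TCP connection to the Redis port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_celery_urls(host="localhost", port=6379, db=0, probe=redis_reachable):
    """Pick broker/result URLs: Redis when reachable, SQLite otherwise."""
    if probe(host, port):
        url = f"redis://{host}:{port}/{db}"
        return {"broker_url": url, "result_backend": url}
    # Hypothetical file names. Celery's SQLAlchemy broker transport uses the
    # "sqla+" prefix; the database result backend uses the "db+" prefix.
    return {
        "broker_url": "sqla+sqlite:///celery.sqlite",
        "result_backend": "db+sqlite:///results.sqlite",
    }
```

Passing the probe as a parameter keeps the decision testable without a running Redis server.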
## Configuration
### Environment Variables
Set these environment variables in your `.env` file or deployment environment:
```env
# Redis Configuration
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_DB=0
CELERY_BROKER_URL=redis://localhost:6379/0
CELERY_RESULT_BACKEND=redis://localhost:6379/0
# Task Scheduling
CHAT_DATA_FETCH_INTERVAL=3600 # In seconds (1 hour)
FETCH_DATA_TIMEOUT=300 # In seconds (5 minutes)
```
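As a rough sketch, these variables could be read in `settings.py` along the following lines. The helper name `celery_settings_from_env` and the defaults are assumptions for illustration; the project's settings module may be organised differently.

```python
import os


def celery_settings_from_env(env=None):
    """Build Celery-related settings from environment variables."""
    env = os.environ if env is None else env
    host = env.get("REDIS_HOST", "localhost")
    port = int(env.get("REDIS_PORT", "6379"))
    db = int(env.get("REDIS_DB", "0"))
    # Explicit URLs win; otherwise derive them from host, port, and db.
    default_url = f"redis://{host}:{port}/{db}"
    return {
        "CELERY_BROKER_URL": env.get("CELERY_BROKER_URL", default_url),
        "CELERY_RESULT_BACKEND": env.get("CELERY_RESULT_BACKEND", default_url),
        "CHAT_DATA_FETCH_INTERVAL": int(env.get("CHAT_DATA_FETCH_INTERVAL", "3600")),
        "FETCH_DATA_TIMEOUT": int(env.get("FETCH_DATA_TIMEOUT", "300")),
    }
```

Deriving the URLs from `REDIS_HOST`, `REDIS_PORT`, and `REDIS_DB` avoids the two sets of values drifting apart when only one is overridden.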
### Testing Redis Connection
To test if Redis is properly configured:
```bash
cd dashboard_project
python manage.py test_redis
```
### Testing Celery
To test if Celery is working correctly:
```bash
# Start a Celery worker in one terminal
make celery
# In another terminal, run the test task
cd dashboard_project
python manage.py test_celery
```
## Running with Docker
The included `docker-compose.yml` file sets up Redis, a Celery worker, and Celery beat for you:
```bash
docker-compose up -d
```
## Running in Development
Development requires multiple terminal windows:
1. **Django Development Server**:
```bash
make run
```
2. **Redis Server** (if needed):
```bash
make run-redis
```
3. **Celery Worker**:
```bash
make celery
```
4. **Celery Beat** (for scheduled tasks):
```bash
make celery-beat
```
Or use the combined command:
```bash
make run-all
```
## Common Issues
### Redis Connection Failures
If you see connection errors:
1. Check that Redis is running: `redis-cli ping` should return `PONG`
2. Verify firewall settings are not blocking port 6379
3. Check Redis binding in `/etc/redis/redis.conf` (should be `bind 127.0.0.1` for local dev)
### Celery Workers Not Processing Tasks
1. Ensure the worker is running with the correct app name: `celery -A dashboard_project worker`
2. Check the Celery logs for errors
3. Verify broker URL settings in both code and environment variables

docs/TROUBLESHOOTING.md Normal file

@ -0,0 +1,142 @@
# Redis and Celery Troubleshooting Guide
This guide provides detailed steps to diagnose and fix issues with Redis and Celery in the LiveGraphs project.
## Diagnosing Redis Connection Issues
### Check if Redis is Running
```bash
# Check Redis server status
sudo systemctl status redis-server
# Try to ping Redis
redis-cli ping # Should return PONG
```
### Test Redis Connectivity
Use our built-in test tool:
```bash
cd dashboard_project
python manage.py test_redis
```
If this fails, check the following:
1. Redis might not be running. Start it with:
```bash
sudo systemctl start redis-server
```
2. Connection credentials may be incorrect. Check your environment variables:
```bash
echo "$REDIS_HOST" "$REDIS_PORT" "$REDIS_DB"
echo $CELERY_BROKER_URL
echo $CELERY_RESULT_BACKEND
```
3. Redis might be binding only to a specific interface. Check `/etc/redis/redis.conf`:
```bash
grep "bind" /etc/redis/redis.conf
```
4. Firewall rules might be blocking Redis. If you're connecting remotely:
```bash
sudo ufw status # Check if firewall is enabled
sudo ufw allow 6379/tcp # Allow Redis port if needed
```
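When `redis-cli` is not installed on the machine you are debugging from, the same check can be approximated from Python. The sketch below sends Redis's inline `PING` command over a raw socket and looks for the `+PONG` reply. It assumes no password is configured (an authenticated server answers with an error instead), and `redis_ping` is a hypothetical helper, not part of the project.

```python
import socket


def redis_ping(host="localhost", port=6379, timeout=2.0):
    """Return True if the server at host:port answers PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")  # inline command form accepted by Redis
            reply = sock.recv(64)
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
    return reply.startswith(b"+PONG")
```

Calling `redis_ping("your.server.ip.address")` from the application host helps separate a stopped server from a bind or firewall problem: a local `True` combined with a remote `False` points at the network path rather than at Redis itself.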
## Fixing CSV Data Processing Issues
The error `zip() argument 2 is shorter than argument 1` is raised by `zip(..., strict=True)` when a data row has fewer columns than the header row. We've implemented a fix that:
1. Pads shorter rows with empty strings
2. Uses more flexible date format parsing
3. Provides better error handling
After these changes, rows with missing trailing columns or with dates in one of the supported formats should be processed correctly.
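The padding and date-parsing steps can be sketched as follows. This is an illustrative version, not the project's actual implementation: the function names and the list of accepted date formats are assumptions.

```python
from datetime import datetime

# Hypothetical set of accepted formats; extend it to match your data source.
DATE_FORMATS = (
    "%Y-%m-%d %H:%M:%S",
    "%d.%m.%Y %H:%M:%S",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d",
)


def row_to_dict(headers, row):
    """Map a CSV row onto headers, padding missing trailing cells with ''."""
    padded = list(row) + [""] * (len(headers) - len(row))
    # zip() without strict=True silently drops cells beyond the last header.
    return dict(zip(headers, padded))


def parse_date(value):
    """Try each known format in turn; return None if none of them match."""
    value = value.strip()
    if not value:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    return None
```

Returning `None` instead of raising lets the caller decide whether to skip the row, log it, or store the record without a timestamp.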
## Testing Celery Tasks
To verify if your Celery configuration is working:
```bash
# Start a Celery worker in one terminal
cd dashboard_project
celery -A dashboard_project worker --loglevel=info
# In another terminal, run the test task
cd dashboard_project
python manage.py test_celery
```
If the task doesn't complete, work through the following:
1. Look for errors in the Celery worker terminal
2. Verify broker URL settings match in both terminals:
```bash
echo $CELERY_BROKER_URL
```
3. Check if Redis is accessible from both terminals:
```bash
redis-cli ping
```
## Checking Scheduled Tasks
To verify if scheduled tasks are configured correctly:
```bash
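# List tasks with an ETA or countdown currently held by running workers
# (periodic entries from CELERY_BEAT_SCHEDULE are not listed here)
cd dashboard_project
celery -A dashboard_project inspect scheduled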
```
Common issues with scheduled tasks:
1. **Celery Beat not running**: Start it with:
```bash
cd dashboard_project
celery -A dashboard_project beat
```
2. **Task registered but not running**: Check worker logs for any errors
3. **Wrong schedule**: Check `CHAT_DATA_FETCH_INTERVAL` and the `CELERY_BEAT_SCHEDULE` entry in `settings.py`
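For reference, a beat schedule entry driven by the fetch interval might look like the sketch below. The entry name `fetch-chat-data` and the helper function are illustrative. The task path is inferred from the `data_integration.tasks.periodic_fetch_chat_data` import used elsewhere in this guide and should be checked against the task names your worker prints at startup.

```python
def build_beat_schedule(interval_seconds=3600):
    """Return a Celery beat schedule with one periodic fetch entry."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return {
        # Hypothetical entry name; any unique string works as the key.
        "fetch-chat-data": {
            "task": "data_integration.tasks.periodic_fetch_chat_data",
            # Celery accepts a number of seconds (or a timedelta) here.
            "schedule": float(interval_seconds),
        }
    }


CELERY_BEAT_SCHEDULE = build_beat_schedule(3600)
```

If the `task` string does not match a registered task name exactly, beat will still send the message on schedule, but the worker will reject it as an unregistered task.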
## Data Source Configuration
If data sources aren't being processed correctly:
1. Verify active data sources exist:
```bash
cd dashboard_project
python manage.py shell -c "from data_integration.models import ExternalDataSource; print(ExternalDataSource.objects.filter(is_active=True).count())"
```
2. Create a default data source if needed:
```bash
cd dashboard_project
python manage.py create_default_datasource
```
3. Check source URLs and credentials in the admin interface or environment variables.
## Manually Triggering Data Refresh
To manually trigger a data refresh for testing:
```bash
cd dashboard_project
python manage.py shell -c "from data_integration.tasks import periodic_fetch_chat_data; periodic_fetch_chat_data()"
```
This runs the task function synchronously in the shell process, bypassing the broker and the workers, which is useful for debugging.