mirror of
https://github.com/kjanat/livegraphs-django.git
synced 2026-01-16 09:22:09 +01:00
Implement data integration tasks with Celery, including periodic fetching and manual refresh of chat data; add utility functions for data processing and transcript handling; create views and URLs for manual data refresh; establish Redis and Celery configuration; enhance error handling and logging; introduce scripts for data cleanup and fixing dashboard data; update documentation for Redis and Celery setup and troubleshooting.
172
docs/CELERY_REDIS.md
Normal file
@ -0,0 +1,172 @@
# Redis and Celery Configuration

This document explains how to set up and use Redis and Celery for background task processing in the LiveGraphs application.

## Overview

The data integration module uses Celery to handle:

- Periodic data fetching from external APIs
- Processing and storing CSV data
- Downloading and parsing transcript files
- Manual data refresh triggered by users

## Installation

### Redis (Recommended)

Redis is the recommended message broker for Celery due to its performance and reliability.

#### Ubuntu/Debian

```bash
sudo apt update
sudo apt install redis-server
sudo systemctl start redis-server
sudo systemctl enable redis-server

# Verify that Redis is running
redis-cli ping # Should output PONG
```

After installation, check that Redis is properly configured:

1. Open the Redis configuration file:

   ```bash
   sudo nano /etc/redis/redis.conf
   ```

2. Ensure the following settings:

   ```bash
   # For development (localhost only)
   bind 127.0.0.1

   # For production (accept connections from specific IP)
   # bind 127.0.0.1 your.server.ip.address

   # Protected mode (recommended)
   protected-mode yes

   # Port
   port 6379
   ```

3. Restart Redis after any changes:

   ```bash
   sudo systemctl restart redis-server
   ```

#### macOS

```bash
brew install redis
brew services start redis
```

#### Windows

Download and install from [microsoftarchive/redis](https://github.com/microsoftarchive/redis/releases). Note that this repository is archived and no longer maintained, so running Redis under WSL or Docker may be a better option.

### SQLite Fallback

If Redis is not available, the application automatically falls back to SQLite for Celery tasks. This is fine for development but is not recommended for production.
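
The fallback logic itself is not shown in this document. As a minimal sketch of how such a fallback could be wired up, the helper below probes the Redis port and otherwise returns a SQLite URL for Celery's SQLAlchemy broker transport. The function name `choose_broker_url` and the file name `celery.sqlite` are assumptions made for illustration, not the project's actual code.

```python
import socket


def choose_broker_url(host="localhost", port=6379, db=0, timeout=0.5):
    """Return a Redis broker URL if the port accepts TCP connections,
    otherwise a SQLite URL for Celery's SQLAlchemy transport."""
    try:
        # Only checks that something is listening; it does not authenticate
        # or confirm that the listener is really Redis.
        with socket.create_connection((host, port), timeout=timeout):
            return f"redis://{host}:{port}/{db}"
    except OSError:
        # Hypothetical file name. The "sqla+" transport requires the
        # `sqlalchemy` package to be installed.
        return "sqla+sqlite:///celery.sqlite"
```

In a settings module this might be used as `CELERY_BROKER_URL = choose_broker_url()`. Probing once at import time keeps the sketch short, but it also means a Redis outage after startup is not detected.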

## Configuration

### Environment Variables

Set these environment variables in your `.env` file or deployment environment:

```env
# Redis Configuration
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_DB=0
CELERY_BROKER_URL=redis://localhost:6379/0
CELERY_RESULT_BACKEND=redis://localhost:6379/0

# Task Scheduling
CHAT_DATA_FETCH_INTERVAL=3600 # In seconds (1 hour)
FETCH_DATA_TIMEOUT=300 # In seconds (5 minutes)
```
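
As a rough sketch of how these variables could be turned into Celery settings, the helper below reads them from a mapping and builds the broker URL and a beat schedule entry. The function name `celery_settings_from_env`, the schedule entry name `fetch-chat-data`, and the default values are assumptions. The task path assumes the `periodic_fetch_chat_data` task in `data_integration.tasks` is registered under its default name.

```python
import os


def _int(env, key, default):
    """Read an integer, tolerating a trailing '# comment' in the value."""
    raw = str(env.get(key, default))
    return int(raw.split("#", 1)[0].strip())


def celery_settings_from_env(env=os.environ):
    """Build Celery-related settings from the documented variables."""
    host = env.get("REDIS_HOST", "localhost")
    redis_url = f"redis://{host}:{_int(env, 'REDIS_PORT', 6379)}/{_int(env, 'REDIS_DB', 0)}"
    return {
        # Explicit URLs win; otherwise derive them from host/port/db.
        "CELERY_BROKER_URL": env.get("CELERY_BROKER_URL", redis_url),
        "CELERY_RESULT_BACKEND": env.get("CELERY_RESULT_BACKEND", redis_url),
        "FETCH_DATA_TIMEOUT": _int(env, "FETCH_DATA_TIMEOUT", 300),
        "CELERY_BEAT_SCHEDULE": {
            # Hypothetical entry name; task path assumed from the module layout.
            "fetch-chat-data": {
                "task": "data_integration.tasks.periodic_fetch_chat_data",
                # A plain number is interpreted by Celery beat as seconds.
                "schedule": _int(env, "CHAT_DATA_FETCH_INTERVAL", 3600),
            },
        },
    }
```

Stripping the inline comment matters because not every `.env` loader removes `# In seconds ...` from the value before it reaches `int()`.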

### Testing Redis Connection

To test if Redis is properly configured:

```bash
cd dashboard_project
python manage.py test_redis
```
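
The implementation of `test_redis` is not shown here. As an illustration of the kind of check it might perform, the sketch below sends an inline `PING` over a plain socket and looks for the `+PONG` reply, which is what `redis-cli ping` does at the protocol level. The function name `redis_ping` is hypothetical, and a server that requires a password will answer with an error, which this sketch reports as `False`.

```python
import socket


def redis_ping(host="localhost", port=6379, timeout=2.0):
    """Return True if the server answers PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            # Redis accepts simple inline commands terminated by CRLF.
            conn.sendall(b"PING\r\n")
            reply = conn.recv(64)
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
    return reply.startswith(b"+PONG")
```

Using only the standard library keeps the check usable even when the `redis` Python package is not installed.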

### Testing Celery

To test if Celery is working correctly:

```bash
# Start a Celery worker in one terminal
make celery

# In another terminal, run the test task
cd dashboard_project
python manage.py test_celery
```

## Running with Docker

The included `docker-compose.yml` file sets up Redis, Celery worker, and Celery beat for you:

```bash
docker-compose up -d
```

## Running in Development

Development requires multiple terminal windows:

1. **Django Development Server**:

   ```bash
   make run
   ```

2. **Redis Server** (if needed):

   ```bash
   make run-redis
   ```

3. **Celery Worker**:

   ```bash
   make celery
   ```

4. **Celery Beat** (for scheduled tasks):

   ```bash
   make celery-beat
   ```

Or use the combined command:

```bash
make run-all
```

## Common Issues

### Redis Connection Failures

If you see connection errors:

1. Check that Redis is running: `redis-cli ping` should return `PONG`
2. Verify firewall settings are not blocking port 6379
3. Check Redis binding in `/etc/redis/redis.conf` (should be `bind 127.0.0.1` for local dev)

### Celery Workers Not Processing Tasks

1. Ensure the worker is running with the correct app name: `celery -A dashboard_project worker`
2. Check the Celery logs for errors
3. Verify broker URL settings in both code and environment variables

142
docs/TROUBLESHOOTING.md
Normal file
@ -0,0 +1,142 @@
# Redis and Celery Troubleshooting Guide

This guide provides detailed steps to diagnose and fix issues with Redis and Celery in the LiveGraphs project.

## Diagnosing Redis Connection Issues

### Check if Redis is Running

```bash
# Check Redis server status
sudo systemctl status redis-server

# Try to ping Redis
redis-cli ping # Should return PONG
```

### Test Redis Connectivity

Use our built-in test tool:

```bash
cd dashboard_project
python manage.py test_redis
```

If this fails, check the following:

1. Redis might not be running. Start it with:

   ```bash
   sudo systemctl start redis-server
   ```

2. Connection credentials may be incorrect. Check your environment variables:

   ```bash
   echo $REDIS_URL
   echo $CELERY_BROKER_URL
   echo $CELERY_RESULT_BACKEND
   ```

3. Redis might be binding only to a specific interface. Check `/etc/redis/redis.conf`:

   ```bash
   grep "bind" /etc/redis/redis.conf
   ```

4. Firewall rules might be blocking Redis. If you're connecting remotely:

   ```bash
   sudo ufw status # Check if firewall is enabled
   sudo ufw allow 6379/tcp # Allow Redis port if needed
   ```

## Fixing CSV Data Processing Issues

If you see the error `zip() argument 2 is shorter than argument 1`, a data row has fewer fields than the expected headers. We've implemented a fix that:

1. Pads shorter rows with empty strings
2. Uses more flexible date format parsing
3. Provides better error handling

After these changes, rows with missing trailing fields or varying date formats should be processed without raising this error.
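
The project's actual fix is not reproduced here. The sketch below illustrates the padding and flexible date parsing described above, under stated assumptions: the names `rows_as_dicts` and `parse_date` and the list of date formats are hypothetical.

```python
import csv
import io
from datetime import datetime

# Assumed formats for illustration; the project's real list may differ.
DATE_FORMATS = ("%Y-%m-%d %H:%M:%S", "%d-%m-%Y %H:%M:%S", "%Y-%m-%dT%H:%M:%S")


def parse_date(value):
    """Try each known format in turn; return None instead of raising."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue
    return None


def rows_as_dicts(text, headers):
    """Yield one dict per CSV row, padding short rows with empty strings."""
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        # Multiplying a list by a negative number gives [], so rows that are
        # already long enough are left unchanged.
        padded = row + [""] * (len(headers) - len(row))
        # Plain zip() stops at the shorter input, so extra fields are dropped.
        yield dict(zip(headers, padded))
```

Padding before calling `zip()` means every row produces a dict with the full set of keys, so downstream code can treat a missing field and an empty field the same way.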

## Testing Celery Tasks

To verify that your Celery configuration is working:

```bash
# Start a Celery worker in one terminal
cd dashboard_project
celery -A dashboard_project worker --loglevel=info

# In another terminal, run the test task
cd dashboard_project
python manage.py test_celery
```

If the task does not complete:

1. Look for errors in the Celery worker terminal.
2. Verify that the broker URL is the same in both terminals:

   ```bash
   echo $CELERY_BROKER_URL
   ```

3. Check that Redis is reachable from both terminals:

   ```bash
   redis-cli ping
   ```

## Checking Scheduled Tasks

To verify that scheduled tasks are configured correctly:

```bash
# List tasks that running workers currently hold with an ETA or countdown
cd dashboard_project
celery -A dashboard_project inspect scheduled
```

This requires at least one running worker. Periodic tasks defined in `CELERY_BEAT_SCHEDULE` are dispatched by Celery Beat and appear in the Beat and worker logs when they run.

Common issues with scheduled tasks:

1. **Celery Beat not running**: Start it with:

   ```bash
   cd dashboard_project
   celery -A dashboard_project beat
   ```

2. **Task registered but not running**: Check the worker logs for errors.

3. **Wrong schedule**: Check `CHAT_DATA_FETCH_INTERVAL` and the `CELERY_BEAT_SCHEDULE` entry in `settings.py`.

## Data Source Configuration

If data sources aren't being processed correctly:

1. Verify active data sources exist:

   ```bash
   cd dashboard_project
   python manage.py shell -c "from data_integration.models import ExternalDataSource; print(ExternalDataSource.objects.filter(is_active=True).count())"
   ```

2. Create a default data source if needed:

   ```bash
   cd dashboard_project
   python manage.py create_default_datasource
   ```

3. Check source URLs and credentials in the admin interface or environment variables.

## Manually Triggering Data Refresh

To manually trigger a data refresh for testing:

```bash
cd dashboard_project
python manage.py shell -c "from data_integration.tasks import periodic_fetch_chat_data; periodic_fetch_chat_data()"
```

This runs the task synchronously in the shell process, bypassing the broker and the Celery worker, which is useful for debugging.