mirror of
https://github.com/kjanat/livegraphs-django.git
synced 2026-01-16 09:02:11 +01:00
Enhance data integration and transcript parsing
- Improved date parsing in fetch_and_store_chat_data to support multiple formats and added error logging for unparseable dates.
- Enhanced parse_and_store_transcript_messages to handle empty transcripts and expanded message pattern recognition for both User and Assistant.
- Implemented intelligent splitting of transcripts based on detected patterns and timestamps, with fallback mechanisms for unrecognized formats.
- Updated documentation for Celery and Redis setup, troubleshooting, and project structure.
- Added markdown linting configuration and scripts for code formatting.
- Updated Nginx configuration to change the web server port.
- Added xlsxwriter dependency for Excel file handling in project requirements.
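
The transcript handling described in the commit message is not visible in the hunks on this page, so the following is only a minimal sketch of the general idea: split on speaker labels, return nothing for an empty transcript, and fall back to a single message when no label is recognized. The function name `split_transcript`, the `SPEAKER_RE` pattern, and the `"Unknown"` label are assumptions made for illustration, and timestamp-based splitting is left out.

```python
import re

# Hypothetical speaker labels; the real patterns used by
# parse_and_store_transcript_messages are not shown in this commit view.
SPEAKER_RE = re.compile(r"^(User|Assistant)\s*:\s*", re.IGNORECASE | re.MULTILINE)


def split_transcript(text):
    """Split a transcript into (speaker, message) pairs."""
    text = (text or "").strip()
    if not text:
        return []  # empty transcript: nothing to store

    matches = list(SPEAKER_RE.finditer(text))
    if not matches:
        # Fallback for unrecognized formats: keep the whole text as one message.
        return [("Unknown", text)]

    messages = []
    for i, match in enumerate(matches):
        # A message runs until the next speaker label (or the end of the text).
        # Text before the first label is ignored in this sketch.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[match.end():end].strip()
        if body:
            messages.append((match.group(1).capitalize(), body))
    return messages
```

Lines without a label are attached to the preceding speaker, which keeps multi-line replies together.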
@@ -6,10 +6,10 @@ This document explains how to set up and use Redis and Celery for background tas

The data integration module uses Celery to handle:

- Periodic data fetching from external APIs
- Processing and storing CSV data
- Downloading and parsing transcript files
- Manual data refresh triggered by users

## Installation

@@ -31,32 +31,33 @@ redis-cli ping # Should output PONG

After installation, check if Redis is properly configured:

1. Open Redis configuration file:

    ```bash
    sudo nano /etc/redis/redis.conf
    ```

2. Ensure the following settings:

    ```bash
    # For development (localhost only)
    bind 127.0.0.1

    # For production (accept connections from specific IP)
    # bind 127.0.0.1 your.server.ip.address

    # Protected mode (recommended)
    protected-mode yes

    # Port
    port 6379
    ```

3. Restart Redis after any changes:

    ```bash
    sudo systemctl restart redis-server
    ```

#### macOS

@@ -79,7 +80,7 @@ If Redis is not available, the application will automatically fall back to using

Set these environment variables in your `.env` file or deployment environment:

```sh
# Redis Configuration
REDIS_HOST=localhost
REDIS_PORT=6379
```

@@ -126,28 +127,29 @@ docker-compose up -d

Development requires multiple terminal windows:

1. **Django Development Server**:

    ```bash
    make run
    ```

2. **Redis Server** (if needed):

    ```bash
    make run-redis
    ```

3. **Celery Worker**:

    ```bash
    make celery
    ```

4. **Celery Beat** (for scheduled tasks):

    ```bash
    make celery-beat
    ```

Or use the combined command:

@@ -161,12 +163,12 @@ make run-all

If you see connection errors:

1. Check that Redis is running: `redis-cli ping` should return `PONG`
2. Verify firewall settings are not blocking port 6379
3. Check Redis binding in `/etc/redis/redis.conf` (should be `bind 127.0.0.1` for local dev)

### Celery Workers Not Processing Tasks

1. Ensure the worker is running with the correct app name: `celery -A dashboard_project worker`
2. Check the Celery logs for errors
3. Verify broker URL settings in both code and environment variables

@@ -25,39 +25,40 @@ python manage.py test_redis

If this fails, check the following:

1. Redis might not be running. Start it with:

    ```bash
    sudo systemctl start redis-server
    ```

2. Connection credentials may be incorrect. Check your environment variables:

    ```bash
    echo $REDIS_URL
    echo $CELERY_BROKER_URL
    echo $CELERY_RESULT_BACKEND
    ```

3. Redis might be binding only to a specific interface. Check `/etc/redis/redis.conf`:

    ```bash
    grep "bind" /etc/redis/redis.conf
    ```

4. Firewall rules might be blocking Redis. If you're connecting remotely:

    ```bash
    sudo ufw status # Check if firewall is enabled
    sudo ufw allow 6379/tcp # Allow Redis port if needed
    ```
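
When `redis-cli` is not installed on the machine running Django, the same reachability check can be approximated from Python with only the standard library. The sketch below is an illustration, not part of the project: the helper names `redis_address` and `redis_ping` are made up here, and the `redis://localhost:6379/0` default is an assumption for when `REDIS_URL` is unset. It sends Redis's inline `PING` command over a plain TCP socket.

```python
import os
import socket
from urllib.parse import urlparse


def redis_address(url=None):
    """Return (host, port) from a redis:// URL, defaulting to localhost:6379."""
    # Assumed default; the project's own fallback value may differ.
    parsed = urlparse(url or os.environ.get("REDIS_URL", "redis://localhost:6379/0"))
    return parsed.hostname or "localhost", parsed.port or 6379


def redis_ping(host, port, timeout=2.0):
    """Send an inline PING and report whether the server answered +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")
            return conn.recv(64).startswith(b"+PONG")
    except OSError:
        return False  # refused, timed out, or unreachable
```

Calling `redis_ping(*redis_address())` returns `False` both when nothing is listening and when the server answers with an error such as an authentication requirement, so a `False` result still needs the manual checks listed above.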

## Fixing CSV Data Processing Issues

If you see the error `zip() argument 2 is shorter than argument 1`, a data row has fewer fields than the expected headers. We've implemented a fix that:

1. Pads shorter rows with empty strings
2. Uses more flexible date format parsing
3. Provides better error handling

After these changes, rows with missing trailing fields or differently formatted dates should be processed correctly.
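
The padding and date-parsing steps above can be sketched as follows. This is a hedged illustration rather than the project's code: the helper names `row_to_dict` and `parse_date` and the entries in `DATE_FORMATS` are assumptions. The quoted error message is the one Python 3.10+ raises from `zip(..., strict=True)` when the second iterable runs out first, which suggests the original import paired headers and row values that way.

```python
import logging
from datetime import datetime

logger = logging.getLogger(__name__)

# Assumed formats; the formats handled by fetch_and_store_chat_data may differ.
DATE_FORMATS = ("%Y-%m-%d %H:%M:%S", "%d.%m.%Y %H:%M:%S", "%Y-%m-%dT%H:%M:%S")


def row_to_dict(headers, row):
    """Pad a short row with empty strings so every header gets a value."""
    padded = list(row) + [""] * (len(headers) - len(row))
    # Non-strict zip: extra trailing fields in an over-long row are ignored.
    return dict(zip(headers, padded))


def parse_date(value):
    """Try each known format; log and return None if none matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue
    logger.error("Unparseable date: %r", value)
    return None
```

Returning `None` instead of raising lets the caller decide whether to skip the row or store it without a timestamp.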
@@ -77,15 +78,18 @@ python manage.py test_celery

If the task isn't completing, check:

1. Look for errors in the Celery worker terminal
2. Verify broker URL settings match in both terminals:

    ```bash
    echo $CELERY_BROKER_URL
    ```

3. Check if Redis is accessible from both terminals:

    ```bash
    redis-cli ping
    ```

## Checking Scheduled Tasks

@@ -99,36 +103,36 @@ python manage.py celery inspect scheduled

Common issues with scheduled tasks:

1. **Celery Beat not running**: Start it with:

    ```bash
    cd dashboard_project
    celery -A dashboard_project beat
    ```

2. **Task registered but not running**: Check worker logs for any errors

3. **Wrong schedule**: Check the interval in `settings.py` and `CELERY_BEAT_SCHEDULE`
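
For reference when checking the schedule, a `CELERY_BEAT_SCHEDULE` entry in `settings.py` generally has the shape sketched below. The entry name, task path, and interval are hypothetical placeholders and will not match the project's actual settings; the sketch also assumes Celery is configured to read Django settings with the `CELERY_` prefix.

```python
# settings.py (sketch). Entry name, task path, and interval are hypothetical.
CELERY_BEAT_SCHEDULE = {
    "refresh-external-data": {
        "task": "data_integration.tasks.refresh_all_sources",  # hypothetical task path
        "schedule": 60 * 60,  # seconds between runs
    },
}

# Print each entry so the configured intervals can be compared with expectations.
for name, entry in CELERY_BEAT_SCHEDULE.items():
    print(f"{name}: runs {entry['task']} every {entry['schedule']} seconds")
```

If the `task` string does not match the name under which the task is registered, Beat still sends the message on schedule, but the worker rejects it as an unregistered task, which shows up in the worker logs.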

## Data Source Configuration

If data sources aren't being processed correctly:

1. Verify active data sources exist:

    ```bash
    cd dashboard_project
    python manage.py shell -c "from data_integration.models import ExternalDataSource; print(ExternalDataSource.objects.filter(is_active=True).count())"
    ```

2. Create a default data source if needed:

    ```bash
    cd dashboard_project
    python manage.py create_default_datasource
    ```

3. Check source URLs and credentials in the admin interface or environment variables.

## Manually Triggering Data Refresh
