Real-Time Ride Dispatch System with Operational AI
A production-style distributed backend built to implement every concept tested in the "Design Uber/Lyft" system design interview — not as a clone, but as a working, deployable system with a live interactive demo.
RideFlow is a backend engineering project, not a product. The goal is to build and demonstrate every component that appears in the "Design Uber" system design interview — and explain why each decision was made.
This is not:
- An Uber clone with user management or authentication
- A chatbot or LLM wrapper
- A tutorial project with simplified patterns
This is:
- A working geospatial dispatch engine with SELECT FOR UPDATE SKIP LOCKED
- An 8-state ride lifecycle enforced by a proper finite state machine
- A real-time WebSocket system backed by Redis Pub/Sub fan-out
- An AI layer (DBSCAN clustering) that detects demand hotspots live and recommends driver repositioning
- A system you can run, explain, and defend in an interview
| Page | URL | Purpose |
|---|---|---|
| Playground | /playground | 4-step simulation with live AI analysis |
| Rider | /rider | Book a ride, watch dispatch in real time |
| Driver | /driver | Receive requests, complete trips |
| Admin | /admin | Fleet ops view with AI Operations panel |
| Architecture | /architecture | Full system design walkthrough |
RIDER APP DRIVER APP ADMIN DASHBOARD PLAYGROUND
(book rides) (receive rides) (ops + AI alerts) (simulation)
| | | |
+------------------+---------------------+--------------------+
|
API GATEWAY (FastAPI)
|
+---------------------+---------------------+
| | |
DISPATCH SERVICE WEBSOCKET SERVICE AI SERVICE
(Celery workers) (Redis Pub/Sub) (DBSCAN loop)
| | |
+---------------------+---------------------+
|
DATA LAYER
+--------------+------------------+
| |
PostgreSQL + PostGIS Redis
- rides, drivers - driver locations (TTL)
- ride state + event log - pub/sub channels
- dispatch logs - demo driver ID set
- demand_predictions
| Concept | Implementation | Problem it solves |
|---|---|---|
| Geospatial driver lookup | PostGIS ST_DWithin + GiST spatial index | "How do you find the nearest driver?" |
| Race condition prevention | SELECT FOR UPDATE SKIP LOCKED | "How do you prevent double-assignment?" |
| Real-time fan-out | WebSocket + Redis Pub/Sub | "How do updates reach clients without polling?" |
| Ride lifecycle | 8-state FSM with enforced transitions | "How do you manage ride state?" |
| Async dispatch | Celery workers + Redis broker | "How do you keep the API fast under load?" |
| Driver liveness | Redis HASH with 30s TTL | "How does a driver go offline automatically?" |
| Demand AI | DBSCAN clustering on pickup coordinates | "How does the system detect demand hotspots as they form?" |
| Driver reposition | PostGIS ST_Distance nearest-driver query | "How does the system suggest driver repositioning?" |
| Fault recovery | Celery Beat sweeper — cancels rides stuck > 5 min | "What happens if a worker crashes mid-dispatch?" |
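
The driver liveness row depends on key expiry: a driver who stops sending heartbeats disappears once the TTL runs out, with no explicit "go offline" call. Below is a minimal in-memory sketch of that idea. `LivenessStore` and its methods are hypothetical names, and a plain dict with timestamps stands in for Redis here, so this shows the expiry behaviour only, not the project's Redis code.

```python
import time

HEARTBEAT_TTL_S = 30  # matches the 30s TTL in the table above


class LivenessStore:
    """In-memory stand-in for a Redis key with a TTL (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # driver_id -> (expiry timestamp, (lat, lng))

    def heartbeat(self, driver_id, lat, lng):
        # Each location update rewrites the entry and resets its expiry.
        self._entries[driver_id] = (self._clock() + HEARTBEAT_TTL_S, (lat, lng))

    def online_drivers(self):
        # Entries past their expiry are treated as gone, as an expired key would be.
        now = self._clock()
        return sorted(d for d, (exp, _) in self._entries.items() if exp > now)


# Usage with a fake clock so expiry can be observed without waiting.
fake_now = [0.0]
store = LivenessStore(clock=lambda: fake_now[0])
store.heartbeat("driver-1", 12.9698, 77.7500)
store.heartbeat("driver-2", 12.9352, 77.6245)
fake_now[0] = 20.0
store.heartbeat("driver-1", 12.9700, 77.7503)  # driver-1 keeps reporting
fake_now[0] = 35.0
online = store.online_drivers()  # driver-2 last reported at t=0, so it has expired
```

Injecting the clock keeps the sketch testable; in the running system the expiry would be handled by Redis itself rather than by application code.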
| Component | Technology |
|---|---|
| API framework | FastAPI + AsyncIO |
| Task queue | Celery + Redis broker |
| Primary database | PostgreSQL 16 + PostGIS |
| Cache / location store | Redis |
| ORM + migrations | SQLAlchemy (async) + Alembic |
| AI / clustering | scikit-learn (DBSCAN) + NumPy |
| Component | Technology |
|---|---|
| Framework | React 18 + TypeScript |
| Build tool | Vite |
| Maps | Leaflet.js + react-leaflet (OpenStreetMap, no API key) |
| Real-time | Native WebSocket API |
| Styling | Custom CSS with CSS variables (light/dark themes) |
| Component | Technology |
|---|---|
| Containerization | Docker + Docker Compose |
| Image registry | AWS ECR |
| Cloud runtime | AWS ECS Fargate |
| HTTPS layer | AWS CloudFront |
| Load balancing | AWS ALB |
| Primary database | AWS RDS PostgreSQL 18.3 |
| Cache + broker | AWS ElastiCache Redis |
| Logs | AWS CloudWatch Logs |
| Frontend deployment | Vercel |
- Docker Desktop (version 24+)
- Docker Compose (included with Docker Desktop)
- Node.js 20+ (only if you want to run frontend locally)
No local Python installation required.
git clone https://github.com/CodeTirtho97/RideFlow.git
cd RideFlow
docker-compose up --build

Backend is available at http://localhost:8000 (Swagger: http://localhost:8000/docs).
Frontend options:
- Use the deployed app: https://rideflow-v1.vercel.app
- Or run locally:

cd frontend
npm install
npm run dev

Open http://localhost:3000 for local frontend.
Services started:
| Service | Port | Description |
|---|---|---|
| Backend | 8000 | FastAPI (API + WebSocket) |
| Celery worker | — | Dispatch task worker |
| Celery Beat | — | Periodic task scheduler (stuck-ride sweeper, every 60 s) |
| PostgreSQL | 5432 | Primary database |
| Redis | 6379 | Cache + broker + pub/sub |
# Backend
DATABASE_URL=postgresql+asyncpg://rideflow:rideflow_dev@localhost:5432/rideflow
REDIS_URL=redis://localhost:6379/0
CELERY_BROKER_URL=redis://localhost:6379/1
DEMO_MODE=true
# Optional (for deployed frontend access)
CORS_ORIGINS=https://rideflow-v1.vercel.app

Navigate to http://localhost:3000/playground.
- Select a preset — Light, Moderate, or Dense traffic scenario
- Step 1: Seed Drivers — places drivers at real Bengaluru GPS coordinates
- Step 2: Start Movement — begins random-walk location heartbeats every 2s
- Step 3: Fire Requests — fires all ride requests simultaneously, triggering parallel Celery dispatch
- Step 4: Start AI Loop — runs DBSCAN every 8s on unmatched rides, publishes hotspot alerts via Redis → WebSocket
Open /admin in another tab to see the AI Operations panel update in real time.
| Preset | Drivers | Requests | Radius | What it shows |
|---|---|---|---|---|
| Light Traffic | 10 | 8 | 10 km | Happy path — clean dispatch, everyone matched |
| Moderate Traffic | 35 | 35 | 7 km | Balanced — retries, radius expansion 3→5 km |
| Dense — Peak Hour | 70 | 100 | 5 km | Saturation — surge pricing, cancellations, 2–4 AI hotspot clusters |
The Dense preset simulates Whitefield, Bengaluru at peak hour — 100 ride requests in a 5 km zone with 70 drivers. DBSCAN detects 2–4 demand clusters and recommends which idle drivers to reposition.
| Factor | Value |
|---|---|
| Time compression | 1 real second ≈ 1 minute travel time |
| Location update interval | Every 2 seconds |
| Driver movement step | ±0.0008° (~88m per update) |
| Search radius | 3 km → 5 km on expansion |
| AI loop interval | Every 8 seconds |
| DBSCAN epsilon | 1.5 km cluster radius |
| DBSCAN min_samples | 3 requests to form a cluster |
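
The movement step in the table can be checked with a short calculation. The sketch below converts a small latitude/longitude offset into metres using a spherical-Earth approximation. The function name `step_metres` and the latitude of 12.97° used for Bengaluru are assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation


def step_metres(lat_deg, dlat_deg=0.0, dlng_deg=0.0):
    """Approximate ground distance of a small lat/lng offset (equirectangular)."""
    north = math.radians(dlat_deg) * EARTH_RADIUS_M
    # A degree of longitude shrinks by cos(latitude) away from the equator.
    east = math.radians(dlng_deg) * EARTH_RADIUS_M * math.cos(math.radians(lat_deg))
    return math.hypot(north, east)


BENGALURU_LAT = 12.97  # assumed representative latitude
north_step = step_metres(BENGALURU_LAT, dlat_deg=0.0008)
east_step = step_metres(BENGALURU_LAT, dlng_deg=0.0008)
diagonal_step = step_metres(BENGALURU_LAT, 0.0008, 0.0008)
print(f"north {north_step:.0f} m, east {east_step:.0f} m, diagonal {diagonal_step:.0f} m")
```

A pure north-south step of 0.0008° comes out close to the ~88 m quoted above. A step that moves on both axes at once covers more ground, so the per-update distance depends on how the random walk combines the two offsets.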
rideflow-ai/
├── backend/
│ ├── app/
│ │ ├── api/
│ │ │ ├── rides.py # Ride CRUD + state transitions
│ │ │ ├── drivers.py # Driver registration + location
│ │ │ ├── metrics.py # System-wide metrics endpoint
│ │ │ ├── websocket.py # WebSocket endpoints (/ws/admin, /ws/ride, /ws/driver)
│ │ │ ├── demo.py # Demo simulation endpoints (DEMO_MODE only)
│ │ │ └── ai.py # AI prediction loop endpoint (DEMO_MODE only)
│ │ ├── services/
│ │ │ ├── ai/
│ │ │ │ └── demand_prediction.py # DBSCAN clustering + hotspot detection
│ │ │ ├── dispatch/
│ │ │ │ ├── surge.py # Surge multiplier calculation
│ │ │ │ └── retry.py # Retry policy + radius expansion
│ │ │ ├── ride/
│ │ │ │ └── state_machine.py # 8-state FSM with enforced transitions
│ │ │ ├── driver/
│ │ │ │ ├── location.py # Redis HASH location writes
│ │ │ │ └── status.py # TTL-based availability
│ │ │ └── websocket/
│ │ │ ├── manager.py # WebSocket connection registry
│ │ │ └── pubsub.py # Redis Pub/Sub subscriber + router
│ │ ├── workers/
│ │ │ ├── dispatch_task.py # Celery dispatch task (find, lock, assign)
│ │ │ └── tasks.py # health_check + sweep_stuck_rides (Beat, every 60s)
│ │ ├── models/
│ │ │ ├── ride.py # Ride, RideEvent, DispatchLog, DemandPrediction
│ │ │ └── driver.py # Driver model
│ │ └── core/
│ │ ├── config.py # Settings from env vars
│ │ ├── database.py # Async PostgreSQL + session factory
│ │ └── redis_client.py # Redis connection pool
│ ├── alembic/ # Database migrations
│ ├── tests/
│ │ ├── conftest.py # DB/Redis fixtures + test helpers
│ │ ├── test_sweeper.py # Stuck-ride watchdog tests (7 cases)
│ │ └── test_dispatch_concurrency.py # SELECT FOR UPDATE SKIP LOCKED under load
│ ├── pytest.ini
│ ├── requirements.txt
├── infrastructure/
│ └── Dockerfile.backend
│
├── frontend/
│ └── src/
│ ├── pages/
│ │ ├── LandingPage.tsx
│ │ ├── DemoPage.tsx # Playground — 4-step simulation
│ │ ├── RiderDashboard.tsx
│ │ ├── DriverDashboard.tsx
│ │ ├── AdminDashboard.tsx # Fleet ops + AI Operations panel
│ │ └── ArchitecturePage.tsx
│ ├── components/
│ │ ├── DispatchMap.tsx # Leaflet map (drivers, trips, hotspot circles)
│ │ ├── EventLog.tsx # Live dispatch event feed
│ │ ├── AppNav.tsx
│ │ └── Toast.tsx
│ ├── hooks/
│ │ ├── useWebSocket.ts # WS connection + reconnect + message routing
│ │ └── useTheme.ts # Light/dark mode toggle
│ └── api/
│ └── client.ts # Axios API client + typed interfaces
│
└── docker-compose.yml
Interactive docs at http://localhost:8000/docs (Swagger UI, auto-generated by FastAPI).
# Rides
POST /api/v1/rides Create ride request
GET /api/v1/rides/{id} Get ride + current state
PATCH /api/v1/rides/{id}/cancel Cancel a ride
PATCH /api/v1/rides/{id}/decline_offer Driver declines offer (returns ride to searching_driver)
PATCH /api/v1/rides/{id}/arrive Driver arriving
PATCH /api/v1/rides/{id}/start Trip started
PATCH /api/v1/rides/{id}/complete Trip completed
# Drivers
POST /api/v1/drivers Register driver
PATCH /api/v1/drivers/{id}/location Update GPS location
PATCH /api/v1/drivers/{id}/status Toggle availability
# Metrics
GET /api/v1/metrics System-wide counts by status
# WebSocket
WS /ws/ride/{ride_id} Rider real-time updates
WS /ws/driver/{driver_id} Driver real-time updates
WS /ws/admin Admin + AI alerts stream
# Demo (DEMO_MODE=true only)
POST /api/demo/seed Seed drivers at Bengaluru coordinates
POST /api/demo/move Start location movement loop
POST /api/demo/requests Fire bulk ride requests
POST /api/demo/ai/run Start DBSCAN hotspot detection loop
POST /api/demo/ai/stop Stop AI loop
POST /api/demo/reset Clear all demo data
GET /api/demo/presets Available simulation presets
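
The ride endpoints above each correspond to a state transition, and the FSM is what rejects calls that arrive in the wrong order. Below is a minimal sketch of such a transition table. Only `requested`, `searching_driver`, and `driver_offered` are named in this README. The other five state names and the exact set of allowed transitions are assumptions for illustration, not the contents of `state_machine.py`.

```python
from enum import Enum


class RideState(str, Enum):
    # Only the first three names appear in this README; the rest are assumed.
    REQUESTED = "requested"
    SEARCHING_DRIVER = "searching_driver"
    DRIVER_OFFERED = "driver_offered"
    DRIVER_ASSIGNED = "driver_assigned"
    DRIVER_ARRIVING = "driver_arriving"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    CANCELLED = "cancelled"


S = RideState
ALLOWED = {
    S.REQUESTED: {S.SEARCHING_DRIVER, S.CANCELLED},
    S.SEARCHING_DRIVER: {S.DRIVER_OFFERED, S.CANCELLED},
    # A declined offer sends the ride back to searching_driver.
    S.DRIVER_OFFERED: {S.DRIVER_ASSIGNED, S.SEARCHING_DRIVER, S.CANCELLED},
    S.DRIVER_ASSIGNED: {S.DRIVER_ARRIVING, S.CANCELLED},
    S.DRIVER_ARRIVING: {S.IN_PROGRESS, S.CANCELLED},
    S.IN_PROGRESS: {S.COMPLETED},
    S.COMPLETED: set(),  # terminal
    S.CANCELLED: set(),  # terminal
}


class InvalidTransition(Exception):
    """Raised when a requested state change is not in the transition table."""


def transition(current, target):
    if target not in ALLOWED[current]:
        raise InvalidTransition(f"{current.value} -> {target.value} is not allowed")
    return target


# Usage: a ride is offered to a driver, who declines it.
state = S.REQUESTED
for nxt in (S.SEARCHING_DRIVER, S.DRIVER_OFFERED, S.SEARCHING_DRIVER):
    state = transition(state, nxt)
```

Keeping the table as data rather than scattered `if` checks makes it easy to review every legal path in one place.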
The AI service runs as a background loop triggered from the Playground demo (Step 4).
How it works:
- Queries all unmatched rides (`requested` + `searching_driver` status) from PostgreSQL
- Extracts pickup coordinates `(lat, lng)`
- Runs DBSCAN with `eps=1.5km`, `min_samples=3` to find geographic clusters
- For each cluster: calculates demand, idle driver count (`ST_DWithin`), shortage, confidence
- Queries 3 nearest idle drivers per hotspot using `ST_Distance`
- Computes surge multiplier, deploy recommendation, and ETA to resolve
- Publishes all hotspots as one batch to `ai:alerts` Redis channel
- WebSocket fans out to Admin Dashboard and Playground simultaneously
- Repeats every 8 seconds; stops when no unmatched rides remain
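
The clustering step can be illustrated without any dependencies. The project uses scikit-learn's DBSCAN. The sketch below is a standard-library stand-in written only to show what `eps=1.5km` and `min_samples=3` mean for a handful of pickups. The helper names and the sample coordinates are made up for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0088


def haversine_km(a, b):
    """Great-circle distance between two (lat, lng) pairs in kilometres."""
    lat1, lng1 = map(math.radians, a)
    lat2, lng2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def dbscan(points, eps_km=1.5, min_samples=3):
    """Return one label per point: 0..k-1 for clusters, -1 for noise."""
    unvisited, noise = None, -1
    labels = [unvisited] * len(points)

    def neighbors(i):
        # Includes the point itself, the same convention scikit-learn uses.
        return [j for j, q in enumerate(points) if haversine_km(points[i], q) <= eps_km]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not unvisited:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = noise  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not unvisited:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:  # core point: keep growing the cluster
                queue.extend(nbrs)
        cluster += 1
    return labels


# Three pickups a few hundred metres apart and one roughly 14 km away.
pickups = [(12.9698, 77.7500), (12.9720, 77.7520), (12.9680, 77.7480), (12.9352, 77.6245)]
labels = dbscan(pickups)
```

Here the three nearby pickups form one cluster and the distant one is labelled noise, which is why an isolated request does not raise a hotspot alert on its own.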
What surfaces in the UI:
- Red gradient circles on the Playground map (one per hotspot cluster)
- Orange blinking rings on the 3 nearest idle driver markers (reposition targets)
- AI Hotspot Analysis card: zone status, shortage, fare impact, deploy count, nearest drivers
- Admin AI Operations card: fleet-level summary + per-zone recommendations
# Build and start all services
docker-compose up --build
# Start without rebuilding
docker-compose up
# Run in background
docker-compose up -d
# View logs
docker-compose logs -f backend
docker-compose logs -f celery-worker
# Stop
docker-compose down
# Full wipe (removes volumes / database)
docker-compose down -v
# Rebuild single service
docker-compose up --build backend

The local Docker Compose setup maps 1-to-1 to AWS managed services. The same container images run in both environments; the only difference is the endpoints in environment variables.
| Service | Why it's used |
|---|---|
| Amazon ECR | Private Docker image registry. Stores rideflow-backend:latest; ECS pulls from here on every task launch and deployment. |
| Amazon ECS Fargate | Serverless container runtime for API, Worker, and Beat. No EC2 instances to patch — pay per task-second; each service scales independently. |
| Application Load Balancer (ALB) | Routes HTTP traffic to healthy API tasks. Integrates with ECS service health checks and auto-scaling target tracking. |
| Amazon CloudFront | HTTPS CDN layer in front of the HTTP ALB. Provides a *.cloudfront.net TLS endpoint so the HTTPS Vercel frontend can call the backend without mixed-content errors. |
| Amazon RDS PostgreSQL 18.3 | Managed PostgreSQL with automated backups and patching. PostGIS extension enables ST_DWithin geospatial driver lookup and ST_Distance repositioning queries. |
| Amazon ElastiCache Redis | Managed Redis used for three separate concerns: driver location TTL hashes (db 0), Celery task broker (db 1), and Redis Pub/Sub fan-out to WebSocket clients (db 0). |
| Amazon CloudWatch Logs | Central log sink for all containers. All ECS task stdout/stderr streams to /ecs/rideflow-api with per-service prefixes (ecs/, worker/, beat/). |
Internet (HTTPS)
│
▼
CloudFront ←── free *.cloudfront.net TLS; sits in front of the HTTP ALB
│
▼
ALB ←── routes to healthy API tasks; health check on /api/health
│
▼
ECS Fargate — API service (FastAPI) ←── uvicorn, port 8000
│
├── ECS Fargate — Celery Worker ←── dispatch tasks, scales independently
│
└── ECS Fargate — Celery Beat ←── desiredCount=1 (singleton scheduler)
│
▼
┌──────────────────────────┐ ┌──────────────────────────┐
│ RDS PostgreSQL + PostGIS│ │ ElastiCache Redis │
│ (private subnet) │ │ db0: locations + pubsub │
└──────────────────────────┘ │ db1: Celery broker │
└──────────────────────────┘
| Docker Compose service | AWS service | Notes |
|---|---|---|
| backend | ECS Fargate (rideflow-api task def) | ALB target group on port 8000 |
| celery-worker | ECS Fargate (rideflow-worker task def) | Scale independently from the API |
| celery-beat | ECS Fargate (rideflow-beat task def) | desiredCount=1; must be singleton |
| postgres | RDS PostgreSQL 18.3 | PostGIS enabled; runs in private VPC |
| redis | ElastiCache Redis (cluster mode off) | Two logical databases: 0 and 1 |
Replace the environment: block in Docker Compose with these for ECS task definitions or a Parameter Store secret:
DATABASE_URL=postgresql+asyncpg://<user>:<pass>@<rds-endpoint>:5432/rideflow
REDIS_URL=redis://<elasticache-endpoint>:6379/0
CELERY_BROKER_URL=redis://<elasticache-endpoint>:6379/1
DEMO_MODE=false
CORS_ORIGINS=https://<your-frontend-url>

Celery Beat must run as a single instance.
If two Beat processes run simultaneously, every scheduled task (including the stuck-ride sweeper) fires twice, causing double-cancellations. Set desiredCount=1 on the Beat ECS service. ECS restarts it automatically on crash — the only cost is missed sweep cycles during the brief restart window, which is acceptable.
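
What the sweeper decides on each pass can be pictured as a small per-ride rule. Below is a rough sketch based on the behaviour this README describes: fresh and terminal rides are ignored, stuck `requested` rides are re-enqueued, and rides orphaned mid-dispatch are cancelled. The function name `sweep_action`, the terminal state names, and the use of a single `updated_at` timestamp are assumptions, not the project's actual task code.

```python
from datetime import datetime, timedelta, timezone

STUCK_AFTER = timedelta(minutes=5)
TERMINAL = {"completed", "cancelled"}  # assumed names for the terminal states


def sweep_action(status, updated_at, now):
    """Decide what one sweep pass does with a ride: 'ignore', 'requeue' or 'cancel'."""
    if status in TERMINAL or now - updated_at <= STUCK_AFTER:
        return "ignore"  # already finished, or still fresh
    if status == "requested":
        return "requeue"  # never picked up by a worker: dispatch it again
    if status in {"searching_driver", "driver_offered"}:
        return "cancel"  # orphaned mid-dispatch
    return "ignore"  # rides already past dispatch are left alone


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
old = now - timedelta(minutes=6)
fresh = now - timedelta(minutes=1)
actions = [
    sweep_action("requested", old, now),
    sweep_action("searching_driver", old, now),
    sweep_action("searching_driver", fresh, now),
    sweep_action("completed", old, now),
]
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the rule deterministic and easy to test.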
RDS requires PostGIS to be enabled manually. After provisioning the RDS instance, connect once and run:
CREATE EXTENSION IF NOT EXISTS postgis;

All subsequent Alembic migrations assume PostGIS is available.
Run migrations as a one-off ECS task before starting the API.
aws ecs run-task \
--cluster rideflow-ai \
--task-definition rideflow-api \
--overrides '{"containerOverrides":[{"name":"app","command":["alembic","upgrade","head"]}]}'

Security group rules (minimum):
| Source | Destination | Port | Purpose |
|---|---|---|---|
| ALB | ECS API tasks | 8000 | HTTP traffic |
| ECS tasks | RDS | 5432 | Database |
| ECS tasks | ElastiCache | 6379 | Redis |
| Internet | ALB | 443 | HTTPS |
RDS and ElastiCache must not have public internet access.
# 1. Build and push image to ECR
aws ecr get-login-password | docker login --username AWS --password-stdin <account>.dkr.ecr.<region>.amazonaws.com
docker build -t rideflow-backend -f infrastructure/Dockerfile.backend .
docker tag rideflow-backend:latest <account>.dkr.ecr.<region>.amazonaws.com/rideflow-backend:latest
docker push <account>.dkr.ecr.<region>.amazonaws.com/rideflow-backend:latest
# 2. Create ECS cluster
aws ecs create-cluster --cluster-name rideflow-ai
# 3. Register task definitions (API, Worker, Beat) — same image, different commands:
# API: uvicorn app.main:app --host 0.0.0.0 --port 8000
# Worker: celery -A app.core.celery_app worker --loglevel=info --concurrency=4
# Beat: celery -A app.core.celery_app beat --loglevel=info
# 4. Create ECS services
aws ecs create-service --cluster rideflow-ai --service-name rideflow-api --desired-count 2 ...
aws ecs create-service --cluster rideflow-ai --service-name rideflow-worker --desired-count 2 ...
aws ecs create-service --cluster rideflow-ai --service-name rideflow-beat --desired-count 1 ...
# 5. Run DB migrations
aws ecs run-task --cluster rideflow-ai --task-definition rideflow-api \
--overrides '{"containerOverrides":[{"name":"app","command":["alembic","upgrade","head"]}]}'

Tests require the database and Redis to be running:

docker-compose up -d postgres redis

Then run inside the backend container (preferred; avoids local Python setup):

docker-compose exec backend pytest tests/ -v

Or locally if you have a Python environment with dependencies installed:
cd backend
DATABASE_URL=postgresql+asyncpg://rideflow:rideflow_dev@localhost:5432/rideflow \
REDIS_URL=redis://localhost:6379/0 \
CELERY_BROKER_URL=redis://localhost:6379/1 \
pytest tests/ -v

| File | What it tests |
|---|---|
| tests/test_sweeper.py | Stuck-ride watchdog: ignores fresh/terminal/partially-assigned rides; cancels orphaned searching_driver; re-enqueues stuck requested; correct counts |
| tests/test_dispatch_concurrency.py | SELECT FOR UPDATE SKIP LOCKED under load: 5 simultaneous dispatches against 1 driver → exactly 1 assignment; 3 vs 3 → no double-assignment |
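
The property the concurrency test checks can be mimicked in plain Python. In the sketch below a non-blocking `threading.Lock` per driver plays the role of the row lock: a dispatcher that finds a driver locked moves on instead of waiting, which is the behaviour SKIP LOCKED provides. This is an analogy under stated assumptions, not the project's test. In the real system the guarantee comes from PostgreSQL, and the `Driver`, `dispatch`, and `run` names are hypothetical.

```python
import threading


class Driver:
    def __init__(self, driver_id):
        self.driver_id = driver_id
        self.row_lock = threading.Lock()  # stands in for the database row lock
        self.assigned_ride = None


def dispatch(ride_id, drivers, results, barrier):
    barrier.wait()  # release all dispatchers at the same moment
    for driver in drivers:
        # Non-blocking acquire: a locked row is skipped, not waited on.
        if not driver.row_lock.acquire(blocking=False):
            continue
        try:
            if driver.assigned_ride is None:  # re-check while holding the lock
                driver.assigned_ride = ride_id
                results[ride_id] = driver.driver_id
                return
        finally:
            driver.row_lock.release()
    results[ride_id] = None  # no free driver; a real worker would retry later


def run(num_rides, num_drivers):
    drivers = [Driver(f"d{i}") for i in range(num_drivers)]
    results = {}
    barrier = threading.Barrier(num_rides)
    threads = [
        threading.Thread(target=dispatch, args=(f"r{i}", drivers, results, barrier))
        for i in range(num_rides)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


results = run(num_rides=5, num_drivers=1)
winners = [d for d in results.values() if d is not None]
```

With five rides competing for one driver, exactly one ride ends up in `winners` regardless of thread scheduling, because only the first dispatcher to take the lock sees the driver unassigned.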
# Connect to PostgreSQL
docker-compose exec postgres psql -U rideflow -d rideflow
# Useful queries
SELECT status, COUNT(*) FROM rides GROUP BY status;
SELECT id, name, status, ST_AsText(location::geometry) FROM drivers;
SELECT * FROM dispatch_logs ORDER BY created_at DESC LIMIT 10;
SELECT event_type, created_at FROM ride_events WHERE ride_id = '<id>';
# Run migrations
docker-compose exec backend alembic upgrade head

These are deliberate tradeoffs for a demo/portfolio deployment, not oversights.
| Area | Current Design | Why | Production Upgrade |
|---|---|---|---|
| Single-node PostgreSQL | No HA, no read replicas | Sufficient for demo; HA adds operational complexity out of scope | RDS Multi-AZ + read replicas for reporting queries |
| Surge formula | Heuristic demand/supply ratio per geohash | No historical data to calibrate; transparent and tweakable | ML model trained on time-of-day, weather, and event signals |
| Redis Pub/Sub | Fire-and-forget, no persistence | Simplest fanout; events that fire while no subscriber is connected are lost | Redis Streams for durable, replayable event delivery |
| Single WebSocket server | All connections on one process | Fine for demo; WS state cannot be shared across processes | Dedicated gateway tier with shared session store (Redis) |
| No authentication | Open API endpoints | Demo purposes; auth would add 30+ endpoints and obscure dispatch patterns | JWT or session-based auth on all write endpoints |
| DBSCAN (detection, not prediction) | Reacts to current demand density | No historical data for forecasting model | Feed time-of-day, weather, and events into a supervised regression model |
These gaps were identified and fixed during development — not left as upgrade paths:
- Worker crash recovery: Celery Beat sweeper cancels rides stuck in `searching_driver`/`driver_offered` beyond 5 minutes and frees any partially-claimed driver
- Ride creation idempotency: `Idempotency-Key` header returns the existing ride on a duplicate request without creating duplicates
- WebSocket missed-event replay: `?last_event_id=<uuid>` on reconnect replays events from the durable `ride_events` log before resuming the live stream
- Driver decline path: `driver_offered` FSM state with per-offer Redis key; `PATCH /rides/{id}/decline_offer` signals the dispatch worker to try the next candidate
- AI loop isolation: DBSCAN loop moved from `asyncio.create_task` (tied to one uvicorn worker) to Celery Beat with a Redis flag; survives API restarts and is safe with multiple workers
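
The missed-event replay in the list above reduces to "send everything after the last id the client saw". Below is a minimal sketch of that selection step. It assumes the event log is already ordered oldest-first. The `RideEvent` dataclass, the `events_to_replay` function, and the full-replay fallback for an unknown id are illustrative choices rather than the project's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RideEvent:
    event_id: str
    event_type: str


def events_to_replay(event_log, last_event_id=None):
    """Return the events a reconnecting client has not seen yet.

    event_log is assumed to be ordered oldest-first, as a query sorted by
    created_at would return it.
    """
    if last_event_id is None:
        return list(event_log)  # first connection: send everything
    for index, event in enumerate(event_log):
        if event.event_id == last_event_id:
            return list(event_log[index + 1:])
    # Unknown id (assumed fallback): replay the whole log.
    return list(event_log)


log = [
    RideEvent("e1", "requested"),
    RideEvent("e2", "searching_driver"),
    RideEvent("e3", "driver_offered"),
]
missed = [e.event_type for e in events_to_replay(log, last_event_id="e1")]
```

Replaying from a durable log is what compensates for Redis Pub/Sub being fire-and-forget: anything published while the socket was down can still be recovered on reconnect.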
Use this flow whenever you push new backend code and need to update ECS.
# 1. Rebuild and push updated image to ECR
aws ecr get-login-password --region <region> | \
docker login --username AWS --password-stdin <account>.dkr.ecr.<region>.amazonaws.com
docker build -t rideflow-backend -f infrastructure/Dockerfile.backend .
docker tag rideflow-backend:latest \
<account>.dkr.ecr.<region>.amazonaws.com/rideflow-backend:latest
docker push <account>.dkr.ecr.<region>.amazonaws.com/rideflow-backend:latest
# 2. Run any new Alembic migrations (e.g. 002_ride_idempotency_key)
aws ecs run-task \
--cluster rideflow-ai \
--task-definition rideflow-api \
--overrides '{"containerOverrides":[{"name":"app","command":["alembic","upgrade","head"]}]}' \
--launch-type FARGATE \
--network-configuration "awsvpcConfiguration={subnets=[<subnet-id>],securityGroups=[<sg-id>],assignPublicIp=ENABLED}"
# 3. Force a new deployment on every ECS service (picks up the new image)
aws ecs update-service --cluster rideflow-ai --service rideflow-api --force-new-deployment
aws ecs update-service --cluster rideflow-ai --service rideflow-worker --force-new-deployment
aws ecs update-service --cluster rideflow-ai --service rideflow-beat --force-new-deployment
# 4. Watch rollout — wait for runningCount to reach desiredCount on each service
aws ecs describe-services --cluster rideflow-ai \
--services rideflow-api rideflow-worker rideflow-beat \
--query 'services[*].{name:serviceName,desired:desiredCount,running:runningCount}'

MIT License. See LICENSE.