RideFlow

Real-Time Ride Dispatch System with Operational AI

A production-style distributed backend built to implement every concept tested in the "Design Uber/Lyft" system design interview — not as a clone, but as a working, deployable system with a live interactive demo.

What This Project Is

RideFlow is a backend engineering project, not a product. It builds and demonstrates the components that come up in the "Design Uber" system design interview, and explains why each design decision was made.

This is not:

An Uber clone with user management or authentication
A chatbot or LLM wrapper
A tutorial project with simplified patterns

This is:

A working geospatial dispatch engine with SELECT FOR UPDATE SKIP LOCKED
An 8-state ride lifecycle enforced by a proper finite state machine
A real-time WebSocket system backed by Redis Pub/Sub fan-out
An AI layer (DBSCAN clustering) that detects demand hotspots live and recommends driver repositioning
A system you can run, explain, and defend in an interview

Live Demo

rideflow-v1.vercel.app

Page	URL	Purpose
Playground	`/playground`	4-step simulation with live AI analysis
Rider	`/rider`	Book a ride, watch dispatch in real time
Driver	`/driver`	Receive requests, complete trips
Admin	`/admin`	Fleet ops view with AI Operations panel
Architecture	`/architecture`	Full system design walkthrough

System Overview

RIDER APP          DRIVER APP         ADMIN DASHBOARD        PLAYGROUND
(book rides)       (receive rides)    (ops + AI alerts)      (simulation)
      |                  |                     |                    |
      +------------------+---------------------+--------------------+
                                    |
                         API GATEWAY (FastAPI)
                                    |
              +---------------------+---------------------+
              |                     |                     |
      DISPATCH SERVICE       WEBSOCKET SERVICE       AI SERVICE
      (Celery workers)       (Redis Pub/Sub)         (DBSCAN loop)
              |                     |                     |
              +---------------------+---------------------+
                                    |
                               DATA LAYER
                    +--------------+------------------+
                    |                                 |
          PostgreSQL + PostGIS                     Redis
          - rides, drivers                         - driver locations (TTL)
          - ride state + event log                 - pub/sub channels
          - dispatch logs                          - demo driver ID set
          - demand_predictions

Key System Design Concepts

Concept	Implementation	Problem it solves
Geospatial driver lookup	PostGIS `ST_DWithin` + GiST spatial index	"How do you find the nearest driver?"
Race condition prevention	`SELECT FOR UPDATE SKIP LOCKED`	"How do you prevent double-assignment?"
Real-time fan-out	WebSocket + Redis Pub/Sub	"How do updates reach clients without polling?"
Ride lifecycle	8-state FSM with enforced transitions	"How do you manage ride state?"
Async dispatch	Celery workers + Redis broker	"How do you keep the API fast under load?"
Driver liveness	Redis HASH with 30s TTL	"How does a driver go offline automatically?"
Demand AI	DBSCAN clustering on pickup coordinates	"How does the system detect demand hotspots as they form?"
Driver reposition	PostGIS `ST_Distance` nearest-driver query	"How does the system suggest driver repositioning?"
Fault recovery	Celery Beat sweeper — cancels rides stuck > 5 min	"What happens if a worker crashes mid-dispatch?"
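
The fault-recovery row above can be illustrated with a small decision function. This is a minimal sketch, not the project's `sweep_stuck_rides` task: the `Ride` dataclass and its `updated_at` field are assumptions made for illustration, while the 5-minute threshold and the status names come from this README.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

STUCK_AFTER = timedelta(minutes=5)  # threshold stated in the table above


@dataclass
class Ride:
    # Hypothetical shape; the real model lives in models/ride.py.
    id: str
    status: str
    updated_at: datetime


def sweep_action(ride: Ride, now: datetime) -> str:
    """Return 'skip', 'cancel', or 'requeue' for a single ride."""
    if now - ride.updated_at <= STUCK_AFTER:
        return "skip"  # still fresh, a worker may be handling it
    if ride.status in {"searching_driver", "driver_offered"}:
        return "cancel"  # dispatch stalled mid-flight
    if ride.status == "requested":
        return "requeue"  # never picked up, hand it back to dispatch
    return "skip"  # terminal or already-assigned states are left alone


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
stale = now - timedelta(minutes=6)
print(sweep_action(Ride("r1", "searching_driver", stale), now))
```

Keeping the decision pure (no database or Redis access) makes it easy to unit-test separately from the periodic task that applies it.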

Tech Stack

Backend

Component	Technology
API framework	FastAPI + AsyncIO
Task queue	Celery + Redis broker
Primary database	PostgreSQL 16 + PostGIS
Cache / location store	Redis
ORM + migrations	SQLAlchemy (async) + Alembic
AI / clustering	scikit-learn (DBSCAN) + NumPy

Frontend

Component	Technology
Framework	React 18 + TypeScript
Build tool	Vite
Maps	Leaflet.js + react-leaflet (OpenStreetMap, no API key)
Real-time	Native WebSocket API
Styling	Custom CSS with CSS variables (light/dark themes)

Infrastructure

Component	Technology
Containerization	Docker + Docker Compose
Image registry	AWS ECR
Cloud runtime	AWS ECS Fargate
HTTPS layer	AWS CloudFront
Load balancing	AWS ALB
Primary database	AWS RDS PostgreSQL 18.3
Cache + broker	AWS ElastiCache Redis
Logs	AWS CloudWatch Logs
Frontend deployment	Vercel

Getting Started

Prerequisites

Docker Desktop (version 24+)
Docker Compose (included with Docker Desktop)
Node.js 20+ (only if you want to run the frontend locally)

No local Python installation required.

Run Locally

git clone https://github.com/CodeTirtho97/RideFlow.git rideflow-ai
cd rideflow-ai

docker-compose up --build

Backend is available at http://localhost:8000 (Swagger: http://localhost:8000/docs).

Frontend options:

Use the deployed app: https://rideflow-v1.vercel.app
Or run locally:

cd frontend
npm install
npm run dev

The local frontend is served at http://localhost:3000.

Services started:

Service	Port	Description
Backend	8000	FastAPI (API + WebSocket)
Celery worker	—	Dispatch task worker
Celery Beat	—	Periodic task scheduler (stuck-ride sweeper, every 60 s)
PostgreSQL	5432	Primary database
Redis	6379	Cache + broker + pub/sub

Environment Variables

# Backend
DATABASE_URL=postgresql+asyncpg://rideflow:rideflow_dev@localhost:5432/rideflow
REDIS_URL=redis://localhost:6379/0
CELERY_BROKER_URL=redis://localhost:6379/1
DEMO_MODE=true

# Optional (for deployed frontend access)
CORS_ORIGINS=https://rideflow-v1.vercel.app
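
As a rough illustration of how these variables could be read at startup, here is a minimal sketch using only the standard library. The `Settings` class, the `load_settings` helper, the accepted truthy strings for `DEMO_MODE`, and the comma-separated parsing of `CORS_ORIGINS` are all assumptions; the project's actual `core/config.py` may work differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    celery_broker_url: str
    demo_mode: bool
    cors_origins: Tuple[str, ...]


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    raw_origins = env.get("CORS_ORIGINS", "")
    return Settings(
        database_url=env["DATABASE_URL"],  # required: KeyError if missing
        redis_url=env["REDIS_URL"],
        celery_broker_url=env["CELERY_BROKER_URL"],
        # Assumed convention: "true", "1", or "yes" enable demo mode.
        demo_mode=env.get("DEMO_MODE", "false").strip().lower() in {"1", "true", "yes"},
        # Assumed convention: comma-separated list of allowed origins.
        cors_origins=tuple(o.strip() for o in raw_origins.split(",") if o.strip()),
    )


settings = load_settings({
    "DATABASE_URL": "postgresql+asyncpg://rideflow:rideflow_dev@localhost:5432/rideflow",
    "REDIS_URL": "redis://localhost:6379/0",
    "CELERY_BROKER_URL": "redis://localhost:6379/1",
    "DEMO_MODE": "true",
})
print(settings.demo_mode, settings.cors_origins)
```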

Running the Playground Demo

Navigate to http://localhost:3000/playground.

Select a preset — Light, Moderate, or Dense traffic scenario
Step 1: Seed Drivers — places drivers at real Bengaluru GPS coordinates
Step 2: Start Movement — begins random-walk location heartbeats every 2s
Step 3: Fire Requests — fires all ride requests simultaneously, triggering parallel Celery dispatch
Step 4: Start AI Loop — runs DBSCAN every 8s on unmatched rides, publishes hotspot alerts via Redis → WebSocket

Open /admin in another tab to see the AI Operations panel update in real time.

Simulation Presets

Preset	Drivers	Requests	Radius	What it shows
Light Traffic	10	8	10 km	Happy path — clean dispatch, everyone matched
Moderate Traffic	35	35	7 km	Balanced — retries, radius expansion 3→5 km
Dense — Peak Hour	70	100	5 km	Saturation — surge pricing, cancellations, 2–4 AI hotspot clusters

The Dense preset simulates Whitefield, Bengaluru at peak hour — 100 ride requests in a 5 km zone with 70 drivers. DBSCAN detects 2–4 demand clusters and recommends which idle drivers to reposition.
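
The radius expansion mentioned for the Moderate preset (3 km, then 5 km) can be sketched in memory. This is only an illustration under stated assumptions: the project does the lookup in PostGIS with `ST_DWithin`, whereas this stand-in uses a haversine distance over a plain dictionary, and the `find_candidates` helper and driver IDs are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088
SEARCH_RADII_KM = (3.0, 5.0)  # initial radius, then the expanded one


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two (lat, lng) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def find_candidates(pickup, drivers):
    """Return (radius_used_km, driver_ids nearest first), or (None, []) if nobody is in range.

    pickup is (lat, lng); drivers maps driver_id -> (lat, lng).
    """
    distances = sorted(
        (haversine_km(pickup[0], pickup[1], pos[0], pos[1]), driver_id)
        for driver_id, pos in drivers.items()
    )
    for radius in SEARCH_RADII_KM:
        hits = [driver_id for dist, driver_id in distances if dist <= radius]
        if hits:
            return radius, hits
    return None, []


pickup = (12.9698, 77.7500)
drivers = {"d_far": (13.0698, 77.7500), "d_mid": (13.0058, 77.7500)}
print(find_candidates(pickup, drivers))  # only d_mid (about 4 km away) is within 5 km
```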

Demo Scale Reference

Factor	Value
Time compression	1 real second ≈ 1 minute travel time
Location update interval	Every 2 seconds
Driver movement step	±0.0008° (~88m per update)
Search radius	3 km → 5 km on expansion
AI loop interval	Every 8 seconds
DBSCAN epsilon	1.5 km cluster radius
DBSCAN min_samples	3 requests to form a cluster
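
The movement step figure can be checked with a little arithmetic. The sketch below assumes each heartbeat shifts latitude and longitude by an independent uniform offset of at most 0.0008°; the README only gives the bound, so the uniform distribution and the `random_walk_step` helper are assumptions. One degree of latitude is roughly 110.6 km near Bengaluru, which gives about 88 m for 0.0008°. A degree of longitude is slightly shorter at that latitude, so the east-west bound is closer to 87 m.

```python
import random

MAX_STEP_DEG = 0.0008          # from the table above
METERS_PER_DEG_LAT = 110_600   # approximate value near 13 degrees north


def random_walk_step(lat, lng, rng):
    """Move a driver by a small random offset in each axis (assumed uniform)."""
    return (
        lat + rng.uniform(-MAX_STEP_DEG, MAX_STEP_DEG),
        lng + rng.uniform(-MAX_STEP_DEG, MAX_STEP_DEG),
    )


max_step_m = MAX_STEP_DEG * METERS_PER_DEG_LAT
print(f"max north-south step: {max_step_m:.1f} m")

rng = random.Random(42)  # seeded so the walk is reproducible
position = (12.9698, 77.7500)
for _ in range(3):
    position = random_walk_step(position[0], position[1], rng)
print(position)
```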

Project Structure

rideflow-ai/
├── backend/
│   ├── app/
│   │   ├── api/
│   │   │   ├── rides.py           # Ride CRUD + state transitions
│   │   │   ├── drivers.py         # Driver registration + location
│   │   │   ├── metrics.py         # System-wide metrics endpoint
│   │   │   ├── websocket.py       # WebSocket endpoints (/ws/admin, /ws/ride, /ws/driver)
│   │   │   ├── demo.py            # Demo simulation endpoints (DEMO_MODE only)
│   │   │   └── ai.py              # AI prediction loop endpoint (DEMO_MODE only)
│   │   ├── services/
│   │   │   ├── ai/
│   │   │   │   └── demand_prediction.py   # DBSCAN clustering + hotspot detection
│   │   │   ├── dispatch/
│   │   │   │   ├── surge.py               # Surge multiplier calculation
│   │   │   │   └── retry.py               # Retry policy + radius expansion
│   │   │   ├── ride/
│   │   │   │   └── state_machine.py       # 8-state FSM with enforced transitions
│   │   │   ├── driver/
│   │   │   │   ├── location.py            # Redis HASH location writes
│   │   │   │   └── status.py              # TTL-based availability
│   │   │   └── websocket/
│   │   │       ├── manager.py             # WebSocket connection registry
│   │   │       └── pubsub.py              # Redis Pub/Sub subscriber + router
│   │   ├── workers/
│   │   │   ├── dispatch_task.py           # Celery dispatch task (find, lock, assign)
│   │   │   └── tasks.py                   # health_check + sweep_stuck_rides (Beat, every 60s)
│   │   ├── models/
│   │   │   ├── ride.py                    # Ride, RideEvent, DispatchLog, DemandPrediction
│   │   │   └── driver.py                  # Driver model
│   │   └── core/
│   │       ├── config.py                  # Settings from env vars
│   │       ├── database.py                # Async PostgreSQL + session factory
│   │       └── redis_client.py            # Redis connection pool
│   ├── alembic/                           # Database migrations
│   ├── tests/
│   │   ├── conftest.py                    # DB/Redis fixtures + test helpers
│   │   ├── test_sweeper.py                # Stuck-ride watchdog tests (7 cases)
│   │   └── test_dispatch_concurrency.py   # SELECT FOR UPDATE SKIP LOCKED under load
│   ├── pytest.ini
│   ├── requirements.txt
├── infrastructure/
│   └── Dockerfile.backend
│
├── frontend/
│   └── src/
│       ├── pages/
│       │   ├── LandingPage.tsx
│       │   ├── DemoPage.tsx               # Playground — 4-step simulation
│       │   ├── RiderDashboard.tsx
│       │   ├── DriverDashboard.tsx
│       │   ├── AdminDashboard.tsx         # Fleet ops + AI Operations panel
│       │   └── ArchitecturePage.tsx
│       ├── components/
│       │   ├── DispatchMap.tsx            # Leaflet map (drivers, trips, hotspot circles)
│       │   ├── EventLog.tsx               # Live dispatch event feed
│       │   ├── AppNav.tsx
│       │   └── Toast.tsx
│       ├── hooks/
│       │   ├── useWebSocket.ts            # WS connection + reconnect + message routing
│       │   └── useTheme.ts                # Light/dark mode toggle
│       └── api/
│           └── client.ts                  # Axios API client + typed interfaces
│
└── docker-compose.yml

API Reference

Interactive docs at http://localhost:8000/docs (Swagger UI, auto-generated by FastAPI).

# Rides
POST   /api/v1/rides                   Create ride request
GET    /api/v1/rides/{id}              Get ride + current state
PATCH  /api/v1/rides/{id}/cancel         Cancel a ride
PATCH  /api/v1/rides/{id}/decline_offer  Driver declines offer (returns ride to searching_driver)
PATCH  /api/v1/rides/{id}/arrive         Driver arriving
PATCH  /api/v1/rides/{id}/start        Trip started
PATCH  /api/v1/rides/{id}/complete     Trip completed

# Drivers
POST   /api/v1/drivers                 Register driver
PATCH  /api/v1/drivers/{id}/location   Update GPS location
PATCH  /api/v1/drivers/{id}/status     Toggle availability

# Metrics
GET    /api/v1/metrics                 System-wide counts by status

# WebSocket
WS     /ws/ride/{ride_id}              Rider real-time updates
WS     /ws/driver/{driver_id}          Driver real-time updates
WS     /ws/admin                       Admin + AI alerts stream

# Demo (DEMO_MODE=true only)
POST   /api/demo/seed                  Seed drivers at Bengaluru coordinates
POST   /api/demo/move                  Start location movement loop
POST   /api/demo/requests              Fire bulk ride requests
POST   /api/demo/ai/run                Start DBSCAN hotspot detection loop
POST   /api/demo/ai/stop               Stop AI loop
POST   /api/demo/reset                 Clear all demo data
GET    /api/demo/presets               Available simulation presets
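
Each ride endpoint above drives one transition of the ride state machine. The following is a minimal sketch of an enforced-transition FSM, not the project's `state_machine.py`. Only `requested`, `searching_driver`, and `driver_offered` are named in this README; the other five state names and the exact transition table are assumptions chosen to reach eight states.

```python
from enum import Enum


class RideState(str, Enum):
    REQUESTED = "requested"
    SEARCHING_DRIVER = "searching_driver"
    DRIVER_OFFERED = "driver_offered"
    DRIVER_ASSIGNED = "driver_assigned"   # assumed name
    DRIVER_ARRIVED = "driver_arrived"     # assumed name
    IN_PROGRESS = "in_progress"           # assumed name
    COMPLETED = "completed"               # assumed name
    CANCELLED = "cancelled"               # assumed name


S = RideState
# Assumed transition table: each state lists the states it may move to.
TRANSITIONS = {
    S.REQUESTED: {S.SEARCHING_DRIVER, S.CANCELLED},
    S.SEARCHING_DRIVER: {S.DRIVER_OFFERED, S.CANCELLED},
    # A declined offer returns the ride to searching_driver.
    S.DRIVER_OFFERED: {S.DRIVER_ASSIGNED, S.SEARCHING_DRIVER, S.CANCELLED},
    S.DRIVER_ASSIGNED: {S.DRIVER_ARRIVED, S.CANCELLED},
    S.DRIVER_ARRIVED: {S.IN_PROGRESS, S.CANCELLED},
    S.IN_PROGRESS: {S.COMPLETED},
    S.COMPLETED: set(),   # terminal
    S.CANCELLED: set(),   # terminal
}


class InvalidTransition(Exception):
    pass


def transition(current: RideState, target: RideState) -> RideState:
    """Return the new state, or raise if the move is not allowed."""
    if target not in TRANSITIONS[current]:
        raise InvalidTransition(f"{current.value} -> {target.value} is not allowed")
    return target


state = transition(S.DRIVER_OFFERED, S.SEARCHING_DRIVER)  # driver declined
print(state.value)
```

Centralising the table means an endpoint such as `/complete` cannot move a ride out of a terminal state by accident; the handler only has to call `transition` and persist the result.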

AI Layer — DBSCAN Demand Detection

The AI service runs as a background loop triggered from the Playground demo (Step 4).

How it works:

Queries all unmatched rides (requested + searching_driver status) from PostgreSQL
Extracts pickup coordinates (lat, lng)
Runs DBSCAN with eps=1.5km, min_samples=3 to find geographic clusters
For each cluster: calculates demand, idle driver count (ST_DWithin), shortage, confidence
Queries 3 nearest idle drivers per hotspot using ST_Distance
Computes surge multiplier, deploy recommendation, and ETA to resolve
Publishes all hotspots as one batch to ai:alerts Redis channel
WebSocket fans out to Admin Dashboard and Playground simultaneously
Repeats every 8 seconds; stops when no unmatched rides remain
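
The clustering step above can be sketched with scikit-learn. This is an illustration under assumptions rather than the project's `demand_prediction.py`: it assumes the haversine metric is used so that `eps` can be expressed in kilometres (divided by the Earth's radius to get radians), and the `detect_hotspots` helper and its output shape are hypothetical. The shortage, confidence, and surge calculations are left out.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0088
EPS_KM = 1.5       # cluster radius from the steps above
MIN_SAMPLES = 3    # requests needed to form a cluster


def detect_hotspots(pickups):
    """Cluster (lat, lng) pickups in degrees; return one dict per hotspot."""
    if len(pickups) < MIN_SAMPLES:
        return []
    points = np.asarray(pickups, dtype=float)
    # The haversine metric expects [lat, lng] in radians and returns radians.
    labels = DBSCAN(
        eps=EPS_KM / EARTH_RADIUS_KM,
        min_samples=MIN_SAMPLES,
        metric="haversine",
        algorithm="ball_tree",
    ).fit(np.radians(points)).labels_

    hotspots = []
    for label in sorted(set(labels.tolist()) - {-1}):  # -1 marks noise
        members = points[labels == label]
        lat, lng = members.mean(axis=0)
        hotspots.append({"center": (float(lat), float(lng)), "demand": int(len(members))})
    return hotspots


pickups = [
    (12.9698, 77.7500), (12.9705, 77.7510), (12.9690, 77.7495),  # tight group
    (12.9352, 77.6245), (12.9360, 77.6250), (12.9345, 77.6240),  # second group
    (13.1000, 77.4000),                                          # isolated request
]
for hotspot in detect_hotspots(pickups):
    print(hotspot)
```

With `min_samples=3`, a lone request far from any group is labelled as noise and produces no hotspot, which matches the idea that a hotspot needs several unmatched rides close together.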

What surfaces in the UI:

Red gradient circles on the Playground map (one per hotspot cluster)
Orange blinking rings on the 3 nearest idle driver markers (reposition targets)
AI Hotspot Analysis card: zone status, shortage, fare impact, deploy count, nearest drivers
Admin AI Operations card: fleet-level summary + per-zone recommendations

Docker Commands

# Build and start all services
docker-compose up --build

# Start without rebuilding
docker-compose up

# Run in background
docker-compose up -d

# View logs
docker-compose logs -f backend
docker-compose logs -f celery-worker

# Stop
docker-compose down

# Full wipe (removes volumes / database)
docker-compose down -v

# Rebuild single service
docker-compose up --build backend

AWS Deployment

The local Docker Compose setup maps 1-to-1 to AWS managed services. The same container images run in both environments — the only difference is the endpoints in environment variables.

AWS Services

Service	Why it's used
Amazon ECR	Private Docker image registry. Stores `rideflow-backend:latest`; ECS pulls from here on every task launch and deployment.
Amazon ECS Fargate	Serverless container runtime for API, Worker, and Beat. No EC2 instances to patch — pay per task-second; each service scales independently.
Application Load Balancer (ALB)	Routes HTTP traffic to healthy API tasks. Integrates with ECS service health checks and auto-scaling target tracking.
Amazon CloudFront	HTTPS CDN layer in front of the HTTP ALB. Provides a `*.cloudfront.net` TLS endpoint so the HTTPS Vercel frontend can call the backend without mixed-content errors.
Amazon RDS PostgreSQL 18.3	Managed PostgreSQL with automated backups and patching. PostGIS extension enables `ST_DWithin` geospatial driver lookup and `ST_Distance` repositioning queries.
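Amazon ElastiCache Redis	Managed Redis used for three separate concerns: driver location TTL hashes (db 0), the Celery task broker (db 1), and Redis Pub/Sub fan-out to WebSocket clients. Pub/Sub channels are not scoped to a logical database, so they are shared regardless of which db a connection selects.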
Amazon CloudWatch Logs	Central log sink for all containers. All ECS task stdout/stderr streams to `/ecs/rideflow-api` with per-service prefixes (`ecs/`, `worker/`, `beat/`).

Architecture

Internet (HTTPS)
    │
    ▼
CloudFront  ←── free *.cloudfront.net TLS; sits in front of the HTTP ALB
    │
    ▼
ALB  ←── routes to healthy API tasks; health check on /api/health
    │
    ▼
ECS Fargate — API service (FastAPI)     ←── uvicorn, port 8000
    │
    ├── ECS Fargate — Celery Worker     ←── dispatch tasks, scales independently
    │
    └── ECS Fargate — Celery Beat       ←── desiredCount=1 (singleton scheduler)
                │
                ▼
    ┌──────────────────────────┐   ┌──────────────────────────┐
    │  RDS PostgreSQL + PostGIS│   │  ElastiCache Redis       │
    │  (private subnet)        │   │  db0: locations + pubsub │
    └──────────────────────────┘   │  db1: Celery broker      │
                                   └──────────────────────────┘

Service Mapping

Docker Compose service	AWS service	Notes
`backend`	ECS Fargate (`rideflow-api` task def)	ALB target group on port 8000
`celery-worker`	ECS Fargate (`rideflow-worker` task def)	Scale independently from the API
`celery-beat`	ECS Fargate (`rideflow-beat` task def)	`desiredCount=1` — must be singleton
`postgres`	RDS PostgreSQL 18.3	PostGIS enabled; runs in private VPC
`redis`	ElastiCache Redis (cluster mode off)	Two logical databases: 0 and 1

Environment Variables (production)

For ECS, set these values in each task definition (or load them from Parameter Store) in place of the environment: block used in Docker Compose:

DATABASE_URL=postgresql+asyncpg://<user>:<pass>@<rds-endpoint>:5432/rideflow
REDIS_URL=redis://<elasticache-endpoint>:6379/0
CELERY_BROKER_URL=redis://<elasticache-endpoint>:6379/1
DEMO_MODE=false
CORS_ORIGINS=https://<your-frontend-url>

Key Operational Notes

Celery Beat must run as a single instance. If two Beat processes run simultaneously, every scheduled task (including the stuck-ride sweeper) fires twice, causing double-cancellations. Set desiredCount=1 on the Beat ECS service. ECS restarts it automatically on crash — the only cost is missed sweep cycles during the brief restart window, which is acceptable.

RDS requires PostGIS to be enabled manually. After provisioning the RDS instance, connect once and run:

CREATE EXTENSION IF NOT EXISTS postgis;

All subsequent Alembic migrations assume PostGIS is available.

Run migrations as a one-off ECS task before starting the API.

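aws ecs run-task \
  --cluster rideflow-ai \
  --task-definition rideflow-api \
  --overrides '{"containerOverrides":[{"name":"app","command":["alembic","upgrade","head"]}]}' \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[<subnet-id>],securityGroups=[<sg-id>],assignPublicIp=ENABLED}"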

Security group rules (minimum):

Source	Destination	Port	Purpose
ALB	ECS API tasks	8000	HTTP traffic
ECS tasks	RDS	5432	Database
ECS tasks	ElastiCache	6379	Redis
Internet (via CloudFront)	ALB	80	HTTP (TLS terminates at CloudFront)

RDS and ElastiCache must not have public internet access.

Deployment Steps (conceptual)

# 1. Build and push image to ECR
aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <account>.dkr.ecr.<region>.amazonaws.com
docker build -t rideflow-backend -f infrastructure/Dockerfile.backend .
docker tag rideflow-backend:latest <account>.dkr.ecr.<region>.amazonaws.com/rideflow-backend:latest
docker push <account>.dkr.ecr.<region>.amazonaws.com/rideflow-backend:latest

# 2. Create ECS cluster
aws ecs create-cluster --cluster-name rideflow-ai

# 3. Register task definitions (API, Worker, Beat) — same image, different commands:
#    API:     uvicorn app.main:app --host 0.0.0.0 --port 8000
#    Worker:  celery -A app.core.celery_app worker --loglevel=info --concurrency=4
#    Beat:    celery -A app.core.celery_app beat   --loglevel=info

# 4. Run DB migrations (before the API service starts)
aws ecs run-task --cluster rideflow-ai --task-definition rideflow-api \
  --overrides '{"containerOverrides":[{"name":"app","command":["alembic","upgrade","head"]}]}'

# 5. Create ECS services
aws ecs create-service --cluster rideflow-ai --service-name rideflow-api    --desired-count 2 ...
aws ecs create-service --cluster rideflow-ai --service-name rideflow-worker --desired-count 2 ...
aws ecs create-service --cluster rideflow-ai --service-name rideflow-beat   --desired-count 1 ...

Running Tests

Tests require the database and Redis to be running. Start the backend container as well if you plan to run the tests inside it:

docker-compose up -d postgres redis backend

Then run inside the backend container (preferred — avoids local Python setup):

docker-compose exec backend pytest tests/ -v

Or locally if you have a Python environment with dependencies installed:

cd backend
DATABASE_URL=postgresql+asyncpg://rideflow:rideflow_dev@localhost:5432/rideflow \
REDIS_URL=redis://localhost:6379/0 \
CELERY_BROKER_URL=redis://localhost:6379/1 \
pytest tests/ -v

Test Coverage

File	What it tests
`tests/test_sweeper.py`	Stuck-ride watchdog: ignores fresh/terminal/partially-assigned rides; cancels orphaned `searching_driver`; re-enqueues stuck `requested`; correct counts
`tests/test_dispatch_concurrency.py`	`SELECT FOR UPDATE SKIP LOCKED` under load: 5 simultaneous dispatches against 1 driver → exactly 1 assignment; 3 vs 3 → no double-assignment
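
The concurrency scenarios in the table can be mimicked in memory to show why skipping locked rows prevents double-assignment. This is an analogy, not the project's test or dispatch code: a non-blocking `threading.Lock` stands in for a row lock taken with `SELECT ... FOR UPDATE SKIP LOCKED`, and the `Driver`, `try_assign`, and `run_concurrent` names are hypothetical.

```python
import threading


class Driver:
    def __init__(self, driver_id):
        self.id = driver_id
        self.row_lock = threading.Lock()  # stands in for the database row lock
        self.assigned_ride = None


def try_assign(ride_id, candidates):
    """Claim the first candidate that is neither locked nor already assigned."""
    for driver in candidates:
        # Non-blocking acquire mirrors SKIP LOCKED: do not wait on a busy row.
        if not driver.row_lock.acquire(blocking=False):
            continue
        try:
            if driver.assigned_ride is None:
                driver.assigned_ride = ride_id
                return driver.id
        finally:
            driver.row_lock.release()
    return None  # nobody available; a real worker would retry or expand the radius


def run_concurrent(ride_ids, drivers):
    """Dispatch every ride from its own thread and collect the outcomes."""
    results = {}

    def worker(ride_id):
        results[ride_id] = try_assign(ride_id, drivers)

    threads = [threading.Thread(target=worker, args=(rid,)) for rid in ride_ids]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


outcome = run_concurrent([f"ride-{i}" for i in range(5)], [Driver("d1")])
winners = [rid for rid, driver_id in outcome.items() if driver_id is not None]
print(len(winners))  # exactly one ride gets the single driver
```

Because the availability check and the assignment both happen while the lock is held, two workers can never claim the same driver, and a worker that finds the lock taken moves on instead of queueing behind it.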

Database Commands

# Connect to PostgreSQL
docker-compose exec postgres psql -U rideflow -d rideflow

# Useful queries
SELECT status, COUNT(*) FROM rides GROUP BY status;
SELECT id, name, status, ST_AsText(location::geometry) FROM drivers;
SELECT * FROM dispatch_logs ORDER BY created_at DESC LIMIT 10;
SELECT event_type, created_at FROM ride_events WHERE ride_id = '<id>';

# Run migrations
docker-compose exec backend alembic upgrade head

Known Limitations & Design Tradeoffs

These are deliberate tradeoffs for a demo/portfolio deployment, not oversights.

Area	Current Design	Why	Production Upgrade
Single-node PostgreSQL	No HA, no read replicas	Sufficient for demo; HA adds operational complexity out of scope	RDS Multi-AZ + read replicas for reporting queries
Surge formula	Heuristic demand/supply ratio per geohash	No historical data to calibrate; transparent and tweakable	ML model trained on time-of-day, weather, and event signals
Redis Pub/Sub	Fire-and-forget, no persistence	Simplest fanout; events that fire while no subscriber is connected are lost	Redis Streams for durable, replayable event delivery
Single WebSocket server	All connections on one process	Fine for demo; WS state cannot be shared across processes	Dedicated gateway tier with shared session store (Redis)
No authentication	Open API endpoints	Demo purposes; auth would add 30+ endpoints and obscure dispatch patterns	JWT or session-based auth on all write endpoints
DBSCAN (detection, not prediction)	Reacts to current demand density	No historical data for forecasting model	Feed time-of-day, weather, and events into a supervised regression model

Implemented Recoveries

These gaps were identified and fixed during development — not left as upgrade paths:

Worker crash recovery — Celery Beat sweeper cancels rides stuck in searching_driver / driver_offered beyond 5 minutes and frees any partially-claimed driver
Ride creation idempotency — Idempotency-Key header returns existing ride on duplicate request without creating duplicates
WebSocket missed-event replay — ?last_event_id=<uuid> on reconnect replays events from the durable ride_events log before resuming the live stream
Driver decline path — driver_offered FSM state with per-offer Redis key; PATCH /rides/{id}/decline_offer signals the dispatch worker to try the next candidate
AI loop isolation — DBSCAN loop moved from asyncio.create_task (tied to one uvicorn worker) to Celery Beat with a Redis flag; survives API restarts and is safe with multiple workers
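
The missed-event replay in the list above can be sketched as a lookup over an ordered event log. This is a minimal illustration, assuming the log is ordered oldest first; the `RideEvent` dataclass, the `events_to_replay` helper, and the fallback choices (no replay on a first connection, full replay when the ID is unknown) are assumptions and not confirmed behaviour of the project.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RideEvent:
    # Hypothetical shape; the durable log is the ride_events table.
    id: str
    event_type: str


def events_to_replay(log: List[RideEvent], last_event_id: Optional[str]) -> List[RideEvent]:
    """Return the events a reconnecting client has not seen yet."""
    if last_event_id is None:
        return []  # assumed: a first connection starts from the live stream
    for index, event in enumerate(log):
        if event.id == last_event_id:
            return log[index + 1:]
    # Assumed fallback: an unknown ID replays everything rather than dropping events.
    return list(log)


log = [
    RideEvent("e1", "requested"),
    RideEvent("e2", "searching_driver"),
    RideEvent("e3", "driver_offered"),
]
print([e.id for e in events_to_replay(log, "e1")])
```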

Re-deploying to AWS After Code Changes

Use this flow whenever you push new backend code and need to update ECS.

# 1. Rebuild and push updated image to ECR
aws ecr get-login-password --region <region> | \
  docker login --username AWS --password-stdin <account>.dkr.ecr.<region>.amazonaws.com

docker build -t rideflow-backend -f infrastructure/Dockerfile.backend .
docker tag rideflow-backend:latest \
  <account>.dkr.ecr.<region>.amazonaws.com/rideflow-backend:latest
docker push <account>.dkr.ecr.<region>.amazonaws.com/rideflow-backend:latest

# 2. Run any new Alembic migrations (e.g. 002_ride_idempotency_key)
aws ecs run-task \
  --cluster rideflow-ai \
  --task-definition rideflow-api \
  --overrides '{"containerOverrides":[{"name":"app","command":["alembic","upgrade","head"]}]}' \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[<subnet-id>],securityGroups=[<sg-id>],assignPublicIp=ENABLED}"

# 3. Force a new deployment on every ECS service (picks up the new image)
aws ecs update-service --cluster rideflow-ai --service rideflow-api    --force-new-deployment
aws ecs update-service --cluster rideflow-ai --service rideflow-worker --force-new-deployment
aws ecs update-service --cluster rideflow-ai --service rideflow-beat   --force-new-deployment

# 4. Watch rollout — wait for runningCount to reach desiredCount on each service
aws ecs describe-services --cluster rideflow-ai \
  --services rideflow-api rideflow-worker rideflow-beat \
  --query 'services[*].{name:serviceName,desired:desiredCount,running:runningCount}'

License

MIT License. See LICENSE.
