RideFlow

Real-Time Ride Dispatch System with Operational AI

A production-style distributed backend built to implement every concept tested in the "Design Uber/Lyft" system design interview — not as a clone, but as a working, deployable system with a live interactive demo.

What This Project Is

RideFlow is a backend engineering project, not a product. The goal is to build and demonstrate every component that appears in the "Design Uber" system design interview — and explain why each decision was made.

This is not:

  • An Uber clone with user management or authentication
  • A chatbot or LLM wrapper
  • A tutorial project with simplified patterns

This is:

  • A working geospatial dispatch engine with SELECT FOR UPDATE SKIP LOCKED
  • An 8-state ride lifecycle enforced by a proper finite state machine
  • A real-time WebSocket system backed by Redis Pub/Sub fan-out
  • An AI layer (DBSCAN clustering) that detects demand hotspots live and recommends driver repositioning
  • A system you can run, explain, and defend in an interview

Live Demo

rideflow-v1.vercel.app

| Page | URL | Purpose |
| --- | --- | --- |
| Playground | /playground | 4-step simulation with live AI analysis |
| Rider | /rider | Book a ride, watch dispatch in real time |
| Driver | /driver | Receive requests, complete trips |
| Admin | /admin | Fleet ops view with AI Operations panel |
| Architecture | /architecture | Full system design walkthrough |

System Overview

RIDER APP          DRIVER APP         ADMIN DASHBOARD        PLAYGROUND
(book rides)       (receive rides)    (ops + AI alerts)      (simulation)
      |                  |                     |                    |
      +------------------+---------------------+--------------------+
                                    |
                         API GATEWAY (FastAPI)
                                    |
              +---------------------+---------------------+
              |                     |                     |
      DISPATCH SERVICE       WEBSOCKET SERVICE       AI SERVICE
      (Celery workers)       (Redis Pub/Sub)         (DBSCAN loop)
              |                     |                     |
              +---------------------+---------------------+
                                    |
                               DATA LAYER
                    +--------------+------------------+
                    |                                 |
          PostgreSQL + PostGIS                     Redis
          - rides, drivers                         - driver locations (TTL)
          - ride state + event log                 - pub/sub channels
          - dispatch logs                          - demo driver ID set
          - demand_predictions

Key System Design Concepts

| Concept | Implementation | Problem it solves |
| --- | --- | --- |
| Geospatial driver lookup | PostGIS ST_DWithin + GiST spatial index | "How do you find the nearest driver?" |
| Race condition prevention | SELECT FOR UPDATE SKIP LOCKED | "How do you prevent double-assignment?" |
| Real-time fan-out | WebSocket + Redis Pub/Sub | "How do updates reach clients without polling?" |
| Ride lifecycle | 8-state FSM with enforced transitions | "How do you manage ride state?" |
| Async dispatch | Celery workers + Redis broker | "How do you keep the API fast under load?" |
| Driver liveness | Redis HASH with 30s TTL | "How does a driver go offline automatically?" |
| Demand AI | DBSCAN clustering on pickup coordinates | "How does the system detect demand hotspots as they form?" |
| Driver reposition | PostGIS ST_Distance nearest-driver query | "How does the system suggest driver repositioning?" |
| Fault recovery | Celery Beat sweeper that cancels rides stuck > 5 min | "What happens if a worker crashes mid-dispatch?" |
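
The double-assignment question is the easiest one to see in motion. The sketch below is an in-process analogy, not the project's dispatch code: a non-blocking `threading.Lock` stands in for a PostgreSQL row lock, so a worker that finds a driver already locked moves on instead of waiting, which is the behavior `SKIP LOCKED` provides. `Driver`, `dispatch`, and `run_concurrent_dispatch` are hypothetical names introduced for illustration.

```python
import threading
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Driver:
    """Hypothetical stand-in for one row of the drivers table."""
    driver_id: str
    assigned_ride: Optional[str] = None
    row_lock: threading.Lock = field(default_factory=threading.Lock)


def dispatch(ride_id: str, candidates: List[Driver]) -> Optional[str]:
    """Claim the first free candidate, skipping any row another worker holds."""
    for driver in candidates:
        # Non-blocking acquire: like SKIP LOCKED, never wait on a held row.
        if not driver.row_lock.acquire(blocking=False):
            continue
        try:
            if driver.assigned_ride is None:
                driver.assigned_ride = ride_id
                return driver.driver_id
        finally:
            driver.row_lock.release()
    return None  # no free driver found on this attempt


def run_concurrent_dispatch(n_rides: int, drivers: List[Driver]) -> List[Optional[str]]:
    """Run one dispatch per ride on its own thread and collect the results."""
    results: List[Optional[str]] = [None] * n_rides

    def worker(index: int) -> None:
        results[index] = dispatch(f"ride-{index}", drivers)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_rides)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Running five dispatches against a single driver yields exactly one non-`None` result. A worker that skips a locked row may come back empty-handed even though the row is released a moment later, which is why a retry policy normally accompanies this pattern.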

Tech Stack

Backend

| Component | Technology |
| --- | --- |
| API framework | FastAPI + AsyncIO |
| Task queue | Celery + Redis broker |
| Primary database | PostgreSQL 16 + PostGIS |
| Cache / location store | Redis |
| ORM + migrations | SQLAlchemy (async) + Alembic |
| AI / clustering | scikit-learn (DBSCAN) + NumPy |

Frontend

| Component | Technology |
| --- | --- |
| Framework | React 18 + TypeScript |
| Build tool | Vite |
| Maps | Leaflet.js + react-leaflet (OpenStreetMap, no API key) |
| Real-time | Native WebSocket API |
| Styling | Custom CSS with CSS variables (light/dark themes) |

Infrastructure

| Component | Technology |
| --- | --- |
| Containerization | Docker + Docker Compose |
| Image registry | AWS ECR |
| Cloud runtime | AWS ECS Fargate |
| HTTPS layer | AWS CloudFront |
| Load balancing | AWS ALB |
| Primary database | AWS RDS PostgreSQL 18.3 |
| Cache + broker | AWS ElastiCache Redis |
| Logs | AWS CloudWatch Logs |
| Frontend deployment | Vercel |

Getting Started

Prerequisites

  • Docker Desktop (version 24+)
  • Docker Compose (included with Docker Desktop)
  • Node.js 20+ (only if you want to run frontend locally)

No local Python installation required.

Run Locally

git clone https://github.com/CodeTirtho97/RideFlow.git rideflow-ai
cd rideflow-ai

docker-compose up --build

Backend is available at http://localhost:8000 (Swagger: http://localhost:8000/docs).

Frontend options:

  • Use the deployed app: https://rideflow-v1.vercel.app
  • Or run locally:
cd frontend
npm install
npm run dev

Open http://localhost:3000 to view the local frontend.

Services started:

| Service | Port | Description |
| --- | --- | --- |
| Backend | 8000 | FastAPI (API + WebSocket) |
| Celery worker | - | Dispatch task worker |
| Celery Beat | - | Periodic task scheduler (stuck-ride sweeper, every 60 s) |
| PostgreSQL | 5432 | Primary database |
| Redis | 6379 | Cache + broker + pub/sub |

Environment Variables

# Backend
DATABASE_URL=postgresql+asyncpg://rideflow:rideflow_dev@localhost:5432/rideflow
REDIS_URL=redis://localhost:6379/0
CELERY_BROKER_URL=redis://localhost:6379/1
DEMO_MODE=true

# Optional (for deployed frontend access)
CORS_ORIGINS=https://rideflow-v1.vercel.app
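
A minimal sketch of how these variables could be read into a typed settings object. This is not the contents of the project's config.py: the `Settings` and `load_settings` names are hypothetical, and treating `CORS_ORIGINS` as a comma-separated list is an assumption.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    celery_broker_url: str
    demo_mode: bool
    cors_origins: List[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping; required keys raise KeyError."""
    return Settings(
        database_url=env["DATABASE_URL"],
        redis_url=env["REDIS_URL"],
        celery_broker_url=env["CELERY_BROKER_URL"],
        # Anything other than "true" (case-insensitive) disables demo mode.
        demo_mode=env.get("DEMO_MODE", "false").strip().lower() == "true",
        # Assumption: multiple origins are separated by commas.
        cors_origins=[o.strip() for o in env.get("CORS_ORIGINS", "").split(",") if o.strip()],
    )
```

Failing fast on a missing `DATABASE_URL` or `REDIS_URL` keeps a misconfigured container from starting and then erroring on its first request.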

Running the Playground Demo

Navigate to http://localhost:3000/playground.

  1. Select a preset — Light, Moderate, or Dense traffic scenario
  2. Step 1: Seed Drivers — places drivers at real Bengaluru GPS coordinates
  3. Step 2: Start Movement — begins random-walk location heartbeats every 2s
  4. Step 3: Fire Requests — fires all ride requests simultaneously, triggering parallel Celery dispatch
  5. Step 4: Start AI Loop — runs DBSCAN every 8s on unmatched rides, publishes hotspot alerts via Redis → WebSocket

Open /admin in another tab to see the AI Operations panel update in real time.
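
The 2-second heartbeats from Step 2 are what keep a driver "online": each location write refreshes a 30-second TTL, and a driver who stops sending updates expires on their own. The sketch below shows that pattern against any client exposing redis-py's `hset`, `expire`, and `exists` methods. The key format and the function names are hypothetical, not taken from the project's location.py.

```python
import time
from typing import Optional

LOCATION_TTL_S = 30        # TTL stated in this README
HEARTBEAT_INTERVAL_S = 2   # demo heartbeat interval stated in this README


def location_key(driver_id: str) -> str:
    """Hypothetical key format for a driver's location hash."""
    return f"driver:{driver_id}:location"


def record_heartbeat(redis_client, driver_id: str, lat: float, lng: float,
                     now: Optional[float] = None) -> None:
    """Write the latest position and push the expiry another 30 s out."""
    key = location_key(driver_id)
    timestamp = time.time() if now is None else now
    redis_client.hset(key, mapping={"lat": lat, "lng": lng, "ts": timestamp})
    redis_client.expire(key, LOCATION_TTL_S)


def is_online(redis_client, driver_id: str) -> bool:
    """A driver is online while their location hash has not expired."""
    return bool(redis_client.exists(location_key(driver_id)))


def missed_heartbeats_before_offline() -> int:
    """How many consecutive heartbeats can be lost before the key expires."""
    return LOCATION_TTL_S // HEARTBEAT_INTERVAL_S
```

With these numbers a driver can miss 15 consecutive heartbeats before dropping out of the available pool, so a brief network blip does not take them offline.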

Simulation Presets

| Preset | Drivers | Requests | Radius | What it shows |
| --- | --- | --- | --- | --- |
| Light Traffic | 10 | 8 | 10 km | Happy path: clean dispatch, everyone matched |
| Moderate Traffic | 35 | 35 | 7 km | Balanced: retries, radius expansion 3→5 km |
| Dense (Peak Hour) | 70 | 100 | 5 km | Saturation: surge pricing, cancellations, 2–4 AI hotspot clusters |

The Dense preset simulates Whitefield, Bengaluru at peak hour — 100 ride requests in a 5 km zone with 70 drivers. DBSCAN detects 2–4 demand clusters and recommends which idle drivers to reposition.
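
The retries and 3→5 km radius expansion mentioned for the Moderate preset can be pictured as a schedule of search radii, one per dispatch attempt. The sketch below is an illustration only: the number of attempts per radius is an assumption, and `search_radii_km` is a hypothetical helper rather than the project's retry.py.

```python
from typing import Iterator


def search_radii_km(initial_km: float = 3.0,
                    expanded_km: float = 5.0,
                    attempts_per_radius: int = 2) -> Iterator[float]:
    """Yield the search radius for each successive dispatch attempt."""
    for radius in (initial_km, expanded_km):
        # Assumption: retry a fixed number of times before widening the search.
        for _ in range(attempts_per_radius):
            yield radius


def radius_for_attempt(attempt: int) -> float:
    """Radius for a zero-based attempt number; stays at the widest radius."""
    schedule = list(search_radii_km())
    return schedule[min(attempt, len(schedule) - 1)]
```

Widening the radius only after the narrow search has failed keeps matches close when supply allows, at the cost of a slightly longer wait in sparse areas.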

Demo Scale Reference

| Factor | Value |
| --- | --- |
| Time compression | 1 real second ≈ 1 minute travel time |
| Location update interval | Every 2 seconds |
| Driver movement step | ±0.0008° (~88 m per update) |
| Search radius | 3 km → 5 km on expansion |
| AI loop interval | Every 8 seconds |
| DBSCAN epsilon | 1.5 km cluster radius |
| DBSCAN min_samples | 3 requests to form a cluster |
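
The ~88 m figure for the movement step follows from the length of one degree of latitude. A quick check, assuming a spherical Earth and a Bengaluru latitude of about 12.97°N (both are assumptions for illustration, and `degrees_to_metres` is a hypothetical helper):

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius, spherical approximation


def degrees_to_metres(delta_deg: float, latitude_deg: float = 0.0, axis: str = "lat") -> float:
    """Convert a small angular step into an approximate ground distance."""
    metres_per_degree = math.pi * EARTH_RADIUS_M / 180.0
    if axis == "lng":
        # Meridians converge toward the poles, so a degree of longitude shrinks.
        metres_per_degree *= math.cos(math.radians(latitude_deg))
    return abs(delta_deg) * metres_per_degree


north_step = degrees_to_metres(0.0008)
east_step = degrees_to_metres(0.0008, latitude_deg=12.97, axis="lng")
print(f"north-south: {north_step:.1f} m, east-west: {east_step:.1f} m")
```

Both values land within a couple of metres of the quoted figure. East-west steps come out slightly shorter than north-south steps at this latitude.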

Project Structure

rideflow-ai/
├── backend/
│   ├── app/
│   │   ├── api/
│   │   │   ├── rides.py           # Ride CRUD + state transitions
│   │   │   ├── drivers.py         # Driver registration + location
│   │   │   ├── metrics.py         # System-wide metrics endpoint
│   │   │   ├── websocket.py       # WebSocket endpoints (/ws/admin, /ws/ride, /ws/driver)
│   │   │   ├── demo.py            # Demo simulation endpoints (DEMO_MODE only)
│   │   │   └── ai.py              # AI prediction loop endpoint (DEMO_MODE only)
│   │   ├── services/
│   │   │   ├── ai/
│   │   │   │   └── demand_prediction.py   # DBSCAN clustering + hotspot detection
│   │   │   ├── dispatch/
│   │   │   │   ├── surge.py               # Surge multiplier calculation
│   │   │   │   └── retry.py               # Retry policy + radius expansion
│   │   │   ├── ride/
│   │   │   │   └── state_machine.py       # 8-state FSM with enforced transitions
│   │   │   ├── driver/
│   │   │   │   ├── location.py            # Redis HASH location writes
│   │   │   │   └── status.py              # TTL-based availability
│   │   │   └── websocket/
│   │   │       ├── manager.py             # WebSocket connection registry
│   │   │       └── pubsub.py              # Redis Pub/Sub subscriber + router
│   │   ├── workers/
│   │   │   ├── dispatch_task.py           # Celery dispatch task (find, lock, assign)
│   │   │   └── tasks.py                   # health_check + sweep_stuck_rides (Beat, every 60s)
│   │   ├── models/
│   │   │   ├── ride.py                    # Ride, RideEvent, DispatchLog, DemandPrediction
│   │   │   └── driver.py                  # Driver model
│   │   └── core/
│   │       ├── config.py                  # Settings from env vars
│   │       ├── database.py                # Async PostgreSQL + session factory
│   │       └── redis_client.py            # Redis connection pool
│   ├── alembic/                           # Database migrations
│   ├── tests/
│   │   ├── conftest.py                    # DB/Redis fixtures + test helpers
│   │   ├── test_sweeper.py                # Stuck-ride watchdog tests (7 cases)
│   │   └── test_dispatch_concurrency.py   # SELECT FOR UPDATE SKIP LOCKED under load
│   ├── pytest.ini
│   ├── requirements.txt
├── infrastructure/
│   └── Dockerfile.backend
│
├── frontend/
│   └── src/
│       ├── pages/
│       │   ├── LandingPage.tsx
│       │   ├── DemoPage.tsx               # Playground — 4-step simulation
│       │   ├── RiderDashboard.tsx
│       │   ├── DriverDashboard.tsx
│       │   ├── AdminDashboard.tsx         # Fleet ops + AI Operations panel
│       │   └── ArchitecturePage.tsx
│       ├── components/
│       │   ├── DispatchMap.tsx            # Leaflet map (drivers, trips, hotspot circles)
│       │   ├── EventLog.tsx               # Live dispatch event feed
│       │   ├── AppNav.tsx
│       │   └── Toast.tsx
│       ├── hooks/
│       │   ├── useWebSocket.ts            # WS connection + reconnect + message routing
│       │   └── useTheme.ts                # Light/dark mode toggle
│       └── api/
│           └── client.ts                  # Axios API client + typed interfaces
│
└── docker-compose.yml

API Reference

Interactive docs at http://localhost:8000/docs (Swagger UI, auto-generated by FastAPI).

# Rides
POST   /api/v1/rides                      Create ride request
GET    /api/v1/rides/{id}                 Get ride + current state
PATCH  /api/v1/rides/{id}/cancel          Cancel a ride
PATCH  /api/v1/rides/{id}/decline_offer   Driver declines offer (returns ride to searching_driver)
PATCH  /api/v1/rides/{id}/arrive          Driver arriving
PATCH  /api/v1/rides/{id}/start           Trip started
PATCH  /api/v1/rides/{id}/complete        Trip completed

# Drivers
POST   /api/v1/drivers                 Register driver
PATCH  /api/v1/drivers/{id}/location   Update GPS location
PATCH  /api/v1/drivers/{id}/status     Toggle availability

# Metrics
GET    /api/v1/metrics                 System-wide counts by status

# WebSocket
WS     /ws/ride/{ride_id}              Rider real-time updates
WS     /ws/driver/{driver_id}          Driver real-time updates
WS     /ws/admin                       Admin + AI alerts stream

# Demo (DEMO_MODE=true only)
POST   /api/demo/seed                  Seed drivers at Bengaluru coordinates
POST   /api/demo/move                  Start location movement loop
POST   /api/demo/requests              Fire bulk ride requests
POST   /api/demo/ai/run                Start DBSCAN hotspot detection loop
POST   /api/demo/ai/stop               Stop AI loop
POST   /api/demo/reset                 Clear all demo data
GET    /api/demo/presets               Available simulation presets
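
Each ride endpoint above drives one transition of the ride state machine. The sketch below shows one way to enforce such transitions with a lookup table. Only `requested`, `searching_driver`, and `driver_offered` are state names that appear in this README; the other five names and the exact transition table are assumptions made to reach eight states, not the contents of state_machine.py.

```python
from enum import Enum
from typing import Dict, FrozenSet


class RideState(str, Enum):
    REQUESTED = "requested"
    SEARCHING_DRIVER = "searching_driver"
    DRIVER_OFFERED = "driver_offered"
    DRIVER_ASSIGNED = "driver_assigned"  # assumed name
    DRIVER_ARRIVED = "driver_arrived"    # assumed name
    IN_PROGRESS = "in_progress"          # assumed name
    COMPLETED = "completed"              # assumed name
    CANCELLED = "cancelled"              # assumed name


class InvalidTransition(ValueError):
    """Raised when a requested state change is not in the table."""


S = RideState
ALLOWED: Dict[RideState, FrozenSet[RideState]] = {
    S.REQUESTED: frozenset({S.SEARCHING_DRIVER, S.CANCELLED}),
    S.SEARCHING_DRIVER: frozenset({S.DRIVER_OFFERED, S.CANCELLED}),
    # A declined offer sends the ride back to searching_driver.
    S.DRIVER_OFFERED: frozenset({S.DRIVER_ASSIGNED, S.SEARCHING_DRIVER, S.CANCELLED}),
    S.DRIVER_ASSIGNED: frozenset({S.DRIVER_ARRIVED, S.CANCELLED}),
    S.DRIVER_ARRIVED: frozenset({S.IN_PROGRESS, S.CANCELLED}),
    S.IN_PROGRESS: frozenset({S.COMPLETED}),
    S.COMPLETED: frozenset(),   # terminal
    S.CANCELLED: frozenset(),   # terminal
}


def transition(current: RideState, target: RideState) -> RideState:
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise InvalidTransition(f"{current.value} -> {target.value} is not allowed")
    return target
```

Keeping the rules in a single table means an endpoint handler only has to call `transition` and map `InvalidTransition` to an HTTP error, rather than re-checking preconditions itself.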

AI Layer — DBSCAN Demand Detection

The AI service runs as a background loop triggered from the Playground demo (Step 4).

How it works:

  1. Queries all unmatched rides (requested + searching_driver status) from PostgreSQL
  2. Extracts pickup coordinates (lat, lng)
  3. Runs DBSCAN with eps=1.5km, min_samples=3 to find geographic clusters
  4. For each cluster: calculates demand, idle driver count (ST_DWithin), shortage, confidence
  5. Queries 3 nearest idle drivers per hotspot using ST_Distance
  6. Computes surge multiplier, deploy recommendation, and ETA to resolve
  7. Publishes all hotspots as one batch to ai:alerts Redis channel
  8. WebSocket fans out to Admin Dashboard and Playground simultaneously
  9. Repeats every 8 seconds; stops when no unmatched rides remain
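
To make step 3 concrete, here is a dependency-free sketch of DBSCAN over pickup coordinates using great-circle distance. The project itself uses scikit-learn's DBSCAN; this re-implementation only illustrates what `eps` and `min_samples` mean, and the function names and sample coordinates in the usage note are invented for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0088
NOISE = -1

Point = Tuple[float, float]  # (lat, lng) in degrees


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two (lat, lng) points in kilometres."""
    lat1, lng1 = math.radians(a[0]), math.radians(a[1])
    lat2, lng2 = math.radians(b[0]), math.radians(b[1])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def dbscan(points: Sequence[Point], eps_km: float = 1.5, min_samples: int = 3) -> List[int]:
    """Return one label per point: 0..k-1 for clusters, -1 for noise."""
    n = len(points)
    # Each neighbourhood includes the point itself, as in scikit-learn.
    neighbours = [
        [j for j in range(n) if haversine_km(points[i], points[j]) <= eps_km]
        for i in range(n)
    ]
    labels: List[Optional[int]] = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = NOISE  # may be re-labelled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        frontier = list(neighbours[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                frontier.extend(neighbours[j])  # core point: keep expanding
    return [NOISE if label is None else label for label in labels]
```

Given three pickups within a few hundred metres of each other, three more about 14 km away, and one isolated request, the function returns two clusters and labels the isolated request as noise. The pairwise neighbour search is quadratic, which is fine at the demo's scale of about 100 requests but is one reason to rely on the library implementation for larger inputs.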

What surfaces in the UI:

  • Red gradient circles on the Playground map (one per hotspot cluster)
  • Orange blinking rings on the 3 nearest idle driver markers (reposition targets)
  • AI Hotspot Analysis card: zone status, shortage, fare impact, deploy count, nearest drivers
  • Admin AI Operations card: fleet-level summary + per-zone recommendations

Docker Commands

# Build and start all services
docker-compose up --build

# Start without rebuilding
docker-compose up

# Run in background
docker-compose up -d

# View logs
docker-compose logs -f backend
docker-compose logs -f celery-worker

# Stop
docker-compose down

# Full wipe (removes volumes / database)
docker-compose down -v

# Rebuild single service
docker-compose up --build backend

AWS Deployment

The local Docker Compose setup maps 1-to-1 to AWS managed services. The same container images run in both environments — the only difference is the endpoints in environment variables.

AWS Services

| Service | Why it's used |
| --- | --- |
| Amazon ECR | Private Docker image registry. Stores rideflow-backend:latest; ECS pulls from here on every task launch and deployment. |
| Amazon ECS Fargate | Serverless container runtime for API, Worker, and Beat. No EC2 instances to patch; pay per task-second, and each service scales independently. |
| Application Load Balancer (ALB) | Routes HTTP traffic to healthy API tasks. Integrates with ECS service health checks and auto-scaling target tracking. |
| Amazon CloudFront | HTTPS CDN layer in front of the HTTP ALB. Provides a *.cloudfront.net TLS endpoint so the HTTPS Vercel frontend can call the backend without mixed-content errors. |
| Amazon RDS PostgreSQL 18.3 | Managed PostgreSQL with automated backups and patching. PostGIS extension enables ST_DWithin geospatial driver lookup and ST_Distance repositioning queries. |
| Amazon ElastiCache Redis | Managed Redis used for three separate concerns: driver location TTL hashes (db 0), Celery task broker (db 1), and Redis Pub/Sub fan-out to WebSocket clients (db 0). |
| Amazon CloudWatch Logs | Central log sink for all containers. All ECS task stdout/stderr streams to /ecs/rideflow-api with per-service prefixes (ecs/, worker/, beat/). |

Architecture

Internet (HTTPS)
    │
    ▼
CloudFront  ←── free *.cloudfront.net TLS; sits in front of the HTTP ALB
    │
    ▼
ALB  ←── routes to healthy API tasks; health check on /api/health
    │
    ▼
ECS Fargate — API service (FastAPI)     ←── uvicorn, port 8000
    │
    ├── ECS Fargate — Celery Worker     ←── dispatch tasks, scales independently
    │
    └── ECS Fargate — Celery Beat       ←── desiredCount=1 (singleton scheduler)
                │
                ▼
    ┌──────────────────────────┐   ┌──────────────────────────┐
    │  RDS PostgreSQL + PostGIS│   │  ElastiCache Redis       │
    │  (private subnet)        │   │  db0: locations + pubsub │
    └──────────────────────────┘   │  db1: Celery broker      │
                                   └──────────────────────────┘

Service Mapping

| Docker Compose service | AWS service | Notes |
| --- | --- | --- |
| backend | ECS Fargate (rideflow-api task def) | ALB target group on port 8000 |
| celery-worker | ECS Fargate (rideflow-worker task def) | Scale independently from the API |
| celery-beat | ECS Fargate (rideflow-beat task def) | desiredCount=1; must be singleton |
| postgres | RDS PostgreSQL 18.3 | PostGIS enabled; runs in private VPC |
| redis | ElastiCache Redis (cluster mode off) | Two logical databases: 0 and 1 |

Environment Variables (production)

For ECS, set these in place of the values from the Docker Compose environment: block, either directly in the task definitions or as Parameter Store secrets:

DATABASE_URL=postgresql+asyncpg://<user>:<pass>@<rds-endpoint>:5432/rideflow
REDIS_URL=redis://<elasticache-endpoint>:6379/0
CELERY_BROKER_URL=redis://<elasticache-endpoint>:6379/1
DEMO_MODE=false
CORS_ORIGINS=https://<your-frontend-url>

Key Operational Notes

Celery Beat must run as a single instance. If two Beat processes run simultaneously, every scheduled task (including the stuck-ride sweeper) fires twice, causing double-cancellations. Set desiredCount=1 on the Beat ECS service. ECS restarts it automatically on crash — the only cost is missed sweep cycles during the brief restart window, which is acceptable.

RDS requires PostGIS to be enabled manually. After provisioning the RDS instance, connect once and run:

CREATE EXTENSION IF NOT EXISTS postgis;

All subsequent Alembic migrations assume PostGIS is available.

Run migrations as a one-off ECS task before starting the API.

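aws ecs run-task \
  --cluster rideflow-ai \
  --task-definition rideflow-api \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[<subnet-id>],securityGroups=[<sg-id>],assignPublicIp=ENABLED}" \
  --overrides '{"containerOverrides":[{"name":"app","command":["alembic","upgrade","head"]}]}'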

Security group rules (minimum):

| Source | Destination | Port | Purpose |
| --- | --- | --- | --- |
| ALB | ECS API tasks | 8000 | HTTP traffic |
| ECS tasks | RDS | 5432 | Database |
| ECS tasks | ElastiCache | 6379 | Redis |
| Internet | ALB | 443 | HTTPS |

RDS and ElastiCache must not have public internet access.

Deployment Steps (conceptual)

# 1. Build and push image to ECR
aws ecr get-login-password | docker login --username AWS --password-stdin <account>.dkr.ecr.<region>.amazonaws.com
docker build -t rideflow-backend -f infrastructure/Dockerfile.backend .
docker tag rideflow-backend:latest <account>.dkr.ecr.<region>.amazonaws.com/rideflow-backend:latest
docker push <account>.dkr.ecr.<region>.amazonaws.com/rideflow-backend:latest

# 2. Create ECS cluster
aws ecs create-cluster --cluster-name rideflow-ai

# 3. Register task definitions (API, Worker, Beat) — same image, different commands:
#    API:     uvicorn app.main:app --host 0.0.0.0 --port 8000
#    Worker:  celery -A app.core.celery_app worker --loglevel=info --concurrency=4
#    Beat:    celery -A app.core.celery_app beat   --loglevel=info

# 4. Run DB migrations (before the API service starts)
aws ecs run-task --cluster rideflow-ai --task-definition rideflow-api \
  --overrides '{"containerOverrides":[{"name":"app","command":["alembic","upgrade","head"]}]}'

# 5. Create ECS services
aws ecs create-service --cluster rideflow-ai --service-name rideflow-api    --desired-count 2 ...
aws ecs create-service --cluster rideflow-ai --service-name rideflow-worker --desired-count 2 ...
aws ecs create-service --cluster rideflow-ai --service-name rideflow-beat   --desired-count 1 ...

Running Tests

Tests require the database and Redis to be running:

docker-compose up -d postgres redis

Then run the suite in a one-off backend container (preferred; avoids local Python setup, and works even when the backend service itself is not running):

docker-compose run --rm backend pytest tests/ -v

Or locally if you have a Python environment with dependencies installed:

cd backend
DATABASE_URL=postgresql+asyncpg://rideflow:rideflow_dev@localhost:5432/rideflow \
REDIS_URL=redis://localhost:6379/0 \
CELERY_BROKER_URL=redis://localhost:6379/1 \
pytest tests/ -v

Test Coverage

| File | What it tests |
| --- | --- |
| tests/test_sweeper.py | Stuck-ride watchdog: ignores fresh/terminal/partially-assigned rides; cancels orphaned searching_driver; re-enqueues stuck requested; correct counts |
| tests/test_dispatch_concurrency.py | SELECT FOR UPDATE SKIP LOCKED under load: 5 simultaneous dispatches against 1 driver → exactly 1 assignment; 3 vs 3 → no double-assignment |
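
The sweeper cases listed for test_sweeper.py boil down to a per-ride decision. The sketch below is one reading of those rules, not the project's tasks.py: the `Ride` dataclass, the terminal state names, and the choice to cancel stuck `driver_offered` rides alongside `searching_driver` are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable

STUCK_AFTER = timedelta(minutes=5)      # threshold stated in this README
TERMINAL = {"completed", "cancelled"}   # assumed terminal state names


@dataclass
class Ride:
    """Hypothetical minimal view of a ride row."""
    ride_id: str
    status: str
    updated_at: datetime


def sweep_decision(ride: Ride, now: datetime) -> str:
    """Return 'skip', 'requeue', or 'cancel' for a single ride."""
    if ride.status in TERMINAL or now - ride.updated_at <= STUCK_AFTER:
        return "skip"       # finished, or still fresh
    if ride.status == "requested":
        return "requeue"    # never picked up: hand it back to dispatch
    if ride.status in {"searching_driver", "driver_offered"}:
        return "cancel"     # orphaned mid-dispatch
    return "skip"           # any other state is left alone


def sweep(rides: Iterable[Ride], now: datetime) -> Dict[str, int]:
    """Tally the decision for every ride in one sweep cycle."""
    counts = {"skip": 0, "requeue": 0, "cancel": 0}
    for ride in rides:
        counts[sweep_decision(ride, now)] += 1
    return counts
```

Passing `now` in explicitly, rather than reading the clock inside the function, is what makes cases like "fresh ride is ignored" easy to test deterministically.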

Database Commands

# Connect to PostgreSQL
docker-compose exec postgres psql -U rideflow -d rideflow

# Useful queries
SELECT status, COUNT(*) FROM rides GROUP BY status;
SELECT id, name, status, ST_AsText(location::geometry) FROM drivers;
SELECT * FROM dispatch_logs ORDER BY created_at DESC LIMIT 10;
SELECT event_type, created_at FROM ride_events WHERE ride_id = '<id>';

# Run migrations
docker-compose exec backend alembic upgrade head

Known Limitations & Design Tradeoffs

These are deliberate tradeoffs for a demo/portfolio deployment, not oversights.

| Area | Current Design | Why | Production Upgrade |
| --- | --- | --- | --- |
| Single-node PostgreSQL | No HA, no read replicas | Sufficient for demo; HA adds operational complexity out of scope | RDS Multi-AZ + read replicas for reporting queries |
| Surge formula | Heuristic demand/supply ratio per geohash | No historical data to calibrate; transparent and tweakable | ML model trained on time-of-day, weather, and event signals |
| Redis Pub/Sub | Fire-and-forget, no persistence | Simplest fanout; events that fire while no subscriber is connected are lost | Redis Streams for durable, replayable event delivery |
| Single WebSocket server | All connections on one process | Fine for demo; WS state cannot be shared across processes | Dedicated gateway tier with shared session store (Redis) |
| No authentication | Open API endpoints | Demo purposes; auth would add 30+ endpoints and obscure dispatch patterns | JWT or session-based auth on all write endpoints |
| DBSCAN (detection, not prediction) | Reacts to current demand density | No historical data for forecasting model | Feed time-of-day, weather, and events into a supervised regression model |
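
For the surge row, a demand/supply heuristic can be as small as the sketch below. The README does not give the project's actual formula, so the clamping to a 1.0 floor and the 3.0 cap are assumptions, and `surge_multiplier` is a hypothetical name.

```python
def surge_multiplier(demand: int, supply: int, cap: float = 3.0) -> float:
    """Fare multiplier from unmatched requests vs. idle drivers in one zone."""
    if demand <= 0:
        return 1.0
    # Treat zero supply as one driver to avoid division by zero.
    ratio = demand / max(supply, 1)
    # Never discount below the base fare, never exceed the cap.
    return round(min(max(ratio, 1.0), cap), 2)
```

A formula this simple is easy to explain and tune by hand, which matches the "transparent and tweakable" rationale in the table. It also has no memory, so it cannot anticipate demand the way a trained model could.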

Implemented Recoveries

These gaps were identified and fixed during development — not left as upgrade paths:

  • Worker crash recovery: Celery Beat sweeper cancels rides stuck in searching_driver / driver_offered beyond 5 minutes and frees any partially-claimed driver
  • Ride creation idempotency: an Idempotency-Key header returns the existing ride on a duplicate request instead of creating a second one
  • WebSocket missed-event replay: ?last_event_id=<uuid> on reconnect replays events from the durable ride_events log before resuming the live stream
  • Driver decline path: driver_offered FSM state with a per-offer Redis key; PATCH /rides/{id}/decline_offer signals the dispatch worker to try the next candidate
  • AI loop isolation: DBSCAN loop moved from asyncio.create_task (tied to one uvicorn worker) to Celery Beat with a Redis flag; survives API restarts and is safe with multiple workers
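
The missed-event replay can be sketched as a slice over the ordered event log. This is an illustration under stated assumptions rather than the project's WebSocket handler: `RideEvent` and `events_to_replay` are hypothetical names, and replaying the full log when the id is missing or unknown is a design choice made here, not something the README specifies.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class RideEvent:
    """Hypothetical minimal view of a ride_events row."""
    event_id: str
    event_type: str


def events_to_replay(log: Sequence[RideEvent],
                     last_event_id: Optional[str]) -> List[RideEvent]:
    """Events the client has not seen, oldest first. `log` is ordered oldest-first."""
    if last_event_id is None:
        return list(log)  # first connection: send everything so far
    for index, event in enumerate(log):
        if event.event_id == last_event_id:
            return list(log[index + 1:])
    # Unknown id: replay everything rather than silently drop events.
    return list(log)
```

On reconnect the server would send the returned events first and only then attach the client to the live Pub/Sub stream, so the client sees a gap-free sequence.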

Re-deploying to AWS After Code Changes

Use this flow whenever you push new backend code and need to update ECS.

# 1. Rebuild and push updated image to ECR
aws ecr get-login-password --region <region> | \
  docker login --username AWS --password-stdin <account>.dkr.ecr.<region>.amazonaws.com

docker build -t rideflow-backend -f infrastructure/Dockerfile.backend .
docker tag rideflow-backend:latest \
  <account>.dkr.ecr.<region>.amazonaws.com/rideflow-backend:latest
docker push <account>.dkr.ecr.<region>.amazonaws.com/rideflow-backend:latest

# 2. Run any new Alembic migrations (e.g. 002_ride_idempotency_key)
aws ecs run-task \
  --cluster rideflow-ai \
  --task-definition rideflow-api \
  --overrides '{"containerOverrides":[{"name":"app","command":["alembic","upgrade","head"]}]}' \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[<subnet-id>],securityGroups=[<sg-id>],assignPublicIp=ENABLED}"

# 3. Force a new deployment on every ECS service (picks up the new image)
aws ecs update-service --cluster rideflow-ai --service rideflow-api    --force-new-deployment
aws ecs update-service --cluster rideflow-ai --service rideflow-worker --force-new-deployment
aws ecs update-service --cluster rideflow-ai --service rideflow-beat   --force-new-deployment

# 4. Watch rollout — wait for runningCount to reach desiredCount on each service
aws ecs describe-services --cluster rideflow-ai \
  --services rideflow-api rideflow-worker rideflow-beat \
  --query 'services[*].{name:serviceName,desired:desiredCount,running:runningCount}'

License

MIT License. See LICENSE.
