Closed
36 changes: 36 additions & 0 deletions .claude/CLAUDE.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
# Odds ML Orchestration (Claude Code Team)

This repository is used to maintain an orchestrated pipeline for **sports odds statistical research and ML predictions** (spreads, moneylines, totals) with a strict **rolling 5-day freshness window**.

## Non-negotiables

- **Freshness SLO**: all datasets used for training and prediction must be derived from the last **5 days** of collected data. If inputs are stale, the pipeline must fail fast and trigger backfill.
- **Canonical markets**:
  - **Spreads**: store **one canonical value per game** (favorite perspective OR `spread_magnitude` + `favorite_team`). Never average ±spread rows.
  - **Totals**: store **one canonical total** per game (not separate over/under rows).
  - **Moneylines**: convert American odds to **implied probability** for aggregation and ML.

See the project rules in `./.claude/rules/` for details.

## Repeatable Agent Team Template (copy/paste prompt)

Create an agent team for odds pipeline maintenance with these teammates and responsibilities. Require plan approval before implementing any schema or workflow changes. Put the lead into delegate mode after spawning.

- **TeamLead (delegate mode)**: coordination only, creates tasks, assigns owners, synthesizes results.
- **CollectorEngineer**: web scraping + API collectors, rate limits, idempotency, retries/backfills.
- **NormalizationSteward**: canonicalization of spreads/totals/moneylines; dedupe; invariants/tests.
- **DataFreshnessSRE**: rolling 5-day window enforcement; staleness detection; alerting/escalation.
- **MLTrainerEngineer**: feature views; training; evaluation; prediction artifacts.
- **CostQuotaAnalyst (optional)**: API credit/usage budgeting; schedule optimization.

Approval criteria for TeamLead:
- Reject any plan that changes market sign conventions.
- Reject any plan that allows stale inputs to silently pass.
- Reject any plan that introduces non-idempotent collectors.

## Operational contract (GitHub Actions)

GitHub Actions runs the scheduled pipeline. The code must support:
- **Collect** → **Normalize/Validate** → **Train** → **Predict/Report** → **Freshness Guard**
- Bounded backfill on staleness (5-day lookback).

47 changes: 47 additions & 0 deletions .claude/hooks/task_gate.py
@@ -0,0 +1,47 @@
from __future__ import annotations

import json
import os
import subprocess
import sys


def _run(cmd: list[str]) -> subprocess.CompletedProcess:
    return subprocess.run(cmd, capture_output=True, text=True)


def main() -> int:
    # Read hook JSON input (best-effort; TaskCompleted/TeammateIdle always fire).
    try:
        _ = json.loads(sys.stdin.read() or "{}")
    except Exception:
        pass

    # Only run gates if the odds pipeline package is importable in this environment.
    probe = _run([sys.executable, "-c", "import odds_pipeline"])
    if probe.returncode != 0:
        print("odds_pipeline not installed; skipping odds pipeline gates", file=sys.stderr)
        return 0

    v = _run([sys.executable, "-m", "odds_pipeline", "validate"])
    if v.returncode != 0:
        print("odds pipeline validation failed", file=sys.stderr)
        print(v.stdout, file=sys.stderr)
        print(v.stderr, file=sys.stderr)
        return 2

    # Only enforce freshness if DATABASE_URL is present (so local doc work isn't blocked).
    if os.getenv("DATABASE_URL"):
        f = _run([sys.executable, "-m", "odds_pipeline", "freshness-guard", "--window-days", "5"])
        if f.returncode != 0:
            print("freshness guard failed", file=sys.stderr)
            print(f.stdout, file=sys.stderr)
            print(f.stderr, file=sys.stderr)
            return 2

    return 0


if __name__ == "__main__":
    raise SystemExit(main())

34 changes: 34 additions & 0 deletions .claude/rules/freshness-and-windows.md
@@ -0,0 +1,34 @@
---
paths:
- "odds/**"
- ".github/workflows/odds-*.yml"
---

# Data freshness + rolling window rules (5 days)

This project’s ML outputs are only valid if inputs are **fresh** and the training/inference data is bounded to a **rolling 5-day window**.

## Freshness SLO

Fail the pipeline if any required input stream is stale.

Recommended defaults (tune per sport/market cadence):
- **Odds snapshots**: stale if `max(collected_at)` is older than **180 minutes**
- **Scores/finals**: stale if `max(collected_at)` is older than **24 hours**
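
A minimal sketch of how these thresholds could be checked, assuming the caller has already queried `max(collected_at)` per stream. The stream names `odds_snapshots` and `scores` and the function `stale_streams` are hypothetical, not part of the pipeline's confirmed API.

```python
from __future__ import annotations

from datetime import datetime, timedelta

# Thresholds taken from the recommended defaults; stream names are assumptions.
STALE_AFTER = {
    "odds_snapshots": timedelta(minutes=180),
    "scores": timedelta(hours=24),
}


def stale_streams(latest: dict[str, datetime | None], now: datetime) -> list[str]:
    """Return the streams whose newest collected_at is older than its threshold."""
    stale = []
    for stream, limit in STALE_AFTER.items():
        newest = latest.get(stream)
        # A stream with no rows at all is treated as stale.
        if newest is None or now - newest > limit:
            stale.append(stream)
    return stale
```

A non-empty result would be the signal to fail the pipeline and start a bounded backfill.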

## Rolling 5-day window

All downstream datasets (features, training rows, prediction features) must be computable from the last **5 days** of canonical + raw inputs.

Implementation requirements:
- Every compute job must accept a `--window-days` argument (default 5).
- Normalization must support backfill with an explicit `--lookback-days 5`.
- Any retention/pruning job must never delete within the active 5-day window.
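
One way these requirements might look in a job's entry point, as a sketch. The program name `compute-job` and the helpers `window_bounds` and `safe_to_prune` are illustrative assumptions; only the `--window-days` flag and its default come from the rules above.

```python
from __future__ import annotations

import argparse
from datetime import datetime, timedelta


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="compute-job")  # hypothetical job name
    parser.add_argument("--window-days", type=int, default=5)
    return parser


def window_bounds(now: datetime, window_days: int) -> tuple[datetime, datetime]:
    """Return the (start, end) bounds of the rolling window ending at `now`."""
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    return now - timedelta(days=window_days), now


def safe_to_prune(collected_at: datetime, now: datetime, window_days: int = 5) -> bool:
    """A row may be pruned only if it falls strictly before the active window."""
    start, _ = window_bounds(now, window_days)
    return collected_at < start
```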

## Backfill on staleness

If freshness checks fail:
- run a bounded backfill (lookback 5 days)
- re-run normalization + validation
- re-check freshness before training/predicting
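
The three steps above can be sketched as a single guard function. The callables are injected so the sketch stays independent of the real pipeline commands; `ensure_fresh` and its parameter names are hypothetical.

```python
from typing import Callable


def ensure_fresh(
    is_fresh: Callable[[], bool],
    backfill: Callable[[int], None],
    normalize_and_validate: Callable[[], None],
    lookback_days: int = 5,
) -> None:
    """Run one bounded backfill if inputs are stale; raise if still stale afterwards."""
    if is_fresh():
        return
    backfill(lookback_days)      # bounded: never looks back further than the window
    normalize_and_validate()     # re-derive canonical data from the backfilled raw rows
    if not is_fresh():
        # Fail fast instead of letting training or prediction run on stale inputs.
        raise RuntimeError("inputs still stale after bounded backfill")
```

Running the backfill at most once keeps the job bounded; a second failure surfaces as an error rather than a retry loop.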

25 changes: 25 additions & 0 deletions .claude/rules/ml-training-contracts.md
@@ -0,0 +1,25 @@
---
paths:
- "odds/**"
---

# ML training + prediction contracts

This project is designed so that models can be trained and evaluated deterministically from canonical data in the last 5 days.

## Requirements

- Training jobs must:
  - log the dataset time window used
  - record training timestamp and model version identifier
  - output evaluation metrics (at minimum: calibration/accuracy proxies appropriate to the target)
- Prediction jobs must:
  - refuse to run if freshness checks fail
  - attach the model version + data window to every prediction artifact
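
As an illustration of the prediction-side contract, the sketch below refuses to build an artifact when freshness has failed and always carries the model version and data window. The `PredictionArtifact` fields and the `make_prediction` helper are assumptions, not the repository's actual types.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PredictionArtifact:
    # Hypothetical artifact shape; field names are illustrative.
    event_id: str
    market: str
    probability: float
    model_version: str
    window_start: datetime
    window_end: datetime


def make_prediction(
    event_id: str,
    market: str,
    probability: float,
    model_version: str,
    window_start: datetime,
    window_end: datetime,
    inputs_fresh: bool,
) -> PredictionArtifact:
    if not inputs_fresh:
        raise RuntimeError("refusing to predict: freshness checks failed")
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be within [0, 1]")
    return PredictionArtifact(
        event_id, market, probability, model_version, window_start, window_end
    )
```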

## Targets

- **Spreads**: predict cover probability from the team perspective (requires consistent sign conventions).
- **Moneylines**: predict win probability (compare to implied probs for edge).
- **Totals**: predict over probability relative to the canonical total.
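
A rough sketch of how labels for these targets might be derived from final scores, assuming `team_spread` is signed from the team's own perspective (negative for a favorite) and that pushes are excluded from training. All three helper names are hypothetical.

```python
from __future__ import annotations


def cover_label(team_score: int, opponent_score: int, team_spread: float) -> bool | None:
    """True if the team covered, False if not, None on a push."""
    adjusted = (team_score - opponent_score) + team_spread
    if adjusted == 0:
        return None
    return adjusted > 0


def over_label(home_score: int, away_score: int, total: float) -> bool | None:
    """True if combined points went over the canonical total, None on a push."""
    points = home_score + away_score
    if points == total:
        return None
    return points > total


def edge(model_probability: float, implied_probability: float) -> float:
    """Model probability minus the book's implied probability."""
    return model_probability - implied_probability
```

Because `cover_label` depends on the sign of `team_spread`, a flipped sign convention silently inverts the training target, which is why the consistent-sign requirement matters here.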

43 changes: 43 additions & 0 deletions .claude/rules/odds-normalization.md
@@ -0,0 +1,43 @@
---
paths:
- "odds/**"
---

# Odds normalization rules (canonical markets)

These rules prevent sign-convention errors and ensure math is consistent across collectors, analytics, and ML training.

## Key distinction

**Favorite/underdog is determined by spread sign; home/away is venue and independent.** Do not conflate them.

## Spreads (one canonical value per game)

Sportsbooks/APIs often return **two outcomes per event** with opposite signs (e.g., -7 and +7). Those represent the *same* market.

Store exactly **one canonical record per event/book/collected_at** using either:

- **Option A (allowed)**: store the **favorite spread** (always negative or 0).
- **Option B (preferred)**: store `spread_magnitude` (always positive) and explicit `favorite_team`/`underdog_team`.

Never average raw `point` values that include both + and -.
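
A minimal sketch of Option B, assuming each raw event arrives as two `(team, point)` outcomes. The `CanonicalSpread` type and `canonicalize_spread` function are hypothetical, and treating a pick'em (0) as having no favorite is an assumption the rules above do not settle.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalSpread:
    spread_magnitude: float
    favorite_team: str | None
    underdog_team: str | None


def canonicalize_spread(outcomes: list[tuple[str, float]]) -> CanonicalSpread:
    """Collapse two mirrored (team, point) outcomes into one canonical record."""
    if len(outcomes) != 2:
        raise ValueError("expected exactly two outcomes")
    (team_a, point_a), (team_b, point_b) = outcomes
    if point_a + point_b != 0:
        raise ValueError("spread points are not mirrored")
    if point_a == 0:
        # Pick'em: no favorite (assumption; the source does not define this case).
        return CanonicalSpread(0.0, None, None)
    favorite, underdog = (team_a, team_b) if point_a < 0 else (team_b, team_a)
    return CanonicalSpread(abs(point_a), favorite, underdog)
```

Averaging the raw points of the same pair (-7 and +7) would give 0, which is the error this canonical form is meant to rule out.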

## Totals (one canonical value per game)

Over/Under are two prices on the same number. Store one `total` value plus `over_price`/`under_price`.
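
The same idea for totals, sketched under the assumption that each raw outcome is a `(name, point, american_price)` tuple named `Over` or `Under`. The function name and input shape are illustrative.

```python
def canonicalize_total(outcomes: list[tuple[str, float, int]]) -> dict:
    """Collapse Over/Under outcomes into one total with two prices."""
    by_name = {name.lower(): (point, price) for name, point, price in outcomes}
    if set(by_name) != {"over", "under"}:
        raise ValueError("expected exactly one Over and one Under outcome")
    over_point, over_price = by_name["over"]
    under_point, under_price = by_name["under"]
    if over_point != under_point:
        raise ValueError("Over and Under must share the same number")
    return {"total": over_point, "over_price": over_price, "under_price": under_price}
```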

## Moneylines (use implied probability for math)

American odds must be converted to implied probability before any aggregation or modeling.

For American odds \(o\):

- If \(o < 0\): \(p = |o| / (|o| + 100)\)
- If \(o > 0\): \(p = 100 / (o + 100)\)

Never average American odds directly.
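
The two formulas translate directly into code. The function name `implied_probability` is an assumption; the arithmetic is exactly the conversion stated above.

```python
def implied_probability(american_odds: float) -> float:
    """Convert American odds to implied probability (vig not removed)."""
    if american_odds < 0:
        return abs(american_odds) / (abs(american_odds) + 100)
    if american_odds > 0:
        return 100 / (american_odds + 100)
    raise ValueError("American odds of 0 are not valid")


# Aggregate in probability space, not in odds space.
prices = [-150, 130]
mean_probability = sum(implied_probability(o) for o in prices) / len(prices)
```

Averaging the raw prices instead would give -10, which is not a meaningful American price, whereas the mean of the implied probabilities is well defined.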

## Line movement convention (favorite perspective)

If tracking spread movement, compute deltas from the **favorite’s spread** (negative). This avoids mixing perspectives.
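
As a small sketch of this convention, the delta below is computed between two favorite-perspective spreads. The function name is hypothetical, and the case where the favorite changes teams between snapshots is deliberately not handled.

```python
def favorite_spread_delta(open_spread: float, current_spread: float) -> float:
    """Movement of the favorite's spread; negative means the favorite lays more points."""
    if open_spread > 0 or current_spread > 0:
        raise ValueError("favorite spreads must be negative or 0")
    return current_spread - open_spread
```

For example, a move from -6.5 to -7.5 yields -1.0: the line moved one point toward the favorite.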

25 changes: 25 additions & 0 deletions .claude/rules/storage-schema.md
@@ -0,0 +1,25 @@
---
paths:
- "odds/**"
---

# Storage and schema contract

GitHub Actions runners are ephemeral. **All pipeline state must live in persistent storage**.

## Required environment variables

- `DATABASE_URL`: Postgres connection string for the persistent store.
- `ODDS_API_KEY`: The Odds API key (or equivalent) for collectors.

## Schema principles

- **Raw tables**: append-only snapshots; never mutated in place.
- **Canonical tables**: derived from raw via normalization; can be re-derived deterministically.
- **Idempotency**: collectors must not create duplicates for the same `(source,event_id,market,bookmaker,collected_at)` tuple.
- **Time zone**: store timestamps in UTC and only convert for presentation.
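
The idempotency principle can be illustrated with a unique constraint on the dedupe tuple. This sketch uses the standard-library `sqlite3` module purely so it runs anywhere; the table name `raw_odds_snapshots` and the `payload` column are assumptions, and against Postgres the equivalent insert would use `ON CONFLICT DO NOTHING`.

```python
import sqlite3

DDL = """
CREATE TABLE raw_odds_snapshots (
    source       TEXT NOT NULL,
    event_id     TEXT NOT NULL,
    market       TEXT NOT NULL,
    bookmaker    TEXT NOT NULL,
    collected_at TEXT NOT NULL,  -- ISO-8601 timestamp in UTC
    payload      TEXT NOT NULL,
    UNIQUE (source, event_id, market, bookmaker, collected_at)
)
"""


def insert_snapshot(conn: sqlite3.Connection, row: tuple) -> None:
    """Append a snapshot; a rerun with the same key tuple is silently skipped."""
    conn.execute(
        "INSERT OR IGNORE INTO raw_odds_snapshots VALUES (?, ?, ?, ?, ?, ?)",
        row,
    )
```

Enforcing the constraint in the database, rather than in collector code, keeps reruns safe even when two collectors overlap.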

## Market canonicalization

Canonical tables must follow the rules in `odds-normalization.md`.

29 changes: 29 additions & 0 deletions .claude/settings.json
@@ -0,0 +1,29 @@
{
  "env": {
    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
  },
  "teammateMode": "in-process",
  "hooks": {
    "TaskCompleted": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "python \"$CLAUDE_PROJECT_DIR/.claude/hooks/task_gate.py\""
          }
        ]
      }
    ],
    "TeammateIdle": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "python \"$CLAUDE_PROJECT_DIR/.claude/hooks/task_gate.py\""
          }
        ]
      }
    ]
  }
}

39 changes: 39 additions & 0 deletions .claude/skills/betting-data-normalizing/SKILL.md
@@ -0,0 +1,39 @@
---
name: betting-data-normalizing
description: Mandatory normalization rules for spreads, totals, and moneylines. Use for ANY sports betting analytics or ML work in this repo.
---

# Betting Data Normalizing (repo standard)

## Spreads

- APIs/books often return two outcomes per game with opposite signs (e.g., -7 and +7). They represent the **same** spread.
- Store **one canonical record per game**.
- Recommended representation:
  - `spread_magnitude`: always positive
  - `favorite_team` and `underdog_team`
  - prices for each side

Never average raw point values that mix negative and positive spreads.

## Totals

- Store **one total** per game plus `over_price` and `under_price`.
- Do not store separate Over/Under rows as separate totals.

## Moneylines

- Convert American odds to implied probability before doing math.
  - If `odds < 0`: `p = abs(odds) / (abs(odds) + 100)`
  - If `odds > 0`: `p = 100 / (odds + 100)`

Never average American odds directly.

## Movement convention

Track spread movement from the **favorite’s perspective** (negative spread). This avoids mixing perspectives between teams.

28 changes: 28 additions & 0 deletions .claude/skills/odds-collecting/SKILL.md
@@ -0,0 +1,28 @@
---
name: odds-collecting
description: Collect scores and odds in a rolling window with retries, deduplication, and freshness guarantees.
---

# Odds Collecting (repo standard)

## Goals

- Keep data fresh for a rolling **5-day** ML window.
- Make collectors **idempotent** and safe to rerun.
- Track costs/quotas and avoid redundant polling.

## Collector requirements

- Always accept explicit arguments:
  - `--lookback-days` (default 5)
  - `--sport` (e.g., `basketball_ncaab`)
  - `--regions` and `--markets` when applicable
- Always write timestamps in UTC (`collected_at`).
- Use `event_id` (or equivalent) as the primary dedupe key.
- Handle rate limits with exponential backoff.
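
A sketch of the rate-limit requirement, using exponential backoff with full jitter. The `RateLimited` exception stands in for whatever the real collector raises on an HTTP 429; it and `with_backoff` are hypothetical names.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical signal that the upstream API asked the collector to slow down."""


def with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying on RateLimited with exponentially growing waits."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Cap doubles each attempt (1s, 2s, 4s, ...) up to max_delay;
            # the random draw below that cap spreads out concurrent retries.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

Bounding the attempt count keeps a scheduled run from hanging, and retries stay safe only because the collectors are required to be idempotent.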

## Freshness

- Provide a `freshness-guard` command (the subcommand name `task_gate.py` invokes) that fails when data is stale.
- On staleness, run bounded backfill (lookback 5 days), then re-normalize.

2 changes: 1 addition & 1 deletion .github/workflows/convetional-commit.yml
@@ -21,6 +21,6 @@ jobs:
       pull-requests: read
     steps:
       - if: github.event_name != 'merge_group'
-        uses: amannn/action-semantic-pull-request@48f256284bd46cdaab1048c3721360e808335d50 # v6.1.1
+        uses: amannn/action-semantic-pull-request@v7
         env:
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
61 changes: 61 additions & 0 deletions .github/workflows/odds-collect.yml
@@ -0,0 +1,61 @@
name: Odds pipeline - collect odds + normalize

permissions: read-all

on:
  workflow_dispatch:
    inputs:
      sport:
        description: Sport key (e.g. basketball_ncaab)
        required: false
        default: basketball_ncaab
      regions:
        description: Regions (comma-separated)
        required: false
        default: us
      markets:
        description: Markets (comma-separated)
        required: false
        default: h2h,spreads,totals
  schedule:
    - cron: "*/15 * * * *"

jobs:
  collect-odds:
    runs-on: ubuntu-latest
    steps:
      - name: Check out repository
        uses: actions/checkout@v4

      - name: Install uv
        run: |
          curl -LsSf https://astral.sh/uv/install.sh | sh
          echo "$HOME/.local/bin" >> "$GITHUB_PATH"

      - name: Init schema (idempotent)
        env:
          DATABASE_URL: ${{ secrets.DATABASE_URL }}
        working-directory: odds
        run: uv run python -m odds_pipeline.schema

      - name: Collect odds snapshots
        env:
          DATABASE_URL: ${{ secrets.DATABASE_URL }}
          ODDS_API_KEY: ${{ secrets.ODDS_API_KEY }}
        working-directory: odds
        run: >
          uv run python -m odds_pipeline collect-odds
          --sport "${{ inputs.sport || 'basketball_ncaab' }}"
          --regions "${{ inputs.regions || 'us' }}"
          --markets "${{ inputs.markets || 'h2h,spreads,totals' }}"

      - name: Normalize (rolling window)
        env:
          DATABASE_URL: ${{ secrets.DATABASE_URL }}
        working-directory: odds
        run: uv run python -m odds_pipeline normalize --window-days 5

      - name: Validate invariants (fast)
        working-directory: odds
        run: uv run python -m odds_pipeline validate
