Commit 38d515b

chore(deps): update puppeteer to version 24.37.2
Bumps the puppeteer dependency from version 24.36.1 to 24.37.2, incorporating various fixes and improvements, including enhanced ConsoleMessage text() results and documentation updates for the signal option in Page waitFor methods.
1 parent a23d625 commit 38d515b

35 files changed

Lines changed: 3147 additions & 0 deletions

.claude/CLAUDE.md

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
# Odds ML Orchestration (Claude Code Team)

This repository is used to maintain an orchestrated pipeline for **sports odds statistical research and ML predictions** (spreads, moneylines, totals) with a strict **rolling 5-day freshness window**.

## Non-negotiables

- **Freshness SLO**: all datasets used for training and prediction must be derived from the last **5 days** of collected data. If inputs are stale, the pipeline must fail fast and trigger backfill.
- **Canonical markets**:
  - **Spreads**: store **one canonical value per game** (favorite perspective OR `spread_magnitude` + `favorite_team`). Never average ±spread rows.
  - **Totals**: store **one canonical total** per game (not separate over/under rows).
  - **Moneylines**: convert American odds to **implied probability** for aggregation and ML.

See the project rules in `./.claude/rules/` for details.

## Repeatable Agent Team Template (copy/paste prompt)

Create an agent team for odds pipeline maintenance with these teammates and responsibilities. Require plan approval before implementing any schema or workflow changes. Put the lead into delegate mode after spawning.

- **TeamLead (delegate mode)**: coordination only; creates tasks, assigns owners, synthesizes results.
- **CollectorEngineer**: web scraping + API collectors, rate limits, idempotency, retries/backfills.
- **NormalizationSteward**: canonicalization of spreads/totals/moneylines; dedupe; invariants/tests.
- **DataFreshnessSRE**: rolling 5-day window enforcement; staleness detection; alerting/escalation.
- **MLTrainerEngineer**: feature views; training; evaluation; prediction artifacts.
- **CostQuotaAnalyst (optional)**: API credit/usage budgeting; schedule optimization.

Approval criteria for TeamLead:

- Reject any plan that changes market sign conventions.
- Reject any plan that allows stale inputs to silently pass.
- Reject any plan that introduces non-idempotent collectors.

## Operational contract (GitHub Actions)

GitHub Actions runs the scheduled pipeline. The code must support:

- **Collect → Normalize/Validate → Train → Predict/Report → Freshness Guard**
- Bounded backfill on staleness (5-day lookback).

.claude/hooks/task_gate.py

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
from __future__ import annotations

import json
import os
import subprocess
import sys


def _run(cmd: list[str]) -> subprocess.CompletedProcess:
    return subprocess.run(cmd, capture_output=True, text=True)


def main() -> int:
    # Read hook JSON input (best-effort; TaskCompleted/TeammateIdle always fire).
    try:
        _ = json.loads(sys.stdin.read() or "{}")
    except Exception:
        pass

    # Only run gates if the odds pipeline package is importable in this environment.
    probe = _run([sys.executable, "-c", "import odds_pipeline"])
    if probe.returncode != 0:
        print("odds_pipeline not installed; skipping odds pipeline gates", file=sys.stderr)
        return 0

    v = _run([sys.executable, "-m", "odds_pipeline", "validate"])
    if v.returncode != 0:
        print("odds pipeline validation failed", file=sys.stderr)
        print(v.stdout, file=sys.stderr)
        print(v.stderr, file=sys.stderr)
        return 2

    # Only enforce freshness if DATABASE_URL is present (so local doc work isn't blocked).
    if os.getenv("DATABASE_URL"):
        f = _run([sys.executable, "-m", "odds_pipeline", "freshness-guard", "--window-days", "5"])
        if f.returncode != 0:
            print("freshness guard failed", file=sys.stderr)
            print(f.stdout, file=sys.stderr)
            print(f.stderr, file=sys.stderr)
            return 2

    return 0


if __name__ == "__main__":
    raise SystemExit(main())
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
---
paths:
  - "odds/**"
  - ".github/workflows/odds-*.yml"
---

# Data freshness + rolling window rules (5 days)

This project’s ML outputs are only valid if inputs are **fresh** and the training/inference data is bounded to a **rolling 5-day window**.

## Freshness SLO

Fail the pipeline if any required input stream is stale.

Recommended defaults (tune per sport/market cadence):

- **Odds snapshots**: stale if `max(collected_at)` is older than **180 minutes**
- **Scores/finals**: stale if `max(collected_at)` is older than **24 hours**

## Rolling 5-day window

All downstream datasets (features, training rows, prediction features) must be computable from the last **5 days** of canonical + raw inputs.

Implementation requirements:

- Every compute job must accept `--window-days 5` (default 5).
- Normalization must support backfill with an explicit `--lookback-days 5`.
- Any retention/pruning job must never delete within the active 5-day window.

## Backfill on staleness

If freshness checks fail:

- run a bounded backfill (lookback 5 days)
- re-run normalization + validation
- re-check freshness before training/predicting
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
---
paths:
  - "odds/**"
---

# ML training + prediction contracts

This project is designed so that models can be trained and evaluated deterministically from canonical data in the last 5 days.

## Requirements

- Training jobs must:
  - log the dataset time window used
  - record the training timestamp and a model version identifier
  - output evaluation metrics (at minimum: calibration/accuracy proxies appropriate to the target)
- Prediction jobs must:
  - refuse to run if freshness checks fail
  - attach the model version + data window to every prediction artifact

## Targets

- **Spreads**: predict cover probability from the team perspective (requires consistent sign conventions).
- **Moneylines**: predict win probability (compare to implied probabilities for edge).
- **Totals**: predict over probability relative to the canonical total.
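The "attach model version + data window" requirement can be sketched as a provenance-carrying record. All field names here are hypothetical; the repo's actual artifact schema is not shown in this commit.

```python
from dataclasses import dataclass, asdict
from datetime import date

# Hypothetical artifact wrapper: every prediction carries its provenance,
# so a stale or mis-windowed model can be traced from the output alone.
@dataclass(frozen=True)
class PredictionArtifact:
    event_id: str
    market: str            # "spreads" | "h2h" | "totals"
    probability: float     # model output for the canonical side
    model_version: str
    window_start: date     # first day of the 5-day training window
    window_end: date       # last day of the 5-day training window

art = PredictionArtifact(
    event_id="evt-123", market="totals", probability=0.57,
    model_version="totals-2024-01-10T06:00Z",
    window_start=date(2024, 1, 5), window_end=date(2024, 1, 10),
)
print(asdict(art)["model_version"])  # totals-2024-01-10T06:00Z
```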
Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
---
paths:
  - "odds/**"
---

# Odds normalization rules (canonical markets)

These rules prevent sign-convention errors and ensure math is consistent across collectors, analytics, and ML training.

## Key distinction

**Favorite/underdog is determined by the spread sign; home/away is venue and independent.** Do not conflate them.

## Spreads (one canonical value per game)

Sportsbooks/APIs often return **two outcomes per event** with opposite signs (e.g., -7 and +7). Those represent the *same* market.

Store exactly **one canonical record per event/book/collected_at** using either:

- **Option A (allowed)**: store the **favorite spread** (always negative or 0).
- **Option B (preferred)**: store `spread_magnitude` (always positive) and explicit `favorite_team`/`underdog_team`.

Never average raw `point` values that include both + and -.
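Option B can be sketched as a small canonicalizer that collapses the two mirrored outcome rows into one record. The outcome-row shape (`team`/`point` keys) is an assumption for illustration, not the repo's actual schema.

```python
# Hypothetical canonicalizer: collapse the two ±spread outcome rows a book
# returns into one record (Option B: magnitude + explicit favorite).
def canonical_spread(outcomes: list[dict]) -> dict:
    if len(outcomes) != 2:
        raise ValueError("expected exactly two spread outcomes")
    fav = min(outcomes, key=lambda o: o["point"])   # negative point = favorite
    dog = max(outcomes, key=lambda o: o["point"])
    if fav["point"] != -dog["point"]:
        raise ValueError("spread outcomes are not mirrored")
    return {
        "spread_magnitude": abs(fav["point"]),
        "favorite_team": fav["team"],
        "underdog_team": dog["team"],
    }

rows = [{"team": "Duke", "point": -7.0}, {"team": "UNC", "point": +7.0}]
print(canonical_spread(rows))
# {'spread_magnitude': 7.0, 'favorite_team': 'Duke', 'underdog_team': 'UNC'}
```

Because the record stores a positive magnitude plus an explicit favorite, naive averaging across books can never mix +7 and -7 rows into a meaningless 0.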
## Totals (one canonical value per game)

Over/Under are two prices on the same number. Store one `total` value plus `over_price`/`under_price`.

## Moneylines (use implied probability for math)

American odds must be converted to implied probability before any aggregation or modeling.

For American odds \(o\):

- If \(o < 0\): \(p = |o| / (|o| + 100)\)
- If \(o > 0\): \(p = 100 / (o + 100)\)

Never average American odds directly.

## Line movement convention (favorite perspective)

If tracking spread movement, compute deltas from the **favorite’s spread** (negative). This avoids mixing perspectives.
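The two conversion branches above translate directly to code. This is a sketch (the repo's actual helper is not shown), and it deliberately does no vig removal:

```python
# Convert American odds to implied probability (no vig removal).
def implied_probability(odds: int) -> float:
    if odds < 0:
        return abs(odds) / (abs(odds) + 100)
    if odds > 0:
        return 100 / (odds + 100)
    raise ValueError("American odds of 0 are undefined")

print(round(implied_probability(-150), 3))  # 0.6   (favorite)
print(round(implied_probability(+130), 3))  # 0.435 (underdog)
```

Any cross-book aggregation should average these probabilities, never the raw -150/+130 values, per the rule above.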

.claude/rules/storage-schema.md

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
---
paths:
  - "odds/**"
---

# Storage and schema contract

GitHub Actions runners are ephemeral. **All pipeline state must live in persistent storage.**

## Required environment variables

- `DATABASE_URL`: Postgres connection string for the persistent store.
- `ODDS_API_KEY`: The Odds API key (or equivalent) for collectors.

## Schema principles

- **Raw tables**: append-only snapshots; never mutated in place.
- **Canonical tables**: derived from raw via normalization; can be re-derived deterministically.
- **Idempotency**: collectors must not create duplicates for the same `(source, event_id, market, bookmaker, collected_at)` tuple.
- **Time zone**: store timestamps in UTC and only convert for presentation.

## Market canonicalization

Canonical tables must follow the rules in `odds-normalization.md`.
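The idempotency principle is typically enforced with a unique index over the dedupe tuple plus a conflict-ignoring insert. A sketch using SQLite for portability (the production store is Postgres, where `ON CONFLICT DO NOTHING` behaves the same way); table and column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE raw_odds (
        source TEXT, event_id TEXT, market TEXT,
        bookmaker TEXT, collected_at TEXT, payload TEXT,
        UNIQUE (source, event_id, market, bookmaker, collected_at)
    )
""")

row = ("the-odds-api", "evt-123", "spreads", "draftkings",
       "2024-01-10T12:00:00Z", "{}")

# Re-running a collector replays the same snapshot; the conflict clause
# turns the second insert into a no-op instead of a duplicate row.
for _ in range(2):
    conn.execute(
        "INSERT INTO raw_odds VALUES (?, ?, ?, ?, ?, ?) "
        "ON CONFLICT DO NOTHING", row)

count = conn.execute("SELECT COUNT(*) FROM raw_odds").fetchone()[0]
print(count)  # 1
```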

.claude/settings.json

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
{
  "env": {
    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
  },
  "teammateMode": "in-process",
  "hooks": {
    "TaskCompleted": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "python \"$CLAUDE_PROJECT_DIR/.claude/hooks/task_gate.py\""
          }
        ]
      }
    ],
    "TeammateIdle": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "python \"$CLAUDE_PROJECT_DIR/.claude/hooks/task_gate.py\""
          }
        ]
      }
    ]
  }
}
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
---
name: betting-data-normalizing
description: Mandatory normalization rules for spreads, totals, and moneylines. Use for ANY sports betting analytics or ML work in this repo.
---

# Betting Data Normalizing (repo standard)

## Spreads

- APIs/books often return two outcomes per game with opposite signs (e.g., -7 and +7). They represent the **same** spread.
- Store **one canonical record per game**.
- Recommended representation:
  - `spread_magnitude`: always positive
  - `favorite_team` and `underdog_team`
  - prices for each side

Never average raw point values that mix negative and positive spreads.

## Totals

- Store **one total** per game plus `over_price` and `under_price`.
- Do not store separate Over/Under rows as separate totals.

## Moneylines

- Convert American odds to implied probability before doing math.

If `odds < 0`:
`p = abs(odds) / (abs(odds) + 100)`

If `odds > 0`:
`p = 100 / (odds + 100)`

Never average American odds directly.

## Movement convention

Track spread movement from the **favorite’s perspective** (negative spread). This avoids mixing perspectives between teams.
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
---
name: odds-collecting
description: Collect scores and odds in a rolling window with retries, deduplication, and freshness guarantees.
---

# Odds Collecting (repo standard)

## Goals

- Keep data fresh for a rolling **5-day** ML window.
- Make collectors **idempotent** and safe to rerun.
- Track costs/quotas and avoid redundant polling.

## Collector requirements

- Always accept explicit arguments:
  - `--lookback-days` (default 5)
  - `--sport` (e.g., `basketball_ncaab`)
  - `--regions` and `--markets` when applicable
- Always write timestamps in UTC (`collected_at`).
- Use `event_id` (or equivalent) as the primary dedupe key.
- Handle rate limits with exponential backoff.

## Freshness

- Provide a `freshness_guard` command that fails when data is stale.
- On staleness, run bounded backfill (lookback 5 days), then re-normalize.
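The exponential-backoff requirement can be sketched as a retry wrapper around the API call. The jitterless doubling, the cap, and the `RuntimeError` stand-in for an HTTP 429 are illustrative defaults, not repo-mandated values:

```python
import time

# Hypothetical retry loop: exponential backoff for rate-limited calls.
def with_backoff(fn, max_attempts: int = 5, base_delay: float = 1.0,
                 max_delay: float = 60.0, sleep=time.sleep):
    for attempt in range(max_attempts):
        try:
            return fn()
        except RuntimeError:                    # stand-in for an HTTP 429
            if attempt == max_attempts - 1:
                raise
            sleep(min(base_delay * 2 ** attempt, max_delay))

# Simulate two 429s followed by success, capturing the sleep schedule
# instead of actually sleeping.
calls, delays = {"n": 0}, []
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("429 Too Many Requests")
    return "ok"

result = with_backoff(flaky, sleep=delays.append)
print(result, delays)  # ok [1.0, 2.0]
```

Injecting `sleep` keeps the wrapper testable; production code would also add jitter so many runners don't retry in lockstep.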

.github/workflows/odds-collect.yml

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
name: Odds pipeline - collect odds + normalize

permissions: read-all

on:
  workflow_dispatch:
    inputs:
      sport:
        description: Sport key (e.g. basketball_ncaab)
        required: false
        default: basketball_ncaab
      regions:
        description: Regions (comma-separated)
        required: false
        default: us
      markets:
        description: Markets (comma-separated)
        required: false
        default: h2h,spreads,totals
  schedule:
    - cron: "*/15 * * * *"

jobs:
  collect-odds:
    runs-on: ubuntu-latest
    steps:
      - name: Check out repository
        uses: actions/checkout@v4

      - name: Set up uv
        uses: astral-sh/setup-uv@v3

      - name: Init schema (idempotent)
        env:
          DATABASE_URL: ${{ secrets.DATABASE_URL }}
        working-directory: odds
        run: uv run python -m odds_pipeline.schema

      - name: Collect odds snapshots
        env:
          DATABASE_URL: ${{ secrets.DATABASE_URL }}
          ODDS_API_KEY: ${{ secrets.ODDS_API_KEY }}
        working-directory: odds
        run: >
          uv run python -m odds_pipeline collect-odds
          --sport "${{ inputs.sport || 'basketball_ncaab' }}"
          --regions "${{ inputs.regions || 'us' }}"
          --markets "${{ inputs.markets || 'h2h,spreads,totals' }}"

      - name: Normalize (rolling window)
        env:
          DATABASE_URL: ${{ secrets.DATABASE_URL }}
        working-directory: odds
        run: uv run python -m odds_pipeline normalize --window-days 5

      - name: Validate invariants (fast)
        working-directory: odds
        run: uv run python -m odds_pipeline validate
