Real-time anomaly detection platform for professional football. Transforms high-frequency telemetry (GPS, HR, accelerometry) into actionable coaching decisions using a shared-backbone LSTM autoencoder, regime-aware per-player calibration, and a multi-layer explainability suite — all engineered to maintain a < 200 ms inference SLA under distributed failure conditions.
- System Architecture
- Event Lifecycle
- Tech Stack
- File Map
- Quick Start
- Installation
- Configuration
- CLI Reference
- Data Schema
- ML Pipeline
- Explainability (XAI)
- Temporal State Compression
- Cache-Augmented Generation (Redis CAG)
- Reliability & Hardening
- Replay Consistency Guarantees
- Fairness & Recalibration
- Kinexon Real-Data Pilot Pipeline
- Logging & Observability
- Exit Codes
- Known Limitations & Roadmap
- References
```
Telemetry Stream (GPS/REST/WS/MQTT)
             │
             ▼
┌─────────────────────────┐
│     Ingestion Layer     │  GPS NMEA · SportRadar REST · WebSocket · MQTT (QoS 1)
└────────────┬────────────┘
             │ RawPlayerObservation
             ▼
┌─────────────────────────┐
│    Pre-Accumulation     │──→ [Reject timestamp reversals]
│     Temporal Guard      │──→ [Detect epoch discontinuities]
└────────────┬────────────┘
             │
             ▼
┌─────────────────────────┐
│  LiveWindowAccumulator  │  Per-player ring buffer; stride = window_size
│   (24-event windows)    │  Emits one non-overlapping window per 24 events,
└────────────┬────────────┘  reducing overlap-induced persistence amplification
             │
             ▼
┌─────────────────────────┐
│     Post-Window TVL     │──→ [Physical plausibility validation]
│   Semantic Validation   │    VALID · DEGRADED · INVALID
└────────────┬────────────┘
             │ List[dict] window
             ▼
┌─────────────────────────────────────────────────────┐
│               Pattern Analysis Engine               │
│  ┌───────────────────────────────────────────────┐  │
│  │ SharedBackboneAutoencoder (LSTM + FiLM)       │  │
│  │ · Shared encoder across all players           │  │
│  │ · Per-player FiLM conditioning embeddings     │  │
│  │ · Per-player normaliser (µ/σ per feature)     │  │
│  └───────────────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────────┐  │
│  │ RegimeAwareThresholdStore                     │  │
│  │ 9 regimes: Territory(3) × Intensity(3)        │  │
│  │ · Per-regime DynamicThresholdTracker          │  │
│  │ · Fallback to global tracker if under-cal     │  │
│  └───────────────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────────┐  │
│  │ Auxiliary Detectors                           │  │
│  │ · FatigueCurveAnalyzer (speed decay fit)      │  │
│  │ · PositionalDriftAnalyzer (GPS centroid)      │  │
│  │ · WorkloadTrendTracker (ACWR 0.8–1.5)         │  │
│  └───────────────────────────────────────────────┘  │
└────────────────────┬────────────────────────────────┘
                     │ AnomalyResult
                     ▼
┌─────────────────────────────────────────────────────┐
│             Explainability Suite (XAI)              │
│ · Temporal Feature Ablation (F+2 model calls)       │
│ · SHAP KernelExplainer (if shap installed)          │
│ · SemanticInterpreter (symbolic reasoning)          │
│ · LLMNLGEngine (Qwen2.5:14b) ─→ TemplateNLGEngine   │
└────────────────────┬────────────────────────────────┘
                     │ SemanticFindings + SHAP attributions
                     ▼
┌─────────────────────────────────────────────────────┐
│                   Redis CAG Layer                   │  ◄── Cache-Augmented Generation
│ · Per-player SHAP attribution cache                 │
│ · SemanticFinding history (sorted sets, TTL-gated)  │
│ · Augments SemanticInterpreter with cached context  │
│   without re-running SHAP over past windows         │
└────────────────────┬────────────────────────────────┘
                     │ Augmented findings
                     ▼
┌─────────────────────────────────────────────────────┐
│             Temporal State Compression              │
│ · Trajectory narrative builder                      │
│ · Escalation summary encoder                        │
│ · Episodic abstraction (episode_id-scoped)          │
│ Compresses finding stream → structured LLM prompt   │
└────────────────────┬────────────────────────────────┘
                     │ Compressed state + SHAPExplanation
                     ▼
┌─────────────────────────────────────────────────────┐
│              Alert FSM (AlertManager)               │
│ NONE → WARNING → SUSTAINED → CRITICAL               │
│ HOLD (telemetry blackout)                           │
│ SAFE_MODE (system-wide scientific invalidation)     │
└────────────────────┬────────────────────────────────┘
                     │ Recommendation + NDJSON alert
                     ▼
         Coach Dashboard / stdout
                     │
                     ▼
┌─────────────────────────────────────────────────────┐
│            Feedback & Recalibration Loop            │
│ · Coach override logging (OverrideRecord)           │
│ · FairnessMonitor (position · age_group · nation.)  │
│ · RecalibrationPipeline (7-day cadence)             │
│ · MutationJournal (versioned threshold audit)       │
└─────────────────────────────────────────────────────┘
```
A single telemetry event passes through ten distinct processing stages before reaching the coach. This diagram provides the mental model for navigating the codebase.
```
Raw Telemetry Event (GPS · HR · accelerometry)
           │
           │ player_external_id, ts, speed_ms, heart_rate_bpm, …
           ▼
┌───────────────────────┐
│ 1. Validity Gate      │  Pre-accumulation timestamp guard
│    (TVL)              │  Epoch discontinuity → buffer reset
└──────────┬────────────┘  INVALID → dropped | DEGRADED → flagged
           │
           ▼
┌───────────────────────┐
│ 2. Sequence Window    │  LiveWindowAccumulator ring buffer
│    (24-event stride)  │  Emits one window per 24 raw packets
└──────────┬────────────┘  Post-window plausibility re-check (TVL)
           │
           ▼
┌───────────────────────┐
│ 3. Shared Model       │  SharedBackboneAutoencoder
│    (LSTM + FiLM)      │  Regime-routed threshold comparison
└──────────┬────────────┘  EMA-smoothed anomaly score
           │
           ▼
┌───────────────────────┐
│ 4. Attribution        │  Temporal Feature Ablation (F+2 calls)
│    (SHAP / Ablation)  │  SHAP KernelExplainer when available
└──────────┬────────────┘  Magnitude-proxy fallback (shap_compat)
           │
           ▼
┌───────────────────────┐
│ 5. Semantic Findings  │  SemanticInterpreter
│                       │  SHAP weights → typed SemanticFinding
└──────────┬────────────┘  Domains: cardiovascular · locomotor ·
           │               workload · tactical · persistence
           ▼
┌───────────────────────┐
│ 6. Redis CAG          │  Augment current findings with
│    (Context Cache)    │  cached SHAP history + prior findings
└──────────┬────────────┘  Deterministic, zero-retrieval-latency
           │               per-player longitudinal context
           ▼
┌───────────────────────┐
│ 7. State Compression  │  MatchStateManager
│                       │  Finding stream → trajectory narrative
└──────────┬────────────┘  Motif detection · escalation summary ·
           │               episodic abstraction (episode_id-scoped)
           ▼
┌───────────────────────┐
│ 8. Policy Engine      │  AlertManager FSM
│    (Alert FSM)        │  Hysteresis · cooldown · Safe Mode
└──────────┬────────────┘  Recommendation priority ladder
           │
           ▼
┌───────────────────────┐
│ 9. NLG Layer          │  LLMNLGEngine (Qwen2.5:14b, async)
│                       │  Receives compressed state only,
└──────────┬────────────┘  not raw telemetry or full history
           │               TemplateNLGEngine fallback (<1 ms)
           ▼
┌───────────────────────┐
│ 10. Coach Dashboard   │  NDJSON alert → stdout
│                       │  nlg_summary · top_features ·
└───────────────────────┘  counterfactual · latency_ms
```
SLA boundary: The 200 ms clock runs from stage 2 (window emission) through stage 8 (alert FSM output). Stages 9–10 are asynchronous and off the SLA clock.
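A minimal sketch of how the on-clock portion could be timed with `time.monotonic`, which is immune to wall-clock adjustments. `run_sla_stages` is a hypothetical callable standing in for stages 2 to 8, and `SLA_MS` simply mirrors the `serve --max-latency-ms` default; this is an illustration, not the project's timing code.

```python
import logging
import time
from typing import Any, Callable

SLA_MS = 200.0  # mirrors the serve --max-latency-ms default
log = logging.getLogger("sla")


def timed_inference(run_sla_stages: Callable[[Any], Any], window: Any) -> tuple[Any, float]:
    """Run the synchronous (on-clock) stages and return (result, latency_ms).

    `run_sla_stages` is a hypothetical stand-in for scoring, attribution,
    compression and the alert FSM; async NLG is deliberately not timed here.
    """
    start = time.monotonic()
    result = run_sla_stages(window)
    latency_ms = (time.monotonic() - start) * 1000.0
    if latency_ms > SLA_MS:
        # Violations are reported, not enforced: the result is still returned.
        log.warning("SLA violation: %.1f ms > %.0f ms", latency_ms, SLA_MS)
    return result, latency_ms
```

Because NLG is dispatched after this function returns, a slow Ollama call can never push `latency_ms` over the budget.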
| Library / Model | Version | Role |
|---|---|---|
| PyTorch (`torch`, `torch.nn`, `torch.optim`, `DataLoader`) | ≥ 2.0 | Shared LSTM backbone, Transformer AE, batch training, checkpoint serialisation |
| scikit-learn | ≥ 1.3 | ROC-AUC / PR-AUC / precision@k evaluation; KMeans background summarisation for SHAP |
| SciPy (`stats.zscore`, `optimize.curve_fit`, `integrate.trapezoid`) | ≥ 1.11 | Z-score baselines, exponential fatigue-curve fitting, trapezoidal distance integration |
| SHAP (`KernelExplainer`, `shap.kmeans`) | ≥ 0.42 | Feature-level attribution (graceful magnitude-proxy fallback when unavailable) |
| Qwen2.5:14b via Ollama | Local HTTP | LLM NLG coaching summaries; configurable timeout, deterministic template fallback |
| Library | Role |
|---|---|
| NumPy | Sequence windowing, proxy SHAP computation, batch array ops |
| Pandas | CSV I/O, timestamp parsing/coercion, rolling baseline aggregation |
| Library | Protocol | Role |
|---|---|---|
| aiohttp | HTTP / REST | SportRadar / Opta API polling adapter; exponential-backoff retry |
| websockets | WebSocket | Live match event stream adapter |
| asyncio-mqtt (`aiomqtt`) | MQTT (QoS 1) | Wearable sensor bridge (HR, accelerometry) |
| pynmea2 | NMEA 0183 | GPS sentence parsing from serial port or TCP/gpsd |
| asyncio | — | Single-event-loop async I/O for all ingestion adapters |
| Component | Notes |
|---|---|
| PostgreSQL | Primary store; psycopg2 for sync ORM, asyncpg for async paths |
| SQLAlchemy | ORM models — Player, Session, PlayerEvent, audit logs |
| Redis | CAG backing store; per-player SHAP attribution cache and SemanticFinding history (sorted sets, TTL-gated); deterministic context augmentation without retrieval latency |
| Module | Usage |
|---|---|
| `argparse` | Subcommand CLI with typed arguments and defaults |
| `logging` | Structured logging; JSON formatter enabled by `JSON_LOGS=1` |
| `hashlib` | Event fingerprinting for exactly-once semantics |
| `threading` | `Lock` for `HardenedRollingThresholdStore` thread safety |
| `collections.deque` | `LiveWindowAccumulator` per-player ring buffers |
| `dataclasses` | All domain objects (`AnomalyResult`, `SHAPExplanation`, `WindowRegime`, etc.) |
| `time.monotonic` | Alert cooldown gate; SLA latency measurement |
| Library | Behaviour when absent |
|---|---|
| `shap` | Falls back to `shap_compat.py` magnitude-proxy attribution |
| `torch` | Stub mode: no inference, but the pipeline is still importable |
| `sklearn` | ROC-AUC / PR-AUC disabled; `evaluate` exits 3 |
| `tqdm` | Progress bars replaced with `logger.info()` calls |
| `pynmea2` | GPS serial/TCP adapter disabled; REST + WS still work |
| `aiohttp` | REST polling adapter disabled |
| `redis` | CAG disabled; `SemanticInterpreter` operates without cached history |
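The optional-dependency pattern behind this table can be sketched as a guarded import plus a fallback path. The attribution formula below (normalised mean absolute deviation from the background mean) is an assumed stand-in for the `shap_compat.py` magnitude proxy, not its actual implementation.

```python
import numpy as np

try:  # optional dependency: absence degrades attribution quality, not availability
    import shap  # noqa: F401
    HAS_SHAP = True
except ImportError:
    HAS_SHAP = False


def magnitude_proxy_attribution(x: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Hypothetical fallback attribution.

    x: (T, F) window; background: (N, T, F) reference windows.
    Scores each feature by its mean absolute deviation from the background
    mean, normalised to sum to 1 (all zeros if nothing deviates).
    """
    baseline = background.mean(axis=0)        # (T, F)
    raw = np.abs(x - baseline).mean(axis=0)   # (F,)
    total = raw.sum()
    return raw / total if total > 0 else np.zeros_like(raw)
```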
| File | Class / Entry Point | Responsibility |
|---|---|---|
| `main.py` | `main()` | Production CLI entrypoint (generate · train · evaluate · publish · ingest · serve · audit · status) |
| `analysis/orchestrator.py` | `PlayersDataAnalysisPipeline` | Wires ingestion → TVL → ML → XAI → CAG → compression → FSM → feedback; match lifecycle |
| `analysis/anomaly_detection.py` | `SharedBackboneAutoencoder`, `PatternAnalysisEngine` | LSTM AE training + inference, threshold calibration, positional drift |
| `analysis/baseline.py` | `BaselineBuilder`, `PlayerBaselineProfile` | 28-day rolling baselines, fatigue-curve fitting, ACWR tracking |
| `analysis/regime.py` | `SessionRegimeClassifier`, `RegimeAwareThresholdStore` | 9-regime (Territory × Intensity) window classification and threshold routing |
| `analysis/match_state.py` | `MatchStateManager`, `SemanticMatchState` | Longitudinal match memory, motif detection, trend reasoning, state compression |
| `analysis/live_window_accumulator.py` | `LiveWindowAccumulator` | Per-player ring buffer; emits fixed-stride inference windows |
| `analysis/telemetry_validity.py` | `TelemetryValidityLayer` | Physical plausibility gate (VALID / DEGRADED / INVALID); replay-aware timestamp validation |
| File | Class | Responsibility |
|---|---|---|
| `explainability/xai_layer.py` | `XAILayer`, `LLMNLGEngine`, `TemplateNLGEngine` | Temporal feature ablation, SHAP routing, Qwen2.5:14b NLG |
| `explainability/semantics_layer.py` | `SemanticInterpreter` | Symbolic physiological reasoning: cardiovascular, locomotor, workload, tactical |
| `explainability/shap_compat.py` | `compute_shap_values`, `build_kmeans_background` | SHAP with magnitude-proxy fallback; background deduplication guard |
| `explainability/episodic_context.py` | `TemporalContextCompressor`, `CompressedTemporalContext`, `PlayerEpisode`, `TacticalEpisode` | Compresses `SemanticFinding` streams into trajectory narratives, escalation summaries, and episodic abstractions before LLM conditioning |
| File | Class | Responsibility |
|---|---|---|
| `cag/redis_client.py` | `RedisCheckpointStore`, `EpisodeStore` | Per-player SHAP attribution cache and `SemanticFinding` history; sorted-set TTL management |
| `cag/redis_client.py` | `RedisPubSubClient`, `RedisConnectionPool` | Pub/sub event streaming; connection-pool management |
| File | Class | Responsibility |
|---|---|---|
| `utils/reliability/invariants.py` | `SystemInvariantGuard` | Machine-enforced system invariants; triggers graded Safe Mode |
| `utils/reliability/safe_mode.py` | `SafeModeController` | Four-level degradation: NORMAL → LEVEL_1 → LEVEL_2 → LEVEL_3 |
| `utils/reliability/determinism.py` | `MutationJournal`, `TemporalCausalityGuard` | Versioned calibration log; strict event-time monotonicity |
| `utils/reliability/calibration_store.py` | `HardenedRollingThresholdStore` | Quarantine buffers, drift monitoring, thread-safe threshold store |
| `utils/reliability/adaptation_engine.py` | `DeterministicCalibrationManager` | Crash-safe, versioned calibration updates |
| `utils/reliability/queue_manager.py` | `BoundedPriorityQueue` | Priority-aware backpressure; sheds LLM tasks first, then SHAP, then inference |
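The shedding order can be sketched with a bounded `heapq`: when the queue is full, the least important task (LLM before SHAP before inference) is dropped, and an incoming task that is no more important than everything queued is rejected instead. The class name and priority constants are hypothetical, not `BoundedPriorityQueue`'s real interface.

```python
import heapq
import itertools

# Lower number = more important; names are illustrative assumptions.
INFERENCE, SHAP, LLM = 0, 1, 2


class SheddingQueue:
    """Hypothetical bounded priority queue that sheds the least important task."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._heap: list[tuple[int, int, object]] = []
        self._seq = itertools.count()  # FIFO tie-break within a priority tier
        self.shed: list[object] = []

    def put(self, priority: int, task: object) -> bool:
        if len(self._heap) >= self.capacity:
            worst = max(self._heap)  # least important, newest within its tier
            if priority >= worst[0]:
                self.shed.append(task)  # incoming is no more important: drop it
                return False
            self._heap.remove(worst)
            heapq.heapify(self._heap)
            self.shed.append(worst[2])
        heapq.heappush(self._heap, (priority, next(self._seq), task))
        return True

    def get(self) -> object:
        """Pop the most important (then oldest) task."""
        return heapq.heappop(self._heap)[2]
```

The unique sequence number guarantees tuple comparison never falls through to the task objects themselves.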
| File | Class | Responsibility |
|---|---|---|
| `ingestion/pipeline.py` | `GPSIngestionAdapter`, `SportRadarAPIAdapter`, `IngestionPipeline` | NMEA/TCP GPS, REST polling, WebSocket events, MQTT sensor bridge |
| `ingestion/dataset_discovery.py` | `DatasetDiscoveryService` | Scans `data/incoming/`, classifies Kinexon export CSVs by column headers (never by filename), extracts metadata, pairs positions/statistics/events files by timestamp and roster overlap, and organizes them into `data/raw_matches/<session_id>/` |
| `config/settings.py` | `PlayersDataConfig` | All configuration via environment variables and typed dataclasses |
| `config/ollama_client.py` | `OllamaClient` | Async HTTP wrapper for Qwen2.5:14b; response caching, timeout guard |
| `utils/schema.py` | ORM models | SQLAlchemy models for Player, Session, PlayerEvent, audit log |
| `utils/ema.py` | `EMASmoother` | Exponential moving average for anomaly-score smoothing (α = 0.25) |
| `utils/alert_manager.py` | `AlertManager` | Deterministic FSM with hysteresis, cooldown gate, Safe Mode propagation |
| `utils/evaluation/episodes.py` | `extract_episodes`, `match_episodes` | Binary → episode conversion; TP/FP/FN at episode level |
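Episode-level scoring keeps one sustained anomaly spanning six windows from counting as six detections. A minimal sketch follows; the function names are hypothetical (they are not the signatures of `extract_episodes` / `match_episodes`), and the any-overlap matching rule is an assumption.

```python
def episodes_from_flags(flags: list[int]) -> list[tuple[int, int]]:
    """Collapse a per-window 0/1 sequence into inclusive (start, end) runs of 1s."""
    episodes: list[tuple[int, int]] = []
    start = None
    for i, f in enumerate(flags):
        if f and start is None:
            start = i
        elif not f and start is not None:
            episodes.append((start, i - 1))
            start = None
    if start is not None:  # run extends to the final window
        episodes.append((start, len(flags) - 1))
    return episodes


def _overlaps(a: tuple[int, int], b: tuple[int, int]) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def score_episodes(truth: list[int], pred: list[int]) -> dict[str, int]:
    """Episode-level TP/FP/FN: a true episode is detected if any prediction overlaps it."""
    t_eps, p_eps = episodes_from_flags(truth), episodes_from_flags(pred)
    tp = sum(any(_overlaps(t, p) for p in p_eps) for t in t_eps)
    fp = sum(not any(_overlaps(p, t) for t in t_eps) for p in p_eps)
    return {"tp": tp, "fp": fp, "fn": len(t_eps) - tp}
```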
| File | Responsibility |
|---|---|
| `data/data_generator.py` | v4 Decision-Agent synthetic data simulator; realistic anomaly seeding |
See Kinexon Real-Data Pilot Pipeline for the full picture.
| File | Class / Entry Point | Responsibility |
|---|---|---|
| `ingestion/kinexon_adapter.py` | `KinexonAdapter` | Parses real Kinexon `positions.csv`/`statistics.csv` exports into `RawPlayerObservation`s |
| `ingestion/kinexon_resampler.py` | `KinexonResampler` | Resamples raw Kinexon ticks into 15 s buckets (8 base columns) |
| `ingestion/kinexon_events_features.py` | `merge_event_features()` | Merges 24 window-aggregated `events.csv` features onto the resampled data; this is the 8→32 feature completion step |
| `analysis/player_workload.py` | `compute_player_workload_windows`, `assign_workload_status` | Model-free, per-tick coach workload aggregation (distance, sprint/accel/decel load, workload status) |
| `analysis/player_workload_event.py` | `PlayerWorkloadEvent` | Model-free dataclass published to `analytics.player_workload` |
| `analysis/player_analytics_event.py` | `PilotPlayerAnalyticsEvent`, `to_pilot_player_analytics_event()` | Model-output dataclass published to `analytics.players` (reconstruction_loss, confidence, SHAP, regime) |
| `analysis/pilot_pipeline.py` | `build_pipeline_and_load()`, `build_pipeline_and_train()`, `score_window_and_build_event()` | Shared Kinexon loading, checkpoint loading, and per-window scoring logic in a single home; used by `main.py publish` and `scripts/evaluate_pilot_model.py` |
| `scripts/publish_player_workload.py` | `main()` | One-shot batch publisher; model-free, 32-feature loader |
| `main.py publish --historical-replay` | `cmd_publish()` | One-shot batch publisher; real promoted-checkpoint inference, 32-feature loader |
| `main.py publish --continuous` | `cmd_publish()` | Continuous production runtime: paced replay through `LiveWindowAccumulator`, publishes incrementally |
| `scripts/evaluate_pilot_model.py` | `main()` | Standalone deep-diagnostic evaluation report (calibration audit, SHAP examples, per-window CSV) |
```bash
# 1. Generate 2 seasons of synthetic training data
python main.py generate --seasons 2 --matchdays 38

# 2. Train shared backbone + calibrate per-player thresholds
python main.py train --sessions-per-player 60

# 3. Evaluate against ground-truth labels (CI gate: AUC >= 0.70)
python main.py evaluate --out metrics/eval.json --min-auc 0.70

# 4. Stream live inference (NDJSON in -> NDJSON alerts out)
cat live_events.jsonl | python main.py serve

# 5. Replay historical data (interleaved multi-session streams)
cat historical_events.jsonl | python main.py serve --replay-mode

# 6. Run fairness audit + recalibration check
python main.py audit --log logs/inference_log.jsonl
```

```bash
# Core ML & data
pip install torch scikit-learn numpy pandas scipy shap

# Ingestion adapters
pip install aiohttp websockets asyncio-mqtt pynmea2

# Database drivers
pip install sqlalchemy psycopg2-binary asyncpg

# CAG backing store
pip install redis

# Optional: progress bars
pip install tqdm

# LLM backend: install Ollama separately, then pull the model
# https://ollama.com
ollama pull qwen2.5:14b
```

Python 3.10+ required. PyTorch CPU is sufficient for inference; a GPU is recommended for training on large squads.
All configuration is driven by environment variables and typed dataclasses in config/settings.py. The singleton CONFIG = PlayersDataConfig() is imported throughout the codebase.
| Variable | Default | Description |
|---|---|---|
| `DB_HOST` | `localhost` | PostgreSQL host |
| `DB_PORT` | `5432` | PostgreSQL port |
| `DB_NAME` | `players_data` | Database name |
| `DB_USER` | `postgres` | Database user |
| `DB_PASSWORD` | (empty) | Database password |
| `REDIS_HOST` | `localhost` | Redis host for the CAG store |
| `REDIS_PORT` | `6379` | Redis port |
| `REDIS_CAG_TTL_S` | `3600` | TTL for cached SHAP and SemanticFinding entries (seconds) |
| `GPS_SERIAL_PORT` | `/dev/ttyUSB0` | Serial port for NMEA GPS |
| `GPS_TCP_HOST` | `None` | TCP host for gpsd / NMEA-over-TCP |
| `GPS_TCP_PORT` | `2947` | TCP port for gpsd |
| `SPORTRADAR_API_KEY` | (empty) | SportRadar API key |
| `LIVE_WS_URL` | `ws://localhost:8765` | Live match event WebSocket URL |
| `MQTT_BROKER` | `localhost` | MQTT broker host |
| `JSON_LOGS` | `0` | Set to `1` for structured JSON log output to stderr |
| `OLLAMA_NLG_TIMEOUT_S` | `30.0` | Timeout for Qwen2.5:14b async NLG calls (off the SLA clock) |
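A minimal sketch of the env-var-plus-dataclass pattern, assuming a hypothetical `SettingsSketch` that covers only a few of the variables above (the real `PlayersDataConfig` has more fields and may be structured differently). `default_factory` defers each environment read until instantiation, so a fresh instance picks up overrides set later, for example in tests.

```python
import os
from dataclasses import dataclass, field


def _env(name: str, default: str) -> str:
    return os.environ.get(name, default)


@dataclass(frozen=True)
class RedisSettings:
    # Hypothetical subset; defaults mirror the table above.
    host: str = field(default_factory=lambda: _env("REDIS_HOST", "localhost"))
    port: int = field(default_factory=lambda: int(_env("REDIS_PORT", "6379")))
    cag_ttl_s: int = field(default_factory=lambda: int(_env("REDIS_CAG_TTL_S", "3600")))


@dataclass(frozen=True)
class SettingsSketch:
    redis: RedisSettings = field(default_factory=RedisSettings)
    json_logs: bool = field(default_factory=lambda: _env("JSON_LOGS", "0") == "1")
    nlg_timeout_s: float = field(
        default_factory=lambda: float(_env("OLLAMA_NLG_TIMEOUT_S", "30.0"))
    )


CONFIG = SettingsSketch()  # module-level singleton, read once at import time
```

`frozen=True` keeps the shared singleton immutable, so no module can silently change configuration that others depend on.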
| Dataclass | Field | Default | Notes |
|---|---|---|---|
| `SequenceWindowConfig` | `window_seconds` | `120` | Rolling window length |
| `SequenceWindowConfig` | `step_seconds` | `15` | Must match `DT_OUT` in the data generator |
| `LSTMAutoencoderConfig` | `hidden_size` | `64` | LSTM hidden units |
| `LSTMAutoencoderConfig` | `latent_dim` | `16` | Bottleneck dimension |
| `LSTMAutoencoderConfig` | `max_epochs` | `250` | With `patience=20` early stopping |
| `AnomalyScoringConfig` | `mad_multiplier` | `5.0` | MAD multiplier for small calibration sets (<150 windows) |
| `AnomalyScoringConfig` | `threshold_quantile` | `0.995` | Quantile for large calibration sets (>=150 windows) |
| `AnomalyScoringConfig` | `score_ema_alpha` | `0.25` | EMA smoothing factor for anomaly scores |
| `SHAPConfig` | `n_background_samples` | `30` | Background samples for feature ablation |
| `CompressionConfig` | `max_findings_per_episode` | `12` | Finding cap before episodic abstraction triggers |
| `CompressionConfig` | `trajectory_window_steps` | `5` | Window count for trajectory-narrative construction |
| `FeedbackConfig` | `recalibration_cadence_days` | `7` | Scheduled recalibration interval |
| `FairnessConfig` | `flag_rate_disparity_threshold` | `0.15` | Max allowed flag-rate gap between groups |
All commands log to stderr and output machine-readable JSON to stdout.
```
python main.py generate [OPTIONS]

Options:
  --data-dir PATH        Output directory for CSVs                          [default: data]
  --seasons INT          Number of seasons to simulate                      [default: 2]
  --matchdays INT        Matchdays per season                               [default: 38]
  --anomaly-rate FLOAT   Fraction of sessions with seeded anomalies         [default: 0.05]
  --no-corruption        Skip sensor corruption layer (cleaner, faster)
  --quiet                Suppress per-position summary table
  --log-level LEVEL      DEBUG | INFO | WARNING | ERROR                     [default: INFO]
```

Output: five CSVs written to `--data-dir`: `players.csv`, `sessions.csv`, `events.csv`, `annotations.csv`, `ground_truth_labels.csv`.
Exits 1 if validation fails (zero anomalies seeded, missing columns, empty events table).
```
python main.py train [OPTIONS]

Options:
  --data-source {synthetic,kinexon}  synthetic: five-CSV generated dataset (regression-
                                     testing path). kinexon: real UWB tracking export via
                                     KinexonAdapter -> KinexonResampler -> gap-aware
                                     windowing -> BaselineBuilder.compute_with_fallback()
                                     (preferred for production runs)  [default: synthetic]
  --data-dir PATH                    CSV/Kinexon-export source directory  [default: data]
  --model-dir PATH                   Checkpoint output directory  [default: models]
  --sessions-per-player INT          Most-recent N sessions per player (synthetic only)  [default: 60]
  --session-id STR                   Kinexon session identifier to train on (kinexon only)  [default: 3387]
  --use-event-features               kinexon only. Extend the 8 positions.csv-derived sequence
                                     features with the 24 window-aggregated events.csv features
                                     (acceleration/deceleration/sprint/jump/change-of-direction/
                                     possession/pass/shot). Default off keeps the original
                                     8-feature model byte-for-byte reproducible.
  --checkpoint-path PATH             If set, also copies the trained checkpoint here in addition
                                     to --model-dir/shared_backbone.pt
  --log-level LEVEL                  [default: INFO]
```

Writes: `models/shared_backbone.pt`, `models/train_summary.json`, `models/serve_state.json`.
serve_state.json contains serialised per-player baselines and calibrated threshold distributions so serve can cold-start without retraining.
Exits 2 if training produces a degenerate model or the checkpoint is missing.
The real-data (Kinexon) pilot checkpoint currently promoted to models/shared_backbone.pt was trained with --data-source kinexon --use-event-features (32 features: 8 resampled + 24 event-derived). The shared loading/training logic this path uses lives in analysis/pilot_pipeline.py — see Kinexon Real-Data Pilot Pipeline below.
```
python main.py evaluate [OPTIONS]

Options:
  --data-source {synthetic,kinexon}  Same meaning as train's flag  [default: synthetic]
  --data-dir PATH                    CSV/Kinexon-export source directory  [default: data]
  --model-dir PATH                   Checkpoint directory  [default: models]
  --session-id STR                   Kinexon session identifier (kinexon only)  [default: 3387]
  --use-event-features               kinexon only; see train's flag of the same name
  --out PATH                         Metrics output (JSON)  [default: metrics/eval.json]
  --min-auc FLOAT                    synthetic only. CI gate: exit 3 if mean ROC-AUC is
                                     below this value  [default: 0.60]
  --log-level LEVEL                  [default: INFO]
```

`--data-source synthetic`: ROC-AUC, PR-AUC, precision@k, FP-per-90-min, and TP/FP/FN/TN per player, computed against `ground_truth_labels.csv`. Results are aggregated as micro (global TP/FP sums) and macro (per-player mean). Exits 3 if mean ROC-AUC < `--min-auc` or no players produced evaluable windows.
--data-source kinexon: Real Kinexon sessions carry no ground-truth anomaly labels, so this path reports descriptive statistics instead of classification accuracy — reconstruction-loss/confidence distributions, per-player calibration coverage, raw (pre-EMA) threshold-breach rate. --min-auc has no effect here (logged, not silently dropped).
Deeper diagnostics (calibration-state audit, SHAP examples on real windows, full per-window CSV export) are intentionally not duplicated into this command — they remain in scripts/evaluate_pilot_model.py, a retained diagnostic tool (see Validation & Diagnostic Scripts).
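The difference between the synthetic path's micro and macro aggregation matters when players contribute very different numbers of windows: micro is dominated by busy players, macro weights every player equally. A minimal sketch with a conventional precision@k; the per-player count layout is an assumption, and this is not the `evaluate` command's code.

```python
def precision_at_k(scores: list[float], labels: list[int], k: int) -> float:
    """Fraction of true anomalies among the k highest-scoring windows."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)[:k]
    return sum(label for _, label in ranked) / k


def micro_macro_precision(per_player: dict[str, dict[str, int]]) -> tuple[float, float]:
    """per_player maps player id -> {"tp": int, "fp": int} (layout is an assumption)."""
    tp = sum(c["tp"] for c in per_player.values())
    fp = sum(c["fp"] for c in per_player.values())
    micro = tp / (tp + fp) if tp + fp else 0.0       # pool counts, then divide
    per = [c["tp"] / (c["tp"] + c["fp"])
           for c in per_player.values() if c["tp"] + c["fp"]]
    macro = sum(per) / len(per) if per else 0.0      # divide per player, then average
    return micro, macro
```

With one player at 9 TP / 1 FP and another at 1 TP / 1 FP, micro precision is 10/12 ≈ 0.83 while macro is (0.9 + 0.5) / 2 = 0.70.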
```
python main.py publish [OPTIONS]

Options:
  --historical-replay            One-shot batch publish: scores and publishes every real
                                 session window immediately, then exits.  [default mode]
  --continuous                   Long-running paced replay: real per-tick rows fed one at a
                                 time through LiveWindowAccumulator; publishes incrementally
                                 as each player's window completes. Also publishes
                                 analytics.player_workload per tick.
  --model-dir PATH               Directory containing the promoted shared_backbone.pt  [default: models]
  --tick-interval-seconds FLOAT  --continuous only: pacing between ticks  [default: 0.2]
  --max-ticks INT                --continuous only: stop after N ticks (verification runs)
  --log-level LEVEL              [default: INFO]
```

Loads the promoted checkpoint (`analysis/pilot_pipeline.py::build_pipeline_and_load()`, no fitting) and publishes `PilotPlayerAnalyticsEvent` entries to the `analytics.players` Redis Stream for the Backend's `AnalyticsBridgeService`, the SSE relay, and the Frontend "Player Analytics" tab. It never retrains: per-player threshold calibration is recomputed against the loaded model (that state is not persisted to disk), but the backbone weights are the promoted checkpoint's, unmodified.
Workflow for new matches: drop files → run ingest → everything updates automatically. No manual renaming, no hardcoded session IDs, no manual match registration.
- Download the Kinexon exports for a match. Whatever filenames Kinexon gives them (`Bergischer_HC_vs._SC_Magdeburg_Match_positions.csv`, `-Overview-Match_THW_Kiel_vs__SC_Magdeburg-hz_01_hz_02.csv`, etc.) work as-is.
- Copy or drop those CSVs into `data/incoming/` (flat, any filenames).
- Run `python main.py ingest`.
- The system automatically discovers and classifies every file by its actual columns (not its filename); extracts session_id/date/player_count/team names from file contents; pairs positions+statistics+events files by timestamp-window and roster overlap; organizes complete bundles into `data/raw_matches/<session_id>/`; skips anything already ingested; and then runs the existing Parquet/player-trends/match-inventory pipeline over every known match.
```
python main.py ingest [OPTIONS]

Options:
  --data-dir PATH         Root directory containing match_<id>/ subdirectories  [default: data]
  --output-dir PATH       Output directory for matches/players/events/positions.parquet
                          + reports  [default: data/processed]
  --incoming-dir PATH     Drop zone for newly downloaded Kinexon export CSVs  [default: data/incoming]
  --raw-matches-dir PATH  Canonical organized-by-session_id home for discovered bundles
                          [default: data/raw_matches]
  --log-level LEVEL       [default: INFO]
```

Step 0 (new), Dataset Discovery (`ingestion/dataset_discovery.py::DatasetDiscoveryService`): scans `--incoming-dir` recursively for `*.csv` and classifies each file by column-header inspection only, never by filename: `ts in ms` + `x in m` → positions; `Session ID` + a `Distance (m)`-style column → statistics; `Timestamp (ms)` + `Player ID` + `Event type` → events. It then extracts session_id/date/player_count/team_name/opponent_name from each file's actual data rows. Because only `statistics.csv` carries an explicit `Session ID`, positions/events files are paired to a statistics bundle by timestamp-window overlap, tie-broken by player-roster overlap (again metadata, never filename). Complete bundles (`has_positions` and `has_statistics`) are moved into `--raw-matches-dir/<session_id>/{positions,statistics,events}.csv`. Already-organized sessions are detected and left untouched (status: `"duplicate"`); incomplete bundles are left in `--incoming-dir` (status: `"incomplete"`) for you to investigate rather than silently dropped. The full pre-ingestion readiness table is written to `--output-dir/discovery_inventory.json`.
Every --raw-matches-dir/<session_id>/ directory (this run's and every prior run's) is then exposed to the unmodified MultiMatchDatasetBuilder as --data-dir/match_<session_id>/ via symlinks — zero hardcoded session IDs anywhere in this chain, and idempotent (safe to re-run; existing symlinks are left alone).
Step 1 (unchanged) — Multi-Match Dataset Pipeline: scans --data-dir for match_<id>/{positions.csv,events.csv,statistics.csv} subdirectories, validates each export, and writes/updates Parquet datasets plus dataset_summary.json, data_quality_report.json, and match_inventory.json under --output-dir. Also writes player_trends.json (per-player physical/workload metrics). Incremental — a match directory whose files are unchanged since the last run is not re-parsed. Exits 1 if 0 match_* directories are found (including any just organized by Step 0).
Reads newline-delimited JSON events from stdin. Emits NDJSON alerts to stdout. Writes a full inference log (including non-alert windows) to logs/inference_log.jsonl.
```
python main.py serve [OPTIONS]

Options:
  --model-dir PATH             Checkpoint directory  [default: models]
  --min-alert-windows INT      Consecutive anomalous windows before alert  [default: 3]
  --max-latency-ms INT         SLA threshold; violations logged as WARNING  [default: 200]
  --ignore-time-gaps           Disable time-gap buffer resets (use for batch replay)
  --ignore-session-boundaries  Disable session-boundary resets (use for interleaved replay)
  --replay-mode                Replay-safe mode; implies --ignore-time-gaps and
                               --ignore-session-boundaries. Also relaxes TVL timestamp
                               validation: reversals and large gaps produce DEGRADED
                               (not INVALID) so inference is not silently dropped.
  --log-level LEVEL            [default: INFO]
```

SLA model: the 200 ms SLA covers inference only (LSTM forward pass + threshold comparison + state compression). LLM NLG generation runs asynchronously, off the SLA clock, via a thread pool with a 30 s timeout. Two latency figures are observable:
| Metric | What it covers | Where it appears |
|---|---|---|
| `latency_ms` in alert payload | Inference + compression (T1) | stdout NDJSON, inference log |
| Ollama call duration | Async NLG completion (T2) | `Slow Ollama call` WARNING in stderr |
Input event fields (NDJSON, one event per line):
| Field | Type | Required | Notes |
|---|---|---|---|
| `player_external_id` | str | Yes | Must match a registered player |
| `ts` | str (ISO 8601) | Yes | UTC timestamp |
| `match_id` / `session_id` | str | No | Used for session-boundary detection |
| `speed_ms` | float | Yes | Instantaneous speed in m/s |
| `heart_rate_bpm` | int | Yes | Heart rate (BPM) |
| `x_pitch` | float | No | Normalised pitch X [0, 100] |
| `y_pitch` | float | No | Normalised pitch Y [0, 100] |
| `distance_delta_m` | float | No | Distance covered since last tick (m) |
| `is_sprint` | bool | No | True if speed >= 7.0 m/s |
| `elapsed_seconds` | float | No | Seconds into session (used for fatigue enrichment) |
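A minimal sketch of reading this NDJSON stream defensively: required fields are type-checked, `ts` is parsed to an aware UTC `datetime`, and malformed lines are logged and skipped rather than crashing the server. `read_events` and `parse_ts` are hypothetical helpers, not the project's parser. Note that on Python 3.10 `datetime.fromisoformat` does not accept a trailing `Z`, hence the explicit rewrite.

```python
import json
import logging
from datetime import datetime, timezone
from typing import Iterable, Iterator

# Required fields and accepted types, taken from the table above.
REQUIRED = {"player_external_id": str, "ts": str, "speed_ms": (int, float), "heart_rate_bpm": int}
log = logging.getLogger("ingest")


def parse_ts(value: str) -> datetime:
    """ISO 8601 -> timezone-aware UTC datetime (naive input is assumed to be UTC)."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"  # Python 3.10 fromisoformat rejects 'Z'
    dt = datetime.fromisoformat(value)
    return dt.astimezone(timezone.utc) if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def read_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield validated events; malformed lines are logged and skipped, never fatal."""
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines in the stream
        try:
            event = json.loads(line)
            if not isinstance(event, dict):
                raise ValueError("not a JSON object")
            for name, typ in REQUIRED.items():
                if not isinstance(event.get(name), typ):
                    raise ValueError(f"missing or mistyped field {name!r}")
            event["ts"] = parse_ts(event["ts"])
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            log.warning("line %d rejected: %s", lineno, exc)
            continue
        yield event
```

In a server loop this would be driven as `for event in read_events(sys.stdin): ...`.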
Alert output payload (NDJSON to stdout on alert):

```json
{
  "player_id": 7,
  "external_id": "p007",
  "recommendation_type": "substitute",
  "confidence": 0.923,
  "anomaly_score": 0.418,
  "fatigue_flag": true,
  "drift_flag": false,
  "workload_flag": false,
  "workload_status": "normal",
  "nlg_summary": "Muller shows 28% speed drop and elevated HR non-recovery...",
  "counterfactual": "Alert would clear if speed_ms increased by 1.2 m/s.",
  "top_features": [
    {"feature": "hr_recovery", "shap": 0.142, "value": -0.31, "label": "HR not recovering"},
    {"feature": "speed_ms", "shap": 0.097, "value": 3.1, "label": "Below normal speed"}
  ],
  "latency_ms": 47.3,
  "ts": "2025-09-14T19:42:11Z",
  "gate_windows": 4
}
```

Recommendation priority ladder (at most one per inference cycle):
| Priority | `recommendation_type` | Trigger condition |
|---|---|---|
| 1 | `substitute` | Recurrent cross-match pattern + sustained persistence (≥4 windows) + high/critical escalation |
| 2 | `recovery_intervention` | Cardiovascular or recovery degradation finding, sustained (≥3 windows), high/critical severity |
| 3 | `workload_restriction` | Fatigue accumulation finding OR ACWR ≥ 1.30, sustained ≥2 windows |
| 4 | `tactical_adjustment` | Tactical instability finding, any severity |
| 5 | `performance_monitor` | Locomotor overload finding with worsening trend |
| 6 | `anomaly_flag` | Default fallback; no specific rule matched |
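A minimal sketch of walking the ladder top-down so that at most one recommendation fires per cycle. `FindingSketch` is a hypothetical, simplified view of a `SemanticFinding`; mapping "escalation" to a severity field and tracking sustained ACWR as a window count (`acwr_high_windows`) are simplifying assumptions, not the `AlertManager` logic.

```python
from dataclasses import dataclass


@dataclass
class FindingSketch:
    # Hypothetical, simplified view of a SemanticFinding.
    kind: str          # e.g. "cardiovascular", "recovery", "fatigue", "tactical", "locomotor"
    severity: str      # "low" | "medium" | "high" | "critical"
    persistence: int   # consecutive windows the finding has held
    worsening: bool = False
    recurrent_cross_match: bool = False


HIGH = {"high", "critical"}


def choose_recommendation(findings: list[FindingSketch], acwr_high_windows: int = 0) -> str:
    """Return the first ladder rule that fires.

    acwr_high_windows: consecutive windows with ACWR >= 1.30 (assumed tracked upstream).
    """
    def hit(pred) -> bool:
        return any(pred(f) for f in findings)

    if hit(lambda f: f.recurrent_cross_match and f.persistence >= 4 and f.severity in HIGH):
        return "substitute"
    if hit(lambda f: f.kind in {"cardiovascular", "recovery"}
           and f.persistence >= 3 and f.severity in HIGH):
        return "recovery_intervention"
    if hit(lambda f: f.kind == "fatigue" and f.persistence >= 2) or acwr_high_windows >= 2:
        return "workload_restriction"
    if hit(lambda f: f.kind == "tactical"):
        return "tactical_adjustment"
    if hit(lambda f: f.kind == "locomotor" and f.worsening):
        return "performance_monitor"
    return "anomaly_flag"
```

Ordering the checks as early returns is what enforces the priority: a tactical finding never masks a concurrent cardiovascular one.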
```
python main.py audit [OPTIONS]

Options:
  --log PATH         Inference log path (NDJSON or JSON array)  [default: logs/inference_log.jsonl]
  --data-dir PATH    CSV directory (for player metadata)  [default: data]
  --out PATH         Audit report output (JSON)  [default: metrics/audit.json]
  --log-level LEVEL  [default: INFO]
```

Checks for flag-rate disparity across three protected attributes: `position`, `age_group`, `nationality`. Triggers `RecalibrationPipeline` if >= 10 override records are present in the log.
Exits 5 if bias is detected in any protected group (flag-rate disparity > fairness.flag_rate_disparity_threshold).
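A minimal sketch of the disparity check, assuming disparity is measured as the max-minus-min gap in per-group flag rates and that each log record carries the protected attributes plus a boolean `flagged`. Both the record layout and the gap definition are assumptions; the real `FairnessMonitor` may compute disparity differently.

```python
from collections import defaultdict

DISPARITY_THRESHOLD = 0.15  # fairness.flag_rate_disparity_threshold default


def flag_rate_disparity(records: list[dict], attribute: str) -> tuple[dict[str, float], float]:
    """Per-group flag rate for one protected attribute, plus the max-min gap."""
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # group -> [flagged, total]
    for r in records:
        c = counts[r[attribute]]
        c[0] += bool(r["flagged"])
        c[1] += 1
    rates = {g: f / n for g, (f, n) in counts.items()}
    gap = max(rates.values()) - min(rates.values()) if rates else 0.0
    return rates, gap


def audit_exit_code(records: list[dict],
                    attributes=("position", "age_group", "nationality")) -> int:
    """Return 5 if any protected attribute exceeds the disparity threshold, else 0."""
    biased = any(flag_rate_disparity(records, a)[1] > DISPARITY_THRESHOLD for a in attributes)
    return 5 if biased else 0
```

For example, defenders flagged in 1 of 4 windows (0.25) versus forwards in 2 of 4 (0.50) gives a 0.25 gap, above the 0.15 threshold, so the audit would exit 5.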
```
python main.py status [OPTIONS]

Options:
  --model-dir PATH  Directory containing shared_backbone.pt / train_summary.json  [default: models]
  --data-dir PATH   Directory containing dataset_summary.json / match_inventory.json
                    [default: data/processed]
```

Read-only, structured-JSON report to stdout: model loaded/version/trained_at, matches available/total/failed, and live Redis last-publish timestamps per stream (`analytics.players`, `analytics.player_workload`). It is computed entirely from already-written artifacts plus a live Redis ping; nothing here is recomputed. Always exits 0: a fresh checkout with no trained model is a normal reported state, not a crash. This is PlayerDynamics' half of the platform health check (see `backend/README.md#health-check` for the other half, which reads the same artifacts from the Backend side).
Five CSVs are produced by generate and consumed by train / evaluate:

| File | Key columns |
|---|---|
| players.csv | player_id, external_id, full_name, position, age, age_group, nationality |
| sessions.csv | session_id, player_id, started_at, ended_at |
| events.csv | session_id, ts, speed_ms, heart_rate_bpm, x_pitch, y_pitch, distance_delta_m, is_sprint, elapsed_seconds |
| annotations.csv | session_id, annotated_at, annotation_type, note |
| ground_truth_labels.csv | session_id, is_anomaly |
Eight features extracted per 15-second tick, forming 8-step (120 s) windows:

| Index | Name | Description |
|---|---|---|
| 0 | speed_ms | Instantaneous speed (m/s) |
| 1 | accel | Acceleration (m/s²), clamped ±10 |
| 2 | heart_rate_bpm | HR (BPM) |
| 3 | sprint_flag | Binary; 1 if speed >= 7.0 m/s |
| 4 | x_pitch | Normalised pitch X [0, 100] |
| 5 | y_pitch | Normalised pitch Y [0, 100] |
| 6 | distance_delta | Euclidean displacement since last tick (m) |
| 7 | hr_recovery | Fractional HR change per tick, clipped [-1, 1] |
- Architecture: Shared LSTM encoder → FiLM (Feature-wise Linear Modulation) per-player conditioning embedding → bottleneck (latent dim 16) → LSTM decoder.
- Training: All registered players are trained jointly. Per-player embeddings are learned alongside the shared weights, and per-player µ/σ normalisers are applied before encoding.
- Calibration split: 80% training, 20% held-out calibration per player. For large calibration sets (>=150 windows) the threshold is quantile(losses, 0.995); for small sets (<150 windows) it is median + 5.0 × MAD × 1.4826.
- Threshold routing: At inference, SessionRegimeClassifier labels the window (Territory × Intensity → 9 possible keys) and the corresponding regime tracker is used; it falls back to the global tracker when a regime has <5 calibration samples.
- Score smoothing: EMA with α=0.25 is applied to per-window reconstruction losses before threshold comparison.
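The two calibration rules and the score smoothing above can be sketched in plain Python. This is an illustrative sketch, not the project's DynamicThresholdTracker: the linear-interpolation quantile rule and seeding the EMA with the first score are assumptions.

```python
import statistics

MAD_TO_SIGMA = 1.4826  # scales MAD to a Gaussian-consistent estimate of sigma


def quantile(values: list[float], q: float) -> float:
    """Linear-interpolation quantile (same rule as numpy's default)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def calibrate_threshold(losses: list[float], large_set_min: int = 150) -> float:
    """Per-player threshold from held-out calibration reconstruction losses."""
    if len(losses) >= large_set_min:
        return quantile(losses, 0.995)
    # Small sets: robust median + 5 scaled-MAD rule instead of an unstable tail quantile
    med = statistics.median(losses)
    mad = statistics.median(abs(x - med) for x in losses)
    return med + 5.0 * mad * MAD_TO_SIGMA


def ema(scores: list[float], alpha: float = 0.25) -> list[float]:
    """Exponential moving average of window losses, seeded with the first score."""
    smoothed: list[float] = []
    state = None
    for s in scores:
        state = s if state is None else alpha * s + (1 - alpha) * state
        smoothed.append(state)
    return smoothed
```

For a small set such as [1, 2, 3, 4, 100], the median is 3 and the MAD is 1, so the threshold is 3 + 5 × 1.4826 ≈ 10.41; the single outlier barely moves it.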
Every 120-second window is classified on two axes:
| Axis | Class | Criterion |
|---|---|---|
| Territory | defensive | mean x_pitch < 33 |
| | midfield | 33 <= mean x_pitch <= 67 |
| | attacking | mean x_pitch > 67 |
| Intensity | high | sprint fraction >= 15% of window steps |
| | medium | 4% <= sprint fraction < 15% |
| | low | sprint fraction < 4% |
Each regime maintains its own DynamicThresholdTracker. This distinguishes "normal high-intensity pressing" from "abnormal physiological distress" during the same match phase.
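A sketch of the Territory × Intensity labelling and the per-regime threshold routing with its global fallback. The "territory:intensity" key format and the function names are hypothetical; only the bin edges and the 5-sample fallback come from the text above.

```python
def classify_regime(x_pitch: list[float], sprint_flags: list[int]) -> str:
    """Label one window with one of the 9 Territory x Intensity keys."""
    mean_x = sum(x_pitch) / len(x_pitch)
    if mean_x < 33:
        territory = "defensive"
    elif mean_x <= 67:
        territory = "midfield"
    else:
        territory = "attacking"

    sprint_frac = sum(sprint_flags) / len(sprint_flags)
    if sprint_frac >= 0.15:
        intensity = "high"
    elif sprint_frac >= 0.04:
        intensity = "medium"
    else:
        intensity = "low"
    return f"{territory}:{intensity}"


def select_threshold(regime_key: str, regime_thresholds: dict[str, float],
                     regime_counts: dict[str, int], global_threshold: float,
                     min_samples: int = 5) -> float:
    """Use the regime's own threshold only if it saw enough calibration windows."""
    if regime_counts.get(regime_key, 0) >= min_samples:
        return regime_thresholds[regime_key]
    return global_threshold
```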
The experimental TransformerAutoencoder is disabled in production (CONFIG.active_model = "lstm"). It uses a Pre-LN transformer encoder with sinusoidal positional encoding and validity-weighted pooling in the bottleneck, and requires >=30 sessions per player. Enable it via CONFIG.active_model = "transformer".
Fatigue Curve Comparator — Fits speed(t) = β·exp(−α·t) to each player's historical speed-vs-elapsed-time data (via scipy.optimize.curve_fit). Flags when the live speed residual falls more than one personal standard deviation below the expected curve, coinciding with a model anomaly.
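The curve fit and the flag condition can be sketched with scipy.optimize.curve_fit. Taking the residual standard deviation of the historical fit as the "personal standard deviation" and the initial guess p0 are assumptions; the function names are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit


def expected_speed(t, beta, alpha):
    """speed(t) = beta * exp(-alpha * t)"""
    return beta * np.exp(-alpha * t)


def fit_fatigue_curve(elapsed_s, speed_ms):
    """Fit beta, alpha to one player's history; also return the residual std."""
    t = np.asarray(elapsed_s, dtype=float)
    v = np.asarray(speed_ms, dtype=float)
    (beta, alpha), _ = curve_fit(expected_speed, t, v, p0=(v.max(), 1e-4))
    resid_std = float(np.std(v - expected_speed(t, beta, alpha)))
    return float(beta), float(alpha), resid_std


def fatigue_flag(t_now, speed_now, beta, alpha, resid_std, model_anomaly):
    """Flag only when speed is > 1 SD below the curve AND the model also flags the window."""
    residual = speed_now - expected_speed(t_now, beta, alpha)
    return bool(model_anomaly and residual < -resid_std)
```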
Positional Drift Analyzer — Computes historical GPS centroid (avg_x, avg_y) and spread (position_std_radius). Flags when the player's recent median position deviates beyond positional.zone_radius_meters (default 5.0 m) for more than positional.drift_fraction_threshold (30%) of window ticks.
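One reading of the drift rule, sketched below, counts the window ticks that fall outside zone_radius_meters of the historical centroid. Interpreting "deviates ... for more than 30% of window ticks" as a per-tick distance test, and assuming coordinates in metres, are both assumptions; position_std_radius is not used here.

```python
import math
import statistics


def positional_drift_flag(history_xy, window_xy, zone_radius_m=5.0,
                          drift_fraction_threshold=0.30):
    """True when more than drift_fraction_threshold of the window's ticks lie
    outside zone_radius_m of the player's historical centroid (avg_x, avg_y)."""
    avg_x = statistics.fmean(x for x, _ in history_xy)
    avg_y = statistics.fmean(y for _, y in history_xy)
    outside = sum(1 for x, y in window_xy
                  if math.hypot(x - avg_x, y - avg_y) > zone_radius_m)
    return outside / len(window_xy) > drift_fraction_threshold
```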
Workload Trend Tracker (ACWR) — Tracks the Acute-to-Chronic Workload Ratio (7-day / 28-day rolling distance). Flags when ACWR falls outside [0.8, 1.5], the established safe training load band.
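A sketch of the ratio using the rolling-average variant (mean daily distance over the last 7 days divided by mean daily distance over the last 28). Other ACWR definitions exist, for example exponentially weighted averages, and which one the tracker uses is not stated here, so treat this as an assumption.

```python
def acwr(daily_load: list[float]) -> float:
    """Acute:chronic workload ratio from daily distances (most recent day last)."""
    if len(daily_load) < 28:
        raise ValueError("need at least 28 days of history")
    acute = sum(daily_load[-7:]) / 7       # 7-day rolling mean
    chronic = sum(daily_load[-28:]) / 28   # 28-day rolling mean
    if chronic == 0:
        raise ValueError("chronic load is zero; ratio undefined")
    return acute / chronic


def outside_safe_band(ratio: float, low: float = 0.8, high: float = 1.5) -> bool:
    return not (low <= ratio <= high)
```

For example, three weeks at 5,000 m/day followed by one week at 8,000 m/day gives 8,000 / 5,750 ≈ 1.39: inside the safe band, but above the 1.30 workload_restriction trigger.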
The XAI pipeline has four sequential layers with a strict separation of concerns:
```
Temporal Feature Ablation -> SemanticInterpreter -> MatchStateManager -> LLMNLGEngine
   (attribution only)        (symbolic findings)   (longitudinal memory)  (narration only)
```
Temporal feature ablation runs F + 2 = 10 model calls per inference window for the F = 8 features (one call per feature zeroed out, plus a baseline call and a full-feature call). It provides channel-level attribution within the 200 ms SLA (~30–50 ms on CPU) and is the primary attribution method in production.
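The call budget above can be illustrated with a model-agnostic sketch. Here `score` stands in for the model's reconstruction loss. Zeroing a channel across every time step, using an all-zero baseline, and normalising by (full − baseline) are assumptions about how the baseline is used.

```python
from typing import Callable

Window = list[list[float]]  # [time_step][feature]


def ablation_attribution(score: Callable[[Window], float],
                         window: Window) -> tuple[list[float], int]:
    """Zero one feature channel at a time across all steps and measure the score change.

    Returns (attributions, model_calls); model_calls is always F + 2.
    """
    n_features = len(window[0])
    calls = 0

    def run(w: Window) -> float:
        nonlocal calls
        calls += 1
        return score(w)

    full = run(window)                                    # full-feature call
    baseline = run([[0.0] * n_features for _ in window])  # all channels zeroed
    deltas = []
    for f in range(n_features):                           # one call per feature
        ablated = [[0.0 if j == f else v for j, v in enumerate(step)] for step in window]
        deltas.append(full - run(ablated))
    span = full - baseline
    # Express each channel's contribution as a share of (full - baseline) when non-zero
    attributions = [d / span for d in deltas] if span else deltas
    return attributions, calls
```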
shap.KernelExplainer is used when the shap library is installed and background matrix dimensions match the feature vector. The shap_compat.py magnitude-proxy fallback is used otherwise, preserving the explanation interface.
Converts raw SHAP attributions into typed SemanticFinding objects across five domains:
| Domain | Features monitored |
|---|---|
| cardiovascular_load | heart_rate_bpm, hr_recovery_time_s |
| locomotor_load | speed_ms, distance_delta, sprint_flag, z-scores |
| workload_balance | ACWR, fatigue accumulation metrics |
| tactical | x_pitch, y_pitch, positional drift |
| persistence | Longitudinal recurrence patterns |
The SemanticInterpreter is augmented by the Redis CAG layer (see Cache-Augmented Generation): before classifying current-window attributions, the interpreter retrieves cached SHAP results and prior SemanticFinding objects for the player, enabling trend-aware symbolic reasoning without recomputing past windows. The LLM receives SemanticFinding objects and acts as narrator only — physiological reasoning lives in this symbolic layer, not in the prompt.
MatchStateManager accumulates SemanticFinding objects over the full match timeline. It provides motif detection (repeated finding patterns within a session) and trend reasoning (increasing/decreasing severity over time). build_semantic_summary() feeds the state compression layer, which condenses the finding stream before LLM conditioning.
LLMNLGEngine calls qwen2.5:14b via Ollama asynchronously (off the SLA clock) with an OLLAMA_NLG_TIMEOUT_S timeout (default 30 s). The LLM receives a compressed state representation from the TemporalContextCompressor, not raw telemetry or the full finding stream, so prompt entropy is minimised and physiological reasoning remains in the symbolic layer. On timeout or connection failure, TemplateNLGEngine provides a deterministic, sub-millisecond fallback.
NLG summary guarantee: Every emitted alert carries a non-empty nlg_summary. Alerts where SHAP is on cooldown (60 s XAI cooldown between full SHAP runs per player) receive an immediate template summary backed by cached attribution context from Redis. Alerts where SHAP runs receive the richer LLM-backed summary via the async worker.
Naively feeding the LLM a full stream of SemanticFinding objects accumulates four compounding problems as a match progresses:
- Prompt entropy — unrelated findings from different match phases dilute the signal relevant to the current alert.
- Repeated findings — the same physiological pattern (e.g., hr_recovery below baseline) may appear in every window of a sustained episode, adding tokens without adding information.
- Temporal redundancy — findings from 70 minutes ago carry little diagnostic weight for a substitution decision at 85 minutes.
- Alert flooding — without compression, the LLM receives the same escalation narrative on every window of a sustained episode, producing near-identical summaries.
TemporalContextCompressor (in explainability/episodic_context.py) operates in three stages after MatchStateManager has accumulated findings for the current episode:
1. Trajectory Narrative
Constructs a structured summary of the player's physiological trajectory over the last compression.trajectory_window_steps (default 5) inference windows. Each named domain (cardiovascular_load, locomotor_load, etc.) is represented by its direction vector (stable / worsening / recovering) and peak severity, not by individual finding instances. This reduces a 5-window finding sequence to a single structured object per domain.
```
cardiovascular_load: worsening (peak severity: HIGH, onset: window -3)
locomotor_load:      stable    (severity: MEDIUM)
tactical:            recovering (drift cleared at window -1)
```
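A sketch of collapsing per-window findings into one line per domain, as in the narrative above. The direction rule (compare the last window's severity level with the first one in the trajectory window) and the input shape are assumptions, and onset tracking is omitted.

```python
from __future__ import annotations

SEVERITY_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]


def _level(sev: str | None) -> int:
    """0 = no finding in that window; 1..4 = LOW..CRITICAL."""
    return SEVERITY_ORDER.index(sev) + 1 if sev else 0


def trajectory_narrative(history: dict[str, list[str | None]],
                         window_steps: int = 5) -> list[str]:
    """history maps domain -> severity per window, oldest first."""
    lines = []
    for domain, severities in history.items():
        levels = [_level(s) for s in severities[-window_steps:]]
        if levels[-1] > levels[0]:
            direction = "worsening"
        elif levels[-1] < levels[0]:
            direction = "recovering"
        else:
            direction = "stable"
        peak = max(levels)
        label = SEVERITY_ORDER[peak - 1] if peak else "NONE"
        lines.append(f"{domain}: {direction} (peak severity: {label})")
    return lines
```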
2. Escalation Summary
Encodes the Alert FSM trajectory for the current episode as a compact descriptor:

```
NONE → WARNING (w=2) → SUSTAINED (w=4) → gate_windows=6
```

This gives the LLM the full escalation arc in a single token-efficient string, replacing per-window FSM state repetition.
3. Episodic Abstraction
When compression.max_findings_per_episode (default 12) is exceeded within a single episode_id, older findings are collapsed into a typed episode header: [EPISODE_START: cardiovascular+locomotor, onset 00:74:12, initial_confidence 0.81]. Only findings from the most recent 3 windows are passed verbatim. This preserves longitudinal behavioural structure — the LLM knows what kind of episode this is and when it started — while eliminating token-for-token repetition of resolved findings.
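The episodic abstraction step can be sketched as follows. Finding is a hypothetical stand-in for SemanticFinding. Deriving the header's domain names by stripping a "_load" suffix and taking onset and confidence from the earliest collapsed finding are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical, simplified stand-in for SemanticFinding."""
    window: int         # inference-window index within the episode
    domain: str         # e.g. "cardiovascular_load"
    match_clock: str    # e.g. "74:12"
    confidence: float


def abstract_episode(findings: list[Finding], max_findings: int = 12,
                     verbatim_windows: int = 3) -> tuple[str | None, list[Finding]]:
    """Collapse older findings into a header once the episode exceeds max_findings."""
    if len(findings) <= max_findings:
        return None, list(findings)
    cutoff = max(f.window for f in findings) - verbatim_windows + 1
    older = [f for f in findings if f.window < cutoff]
    recent = [f for f in findings if f.window >= cutoff]  # passed verbatim
    if not older:
        return None, recent
    domains = sorted({f.domain.removesuffix("_load") for f in older})
    first = min(older, key=lambda f: f.window)
    header = (f"[EPISODE_START: {'+'.join(domains)}, onset {first.match_clock}, "
              f"initial_confidence {first.confidence:.2f}]")
    return header, recent
```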
The compressed prompt contains:
- Trajectory narrative (domain → direction + severity): ~40–80 tokens
- Escalation summary (FSM arc): ~15 tokens
- Episodic header (if applicable): ~25 tokens
- Current-window top SHAP features (from ablation or cache): ~60 tokens
- Counterfactual (what would clear the alert): ~20 tokens
Total: ~160–200 tokens of structured context, regardless of match duration or episode length. Without compression, a 90-minute match with 5-window findings would accumulate ~2,700+ tokens of raw finding history.
The compression layer is cache-aware. Before building the trajectory narrative, TemporalContextCompressor queries RedisCheckpointStore / EpisodeStore for the player's cached SHAP attributions from the XAI cooldown period. This ensures that windows where full SHAP was not recomputed (due to the 60 s cooldown) still contribute their attribution signal to the trajectory narrative via the cached values, rather than appearing as gaps.
The system implements Cache-Augmented Generation (CAG) — as opposed to Retrieval-Augmented Generation (RAG) — using Redis as the backing store. The distinction is consequential for a real-time inference pipeline:
| | CAG (this system) | RAG (not used) |
|---|---|---|
| Retrieval | Deterministic key lookup (player_id:shap, player_id:findings) | Approximate nearest-neighbour search |
| Latency | O(1) Redis GET; O(log N + M) ZRANGE for the M most recent findings | Vector store query latency (5–50 ms typical) |
| Correctness | Exact cached artefacts; no retrieval error | Relevant documents may not be returned |
| Domain | Closed, structured (per-player physiological history) | Open, unstructured (general knowledge) |
For a closed, structured domain like per-player physiological findings, RAG's retrieval flexibility is unnecessary and its latency and retrieval error are unacceptable within the 200 ms SLA. CAG provides the right tradeoff.
SHAP attribution cache (player_id:shap:window_ts)
After each SHAP run, the 8-feature attribution vector is written to Redis with a REDIS_CAG_TTL_S TTL (default 3600 s). During the 60-second XAI cooldown between full SHAP runs per player, the SemanticInterpreter and TemporalContextCompressor read the most recent cached attribution rather than falling back to zero-weight attribution. This means the trajectory narrative always reflects real attribution signal, not silence.
SemanticFinding history (player_id:findings sorted set)
Each SemanticFinding emitted by the SemanticInterpreter is appended to a per-player Redis sorted set, scored by Unix timestamp. EpisodeStore retrieves the N most recent findings (default N=10) before interpreter runs, enabling trend-aware symbolic reasoning:
```python
# Without CAG: the interpreter sees only the current window
findings = interpreter.classify(current_shap, current_window)

# With CAG: the interpreter sees the current window plus longitudinal context
cached_context = cag_store.get_recent_findings(player_id, n=10)
cached_shap = cag_store.get_latest_shap(player_id)
findings = interpreter.classify(
    current_shap,
    current_window,
    context=cached_context,
    prior_shap=cached_shap,
)
```

This is the critical enabler for multi-window trend detection: the interpreter can classify a finding as persistence (recurrent pattern) rather than first_occurrence only because the cached history is available without reprocessing the MatchStateManager trajectory.
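A store exposing get_recent_findings / get_latest_shap could be backed by redis-py roughly as follows. This is a hypothetical sketch, not RedisCheckpointStore / EpisodeStore. The extra `player_id:shap:latest` pointer key is an assumption so that the cooldown path needs a single GET. Errors degrade to empty context, matching the fallback described below.

```python
import json


class CagStore:
    """Hypothetical sketch; `client` is a redis-py redis.Redis (or anything with the same methods)."""

    def __init__(self, client, ttl_s: int = 3600):
        self.r = client
        self.ttl_s = ttl_s  # REDIS_CAG_TTL_S default

    def put_shap(self, player_id: str, window_ts: int, attributions: list[float]) -> None:
        payload = json.dumps(attributions)
        self.r.set(f"{player_id}:shap:{window_ts}", payload, ex=self.ttl_s)
        # Assumed "latest" pointer so the cooldown path is one O(1) GET
        self.r.set(f"{player_id}:shap:latest", payload, ex=self.ttl_s)

    def get_latest_shap(self, player_id: str):
        try:
            raw = self.r.get(f"{player_id}:shap:latest")
        except Exception:  # e.g. redis.exceptions.ConnectionError: treat as "no cache"
            return None
        return json.loads(raw) if raw is not None else None

    def add_finding(self, player_id: str, ts: float, finding: dict) -> None:
        key = f"{player_id}:findings"
        self.r.zadd(key, {json.dumps(finding, sort_keys=True): ts})  # scored by Unix timestamp
        self.r.expire(key, self.ttl_s)

    def get_recent_findings(self, player_id: str, n: int = 10) -> list[dict]:
        try:
            raw = self.r.zrange(f"{player_id}:findings", 0, n - 1, desc=True)  # newest first
        except Exception:
            return []  # Redis unavailable: window-local classification only
        return [json.loads(m) for m in raw]
```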
When Redis is unavailable, RedisCheckpointStore / EpisodeStore returns empty context objects. The SemanticInterpreter falls back to single-window classification, and the TemporalContextCompressor builds trajectory narratives from in-memory MatchStateManager state only. No alerts are suppressed; the finding quality degrades gracefully from trend-aware to window-local.
```
REDIS_CAG_AVAILABLE=True   → trend-aware findings, full trajectory narrative
REDIS_CAG_AVAILABLE=False  → window-local findings, in-memory trajectory only
```
Telemetry validation operates in two stages:
- Pre-accumulation temporal validation (event-level) — detects timestamp reversals and epoch discontinuities before the event enters the accumulator, triggering epoch-scoped runtime resets when continuity cannot be preserved.
- Post-window semantic validation (window-level) — physical plausibility checks run after a complete window is emitted.
Pre-accumulation checks:
| Check | Live behaviour | Replay behaviour (--replay-mode) |
|---|---|---|
| Timestamp reversal | non_monotonic_timestamp → INVALID, buffer reset | replay_non_monotonic_timestamp → DEGRADED (confidence 0.7, floored to 0.8) |
| Timestamp gap > 60 s | buffer reset | no reset (gaps expected between seasons) |
Post-window checks:
| Check | Live behaviour | Replay behaviour |
|---|---|---|
| Mask completeness | <75% of required fields → INVALID | same |
| Physical plausibility | speed >13.5 m/s (+20% margin), HR outside [30, 220], accel >12 m/s² → INVALID | same |
| Timestamp gap > 5 s | timestamp_gap_* → confidence -0.3 | replay_timestamp_gap_* → no penalty |
Replay-specific issue strings (replay_*) are distinct from live equivalents so audit queries on non_monotonic_timestamp or timestamp_gap_* continue to find only genuine sensor failures, not expected replay stream disorder.
Status values: VALID (confidence=1.0), DEGRADED (0.0–0.8), INVALID (0.0). Inference is blocked for INVALID events. DEGRADED events are inferred but flagged. The Alert FSM shifts to HOLD when event confidence <0.4.
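The hard post-window checks and the confidence policy can be sketched together. This is not TelemetryValidityLayer itself. The REQUIRED_FIELDS tuple and the issue strings mask_incomplete / physically_implausible are hypothetical; the limits, the 0.8 ceiling on DEGRADED and the replay floor come from the tables above.

```python
from enum import Enum


class Status(Enum):
    VALID = "VALID"
    DEGRADED = "DEGRADED"
    INVALID = "INVALID"


# Assumed set of fields counted by the mask-completeness check
REQUIRED_FIELDS = ("ts", "speed_ms", "heart_rate_bpm", "accel", "x_pitch", "y_pitch")


def _implausible(tick: dict) -> bool:
    speed, hr, accel = tick.get("speed_ms"), tick.get("heart_rate_bpm"), tick.get("accel")
    return ((speed is not None and speed > 13.5)
            or (hr is not None and not 30 <= hr <= 220)
            or (accel is not None and abs(accel) > 12))


def validate_window(ticks: list[dict]) -> tuple[Status, list[str]]:
    """Post-window hard checks; identical in live and replay mode."""
    slots = len(ticks) * len(REQUIRED_FIELDS)
    present = sum(tick.get(f) is not None for tick in ticks for f in REQUIRED_FIELDS)
    if present / slots < 0.75:
        return Status.INVALID, ["mask_incomplete"]
    if any(_implausible(t) for t in ticks):
        return Status.INVALID, ["physically_implausible"]
    return Status.VALID, []


def effective_confidence(status: Status, raw_conf: float, replay: bool) -> float:
    """VALID -> 1.0, INVALID -> 0.0, DEGRADED -> clamped to [0, 0.8]; replay floors DEGRADED at 0.8."""
    if status is Status.VALID:
        return 1.0
    if status is Status.INVALID:
        return 0.0
    conf = max(0.0, min(raw_conf, 0.8))
    return max(conf, 0.8) if replay else conf
```

An AlertManager consuming these values would enter HOLD whenever the effective confidence drops below 0.4, so a floored replay event (0.8) can never trigger HOLD on its own.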
```
                        signal_active (>= min_persistence windows)
  NONE ────────────────────────────────────────────▶ WARNING
   ▲                                                    │
   │ recovery (>= recovery_threshold clear windows)     │ signal_active (>= escalation_threshold)
   │                                                    ▼
   └──────────────────────────────────────────────── CRITICAL
       ◀──────── HOLD (confidence < 0.4) ────────────
       ◀──────── SAFE_MODE (system-wide) ─────────────
```
- Hysteresis: Transitions only escalate within an episode; de-escalation requires recovery_threshold (default 3) consecutive clear windows.
- Alert family cooldown: 20 s cooldown per player. Switching alert type resets the cooldown immediately, ensuring the first instance of a new type is never suppressed.
- Episode tracking: Each NONE → WARNING transition increments episode_id, enabling episode-level TP/FP/FN evaluation via utils/evaluation/episodes.py.
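The state machine above, minus the per-player cooldown, can be sketched as a small class. The defaults for min_persistence (2) and escalation_threshold (4) are assumptions, as are the rules that counters freeze during HOLD and that the pre-HOLD state is restored once confidence recovers. Only recovery_threshold=3 and the 0.4 HOLD gate are documented.

```python
from dataclasses import dataclass


@dataclass
class AlertFSM:
    min_persistence: int = 2        # assumed default
    escalation_threshold: int = 4   # assumed default
    recovery_threshold: int = 3     # documented default
    hold_confidence: float = 0.4
    state: str = "NONE"
    active_run: int = 0             # consecutive windows with an active signal
    clear_run: int = 0              # consecutive clear windows
    episode_id: int = 0
    safe_mode: bool = False
    _pre_hold: str = "NONE"

    def step(self, signal_active: bool, confidence: float) -> str:
        if self.safe_mode:
            return "SAFE_MODE"
        if confidence < self.hold_confidence:
            if self.state != "HOLD":
                self._pre_hold, self.state = self.state, "HOLD"
            return self.state       # counters frozen while held
        if self.state == "HOLD":
            self.state = self._pre_hold
        if signal_active:
            self.active_run, self.clear_run = self.active_run + 1, 0
        else:
            self.active_run, self.clear_run = 0, self.clear_run + 1

        if self.state == "NONE" and self.active_run >= self.min_persistence:
            self.state = "WARNING"
            self.episode_id += 1    # new episode on every NONE -> WARNING
        elif self.state == "WARNING" and self.active_run >= self.escalation_threshold:
            self.state = "CRITICAL"
        elif self.state in ("WARNING", "CRITICAL") and self.clear_run >= self.recovery_threshold:
            self.state = "NONE"     # hysteresis: only N consecutive clear windows de-escalate
        return self.state
```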
| Level | Trigger | Features disabled |
|---|---|---|
| NORMAL | — | None |
| LEVEL_1 | SHAP/LLM violation or TVL DEGRADED | SHAP explanation; LLM NLG |
| LEVEL_2 | Invariant violation (e.g. model–threshold mismatch) | Above + adaptive calibration frozen |
| LEVEL_3 | Critical invariant failure | Above + inference suspended; all alerts suppressed |
Safe Mode propagates from SystemInvariantGuard → AlertManager.set_safe_mode() → all downstream consumers.
Buffers 24 raw telemetry packets per player before emitting one inference window (non-overlapping, stride = window_size):
- 1,092 telemetry packets → ~45 inference cycles instead of 1,092
- Reduces: alert duplication from near-identical overlapping buffers, fake persistence increments on every packet, exploding trajectory lengths, motif reinforcement without new information
- Resets automatically on confirmed continuity breaks (session boundary transitions, timestamp discontinuities, epoch-scale temporal gaps)
- Buffer resets propagate through a unified epoch-reset path that atomically clears EMA state, positional trajectory buffers, alert FSM persistence state, rolling match-state trajectories, TVL per-player timestamp history, and output cooldown gates
The SLA timer (t_start) is set immediately after the accumulator emits a complete window. The 200 ms budget measures inference time — LSTM forward pass, threshold comparison, result assembly, and state compression — and is not inflated by accumulation time or asynchronous LLM NLG.
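A sketch of per-player, non-overlapping windowing with stride = window_size and resets on session change or a > 60 s gap. WindowAccumulator is a hypothetical stand-in for LiveWindowAccumulator. Timestamp reversals are left to the pre-accumulation guard, and the wider epoch reset (EMA, FSM, cooldowns) is out of scope here.

```python
from collections import defaultdict


class WindowAccumulator:
    """Hypothetical sketch: one non-overlapping window per `window_size` packets per player."""

    def __init__(self, window_size: int = 24, max_gap_s: float = 60.0):
        self.window_size = window_size
        self.max_gap_s = max_gap_s
        self._buf = defaultdict(list)   # player_id -> buffered packets
        self._last = {}                 # player_id -> (session_id, ts)

    def push(self, player_id, session_id, ts, packet):
        last = self._last.get(player_id)
        if last is not None and (last[0] != session_id or ts - last[1] > self.max_gap_s):
            self.reset(player_id)       # continuity break: never mix sessions in one window
        self._last[player_id] = (session_id, ts)
        buf = self._buf[player_id]
        buf.append(packet)
        if len(buf) == self.window_size:
            window = list(buf)
            buf.clear()                 # stride == window_size: no overlap with the next window
            return window
        return None

    def reset(self, player_id):
        self._buf.pop(player_id, None)
```

With one player and 1,092 packets this emits 45 windows (1,092 // 24); the remaining 12 packets wait for the next window.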
Event fingerprinting (MutationJournal): Each calibration update is content-hashed. Idempotent replay: duplicate updates are silently dropped.
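Content-hash deduplication can be sketched with a canonical JSON encoding and SHA-256. IdempotentJournal is a hypothetical name, and the hash function and encoding are assumptions. Sorting keys makes the fingerprint independent of dict insertion order, so a replayed update is recognised even if it was rebuilt in a different order.

```python
import hashlib
import json


class IdempotentJournal:
    """Hypothetical sketch of content-hashed, idempotent calibration updates."""

    def __init__(self):
        self._seen: set[str] = set()
        self.entries: list[dict] = []

    @staticmethod
    def fingerprint(update: dict) -> str:
        canonical = json.dumps(update, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def apply(self, update: dict) -> bool:
        """Record the update; return False (silently drop) if it was already applied."""
        fp = self.fingerprint(update)
        if fp in self._seen:
            return False
        self._seen.add(fp)
        self.entries.append({"fingerprint": fp, **update})
        return True
```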
Temporal Causality Guard: Detects timestamp reversals and epoch discontinuities before accumulation, triggering epoch-scoped runtime resets. Configurable strict/warn mode.
Priority-aware backpressure (BoundedPriorityQueue): Under load, tasks are shed in reverse priority — LLM summaries dropped first, then SHAP, then inference — ensuring the 200 ms SLA is preserved even when the LLM is slow or unavailable.
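Reverse-priority shedding can be sketched on top of heapq. This is not BoundedPriorityQueue: the task kinds, the eviction rule (a new task is dropped if it is no more important than the least important queued task; otherwise that task is evicted) and the O(n) eviction scan are simplifying assumptions.

```python
import heapq
import itertools

PRIORITY = {"inference": 0, "shap": 1, "llm_summary": 2}  # lower = more important


class SheddingQueue:
    def __init__(self, maxsize: int):
        self.maxsize = maxsize
        self._heap = []                  # (priority, seq, kind, payload)
        self._seq = itertools.count()    # FIFO tie-break within a priority

    def put(self, kind, payload):
        """Enqueue; when full, shed the least important task. Returns the shed kind or None."""
        item = (PRIORITY[kind], next(self._seq), kind, payload)
        if len(self._heap) < self.maxsize:
            heapq.heappush(self._heap, item)
            return None
        worst = max(self._heap)          # least important, newest task
        if item[0] >= worst[0]:
            return kind                  # new task is no more important: drop it
        self._heap.remove(worst)
        heapq.heapify(self._heap)
        heapq.heappush(self._heap, item)
        return worst[2]

    def get(self):
        _, _, kind, payload = heapq.heappop(self._heap)
        return kind, payload
```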
Replay consistency is a first-class design concern. Most sports AI systems process historical data without guaranteeing that the inference, alert, and explanation outputs produced during replay are bitwise-reproducible and semantically equivalent to what would have been produced in live operation. This system provides explicit guarantees across four layers.
The TemporalCausalityGuard enforces strict event-time monotonicity across all ingestion paths. In replay mode, timestamp reversals that are expected artefacts of interleaved multi-session streams are classified as replay_non_monotonic_timestamp (DEGRADED, confidence floored at 0.8) rather than triggering buffer resets. This preserves inference continuity through interleaved streams while keeping the live non_monotonic_timestamp marker clean for genuine sensor failure auditing.
The replay-specific issue taxonomy (replay_non_monotonic_timestamp, replay_timestamp_gap_*) is distinct from live equivalents at every layer — TVL classification, log emission, and audit query — so post-match analysis of replay logs cannot be contaminated by expected stream disorder.
In live mode, the LiveWindowAccumulator resets on session boundary transitions and timestamp gaps > 60 s. In --replay-mode, these resets are suppressed because historical streams routinely interleave events from unrelated source sessions, and session-boundary transitions in the stream do not represent genuine continuity breaks. The accumulator instead relies solely on the TemporalCausalityGuard for epoch-scoped resets, preserving the same accumulation semantics that governed alert persistence during live operation.
The unified epoch-reset path ensures that when a continuity break does occur in replay, the full runtime state is cleared atomically: EMA smoothing state, positional trajectory buffers, Alert FSM persistence counters, rolling match-state trajectories, TVL timestamp history, Redis CAG context (player findings and SHAP cache), and output cooldown gates are all reset together. Partial state resets — where the FSM clears but the EMA does not, for example — are architecturally prevented by routing all resets through a single reset_player() call chain.
The 0.8 confidence floor applied to DEGRADED replay events is propagated consistently through the full pipeline: from TelemetryValidityLayer._effective_confidence() through AlertManager (which gates on confidence < 0.4 for HOLD) and through the inference log (confidence field in every NDJSON entry). This means that post-match confidence distributions computed from the inference log accurately reflect the replay-time confidence behaviour, enabling reproducible threshold sensitivity analysis.
The --replay-mode flag is threaded from cmd_serve → _build_pipeline(replay_mode) → PlayersDataAnalysisPipeline(replay_mode) → TelemetryValidityLayer(replay_mode) → process_window_direct(replay_mode) → _effective_confidence() without duplicating policy logic at any layer.
| Behaviour | Live mode | Replay mode (--replay-mode) |
|---|---|---|
| Timestamp reversal | INVALID → buffer reset | DEGRADED (conf 0.8) → inference proceeds |
| Timestamp gap > 60 s | Buffer reset | No reset |
| Session boundary transition | Buffer reset | No reset |
| TVL issue label | non_monotonic_timestamp | replay_non_monotonic_timestamp |
| Audit query contamination | Genuine sensor failures only | Replay disorder isolated to replay_* labels |
| Confidence floor on DEGRADED | 0.0–0.8 (unclamped) | 0.8 (floored) |
These differences are intentional and documented. They ensure replay outputs are maximally useful for post-match analysis and debugging while preserving the integrity of live sensor-failure auditing.
FairnessMonitor computes flag-rate disparity across three protected attributes:
| Attribute | Groups examined |
|---|---|
| position | GK, CB, LB, RB, CM, AM, LW, RW, ST |
| age_group | U21, Senior, Veteran |
| nationality | All unique nationalities in the squad |
A group whose flag rate deviates more than fairness.flag_rate_disparity_threshold (default 15%) from the squad mean is flagged as biased. The audit command exits with code 5 and identifies the biased groups in the output JSON.
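A sketch of the disparity check for one attribute. FairnessMonitor's exact definition is not given here; this assumes the squad mean is the pooled flag rate and that "15%" means an absolute difference of 15 percentage points.

```python
from collections import defaultdict


def flag_rate_disparity(records, attribute: str, threshold: float = 0.15) -> dict:
    """Return {group: flag_rate} for groups whose rate deviates from the squad rate by > threshold.

    records: iterable of dicts carrying the protected attribute and a boolean "flagged".
    """
    per_group = defaultdict(lambda: [0, 0])  # group -> [flagged, total]
    flagged = total = 0
    for r in records:
        g = per_group[r[attribute]]
        g[0] += int(r["flagged"])
        g[1] += 1
        flagged += int(r["flagged"])
        total += 1
    squad_rate = flagged / total
    return {group: f / n for group, (f, n) in per_group.items()
            if abs(f / n - squad_rate) > threshold}
```

An audit wrapper could then exit with code 5 whenever this returns a non-empty result for any of the three attributes.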
RecalibrationPipeline runs when >=10 coach override records (OverrideRecord) have been logged for a player within the recalibration window. Adjusts per-player thresholds by feedback.threshold_adjustment_step (default ±5%) and applies a feedback.per_player_sensitivity_decay (default 10%) to prevent runaway threshold drift. Default cadence: every 7 days.
All threshold adjustments are recorded in MutationJournal for full auditability and replay-safe reconstruction.
Everything above this section describes the synthetic-data CLI (generate · train · evaluate · serve). A separate, real-data pilot pipeline runs alongside it, ingesting actual Kinexon UWB tracking exports (handball, not football GPS) for analytics.players / analytics.player_workload Redis Streams consumed by the Backend AnalyticsBridgeService / Frontend "Player Analytics" tab. It reuses the same SharedBackboneAutoencoder / PatternAnalysisEngine / SHAP stack documented above — no separate model architecture.
Real Kinexon CSV exports under data/ (positions.csv, statistics.csv, events.csv), loaded via ingestion/kinexon_adapter.py (KinexonAdapter) → ingestion/kinexon_resampler.py (KinexonResampler, 15 s buckets). ingestion/kinexon_events_features.py::merge_event_features() merges 24 additional window-aggregated event features onto the 8 resampled columns, matching the 32-feature input the promoted checkpoint (models/shared_backbone.pt) is actually trained on. As of this pipeline's current data, exactly one real match exists (session 3387, HSG Wetzlar vs. SC Magdeburg, 2026-06-07) — multi-match history is a designed-but-not-yet-implemented roadmap item (see below).
| Entry point | Model? | Feature count | Mode | Output stream |
|---|---|---|---|---|
| python main.py publish --historical-replay (default) | Yes — loads promoted checkpoint, never retrains | 32 | Batch (one-shot) | analytics.players |
| python main.py publish --continuous | Yes — loads promoted checkpoint once at startup, never retrains/reloads | 32 | Continuous — paced replay of real session ticks through LiveWindowAccumulator, publishing incrementally per completed window | analytics.player_workload and analytics.players |
| python main.py train --data-source kinexon --use-event-features | Yes — genuine retrain, saves+promotes checkpoint | 32 | One-shot | none (writes models/shared_backbone.pt) |
| scripts/publish_player_workload.py | No — model-free aggregation (analysis/player_workload.py) | 32 (loader-level only; not fed to a model) | Batch (one-shot) | analytics.player_workload |
Both publish modes and main.py train --data-source kinexon's pilot path route through the same shared module, analysis/pilot_pipeline.py (build_pipeline_and_load(), build_pipeline_and_train(), score_window_and_build_event()) — the single home for Kinexon loading, checkpoint-loading, and per-window scoring/event-construction logic that used to be duplicated across scripts/publish_pilot_analytics.py, scripts/run_live_player_analytics.py, and scripts/evaluate_pilot_model.py (those first two scripts have been removed; their functionality is now main.py publish).
main.py serve (the synthetic-data CLI documented above) is architecturally separate from this pilot pipeline: it has no Kinexon loader, depends entirely on whatever JSON its stdin producer supplies, uses a mismatched LiveWindowAccumulator(24, 24) against the model's real window_steps=8, and never publishes to Redis — it writes to logs/inference_log.jsonl and drives the synthetic system's own alert/NLG narrative instead. It is not part of the Kinexon production path.
main.py publish --continuous is the canonical production runtime for analytics.players. It loads the promoted checkpoint once, then replays real per-tick session data in chronological order across all players (paced via --tick-interval-seconds), pushing each tick through the same LiveWindowAccumulator class main.py serve uses (configured at the model's real window_size=8, stride=8). On each completed window it runs one real inference and publishes immediately, with no end-of-run batch dump. No live Kinexon hardware feed exists yet in this codebase (tracking.events has no producer), so this is a paced replay of recorded data rather than a stadium connection, but it is otherwise a genuine long-running process.
```
python main.py publish --continuous [--tick-interval-seconds 0.2] [--max-ticks N]
```

scripts/ is split into two tiers so it's unambiguous which files are part of the supported production pipeline:
scripts/ (root) — supported, actively used by Backend or the documented pipeline:
| Script | Purpose |
|---|---|
| evaluate_pilot_model.py | Deep diagnostic report on the pilot checkpoint: calibration-state audit, real SHAP examples (lowest/highest/borderline-loss windows), full per-window CSV export (_pilot_eval_windows.csv). Imports its checkpoint-load/train logic from analysis/pilot_pipeline.py. Referenced by main.py evaluate's docstring as the source of its validated procedure. |
| publish_player_workload.py | One-shot batch publisher, model-free, 32-feature loader (see Production Entry Points table above). |
| export_match_roster.py | Exports per-player position + playing-time JSON. Read directly by Backend's PlayerMatchHistoryService/backfill-player-match-history.mjs. |
| export_match_timeline.py | Exports 15-minute-segment workload timeline JSON. Read directly by Backend's TimelineIntelligenceService. |
| export_player_match_metrics.py | Exports per-(match, player) physical/workload metrics JSON. Read directly by Backend's backfill-player-match-history.mjs. |
scripts/archive/ — one-off historical validation/trace scripts, kept for audit trail only, not part of any supported workflow and not referenced by Backend or main.py:
| Script | Purpose |
|---|---|
| feature_importance_comparison.py | OLD (8-feature) vs. NEW (32-feature) training comparison — mean \|SHAP\| by feature. One-time migration record. |
| compare_persistence.py | Measured AlertManager.min_persistence impact on anomaly signal during calibration tuning. |
| baseline_fix_validation.py, baseline_threshold_audit.py | Validated/audited BaselineBuilder provisional-window thresholds during calibration tuning. |
| gap_validation.py, window_gap_trace.py | Validated gap-aware windowing behaviour against real session gaps during that fix. |
| resampler_validation.py | Validated KinexonResampler bucket/window counts during that fix. |
| kinexon_trace.py, semantic_trace.py, phase_a_trace.py | Before/after traces of specific real-data calibration fixes, kept as the historical record of what changed and why. |
If you need to re-run one of the archived scripts, it still works as-is from its new path (python scripts/archive/<name>.py) — nothing about its logic changed, only its location.
scripts/publish_pilot_analytics.py, scripts/run_live_player_analytics.py, and scripts/train_pilot_session_3387.py have been removed — fully superseded by main.py publish (--historical-replay / --continuous) and main.py train --data-source kinexon --use-event-features.
Implemented on the Backend side: PlayerMatchHistory (PostgreSQL, one append-only row per (player_id, match_id), written by backend/scripts/backfill-player-match-history.mjs from this pipeline's own match_roster.json/player_match_metrics.json exports + the crosswalk to Postgres player IDs). 4 real matches are ingested as of this writing, enabling real multi-match trend analysis (workload trend, match-to-match consistency) in Backend's Match Intelligence layer — see the root ARCHITECTURE.md. acuteLoad/chronicLoad/acwr are attached only to each player's own chronologically-latest match (never back-computed for earlier matches, to avoid fabricating a point-in-time value that doesn't exist).
run_match_orchestrator.py is a second, separate production entrypoint from main.py. It owns live possession/team-state/tactical-insight analytics, not the player-workload/LSTM pipeline documented above. run_match_orchestrator.py --match-id <id> instantiates MatchOrchestrator and runs a continuous consume→tick→publish loop:
- Consumes match.events/match.context from Backend (coach-entered actions/score/clock; Backend is the source of truth for these) and tracking.events (PlayerDynamics-internal Kinexon tactical-event hand-off).
- Publishes to analytics.possessions, analytics.teamstate, analytics.trends, analytics.insights, analytics.situations (Backend's AnalyticsBridgeService relays these onto GET /api/analytics/stream for the live dashboard).
- Graceful shutdown on SIGINT/SIGTERM: finishes the current tick, calls finalize(), publishes one last batch, then exits, so nothing "tail"/provisional is lost.
- This is the process docker-compose.yml's playerdynamics service runs (--match-id ${MATCH_ID} --tick-interval-seconds 5).
main.py's subcommands (ingest/train/evaluate/publish/serve/audit/status) and run_match_orchestrator.py are independent processes that happen to share the same Redis instance and the same data/processed/ filesystem artifacts — neither imports the other.
Human-readable (default, to stderr):

```
2025-09-14T19:42:11Z INFO players_data.main Serve complete | events=1092 alerts=23 sla_violations=0
```

Structured JSON (set JSON_LOGS=1):

```json
{"ts": "2025-09-14T19:42:11Z", "level": "INFO", "logger": "players_data.main", "message": "ALERT player=p007 type=substitution conf=0.92 latency=47.1 ms"}
```

The inference log (logs/inference_log.jsonl) is written by serve for every processed window, not just alerts. Fields: inference_id, player_id, external_id, session_id, recommendation_type, is_anomaly, anomaly_score, confidence, fatigue_flag, drift_flag, workload_flag, nlg_summary, compression_tokens, cag_hit, ts.
cag_hit: true indicates the window used Redis-cached SHAP attributions (XAI cooldown was active). compression_tokens records the token count of the compressed state passed to the LLM, enabling prompt efficiency monitoring.
When async LLM NLG completes, an enriched entry is appended with "_nlg_enrichment": true, nlg_summary_llm, and full shap_values.
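A JSON_LOGS=1 switch between the two stderr formats shown above could be wired with only the standard logging module, roughly as below. JsonFormatter, configure_logging and the field mapping are assumptions modelled on the sample lines, not the project's implementation.

```python
import json
import logging
import os
import sys
import time
from datetime import datetime, timezone

ISO_UTC = "%Y-%m-%dT%H:%M:%SZ"


class JsonFormatter(logging.Formatter):
    """One JSON object per line, with the same keys as the sample above."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).strftime(ISO_UTC),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


def configure_logging(level: str = "INFO") -> None:
    handler = logging.StreamHandler(sys.stderr)
    if os.environ.get("JSON_LOGS") == "1":
        handler.setFormatter(JsonFormatter())
    else:
        text = logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s", datefmt=ISO_UTC)
        text.converter = time.gmtime  # render asctime in UTC to match the trailing "Z"
        handler.setFormatter(text)
    root = logging.getLogger()
    root.handlers[:] = [handler]
    root.setLevel(level)
```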
| Log pattern | Meaning |
|---|---|
| ALERT player=… type=… conf=… latency=… ms | Alert emitted to stdout |
| SLA breach: player=… latency=…ms > 200ms | Inference exceeded SLA; investigate model load |
| CAG hit: player=… shap_cached=True findings_cached=N | SemanticInterpreter augmented from Redis |
| CAG miss: player=… redis_unavailable | Redis down; falling back to single-window classification |
| STATE COMPRESSED: player=… tokens=… findings_collapsed=N | Episodic abstraction triggered; N findings collapsed to header |
| BUFFER RESET reason=session_change | LiveWindowAccumulator cleared on new session |
| EPOCH RESET \| player=… reason=… cleared=[…] | Unified runtime state reset triggered by continuity break |
| Telemetry degraded player=… status=INVALID issues=[…] | TVL rejected event; only live sensor issues appear at WARNING |
| AlertManager: ENTERING GLOBAL SAFE MODE | System-wide alert suppression active |
| SHAP computation failed, using fallback | SHAP library error; magnitude-proxy used |
| Slow Ollama call: model=… ms | LLM NLG took longer than expected; alert already emitted |
| circuit breaker tripped — switching to template NLG | Ollama unavailable; template fallback active for 30 s |
Replay-specific TVL issues (replay_non_monotonic_timestamp, replay_timestamp_gap_*) are logged at DEBUG level only and do not appear in WARNING output during normal replay operation.
| Code | Command(s) | Condition |
|---|---|---|
| 0 | all | Success |
| 1 | `generate`, `train`, `audit` | Data or validation error (missing files, empty tables, parse failure) |
| 2 | `train`, `evaluate`, `serve` | Model error (not trained, corrupt checkpoint, zero windows) |
| 3 | `evaluate` | ROC-AUC below `--min-auc`, or no players produced evaluable windows |
| 4 | `serve` | Unhandled stream error |
| 5 | `audit` | Bias detected in a protected attribute group |
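Scripts that call the CLI can branch on these codes. A minimal sketch, assuming the exit-code table above is complete; `describe_exit` and `is_quality_gate_failure` are hypothetical helpers, and treating codes 3 and 5 as quality gates (as opposed to operational errors) is an interpretation, not a documented contract.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# Mirrors the exit-code table; None means "any command".
EXIT_CODES: Dict[int, Tuple[Optional[FrozenSet[str]], str]] = {
    0: (None, "Success"),
    1: (frozenset({"generate", "train", "audit"}), "Data or validation error"),
    2: (frozenset({"train", "evaluate", "serve"}), "Model error"),
    3: (frozenset({"evaluate"}), "ROC-AUC below --min-auc, or no evaluable windows"),
    4: (frozenset({"serve"}), "Unhandled stream error"),
    5: (frozenset({"audit"}), "Bias detected in a protected attribute group"),
}


def describe_exit(command: str, code: int) -> str:
    """Translate a CLI return code into the documented condition for that command."""
    entry = EXIT_CODES.get(code)
    if entry is None:
        return f"{command}: undocumented exit code {code}"
    commands, condition = entry
    if commands is not None and command not in commands:
        # The code exists, but the table does not list it for this command.
        return f"{command}: exit code {code} not documented for this command"
    return f"{command}: {condition}"


def is_quality_gate_failure(command: str, code: int) -> bool:
    """True for the evaluate AUC gate (3) and the audit bias gate (5)."""
    return (command, code) in {("evaluate", 3), ("audit", 5)}
```

For example, a CI job could pass the return code of `main.py evaluate` to `describe_exit` and report an AUC gate failure differently from a model-loading error.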
Current limitations:
- The 200 ms SLA covers inference and state compression (T1). LLM NLG (T2) runs asynchronously, bounded by a 30 s timeout with a deterministic template fallback. Both latencies are observable separately in the logs and in the inference log.
- Temporal feature ablation explains the derived feature vector, not raw LSTM hidden states. True SHAP over the full sequence space would require ~2,000 model calls per window (~2–15 s), violating the SLA.
- `SessionRegimeClassifier` uses rule-based Territory × Intensity bins. Match phase (first/second half) is not included because elapsed-time context is not threaded through the calibration interface at training time.
- `PatternAnalysisEngine` is not thread-safe. One engine per asyncio event loop or per process is the supported deployment model.
- `TransformerAutoencoder` is experimental and disabled in production.
- Redis CAG TTL is uniform across all artefact types. SHAP attributions and `SemanticFinding` objects have different useful lifetimes (SHAP: ~1 match; findings: ~1 session) that a tiered TTL policy would address.
- Historical replay streams may interleave telemetry from unrelated source sessions. Anomaly scores in replay mode will vary across gate windows as the stream cycles through different historical sessions.
- The Kinexon pilot pipeline (see Kinexon Real-Data Pilot Pipeline) currently has exactly one real match (session 3387). Multi-match analytics (workload trend, ACWR, performance trend, match-to-match consistency) cannot be computed until additional matches are ingested — a data-volume limitation, not an engineering gap.
- `main.py serve` and the Kinexon pilot pipeline are architecturally separate runtimes with no shared data loader; `serve` does not publish to Redis and is not part of the `analytics.players` production path.
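The T1/T2 split described in the first limitation can be sketched with `asyncio`: the alert is recorded before any LLM work is scheduled, and the NLG call is wrapped in `asyncio.wait_for` with the 30 s timeout, falling back to a deterministic template on timeout or connection failure. All function names and the template wording are hypothetical, and the real pipeline additionally uses a circuit breaker that this sketch omits.

```python
import asyncio
from typing import Awaitable, Callable, List

NLG_TIMEOUT_S = 30.0  # T2 timeout stated in the limitations
LLMCall = Callable[[str], Awaitable[str]]


def template_summary(player_id: str, rec_type: str, conf: float) -> str:
    """Deterministic fallback text (hypothetical wording)."""
    return f"{player_id}: {rec_type} recommended (confidence {conf:.2f})"


async def generate_nlg(player_id: str, rec_type: str, conf: float,
                       llm_generate: LLMCall,
                       timeout_s: float = NLG_TIMEOUT_S) -> str:
    """T2: ask the LLM, but never wait longer than timeout_s."""
    prompt = f"Explain a {rec_type} alert for {player_id} (confidence {conf:.2f})."
    try:
        return await asyncio.wait_for(llm_generate(prompt), timeout=timeout_s)
    except (asyncio.TimeoutError, OSError):
        # Timeout or unreachable LLM server: use the deterministic template.
        return template_summary(player_id, rec_type, conf)


async def handle_window(player_id: str, rec_type: str, conf: float,
                        llm_generate: LLMCall, alerts: List[str],
                        timeout_s: float = NLG_TIMEOUT_S) -> "asyncio.Task[str]":
    # T1: the alert is emitted before any LLM work is scheduled.
    alerts.append(f"ALERT player={player_id} type={rec_type} conf={conf:.2f}")
    # T2: NLG runs as a background task that the caller can await later.
    return asyncio.create_task(
        generate_nlg(player_id, rec_type, conf, llm_generate, timeout_s)
    )
```

Returning the task instead of awaiting it keeps the LLM latency out of the per-window path, which is why the T2 latency never counts against the 200 ms budget.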
Roadmap:
- Learned GMM regime detector to replace rule-based Territory × Intensity bins, enabling data-driven regime discovery.
- Async `PatternAnalysisEngine` with per-player actor isolation for horizontal scaling.
- SHAP over LSTM hidden states via SHAP's `GradientExplainer` (expected gradients, an extension of integrated gradients), sidestepping the sequence-space dimensionality problem.
- Kafka consumer integration for multi-worker `serve` deployments.
- FastAPI wrapper exposing `process_window_direct()` as a REST endpoint for integration with external dashboards.
- Elapsed-time axis in regime classification (match phase as a third regime dimension).
- Tiered Redis TTL policy: short TTL for SHAP attributions (match-scoped), longer TTL for compressed episodic abstractions (season-scoped post-match analysis).
- Redis Streams integration for distributed exactly-once event fingerprinting across multi-worker `serve` deployments.
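A tiered TTL policy like the one on the roadmap could be as simple as a per-artefact-type lookup applied at write time. This sketch assumes a redis-py-style client, whose `Redis.setex(name, time, value)` accepts a `timedelta`; the key layout `cag:<artefact>:<player>:<session>`, the tier names and the TTL values are placeholders, not the project's configuration.

```python
from datetime import timedelta
from typing import Protocol

# Placeholder TTLs chosen to match the scopes described above, not tuned values.
TTL_POLICY = {
    "shap": timedelta(hours=2),        # match-scoped: ~90 min plus margin
    "finding": timedelta(hours=12),    # session-scoped SemanticFinding objects
    "episodic": timedelta(days=300),   # season-scoped compressed abstractions
}


class SupportsSetex(Protocol):
    def setex(self, name: str, time: timedelta, value: str) -> object: ...


def cag_key(artefact: str, player_id: str, session_id: str) -> str:
    """Hypothetical key layout: cag:<artefact>:<player>:<session>."""
    return f"cag:{artefact}:{player_id}:{session_id}"


def cache_artefact(client: SupportsSetex, artefact: str, player_id: str,
                   session_id: str, payload: str) -> str:
    """Write one artefact with the TTL of its tier; unknown tiers are rejected."""
    if artefact not in TTL_POLICY:
        raise ValueError(f"no TTL tier for artefact type {artefact!r}")
    key = cag_key(artefact, player_id, session_id)
    client.setex(key, TTL_POLICY[artefact], payload)
    return key
```

Rejecting unknown artefact types, rather than applying a default TTL, prevents a newly added artefact type from silently inheriting the wrong lifetime.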
- Rein & Memmert (2016) — Big data and tactical analysis in elite soccer; DOI: https://doi.org/10.1186/s40064-016-3108-2
- Foteinakis et al. (2025) — Explainable ML for Basketball; DOI: https://doi.org/10.3390/app152312401
- Odet et al. (2024) — ML and Explainability for Sports Outcome Prediction
- Pietraszewski et al. (2025) — AI in Sports Analytics systematic review; DOI: https://doi.org/10.3390/app15137254
- Kranzinger et al. (2025) — Explainable AI in Sports Science
- Lundberg & Lee (2017) — SHAP: A Unified Approach to Interpreting Model Predictions; DOI: https://doi.org/10.48550/arXiv.1705.07874
- Hochreiter & Schmidhuber (1997) — Long Short-Term Memory
- Bai et al. (2018) — An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling; DOI: https://doi.org/10.48550/arXiv.1803.01271
- Caron & Müller (2023) — TacticalGPT: Uncovering the Potential of LLMs for Predicting Tactical Decisions in Professional Football
- Ferrara (2024) — Large Language Models for Wearable Sensor-Based Human Activity Recognition; DOI: https://doi.org/10.3390/s24155045
- Yang (2024) — ChatPPG: Multi-Modal Alignment of Large Language Models for Time-Series Forecasting in Table Tennis
- Tian et al. (2025) — SportsGPT: An LLM-driven Framework for Interpretable Sports Motion Assessment and Training Guidance; DOI: https://doi.org/10.48550/arXiv.2512.14121
- Liu et al. (2024) — Smartboard: Visual Exploration of Team Tactics with LLM Agent; DOI: https://doi.org/10.1109/TVCG.2024.3456200
- Feli et al. (2025) — An LLM-Powered Agent for Physiological Data Analysis; DOI: https://doi.org/10.1109/EMBC58623.2025.11254428
- Xia et al. (2024) — SportQA: A Benchmark for Sports Understanding in Large Language Models; DOI: https://doi.org/10.18653/v1/2024.naacl-long.283
- Apostolou & Tjortjis (2019) — Sports Analytics algorithms for performance prediction; DOI: https://doi.org/10.1109/IISA.2019.8900754
- Sarlis & Tjortjis (2020) — Sports analytics — Evaluation of basketball players and team performance; DOI: https://doi.org/10.1016/j.is.2020.101562
- Ghosh et al. (2023) — Sports analytics review: AI applications, emerging technologies, and algorithmic perspective; DOI: https://doi.org/10.1002/widm.1496
- Chan et al. (2025) — Don't Do RAG: When Cache-Augmented Generation is Better than Retrieval Augmented Generation; DOI: https://doi.org/10.48550/arXiv.2412.15605
- Lewis et al. (2020) — Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks; DOI: https://doi.org/10.48550/arXiv.2005.11401
- Perez et al. (2018) — FiLM: Visual Reasoning with a General Conditioning Layer; AAAI 2018
- Gabbett (2016) — The training-injury prevention paradox; DOI: https://doi.org/10.1136/bjsports-2015-095788