Agentel

AI-powered incident diagnosis for on-call engineers.

Agentel is a multi-agent system that automatically triages and diagnoses production incidents across microservices platforms. It turns noisy telemetry into actionable decisions with measurable confidence.

What It Does

When a SEV-1 incident hits, you need answers fast. Agentel:

Identifies the root cause — What broke and why
Shows blast radius — Which services and users are affected
Recommends safe remediation — What to do, with safety checks
Knows when to escalate — When uncertainty is too high for automation
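
The escalation rule above can be pictured as a simple confidence gate. This is a minimal sketch, not Agentel's actual logic: the `Diagnosis` class, the `next_step` function, and the 0.7 threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the README does not state the real value.
ESCALATION_THRESHOLD = 0.7


@dataclass
class Diagnosis:
    root_cause: str
    confidence: float  # expected to lie in [0.0, 1.0]


def next_step(diagnosis: Diagnosis, threshold: float = ESCALATION_THRESHOLD) -> str:
    """Return "recommend" when confidence clears the threshold, else "escalate"."""
    if not 0.0 <= diagnosis.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return "recommend" if diagnosis.confidence >= threshold else "escalate"
```

With these assumed values, a diagnosis at confidence 0.9 would proceed to a recommendation, while one at 0.4 would be handed to a human.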

Target: Reduce MTTR for checkout-impacting incidents by 40%.

Quick Start

```bash
# Clone and install
git clone https://github.com/marcospolanco/agentel.git
cd agentel
pip install -r requirements.txt

# Run the dashboard
streamlit run ui/dashboard.py

# Run tests
pytest tests/ -v

# Run evaluation harness
python evals/eval_harness.py --all
```

Demo

[Screenshot: root cause identified with confidence, blast radius flow card, and recommended action]

Architecture

Agentel uses a 3-agent pipeline focused on guiding SRE attention:

| Agent | Responsibility |
| --- | --- |
| Context | Maps service dependencies, calculates blast radius |
| Diagnosis | Analyzes telemetry to identify root cause |
| Validation | Checks remediations against architectural constraints |
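
One way to picture the Context agent's blast radius calculation is a breadth-first walk over reversed dependency edges. The sketch below is illustrative only: the `blast_radius` function, the dictionary format, and the example topology are assumptions, not the project's topology loader.

```python
from collections import deque


def blast_radius(dependencies: dict[str, list[str]], failed: str) -> set[str]:
    """Return every service that depends, directly or transitively, on `failed`."""
    # Invert the edges so we can ask "who calls this service?"
    dependents: dict[str, set[str]] = {}
    for service, deps in dependencies.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)

    affected: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for caller in dependents.get(current, ()):
            # Skip services already seen so cycles cannot loop forever.
            if caller != failed and caller not in affected:
                affected.add(caller)
                queue.append(caller)
    return affected


# Hypothetical topology: service -> services it calls.
topology = {
    "checkout": ["payments", "cart"],
    "payments": ["postgres"],
    "cart": ["redis"],
    "search": ["elasticsearch"],
}
```

In this made-up topology, a `postgres` failure reaches `payments` and then `checkout`, while `search` stays outside the blast radius.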

```text
┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Incident  │───>│  Context    │───>│  Diagnosis  │───>│ Validation  │
│   Telemetry │    │   Agent     │    │   Agent     │    │   Agent     │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
                                             │                    │
                                             ▼                    ▼
                                      ┌─────────────┐    ┌─────────────┐
                                      │  Root Cause │    │  Safety     │
                                      │  Hypothesis │    │  Check      │
                                      └─────────────┘    └─────────────┘
```

Key design principle: Attention prioritization. The system surfaces at most three evidence items per diagnosis and hides the rest of the telemetry as suppressed metrics.
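
The evidence cap can be sketched as a rank-and-split step. This is an assumption-laden illustration: the `prioritize` function and the relevance scores are hypothetical, and the README does not say how Agentel actually ranks evidence.

```python
def prioritize(evidence: dict[str, float], limit: int = 3) -> tuple[list[str], list[str]]:
    """Split evidence into (shown, suppressed) by descending relevance score."""
    ranked = sorted(evidence.items(), key=lambda item: item[1], reverse=True)
    shown = [name for name, _ in ranked[:limit]]
    suppressed = [name for name, _ in ranked[limit:]]
    return shown, suppressed


# Hypothetical relevance scores for five signals.
scores = {
    "payment errors spiking": 0.95,
    "database connections saturated": 0.90,
    "checkout latency rising": 0.70,
    "cache hit ratio steady": 0.20,
    "disk usage normal": 0.10,
}
shown, suppressed = prioritize(scores)
```

Here `shown` holds the three highest-scoring signals and `suppressed` holds the other two, which a UI could place behind an expander.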

Operator-facing output: The dashboard uses on-call language (root cause, blast radius) — not raw metric or schema names. CI enforces this with system leak checks (see Evaluation).
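
A leak check of this kind can be as small as a substring scan over rendered text. The sketch below is a guess at the general shape, not the project's CI check: `BLOCKED_TERMS` and `find_leaks` are hypothetical names, and the blocked terms are invented examples.

```python
# Hypothetical block list of raw metric and schema names.
BLOCKED_TERMS = frozenset({"p99_latency_ms", "http_5xx_count", "span_id"})


def find_leaks(ui_text: str, blocked: frozenset[str] = BLOCKED_TERMS) -> list[str]:
    """Return the blocked terms that appear in operator-facing text, sorted."""
    lowered = ui_text.lower()
    return sorted(term for term in blocked if term.lower() in lowered)
```

A test could then assert that `find_leaks(rendered_dashboard_text)` returns an empty list for every golden incident.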

Project Status

Overall: Reference implementation — the core diagnosis pipeline, semantic eval harness, and Stitch-backed dashboard are in place. Live LLM diagnosis and full UI polish (rollback modal, loading morph) remain open work.

Based on agentel-spec.md v2.1.0:

| Phase | Status | Description |
| --- | --- | --- |
| 3.0 | ✅ Complete | Runtime flow & API contract (`Orchestrator.diagnose()`) |
| 1 | ✅ Complete | Foundations: models, vocabulary, topology loader |
| 2 | 🚧 Partial | 3-agent core; rule-based diagnosis (OpenAI LLM path not wired) |
| 3 | ✅ Complete | Eval harness & semantic fitness tests |
| 4 | 🚧 Partial | DashboardView + Stitch templates in Streamlit; modal/rollback UX pending |

Phase 4 breakdown:

| Sub-phase | Status | Deliverable |
| --- | --- | --- |
| 4a | ✅ Complete | `DashboardView` + `build_dashboard_view()` |
| 4b | ✅ Complete | Golden incidents (`INC-2026-001`, `-002-partial`, `-003-approval`) |
| 4c | 🚧 Partial | `ui/stitch_renderer.py` + Streamlit dashboard; runtime smoke & modal behavior pending |

CI: GitHub Actions runs `pytest tests/ -v` and `python evals/eval_harness.py --all` on every push/PR to main (Python 3.11–3.13).

Last verified: 2026-06-15, `pytest tests/ -v` (20 passed) · `python evals/eval_harness.py --all` (3/3 passed)

Evaluation

Agentel includes an evaluation harness that measures:

Root cause accuracy — Semantic similarity against expected causes
Attention Focus Index (AFI) — Measures prioritization (target: ≥0.80)
Confidence Calibration Error (CCE) — How well confidence matches correctness (target: ≤0.15)
System leak checks — Ensures no blocked technical terms leak to UI

```bash
python evals/eval_harness.py --all --report
```
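
The README does not define how Confidence Calibration Error is computed. One simple formulation, assumed here purely for illustration, is the mean absolute gap between each stated confidence and whether the diagnosis was actually correct. The `calibration_error` function and the sample results are hypothetical.

```python
def calibration_error(results: list[tuple[float, bool]]) -> float:
    """Mean absolute gap between confidence and correctness (1.0 or 0.0)."""
    if not results:
        raise ValueError("need at least one result")
    gaps = [abs(conf - (1.0 if correct else 0.0)) for conf, correct in results]
    return sum(gaps) / len(gaps)


# Hypothetical eval results: (reported confidence, diagnosis was correct).
sample = [(0.9, True), (0.8, True), (0.3, False)]
cce = calibration_error(sample)  # gaps of 0.1, 0.2 and 0.3 average to about 0.2
```

Under this assumed definition the sample run scores about 0.2, which would miss the ≤0.15 target listed above. A binned variant, such as expected calibration error, is a common alternative when there are many more samples.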

Performance Guarantees

| Metric | Target |
| --- | --- |
| Diagnosis timeout | ≤20 seconds |
| Tokens per session | ≤40k |
| Topology traversal | Offline (no external HTTP) |
| Interactive elements (primary view) | ≤7 |
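
A deadline like the 20-second diagnosis timeout could be enforced by running the diagnosis in a worker thread and giving up when the wait expires. This is a sketch under stated assumptions: `run_with_timeout` is a hypothetical helper, and the README does not describe how Agentel enforces its limits.

```python
import concurrent.futures
import time

DIAGNOSIS_TIMEOUT_S = 20.0  # target taken from the table above


def run_with_timeout(fn, timeout_s: float = DIAGNOSIS_TIMEOUT_S):
    """Run `fn` in a worker thread; return its result, or None on timeout."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # The caller can treat None as "escalate to a human".
        return None
    finally:
        # Do not block on the worker; note that a running thread is not killed.
        pool.shutdown(wait=False)


def slow_diagnosis() -> str:
    time.sleep(0.3)  # simulate a diagnosis that overruns a short deadline
    return "late answer"
```

Threads cannot be forcibly stopped in Python, so a timed-out diagnosis keeps running in the background. A process-based runner would be needed if the work must actually be cancelled.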

Contributing

We welcome contributions!

Good first issues:

Wire live OpenAI LLM calls to DiagnosisAgent (currently rule-based fallback)
Add suppressed metrics expander to dashboard UI
Implement rollback confirmation modal with domain vocabulary

Areas for contribution:

Additional golden incident scenarios (data/golden_incidents/)
Platform-specific telemetry adapters (Prometheus, Jaeger, OpenTelemetry)
UI polish per agentel-ui-brief.md

Documentation

| Document | Purpose |
| --- | --- |
| `agentel-spec.md` | Canonical specification: technical and semantic requirements |
| `agentel-ui-brief.md` | UX design brief for Google Stitch (compact semantics) |

License

See the `LICENSE` file in the repository root for the license terms.

Built with a focus on correctness, observability, and measurable reliability.
