Feature discussion: semantic model routing via reranker + LLM fallback #594

@dilong888

Problem

GA currently has no built-in mechanism to route tasks to different LLMs based on task semantics. Users either hardcode a model or switch manually. As model options grow (cost tiers, capability tiers), routing becomes a real problem.

Proposed approach

A three-layer routing module, with an extra cross-check step (2.5) between layers 2 and 3, that fits in a single file:

Layer 1 - Reranker (primary path)
Each model has a set of exemplar task strings. A cross-encoder reranker scores the incoming task against all exemplars in one pass, aggregates per-model (top-2 mean), and picks the winner. Fast, no LLM call needed.
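A minimal sketch of the top-2 mean aggregation, assuming the reranker has already returned one relevance score per exemplar. The function names, the model id "cheap-model", and the score values are made up for illustration; the reranker call itself is left out.

```python
from statistics import mean


def aggregate_top_k(scores_by_model, k=2):
    """Return the mean of the k highest exemplar scores for each model."""
    return {
        model: mean(sorted(scores, reverse=True)[:k])
        for model, scores in scores_by_model.items()
        if scores  # skip models that have no exemplars
    }


def pick_winner(scores_by_model, k=2):
    """Return (best_model, margin over the runner-up)."""
    aggregated = aggregate_top_k(scores_by_model, k)
    if not aggregated:
        raise ValueError("no exemplar scores to aggregate")
    ranked = sorted(aggregated.items(), key=lambda kv: kv[1], reverse=True)
    best_model, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    return best_model, best_score - runner_up


# Hypothetical reranker output: one score per exemplar, grouped by model.
example_scores = {
    "cheap-model": [0.91, 0.40, 0.35],
    "sonnet": [0.62, 0.58, 0.10],
    "opus": [0.30, 0.20],
}
winner, margin = pick_winner(example_scores)
```

Returning the margin alongside the winner lets the caller decide whether the result is confident enough to use or should go to the fallback.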

Layer 2 - Causal LLM fallback
When L1 fails (API down / low confidence margin), a small LLM classifies the task using model descriptions. Graceful degradation.
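Roughly, the fallback decision could look like the sketch below. `rerank` and `classify` are hypothetical callables standing in for the reranker path and the small-LLM classifier, and the `min_margin` threshold of 0.05 is an assumed value, not a tuned one.

```python
from typing import Callable, Tuple


def route_with_fallback(
    task: str,
    rerank: Callable[[str], Tuple[str, float]],  # returns (model_id, margin)
    classify: Callable[[str], str],              # returns model_id
    min_margin: float = 0.05,                    # assumed threshold
) -> Tuple[str, str]:
    """Return (model_id, layer), where layer is "L1" or "L2"."""
    try:
        model, margin = rerank(task)
    except Exception:
        # Reranker unavailable (API down, timeout, ...): degrade to the LLM.
        return classify(task), "L2"
    if margin < min_margin:
        # Reranker answered, but the top two models are too close to call.
        return classify(task), "L2"
    return model, "L1"
```

Passing the two layers in as callables keeps the sketch independent of any particular reranker or LLM provider.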

Layer 2.5 - Anti-bias challenger
When L2 selects a high-tier model (sonnet/opus), an independent model family cross-checks the decision to prevent single-model bias. Upgrades are accepted; downgrades are rejected.
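One way to read the accept-upgrade / reject-downgrade rule, as a sketch: the tier table, the "cheap-model" id, and the `challenger` callable (a classifier backed by a different model family) are assumptions made for the example.

```python
from typing import Callable

# Assumed tier ordering; higher number means a more capable, costlier model.
TIER = {"cheap-model": 0, "sonnet": 1, "opus": 2}
HIGH_TIER = {"sonnet", "opus"}


def cross_check(task: str, l2_choice: str, challenger: Callable[[str], str]) -> str:
    """Return the final model id after the challenger step."""
    if l2_choice not in HIGH_TIER:
        # The challenger only runs when L2 picked a high-tier model.
        return l2_choice
    second_opinion = challenger(task)
    if TIER.get(second_opinion, -1) > TIER[l2_choice]:
        # The challenger asks for a higher tier: accept the upgrade.
        return second_opinion
    # Same tier, lower tier, or unknown id: keep the L2 decision.
    return l2_choice
```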

Layer 3 - Feedback loop
feedback(routing_id, outcome) adjusts per-model weights at runtime. No retraining.
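A sketch of how `feedback(routing_id, outcome)` might adjust weights without retraining. Only the `feedback` signature comes from the proposal; the additive update, the step size, the clamping bounds, and the `record` helper are assumptions for illustration.

```python
class FeedbackWeights:
    """Per-model multipliers nudged up or down by routing outcomes."""

    def __init__(self, step=0.05, lo=0.5, hi=1.5):
        self.step = step      # assumed adjustment per feedback event
        self.lo = lo          # assumed lower bound on a weight
        self.hi = hi          # assumed upper bound on a weight
        self.weights = {}     # model id -> weight (defaults to 1.0)
        self.pending = {}     # routing_id -> model id awaiting feedback

    def record(self, routing_id, model):
        """Remember which model a routing decision picked."""
        self.pending[routing_id] = model

    def feedback(self, routing_id, outcome):
        """Raise the model's weight on success, lower it on failure."""
        model = self.pending.pop(routing_id, None)
        if model is None:
            return  # unknown or already-consumed routing id
        delta = self.step if outcome else -self.step
        new_weight = self.weights.get(model, 1.0) + delta
        self.weights[model] = min(self.hi, max(self.lo, new_weight))

    def weight(self, model):
        return self.weights.get(model, 1.0)
```

The weight would then multiply a model's aggregated score before the winner is picked; clamping keeps one bad streak from removing a model from consideration entirely.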

Why it fits GA's design

Single file, ~400 lines in a clean implementation.
Zero required dependencies beyond requests (already in use).
Reranker provider is configurable; falls back gracefully if unavailable.
Self-evolving: exemplar pool grows from successful routing history.
Interface: decide(task: str) -> str, returns model id, nothing more.
Safe in multi-worker scenarios: routing is decided once per task at dispatch time. Once a worker starts, its model is fixed for the entire session, with no mid-session switching and no cross-worker interference. Context coherence is preserved by design.
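To illustrate the dispatch-time guarantee, here is a small sketch in which the model is bound to an immutable session object. `WorkerSession` and `dispatch` are hypothetical names, and the lambda stands in for the real `decide(task: str) -> str`.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Callable


@dataclass(frozen=True)
class WorkerSession:
    """Task plus the model chosen for it; immutable once created."""
    task: str
    model: str


def dispatch(task: str, decide: Callable[[str], str]) -> WorkerSession:
    # decide() runs exactly once, before the worker starts.
    return WorkerSession(task=task, model=decide(task))


# Stand-in router that always answers "sonnet".
session = dispatch("summarize this changelog", lambda task: "sonnet")

try:
    session.model = "opus"  # attempt a mid-session switch
except FrozenInstanceError:
    switched = False        # the frozen dataclass refuses the change
else:
    switched = True
```

Because each worker holds its own frozen session, one worker's routing result cannot leak into another's.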

Prior art

I have been running a version of this in production on a fork for ~2 months. Measured against manual labels, L1 handles ~78% of cases and L2+L2.5 cover the rest. Cost reduction compared with always using the best model: ~40%.

Question

Is this the kind of routing primitive that belongs in the core repo, or would this be better suited for the Skill Marketplace once it launches?
