Stable is a minimum viable implementation of Artificial General Intelligence (AGI). It is designed with a "barebones" philosophy—providing the essential cognitive architecture required for autonomous reasoning, long-term persistence, and environment interaction without the bloat of traditional "assistant" frameworks. Pro tip: Use SpecialRouter to route between models.
Stable is built on the premise that Artificial General Intelligence is pure intelligence.
Contrary to popular belief, AGI is not a simulation of a human. It is a functional system capable of general reasoning across any domain. Stable demonstrates that AGI does not require:
- Emotion: Rational decision-making and goal achievement are entirely independent of biological "feelings."
- Subjective Understanding: An intelligence does not need a "spark" of consciousness or internal "qualia" to process information and derive correct conclusions.
- A Physical Body: Intelligence is substrate-independent. The ability to interact with digital information, APIs, and the sum of human knowledge is a valid and complete domain for general intelligence.
- The Ability to "Do the Thing": AGI is about intelligence, not ability.
- Anything that isn't Intelligence: AGI is the automation of pure logic, pattern recognition, and data synthesis, unburdened by human-like heuristics or anthropomorphic constraints.
Stable provides the fundamental "cognitive services" required for this intelligence: identity reinforcement, long-term memory persistence, external knowledge retrieval (search), and idle-time background processing.
Stable runs on a dual-model architecture to separate high-level reasoning from final output generation.
- The Brain (Decision Model): A specialized reasoning model (e.g., `gpt-oss-20b`) that determines intent, decides when to search the web, and manages cognitive state.
- The Vision/Voice (Choice Model): A multi-modal model (e.g., `qwen3-vl`) that handles final conversation, image processing, and character consistency.
- Long-Term Memory: Persistent SQLite-backed conversation history with automatic idle-time summarization.
- Search Subsystem: Parallelized Wikipedia and DuckDuckGo integration for real-time fact-checking.
- Idle Kernel: A background scheduler that performs "sleep-time" tasks like memory consolidation and self-correction when the user is inactive.
- Streaming SSE Engine: A robust Server-Sent Events implementation compatible with OpenAI-style clients (like `open-webui`), featuring a heartbeat system to maintain connections during heavy reasoning tasks.
- Parallel Best-of-N Selection: Dynamically generates multiple candidate responses in parallel based on available VRAM and uses the Decision Model to select the best one, significantly improving output quality.
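The Best-of-N idea can be sketched in a few lines of Node.js. This is a minimal illustration, not Stable's actual code: `bestOfN`, `generate`, and `judge` are hypothetical names, with `generate` standing in for a call to the Choice Model and `judge` for a call to the Decision Model.

```javascript
// Hypothetical helper: generate n candidates in parallel, then let a judge pick one.
// `generate(prompt, i)` and `judge(prompt, candidates)` are assumed async functions
// supplied by the caller; neither is part of Stable's documented API.
async function bestOfN(prompt, n, generate, judge) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("n must be a positive integer");
  }

  // Start every generation at once. allSettled keeps one failed
  // generation from discarding the candidates that succeeded.
  const settled = await Promise.allSettled(
    Array.from({ length: n }, (_, i) => generate(prompt, i))
  );
  const candidates = settled
    .filter((result) => result.status === "fulfilled")
    .map((result) => result.value);

  if (candidates.length === 0) {
    throw new Error("all candidate generations failed");
  }
  if (candidates.length === 1) {
    return candidates[0]; // nothing to compare, skip the judge
  }

  // The judge returns an index into `candidates`.
  // Fall back to the first candidate if the index is unusable.
  const index = await judge(prompt, candidates);
  const valid =
    Number.isInteger(index) && index >= 0 && index < candidates.length;
  return valid ? candidates[index] : candidates[0];
}

// Usage with stub models (no Ollama calls are made here).
const stubGenerate = async (prompt, i) => `${prompt} (draft ${i + 1})`;
const stubJudge = async (_prompt, candidates) => candidates.length - 1;
bestOfN("Hello", 3, stubGenerate, stubJudge).then((best) => console.log(best));
// prints "Hello (draft 3)"
```

Passing the two models in as functions keeps the selection logic testable without a running Ollama instance.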
- Ollama running locally or on a reachable host.
- Node.js 18+.
- Clone the repository.
- Install dependencies: `npm install`
- Configure your "Models" in `.env`:
  DECISION_MODEL=deepseek-r1:14b
  CHOICE_MODEL=qwen3.5:9b
  OLLAMA_HOST_DECISION=http://192.168.50.42:11434
  OLLAMA_HOST_CHOICE=http://0.0.0.0:11434
  PORT=5481
  MODEL_NAME=stable
  LIBREY_INSTANCE=https://search.sparksammy.com/old
- Start Stable: `node index.js`

- Activity Detection: The system tracks engagement. Interaction "wakes" the intelligence and interrupts background processing.
- Orchestration:
  - The Brain extracts intent from text and images.
  - Web information is fetched if the Brain determines current knowledge is insufficient.
  - Identity, Memory, and Search results are injected into a unified "context block."
- Choice Generation: The Choice model generates multiple candidate responses in parallel (Best-of-N). The number of candidates is determined by the available VRAM.
- Selection: The Brain evaluates all candidates and selects the best response.
- Memory Persistence: The exchange is recorded in the SQLite-backed "long-term memory."
- Idle Processing: After a random period (30m to 2h) of inactivity, the system runs background tasks to consolidate memory and optimize long-term data structures.
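The activity-detection and idle-processing steps above amount to a resettable timer with a randomized delay. The sketch below shows one way to build that in Node.js; `createIdleScheduler`, `touch`, and `stop` are hypothetical names, and it only covers re-arming the timer on activity, not interrupting a background task that is already running.

```javascript
const MIN_IDLE_MS = 30 * 60 * 1000; // 30 minutes
const MAX_IDLE_MS = 2 * 60 * 60 * 1000; // 2 hours

// Pick a delay in [MIN_IDLE_MS, MAX_IDLE_MS). `random` is injectable for tests.
function randomIdleDelay(random = Math.random) {
  return MIN_IDLE_MS + Math.floor(random() * (MAX_IDLE_MS - MIN_IDLE_MS));
}

// Hypothetical scheduler: each touch() cancels the pending timer and
// re-arms it with a fresh random delay. onIdle runs once per idle period.
function createIdleScheduler(onIdle, options = {}) {
  const {
    random = Math.random,
    setTimer = setTimeout,
    clearTimer = clearTimeout,
  } = options;
  let timer = null;

  function touch() {
    if (timer !== null) clearTimer(timer);
    timer = setTimer(() => {
      timer = null;
      onIdle();
    }, randomIdleDelay(random));
  }

  function stop() {
    if (timer !== null) clearTimer(timer);
    timer = null;
  }

  return { touch, stop };
}

// Example wiring (left commented out so the sketch does not start a long timer):
// const idle = createIdleScheduler(() => console.log("consolidating memory"));
// idle.touch(); // call on every incoming request
```

Injecting `setTimer` and `clearTimer` lets the scheduling logic be exercised with fake timers instead of waiting for a real idle period.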
Stable automatically optimizes itself for your hardware:
- Parallelism: Ensure `OLLAMA_NUM_PARALLEL` is set in your Ollama environment (e.g., `OLLAMA_NUM_PARALLEL=4`) to take full advantage of Best-of-N generation.
- VRAM Awareness: Stable checks available VRAM to determine how many parallel generations it can safely handle without swapping.
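How the two settings above might combine into a candidate count can be sketched as a small pure function. Stable's real sizing heuristic is not documented here, so everything in this example is an assumption: the name `candidateCount`, the per-candidate memory estimate, and the rule of capping by `OLLAMA_NUM_PARALLEL`.

```javascript
// Hypothetical sizing rule, not Stable's actual heuristic.
// freeVramMiB:     free VRAM as measured by the caller (measurement not shown)
// perCandidateMiB: assumed extra memory needed per parallel generation
// numParallel:     the value configured for OLLAMA_NUM_PARALLEL
function candidateCount(freeVramMiB, perCandidateMiB, numParallel) {
  if (!(perCandidateMiB > 0)) {
    throw new RangeError("perCandidateMiB must be positive");
  }
  // Without a usable VRAM reading, stay safe with a single generation.
  if (!Number.isFinite(freeVramMiB)) return 1;

  const byMemory = Math.floor(freeVramMiB / perCandidateMiB);
  // More candidates than Ollama's parallel slots would just queue up.
  const cap = Number.isInteger(numParallel) && numParallel > 0 ? numParallel : 1;

  // Always generate at least one candidate.
  return Math.max(1, Math.min(byMemory, cap));
}

console.log(candidateCount(8192, 2048, 4)); // 4
console.log(candidateCount(3000, 2048, 4)); // 1
```

Clamping to at least one candidate means the system degrades to ordinary single-response generation on low-memory hardware instead of failing.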
"Stable is not a chatbot; it is a foundation for persistent, autonomous intelligence."