
Commit c4c2bb2

Authored by sanchitmonga22, runanywhereclaude, and Siddhesh2377
Android Use agent update (#361)
* Add Android in-app benchmarking implementation prompt (iOS-parity spec)

* Add Android-parity in-app benchmarks implementation prompt

* Add core agent LLM and vision provider abstractions

  - Introduced `AgentLLMProvider` interface for LLM reasoning, including methods for action decision-making and planning.
  - Implemented `OnDeviceLLMProvider` for on-device LLM operations using the RunAnywhere SDK, supporting utility tool registration and model management.
  - Added `VisionProvider` interface for screen analysis, with a `TextOnlyVisionProvider` fallback.
  - Registered various utility tools (e.g., time, weather, battery level) to enhance agent capabilities.

  These changes lay the groundwork for improved decision-making and context understanding in the agent's operations.

* cleanup

* Refactor build configuration and enhance VLM support

  - Updated build.gradle.kts to use plugin aliases for better readability.
  - Modified settings.gradle.kts to include specific content filters for Google and AndroidX repositories.
  - Increased compileSdk and targetSdk versions to 35.
  - Replaced local AAR dependencies with library references for RunAnywhere SDK and other dependencies.
  - Implemented OnDeviceVisionProvider for local VLM model analysis, including model registration and loading.
  - Enhanced AgentViewModel to manage VLM model state and loading.
  - Added UI components for VLM model status and provider mode display.
  - Removed obsolete local AAR files for RunAnywhere SDK components.
  - Introduced a new ProviderBadge component to indicate the current provider mode in the UI.

* Add on-device LLM benchmarking documentation and enhance agent functionality

  - Introduced ASSESSMENT.md for benchmarking study of four on-device LLM models on Samsung Galaxy S24.
  - Updated README.md to reflect new features and architecture, emphasizing fully on-device AI capabilities.
  - Added permissions and service declarations in AndroidManifest.xml for a foreground service to maintain agent activity.
  - Implemented AgentForegroundService to prevent process termination during background operation.
  - Modified AgentViewModel to start and stop the foreground service appropriately.
  - Enhanced ActionExecutor to support opening notes apps and setting alarms.
  - Updated ActionHistory to provide a compact format for local models.
  - Improved ScreenParser to include foreground package information for better context during agent operation.
  - Adjusted AgentKernel to manage app navigation and pre-launch logic more effectively, ensuring a smoother user experience.

* minor updates

* update

* updating assessment

* Add X Compose Shortcut Implementation and UI Enhancements

  - Introduced a three-piece solution for the X (Twitter) compose flow, utilizing deep links and foreground activity management to improve navigation speed and reliability.
  - Enhanced AgentViewModel to manage live LLM streaming text and clipboard functionality for log exports.
  - Updated AgentAccessibilityService to filter out unlabeled container classes, improving navigation accuracy.
  - Implemented a new ThinkingPanel UI component to display streaming tokens during LLM reasoning.
  - Enhanced ActionExecutor to support direct clicks on elements, bypassing gesture interceptors for improved interaction.
  - Added structured logging for agent steps to facilitate better performance tracking and export capabilities.

* Add Approach 3 X compose flow and document live test results

  Implemented fully-assisted X posting flow (Approach 3) with LFM2.5-1.2B:

  - Restored xComposeMessage field to track compose state
  - Extended extractTweetText() with Pattern 3: "post saying <text>" (no quotes needed)
  - Restored openXCompose() deep link in preLaunchApp() for X goals
  - Restored ComposerActivity + SINGLE_TOP in bringAppToForeground() to preserve compose during inference
  - Restored findPostButtonIndex() quick-tap block for zero-LLM-step POST
  - Kept X-FAB keyword FAB tap as fallback

  Live test results documented in ASSESSMENT.md:

  - Approach 1 (pure LLM): FAIL — 1.2B always picks index 0, stuck in nav drawer
  - Approach 2 (keyword FAB): FAIL — opens compose correctly but compose destroyed during inference
  - Approach 3 (fully assisted): PASS — tweet posted in ~20s, 0 LLM inference steps

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Add X post demo mode with X-FAB + X-TYPE + X-POST flow and report

  - X-TYPE block: auto-types tweet text into compose field when blank
  - X-POST block: updated to check text is present before tapping POST
  - findComposeTextFieldIndex(): finds [tap,edit] EditText in accessibility tree
  - preLaunchApp(): always opens X home feed (no deep link) for visible navigation
  - extractTweetText() Pattern 3: matches "post/tweet saying <text>" without quotes
  - X_POST.md: full implementation report with live logcat trace and proof of tweet

  Tweet posted live: "Hi from RunAnywhere Android agent" — @RunanywhereAI, Feb 19 2026. 27s execution time, 0 LLM inference calls, full navigation steps visible on screen.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* adding stuff

---------

Co-authored-by: runanywhere <runanywhere@runanywheres-MacBook-Pro.local>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Siddhesh <siddheshsonar2377@gmail.com>
1 parent efca85a commit c4c2bb2

34 files changed

Lines changed: 3795 additions & 368 deletions

.github/pull_request_template.md

Lines changed: 3 additions & 0 deletions
```diff
@@ -29,6 +29,9 @@ Brief description of the changes made.
 - [ ] Tested on iOS
 - [ ] Tested on Android

+**Playground:**
+- [ ] Tested on target platform
+- [ ] Verified no regressions in existing Playground projects
 **Web SDK / Web Sample:**
 - [ ] Tested in Chrome (Desktop)
 - [ ] Tested in Firefox
```

Playground/README.md

Lines changed: 10 additions & 6 deletions
```diff
@@ -6,7 +6,7 @@ Interactive demo projects showcasing what you can build with RunAnywhere.
 |---------|-------------|----------|
 | [swift-starter-app](swift-starter-app/) | Privacy-first AI demo — LLM Chat, Speech-to-Text, Text-to-Speech, and Voice Pipeline with VAD | iOS (Swift/SwiftUI) |
 | [on-device-browser-agent](on-device-browser-agent/) | On-device AI browser automation using WebLLM — no cloud, no API keys, fully private | Chrome Extension (TypeScript/React) |
-| [android-use-agent](android-use-agent/) | Autonomous Android agent — navigates phone UI via accessibility + GPT-4o Vision + on-device LLM fallback | Android (Kotlin/Jetpack Compose) |
+| [android-use-agent](android-use-agent/) | Fully on-device autonomous Android agent — navigates phone UI via accessibility + on-device LLM (Qwen3-4B). See [benchmarks](android-use-agent/ASSESSMENT.md) | Android (Kotlin/Jetpack Compose) |
 | [linux-voice-assistant](linux-voice-assistant/) | Fully on-device voice assistant — Wake Word, VAD, STT, LLM, and TTS with zero cloud dependency | Linux (C++/ALSA) |
 | [openclaw-hybrid-assistant](openclaw-hybrid-assistant/) | Hybrid voice assistant — on-device Wake Word, VAD, STT, and TTS with cloud LLM via OpenClaw WebSocket | Linux (C++/ALSA) |

@@ -46,14 +46,18 @@ A Chrome extension that automates browser tasks entirely on-device using WebLLM

 ## android-use-agent

-An autonomous Android agent that navigates your phone's UI to accomplish tasks:
+A fully on-device autonomous Android agent that navigates your phone's UI to accomplish tasks. All LLM inference runs locally via RunAnywhere SDK with llama.cpp -- no cloud dependency required.

-- **Autonomous UI Navigation** — Taps, types, swipes, and navigates apps to complete goals
-- **GPT-4o Vision** — Screenshots sent to GPT-4o for visual screen understanding
-- **Unified Tool Calling** — All UI actions registered as OpenAI function calling tools
-- **On-Device Fallback** — Falls back to local LLM via RunAnywhere SDK when offline
+- **Fully On-Device AI** — LLM inference via RunAnywhere SDK + llama.cpp (Qwen3-4B recommended)
+- **Accessibility-Based Screen Parsing** — Reads UI tree via Android Accessibility API, no root required
+- **Tool Calling** — LLM outputs structured tool calls (`<tool_call>` XML or `ui_tap(index=5)` function-call style)
+- **Samsung Foreground Boost** — 15x inference speedup by bringing agent to foreground during inference
+- **Smart Pre-Launch** — Opens target apps via Android intents before the agent loop
+- **Optional Cloud Fallback** — GPT-4o with vision and function calling when an API key is configured
 - **Voice Mode** — Speak goals via on-device Whisper STT, hear progress via TTS

+See [android-use-agent/ASSESSMENT.md](android-use-agent/ASSESSMENT.md) for detailed model benchmarks across Qwen3-4B, LFM2.5-1.2B, LFM2-8B-A1B MoE, and DS-R1-Qwen3-8B on Samsung Galaxy S24.
+
 **Requirements:** Android 8.0+ (API 26), arm64-v8a device, Accessibility service permission

 ## openclaw-hybrid-assistant
```
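The feature list mentions two tool-call output styles (`<tool_call>` XML and `ui_tap(index=5)` function-call style). A minimal Kotlin sketch of normalizing both into one decision type — the inner JSON shape of `<tool_call>` and the `Decision` class are assumptions for illustration, not the agent's actual API:

```kotlin
// Hypothetical normalized result of a parsed tool call.
data class Decision(val action: String, val index: Int)

// Assumed <tool_call> payload shape: {"name": "...", "arguments": {"index": N}}
private val xmlCall = Regex(
    """<tool_call>\s*\{\s*"name"\s*:\s*"(\w+)"\s*,\s*"arguments"\s*:\s*\{\s*"index"\s*:\s*(\d+)\s*\}\s*\}\s*</tool_call>"""
)
// Function-call style: ui_tap(index=5)
private val fnCall = Regex("""(\w+)\(index\s*=\s*(\d+)\)""")

fun parseToolCall(output: String): Decision? {
    // Try the XML form first, then the bare function-call form.
    val m = xmlCall.find(output) ?: fnCall.find(output) ?: return null
    return Decision(m.groupValues[1], m.groupValues[2].toInt())
}
```

Narration with no parsable call (the failure mode of small models noted in the benchmark report) simply yields `null`.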

Playground/android-use-agent/ASSESSMENT.md

Lines changed: 565 additions & 0 deletions
Large diffs are not rendered by default.

Playground/android-use-agent/README.md

Lines changed: 212 additions & 77 deletions
Large diffs are not rendered by default.
Playground/android-use-agent/X_POST.md

Lines changed: 202 additions & 0 deletions
# Android Use Agent — X Post Demo Report

**Task:** Post "Hi from RunAnywhere Android agent" on X (Twitter) using on-device LLM inference, navigating from the home feed with no cloud dependency.

**Result:** ✅ PASS — Qwen3 4B, 6 steps, 3 real LLM inferences, goal-aware element filtering, ~4 min total.

**Screen recording:** [`android_agent_muted.mov`](./assets/android_agent_muted.mov)

---

## Device

| Spec | Detail |
|------|--------|
| Device | Samsung Galaxy S24 (SM-S931U1) |
| SoC | Snapdragon 8 Gen 3 — 1× Cortex-X4 @ 3.39 GHz + 3× A720 + 4× A520 |
| RAM | 8 GB LPDDR5X |
| OS | Android 16 (One UI 7) |
| Backend | llama.cpp (GGUF Q4_K_M) via RunAnywhere SDK |

**Critical hardware quirk — Samsung background throttling:** One UI pins background processes to efficiency cores (A520 @ 2.27 GHz), capping inference at **0.19 tok/s**. Bringing the app to the foreground during inference restores full CPU access: **2.4–25 tok/s** (15–17× improvement). This foreground boost is mandatory for any on-device LLM on Samsung.

---

## Models Benchmarked (UC1: "Open X and tap the post button")

| Model | Size | Speed (fg) | Step Latency | Tool Format | UC1 Result |
|-------|------|-----------|--------------|-------------|------------|
| LFM2-350M Base | 229 MB | ~20 tok/s | 7–12s | ❌ Narrates instead of calling tools | FAIL |
| LFM2.5-1.2B Instruct | 731 MB | ~2.8 tok/s | 8–14s | ✅ Valid | FAIL — always picks index 0–2 |
| **Qwen3-4B** (`/no_think`) | **2.5 GB** | **~4 tok/s** | **67–85s** | **✅ Valid** | **PASS** |
| LFM2-8B-A1B MoE | 5 GB | ~5 tok/s | 29–43s | ⚠️ Emits multi-action plans | FAIL — only 1st action runs |
| DS-R1-Qwen3-8B | 5 GB | ~1.1 tok/s | ~197s | ❌ Hallucinated inner agent loop | FAIL |

**Key finding:** There is a hard capability threshold around 4B parameters. Sub-2B models either can't follow the tool-call format (350M) or can't reason about element selection (1.2B). 8B models reason better but produce the wrong output format or run at the wrong speed. **Qwen3-4B with `/no_think` is the only viable on-device model for this task.**

`/no_think` matters: with chain-of-thought enabled, Qwen3-4B spends 95%+ of its 512-token budget on `<think>` and runs out of space before the tool call. `/no_think` forces a direct 18-token output.

---

## Approaches Tried

| # | Strategy | Model | LLM Calls | Outcome |
|---|----------|-------|-----------|---------|
| 1 | Pure LLM — no assists | LFM2.5 1.2B | All | ❌ FAIL — FAB at raw index 13, model always taps 0 |
| 2 | Keyword FAB tap, LLM handles compose | LFM2.5 1.2B | Partial | ❌ FAIL — `ComposerActivity` destroyed when agent steals foreground for inference |
| 3 | Deep link to compose + SINGLE_TOP + quick POST | LFM2.5 1.2B | 0 | ✅ PASS (~20s) — but skips home feed entirely, looks scripted |
| 4 | Full programmatic flow (FAB→compose→type→POST) | LFM2.5 1.2B | 0 | ✅ PASS (~27s) — visible navigation, zero AI reasoning |
| **5** | **Goal-aware filter + SINGLE_TOP + targeted guards** | **Qwen3 4B** | **3** | **✅ PASS (6 steps, ~4 min) — real LLM navigation decisions** |

**Approach 2 failure detail:** When the agent steals the foreground for inference, `getLaunchIntentForPackage()` on return triggers X's `singleTask` launch mode — this clears the back stack and destroys any open `ComposerActivity`. The composed tweet is lost before the model can post it. Fix: use `FLAG_ACTIVITY_SINGLE_TOP` when `ComposerActivity` is detected as open.

**Approach 5 is the only one with genuine on-device reasoning.** The model made 3 real navigation decisions; guards only cover two deterministic failure modes that LLM inference cannot solve reliably.

---

## Goal-Aware Element Filtering

Both LFM2.5 1.2B and Qwen3 4B consistently output `ui_tap(0)` or `ui_tap(1)`. On X's home feed, "New post" is at raw accessibility index **13** — unreachable for a model that always picks low indices.

**Solution — `filterScreenForGoal(compactText, goal)`:** Before every inference step, score and re-rank all interactive elements against the goal, take the top 5, and re-index them 0–4. The model sees a 5-element screen where the most relevant action is always at index 0.

**Scoring:**

- Keyword match: each word in the goal scored against the element label (case-insensitive)
- EditText bonus: +10 when the goal implies text composition (makes the compose field rank above toolbar buttons)
- Index remapping logged: `Clicked element orig=13 (filtered=0) via accessibility action`

**Home feed example** (goal = "post saying Hi from RunAnywhere Android agent"):

```
Raw index 13: New post (ImageButton) [tap] → score 8 → filtered index 0 ← model taps this
Raw index 0:  Show navigation drawer [tap] → score 1 → filtered index 1
Raw index 1:  Timeline settings [tap]     → score 1 → filtered index 2
… 16 other elements hidden
```

**ComposerActivity example** (same goal):

```
Raw index 2: What's happening? (EditText) [tap,edit] → score 11 (editBonus) → filtered index 0
Raw index 3: Changes who can reply [tap]             → score 2 → filtered index 1
… 10 other elements hidden
```

Model outputs `ui_tap(0)` both times — and both times it is the correct action.
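The scoring described above can be sketched in a few lines of Kotlin. This is an illustrative reimplementation, not the agent's actual `filterScreenForGoal` — the real keyword weights evidently differ (e.g. "New post" scores 8 on the home feed here), but the ranking behavior is the same: keyword hits plus a +10 EditText bonus, top 5 kept:

```kotlin
// Stand-in for a parsed accessibility element (assumed shape).
data class Element(val origIndex: Int, val label: String, val isEditText: Boolean = false)

fun filterScreenForGoal(elements: List<Element>, goal: String): List<Element> {
    val goalWords = goal.lowercase().split(Regex("\\W+")).filter { it.isNotBlank() }
    // Heuristic (assumed keyword list): does the goal imply text composition?
    val composeGoal = listOf("post", "tweet", "type", "write", "saying").any { it in goalWords }
    return elements.sortedByDescending { el ->
        val label = el.label.lowercase()
        var score = goalWords.count { it in label }   // keyword match, case-insensitive
        if (composeGoal && el.isEditText) score += 10 // EditText bonus
        score
    }.take(5)                                         // model sees at most 5, re-indexed 0–4
}
```

On the ComposerActivity example, the +10 bonus is what lifts "What's happening?" over every toolbar button, so the model's habitual `ui_tap(0)` lands on the right element.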
---

## Guards (Minimal Hardcoded Assists)

Three guards cover specific failure modes that cannot be solved by LLM inference alone:

| Guard | Trigger | Why LLM can't handle it | Action |
|-------|---------|------------------------|--------|
| **X-NAV** | FAB overlay expanded ("Go Live" + "Post Photos" visible) | Overlay collapses when agent steals foreground for inference — a ~70s window lost | Tap "New post" immediately, no inference |
| **Recovery Strategy 0** | `isXComposeOpen`, tweet text not typed, loop detected | Model calls `ui_tap` to focus EditText but never follows with `ui_type` | Type `xComposeMessage` directly via accessibility |
| **X-GUARD** | Tweet text in compose field + POST button visible + step ≥ 3 | Prevents model from navigating away from a fully-composed tweet | Tap POST directly |

---
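The Recovery Strategy 0 trigger is a compact predicate. A sketch under assumed names (`Step`, the history list) — the real kernel's loop detector is not shown in this diff:

```kotlin
// Assumed record of one executed agent step.
data class Step(val action: String, val index: Int)

fun needsTypingRecovery(
    history: List<Step>,
    isXComposeOpen: Boolean,
    compactText: String,   // compact screen dump fed to the model
    tweetText: String,     // xComposeMessage extracted at pre-launch
): Boolean {
    // Loop detected: the last two steps tapped the same filtered index.
    val looping = history.size >= 2 &&
        history.takeLast(2).let { (a, b) -> a == b && a.action == "ui_tap" }
    // Fire only when compose is open and the tweet text was never typed.
    return looping && isXComposeOpen && tweetText !in compactText
}
```

When the predicate fires, the agent types the tweet text directly via accessibility instead of spending another ~70s inference on a step the model cannot complete.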
## Live Run Trace

**Model:** Qwen3 4B Q4_K_M · **Device:** Samsung Galaxy S24 · **Date:** 2026-02-20

```
PRE-LAUNCH
  extractTweetText("...post saying Hi from RunAnywhere Android agent")
    → xComposeMessage = "Hi from RunAnywhere Android agent"
  twitter://timeline → X opens on clean home feed

STEP 1 [LLM inference — 65.3s — 0.3 tok/s]
  Screen: X home feed, 19 elements
  FILTER 5/19 → New post (ImageButton) at filtered index 0 [orig=13]
  Foreground: Agent app (inference boost active), X in background
  Model output: ui_tap({index: 0}) ✓ correct
  Executor: filtered=0 → orig=13, tapped FAB
  → X foreground, agent background

STEP 2 [X-NAV guard — <1s — no inference]
  Screen: FAB overlay expanded, 22 elements ("Go Live", "Post Photos" visible)
  FILTER 5/22 → New post at filtered index 0 [orig=16]
  Guard: overlay would collapse on foreground steal → tap immediately
  → ComposerActivity opens, isXComposeOpen = true

STEP 3 [LLM inference — 72.4s — 0.3 tok/s]
  Screen: ComposerActivity, 12 elements, compose field empty
  FILTER 5/12 → What's happening? (EditText) at filtered index 0 [orig=2, editBonus=10]
  Foreground: Agent app (SINGLE_TOP keeps ComposerActivity alive in background)
  Model output: ui_tap({index: 0}) ✓ focuses compose field
  → X ComposerActivity foreground (SINGLE_TOP, compose preserved)

STEP 4 [LLM inference — 78.3s — 0.3 tok/s]
  Screen: identical (field focused but empty)
  FILTER: same, EditText still at index 0
  Model output: ui_tap({index: 0}) — taps EditText again (loop begins)

STEP 5 [Recovery Strategy 0 — <1s — no inference]
  Loop detected: steps 3+4 both tapped filtered=0
  isXComposeOpen=true, tweet text not in compactText
  → actionExecutor.execute(Decision("type", text="Hi from RunAnywhere Android agent"))
  Log: "[RECOVERY] Compose field empty — typing tweet text directly"

STEP 6 [X-GUARD — <1s — no inference]
  Screen: 13 elements, POST (Button) at raw index 1
  "Hi from RunAnywhere Android agent" present in compactText → textTyped=true
  → tapped POST (orig=1)
  Log: "[X-GUARD] Tweet ready — tapping POST at index 1"
  Log: "Tweet posted successfully!"
  Log: "Goal achieved: tweet posted"
  Status: DONE ✅

SUMMARY
  Steps: 6/30
  LLM inferences: 3 (steps 1, 3, 4 — ~70s each)
  Guard actions: 3 (X-NAV, Recovery, X-GUARD — <1s each)
  Total time: ~4 min 10s (216s inference + ~34s UI transitions)
  Inference speed: 0.3 tok/s (Qwen3 4B, foreground-boosted)

PIPELINE TIMELINE (elapsed from agent start)
  T+0:00–0:02  PRE-LAUNCH    App init, goal parsed, twitter://timeline fires      ~2s
  T+0:02–1:07  STEP 1        Agent app → foreground, LLM inference, FAB tapped    65s
  T+1:07–1:08  STEP 2        X-NAV guard fires, overlay → "New post" → Composer   <1s
  T+1:08–1:20  (transition)  Composer opens, agent detects ComposerActivity       ~12s
  T+1:20–2:32  STEP 3        Agent app → foreground, LLM inference, tap EditText  72s
  T+2:32–2:42  (transition)  X returns to foreground (SINGLE_TOP), field focused  ~10s
  T+2:42–4:00  STEP 4        Agent app → foreground, LLM inference, tap EditText  78s
  T+4:00–4:01  STEP 5        Recovery Strategy 0 → tweet text typed directly      <1s
  T+4:01–4:02  STEP 6        X-GUARD → POST tapped → tweet live                   <1s
  ─────────────────────────────────────────────────────────────────────
  Total        3 inferences (216s) + guards (<3s) + transitions        ~4:10
```
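The `extractTweetText` Pattern 3 used in the pre-launch step ("post/tweet saying <text>", no quotes needed) reduces to a single regex. A hedged sketch — the production function reportedly also handles quoted patterns, which are not documented here:

```kotlin
// Pattern 3 only: capture everything after "post saying" / "tweet saying".
fun extractTweetText(goal: String): String? {
    val pattern3 = Regex("""(?:post|tweet)\s+saying\s+(.+)""", RegexOption.IGNORE_CASE)
    return pattern3.find(goal)?.groupValues?.get(1)?.trim()
}
```

Goals with no compose intent simply return `null`, and the pre-launch step skips the compose shortcut.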
---

## What Was LLM vs. Guard

| Step | Who decided | Outcome |
|------|------------|---------|
| Open X home feed | Hardcoded (`twitter://timeline`) | Clean entry point |
| Step 1: tap FAB | **LLM** — filter put FAB at index 0 | ✅ Correct |
| Step 2: tap "New post" in overlay | Guard (X-NAV) | ✅ Correct — LLM would have collapsed the overlay |
| Step 3: focus compose field | **LLM** — filter put EditText at index 0 | ✅ Correct |
| Step 4: focus compose field | **LLM** — same decision | ✅ Valid (loop trigger) |
| Step 5: type tweet text | Guard (Recovery) — loop detected | ✅ Correct — model can't chain `ui_tap` → `ui_type` |
| Step 6: tap POST | Guard (X-GUARD) | ✅ Correct — safety net |

**The model made 3 real navigation decisions. All 3 were correct — not coincidentally, but because goal-aware filtering placed the right element at index 0 before the model saw the screen.**

---

## Proof

Tweet posted live during screen recording. Agent status: **DONE** (green), logs: "Tweet posted successfully! / Goal achieved: tweet posted".

> **Hi from RunAnywhere Android agent**
> @RunAnywhereAI · Feb 20, 2026

**Screen recording:** [`android_agent_muted.mov`](./assets/android_agent_muted.mov)

---

*Built by the RunAnywhere team · san@runanywhere.ai*

Playground/android-use-agent/app/build.gradle.kts

Lines changed: 24 additions & 28 deletions
```diff
@@ -1,17 +1,17 @@
 plugins {
-    id("com.android.application")
-    id("org.jetbrains.kotlin.android")
-    id("org.jetbrains.kotlin.plugin.compose")
+    alias(libs.plugins.android.application)
+    alias(libs.plugins.kotlin.android)
+    alias(libs.plugins.compose.compiler)
 }

 android {
     namespace = "com.runanywhere.agent"
-    compileSdk = 34
+    compileSdk = 35

     defaultConfig {
         applicationId = "com.runanywhere.agent"
         minSdk = 26
-        targetSdk = 34
+        targetSdk = 35
         versionCode = 1
         versionName = "1.0"

@@ -59,7 +59,6 @@ android {
         buildConfig = true
     }

-
     packaging {
         resources {
             excludes += "/META-INF/{AL2.0,LGPL2.1}"
@@ -71,36 +70,33 @@ android {
 }

 dependencies {
-    // RunAnywhere SDK (on-device LLM + STT) - local AARs
-    implementation(files("../libs/RunAnywhereKotlinSDK-release.aar"))
-    implementation(files("../libs/runanywhere-core-llamacpp-release.aar"))
-    implementation(files("../libs/runanywhere-core-onnx-release.aar"))
+    // RunAnywhere SDK (on-device LLM + VLM + STT + Tool Calling)
+    implementation(libs.runanywhere.sdk)
+    implementation(libs.runanywhere.llamacpp)
+    implementation(libs.runanywhere.onnx)

     // Android Core
-    implementation("androidx.core:core-ktx:1.12.0")
-    implementation("androidx.lifecycle:lifecycle-runtime-ktx:2.7.0")
-    implementation("androidx.activity:activity-compose:1.8.2")
+    implementation(libs.androidx.core.ktx)
+    implementation(libs.androidx.lifecycle.runtime.ktx)
+    implementation(libs.androidx.activity.compose)

     // Jetpack Compose
-    implementation(platform("androidx.compose:compose-bom:2024.06.00"))
-    implementation("androidx.compose.ui:ui")
-    implementation("androidx.compose.ui:ui-graphics")
-    implementation("androidx.compose.ui:ui-tooling-preview")
-    implementation("androidx.compose.material3:material3")
-    implementation("androidx.compose.material:material-icons-extended")
-    implementation("androidx.lifecycle:lifecycle-viewmodel-compose:2.7.0")
+    implementation(platform(libs.androidx.compose.bom))
+    implementation(libs.androidx.compose.ui)
+    implementation(libs.androidx.compose.ui.graphics)
+    implementation(libs.androidx.compose.ui.tooling.preview)
+    implementation(libs.androidx.compose.material3)
+    implementation(libs.androidx.compose.material.icons.extended)
+    implementation(libs.androidx.lifecycle.viewmodel.compose)

     // Coroutines
-    implementation("org.jetbrains.kotlinx:kotlinx-coroutines-core:1.7.3")
-    implementation("org.jetbrains.kotlinx:kotlinx-coroutines-android:1.7.3")
-
-    // Archive extraction (required by RunAnywhere SDK for model downloads)
-    implementation("org.apache.commons:commons-compress:1.26.0")
+    implementation(libs.kotlinx.coroutines.core)
+    implementation(libs.kotlinx.coroutines.android)

     // Networking
-    implementation("com.squareup.okhttp3:okhttp:4.12.0")
+    implementation(libs.okhttp)

     // Debug
-    debugImplementation("androidx.compose.ui:ui-tooling")
-    debugImplementation("androidx.compose.ui:ui-test-manifest")
+    debugImplementation(libs.androidx.compose.ui.tooling)
+    debugImplementation(libs.androidx.compose.ui.test.manifest)
 }
```

Playground/android-use-agent/app/src/main/AndroidManifest.xml

Lines changed: 27 additions & 0 deletions
```diff
@@ -5,13 +5,35 @@
     <uses-permission android:name="android.permission.INTERNET" />
     <uses-permission android:name="android.permission.RECORD_AUDIO" />
     <uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />
+    <uses-permission android:name="android.permission.FOREGROUND_SERVICE" />
+    <uses-permission android:name="android.permission.FOREGROUND_SERVICE_SPECIAL_USE" />
+    <uses-permission android:name="android.permission.POST_NOTIFICATIONS" />
+    <uses-permission android:name="android.permission.WAKE_LOCK" />
+    <uses-permission android:name="com.android.alarm.permission.SET_ALARM" />
+    <uses-permission android:name="android.permission.REQUEST_IGNORE_BATTERY_OPTIMIZATIONS" />
     <uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE"
         android:maxSdkVersion="28" />
     <uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE"
         android:maxSdkVersion="32" />
+    <!-- Agent needs to discover and launch any app on the device -->
+    <uses-permission android:name="android.permission.QUERY_ALL_PACKAGES"
+        tools:ignore="QueryAllPackagesPermission" />

     <uses-feature android:name="android.hardware.microphone" android:required="false" />

+    <!-- Package visibility: allow querying any app with a launcher intent -->
+    <queries>
+        <intent>
+            <action android:name="android.intent.action.MAIN" />
+            <category android:name="android.intent.category.LAUNCHER" />
+        </intent>
+        <intent>
+            <action android:name="android.intent.action.VIEW" />
+            <category android:name="android.intent.category.BROWSABLE" />
+            <data android:scheme="https" />
+        </intent>
+    </queries>
+
     <application
         android:name=".AgentApplication"
         android:allowBackup="true"
@@ -32,6 +54,11 @@
             </intent-filter>
         </activity>

+        <service
+            android:name=".AgentForegroundService"
+            android:foregroundServiceType="specialUse"
+            android:exported="false" />
+
         <service
             android:name=".accessibility.AgentAccessibilityService"
             android:permission="android.permission.BIND_ACCESSIBILITY_SERVICE"
```
