
Commit c4c2bb2

Authored by sanchitmonga22, runanywhereclaude, and Siddhesh2377
Android Use agent update (#361)
* Add Android in-app benchmarking implementation prompt (iOS-parity spec)

* Add Android-parity in-app benchmarks implementation prompt

* Add core agent LLM and vision provider abstractions

  - Introduced `AgentLLMProvider` interface for LLM reasoning, including methods for action decision-making and planning.
  - Implemented `OnDeviceLLMProvider` for on-device LLM operations using the RunAnywhere SDK, supporting utility tool registration and model management.
  - Added `VisionProvider` interface for screen analysis, with a `TextOnlyVisionProvider` fallback.
  - Registered various utility tools (e.g., time, weather, battery level) to enhance agent capabilities.

  These changes lay the groundwork for improved decision-making and context understanding in the agent's operations.

* cleanup

* Refactor build configuration and enhance VLM support

  - Updated build.gradle.kts to use plugin aliases for better readability.
  - Modified settings.gradle.kts to include specific content filters for Google and AndroidX repositories.
  - Increased compileSdk and targetSdk versions to 35.
  - Replaced local AAR dependencies with library references for RunAnywhere SDK and other dependencies.
  - Implemented OnDeviceVisionProvider for local VLM model analysis, including model registration and loading.
  - Enhanced AgentViewModel to manage VLM model state and loading.
  - Added UI components for VLM model status and provider mode display.
  - Removed obsolete local AAR files for RunAnywhere SDK components.
  - Introduced a new ProviderBadge component to indicate the current provider mode in the UI.

* Add on-device LLM benchmarking documentation and enhance agent functionality

  - Introduced ASSESSMENT.md for benchmarking study of four on-device LLM models on Samsung Galaxy S24.
  - Updated README.md to reflect new features and architecture, emphasizing fully on-device AI capabilities.
  - Added permissions and service declarations in AndroidManifest.xml for a foreground service to maintain agent activity.
  - Implemented AgentForegroundService to prevent process termination during background operation.
  - Modified AgentViewModel to start and stop the foreground service appropriately.
  - Enhanced ActionExecutor to support opening notes apps and setting alarms.
  - Updated ActionHistory to provide a compact format for local models.
  - Improved ScreenParser to include foreground package information for better context during agent operation.
  - Adjusted AgentKernel to manage app navigation and pre-launch logic more effectively, ensuring a smoother user experience.

* minor updates

* update

* updating assessment

* Add X Compose Shortcut Implementation and UI Enhancements

  - Introduced a three-piece solution for the X (Twitter) compose flow, utilizing deep links and foreground activity management to improve navigation speed and reliability.
  - Enhanced AgentViewModel to manage live LLM streaming text and clipboard functionality for log exports.
  - Updated AgentAccessibilityService to filter out unlabeled container classes, improving navigation accuracy.
  - Implemented a new ThinkingPanel UI component to display streaming tokens during LLM reasoning.
  - Enhanced ActionExecutor to support direct clicks on elements, bypassing gesture interceptors for improved interaction.
  - Added structured logging for agent steps to facilitate better performance tracking and export capabilities.

* Add Approach 3 X compose flow and document live test results

  Implemented fully-assisted X posting flow (Approach 3) with LFM2.5-1.2B:

  - Restored xComposeMessage field to track compose state
  - Extended extractTweetText() with Pattern 3: "post saying <text>" (no quotes needed)
  - Restored openXCompose() deep link in preLaunchApp() for X goals
  - Restored ComposerActivity + SINGLE_TOP in bringAppToForeground() to preserve compose during inference
  - Restored findPostButtonIndex() quick-tap block for zero-LLM-step POST
  - Kept X-FAB keyword FAB tap as fallback

  Live test results documented in ASSESSMENT.md:

  - Approach 1 (pure LLM): FAIL — 1.2B always picks index 0, stuck in nav drawer
  - Approach 2 (keyword FAB): FAIL — opens compose correctly but compose destroyed during inference
  - Approach 3 (fully assisted): PASS — tweet posted in ~20s, 0 LLM inference steps

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Add X post demo mode with X-FAB + X-TYPE + X-POST flow and report

  - X-TYPE block: auto-types tweet text into compose field when blank
  - X-POST block: updated to check text is present before tapping POST
  - findComposeTextFieldIndex(): finds [tap,edit] EditText in accessibility tree
  - preLaunchApp(): always opens X home feed (no deep link) for visible navigation
  - extractTweetText() Pattern 3: matches "post/tweet saying <text>" without quotes
  - X_POST.md: full implementation report with live logcat trace and proof of tweet

  Tweet posted live: "Hi from RunAnywhere Android agent" — @RunanywhereAI, Feb 19 2026. 27s execution time, 0 LLM inference calls, full navigation steps visible on screen.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* adding stuff

---------

Co-authored-by: runanywhere <runanywhere@runanywheres-MacBook-Pro.local>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Siddhesh <siddheshsonar2377@gmail.com>
1 parent efca85a commit c4c2bb2

34 files changed

Lines changed: 3795 additions & 368 deletions

.github/pull_request_template.md

Lines changed: 3 additions & 0 deletions
```diff
@@ -29,6 +29,9 @@ Brief description of the changes made.
 - [ ] Tested on iOS
 - [ ] Tested on Android

+**Playground:**
+- [ ] Tested on target platform
+- [ ] Verified no regressions in existing Playground projects
 **Web SDK / Web Sample:**
 - [ ] Tested in Chrome (Desktop)
 - [ ] Tested in Firefox
```

Playground/README.md

Lines changed: 10 additions & 6 deletions
```diff
@@ -6,7 +6,7 @@ Interactive demo projects showcasing what you can build with RunAnywhere.
 |---------|-------------|----------|
 | [swift-starter-app](swift-starter-app/) | Privacy-first AI demo — LLM Chat, Speech-to-Text, Text-to-Speech, and Voice Pipeline with VAD | iOS (Swift/SwiftUI) |
 | [on-device-browser-agent](on-device-browser-agent/) | On-device AI browser automation using WebLLM — no cloud, no API keys, fully private | Chrome Extension (TypeScript/React) |
-| [android-use-agent](android-use-agent/) | Autonomous Android agent — navigates phone UI via accessibility + GPT-4o Vision + on-device LLM fallback | Android (Kotlin/Jetpack Compose) |
+| [android-use-agent](android-use-agent/) | Fully on-device autonomous Android agent — navigates phone UI via accessibility + on-device LLM (Qwen3-4B). See [benchmarks](android-use-agent/ASSESSMENT.md) | Android (Kotlin/Jetpack Compose) |
 | [linux-voice-assistant](linux-voice-assistant/) | Fully on-device voice assistant — Wake Word, VAD, STT, LLM, and TTS with zero cloud dependency | Linux (C++/ALSA) |
 | [openclaw-hybrid-assistant](openclaw-hybrid-assistant/) | Hybrid voice assistant — on-device Wake Word, VAD, STT, and TTS with cloud LLM via OpenClaw WebSocket | Linux (C++/ALSA) |

@@ -46,14 +46,18 @@ A Chrome extension that automates browser tasks entirely on-device using WebLLM

 ## android-use-agent

-An autonomous Android agent that navigates your phone's UI to accomplish tasks:
+A fully on-device autonomous Android agent that navigates your phone's UI to accomplish tasks. All LLM inference runs locally via RunAnywhere SDK with llama.cpp -- no cloud dependency required.

-- **Autonomous UI Navigation** — Taps, types, swipes, and navigates apps to complete goals
-- **GPT-4o Vision** — Screenshots sent to GPT-4o for visual screen understanding
-- **Unified Tool Calling** — All UI actions registered as OpenAI function calling tools
-- **On-Device Fallback** — Falls back to local LLM via RunAnywhere SDK when offline
+- **Fully On-Device AI** — LLM inference via RunAnywhere SDK + llama.cpp (Qwen3-4B recommended)
+- **Accessibility-Based Screen Parsing** — Reads UI tree via Android Accessibility API, no root required
+- **Tool Calling** — LLM outputs structured tool calls (`<tool_call>` XML or `ui_tap(index=5)` function-call style)
+- **Samsung Foreground Boost** — 15x inference speedup by bringing agent to foreground during inference
+- **Smart Pre-Launch** — Opens target apps via Android intents before the agent loop
+- **Optional Cloud Fallback** — GPT-4o with vision and function calling when an API key is configured
 - **Voice Mode** — Speak goals via on-device Whisper STT, hear progress via TTS

+See [android-use-agent/ASSESSMENT.md](android-use-agent/ASSESSMENT.md) for detailed model benchmarks across Qwen3-4B, LFM2.5-1.2B, LFM2-8B-A1B MoE, and DS-R1-Qwen3-8B on Samsung Galaxy S24.
+
 **Requirements:** Android 8.0+ (API 26), arm64-v8a device, Accessibility service permission

 ## openclaw-hybrid-assistant
```
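The feature list mentions two tool-call output styles (`<tool_call>` XML and `ui_tap(index=5)` function-call style). A minimal Kotlin sketch of normalizing both into one decision type — the inner JSON shape of `<tool_call>` and the `Decision` class are assumptions for illustration, not the agent's actual API:

```kotlin
// Hypothetical normalized result of a parsed tool call.
data class Decision(val action: String, val index: Int)

// Assumed <tool_call> payload shape: {"name": "...", "arguments": {"index": N}}
private val xmlCall = Regex(
    """<tool_call>\s*\{\s*"name"\s*:\s*"(\w+)"\s*,\s*"arguments"\s*:\s*\{\s*"index"\s*:\s*(\d+)\s*\}\s*\}\s*</tool_call>"""
)
// Function-call style: ui_tap(index=5)
private val fnCall = Regex("""(\w+)\(index\s*=\s*(\d+)\)""")

fun parseToolCall(output: String): Decision? {
    // Try the XML form first, then the bare function-call form.
    val m = xmlCall.find(output) ?: fnCall.find(output) ?: return null
    return Decision(m.groupValues[1], m.groupValues[2].toInt())
}
```

Narration with no parsable call (the failure mode of small models noted in the benchmark report) simply yields `null`.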

Playground/android-use-agent/ASSESSMENT.md

Lines changed: 565 additions & 0 deletions
Large diffs are not rendered by default.

Playground/android-use-agent/README.md

Lines changed: 212 additions & 77 deletions
Large diffs are not rendered by default.
Playground/android-use-agent/X_POST.md

Lines changed: 202 additions & 0 deletions
# Android Use Agent — X Post Demo Report

**Task:** Post "Hi from RunAnywhere Android agent" on X (Twitter) using on-device LLM inference, navigating from the home feed with no cloud dependency.

**Result:** ✅ PASS — Qwen3 4B, 6 steps, 3 real LLM inferences, goal-aware element filtering, ~4 min total.

**Screen recording:** [`android_agent_muted.mov`](./assets/android_agent_muted.mov)

---

## Device

| Spec | Detail |
|------|--------|
| Device | Samsung Galaxy S24 (SM-S931U1) |
| SoC | Snapdragon 8 Gen 3 — 1× Cortex-X4 @ 3.39 GHz + 3× A720 + 4× A520 |
| RAM | 8 GB LPDDR5X |
| OS | Android 16 (One UI 7) |
| Backend | llama.cpp (GGUF Q4_K_M) via RunAnywhere SDK |

**Critical hardware quirk — Samsung background throttling:** One UI pins background processes to efficiency cores (A520 @ 2.27 GHz), capping inference at **0.19 tok/s**. Bringing the app to the foreground during inference restores full CPU access: **2.4–25 tok/s** (15–17× improvement). This foreground boost is mandatory for any on-device LLM on Samsung.

---

## Models Benchmarked (UC1: "Open X and tap the post button")

| Model | Size | Speed (fg) | Step Latency | Tool Format | UC1 Result |
|-------|------|-----------|--------------|-------------|------------|
| LFM2-350M Base | 229 MB | ~20 tok/s | 7–12s | ❌ Narrates instead of calling tools | FAIL |
| LFM2.5-1.2B Instruct | 731 MB | ~2.8 tok/s | 8–14s | ✅ Valid | FAIL — always picks index 0–2 |
| **Qwen3-4B** (`/no_think`) | **2.5 GB** | **~4 tok/s** | **67–85s** | **✅ Valid** | **PASS** |
| LFM2-8B-A1B MoE | 5 GB | ~5 tok/s | 29–43s | ⚠️ Emits multi-action plans | FAIL — only 1st action runs |
| DS-R1-Qwen3-8B | 5 GB | ~1.1 tok/s | ~197s | ❌ Hallucinated inner agent loop | FAIL |

**Key finding:** There is a hard capability threshold around 4B parameters. Sub-2B models either can't follow the tool-call format (350M) or can't reason about element selection (1.2B). 8B models reason better but produce the wrong output format or run at the wrong speed. **Qwen3-4B with `/no_think` is the only viable on-device model for this task.**

`/no_think` matters: with chain-of-thought enabled, Qwen3-4B spends 95%+ of its 512-token budget on `<think>` and runs out of space before the tool call. `/no_think` forces a direct 18-token output.

---

## Approaches Tried

| # | Strategy | Model | LLM Calls | Outcome |
|---|----------|-------|-----------|---------|
| 1 | Pure LLM — no assists | LFM2.5 1.2B | All | ❌ FAIL — FAB at raw index 13, model always taps 0 |
| 2 | Keyword FAB tap, LLM handles compose | LFM2.5 1.2B | Partial | ❌ FAIL — `ComposerActivity` destroyed when agent steals foreground for inference |
| 3 | Deep link to compose + SINGLE_TOP + quick POST | LFM2.5 1.2B | 0 | ✅ PASS (~20s) — but skips home feed entirely, looks scripted |
| 4 | Full programmatic flow (FAB→compose→type→POST) | LFM2.5 1.2B | 0 | ✅ PASS (~27s) — visible navigation, zero AI reasoning |
| **5** | **Goal-aware filter + SINGLE_TOP + targeted guards** | **Qwen3 4B** | **3** | **✅ PASS (6 steps, ~4 min) — real LLM navigation decisions** |

**Approach 2 failure detail:** When the agent steals the foreground for inference, `getLaunchIntentForPackage()` on return triggers X's `singleTask` launch mode — this clears the back stack and destroys any open `ComposerActivity`. The composed tweet is lost before the model can post it. Fix: use `FLAG_ACTIVITY_SINGLE_TOP` when `ComposerActivity` is detected as open.

**Approach 5 is the only one with genuine on-device reasoning.** The model made 3 real navigation decisions; guards only cover two deterministic failure modes that LLM inference cannot solve reliably.

---

## Goal-Aware Element Filtering

Both LFM2.5 1.2B and Qwen3 4B consistently output `ui_tap(0)` or `ui_tap(1)`. On X's home feed, "New post" is at raw accessibility index **13** — unreachable for a model that always picks low indices.

**Solution — `filterScreenForGoal(compactText, goal)`:** Before every inference step, score and re-rank all interactive elements against the goal, take the top 5, and re-index them 0–4. The model sees a 5-element screen where the most relevant action is always at index 0.

**Scoring:**

- Keyword match: each word in the goal scored against the element label (case-insensitive)
- EditText bonus: +10 when the goal implies text composition (makes the compose field rank above toolbar buttons)
- Index remapping logged: `Clicked element orig=13 (filtered=0) via accessibility action`

**Home feed example** (goal = "post saying Hi from RunAnywhere Android agent"):

```
Raw index 13: New post (ImageButton) [tap] → score 8 → filtered index 0 ← model taps this
Raw index 0:  Show navigation drawer [tap] → score 1 → filtered index 1
Raw index 1:  Timeline settings [tap]     → score 1 → filtered index 2
… 16 other elements hidden
```

**ComposerActivity example** (same goal):

```
Raw index 2: What's happening? (EditText) [tap,edit] → score 11 (editBonus) → filtered index 0
Raw index 3: Changes who can reply [tap]             → score 2 → filtered index 1
… 10 other elements hidden
```

Model outputs `ui_tap(0)` both times — and both times it is the correct action.
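The scoring described above can be sketched in a few lines of Kotlin. This is an illustrative reimplementation, not the agent's actual `filterScreenForGoal` — the real keyword weights evidently differ (e.g. "New post" scores 8 on the home feed here), but the ranking behavior is the same: keyword hits plus a +10 EditText bonus, top 5 kept:

```kotlin
// Stand-in for a parsed accessibility element (assumed shape).
data class Element(val origIndex: Int, val label: String, val isEditText: Boolean = false)

fun filterScreenForGoal(elements: List<Element>, goal: String): List<Element> {
    val goalWords = goal.lowercase().split(Regex("\\W+")).filter { it.isNotBlank() }
    // Heuristic (assumed keyword list): does the goal imply text composition?
    val composeGoal = listOf("post", "tweet", "type", "write", "saying").any { it in goalWords }
    return elements.sortedByDescending { el ->
        val label = el.label.lowercase()
        var score = goalWords.count { it in label }   // keyword match, case-insensitive
        if (composeGoal && el.isEditText) score += 10 // EditText bonus
        score
    }.take(5)                                         // model sees at most 5, re-indexed 0–4
}
```

On the ComposerActivity example, the +10 bonus is what lifts "What's happening?" over every toolbar button, so the model's habitual `ui_tap(0)` lands on the right element.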
---

## Guards (Minimal Hardcoded Assists)

Three guards cover specific failure modes that cannot be solved by LLM inference alone:

| Guard | Trigger | Why LLM can't handle it | Action |
|-------|---------|------------------------|--------|
| **X-NAV** | FAB overlay expanded ("Go Live" + "Post Photos" visible) | Overlay collapses when agent steals foreground for inference — a ~70s window lost | Tap "New post" immediately, no inference |
| **Recovery Strategy 0** | `isXComposeOpen`, tweet text not typed, loop detected | Model calls `ui_tap` to focus EditText but never follows with `ui_type` | Type `xComposeMessage` directly via accessibility |
| **X-GUARD** | Tweet text in compose field + POST button visible + step ≥ 3 | Prevents model from navigating away from a fully-composed tweet | Tap POST directly |

---
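The Recovery Strategy 0 trigger is a compact predicate. A sketch under assumed names (`Step`, the history list) — the real kernel's loop detector is not shown in this diff:

```kotlin
// Assumed record of one executed agent step.
data class Step(val action: String, val index: Int)

fun needsTypingRecovery(
    history: List<Step>,
    isXComposeOpen: Boolean,
    compactText: String,   // compact screen dump fed to the model
    tweetText: String,     // xComposeMessage extracted at pre-launch
): Boolean {
    // Loop detected: the last two steps tapped the same filtered index.
    val looping = history.size >= 2 &&
        history.takeLast(2).let { (a, b) -> a == b && a.action == "ui_tap" }
    // Fire only when compose is open and the tweet text was never typed.
    return looping && isXComposeOpen && tweetText !in compactText
}
```

When the predicate fires, the agent types the tweet text directly via accessibility instead of spending another ~70s inference on a step the model cannot complete.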
## Live Run Trace

**Model:** Qwen3 4B Q4_K_M · **Device:** Samsung Galaxy S24 · **Date:** 2026-02-20

```
PRE-LAUNCH
  extractTweetText("...post saying Hi from RunAnywhere Android agent")
    → xComposeMessage = "Hi from RunAnywhere Android agent"
  twitter://timeline → X opens on clean home feed

STEP 1 [LLM inference — 65.3s — 0.3 tok/s]
  Screen: X home feed, 19 elements
  FILTER 5/19 → New post (ImageButton) at filtered index 0 [orig=13]
  Foreground: Agent app (inference boost active), X in background
  Model output: ui_tap({index: 0}) ✓ correct
  Executor: filtered=0 → orig=13, tapped FAB
  → X foreground, agent background

STEP 2 [X-NAV guard — <1s — no inference]
  Screen: FAB overlay expanded, 22 elements ("Go Live", "Post Photos" visible)
  FILTER 5/22 → New post at filtered index 0 [orig=16]
  Guard: overlay would collapse on foreground steal → tap immediately
  → ComposerActivity opens, isXComposeOpen = true

STEP 3 [LLM inference — 72.4s — 0.3 tok/s]
  Screen: ComposerActivity, 12 elements, compose field empty
  FILTER 5/12 → What's happening? (EditText) at filtered index 0 [orig=2, editBonus=10]
  Foreground: Agent app (SINGLE_TOP keeps ComposerActivity alive in background)
  Model output: ui_tap({index: 0}) ✓ focuses compose field
  → X ComposerActivity foreground (SINGLE_TOP, compose preserved)

STEP 4 [LLM inference — 78.3s — 0.3 tok/s]
  Screen: identical (field focused but empty)
  FILTER: same, EditText still at index 0
  Model output: ui_tap({index: 0}) — taps EditText again (loop begins)

STEP 5 [Recovery Strategy 0 — <1s — no inference]
  Loop detected: steps 3+4 both tapped filtered=0
  isXComposeOpen=true, tweet text not in compactText
  → actionExecutor.execute(Decision("type", text="Hi from RunAnywhere Android agent"))
  Log: "[RECOVERY] Compose field empty — typing tweet text directly"

STEP 6 [X-GUARD — <1s — no inference]
  Screen: 13 elements, POST (Button) at raw index 1
  "Hi from RunAnywhere Android agent" present in compactText → textTyped=true
  → tapped POST (orig=1)
  Log: "[X-GUARD] Tweet ready — tapping POST at index 1"
  Log: "Tweet posted successfully!"
  Log: "Goal achieved: tweet posted"
  Status: DONE ✅

SUMMARY
  Steps: 6/30
  LLM inferences: 3 (steps 1, 3, 4 — ~70s each)
  Guard actions: 3 (X-NAV, Recovery, X-GUARD — <1s each)
  Total time: ~4 min 10s (216s inference + ~34s UI transitions)
  Inference speed: 0.3 tok/s (Qwen3 4B, foreground-boosted)

PIPELINE TIMELINE (elapsed from agent start)
  T+0:00–0:02  PRE-LAUNCH    App init, goal parsed, twitter://timeline fires      ~2s
  T+0:02–1:07  STEP 1        Agent app → foreground, LLM inference, FAB tapped    65s
  T+1:07–1:08  STEP 2        X-NAV guard fires, overlay → "New post" → Composer   <1s
  T+1:08–1:20  (transition)  Composer opens, agent detects ComposerActivity       ~12s
  T+1:20–2:32  STEP 3        Agent app → foreground, LLM inference, tap EditText  72s
  T+2:32–2:42  (transition)  X returns to foreground (SINGLE_TOP), field focused  ~10s
  T+2:42–4:00  STEP 4        Agent app → foreground, LLM inference, tap EditText  78s
  T+4:00–4:01  STEP 5        Recovery Strategy 0 → tweet text typed directly      <1s
  T+4:01–4:02  STEP 6        X-GUARD → POST tapped → tweet live                   <1s
  ─────────────────────────────────────────────────────────────────────
  Total        3 inferences (216s) + guards (<3s) + transitions        ~4:10
```
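The `extractTweetText` Pattern 3 used in the pre-launch step ("post/tweet saying <text>", no quotes needed) reduces to a single regex. A hedged sketch — the production function reportedly also handles quoted patterns, which are not documented here:

```kotlin
// Pattern 3 only: capture everything after "post saying" / "tweet saying".
fun extractTweetText(goal: String): String? {
    val pattern3 = Regex("""(?:post|tweet)\s+saying\s+(.+)""", RegexOption.IGNORE_CASE)
    return pattern3.find(goal)?.groupValues?.get(1)?.trim()
}
```

Goals with no compose intent simply return `null`, and the pre-launch step skips the compose shortcut.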
---

## What Was LLM vs. Guard

| Step | Who decided | Outcome |
|------|------------|---------|
| Open X home feed | Hardcoded (`twitter://timeline`) | Clean entry point |
| Step 1: tap FAB | **LLM** — filter put FAB at index 0 | ✅ Correct |
| Step 2: tap "New post" in overlay | Guard (X-NAV) | ✅ Correct — LLM would have collapsed the overlay |
| Step 3: focus compose field | **LLM** — filter put EditText at index 0 | ✅ Correct |
| Step 4: focus compose field | **LLM** — same decision | ✅ Valid (loop trigger) |
| Step 5: type tweet text | Guard (Recovery) — loop detected | ✅ Correct — model can't chain `ui_tap` → `ui_type` |
| Step 6: tap POST | Guard (X-GUARD) | ✅ Correct — safety net |

**The model made 3 real navigation decisions. All 3 were correct — not coincidentally, but because goal-aware filtering placed the right element at index 0 before the model saw the screen.**

---

## Proof

Tweet posted live during screen recording. Agent status: **DONE** (green), logs: "Tweet posted successfully! / Goal achieved: tweet posted".

> **Hi from RunAnywhere Android agent**
> @RunAnywhereAI · Feb 20, 2026

**Screen recording:** [`android_agent_muted.mov`](./assets/android_agent_muted.mov)

---

*Built by the RunAnywhere team · san@runanywhere.ai*

Playground/android-use-agent/app/build.gradle.kts

Lines changed: 24 additions & 28 deletions
```diff
@@ -1,17 +1,17 @@
 plugins {
-    id("com.android.application")
-    id("org.jetbrains.kotlin.android")
-    id("org.jetbrains.kotlin.plugin.compose")
+    alias(libs.plugins.android.application)
+    alias(libs.plugins.kotlin.android)
+    alias(libs.plugins.compose.compiler)
 }

 android {
     namespace = "com.runanywhere.agent"
-    compileSdk = 34
+    compileSdk = 35

     defaultConfig {
         applicationId = "com.runanywhere.agent"
         minSdk = 26
-        targetSdk = 34
+        targetSdk = 35
         versionCode = 1
         versionName = "1.0"

@@ -59,7 +59,6 @@ android {
         buildConfig = true
     }

-
     packaging {
         resources {
             excludes += "/META-INF/{AL2.0,LGPL2.1}"
@@ -71,36 +70,33 @@ android {
 }

 dependencies {
-    // RunAnywhere SDK (on-device LLM + STT) - local AARs
-    implementation(files("../libs/RunAnywhereKotlinSDK-release.aar"))
-    implementation(files("../libs/runanywhere-core-llamacpp-release.aar"))
-    implementation(files("../libs/runanywhere-core-onnx-release.aar"))
+    // RunAnywhere SDK (on-device LLM + VLM + STT + Tool Calling)
+    implementation(libs.runanywhere.sdk)
+    implementation(libs.runanywhere.llamacpp)
+    implementation(libs.runanywhere.onnx)

     // Android Core
-    implementation("androidx.core:core-ktx:1.12.0")
-    implementation("androidx.lifecycle:lifecycle-runtime-ktx:2.7.0")
-    implementation("androidx.activity:activity-compose:1.8.2")
+    implementation(libs.androidx.core.ktx)
+    implementation(libs.androidx.lifecycle.runtime.ktx)
+    implementation(libs.androidx.activity.compose)

     // Jetpack Compose
-    implementation(platform("androidx.compose:compose-bom:2024.06.00"))
-    implementation("androidx.compose.ui:ui")
-    implementation("androidx.compose.ui:ui-graphics")
-    implementation("androidx.compose.ui:ui-tooling-preview")
-    implementation("androidx.compose.material3:material3")
-    implementation("androidx.compose.material:material-icons-extended")
-    implementation("androidx.lifecycle:lifecycle-viewmodel-compose:2.7.0")
+    implementation(platform(libs.androidx.compose.bom))
+    implementation(libs.androidx.compose.ui)
+    implementation(libs.androidx.compose.ui.graphics)
+    implementation(libs.androidx.compose.ui.tooling.preview)
+    implementation(libs.androidx.compose.material3)
+    implementation(libs.androidx.compose.material.icons.extended)
+    implementation(libs.androidx.lifecycle.viewmodel.compose)

     // Coroutines
-    implementation("org.jetbrains.kotlinx:kotlinx-coroutines-core:1.7.3")
-    implementation("org.jetbrains.kotlinx:kotlinx-coroutines-android:1.7.3")
-
-    // Archive extraction (required by RunAnywhere SDK for model downloads)
-    implementation("org.apache.commons:commons-compress:1.26.0")
+    implementation(libs.kotlinx.coroutines.core)
+    implementation(libs.kotlinx.coroutines.android)

     // Networking
-    implementation("com.squareup.okhttp3:okhttp:4.12.0")
+    implementation(libs.okhttp)

     // Debug
-    debugImplementation("androidx.compose.ui:ui-tooling")
-    debugImplementation("androidx.compose.ui:ui-test-manifest")
+    debugImplementation(libs.androidx.compose.ui.tooling)
+    debugImplementation(libs.androidx.compose.ui.test.manifest)
 }
```

Playground/android-use-agent/app/src/main/AndroidManifest.xml

Lines changed: 27 additions & 0 deletions
```diff
@@ -5,13 +5,35 @@
     <uses-permission android:name="android.permission.INTERNET" />
     <uses-permission android:name="android.permission.RECORD_AUDIO" />
     <uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />
+    <uses-permission android:name="android.permission.FOREGROUND_SERVICE" />
+    <uses-permission android:name="android.permission.FOREGROUND_SERVICE_SPECIAL_USE" />
+    <uses-permission android:name="android.permission.POST_NOTIFICATIONS" />
+    <uses-permission android:name="android.permission.WAKE_LOCK" />
+    <uses-permission android:name="com.android.alarm.permission.SET_ALARM" />
+    <uses-permission android:name="android.permission.REQUEST_IGNORE_BATTERY_OPTIMIZATIONS" />
     <uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE"
         android:maxSdkVersion="28" />
     <uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE"
         android:maxSdkVersion="32" />
+    <!-- Agent needs to discover and launch any app on the device -->
+    <uses-permission android:name="android.permission.QUERY_ALL_PACKAGES"
+        tools:ignore="QueryAllPackagesPermission" />

     <uses-feature android:name="android.hardware.microphone" android:required="false" />

+    <!-- Package visibility: allow querying any app with a launcher intent -->
+    <queries>
+        <intent>
+            <action android:name="android.intent.action.MAIN" />
+            <category android:name="android.intent.category.LAUNCHER" />
+        </intent>
+        <intent>
+            <action android:name="android.intent.action.VIEW" />
+            <category android:name="android.intent.category.BROWSABLE" />
+            <data android:scheme="https" />
+        </intent>
+    </queries>
+
     <application
         android:name=".AgentApplication"
         android:allowBackup="true"
@@ -32,6 +54,11 @@
             </intent-filter>
         </activity>

+        <service
+            android:name=".AgentForegroundService"
+            android:foregroundServiceType="specialUse"
+            android:exported="false" />
+
         <service
             android:name=".accessibility.AgentAccessibilityService"
             android:permission="android.permission.BIND_ACCESSIBILITY_SERVICE"
```
