feat: add grammar-constrained structured output across SDKs #468
sanchitmonga22 wants to merge 1 commit into main from
Conversation
Introduces GBNF grammar-constrained decoding for guaranteed valid JSON output matching a developer schema. The implementation is layered through the C++ core, with bindings exposed in all platform SDKs:

- Commons (C++): llama.cpp grammar sampler at the head of the sampler chain, JSON Schema → GBNF converter, new component-level structured generate/stream APIs, vtable op `json_schema_to_grammar`, updated platform backend vtable with explicit NULL entries
- JNI: wire the grammar field through the commons + llamacpp bridges
- Swift: `StructuredOutputFallback`, extended `StructuredOutputConfig`, generate via `rac_llm_component_generate_structured`
- Web: WASM offset helpers for the new fields, updated TypeScript types, `StructuredOutputFallback` exported
- Kotlin: `StructuredOutputFallback` enum, extended config, `LlamaCPPBridge` JNI declarations for direct LLM ops + schema-to-grammar

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
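For orientation, here is a caller-side sketch of the new C API. It is assembled from the type and function names quoted in this PR (`rac_structured_output_config_t`, `rac_llm_component_generate_structured`, `RAC_STRUCTURED_OUTPUT_FALLBACK_RETRY`); the exact argument order and the schema string are illustrative assumptions, not documented usage.

```cpp
// Illustrative sketch only — argument order follows the snippets quoted in
// the review below and may not match the shipped header exactly.
rac_structured_output_config_t so_config = {};
so_config.json_schema = "{\"type\":\"object\",\"properties\":{\"name\":{\"type\":\"string\"}}}";
so_config.include_schema_in_prompt = true;   // also describe the schema in the prompt
so_config.use_grammar = true;                // request GBNF-constrained decoding
so_config.max_retries = 3;
so_config.fallback = RAC_STRUCTURED_OUTPUT_FALLBACK_RETRY;

rac_llm_result_t gen_result = {};
rac_result_t rc = rac_llm_component_generate_structured(
    component_handle,                        // hypothetical component handle
    "Extract the person's name as JSON.",
    /*options=*/nullptr,                     // fall back to component defaults
    &so_config,
    &gen_result);
if (rc == RAC_SUCCESS) {
    // gen_result.text should parse as JSON matching the schema when the
    // grammar path is active.
    rac_llm_result_free(&gen_result);
}
```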
📝 Walkthrough
This pull request adds grammar-constrained structured output generation across the RunAnywhere SDK. It introduces new public C APIs for structured generation with JSON schema validation, extends the LLM options and configuration types to support GBNF grammar parameters, implements JSON-to-grammar conversion in the LlamaCPP backend, and provides language bindings for Kotlin, Swift, and Web platforms.
Sequence Diagram(s)

```mermaid
sequenceDiagram
actor Client
participant Component as LLM Component
participant Service as LLM Service
participant Backend as LlamaCPP Backend
participant Converter as Grammar Converter
Client->>Component: rac_llm_component_generate_structured(prompt, options, schema_config)
activate Component
Component->>Component: Validate inputs & config
alt use_grammar enabled
Component->>Service: rac_llm_json_schema_to_grammar(json_schema)
activate Service
Service->>Backend: Dispatch to vtable json_schema_to_grammar()
activate Backend
Backend->>Converter: convert_json_schema_to_grammar(schema)
activate Converter
Converter-->>Backend: GBNF grammar string
deactivate Converter
Backend-->>Service: grammar
deactivate Backend
Service-->>Component: grammar (caller owns)
deactivate Service
Component->>Component: Inject grammar into effective_options
else grammar conversion failed
Component->>Component: Log warning, proceed prompt-only
end
Component->>Service: rac_llm_component_generate(prompt, effective_options)
activate Service
Service->>Backend: Forward to backend generate
activate Backend
Backend->>Backend: Initialize grammar sampler if grammar present
Backend->>Backend: Add grammar to sampler chain
Backend->>Backend: Generate tokens constrained by grammar
Backend-->>Service: Result
deactivate Backend
Service-->>Component: Result
deactivate Service
Component->>Component: Free allocated grammar
Component-->>Client: Result
    deactivate Component
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (2 warnings)
@greptile please review

@coderabbitai please review

✅ Actions performed: Review triggered.
```cpp
std::lock_guard<std::mutex> lock(component->mtx);

// Get service from lifecycle manager
rac_handle_t service = nullptr;
rac_result_t result = rac_lifecycle_require_service(component->lifecycle, &service);
if (result != RAC_SUCCESS) {
    log_error("LLM.Component", "No model loaded - cannot generate structured");
    return result;
}

// Use provided options or defaults
const rac_llm_options_t* base_options = options ? options : &component->default_options;

// Build effective options with grammar if requested
rac_llm_options_t effective_options = *base_options;

char* grammar_str = nullptr;
if (so_config->use_grammar && so_config->json_schema != nullptr) {
    // Convert JSON schema to GBNF grammar via backend
    result = rac_llm_json_schema_to_grammar(service, so_config->json_schema, &grammar_str);
    if (result == RAC_SUCCESS && grammar_str) {
        effective_options.grammar = grammar_str;
        RAC_LOG_INFO("LLM.Component", "Grammar-constrained structured output enabled");
    } else {
        RAC_LOG_WARNING("LLM.Component",
                        "Grammar conversion failed (result=%d), falling back to prompt-only",
                        result);
        // Proceed without grammar — prompt-only fallback
    }
}

// Delegate to standard generate with grammar-augmented options
result = rac_llm_component_generate(handle, prompt, &effective_options, out_result);
```
**Guaranteed deadlock via non-recursive mutex re-acquisition**

`rac_llm_component_generate_structured` acquires `component->mtx` (line 786) and then delegates to `rac_llm_component_generate` (line 818). `rac_llm_component_generate` unconditionally acquires the same `component->mtx` (line 336). Because `std::mutex` is not recursive, this will deadlock every time the function is called.

The same problem occurs in `rac_llm_component_generate_structured_stream` (line 840) → `rac_llm_component_generate_stream` (line 582), which also immediately acquires `component->mtx`.

The fix is to extract the mutex-free core logic of `generate` and `generate_stream` into internal helpers (e.g., `generate_locked`) and have both the public APIs and the new structured wrappers call those helpers, acquiring the mutex only once at the public API boundary.
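A minimal sketch of that refactor, assuming the component layout implied by the quoted snippets (`component->mtx`, `default_options`); `generate_locked` is the reviewer's suggested name, not existing code:

```cpp
#include <mutex>

// Sketch only: type and field names are assumptions based on the diff above.
struct LLMComponent {
    std::mutex mtx;
    rac_llm_options_t default_options;
    // ... lifecycle, etc.
};

// Core logic; the caller must already hold component->mtx.
static rac_result_t generate_locked(LLMComponent* component, const char* prompt,
                                    const rac_llm_options_t* options,
                                    rac_llm_result_t* out_result) {
    // ... the existing body of rac_llm_component_generate, minus the lock ...
    (void)component; (void)prompt; (void)options; (void)out_result;
    return RAC_SUCCESS;
}

rac_result_t rac_llm_component_generate(rac_handle_t handle, const char* prompt,
                                        const rac_llm_options_t* options,
                                        rac_llm_result_t* out_result) {
    auto* component = reinterpret_cast<LLMComponent*>(handle);
    std::lock_guard<std::mutex> lock(component->mtx);  // the only acquisition
    return generate_locked(component, prompt, options, out_result);
}

rac_result_t rac_llm_component_generate_structured(
        rac_handle_t handle, const char* prompt, const rac_llm_options_t* options,
        const rac_structured_output_config_t* so_config, rac_llm_result_t* out_result) {
    auto* component = reinterpret_cast<LLMComponent*>(handle);
    std::lock_guard<std::mutex> lock(component->mtx);  // the only acquisition
    rac_llm_options_t effective_options =
        options ? *options : component->default_options;
    // ... convert so_config->json_schema to a grammar and set
    //     effective_options.grammar here, as in the PR ...
    // Call the locked core instead of the public API: no second lock, no deadlock.
    return generate_locked(component, prompt, &effective_options, out_result);
}
```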
Prompt To Fix With AI
This is a comment left during a code review.
Path: sdk/runanywhere-commons/src/features/llm/llm_component.cpp
Line: 786-818
Comment:
**Guaranteed deadlock via non-recursive mutex re-acquisition**
`rac_llm_component_generate_structured` acquires `component->mtx` (line 786) and then delegates to `rac_llm_component_generate` (line 818). `rac_llm_component_generate` unconditionally acquires the same `component->mtx` (line 336). Because `std::mutex` is not recursive, this will deadlock every time the function is called.
The same problem occurs in `rac_llm_component_generate_structured_stream` (line 840) → `rac_llm_component_generate_stream` (line 582), which also immediately acquires `component->mtx`.
The fix is to extract the mutex-free core logic of `generate` and `generate_stream` into internal helpers (e.g., `generate_locked`) and have both the public APIs and the new structured wrappers call those helpers, acquiring the mutex only once at the public API boundary.
How can I resolve this? If you propose a fix, please make it concise.

```cpp
rac_llm_options_t effective_options = *base_options;

char* grammar_str = nullptr;
if (so_config->use_grammar && so_config->json_schema != nullptr) {
    // Convert JSON schema to GBNF grammar via backend
    result = rac_llm_json_schema_to_grammar(service, so_config->json_schema, &grammar_str);
    if (result == RAC_SUCCESS && grammar_str) {
        effective_options.grammar = grammar_str;
        RAC_LOG_INFO("LLM.Component", "Grammar-constrained structured output enabled");
    } else {
        RAC_LOG_WARNING("LLM.Component",
                        "Grammar conversion failed (result=%d), falling back to prompt-only",
                        result);
        // Proceed without grammar — prompt-only fallback
    }
}

// Delegate to standard generate with grammar-augmented options
result = rac_llm_component_generate(handle, prompt, &effective_options, out_result);

// Free grammar string if we allocated one
if (grammar_str) {
    free(grammar_str);
}

return result;
```
**`max_retries` and `fallback` silently ignored**

The public API exposes `so_config->max_retries` (default 3) and `so_config->fallback` (default `RETRY`), and the header docs promise retry behaviour on grammar failure. However, neither field is read anywhere in `rac_llm_component_generate_structured` or `rac_llm_component_generate_structured_stream`. Grammar conversion is attempted exactly once, and if it fails the code unconditionally falls through to prompt-only mode — the `RETRY` default is never honoured.

Callers relying on `fallback = RAC_STRUCTURED_OUTPUT_FALLBACK_RETRY` or a non-zero `max_retries` to get a second attempt at constrained decoding will silently receive prompt-only output instead. Either implement the retry loop or document these fields as reserved/future.
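If the retry route is chosen, one possible shape is sketched below as a fragment meant to slot into the function quoted above. Only `rac_llm_json_schema_to_grammar` and the config fields are taken from the diff; the `PROMPT_ONLY` enum spelling and the schema-injection wording are assumptions:

```cpp
// Sketch of honouring max_retries / fallback — illustrative, not the PR's code.
char* grammar_str = nullptr;
rac_result_t conv = rac_llm_json_schema_to_grammar(service, so_config->json_schema, &grammar_str);
for (int attempt = 0; conv != RAC_SUCCESS && attempt < so_config->max_retries; ++attempt) {
    conv = rac_llm_json_schema_to_grammar(service, so_config->json_schema, &grammar_str);
}

std::string prompt_with_schema;  // keeps the augmented prompt alive
if (conv == RAC_SUCCESS && grammar_str) {
    effective_options.grammar = grammar_str;
} else if (so_config->fallback == RAC_STRUCTURED_OUTPUT_FALLBACK_PROMPT_ONLY) {  // assumed name
    // Inject the schema into the prompt so the model still sees the contract.
    prompt_with_schema = std::string(prompt) +
                         "\nRespond only with JSON matching this schema:\n" +
                         so_config->json_schema;
    prompt = prompt_with_schema.c_str();
} else {
    // Retries exhausted and no prompt-only fallback requested: fail loudly
    // instead of silently returning unconstrained text.
    return conv;
}
```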
Prompt To Fix With AI
This is a comment left during a code review.
Path: sdk/runanywhere-commons/src/features/llm/llm_component.cpp
Line: 800-825
Comment:
**`max_retries` and `fallback` silently ignored**
The public API exposes `so_config->max_retries` (default 3) and `so_config->fallback` (default `RETRY`), and the header docs promise retry behaviour on grammar failure. However, neither field is read anywhere in `rac_llm_component_generate_structured` or `rac_llm_component_generate_structured_stream`. Grammar conversion is attempted exactly once, and if it fails the code unconditionally falls through to prompt-only mode — the `RETRY` default is never honoured.
Callers relying on `fallback = RAC_STRUCTURED_OUTPUT_FALLBACK_RETRY` or a non-zero `max_retries` to get a second attempt at constrained decoding will silently receive prompt-only output instead. Either implement the retry loop or document these fields as reserved/future.
How can I resolve this? If you propose a fix, please make it concise.

```cpp
jstring result = env->NewStringUTF(grammarOut);
free(grammarOut);
```
**`free` instead of `rac_free` violates API contract**

Both `rac_llm_llamacpp_json_schema_to_grammar` and `rac_llm_json_schema_to_grammar` document their output pointer as "caller must free with `rac_free()`". Here and in llm_component.cpp lines 822 and 878, the raw `free()` is called on memory owned by the RAC API. If the allocator strategy ever changes (e.g., a custom arena), this will silently corrupt memory.
```suggestion
jstring result = env->NewStringUTF(grammarOut);
rac_free(grammarOut);
return result;
```
Prompt To Fix With AI
This is a comment left during a code review.
Path: sdk/runanywhere-commons/src/backends/llamacpp/jni/rac_backend_llamacpp_jni.cpp
Line: 311-312
Comment:
**`free` instead of `rac_free` violates API contract**
Both `rac_llm_llamacpp_json_schema_to_grammar` and `rac_llm_json_schema_to_grammar` document their output pointer as "caller must free with `rac_free()`". Here and in `llm_component.cpp` lines 822 and 878, the raw `free()` is called on memory owned by the RAC API. If the allocator strategy ever changes (e.g., a custom arena), this will silently corrupt memory.
```suggestion
jstring result = env->NewStringUTF(grammarOut);
rac_free(grammarOut);
return result;
```
How can I resolve this? If you propose a fix, please make it concise.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/LLM/RunAnywhere+StructuredOutput.swift (2)
268-326: ⚠️ Potential issue | 🟠 Major — Free `rac_llm_result_t` before returning.

This path copies `llmResult.text` into Swift and then returns without calling `rac_llm_result_free(&llmResult)`, so every structured generation leaks native memory. The same leak can happen on an error path if the native call partially populated the result.

💡 Proposed fix

```diff
-    var llmResult = rac_llm_result_t()
+    var llmResult = rac_llm_result_t()
+    defer { rac_llm_result_free(&llmResult) }
     let generateResult: rac_result_t
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/LLM/RunAnywhere`+StructuredOutput.swift around lines 268 - 326, The native rac_llm_result_t (llmResult) is never freed, leaking native memory; ensure you call rac_llm_result_free(&llmResult) before every exit from the function — both the success return that constructs LLMGenerationResult and any early throw when generateResult != RAC_SUCCESS. The simplest fix is to register a defer immediately after creating llmResult (var llmResult = rac_llm_result_t()) that calls rac_llm_result_free(&llmResult), so the result is freed automatically even on error, and then proceed to copy llmResult.text and build the LLMGenerationResult as before.
227-305: ⚠️ Potential issue | 🟠 Major — Fix streaming structured output to use the structured native API.

The blocking `generateForStructuredOutput` correctly routes through `rac_llm_component_generate_structured` with structured output config, but `generateStream` in TextGeneration.swift ignores the `structuredOutput` field in options and always calls the regular `rac_llm_component_generate_stream`. This causes `useGrammar`, `maxRetries`, `fallback`, and schema enforcement to be silently ignored for streaming. Modify `generateStream` to check `options.structuredOutput` and route through `rac_llm_component_generate_structured_stream` (which exists in the C API) when structured output is requested.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/LLM/RunAnywhere`+StructuredOutput.swift around lines 227 - 305, The streaming path in generateStream (TextGeneration.swift) currently ignores options.structuredOutput and always calls rac_llm_component_generate_stream; update generateStream to detect options.structuredOutput and when present build a rac_structured_output_config_t (same fields set in generateForStructuredOutput: include_schema_in_prompt, use_grammar, max_retries, fallback and set soConfig.json_schema when options.structuredOutput.type.jsonSchema exists) and call rac_llm_component_generate_structured_stream(handle, promptPtr, &cOptions, &soConfig, &streamCallback) instead of rac_llm_component_generate_stream; preserve existing systemPrompt handling (set cOptions.system_prompt with .withCString) and mirror the nested .withCString usage for the schema string so the structured-stream API receives the schema pointer and config.

sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp (1)
1148-1150: ⚠️ Potential issue | 🟠 Major — `generate_from_context` still drops trailing multi-byte UTF-8 bytes.

Before the final `stop_window` emit, `partial_utf8_buffer` should be flushed (same pattern already used in `generate_stream`), otherwise trailing codepoints can be truncated.

Based on learnings: "In sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp, generate_from_context is missing the `partial_utf8_buffer` flush before the final stop_window emit (unlike generate_stream which has it). This causes trailing multi-byte codepoints to be silently dropped."

🧩 Suggested consistency fix

```diff
+    // Flush any remaining partial UTF-8 bytes before final emit
+    if (!cancel_requested_.load() && !stop_sequence_hit && !partial_utf8_buffer.empty()) {
+        stop_window.append(partial_utf8_buffer);
+    }
+
     if (!cancel_requested_.load() && !stop_sequence_hit && !stop_window.empty()) {
         generated_text += stop_window;
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp` around lines 1148 - 1150, The function generate_from_context currently appends stop_window to generated_text without first flushing partial_utf8_buffer, which can drop trailing multi-byte UTF-8 bytes; modify generate_from_context to follow the same pattern used in generate_stream by checking/consuming partial_utf8_buffer (append its contents to generated_text and clear it) before the final block that checks cancel_requested_ and appends stop_window so that any pending partial UTF-8 bytes are emitted intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@sdk/runanywhere-commons/src/backends/llamacpp/jni/rac_backend_llamacpp_jni.cpp`:
- Around line 246-261: The code must fail fast if
env->GetStringUTFChars(grammar, ...) returns null: after calling
env->GetStringUTFChars for grammar (grammarStr), detect a null return, release
the previously acquired promptStr via env->ReleaseStringUTFChars(prompt,
promptStr), and immediately return (or propagate an error) instead of continuing
to call rac_llm_llamacpp_generate; update the block around grammar/grammarStr
and the call to rac_llm_llamacpp_generate to mirror the existing null-handling
pattern used for modelPath, prompt, and jsonSchema so a pending Java exception
is respected and native work is not performed when grammar conversion fails.
In `@sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp`:
- Around line 1408-1422: The translation unit uses undefined LOGW/LOGI in
LlamaCppTextGeneration::convert_json_schema_to_grammar; replace those calls with
the RAC logger macros (use RAC_LOG_WARNING for the LOGW call and RAC_LOG_INFO
for the LOGI call, and consider using RAC_LOG_ERROR if changing the exception
log level) and add the required include for rac/core/rac_logger.h at the top of
the file so the macros are available; update the messages' arguments to match
the RAC_LOG_* macro signatures used elsewhere in the codebase.
In `@sdk/runanywhere-commons/src/backends/llamacpp/rac_llm_llamacpp.cpp`:
- Around line 562-563: The strdup call that assigns *out_grammar =
strdup(grammar.c_str()) must be checked for failure before returning
RAC_SUCCESS; if strdup returns NULL set *out_grammar to NULL (if not already)
and return an allocation/error code (e.g., RAC_OOM or RAC_FAIL) instead of
RAC_SUCCESS so callers don't see a false success with a null buffer; update the
block containing *out_grammar = strdup(grammar.c_str()) and the subsequent
return RAC_SUCCESS to test the returned pointer and return the appropriate error
code on NULL.
In `@sdk/runanywhere-commons/src/features/llm/llm_component.cpp`:
- Around line 803-815: The current branch only checks so_config->use_grammar and
calls rac_llm_json_schema_to_grammar once, so max_retries and fallback are
ignored; change this by wrapping rac_llm_json_schema_to_grammar in a retry loop
that honors so_config->max_retries (retry with a small backoff) and only gives
up after retries, and then consult so_config->fallback: if fallback ==
PROMPT_ONLY, inject the JSON schema into the prompt (e.g., append to
effective_options.prompt or whatever field drives the prompt) so the schema is
used even without grammar, if fallback == NONE/FAIL return an error/propagate
failure instead of silently falling back to unconstrained generation; ensure
when rac_llm_json_schema_to_grammar succeeds you still set
effective_options.grammar = grammar_str as now, and when it ultimately fails you
perform the chosen fallback path so PROMPT_ONLY actually takes effect.
In
`@sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/LLM/LLMTypes.swift`:
- Around line 570-582: The initializer for the structured generation config
allows negative maxRetries which can lead to invalid runtime behavior; in the
public init(type: Generatable.Type, includeSchemaInPrompt: Bool = true,
useGrammar: Bool = true, maxRetries: Int = 3, fallback: StructuredOutputFallback
= .retry) validate maxRetries at init time (e.g., ensure >= 0 or clamp to a
minimum) and handle invalid values by either throwing/preconditionFailure or
assigning a safe default; update the init in LLMTypes.swift to check the
maxRetries parameter before assigning to self.maxRetries and document the chosen
behavior.
In
`@sdk/runanywhere-web/packages/llamacpp/src/Extensions/RunAnywhere`+StructuredOutput.ts:
- Around line 125-127: The code writes config.maxRetries directly into WASM
using m.setValue(configPtr + soConf.maxRetries, config.maxRetries ?? 3, 'i32'),
which allows NaN/Infinity/negative/fractional values to be coerced unexpectedly;
normalize and clamp maxRetries to a safe integer before writing (e.g., coerce to
Number, if NaN/!isFinite use default 3, clamp to a minimum 0 and Math.floor to
remove fractions) and apply the same normalization inside the validate(...)
logic as well (update all occurrences that set soConf.maxRetries including the
other block at the 210-212 equivalent to use the normalized value).
---
Outside diff comments:
In `@sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp`:
- Around line 1148-1150: The function generate_from_context currently appends
stop_window to generated_text without first flushing partial_utf8_buffer, which
can drop trailing multi-byte UTF-8 bytes; modify generate_from_context to follow
the same pattern used in generate_stream by checking/consuming
partial_utf8_buffer (append its contents to generated_text and clear it) before
the final block that checks cancel_requested_ and appends stop_window so that
any pending partial UTF-8 bytes are emitted intact.
In
`@sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/LLM/RunAnywhere`+StructuredOutput.swift:
- Around line 268-326: The native rac_llm_result_t (llmResult) is never freed,
leaking native memory; ensure you call rac_llm_result_free(&llmResult) before
every exit from the function — both the success return that constructs
LLMGenerationResult and any early throw when generateResult != RAC_SUCCESS. The
simplest fix is to register a defer immediately after creating llmResult (var
llmResult = rac_llm_result_t()) that calls rac_llm_result_free(&llmResult), so
the result is freed automatically even on error, and then proceed to copy
llmResult.text and build the LLMGenerationResult as before.
- Around line 227-305: The streaming path in generateStream
(TextGeneration.swift) currently ignores options.structuredOutput and always
calls rac_llm_component_generate_stream; update generateStream to detect
options.structuredOutput and when present build a rac_structured_output_config_t
(same fields set in generateForStructuredOutput: include_schema_in_prompt,
use_grammar, max_retries, fallback and set soConfig.json_schema when
options.structuredOutput.type.jsonSchema exists) and call
rac_llm_component_generate_structured_stream(handle, promptPtr, &cOptions,
&soConfig, &streamCallback) instead of rac_llm_component_generate_stream;
preserve existing systemPrompt handling (set cOptions.system_prompt with
.withCString) and mirror the nested .withCString usage for the schema string so
the structured-stream API receives the schema pointer and config.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f193adfc-1c6b-4ea5-a3bf-e2b1cffd846a
📒 Files selected for processing (26)
- sdk/runanywhere-commons/exports/RACommons.exports
- sdk/runanywhere-commons/include/rac/backends/rac_llm_llamacpp.h
- sdk/runanywhere-commons/include/rac/features/llm/rac_llm_component.h
- sdk/runanywhere-commons/include/rac/features/llm/rac_llm_service.h
- sdk/runanywhere-commons/include/rac/features/llm/rac_llm_types.h
- sdk/runanywhere-commons/src/backends/llamacpp/jni/rac_backend_llamacpp_jni.cpp
- sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp
- sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.h
- sdk/runanywhere-commons/src/backends/llamacpp/rac_backend_llamacpp_register.cpp
- sdk/runanywhere-commons/src/backends/llamacpp/rac_llm_llamacpp.cpp
- sdk/runanywhere-commons/src/features/llm/llm_component.cpp
- sdk/runanywhere-commons/src/features/llm/rac_llm_service.cpp
- sdk/runanywhere-commons/src/features/platform/rac_backend_platform_register.cpp
- sdk/runanywhere-commons/src/jni/runanywhere_commons_jni.cpp
- sdk/runanywhere-kotlin/modules/runanywhere-core-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LlamaCPPBridge.kt
- sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/extensions/LLM/LLMTypes.kt
- sdk/runanywhere-swift/Sources/RunAnywhere/CRACommons/include/rac_llm_component.h
- sdk/runanywhere-swift/Sources/RunAnywhere/CRACommons/include/rac_llm_service.h
- sdk/runanywhere-swift/Sources/RunAnywhere/CRACommons/include/rac_llm_types.h
- sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/LLM/LLMTypes.swift
- sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/LLM/RunAnywhere+StructuredOutput.swift
- sdk/runanywhere-web/packages/core/src/Foundation/StructOffsets.ts
- sdk/runanywhere-web/packages/llamacpp/src/Extensions/RunAnywhere+StructuredOutput.ts
- sdk/runanywhere-web/packages/llamacpp/src/Foundation/LlamaCppOffsets.ts
- sdk/runanywhere-web/packages/llamacpp/src/index.ts
- sdk/runanywhere-web/wasm/src/wasm_exports.cpp
```cpp
if (grammar != nullptr) {
    grammarStr = env->GetStringUTFChars(grammar, nullptr);
    if (grammarStr && grammarStr[0] != '\0') {
        options.grammar = grammarStr;
    }
}

rac_llm_result_t result = {};
rac_result_t status = rac_llm_llamacpp_generate(
    reinterpret_cast<rac_handle_t>(handle),
    promptStr, &options, &result);

env->ReleaseStringUTFChars(prompt, promptStr);
if (grammarStr) {
    env->ReleaseStringUTFChars(grammar, grammarStr);
}
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Verify all JNI string acquisitions in this file and inspect local null-handling branches.
rg -n -C3 'GetStringUTFChars\(' sdk/runanywhere-commons/src/backends/llamacpp/jni/rac_backend_llamacpp_jni.cpp
```

Repository: RunanywhereAI/runanywhere-sdks
Length of output: 1146
**Fail fast when `GetStringUTFChars` for grammar returns null.**

At Line 247, `GetStringUTFChars` can fail and leave a pending Java exception. The current flow skips grammar assignment but continues native work; it should return immediately after releasing `promptStr`, consistent with how `modelPath` (line 174), `prompt` (line 231), and `jsonSchema` (line 293) are handled elsewhere in the file.
🔧 Suggested fix

```diff
 const char* grammarStr = nullptr;
 if (grammar != nullptr) {
     grammarStr = env->GetStringUTFChars(grammar, nullptr);
+    if (!grammarStr) {
+        env->ReleaseStringUTFChars(prompt, promptStr);
+        LOGe("nativeGenerate: Failed to get grammar");
+        return nullptr;
+    }
     if (grammarStr[0] != '\0') {
         options.grammar = grammarStr;
     }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@sdk/runanywhere-commons/src/backends/llamacpp/jni/rac_backend_llamacpp_jni.cpp`
around lines 246 - 261, The code must fail fast if
env->GetStringUTFChars(grammar, ...) returns null: after calling
env->GetStringUTFChars for grammar (grammarStr), detect a null return, release
the previously acquired promptStr via env->ReleaseStringUTFChars(prompt,
promptStr), and immediately return (or propagate an error) instead of continuing
to call rac_llm_llamacpp_generate; update the block around grammar/grammarStr
and the call to rac_llm_llamacpp_generate to mirror the existing null-handling
pattern used for modelPath, prompt, and jsonSchema so a pending Java exception
is respected and native work is not performed when grammar conversion fails.
```cpp
std::string LlamaCppTextGeneration::convert_json_schema_to_grammar(const std::string& json_schema) {
    if (json_schema.empty()) {
        LOGW("convert_json_schema_to_grammar: empty schema");
        return "";
    }

    try {
        auto schema = nlohmann::ordered_json::parse(json_schema);
        std::string grammar = json_schema_to_grammar(schema);
        LOGI("Converted JSON schema to GBNF grammar (schema=%zu chars, grammar=%zu chars)",
             json_schema.size(), grammar.size());
        return grammar;
    } catch (const std::exception& e) {
        LOGW("Failed to convert JSON schema to GBNF: %s", e.what());
        return "";
```
**Build blocker: `LOGW`/`LOGI` are undefined in this translation unit.**

This currently fails compilation (as confirmed by the pipeline). Use the RAC logger macros directly here.
🛠️ Compile-fix patch

```diff
 std::string LlamaCppTextGeneration::convert_json_schema_to_grammar(const std::string& json_schema) {
     if (json_schema.empty()) {
-        LOGW("convert_json_schema_to_grammar: empty schema");
+        RAC_LOG_WARNING("LLM.LlamaCpp", "convert_json_schema_to_grammar: empty schema");
         return "";
     }
     try {
         auto schema = nlohmann::ordered_json::parse(json_schema);
         std::string grammar = json_schema_to_grammar(schema);
-        LOGI("Converted JSON schema to GBNF grammar (schema=%zu chars, grammar=%zu chars)",
-             json_schema.size(), grammar.size());
+        RAC_LOG_INFO("LLM.LlamaCpp",
+                     "Converted JSON schema to GBNF grammar (schema=%zu chars, grammar=%zu chars)",
+                     json_schema.size(), grammar.size());
         return grammar;
     } catch (const std::exception& e) {
-        LOGW("Failed to convert JSON schema to GBNF: %s", e.what());
+        RAC_LOG_WARNING("LLM.LlamaCpp", "Failed to convert JSON schema to GBNF: %s", e.what());
         return "";
     }
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```suggestion
std::string LlamaCppTextGeneration::convert_json_schema_to_grammar(const std::string& json_schema) {
    if (json_schema.empty()) {
        RAC_LOG_WARNING("LLM.LlamaCpp", "convert_json_schema_to_grammar: empty schema");
        return "";
    }
    try {
        auto schema = nlohmann::ordered_json::parse(json_schema);
        std::string grammar = json_schema_to_grammar(schema);
        RAC_LOG_INFO("LLM.LlamaCpp",
                     "Converted JSON schema to GBNF grammar (schema=%zu chars, grammar=%zu chars)",
                     json_schema.size(), grammar.size());
        return grammar;
    } catch (const std::exception& e) {
        RAC_LOG_WARNING("LLM.LlamaCpp", "Failed to convert JSON schema to GBNF: %s", e.what());
        return "";
    }
}
```
🧰 Tools
🪛 GitHub Actions: Build and Release Backends
[error] 1410-1410: Build failed: use of undeclared identifier 'LOGW' (llamacpp_backend.cpp:1410).
[error] 1417-1417: Build failed: use of undeclared identifier 'LOGI' (llamacpp_backend.cpp:1417).
[error] 1421-1421: Build failed: use of undeclared identifier 'LOGW' (llamacpp_backend.cpp:1421).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp` around
lines 1408 - 1422, The translation unit uses undefined LOGW/LOGI in
LlamaCppTextGeneration::convert_json_schema_to_grammar; replace those calls with
the RAC logger macros (use RAC_LOG_WARNING for the LOGW call and RAC_LOG_INFO
for the LOGI call, and consider using RAC_LOG_ERROR if changing the exception
log level) and add the required include for rac/core/rac_logger.h at the top of
the file so the macros are available; update the messages' arguments to match
the RAC_LOG_* macro signatures used elsewhere in the codebase.
```cpp
*out_grammar = strdup(grammar.c_str());
return RAC_SUCCESS;
```
**Check `strdup` before returning success.**

If `strdup` fails here, the function still returns `RAC_SUCCESS` with `*out_grammar == nullptr`. Callers will treat the conversion as successful and then mis-handle a null grammar buffer.
💡 Proposed fix

```diff
-    *out_grammar = strdup(grammar.c_str());
-    return RAC_SUCCESS;
+    *out_grammar = strdup(grammar.c_str());
+    if (*out_grammar == nullptr) {
+        rac_error_set_details("Out of memory duplicating GBNF grammar");
+        return RAC_ERROR_OUT_OF_MEMORY;
+    }
+    return RAC_SUCCESS;
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```suggestion
    *out_grammar = strdup(grammar.c_str());
    if (*out_grammar == nullptr) {
        rac_error_set_details("Out of memory duplicating GBNF grammar");
        return RAC_ERROR_OUT_OF_MEMORY;
    }
    return RAC_SUCCESS;
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sdk/runanywhere-commons/src/backends/llamacpp/rac_llm_llamacpp.cpp` around
lines 562 - 563, The strdup call that assigns *out_grammar =
strdup(grammar.c_str()) must be checked for failure before returning
RAC_SUCCESS; if strdup returns NULL set *out_grammar to NULL (if not already)
and return an allocation/error code (e.g., RAC_OOM or RAC_FAIL) instead of
RAC_SUCCESS so callers don't see a false success with a null buffer; update the
block containing *out_grammar = strdup(grammar.c_str()) and the subsequent
return RAC_SUCCESS to test the returned pointer and return the appropriate error
code on NULL.
```cpp
std::lock_guard<std::mutex> lock(component->mtx);

// Get service from lifecycle manager
rac_handle_t service = nullptr;
rac_result_t result = rac_lifecycle_require_service(component->lifecycle, &service);
if (result != RAC_SUCCESS) {
    log_error("LLM.Component", "No model loaded - cannot generate structured");
    return result;
}

// Use provided options or defaults
const rac_llm_options_t* base_options = options ? options : &component->default_options;

// Build effective options with grammar if requested
rac_llm_options_t effective_options = *base_options;

char* grammar_str = nullptr;
if (so_config->use_grammar && so_config->json_schema != nullptr) {
    // Convert JSON schema to GBNF grammar via backend
    result = rac_llm_json_schema_to_grammar(service, so_config->json_schema, &grammar_str);
    if (result == RAC_SUCCESS && grammar_str) {
        effective_options.grammar = grammar_str;
        RAC_LOG_INFO("LLM.Component", "Grammar-constrained structured output enabled");
    } else {
        RAC_LOG_WARNING("LLM.Component",
                        "Grammar conversion failed (result=%d), falling back to prompt-only",
                        result);
        // Proceed without grammar — prompt-only fallback
    }
}

// Delegate to standard generate with grammar-augmented options
result = rac_llm_component_generate(handle, prompt, &effective_options, out_result);
```
**Avoid re-entering `component->mtx` through the base generate paths.**

Line 786 and Line 840 lock `component->mtx`, then Line 818 and Lines 873-874 call `rac_llm_component_generate*`, which lock the same `std::mutex` again at Line 336 and Line 582. Because `std::mutex` is non-recursive, both structured entry points deadlock on the first call. This needs a shared internal helper or another service-pinning path that performs generation without reacquiring the component mutex.

Also applies to: 840-874
```cpp
if (so_config->use_grammar && so_config->json_schema != nullptr) {
    // Convert JSON schema to GBNF grammar via backend
    result = rac_llm_json_schema_to_grammar(service, so_config->json_schema, &grammar_str);
    if (result == RAC_SUCCESS && grammar_str) {
        effective_options.grammar = grammar_str;
        RAC_LOG_INFO("LLM.Component", "Grammar-constrained structured output enabled");
    } else {
        RAC_LOG_WARNING("LLM.Component",
                        "Grammar conversion failed (result=%d), falling back to prompt-only",
                        result);
        // Proceed without grammar — prompt-only fallback
    }
}
```
**Implement the new fallback/retry contract before exposing it.**

These branches only consult `so_config->use_grammar`. If schema→grammar conversion fails, Line 810-Line 814 and Line 866-Line 868 always fall through to generation with the original prompt, so the newly added `rac_structured_output_config_t.max_retries` and `.fallback` fields never affect behavior, and `PROMPT_ONLY` never actually injects the schema into the prompt. On backends without grammar support, this silently returns unconstrained text from an API that is supposed to be structured.

Also applies to: 860-874
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sdk/runanywhere-commons/src/features/llm/llm_component.cpp` around lines 803
- 815, The current branch only checks so_config->use_grammar and calls
rac_llm_json_schema_to_grammar once, so max_retries and fallback are ignored;
change this by wrapping rac_llm_json_schema_to_grammar in a retry loop that
honors so_config->max_retries (retry with a small backoff) and only gives up
after retries, and then consult so_config->fallback: if fallback == PROMPT_ONLY,
inject the JSON schema into the prompt (e.g., append to effective_options.prompt
or whatever field drives the prompt) so the schema is used even without grammar,
if fallback == NONE/FAIL return an error/propagate failure instead of silently
falling back to unconstrained generation; ensure when
rac_llm_json_schema_to_grammar succeeds you still set effective_options.grammar
= grammar_str as now, and when it ultimately fails you perform the chosen
fallback path so PROMPT_ONLY actually takes effect.
```diff
 public init(
     type: Generatable.Type,
-    includeSchemaInPrompt: Bool = true
+    includeSchemaInPrompt: Bool = true,
+    useGrammar: Bool = true,
+    maxRetries: Int = 3,
+    fallback: StructuredOutputFallback = .retry
 ) {
     self.type = type
     self.includeSchemaInPrompt = includeSchemaInPrompt
+    self.useGrammar = useGrammar
+    self.maxRetries = maxRetries
+    self.fallback = fallback
 }
```
**Validate `maxRetries` bounds at init time.**

Line 574/Line 580 currently accept negative retry counts, which can propagate invalid runtime behavior into structured generation.
🔧 Suggested guard

```diff
 public init(
     type: Generatable.Type,
     includeSchemaInPrompt: Bool = true,
     useGrammar: Bool = true,
     maxRetries: Int = 3,
     fallback: StructuredOutputFallback = .retry
 ) {
+    precondition(maxRetries >= 0, "maxRetries must be non-negative")
     self.type = type
     self.includeSchemaInPrompt = includeSchemaInPrompt
     self.useGrammar = useGrammar
     self.maxRetries = maxRetries
     self.fallback = fallback
 }
```
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```suggestion
public init(
    type: Generatable.Type,
    includeSchemaInPrompt: Bool = true,
    useGrammar: Bool = true,
    maxRetries: Int = 3,
    fallback: StructuredOutputFallback = .retry
) {
    precondition(maxRetries >= 0, "maxRetries must be non-negative")
    self.type = type
    self.includeSchemaInPrompt = includeSchemaInPrompt
    self.useGrammar = useGrammar
    self.maxRetries = maxRetries
    self.fallback = fallback
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/LLM/LLMTypes.swift`
around lines 570 - 582, The initializer for the structured generation config
allows negative maxRetries which can lead to invalid runtime behavior; in the
public init(type: Generatable.Type, includeSchemaInPrompt: Bool = true,
useGrammar: Bool = true, maxRetries: Int = 3, fallback: StructuredOutputFallback
= .retry) validate maxRetries at init time (e.g., ensure >= 0 or clamp to a
minimum) and handle invalid values by either throwing/preconditionFailure or
assigning a safe default; update the init in LLMTypes.swift to check the
maxRetries parameter before assigning to self.maxRetries and document the chosen
behavior.
```ts
m.setValue(configPtr + soConf.useGrammar, (config.useGrammar !== false) ? 1 : 0, 'i32');
m.setValue(configPtr + soConf.maxRetries, config.maxRetries ?? 3, 'i32');
m.setValue(configPtr + soConf.fallback, config.fallback ?? StructuredOutputFallback.Retry, 'i32');
```
**Normalize `maxRetries` before writing it into WASM.**

`config.maxRetries ?? 3` still lets NaN, Infinity, negative, and fractional values through, and `setValue(..., 'i32')` will silently coerce them. That can turn an invalid JS value into an unintended native retry count.
💡 Proposed fix

```diff
+    const maxRetries =
+        Number.isFinite(config.maxRetries)
+            ? Math.max(0, Math.trunc(config.maxRetries as number))
+            : 3;
+
     m.setValue(configPtr + soConf.useGrammar, (config.useGrammar !== false) ? 1 : 0, 'i32');
-    m.setValue(configPtr + soConf.maxRetries, config.maxRetries ?? 3, 'i32');
+    m.setValue(configPtr + soConf.maxRetries, maxRetries, 'i32');
     m.setValue(configPtr + soConf.fallback, config.fallback ?? StructuredOutputFallback.Retry, 'i32');
```

Apply the same normalization in validate(...) too.
Also applies to: 210-212
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@sdk/runanywhere-web/packages/llamacpp/src/Extensions/RunAnywhere`+StructuredOutput.ts
around lines 125 - 127, The code writes config.maxRetries directly into WASM
using m.setValue(configPtr + soConf.maxRetries, config.maxRetries ?? 3, 'i32'),
which allows NaN/Infinity/negative/fractional values to be coerced unexpectedly;
normalize and clamp maxRetries to a safe integer before writing (e.g., coerce to
Number, if NaN/!isFinite use default 3, clamp to a minimum 0 and Math.floor to
remove fractions) and apply the same normalization inside the validate(...)
logic as well (update all occurrences that set soConf.maxRetries including the
other block at the 210-212 equivalent to use the normalized value).
Introduces GBNF grammar-constrained decoding for guaranteed valid JSON output matching a developer schema. Implementation layered through the C++ core, with bindings exposed in all platform SDKs.
Greptile Summary
This PR wires GBNF grammar-constrained decoding through the full stack — C++ core, JNI, Swift, Kotlin, and Web — to enable guaranteed-valid JSON output matching a developer schema. The C++ backend integration (grammar sampler placement, JSON Schema → GBNF conversion, vtable extension) is well-structured, but the new component-level structured-output APIs in `llm_component.cpp` have a critical concurrency defect.

`rac_llm_component_generate_structured` and `_stream` acquire `component->mtx`, then immediately delegate to `rac_llm_component_generate` / `rac_llm_component_generate_stream`, which unconditionally re-acquire the same non-recursive `std::mutex`. Every call through the new API will hang indefinitely. `max_retries` and `fallback` are documented in the public API and set by all four platform SDKs, but neither field is read anywhere in the C++ implementation; the advertised retry-on-failure behaviour is never executed.

Confidence Score: 4/5
Not safe to merge — the new structured-output component APIs will deadlock on every call due to non-recursive mutex re-acquisition.
The underlying backend work (grammar sampler, schema converter, vtable wiring, platform SDK types) is solid, but the glue layer in llm_component.cpp introduces a guaranteed deadlock that makes both new exported functions completely unusable. Once the mutex re-acquisition is fixed and the max_retries/fallback fields are either implemented or documented as future work, the PR can be merged.
sdk/runanywhere-commons/src/features/llm/llm_component.cpp — both new structured-output functions deadlock; max_retries and fallback are silently dropped.
Sequence Diagram
```mermaid
sequenceDiagram
    participant Caller
    participant generate_structured
    participant mtx as component->mtx
    participant generate
    Caller->>generate_structured: rac_llm_component_generate_structured()
    generate_structured->>mtx: lock_guard acquire ✓
    generate_structured->>generate_structured: rac_llm_json_schema_to_grammar()
    generate_structured->>generate: rac_llm_component_generate()
    generate->>mtx: lock_guard acquire ✗ DEADLOCK
    Note over generate,mtx: std::mutex is non-recursive — hangs forever
```

Prompt To Fix All With AI
Reviews (1): Last reviewed commit: "feat: add grammar-constrained structured..."