
feat: add grammar-constrained structured output across SDKs #468

Draft
sanchitmonga22 wants to merge 1 commit into main from feat/structured-output-grammar

Conversation

sanchitmonga22 (Contributor) commented Apr 14, 2026

Introduces GBNF grammar-constrained decoding for guaranteed valid JSON output matching a developer schema. The implementation is layered through the C++ core, with bindings exposed in all platform SDKs.

  • Commons (C++): llamacpp grammar sampler at head of sampler chain, JSON Schema → GBNF converter, new component-level structured generate/stream APIs, vtable op json_schema_to_grammar, updated platform backend vtable with explicit NULL entries
  • JNI: wire grammar field through commons + llamacpp bridges
  • Swift: StructuredOutputFallback, extended StructuredOutputConfig, generate via rac_llm_component_generate_structured
  • Web: WASM offset helpers for new fields, TypeScript types updated, StructuredOutputFallback exported
  • Kotlin: StructuredOutputFallback enum, extended config, LlamaCPPBridge JNI declarations for direct LLM ops + schema-to-grammar
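
A minimal usage sketch of the new component-level API, based on the names that appear in this PR and its reviews; treat the exact signatures, field names, and enum spellings as assumptions until checked against the headers:

```cpp
#include "rac/features/llm/rac_llm_component.h"  // header extended by this PR

// Assumes `component` is a valid rac_handle_t with a model already loaded.
void demo(rac_handle_t component) {
    rac_structured_output_config_t so_config = {};
    so_config.json_schema =
        "{\"type\":\"object\",\"properties\":{\"name\":{\"type\":\"string\"}},"
        "\"required\":[\"name\"]}";
    so_config.use_grammar = true;   // constrain decoding with a GBNF grammar
    so_config.max_retries = 3;      // documented default
    so_config.fallback    = RAC_STRUCTURED_OUTPUT_FALLBACK_RETRY;

    rac_llm_result_t result = {};
    rac_result_t rc = rac_llm_component_generate_structured(
        component, "Extract the user's name as JSON.", /*options=*/nullptr,
        &so_config, &result);
    if (rc == RAC_SUCCESS) {
        // result.text should parse against the schema while grammar constraints are active.
        rac_llm_result_free(&result);
    }
}
```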

Description

Brief description of the changes made.

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Refactoring

Testing

  • Lint passes locally
  • Added/updated tests for changes

Platform-Specific Testing (check all that apply)

Swift SDK / iOS Sample:

  • Tested on iPhone (Simulator or Device)
  • Tested on iPad / Tablet
  • Tested on Mac (macOS target)

Kotlin SDK / Android Sample:

  • Tested on Android Phone (Emulator or Device)
  • Tested on Android Tablet

Flutter SDK / Flutter Sample:

  • Tested on iOS
  • Tested on Android

React Native SDK / React Native Sample:

  • Tested on iOS
  • Tested on Android

Playground:

  • Tested on target platform
  • Verified no regressions in existing Playground projects

Web SDK / Web Sample:

  • Tested in Chrome (Desktop)
  • Tested in Firefox
  • Tested in Safari
  • WASM backends load (LlamaCpp + ONNX)
  • OPFS storage persistence verified (survives page refresh)
  • Settings persistence verified (localStorage)

Labels

Please add the appropriate label(s):

SDKs:

  • Swift SDK - Changes to Swift SDK (sdk/runanywhere-swift)
  • Kotlin SDK - Changes to Kotlin SDK (sdk/runanywhere-kotlin)
  • Flutter SDK - Changes to Flutter SDK (sdk/runanywhere-flutter)
  • React Native SDK - Changes to React Native SDK (sdk/runanywhere-react-native)
  • Web SDK - Changes to Web SDK (sdk/runanywhere-web)
  • Commons - Changes to shared native code (sdk/runanywhere-commons)

Sample Apps:

  • iOS Sample - Changes to iOS example app (examples/ios)
  • Android Sample - Changes to Android example app (examples/android)
  • Flutter Sample - Changes to Flutter example app (examples/flutter)
  • React Native Sample - Changes to React Native example app (examples/react-native)
  • Web Sample - Changes to Web example app (examples/web)

Checklist

  • Code follows project style guidelines
  • Self-review completed
  • Documentation updated (if needed)

Screenshots

Attach relevant UI screenshots for changes (if applicable):

  • Mobile (Phone)
  • Tablet / iPad
  • Desktop / Mac

Summary by CodeRabbit

  • New Features
    • Grammar-constrained structured output generation now available, enabling models to produce outputs that strictly conform to specified JSON schemas.
    • Added automatic JSON Schema to GBNF grammar conversion support for all backends.
    • New configuration options: fallback strategies (raw output, retry, prompt-only) and retry limits for grammar-constrained generation.
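
For orientation: a GBNF grammar is a set of production rules the sampler uses to mask tokens that would produce invalid output. An illustrative, hand-written grammar (not actual converter output) for a schema with a single required string field `name` could look like this:

```
root   ::= "{" ws "\"name\"" ws ":" ws string ws "}"
string ::= "\"" [^"\\]* "\""
ws     ::= [ \t\n]*
```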

Greptile Summary

This PR wires GBNF grammar-constrained decoding through the full stack — C++ core, JNI, Swift, Kotlin, and Web — to enable guaranteed-valid JSON output matching a developer schema. The C++ backend integration (grammar sampler placement, JSON Schema → GBNF conversion, vtable extension) is well-structured, but the new component-level structured-output APIs in llm_component.cpp have a critical concurrency defect.

  • P0 deadlock: Both rac_llm_component_generate_structured and _stream acquire component->mtx, then immediately delegate to rac_llm_component_generate / rac_llm_component_generate_stream, which unconditionally re-acquire the same non-recursive std::mutex. Every call through the new API will hang indefinitely.
  • P1 silent no-op: max_retries and fallback are documented in the public API and set by all four platform SDKs, but neither field is read anywhere in the C++ implementation; the advertised retry-on-failure behaviour is never executed.

Confidence Score: 4/5

Not safe to merge — the new structured-output component APIs will deadlock on every call due to non-recursive mutex re-acquisition.

The underlying backend work (grammar sampler, schema converter, vtable wiring, platform SDK types) is solid, but the glue layer in llm_component.cpp introduces a guaranteed deadlock that makes both new exported functions completely unusable. Once the mutex re-acquisition is fixed and the max_retries/fallback fields are either implemented or documented as future work, the PR can be merged.

sdk/runanywhere-commons/src/features/llm/llm_component.cpp — both new structured-output functions deadlock; max_retries and fallback are silently dropped.

Important Files Changed

  • sdk/runanywhere-commons/src/features/llm/llm_component.cpp: Adds rac_llm_component_generate_structured and _stream; both acquire component->mtx and then delegate to generate/generate_stream, which re-acquire the same non-recursive mutex, causing a guaranteed deadlock; max_retries and fallback are also silently ignored.
  • sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp: Adds the grammar sampler to the sampler chain (correctly placed first, before the temperature/top-p/top-k samplers; see the sketch after this list), adds convert_json_schema_to_grammar using llama.cpp's built-in converter, and caches the grammar string for sampler invalidation; the implementation looks correct.
  • sdk/runanywhere-commons/src/backends/llamacpp/jni/rac_backend_llamacpp_jni.cpp: Wires grammar through nativeGenerate and adds nativeJsonSchemaToGrammar; the JNI string lifecycle is handled correctly, but free() is used instead of rac_free() for the allocated grammar string.
  • sdk/runanywhere-commons/include/rac/features/llm/rac_llm_types.h: Adds the grammar field to rac_llm_options_t and the rac_structured_output_fallback_t enum, and extends rac_structured_output_config_t; struct layout and defaults look correct.
  • sdk/runanywhere-commons/src/features/platform/rac_backend_platform_register.cpp: Adds explicit NULL entries for all optional vtable ops, including json_schema_to_grammar; this prevents undefined behaviour from uninitialized function pointers.
  • sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/LLM/RunAnywhere+StructuredOutput.swift: Migrates generateForStructuredOutput to call rac_llm_component_generate_structured with correct C struct mapping; it will deadlock at the C layer, but the Swift-side code itself is structurally sound.
  • sdk/runanywhere-kotlin/modules/runanywhere-core-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LlamaCPPBridge.kt: Adds JNI declarations for direct LlamaCPP ops (nativeCreate, nativeDestroy, nativeGenerate with grammar, nativeJsonSchemaToGrammar, nativeCancel, nativeGetModelInfo); signatures are consistent with the JNI C implementations.
  • sdk/runanywhere-web/packages/llamacpp/src/Extensions/RunAnywhere+StructuredOutput.ts: Adds the StructuredOutputFallback enum and wires useGrammar/maxRetries/fallback into the WASM struct memory; offset names are consistent with the new wasm_exports.cpp helpers.
  • sdk/runanywhere-commons/src/backends/llamacpp/rac_llm_llamacpp.cpp: Adds grammar field wiring in the generate/generate_stream/generate_from_context C-API paths and implements rac_llm_llamacpp_json_schema_to_grammar; the implementation is clean and handles null/error cases.
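
A hedged sketch of the "grammar sampler first" placement noted above, written against the current public llama.cpp sampler-chain API; the actual wiring in llamacpp_backend.cpp may differ, and llama_sampler_init_grammar takes a vocab pointer only in recent llama.cpp versions:

```cpp
#include <string>
#include "llama.h"

// Build a sampler chain with the grammar constraint ahead of the stochastic
// samplers, so tokens that would violate the GBNF grammar are masked first.
llama_sampler* make_chain(const llama_vocab* vocab, const std::string& gbnf) {
    llama_sampler* chain = llama_sampler_chain_init(llama_sampler_chain_default_params());
    if (!gbnf.empty()) {
        // Grammar first: zeroes out logits of grammar-violating tokens.
        llama_sampler_chain_add(chain, llama_sampler_init_grammar(vocab, gbnf.c_str(), "root"));
    }
    llama_sampler_chain_add(chain, llama_sampler_init_top_k(40));   // illustrative params
    llama_sampler_chain_add(chain, llama_sampler_init_top_p(0.95f, 1));
    llama_sampler_chain_add(chain, llama_sampler_init_temp(0.8f));
    llama_sampler_chain_add(chain, llama_sampler_init_dist(LLAMA_DEFAULT_SEED));
    return chain;
}
```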

Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller
    participant generate_structured
    participant mtx as component->mtx
    participant generate

    Caller->>generate_structured: rac_llm_component_generate_structured()
    generate_structured->>mtx: lock_guard acquire ✓
    generate_structured->>generate_structured: rac_llm_json_schema_to_grammar()
    generate_structured->>generate: rac_llm_component_generate()
    generate->>mtx: lock_guard acquire ✗ DEADLOCK
    Note over generate,mtx: std::mutex is non-recursive — hangs forever
```

Reviews (1): Last reviewed commit: "feat: add grammar-constrained structured..."

Greptile also left 3 inline comments on this PR.

Commit: feat: add grammar-constrained structured output across SDKs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
coderabbitai (bot) commented Apr 14, 2026

📝 Walkthrough

Walkthrough

This pull request adds grammar-constrained structured output generation across the RunAnywhere SDK. It introduces new public C APIs for structured generation with JSON schema validation, extends the LLM options and configuration types to support GBNF grammar parameters, implements JSON-to-grammar conversion in the LlamaCPP backend, and provides language bindings for Kotlin, Swift, and Web platforms.

Changes

  • C API Type Definitions (sdk/runanywhere-commons/include/rac/features/llm/rac_llm_types.h, sdk/runanywhere-swift/Sources/RunAnywhere/CRACommons/include/rac_llm_types.h): Extended rac_llm_options_t with a grammar field for GBNF constraints; added the rac_structured_output_fallback_t enum and extended rac_structured_output_config_t with use_grammar, max_retries, and fallback fields.
  • C API Service Layer (sdk/runanywhere-commons/include/rac/features/llm/rac_llm_service.h, sdk/runanywhere-swift/Sources/RunAnywhere/CRACommons/include/rac_llm_service.h, sdk/runanywhere-commons/src/features/llm/rac_llm_service.cpp): Added the vtable operation json_schema_to_grammar to rac_llm_service_ops_t and the new public function rac_llm_json_schema_to_grammar for dispatching grammar conversion to backend implementations (see the dispatch sketch after this list).
  • C API Component Layer (sdk/runanywhere-commons/include/rac/features/llm/rac_llm_component.h, sdk/runanywhere-swift/Sources/RunAnywhere/CRACommons/include/rac_llm_component.h, sdk/runanywhere-commons/src/features/llm/llm_component.cpp): Added rac_llm_component_generate_structured and _stream variants that accept rac_structured_output_config_t, perform schema-to-grammar conversion, inject the grammar into options, and delegate to the existing generation functions.
  • LlamaCPP Backend C API (sdk/runanywhere-commons/include/rac/backends/rac_llm_llamacpp.h, sdk/runanywhere-commons/src/backends/llamacpp/rac_llm_llamacpp.cpp): Added the rac_llm_llamacpp_json_schema_to_grammar function that wraps backend JSON schema conversion and allocates the output grammar string.
  • LlamaCPP Backend Implementation (sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.h, sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp, sdk/runanywhere-commons/src/backends/llamacpp/rac_backend_llamacpp_register.cpp): Implemented the convert_json_schema_to_grammar method using json-schema-to-grammar.h; added grammar sampler initialization and caching in generate_stream and generate_from_context; wired the grammar parameter through the text generation request; added a vtable adapter for service dispatch.
  • LlamaCPP JNI Bindings (sdk/runanywhere-commons/src/backends/llamacpp/jni/rac_backend_llamacpp_jni.cpp, sdk/runanywhere-kotlin/modules/runanywhere-core-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LlamaCPPBridge.kt): Updated the nativeGenerate JNI signature to accept a jstring grammar; added the nativeJsonSchemaToGrammar JNI method; exposed new Kotlin native functions for direct model lifecycle and inference operations, including the grammar parameter.
  • Kotlin SDK Types (sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/extensions/LLM/LLMTypes.kt): Added the StructuredOutputFallback enum and extended StructuredOutputConfig with useGrammar, maxRetries, and fallback properties.
  • JNI Option Parsing (sdk/runanywhere-commons/src/jni/runanywhere_commons_jni.cpp): Updated option parsing in racLlmComponentGenerate and its stream variant to recognize the optional grammar field from JSON configuration.
  • Swift SDK Types (sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/LLM/LLMTypes.swift): Added the StructuredOutputFallback enum; extended StructuredOutputConfig with useGrammar, maxRetries, and fallback properties and updated the initializer.
  • Swift Implementation (sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/LLM/RunAnywhere+StructuredOutput.swift): Updated structured output generation to use rac_llm_component_generate_structured instead of the regular generate; constructs the config with grammar toggle, retry count, and fallback mode.
  • Web/WASM Support (sdk/runanywhere-web/packages/core/src/Foundation/StructOffsets.ts, sdk/runanywhere-web/packages/llamacpp/src/Foundation/LlamaCppOffsets.ts, sdk/runanywhere-web/wasm/src/wasm_exports.cpp): Extended struct offset interfaces with grammar, useGrammar, maxRetries, and fallback fields; added WASM helper functions to expose field offsets for JavaScript memory access.
  • Web SDK Types & Extensions (sdk/runanywhere-web/packages/llamacpp/src/Extensions/RunAnywhere+StructuredOutput.ts, sdk/runanywhere-web/packages/llamacpp/src/index.ts): Added the StructuredOutputFallback enum; extended the StructuredOutputConfig interface with useGrammar, maxRetries, and fallback fields; updated config preparation and validation to write the new fields into the WASM struct.
  • Platform Backend (sdk/runanywhere-commons/src/features/platform/rac_backend_platform_register.cpp): Explicitly initialized additional vtable operations to nullptr, including json_schema_to_grammar, marking them as unsupported for the platform backend.
  • Export Manifest (sdk/runanywhere-commons/exports/RACommons.exports): Added exported symbols _rac_llm_component_generate_structured, _rac_llm_component_generate_structured_stream, and _rac_llm_json_schema_to_grammar.
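
As a sketch of what the service-layer dispatch described under "C API Service Layer" typically looks like: the ops-struct shape, the accessor, and the error code below are assumptions; only rac_llm_service_ops_t, json_schema_to_grammar, and rac_llm_json_schema_to_grammar are names from this PR.

```cpp
// Sketch only; the real struct has many more ops and may differ in layout.
typedef struct rac_llm_service_ops_t {
    /* ... existing ops ... */
    // Optional op: backends that cannot convert leave this NULL.
    rac_result_t (*json_schema_to_grammar)(rac_handle_t service,
                                           const char* json_schema,
                                           char** out_grammar);
} rac_llm_service_ops_t;

rac_result_t rac_llm_json_schema_to_grammar(rac_handle_t service,
                                            const char* json_schema,
                                            char** out_grammar) {
    const rac_llm_service_ops_t* ops = rac_llm_service_get_ops(service); // hypothetical accessor
    if (!ops || !ops->json_schema_to_grammar) {
        return RAC_ERROR_NOT_SUPPORTED; // hypothetical error code
    }
    return ops->json_schema_to_grammar(service, json_schema, out_grammar);
}
```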

Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Client
    participant Component as LLM Component
    participant Service as LLM Service
    participant Backend as LlamaCPP Backend
    participant Converter as Grammar Converter

    Client->>Component: rac_llm_component_generate_structured(prompt, options, schema_config)
    activate Component
    Component->>Component: Validate inputs & config
    
    alt use_grammar enabled
        Component->>Service: rac_llm_json_schema_to_grammar(json_schema)
        activate Service
        Service->>Backend: Dispatch to vtable json_schema_to_grammar()
        activate Backend
        Backend->>Converter: convert_json_schema_to_grammar(schema)
        activate Converter
        Converter-->>Backend: GBNF grammar string
        deactivate Converter
        Backend-->>Service: grammar
        deactivate Backend
        Service-->>Component: grammar (caller owns)
        deactivate Service
        
        Component->>Component: Inject grammar into effective_options
    else grammar conversion failed
        Component->>Component: Log warning, proceed prompt-only
    end
    
    Component->>Service: rac_llm_component_generate(prompt, effective_options)
    activate Service
    Service->>Backend: Forward to backend generate
    activate Backend
    Backend->>Backend: Initialize grammar sampler if grammar present
    Backend->>Backend: Add grammar to sampler chain
    Backend->>Backend: Generate tokens constrained by grammar
    Backend-->>Service: Result
    deactivate Backend
    Service-->>Component: Result
    deactivate Service
    
    Component->>Component: Free allocated grammar
    Component-->>Client: Result
    deactivate Component
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • PR #382: Implements structured-output support for Flutter SDK with overlapping changes to core structured output types and generation wiring.
  • PR #340: Modifies LlamaCPP backend and C API sampler/parameter handling, intersecting with grammar parameter plumbing.

Suggested labels

enhancement, WIP

Suggested reviewers

  • shubhammalhotra28

Poem

🐰 A grammar takes flight, constraining the way,
JSON schemas bloom in GBNF display,
Samplers align with structured delight,
Across all the platforms, the tokens shine bright! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

  • Description check: ⚠️ Warning. The PR description includes a detailed summary of changes, but the template sections (Type of Change, Testing, Labels, Checklist) are mostly unchecked and incomplete. Resolution: mark the 'New feature' checkbox under Type of Change, check the relevant SDK labels (Commons, Swift, Kotlin, Web), and complete the testing and checklist sections with actual results.
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 44.19%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (1 passed)

  • Title check: ✅ Passed. The PR title clearly and concisely summarizes the main feature: adding grammar-constrained structured output across SDKs.


sanchitmonga22 (Contributor, Author) commented:

@greptile please review

sanchitmonga22 (Contributor, Author) commented:

@coderabbitai please review

coderabbitai (bot) commented Apr 14, 2026

@sanchitmonga22: Sure! I'll start reviewing the PR now.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Comment on lines +786 to +818
```cpp
    std::lock_guard<std::mutex> lock(component->mtx);

    // Get service from lifecycle manager
    rac_handle_t service = nullptr;
    rac_result_t result = rac_lifecycle_require_service(component->lifecycle, &service);
    if (result != RAC_SUCCESS) {
        log_error("LLM.Component", "No model loaded - cannot generate structured");
        return result;
    }

    // Use provided options or defaults
    const rac_llm_options_t* base_options = options ? options : &component->default_options;

    // Build effective options with grammar if requested
    rac_llm_options_t effective_options = *base_options;

    char* grammar_str = nullptr;
    if (so_config->use_grammar && so_config->json_schema != nullptr) {
        // Convert JSON schema to GBNF grammar via backend
        result = rac_llm_json_schema_to_grammar(service, so_config->json_schema, &grammar_str);
        if (result == RAC_SUCCESS && grammar_str) {
            effective_options.grammar = grammar_str;
            RAC_LOG_INFO("LLM.Component", "Grammar-constrained structured output enabled");
        } else {
            RAC_LOG_WARNING("LLM.Component",
                            "Grammar conversion failed (result=%d), falling back to prompt-only",
                            result);
            // Proceed without grammar — prompt-only fallback
        }
    }

    // Delegate to standard generate with grammar-augmented options
    result = rac_llm_component_generate(handle, prompt, &effective_options, out_result);
```

P0 Guaranteed deadlock via non-recursive mutex re-acquisition

rac_llm_component_generate_structured acquires component->mtx (line 786) and then delegates to rac_llm_component_generate (line 818). rac_llm_component_generate unconditionally acquires the same component->mtx (line 336). Because std::mutex is not recursive, this will deadlock every time the function is called.

The same problem occurs in rac_llm_component_generate_structured_stream (line 840) → rac_llm_component_generate_stream (line 582), which also immediately acquires component->mtx.

The fix is to extract the mutex-free core logic of generate and generate_stream into internal helpers (e.g., generate_locked) and have both the public APIs and the new structured wrappers call those helpers, acquiring the mutex only once at the public API boundary.
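
A minimal sketch of the suggested refactor; the component type name, casts, and exact signatures are hypothetical, and the elided bodies are marked:

```cpp
#include <mutex>

// Core logic; the caller must already hold component->mtx.
static rac_result_t generate_locked(rac_llm_component_t* component,
                                    const char* prompt,
                                    const rac_llm_options_t* options,
                                    rac_llm_result_t* out_result) {
    // ... existing body of rac_llm_component_generate, minus its lock_guard ...
    return RAC_SUCCESS;  // placeholder for the elided body
}

rac_result_t rac_llm_component_generate(rac_handle_t handle, const char* prompt,
                                        const rac_llm_options_t* options,
                                        rac_llm_result_t* out_result) {
    auto* component = reinterpret_cast<rac_llm_component_t*>(handle);  // hypothetical cast
    std::lock_guard<std::mutex> lock(component->mtx);  // acquired once, at the public boundary
    return generate_locked(component, prompt, options, out_result);
}

rac_result_t rac_llm_component_generate_structured(
        rac_handle_t handle, const char* prompt, const rac_llm_options_t* options,
        const rac_structured_output_config_t* so_config, rac_llm_result_t* out_result) {
    auto* component = reinterpret_cast<rac_llm_component_t*>(handle);
    std::lock_guard<std::mutex> lock(component->mtx);
    rac_llm_options_t effective_options = options ? *options : component->default_options;
    // ... schema-to-grammar conversion writing effective_options.grammar, as in this PR ...
    (void)so_config;
    // Delegate to the unlocked core rather than the public API: no mutex re-acquisition.
    return generate_locked(component, prompt, &effective_options, out_result);
}
```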


Comment on lines +800 to +825
```cpp
    rac_llm_options_t effective_options = *base_options;

    char* grammar_str = nullptr;
    if (so_config->use_grammar && so_config->json_schema != nullptr) {
        // Convert JSON schema to GBNF grammar via backend
        result = rac_llm_json_schema_to_grammar(service, so_config->json_schema, &grammar_str);
        if (result == RAC_SUCCESS && grammar_str) {
            effective_options.grammar = grammar_str;
            RAC_LOG_INFO("LLM.Component", "Grammar-constrained structured output enabled");
        } else {
            RAC_LOG_WARNING("LLM.Component",
                            "Grammar conversion failed (result=%d), falling back to prompt-only",
                            result);
            // Proceed without grammar — prompt-only fallback
        }
    }

    // Delegate to standard generate with grammar-augmented options
    result = rac_llm_component_generate(handle, prompt, &effective_options, out_result);

    // Free grammar string if we allocated one
    if (grammar_str) {
        free(grammar_str);
    }

    return result;
```

P1 max_retries and fallback silently ignored

The public API exposes so_config->max_retries (default 3) and so_config->fallback (default RETRY), and the header docs promise retry behaviour on grammar failure. However, neither field is read anywhere in rac_llm_component_generate_structured or rac_llm_component_generate_structured_stream. Grammar conversion is attempted exactly once, and if it fails the code unconditionally falls through to prompt-only mode — the RETRY default is never honoured.

Callers relying on fallback = RAC_STRUCTURED_OUTPUT_FALLBACK_RETRY or a non-zero max_retries to get a second attempt at constrained decoding will silently receive prompt-only output instead. Either implement the retry loop or document these fields as reserved/future.
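
A hedged sketch of one way to honour the contract inside rac_llm_component_generate_structured; RAC_STRUCTURED_OUTPUT_FALLBACK_RETRY is quoted in this review, while the PROMPT_ONLY spelling is an assumption:

```cpp
    char* grammar_str = nullptr;
    if (so_config->use_grammar && so_config->json_schema != nullptr) {
        // Honour max_retries: attempt conversion again when the policy is RETRY.
        int attempts = 1;
        if (so_config->fallback == RAC_STRUCTURED_OUTPUT_FALLBACK_RETRY &&
            so_config->max_retries > 0) {
            attempts += so_config->max_retries;
        }
        for (int i = 0; i < attempts; ++i) {
            result = rac_llm_json_schema_to_grammar(service, so_config->json_schema, &grammar_str);
            if (result == RAC_SUCCESS && grammar_str) {
                break;
            }
        }
        if (result == RAC_SUCCESS && grammar_str) {
            effective_options.grammar = grammar_str;
        } else if (so_config->fallback == RAC_STRUCTURED_OUTPUT_FALLBACK_PROMPT_ONLY) {
            // Assumed enum spelling. Fall back to guiding via the prompt: the schema
            // would need to be injected into the prompt here so output is still steered.
        } else {
            return result;  // surface the failure rather than silently degrading
        }
    }
```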


Comment on lines +311 to +312
```cpp
    jstring result = env->NewStringUTF(grammarOut);
    free(grammarOut);
```

P2 free instead of rac_free violates API contract

Both rac_llm_llamacpp_json_schema_to_grammar and rac_llm_json_schema_to_grammar document their output pointer as "caller must free with rac_free()". Here and in llm_component.cpp lines 822 and 878, the raw free() is called on memory owned by the RAC API. If the allocator strategy ever changes (e.g., a custom arena), this will silently corrupt memory.

Suggested change

```diff
     jstring result = env->NewStringUTF(grammarOut);
-    free(grammarOut);
+    rac_free(grammarOut);
     return result;
```

coderabbitai (bot) left a comment

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/LLM/RunAnywhere+StructuredOutput.swift (2)

268-326: ⚠️ Potential issue | 🟠 Major

Free rac_llm_result_t before returning.

This path copies llmResult.text into Swift and then returns without calling rac_llm_result_free(&llmResult), so every structured generation leaks native memory. The same leak can happen on an error path if the native call partially populated the result.

💡 Proposed fix

```diff
-        var llmResult = rac_llm_result_t()
+        var llmResult = rac_llm_result_t()
+        defer { rac_llm_result_free(&llmResult) }
         let generateResult: rac_result_t
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/LLM/RunAnywhere+StructuredOutput.swift`
around lines 268 - 326, The native rac_llm_result_t (llmResult) is never freed,
leaking native memory; ensure you call rac_llm_result_free(&llmResult) before
every exit from the function — both the success return that constructs
LLMGenerationResult and any early throw when generateResult != RAC_SUCCESS. The
simplest fix is to register a defer immediately after creating llmResult (var
llmResult = rac_llm_result_t()) that calls rac_llm_result_free(&llmResult), so
the result is freed automatically even on error, and then proceed to copy
llmResult.text and build the LLMGenerationResult as before.

227-305: ⚠️ Potential issue | 🟠 Major

Fix streaming structured output to use the structured native API.

The blocking generateForStructuredOutput correctly routes through rac_llm_component_generate_structured with structured output config, but generateStream in TextGeneration.swift ignores the structuredOutput field in options and always calls the regular rac_llm_component_generate_stream. This causes useGrammar, maxRetries, fallback, and schema enforcement to be silently ignored for streaming. Modify generateStream to check options.structuredOutput and route through rac_llm_component_generate_structured_stream (which exists in the C API) when structured output is requested.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/LLM/RunAnywhere+StructuredOutput.swift`
around lines 227 - 305, The streaming path in generateStream
(TextGeneration.swift) currently ignores options.structuredOutput and always
calls rac_llm_component_generate_stream; update generateStream to detect
options.structuredOutput and when present build a rac_structured_output_config_t
(same fields set in generateForStructuredOutput: include_schema_in_prompt,
use_grammar, max_retries, fallback and set soConfig.json_schema when
options.structuredOutput.type.jsonSchema exists) and call
rac_llm_component_generate_structured_stream(handle, promptPtr, &cOptions,
&soConfig, &streamCallback) instead of rac_llm_component_generate_stream;
preserve existing systemPrompt handling (set cOptions.system_prompt with
.withCString) and mirror the nested .withCString usage for the schema string so
the structured-stream API receives the schema pointer and config.
sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp (1)

1148-1150: ⚠️ Potential issue | 🟠 Major

generate_from_context still drops trailing multi-byte UTF-8 bytes.

Before the final stop_window emit, partial_utf8_buffer should be flushed (same pattern already used in generate_stream), otherwise trailing codepoints can be truncated.

🧩 Suggested consistency fix

```diff
+    // Flush any remaining partial UTF-8 bytes before final emit
+    if (!cancel_requested_.load() && !stop_sequence_hit && !partial_utf8_buffer.empty()) {
+        stop_window.append(partial_utf8_buffer);
+    }
+
     if (!cancel_requested_.load() && !stop_sequence_hit && !stop_window.empty()) {
         generated_text += stop_window;
     }
```
Based on learnings: “In sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp, generate_from_context is missing the `partial_utf8_buffer` flush before the final stop_window emit (unlike generate_stream which has it). This causes trailing multi-byte codepoints to be silently dropped.”
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp` around
lines 1148 - 1150, The function generate_from_context currently appends
stop_window to generated_text without first flushing partial_utf8_buffer, which
can drop trailing multi-byte UTF-8 bytes; modify generate_from_context to follow
the same pattern used in generate_stream by checking/consuming
partial_utf8_buffer (append its contents to generated_text and clear it) before
the final block that checks cancel_requested_ and appends stop_window so that
any pending partial UTF-8 bytes are emitted intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f193adfc-1c6b-4ea5-a3bf-e2b1cffd846a

📥 Commits

Reviewing files that changed from the base of the PR and between b81095e and 446d592.

📒 Files selected for processing (26)
  • sdk/runanywhere-commons/exports/RACommons.exports
  • sdk/runanywhere-commons/include/rac/backends/rac_llm_llamacpp.h
  • sdk/runanywhere-commons/include/rac/features/llm/rac_llm_component.h
  • sdk/runanywhere-commons/include/rac/features/llm/rac_llm_service.h
  • sdk/runanywhere-commons/include/rac/features/llm/rac_llm_types.h
  • sdk/runanywhere-commons/src/backends/llamacpp/jni/rac_backend_llamacpp_jni.cpp
  • sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp
  • sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.h
  • sdk/runanywhere-commons/src/backends/llamacpp/rac_backend_llamacpp_register.cpp
  • sdk/runanywhere-commons/src/backends/llamacpp/rac_llm_llamacpp.cpp
  • sdk/runanywhere-commons/src/features/llm/llm_component.cpp
  • sdk/runanywhere-commons/src/features/llm/rac_llm_service.cpp
  • sdk/runanywhere-commons/src/features/platform/rac_backend_platform_register.cpp
  • sdk/runanywhere-commons/src/jni/runanywhere_commons_jni.cpp
  • sdk/runanywhere-kotlin/modules/runanywhere-core-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LlamaCPPBridge.kt
  • sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/extensions/LLM/LLMTypes.kt
  • sdk/runanywhere-swift/Sources/RunAnywhere/CRACommons/include/rac_llm_component.h
  • sdk/runanywhere-swift/Sources/RunAnywhere/CRACommons/include/rac_llm_service.h
  • sdk/runanywhere-swift/Sources/RunAnywhere/CRACommons/include/rac_llm_types.h
  • sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/LLM/LLMTypes.swift
  • sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/LLM/RunAnywhere+StructuredOutput.swift
  • sdk/runanywhere-web/packages/core/src/Foundation/StructOffsets.ts
  • sdk/runanywhere-web/packages/llamacpp/src/Extensions/RunAnywhere+StructuredOutput.ts
  • sdk/runanywhere-web/packages/llamacpp/src/Foundation/LlamaCppOffsets.ts
  • sdk/runanywhere-web/packages/llamacpp/src/index.ts
  • sdk/runanywhere-web/wasm/src/wasm_exports.cpp

Comment on lines +246 to +261
```cpp
    if (grammar != nullptr) {
        grammarStr = env->GetStringUTFChars(grammar, nullptr);
        if (grammarStr && grammarStr[0] != '\0') {
            options.grammar = grammarStr;
        }
    }

    rac_llm_result_t result = {};
    rac_result_t status = rac_llm_llamacpp_generate(
        reinterpret_cast<rac_handle_t>(handle),
        promptStr, &options, &result);

    env->ReleaseStringUTFChars(prompt, promptStr);
    if (grammarStr) {
        env->ReleaseStringUTFChars(grammar, grammarStr);
    }
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Verify all JNI string acquisitions in this file and inspect local null-handling branches.
rg -n -C3 'GetStringUTFChars\(' sdk/runanywhere-commons/src/backends/llamacpp/jni/rac_backend_llamacpp_jni.cpp
```

Repository: RunanywhereAI/runanywhere-sdks

Length of output: 1146


Fail fast when GetStringUTFChars for grammar returns null.

At Line 247, GetStringUTFChars can fail and leave a pending Java exception. The current flow skips grammar assignment but continues native work; it should return immediately after releasing promptStr, consistent with how modelPath (line 174), prompt (line 231), and jsonSchema (line 293) are handled elsewhere in the file.

🔧 Suggested fix

```diff
     const char* grammarStr = nullptr;
     if (grammar != nullptr) {
         grammarStr = env->GetStringUTFChars(grammar, nullptr);
+        if (!grammarStr) {
+            env->ReleaseStringUTFChars(prompt, promptStr);
+            LOGe("nativeGenerate: Failed to get grammar");
+            return nullptr;
+        }
         if (grammarStr[0] != '\0') {
             options.grammar = grammarStr;
         }
     }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `sdk/runanywhere-commons/src/backends/llamacpp/jni/rac_backend_llamacpp_jni.cpp`
around lines 246 - 261, The code must fail fast if
env->GetStringUTFChars(grammar, ...) returns null: after calling
env->GetStringUTFChars for grammar (grammarStr), detect a null return, release
the previously acquired promptStr via env->ReleaseStringUTFChars(prompt,
promptStr), and immediately return (or propagate an error) instead of continuing
to call rac_llm_llamacpp_generate; update the block around grammar/grammarStr
and the call to rac_llm_llamacpp_generate to mirror the existing null-handling
pattern used for modelPath, prompt, and jsonSchema so a pending Java exception
is respected and native work is not performed when grammar conversion fails.

Comment on lines +1408 to +1422
```cpp
std::string LlamaCppTextGeneration::convert_json_schema_to_grammar(const std::string& json_schema) {
    if (json_schema.empty()) {
        LOGW("convert_json_schema_to_grammar: empty schema");
        return "";
    }

    try {
        auto schema = nlohmann::ordered_json::parse(json_schema);
        std::string grammar = json_schema_to_grammar(schema);
        LOGI("Converted JSON schema to GBNF grammar (schema=%zu chars, grammar=%zu chars)",
             json_schema.size(), grammar.size());
        return grammar;
    } catch (const std::exception& e) {
        LOGW("Failed to convert JSON schema to GBNF: %s", e.what());
        return "";
```

⚠️ Potential issue | 🔴 Critical

Build blocker: LOGW/LOGI are undefined in this translation unit.

This currently fails compilation (as confirmed by the pipeline). Use the RAC logger macros directly here.

🛠️ Compile-fix patch

```diff
 std::string LlamaCppTextGeneration::convert_json_schema_to_grammar(const std::string& json_schema) {
     if (json_schema.empty()) {
-        LOGW("convert_json_schema_to_grammar: empty schema");
+        RAC_LOG_WARNING("LLM.LlamaCpp", "convert_json_schema_to_grammar: empty schema");
         return "";
     }

     try {
         auto schema = nlohmann::ordered_json::parse(json_schema);
         std::string grammar = json_schema_to_grammar(schema);
-        LOGI("Converted JSON schema to GBNF grammar (schema=%zu chars, grammar=%zu chars)",
-             json_schema.size(), grammar.size());
+        RAC_LOG_INFO("LLM.LlamaCpp",
+                     "Converted JSON schema to GBNF grammar (schema=%zu chars, grammar=%zu chars)",
+                     json_schema.size(), grammar.size());
         return grammar;
     } catch (const std::exception& e) {
-        LOGW("Failed to convert JSON schema to GBNF: %s", e.what());
+        RAC_LOG_WARNING("LLM.LlamaCpp", "Failed to convert JSON schema to GBNF: %s", e.what());
         return "";
     }
 }
```
As per coding guidelines: “sdk/runanywhere-commons/**/*.{cpp,cc,cxx}: Logging must use the RAC logger API (RAC_LOG_INFO, RAC_LOG_ERROR, RAC_LOG_WARNING, RAC_LOG_DEBUG) via rac/core/rac_logger.h”.
As per coding guidelines: “sdk/runanywhere-commons/**/*.{cpp,cc,cxx}: Logging must use the RAC logger API (RAC_LOG_INFO, RAC_LOG_ERROR, RAC_LOG_WARNING, RAC_LOG_DEBUG) via rac/core/rac_logger.h”.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```cpp
std::string LlamaCppTextGeneration::convert_json_schema_to_grammar(const std::string& json_schema) {
    if (json_schema.empty()) {
        RAC_LOG_WARNING("LLM.LlamaCpp", "convert_json_schema_to_grammar: empty schema");
        return "";
    }
    try {
        auto schema = nlohmann::ordered_json::parse(json_schema);
        std::string grammar = json_schema_to_grammar(schema);
        RAC_LOG_INFO("LLM.LlamaCpp",
                     "Converted JSON schema to GBNF grammar (schema=%zu chars, grammar=%zu chars)",
                     json_schema.size(), grammar.size());
        return grammar;
    } catch (const std::exception& e) {
        RAC_LOG_WARNING("LLM.LlamaCpp", "Failed to convert JSON schema to GBNF: %s", e.what());
        return "";
    }
}
```
🧰 Tools
🪛 GitHub Actions: Build and Release Backends

[error] 1410-1410: Build failed: use of undeclared identifier 'LOGW' (llamacpp_backend.cpp:1410).


[error] 1417-1417: Build failed: use of undeclared identifier 'LOGI' (llamacpp_backend.cpp:1417).


[error] 1421-1421: Build failed: use of undeclared identifier 'LOGW' (llamacpp_backend.cpp:1421).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp` around
lines 1408 - 1422, The translation unit uses undefined LOGW/LOGI in
LlamaCppTextGeneration::convert_json_schema_to_grammar; replace those calls with
the RAC logger macros (use RAC_LOG_WARNING for the LOGW call and RAC_LOG_INFO
for the LOGI call, and consider using RAC_LOG_ERROR if changing the exception
log level) and add the required include for rac/core/rac_logger.h at the top of
the file so the macros are available; update the messages' arguments to match
the RAC_LOG_* macro signatures used elsewhere in the codebase.

Comment on lines +562 to +563
```cpp
        *out_grammar = strdup(grammar.c_str());
        return RAC_SUCCESS;
```

⚠️ Potential issue | 🟠 Major

Check strdup before returning success.

If strdup fails here, the function still returns RAC_SUCCESS with *out_grammar == nullptr. Callers will treat the conversion as successful and then mis-handle a null grammar buffer.

💡 Proposed fix

```diff
-        *out_grammar = strdup(grammar.c_str());
-        return RAC_SUCCESS;
+        *out_grammar = strdup(grammar.c_str());
+        if (*out_grammar == nullptr) {
+            rac_error_set_details("Out of memory duplicating GBNF grammar");
+            return RAC_ERROR_OUT_OF_MEMORY;
+        }
+        return RAC_SUCCESS;
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```cpp
        *out_grammar = strdup(grammar.c_str());
        if (*out_grammar == nullptr) {
            rac_error_set_details("Out of memory duplicating GBNF grammar");
            return RAC_ERROR_OUT_OF_MEMORY;
        }
        return RAC_SUCCESS;
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `sdk/runanywhere-commons/src/backends/llamacpp/rac_llm_llamacpp.cpp` around
lines 562 - 563, The strdup call that assigns *out_grammar =
strdup(grammar.c_str()) must be checked for failure before returning
RAC_SUCCESS; if strdup returns NULL set *out_grammar to NULL (if not already)
and return an allocation/error code (e.g., RAC_OOM or RAC_FAIL) instead of
RAC_SUCCESS so callers don't see a false success with a null buffer; update the
block containing *out_grammar = strdup(grammar.c_str()) and the subsequent
return RAC_SUCCESS to test the returned pointer and return the appropriate error
code on NULL.

Comment on lines +786 to +818
```cpp
    std::lock_guard<std::mutex> lock(component->mtx);

    // Get service from lifecycle manager
    rac_handle_t service = nullptr;
    rac_result_t result = rac_lifecycle_require_service(component->lifecycle, &service);
    if (result != RAC_SUCCESS) {
        log_error("LLM.Component", "No model loaded - cannot generate structured");
        return result;
    }

    // Use provided options or defaults
    const rac_llm_options_t* base_options = options ? options : &component->default_options;

    // Build effective options with grammar if requested
    rac_llm_options_t effective_options = *base_options;

    char* grammar_str = nullptr;
    if (so_config->use_grammar && so_config->json_schema != nullptr) {
        // Convert JSON schema to GBNF grammar via backend
        result = rac_llm_json_schema_to_grammar(service, so_config->json_schema, &grammar_str);
        if (result == RAC_SUCCESS && grammar_str) {
            effective_options.grammar = grammar_str;
            RAC_LOG_INFO("LLM.Component", "Grammar-constrained structured output enabled");
        } else {
            RAC_LOG_WARNING("LLM.Component",
                            "Grammar conversion failed (result=%d), falling back to prompt-only",
                            result);
            // Proceed without grammar — prompt-only fallback
        }
    }

    // Delegate to standard generate with grammar-augmented options
    result = rac_llm_component_generate(handle, prompt, &effective_options, out_result);
```

⚠️ Potential issue | 🔴 Critical

Avoid re-entering component->mtx through the base generate paths.

Line 786 and Line 840 lock component->mtx, then Line 818 and Lines 873-874 call rac_llm_component_generate*, which lock the same std::mutex again at Line 336 and Line 582. Because std::mutex is non-recursive, both structured entry points deadlock on the first call. This needs a shared internal helper or another service-pinning path that performs generation without reacquiring the component mutex.

Also applies to: 840-874

Comment on lines +803 to +815
```cpp
    if (so_config->use_grammar && so_config->json_schema != nullptr) {
        // Convert JSON schema to GBNF grammar via backend
        result = rac_llm_json_schema_to_grammar(service, so_config->json_schema, &grammar_str);
        if (result == RAC_SUCCESS && grammar_str) {
            effective_options.grammar = grammar_str;
            RAC_LOG_INFO("LLM.Component", "Grammar-constrained structured output enabled");
        } else {
            RAC_LOG_WARNING("LLM.Component",
                            "Grammar conversion failed (result=%d), falling back to prompt-only",
                            result);
            // Proceed without grammar — prompt-only fallback
        }
    }
```

⚠️ Potential issue | 🟠 Major

Implement the new fallback/retry contract before exposing it.

These branches only consult so_config->use_grammar. If schema→grammar conversion fails, Line 810-Line 814 and Line 866-Line 868 always fall through to generation with the original prompt, so the newly added rac_structured_output_config_t.max_retries and .fallback fields never affect behavior, and PROMPT_ONLY never actually injects the schema into the prompt. On backends without grammar support, this silently returns unconstrained text from an API that is supposed to be structured.

Also applies to: 860-874

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `sdk/runanywhere-commons/src/features/llm/llm_component.cpp` around lines 803
- 815, The current branch only checks so_config->use_grammar and calls
rac_llm_json_schema_to_grammar once, so max_retries and fallback are ignored;
change this by wrapping rac_llm_json_schema_to_grammar in a retry loop that
honors so_config->max_retries (retry with a small backoff) and only gives up
after retries, and then consult so_config->fallback: if fallback == PROMPT_ONLY,
inject the JSON schema into the prompt (e.g., append to effective_options.prompt
or whatever field drives the prompt) so the schema is used even without grammar,
if fallback == NONE/FAIL return an error/propagate failure instead of silently
falling back to unconstrained generation; ensure when
rac_llm_json_schema_to_grammar succeeds you still set effective_options.grammar
= grammar_str as now, and when it ultimately fails you perform the chosen
fallback path so PROMPT_ONLY actually takes effect.

Comment on lines 570 to 582
```diff
     public init(
         type: Generatable.Type,
-        includeSchemaInPrompt: Bool = true
+        includeSchemaInPrompt: Bool = true,
+        useGrammar: Bool = true,
+        maxRetries: Int = 3,
+        fallback: StructuredOutputFallback = .retry
     ) {
         self.type = type
         self.includeSchemaInPrompt = includeSchemaInPrompt
+        self.useGrammar = useGrammar
+        self.maxRetries = maxRetries
+        self.fallback = fallback
     }
```

⚠️ Potential issue | 🟡 Minor

Validate maxRetries bounds at init time.

Line 574/Line 580 currently accept negative retry counts, which can propagate invalid runtime behavior into structured generation.

🔧 Suggested guard

```diff
 public init(
     type: Generatable.Type,
     includeSchemaInPrompt: Bool = true,
     useGrammar: Bool = true,
     maxRetries: Int = 3,
     fallback: StructuredOutputFallback = .retry
 ) {
+    precondition(maxRetries >= 0, "maxRetries must be non-negative")
     self.type = type
     self.includeSchemaInPrompt = includeSchemaInPrompt
     self.useGrammar = useGrammar
     self.maxRetries = maxRetries
     self.fallback = fallback
 }
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```swift
public init(
    type: Generatable.Type,
    includeSchemaInPrompt: Bool = true,
    useGrammar: Bool = true,
    maxRetries: Int = 3,
    fallback: StructuredOutputFallback = .retry
) {
    precondition(maxRetries >= 0, "maxRetries must be non-negative")
    self.type = type
    self.includeSchemaInPrompt = includeSchemaInPrompt
    self.useGrammar = useGrammar
    self.maxRetries = maxRetries
    self.fallback = fallback
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/LLM/LLMTypes.swift`
around lines 570 - 582, The initializer for the structured generation config
allows negative maxRetries which can lead to invalid runtime behavior; in the
public init(type: Generatable.Type, includeSchemaInPrompt: Bool = true,
useGrammar: Bool = true, maxRetries: Int = 3, fallback: StructuredOutputFallback
= .retry) validate maxRetries at init time (e.g., ensure >= 0 or clamp to a
minimum) and handle invalid values by either throwing/preconditionFailure or
assigning a safe default; update the init in LLMTypes.swift to check the
maxRetries parameter before assigning to self.maxRetries and document the chosen
behavior.

Comment on lines +125 to +127
```ts
    m.setValue(configPtr + soConf.useGrammar, (config.useGrammar !== false) ? 1 : 0, 'i32');
    m.setValue(configPtr + soConf.maxRetries, config.maxRetries ?? 3, 'i32');
    m.setValue(configPtr + soConf.fallback, config.fallback ?? StructuredOutputFallback.Retry, 'i32');
```

⚠️ Potential issue | 🟡 Minor

Normalize maxRetries before writing it into WASM.

config.maxRetries ?? 3 still lets NaN, Infinity, negative, and fractional values through, and setValue(..., 'i32') will silently coerce them. That can turn an invalid JS value into an unintended native retry count.

💡 Proposed fix

```diff
+    const maxRetries =
+      Number.isFinite(config.maxRetries)
+        ? Math.max(0, Math.trunc(config.maxRetries as number))
+        : 3;
+
     m.setValue(configPtr + soConf.useGrammar, (config.useGrammar !== false) ? 1 : 0, 'i32');
-    m.setValue(configPtr + soConf.maxRetries, config.maxRetries ?? 3, 'i32');
+    m.setValue(configPtr + soConf.maxRetries, maxRetries, 'i32');
     m.setValue(configPtr + soConf.fallback, config.fallback ?? StructuredOutputFallback.Retry, 'i32');
```

Apply the same normalization in validate(...) too.

Also applies to: 210-212

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `sdk/runanywhere-web/packages/llamacpp/src/Extensions/RunAnywhere+StructuredOutput.ts`
around lines 125 - 127, The code writes config.maxRetries directly into WASM
using m.setValue(configPtr + soConf.maxRetries, config.maxRetries ?? 3, 'i32'),
which allows NaN/Infinity/negative/fractional values to be coerced unexpectedly;
normalize and clamp maxRetries to a safe integer before writing (e.g., coerce to
Number, if NaN/!isFinite use default 3, clamp to a minimum 0 and Math.floor to
remove fractions) and apply the same normalization inside the validate(...)
logic as well (update all occurrences that set soConf.maxRetries including the
other block at the 210-212 equivalent to use the normalized value).
