feat: add grammar-constrained structured output across SDKs #468
sanchitmonga22 wants to merge 1 commit into main from
Conversation
Introduces GBNF grammar-constrained decoding for guaranteed valid JSON output matching a developer schema. The implementation is layered through the C++ core, with bindings exposed in all platform SDKs:

- Commons (C++): llama.cpp grammar sampler at the head of the sampler chain, JSON Schema → GBNF converter, new component-level structured generate/stream APIs, vtable op `json_schema_to_grammar`, updated platform backend vtable with explicit NULL entries
- JNI: wire the grammar field through the commons + llamacpp bridges
- Swift: `StructuredOutputFallback`, extended `StructuredOutputConfig`, generate via `rac_llm_component_generate_structured`
- Web: WASM offset helpers for the new fields, updated TypeScript types, `StructuredOutputFallback` exported
- Kotlin: `StructuredOutputFallback` enum, extended config, `LlamaCPPBridge` JNI declarations for direct LLM ops + schema-to-grammar

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
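For orientation, here is a caller-side sketch of the new C API. It is assembled from the type and function names quoted in this PR (`rac_structured_output_config_t`, `rac_llm_component_generate_structured`, `RAC_STRUCTURED_OUTPUT_FALLBACK_RETRY`); the exact argument order and the schema string are illustrative assumptions, not documented usage.

```cpp
// Illustrative sketch only — argument order follows the snippets quoted in
// the review below and may not match the shipped header exactly.
rac_structured_output_config_t so_config = {};
so_config.json_schema = "{\"type\":\"object\",\"properties\":{\"name\":{\"type\":\"string\"}}}";
so_config.include_schema_in_prompt = true;   // also describe the schema in the prompt
so_config.use_grammar = true;                // request GBNF-constrained decoding
so_config.max_retries = 3;
so_config.fallback = RAC_STRUCTURED_OUTPUT_FALLBACK_RETRY;

rac_llm_result_t gen_result = {};
rac_result_t rc = rac_llm_component_generate_structured(
    component_handle,                        // hypothetical component handle
    "Extract the person's name as JSON.",
    /*options=*/nullptr,                     // fall back to component defaults
    &so_config,
    &gen_result);
if (rc == RAC_SUCCESS) {
    // gen_result.text should parse as JSON matching the schema when the
    // grammar path is active.
    rac_llm_result_free(&gen_result);
}
```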
📝 Walkthrough
This pull request adds grammar-constrained structured output generation across the RunAnywhere SDK. It introduces new public C APIs for structured generation with JSON schema validation, extends the LLM options and configuration types to support GBNF grammar parameters, implements JSON-to-grammar conversion in the LlamaCPP backend, and provides language bindings for Kotlin, Swift, and Web platforms.
Sequence Diagram(s)

```mermaid
sequenceDiagram
actor Client
participant Component as LLM Component
participant Service as LLM Service
participant Backend as LlamaCPP Backend
participant Converter as Grammar Converter
Client->>Component: rac_llm_component_generate_structured(prompt, options, schema_config)
activate Component
Component->>Component: Validate inputs & config
alt use_grammar enabled
Component->>Service: rac_llm_json_schema_to_grammar(json_schema)
activate Service
Service->>Backend: Dispatch to vtable json_schema_to_grammar()
activate Backend
Backend->>Converter: convert_json_schema_to_grammar(schema)
activate Converter
Converter-->>Backend: GBNF grammar string
deactivate Converter
Backend-->>Service: grammar
deactivate Backend
Service-->>Component: grammar (caller owns)
deactivate Service
Component->>Component: Inject grammar into effective_options
else grammar conversion failed
Component->>Component: Log warning, proceed prompt-only
end
Component->>Service: rac_llm_component_generate(prompt, effective_options)
activate Service
Service->>Backend: Forward to backend generate
activate Backend
Backend->>Backend: Initialize grammar sampler if grammar present
Backend->>Backend: Add grammar to sampler chain
Backend->>Backend: Generate tokens constrained by grammar
Backend-->>Service: Result
deactivate Backend
Service-->>Component: Result
deactivate Service
Component->>Component: Free allocated grammar
Component-->>Client: Result
    deactivate Component
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (2 warnings)
@greptile please review

@coderabbitai please review

✅ Actions performed: Review triggered.
```cpp
std::lock_guard<std::mutex> lock(component->mtx);

// Get service from lifecycle manager
rac_handle_t service = nullptr;
rac_result_t result = rac_lifecycle_require_service(component->lifecycle, &service);
if (result != RAC_SUCCESS) {
    log_error("LLM.Component", "No model loaded - cannot generate structured");
    return result;
}

// Use provided options or defaults
const rac_llm_options_t* base_options = options ? options : &component->default_options;

// Build effective options with grammar if requested
rac_llm_options_t effective_options = *base_options;

char* grammar_str = nullptr;
if (so_config->use_grammar && so_config->json_schema != nullptr) {
    // Convert JSON schema to GBNF grammar via backend
    result = rac_llm_json_schema_to_grammar(service, so_config->json_schema, &grammar_str);
    if (result == RAC_SUCCESS && grammar_str) {
        effective_options.grammar = grammar_str;
        RAC_LOG_INFO("LLM.Component", "Grammar-constrained structured output enabled");
    } else {
        RAC_LOG_WARNING("LLM.Component",
                        "Grammar conversion failed (result=%d), falling back to prompt-only",
                        result);
        // Proceed without grammar — prompt-only fallback
    }
}

// Delegate to standard generate with grammar-augmented options
result = rac_llm_component_generate(handle, prompt, &effective_options, out_result);
```
**Guaranteed deadlock via non-recursive mutex re-acquisition**

`rac_llm_component_generate_structured` acquires `component->mtx` (line 786) and then delegates to `rac_llm_component_generate` (line 818). `rac_llm_component_generate` unconditionally acquires the same `component->mtx` (line 336). Because `std::mutex` is not recursive, this will deadlock every time the function is called.

The same problem occurs in `rac_llm_component_generate_structured_stream` (line 840) → `rac_llm_component_generate_stream` (line 582), which also immediately acquires `component->mtx`.

The fix is to extract the mutex-free core logic of `generate` and `generate_stream` into internal helpers (e.g., `generate_locked`) and have both the public APIs and the new structured wrappers call those helpers, acquiring the mutex only once at the public API boundary.
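A minimal sketch of that refactor, assuming the component layout implied by the quoted snippets (`component->mtx`, `default_options`); `generate_locked` is the reviewer's suggested name, not existing code:

```cpp
#include <mutex>

// Sketch only: type and field names are assumptions based on the diff above.
struct LLMComponent {
    std::mutex mtx;
    rac_llm_options_t default_options;
    // ... lifecycle, etc.
};

// Core logic; the caller must already hold component->mtx.
static rac_result_t generate_locked(LLMComponent* component, const char* prompt,
                                    const rac_llm_options_t* options,
                                    rac_llm_result_t* out_result) {
    // ... the existing body of rac_llm_component_generate, minus the lock ...
    (void)component; (void)prompt; (void)options; (void)out_result;
    return RAC_SUCCESS;
}

rac_result_t rac_llm_component_generate(rac_handle_t handle, const char* prompt,
                                        const rac_llm_options_t* options,
                                        rac_llm_result_t* out_result) {
    auto* component = reinterpret_cast<LLMComponent*>(handle);
    std::lock_guard<std::mutex> lock(component->mtx);  // the only acquisition
    return generate_locked(component, prompt, options, out_result);
}

rac_result_t rac_llm_component_generate_structured(
        rac_handle_t handle, const char* prompt, const rac_llm_options_t* options,
        const rac_structured_output_config_t* so_config, rac_llm_result_t* out_result) {
    auto* component = reinterpret_cast<LLMComponent*>(handle);
    std::lock_guard<std::mutex> lock(component->mtx);  // the only acquisition
    rac_llm_options_t effective_options =
        options ? *options : component->default_options;
    // ... convert so_config->json_schema to a grammar and set
    //     effective_options.grammar here, as in the PR ...
    // Call the locked core instead of the public API: no second lock, no deadlock.
    return generate_locked(component, prompt, &effective_options, out_result);
}
```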
Prompt To Fix With AI
This is a comment left during a code review.
Path: sdk/runanywhere-commons/src/features/llm/llm_component.cpp
Line: 786-818
Comment:
**Guaranteed deadlock via non-recursive mutex re-acquisition**
`rac_llm_component_generate_structured` acquires `component->mtx` (line 786) and then delegates to `rac_llm_component_generate` (line 818). `rac_llm_component_generate` unconditionally acquires the same `component->mtx` (line 336). Because `std::mutex` is not recursive, this will deadlock every time the function is called.
The same problem occurs in `rac_llm_component_generate_structured_stream` (line 840) → `rac_llm_component_generate_stream` (line 582), which also immediately acquires `component->mtx`.
The fix is to extract the mutex-free core logic of `generate` and `generate_stream` into internal helpers (e.g., `generate_locked`) and have both the public APIs and the new structured wrappers call those helpers, acquiring the mutex only once at the public API boundary.
How can I resolve this? If you propose a fix, please make it concise.

```cpp
rac_llm_options_t effective_options = *base_options;

char* grammar_str = nullptr;
if (so_config->use_grammar && so_config->json_schema != nullptr) {
    // Convert JSON schema to GBNF grammar via backend
    result = rac_llm_json_schema_to_grammar(service, so_config->json_schema, &grammar_str);
    if (result == RAC_SUCCESS && grammar_str) {
        effective_options.grammar = grammar_str;
        RAC_LOG_INFO("LLM.Component", "Grammar-constrained structured output enabled");
    } else {
        RAC_LOG_WARNING("LLM.Component",
                        "Grammar conversion failed (result=%d), falling back to prompt-only",
                        result);
        // Proceed without grammar — prompt-only fallback
    }
}

// Delegate to standard generate with grammar-augmented options
result = rac_llm_component_generate(handle, prompt, &effective_options, out_result);

// Free grammar string if we allocated one
if (grammar_str) {
    free(grammar_str);
}

return result;
```
**`max_retries` and `fallback` silently ignored**

The public API exposes `so_config->max_retries` (default 3) and `so_config->fallback` (default `RETRY`), and the header docs promise retry behaviour on grammar failure. However, neither field is read anywhere in `rac_llm_component_generate_structured` or `rac_llm_component_generate_structured_stream`. Grammar conversion is attempted exactly once, and if it fails the code unconditionally falls through to prompt-only mode — the `RETRY` default is never honoured.

Callers relying on `fallback = RAC_STRUCTURED_OUTPUT_FALLBACK_RETRY` or a non-zero `max_retries` to get a second attempt at constrained decoding will silently receive prompt-only output instead. Either implement the retry loop or document these fields as reserved/future.
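If the retry route is chosen, one possible shape is sketched below as a fragment meant to slot into the function quoted above. Only `rac_llm_json_schema_to_grammar` and the config fields are taken from the diff; the `PROMPT_ONLY` enum spelling and the schema-injection wording are assumptions:

```cpp
// Sketch of honouring max_retries / fallback — illustrative, not the PR's code.
char* grammar_str = nullptr;
rac_result_t conv = rac_llm_json_schema_to_grammar(service, so_config->json_schema, &grammar_str);
for (int attempt = 0; conv != RAC_SUCCESS && attempt < so_config->max_retries; ++attempt) {
    conv = rac_llm_json_schema_to_grammar(service, so_config->json_schema, &grammar_str);
}

std::string prompt_with_schema;  // keeps the augmented prompt alive
if (conv == RAC_SUCCESS && grammar_str) {
    effective_options.grammar = grammar_str;
} else if (so_config->fallback == RAC_STRUCTURED_OUTPUT_FALLBACK_PROMPT_ONLY) {  // assumed name
    // Inject the schema into the prompt so the model still sees the contract.
    prompt_with_schema = std::string(prompt) +
                         "\nRespond only with JSON matching this schema:\n" +
                         so_config->json_schema;
    prompt = prompt_with_schema.c_str();
} else {
    // Retries exhausted and no prompt-only fallback requested: fail loudly
    // instead of silently returning unconstrained text.
    return conv;
}
```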
Prompt To Fix With AI
This is a comment left during a code review.
Path: sdk/runanywhere-commons/src/features/llm/llm_component.cpp
Line: 800-825
Comment:
**`max_retries` and `fallback` silently ignored**
The public API exposes `so_config->max_retries` (default 3) and `so_config->fallback` (default `RETRY`), and the header docs promise retry behaviour on grammar failure. However, neither field is read anywhere in `rac_llm_component_generate_structured` or `rac_llm_component_generate_structured_stream`. Grammar conversion is attempted exactly once, and if it fails the code unconditionally falls through to prompt-only mode — the `RETRY` default is never honoured.
Callers relying on `fallback = RAC_STRUCTURED_OUTPUT_FALLBACK_RETRY` or a non-zero `max_retries` to get a second attempt at constrained decoding will silently receive prompt-only output instead. Either implement the retry loop or document these fields as reserved/future.
How can I resolve this? If you propose a fix, please make it concise.

```cpp
jstring result = env->NewStringUTF(grammarOut);
free(grammarOut);
```
**`free` instead of `rac_free` violates API contract**

Both `rac_llm_llamacpp_json_schema_to_grammar` and `rac_llm_json_schema_to_grammar` document their output pointer as "caller must free with `rac_free()`". Here and in llm_component.cpp lines 822 and 878, the raw `free()` is called on memory owned by the RAC API. If the allocator strategy ever changes (e.g., a custom arena), this will silently corrupt memory.
```suggestion
jstring result = env->NewStringUTF(grammarOut);
rac_free(grammarOut);
return result;
```
Prompt To Fix With AI
This is a comment left during a code review.
Path: sdk/runanywhere-commons/src/backends/llamacpp/jni/rac_backend_llamacpp_jni.cpp
Line: 311-312
Comment:
**`free` instead of `rac_free` violates API contract**
Both `rac_llm_llamacpp_json_schema_to_grammar` and `rac_llm_json_schema_to_grammar` document their output pointer as "caller must free with `rac_free()`". Here and in `llm_component.cpp` lines 822 and 878, the raw `free()` is called on memory owned by the RAC API. If the allocator strategy ever changes (e.g., a custom arena), this will silently corrupt memory.
```suggestion
jstring result = env->NewStringUTF(grammarOut);
rac_free(grammarOut);
return result;
```
How can I resolve this? If you propose a fix, please make it concise.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/LLM/RunAnywhere+StructuredOutput.swift (2)
268-326: ⚠️ Potential issue | 🟠 Major — Free `rac_llm_result_t` before returning.

This path copies `llmResult.text` into Swift and then returns without calling `rac_llm_result_free(&llmResult)`, so every structured generation leaks native memory. The same leak can happen on an error path if the native call partially populated the result.

💡 Proposed fix

```diff
-    var llmResult = rac_llm_result_t()
+    var llmResult = rac_llm_result_t()
+    defer { rac_llm_result_free(&llmResult) }
     let generateResult: rac_result_t
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/LLM/RunAnywhere`+StructuredOutput.swift around lines 268 - 326, The native rac_llm_result_t (llmResult) is never freed, leaking native memory; ensure you call rac_llm_result_free(&llmResult) before every exit from the function — both the success return that constructs LLMGenerationResult and any early throw when generateResult != RAC_SUCCESS. The simplest fix is to register a defer immediately after creating llmResult (var llmResult = rac_llm_result_t()) that calls rac_llm_result_free(&llmResult), so the result is freed automatically even on error, and then proceed to copy llmResult.text and build the LLMGenerationResult as before.
227-305: ⚠️ Potential issue | 🟠 Major — Fix streaming structured output to use the structured native API.

The blocking `generateForStructuredOutput` correctly routes through `rac_llm_component_generate_structured` with structured output config, but `generateStream` in TextGeneration.swift ignores the `structuredOutput` field in options and always calls the regular `rac_llm_component_generate_stream`. This causes `useGrammar`, `maxRetries`, `fallback`, and schema enforcement to be silently ignored for streaming. Modify `generateStream` to check `options.structuredOutput` and route through `rac_llm_component_generate_structured_stream` (which exists in the C API) when structured output is requested.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/LLM/RunAnywhere`+StructuredOutput.swift around lines 227 - 305, The streaming path in generateStream (TextGeneration.swift) currently ignores options.structuredOutput and always calls rac_llm_component_generate_stream; update generateStream to detect options.structuredOutput and when present build a rac_structured_output_config_t (same fields set in generateForStructuredOutput: include_schema_in_prompt, use_grammar, max_retries, fallback and set soConfig.json_schema when options.structuredOutput.type.jsonSchema exists) and call rac_llm_component_generate_structured_stream(handle, promptPtr, &cOptions, &soConfig, &streamCallback) instead of rac_llm_component_generate_stream; preserve existing systemPrompt handling (set cOptions.system_prompt with .withCString) and mirror the nested .withCString usage for the schema string so the structured-stream API receives the schema pointer and config.

sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp (1)
1148-1150: ⚠️ Potential issue | 🟠 Major — `generate_from_context` still drops trailing multi-byte UTF-8 bytes.

Before the final `stop_window` emit, `partial_utf8_buffer` should be flushed (same pattern already used in `generate_stream`), otherwise trailing codepoints can be truncated.

Based on learnings: "In sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp, generate_from_context is missing the `partial_utf8_buffer` flush before the final stop_window emit (unlike generate_stream which has it). This causes trailing multi-byte codepoints to be silently dropped."

🧩 Suggested consistency fix

```diff
+    // Flush any remaining partial UTF-8 bytes before final emit
+    if (!cancel_requested_.load() && !stop_sequence_hit && !partial_utf8_buffer.empty()) {
+        stop_window.append(partial_utf8_buffer);
+    }
+
     if (!cancel_requested_.load() && !stop_sequence_hit && !stop_window.empty()) {
         generated_text += stop_window;
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp` around lines 1148 - 1150, The function generate_from_context currently appends stop_window to generated_text without first flushing partial_utf8_buffer, which can drop trailing multi-byte UTF-8 bytes; modify generate_from_context to follow the same pattern used in generate_stream by checking/consuming partial_utf8_buffer (append its contents to generated_text and clear it) before the final block that checks cancel_requested_ and appends stop_window so that any pending partial UTF-8 bytes are emitted intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@sdk/runanywhere-commons/src/backends/llamacpp/jni/rac_backend_llamacpp_jni.cpp`:
- Around line 246-261: The code must fail fast if
env->GetStringUTFChars(grammar, ...) returns null: after calling
env->GetStringUTFChars for grammar (grammarStr), detect a null return, release
the previously acquired promptStr via env->ReleaseStringUTFChars(prompt,
promptStr), and immediately return (or propagate an error) instead of continuing
to call rac_llm_llamacpp_generate; update the block around grammar/grammarStr
and the call to rac_llm_llamacpp_generate to mirror the existing null-handling
pattern used for modelPath, prompt, and jsonSchema so a pending Java exception
is respected and native work is not performed when grammar conversion fails.
In `@sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp`:
- Around line 1408-1422: The translation unit uses undefined LOGW/LOGI in
LlamaCppTextGeneration::convert_json_schema_to_grammar; replace those calls with
the RAC logger macros (use RAC_LOG_WARNING for the LOGW call and RAC_LOG_INFO
for the LOGI call, and consider using RAC_LOG_ERROR if changing the exception
log level) and add the required include for rac/core/rac_logger.h at the top of
the file so the macros are available; update the messages' arguments to match
the RAC_LOG_* macro signatures used elsewhere in the codebase.
In `@sdk/runanywhere-commons/src/backends/llamacpp/rac_llm_llamacpp.cpp`:
- Around line 562-563: The strdup call that assigns *out_grammar =
strdup(grammar.c_str()) must be checked for failure before returning
RAC_SUCCESS; if strdup returns NULL set *out_grammar to NULL (if not already)
and return an allocation/error code (e.g., RAC_OOM or RAC_FAIL) instead of
RAC_SUCCESS so callers don't see a false success with a null buffer; update the
block containing *out_grammar = strdup(grammar.c_str()) and the subsequent
return RAC_SUCCESS to test the returned pointer and return the appropriate error
code on NULL.
In `@sdk/runanywhere-commons/src/features/llm/llm_component.cpp`:
- Around line 803-815: The current branch only checks so_config->use_grammar and
calls rac_llm_json_schema_to_grammar once, so max_retries and fallback are
ignored; change this by wrapping rac_llm_json_schema_to_grammar in a retry loop
that honors so_config->max_retries (retry with a small backoff) and only gives
up after retries, and then consult so_config->fallback: if fallback ==
PROMPT_ONLY, inject the JSON schema into the prompt (e.g., append to
effective_options.prompt or whatever field drives the prompt) so the schema is
used even without grammar, if fallback == NONE/FAIL return an error/propagate
failure instead of silently falling back to unconstrained generation; ensure
when rac_llm_json_schema_to_grammar succeeds you still set
effective_options.grammar = grammar_str as now, and when it ultimately fails you
perform the chosen fallback path so PROMPT_ONLY actually takes effect.
In
`@sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/LLM/LLMTypes.swift`:
- Around line 570-582: The initializer for the structured generation config
allows negative maxRetries which can lead to invalid runtime behavior; in the
public init(type: Generatable.Type, includeSchemaInPrompt: Bool = true,
useGrammar: Bool = true, maxRetries: Int = 3, fallback: StructuredOutputFallback
= .retry) validate maxRetries at init time (e.g., ensure >= 0 or clamp to a
minimum) and handle invalid values by either throwing/preconditionFailure or
assigning a safe default; update the init in LLMTypes.swift to check the
maxRetries parameter before assigning to self.maxRetries and document the chosen
behavior.
In
`@sdk/runanywhere-web/packages/llamacpp/src/Extensions/RunAnywhere`+StructuredOutput.ts:
- Around line 125-127: The code writes config.maxRetries directly into WASM
using m.setValue(configPtr + soConf.maxRetries, config.maxRetries ?? 3, 'i32'),
which allows NaN/Infinity/negative/fractional values to be coerced unexpectedly;
normalize and clamp maxRetries to a safe integer before writing (e.g., coerce to
Number, if NaN/!isFinite use default 3, clamp to a minimum 0 and Math.floor to
remove fractions) and apply the same normalization inside the validate(...)
logic as well (update all occurrences that set soConf.maxRetries including the
other block at the 210-212 equivalent to use the normalized value).
---
Outside diff comments:
In `@sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp`:
- Around line 1148-1150: The function generate_from_context currently appends
stop_window to generated_text without first flushing partial_utf8_buffer, which
can drop trailing multi-byte UTF-8 bytes; modify generate_from_context to follow
the same pattern used in generate_stream by checking/consuming
partial_utf8_buffer (append its contents to generated_text and clear it) before
the final block that checks cancel_requested_ and appends stop_window so that
any pending partial UTF-8 bytes are emitted intact.
In
`@sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/LLM/RunAnywhere`+StructuredOutput.swift:
- Around line 268-326: The native rac_llm_result_t (llmResult) is never freed,
leaking native memory; ensure you call rac_llm_result_free(&llmResult) before
every exit from the function — both the success return that constructs
LLMGenerationResult and any early throw when generateResult != RAC_SUCCESS. The
simplest fix is to register a defer immediately after creating llmResult (var
llmResult = rac_llm_result_t()) that calls rac_llm_result_free(&llmResult), so
the result is freed automatically even on error, and then proceed to copy
llmResult.text and build the LLMGenerationResult as before.
- Around line 227-305: The streaming path in generateStream
(TextGeneration.swift) currently ignores options.structuredOutput and always
calls rac_llm_component_generate_stream; update generateStream to detect
options.structuredOutput and when present build a rac_structured_output_config_t
(same fields set in generateForStructuredOutput: include_schema_in_prompt,
use_grammar, max_retries, fallback and set soConfig.json_schema when
options.structuredOutput.type.jsonSchema exists) and call
rac_llm_component_generate_structured_stream(handle, promptPtr, &cOptions,
&soConfig, &streamCallback) instead of rac_llm_component_generate_stream;
preserve existing systemPrompt handling (set cOptions.system_prompt with
.withCString) and mirror the nested .withCString usage for the schema string so
the structured-stream API receives the schema pointer and config.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f193adfc-1c6b-4ea5-a3bf-e2b1cffd846a
📒 Files selected for processing (26)
- sdk/runanywhere-commons/exports/RACommons.exports
- sdk/runanywhere-commons/include/rac/backends/rac_llm_llamacpp.h
- sdk/runanywhere-commons/include/rac/features/llm/rac_llm_component.h
- sdk/runanywhere-commons/include/rac/features/llm/rac_llm_service.h
- sdk/runanywhere-commons/include/rac/features/llm/rac_llm_types.h
- sdk/runanywhere-commons/src/backends/llamacpp/jni/rac_backend_llamacpp_jni.cpp
- sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp
- sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.h
- sdk/runanywhere-commons/src/backends/llamacpp/rac_backend_llamacpp_register.cpp
- sdk/runanywhere-commons/src/backends/llamacpp/rac_llm_llamacpp.cpp
- sdk/runanywhere-commons/src/features/llm/llm_component.cpp
- sdk/runanywhere-commons/src/features/llm/rac_llm_service.cpp
- sdk/runanywhere-commons/src/features/platform/rac_backend_platform_register.cpp
- sdk/runanywhere-commons/src/jni/runanywhere_commons_jni.cpp
- sdk/runanywhere-kotlin/modules/runanywhere-core-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LlamaCPPBridge.kt
- sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/extensions/LLM/LLMTypes.kt
- sdk/runanywhere-swift/Sources/RunAnywhere/CRACommons/include/rac_llm_component.h
- sdk/runanywhere-swift/Sources/RunAnywhere/CRACommons/include/rac_llm_service.h
- sdk/runanywhere-swift/Sources/RunAnywhere/CRACommons/include/rac_llm_types.h
- sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/LLM/LLMTypes.swift
- sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/LLM/RunAnywhere+StructuredOutput.swift
- sdk/runanywhere-web/packages/core/src/Foundation/StructOffsets.ts
- sdk/runanywhere-web/packages/llamacpp/src/Extensions/RunAnywhere+StructuredOutput.ts
- sdk/runanywhere-web/packages/llamacpp/src/Foundation/LlamaCppOffsets.ts
- sdk/runanywhere-web/packages/llamacpp/src/index.ts
- sdk/runanywhere-web/wasm/src/wasm_exports.cpp
```cpp
if (grammar != nullptr) {
    grammarStr = env->GetStringUTFChars(grammar, nullptr);
    if (grammarStr && grammarStr[0] != '\0') {
        options.grammar = grammarStr;
    }
}

rac_llm_result_t result = {};
rac_result_t status = rac_llm_llamacpp_generate(
    reinterpret_cast<rac_handle_t>(handle),
    promptStr, &options, &result);

env->ReleaseStringUTFChars(prompt, promptStr);
if (grammarStr) {
    env->ReleaseStringUTFChars(grammar, grammarStr);
}
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Verify all JNI string acquisitions in this file and inspect local null-handling branches.
rg -n -C3 'GetStringUTFChars\(' sdk/runanywhere-commons/src/backends/llamacpp/jni/rac_backend_llamacpp_jni.cpp
```

Repository: RunanywhereAI/runanywhere-sdks
Length of output: 1146
**Fail fast when `GetStringUTFChars` for grammar returns null.**

At Line 247, `GetStringUTFChars` can fail and leave a pending Java exception. The current flow skips grammar assignment but continues native work; it should return immediately after releasing `promptStr`, consistent with how `modelPath` (line 174), `prompt` (line 231), and `jsonSchema` (line 293) are handled elsewhere in the file.
🔧 Suggested fix

```diff
 const char* grammarStr = nullptr;
 if (grammar != nullptr) {
     grammarStr = env->GetStringUTFChars(grammar, nullptr);
+    if (!grammarStr) {
+        env->ReleaseStringUTFChars(prompt, promptStr);
+        LOGe("nativeGenerate: Failed to get grammar");
+        return nullptr;
+    }
     if (grammarStr[0] != '\0') {
         options.grammar = grammarStr;
     }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@sdk/runanywhere-commons/src/backends/llamacpp/jni/rac_backend_llamacpp_jni.cpp`
around lines 246 - 261, The code must fail fast if
env->GetStringUTFChars(grammar, ...) returns null: after calling
env->GetStringUTFChars for grammar (grammarStr), detect a null return, release
the previously acquired promptStr via env->ReleaseStringUTFChars(prompt,
promptStr), and immediately return (or propagate an error) instead of continuing
to call rac_llm_llamacpp_generate; update the block around grammar/grammarStr
and the call to rac_llm_llamacpp_generate to mirror the existing null-handling
pattern used for modelPath, prompt, and jsonSchema so a pending Java exception
is respected and native work is not performed when grammar conversion fails.
```cpp
std::string LlamaCppTextGeneration::convert_json_schema_to_grammar(const std::string& json_schema) {
    if (json_schema.empty()) {
        LOGW("convert_json_schema_to_grammar: empty schema");
        return "";
    }

    try {
        auto schema = nlohmann::ordered_json::parse(json_schema);
        std::string grammar = json_schema_to_grammar(schema);
        LOGI("Converted JSON schema to GBNF grammar (schema=%zu chars, grammar=%zu chars)",
             json_schema.size(), grammar.size());
        return grammar;
    } catch (const std::exception& e) {
        LOGW("Failed to convert JSON schema to GBNF: %s", e.what());
        return "";
```
**Build blocker: `LOGW`/`LOGI` are undefined in this translation unit.**

This currently fails compilation (as confirmed by the pipeline). Use the RAC logger macros directly here.
🛠️ Compile-fix patch

```diff
 std::string LlamaCppTextGeneration::convert_json_schema_to_grammar(const std::string& json_schema) {
     if (json_schema.empty()) {
-        LOGW("convert_json_schema_to_grammar: empty schema");
+        RAC_LOG_WARNING("LLM.LlamaCpp", "convert_json_schema_to_grammar: empty schema");
         return "";
     }
     try {
         auto schema = nlohmann::ordered_json::parse(json_schema);
         std::string grammar = json_schema_to_grammar(schema);
-        LOGI("Converted JSON schema to GBNF grammar (schema=%zu chars, grammar=%zu chars)",
-             json_schema.size(), grammar.size());
+        RAC_LOG_INFO("LLM.LlamaCpp",
+                     "Converted JSON schema to GBNF grammar (schema=%zu chars, grammar=%zu chars)",
+                     json_schema.size(), grammar.size());
         return grammar;
     } catch (const std::exception& e) {
-        LOGW("Failed to convert JSON schema to GBNF: %s", e.what());
+        RAC_LOG_WARNING("LLM.LlamaCpp", "Failed to convert JSON schema to GBNF: %s", e.what());
         return "";
     }
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```suggestion
std::string LlamaCppTextGeneration::convert_json_schema_to_grammar(const std::string& json_schema) {
    if (json_schema.empty()) {
        RAC_LOG_WARNING("LLM.LlamaCpp", "convert_json_schema_to_grammar: empty schema");
        return "";
    }
    try {
        auto schema = nlohmann::ordered_json::parse(json_schema);
        std::string grammar = json_schema_to_grammar(schema);
        RAC_LOG_INFO("LLM.LlamaCpp",
                     "Converted JSON schema to GBNF grammar (schema=%zu chars, grammar=%zu chars)",
                     json_schema.size(), grammar.size());
        return grammar;
    } catch (const std::exception& e) {
        RAC_LOG_WARNING("LLM.LlamaCpp", "Failed to convert JSON schema to GBNF: %s", e.what());
        return "";
    }
}
```
🧰 Tools
🪛 GitHub Actions: Build and Release Backends
[error] 1410-1410: Build failed: use of undeclared identifier 'LOGW' (llamacpp_backend.cpp:1410).
[error] 1417-1417: Build failed: use of undeclared identifier 'LOGI' (llamacpp_backend.cpp:1417).
[error] 1421-1421: Build failed: use of undeclared identifier 'LOGW' (llamacpp_backend.cpp:1421).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp` around
lines 1408 - 1422, The translation unit uses undefined LOGW/LOGI in
LlamaCppTextGeneration::convert_json_schema_to_grammar; replace those calls with
the RAC logger macros (use RAC_LOG_WARNING for the LOGW call and RAC_LOG_INFO
for the LOGI call, and consider using RAC_LOG_ERROR if changing the exception
log level) and add the required include for rac/core/rac_logger.h at the top of
the file so the macros are available; update the messages' arguments to match
the RAC_LOG_* macro signatures used elsewhere in the codebase.
```cpp
*out_grammar = strdup(grammar.c_str());
return RAC_SUCCESS;
```
**Check `strdup` before returning success.**

If `strdup` fails here, the function still returns `RAC_SUCCESS` with `*out_grammar == nullptr`. Callers will treat the conversion as successful and then mis-handle a null grammar buffer.
💡 Proposed fix

```diff
-    *out_grammar = strdup(grammar.c_str());
-    return RAC_SUCCESS;
+    *out_grammar = strdup(grammar.c_str());
+    if (*out_grammar == nullptr) {
+        rac_error_set_details("Out of memory duplicating GBNF grammar");
+        return RAC_ERROR_OUT_OF_MEMORY;
+    }
+    return RAC_SUCCESS;
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```suggestion
    *out_grammar = strdup(grammar.c_str());
    if (*out_grammar == nullptr) {
        rac_error_set_details("Out of memory duplicating GBNF grammar");
        return RAC_ERROR_OUT_OF_MEMORY;
    }
    return RAC_SUCCESS;
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sdk/runanywhere-commons/src/backends/llamacpp/rac_llm_llamacpp.cpp` around
lines 562 - 563, The strdup call that assigns *out_grammar =
strdup(grammar.c_str()) must be checked for failure before returning
RAC_SUCCESS; if strdup returns NULL set *out_grammar to NULL (if not already)
and return an allocation/error code (e.g., RAC_OOM or RAC_FAIL) instead of
RAC_SUCCESS so callers don't see a false success with a null buffer; update the
block containing *out_grammar = strdup(grammar.c_str()) and the subsequent
return RAC_SUCCESS to test the returned pointer and return the appropriate error
code on NULL.
```cpp
std::lock_guard<std::mutex> lock(component->mtx);

// Get service from lifecycle manager
rac_handle_t service = nullptr;
rac_result_t result = rac_lifecycle_require_service(component->lifecycle, &service);
if (result != RAC_SUCCESS) {
    log_error("LLM.Component", "No model loaded - cannot generate structured");
    return result;
}

// Use provided options or defaults
const rac_llm_options_t* base_options = options ? options : &component->default_options;

// Build effective options with grammar if requested
rac_llm_options_t effective_options = *base_options;

char* grammar_str = nullptr;
if (so_config->use_grammar && so_config->json_schema != nullptr) {
    // Convert JSON schema to GBNF grammar via backend
    result = rac_llm_json_schema_to_grammar(service, so_config->json_schema, &grammar_str);
    if (result == RAC_SUCCESS && grammar_str) {
        effective_options.grammar = grammar_str;
        RAC_LOG_INFO("LLM.Component", "Grammar-constrained structured output enabled");
    } else {
        RAC_LOG_WARNING("LLM.Component",
                        "Grammar conversion failed (result=%d), falling back to prompt-only",
                        result);
        // Proceed without grammar — prompt-only fallback
    }
}

// Delegate to standard generate with grammar-augmented options
result = rac_llm_component_generate(handle, prompt, &effective_options, out_result);
```
**Avoid re-entering `component->mtx` through the base generate paths.**

Line 786 and Line 840 lock `component->mtx`, then Line 818 and Lines 873-874 call `rac_llm_component_generate*`, which lock the same `std::mutex` again at Line 336 and Line 582. Because `std::mutex` is non-recursive, both structured entry points deadlock on the first call. This needs a shared internal helper or another service-pinning path that performs generation without reacquiring the component mutex.

Also applies to: 840-874
```cpp
if (so_config->use_grammar && so_config->json_schema != nullptr) {
    // Convert JSON schema to GBNF grammar via backend
    result = rac_llm_json_schema_to_grammar(service, so_config->json_schema, &grammar_str);
    if (result == RAC_SUCCESS && grammar_str) {
        effective_options.grammar = grammar_str;
        RAC_LOG_INFO("LLM.Component", "Grammar-constrained structured output enabled");
    } else {
        RAC_LOG_WARNING("LLM.Component",
                        "Grammar conversion failed (result=%d), falling back to prompt-only",
                        result);
        // Proceed without grammar — prompt-only fallback
    }
}
```
**Implement the new fallback/retry contract before exposing it.**

These branches only consult `so_config->use_grammar`. If schema→grammar conversion fails, Line 810-Line 814 and Line 866-Line 868 always fall through to generation with the original prompt, so the newly added `rac_structured_output_config_t.max_retries` and `.fallback` fields never affect behavior, and `PROMPT_ONLY` never actually injects the schema into the prompt. On backends without grammar support, this silently returns unconstrained text from an API that is supposed to be structured.

Also applies to: 860-874
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sdk/runanywhere-commons/src/features/llm/llm_component.cpp` around lines 803
- 815, The current branch only checks so_config->use_grammar and calls
rac_llm_json_schema_to_grammar once, so max_retries and fallback are ignored;
change this by wrapping rac_llm_json_schema_to_grammar in a retry loop that
honors so_config->max_retries (retry with a small backoff) and only gives up
after retries, and then consult so_config->fallback: if fallback == PROMPT_ONLY,
inject the JSON schema into the prompt (e.g., append to effective_options.prompt
or whatever field drives the prompt) so the schema is used even without grammar,
if fallback == NONE/FAIL return an error/propagate failure instead of silently
falling back to unconstrained generation; ensure when
rac_llm_json_schema_to_grammar succeeds you still set effective_options.grammar
= grammar_str as now, and when it ultimately fails you perform the chosen
fallback path so PROMPT_ONLY actually takes effect.
```diff
 public init(
     type: Generatable.Type,
-    includeSchemaInPrompt: Bool = true
+    includeSchemaInPrompt: Bool = true,
+    useGrammar: Bool = true,
+    maxRetries: Int = 3,
+    fallback: StructuredOutputFallback = .retry
 ) {
     self.type = type
     self.includeSchemaInPrompt = includeSchemaInPrompt
+    self.useGrammar = useGrammar
+    self.maxRetries = maxRetries
+    self.fallback = fallback
 }
```
**Validate `maxRetries` bounds at init time.**

Line 574/Line 580 currently accept negative retry counts, which can propagate invalid runtime behavior into structured generation.
🔧 Suggested guard

```diff
 public init(
     type: Generatable.Type,
     includeSchemaInPrompt: Bool = true,
     useGrammar: Bool = true,
     maxRetries: Int = 3,
     fallback: StructuredOutputFallback = .retry
 ) {
+    precondition(maxRetries >= 0, "maxRetries must be non-negative")
     self.type = type
     self.includeSchemaInPrompt = includeSchemaInPrompt
     self.useGrammar = useGrammar
     self.maxRetries = maxRetries
     self.fallback = fallback
 }
```
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```suggestion
public init(
    type: Generatable.Type,
    includeSchemaInPrompt: Bool = true,
    useGrammar: Bool = true,
    maxRetries: Int = 3,
    fallback: StructuredOutputFallback = .retry
) {
    precondition(maxRetries >= 0, "maxRetries must be non-negative")
    self.type = type
    self.includeSchemaInPrompt = includeSchemaInPrompt
    self.useGrammar = useGrammar
    self.maxRetries = maxRetries
    self.fallback = fallback
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/LLM/LLMTypes.swift`
around lines 570 - 582, The initializer for the structured generation config
allows negative maxRetries which can lead to invalid runtime behavior; in the
public init(type: Generatable.Type, includeSchemaInPrompt: Bool = true,
useGrammar: Bool = true, maxRetries: Int = 3, fallback: StructuredOutputFallback
= .retry) validate maxRetries at init time (e.g., ensure >= 0 or clamp to a
minimum) and handle invalid values by either throwing/preconditionFailure or
assigning a safe default; update the init in LLMTypes.swift to check the
maxRetries parameter before assigning to self.maxRetries and document the chosen
behavior.
```ts
m.setValue(configPtr + soConf.useGrammar, (config.useGrammar !== false) ? 1 : 0, 'i32');
m.setValue(configPtr + soConf.maxRetries, config.maxRetries ?? 3, 'i32');
m.setValue(configPtr + soConf.fallback, config.fallback ?? StructuredOutputFallback.Retry, 'i32');
```
**Normalize `maxRetries` before writing it into WASM.**

`config.maxRetries ?? 3` still lets NaN, Infinity, negative, and fractional values through, and `setValue(..., 'i32')` will silently coerce them. That can turn an invalid JS value into an unintended native retry count.
💡 Proposed fix

```diff
+    const maxRetries =
+        Number.isFinite(config.maxRetries)
+            ? Math.max(0, Math.trunc(config.maxRetries as number))
+            : 3;
+
     m.setValue(configPtr + soConf.useGrammar, (config.useGrammar !== false) ? 1 : 0, 'i32');
-    m.setValue(configPtr + soConf.maxRetries, config.maxRetries ?? 3, 'i32');
+    m.setValue(configPtr + soConf.maxRetries, maxRetries, 'i32');
     m.setValue(configPtr + soConf.fallback, config.fallback ?? StructuredOutputFallback.Retry, 'i32');
```

Apply the same normalization in validate(...) too.
Also applies to: 210-212
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@sdk/runanywhere-web/packages/llamacpp/src/Extensions/RunAnywhere`+StructuredOutput.ts
around lines 125 - 127, The code writes config.maxRetries directly into WASM
using m.setValue(configPtr + soConf.maxRetries, config.maxRetries ?? 3, 'i32'),
which allows NaN/Infinity/negative/fractional values to be coerced unexpectedly;
normalize and clamp maxRetries to a safe integer before writing (e.g., coerce to
Number, if NaN/!isFinite use default 3, clamp to a minimum 0 and Math.floor to
remove fractions) and apply the same normalization inside the validate(...)
logic as well (update all occurrences that set soConf.maxRetries including the
other block at the 210-212 equivalent to use the normalized value).
Introduces GBNF grammar-constrained decoding for guaranteed valid JSON output matching a developer schema. Implementation layered through the C++ core, with bindings exposed in all platform SDKs.
Greptile Summary
This PR wires GBNF grammar-constrained decoding through the full stack — C++ core, JNI, Swift, Kotlin, and Web — to enable guaranteed-valid JSON output matching a developer schema. The C++ backend integration (grammar sampler placement, JSON Schema → GBNF conversion, vtable extension) is well-structured, but the new component-level structured-output APIs in `llm_component.cpp` have a critical concurrency defect.

`rac_llm_component_generate_structured` and `_stream` acquire `component->mtx`, then immediately delegate to `rac_llm_component_generate` / `rac_llm_component_generate_stream`, which unconditionally re-acquire the same non-recursive `std::mutex`. Every call through the new API will hang indefinitely. `max_retries` and `fallback` are documented in the public API and set by all four platform SDKs, but neither field is read anywhere in the C++ implementation; the advertised retry-on-failure behaviour is never executed.

Confidence Score: 4/5
Not safe to merge — the new structured-output component APIs will deadlock on every call due to non-recursive mutex re-acquisition.
The underlying backend work (grammar sampler, schema converter, vtable wiring, platform SDK types) is solid, but the glue layer in llm_component.cpp introduces a guaranteed deadlock that makes both new exported functions completely unusable. Once the mutex re-acquisition is fixed and the max_retries/fallback fields are either implemented or documented as future work, the PR can be merged.
sdk/runanywhere-commons/src/features/llm/llm_component.cpp — both new structured-output functions deadlock; max_retries and fallback are silently dropped.
Sequence Diagram
```mermaid
sequenceDiagram
    participant Caller
    participant generate_structured
    participant mtx as component->mtx
    participant generate
    Caller->>generate_structured: rac_llm_component_generate_structured()
    generate_structured->>mtx: lock_guard acquire ✓
    generate_structured->>generate_structured: rac_llm_json_schema_to_grammar()
    generate_structured->>generate: rac_llm_component_generate()
    generate->>mtx: lock_guard acquire ✗ DEADLOCK
    Note over generate,mtx: std::mutex is non-recursive — hangs forever
```

Prompt To Fix All With AI
Reviews (1): Last reviewed commit: "feat: add grammar-constrained structured..."