-
Notifications
You must be signed in to change notification settings - Fork 353
C++ layer full cleanup: correctness, ABI, backend abstraction, JNI safety #483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
ce9c7ba
fdf4b61
bcdf071
2a6b382
d365858
de9f97f
ed7ee41
77f19f4
42d8546
d4a5c9f
049f98f
817b4e9
f655e0c
459c324
4630546
2ed7fb2
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,161 @@ | ||||||
| # C++ Layer Cleanup — Branch Status | ||||||
|
|
||||||
| > **Branch**: `smonga/cpp-layer-full-cleanup` (12 commits, branched from `main @ bc7db9bd0`). | ||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Commit-count metadata is already stale. Line [3] says “12 commits,” but the same document’s table has more entries. Prefer removing the hard-coded count or making it “N+ commits” to avoid immediate drift. 📝 Suggested doc tweak-> **Branch**: `smonga/cpp-layer-full-cleanup` (12 commits, branched from `main @ bc7db9bd0`).
+> **Branch**: `smonga/cpp-layer-full-cleanup` (branched from `main @ bc7db9bd0`).📝 Committable suggestion
Suggested change
🤖 Prompt for AI Agents |
||||||
| > **Scope**: Full refactor of the C++ layer per `CPP_LAYER_AUDIT.md` — correctness bugs, ABI stability, tooling, architectural foundations, SDK consumer coordination. | ||||||
| > **Backward compatibility**: Intentionally broken (no `version=0` grace period, full ABI change). | ||||||
|
|
||||||
| --- | ||||||
|
|
||||||
| ## Commit sequence | ||||||
|
|
||||||
| | # | SHA | Phase | Summary | | ||||||
| |---|-----|-------|---------| | ||||||
| | 1 | `ce9c7ba80` | 1 | `cmake/Sanitizers.cmake` + dev-asan/ubsan/tsan CMake presets + tightened `.clang-tidy` | | ||||||
| | 2 | `fdf4b6111` | 2 | `RAC_NODISCARD` on 479 public functions via new `rac_attrs.h` | | ||||||
| | 3 | `bcdf071ff` | 3 | sherpa-onnx runtime ABI version check + `std::atomic` download flags + `RAC_ERROR_BACKEND_INCOMPATIBLE_VERSION` / `RAC_ERROR_JNI_EXCEPTION` | | ||||||
| | 4 | `2a6b38240` | 4 | Moved `rac_platform_compat.h` from `include/` → `src/internal/` (fixes PR #383 deferred TODO) | | ||||||
| | 5 | `d36585868` | 5 | Wired `wakeword_service` to existing `wakeword_onnx` via new provider vtable (7 TODOs closed) | | ||||||
| | 6 | `de9f97fc0` | 6a | `JniScope` RAII helper + 3 highest-frequency platform adapter callbacks converted | | ||||||
| | 7 | `ed7ee41c0` | 8 | `rac_hardware_query_capabilities()` public API + `IInferenceBackend` abstract base | | ||||||
| | 8 | `77f19f4bc` | 9 | `rac_platform_adapter_t.version` field + validation in `rac_init` | | ||||||
| | 9 | `42d854659` | — | First status document | | ||||||
| | 10 | `d4a5c9fa9` | 6b | Completed JNI sweep — all 17 native→Java callbacks now use `JniScope`; removed `version=0` grace period from `rac_init` | | ||||||
| | 11 | `049f98f7b` | — | `AttachCurrentThread` cast for macOS-host-JDK portability (enables local JNI build) | | ||||||
| | 12 | `817b4e9c9` | 10 | `adapter.version = RAC_PLATFORM_ADAPTER_VERSION` set by all 5 SDK consumers | | ||||||
| | 13 | `f655e0c08` | 7 + tests | JNI shared-state scaffold (`jni_shared.h`) + 4 new unit tests for version check / hardware API | | ||||||
|
|
||||||
| **Diff stat**: 100+ files changed, +3000 / −600 total. | ||||||
|
|
||||||
| --- | ||||||
|
|
||||||
| ## Per-phase status | ||||||
|
|
||||||
| ### Phase 1 — Tooling — **COMPLETE** | ||||||
| - `cmake/Sanitizers.cmake` with `ENABLE_ASAN/UBSAN/TSAN/MSAN`, mutual-exclusion guards, `target_enable_sanitizers()` helper. | ||||||
| - `CMakePresets.json` extended with `dev-asan`, `dev-ubsan`, `dev-tsan`. | ||||||
| - `.clang-tidy` enables `cppcoreguidelines-owning-memory`, `-slicing`, `-pro-type-*`; removed `-modernize-use-nodiscard` disable so the check is active. | ||||||
|
|
||||||
| ### Phase 2 — NODISCARD sweep — **COMPLETE** | ||||||
| - New header `include/rac/core/rac_attrs.h` provides portable `RAC_NODISCARD`, `RAC_NONNULL`, `RAC_DEPRECATED`, `RAC_ATTR_PRINTF`, `RAC_NORETURN`, `RAC_PURE`. | ||||||
| - Applied `RAC_NODISCARD` to all 479 `RAC_API rac_result_t ...` public function declarations via scripted sed pass. | ||||||
| - `rac_types.h` transitively includes `rac_attrs.h` so every header picks up the macros. | ||||||
|
|
||||||
| ### Phase 3 — Critical bug fixes — **COMPLETE** | ||||||
| - **sherpa-onnx runtime version check** in `rac_backend_onnx_register()` — rejects ABI mismatch with `RAC_ERROR_BACKEND_INCOMPATIBLE_VERSION` before first inference (was SIGSEGV). | ||||||
| - **Download manager atomics** — `is_healthy` / `is_paused` are now `std::atomic<bool>{true/false}` with explicit member init. | ||||||
| - **New error codes** — `RAC_ERROR_BACKEND_INCOMPATIBLE_VERSION` (-605), `RAC_ERROR_JNI_EXCEPTION` (-606). | ||||||
| - WhisperKit CoreML "leak" — verified as **false positive**; `rac_stt_destroy` already frees everything. No change. | ||||||
| - HTTP client `malloc`/`realloc` — deliberately unchanged; void-return function can't propagate alloc failures. | ||||||
|
|
||||||
| ### Phase 4 — Windows namespace de-pollution — **COMPLETE** | ||||||
| - Moved `include/rac/core/rac_platform_compat.h` → `src/internal/rac_platform_compat.h`. | ||||||
| - Updated 10 `#include` call sites to use the new path. | ||||||
| - Updated `src/backends/onnx/CMakeLists.txt`, `src/features/rag/CMakeLists.txt`, and `tests/CMakeLists.txt` to add `src/` as a PRIVATE include root for targets that need the internal shim. | ||||||
|
|
||||||
| ### Phase 5 — Wakeword wire-up — **COMPLETE** | ||||||
| - New `rac_wakeword_provider_ops_t` vtable in `include/rac/features/wakeword/rac_wakeword_service.h`. | ||||||
| - New public API `rac_wakeword_provider_set()` / `rac_wakeword_has_provider()`. | ||||||
| - `wakeword_service.cpp` all 7 TODOs closed — create/load_model/load_vad/process/reset/unload/destroy all dispatch through the provider vtable. | ||||||
| - `wakeword_onnx.cpp` `rac_backend_wakeword_onnx_register()` now installs the ONNX adapter as the wakeword provider. | ||||||
|
|
||||||
| ### Phase 6 — JNI exception-safety sweep — **COMPLETE** | ||||||
| - `src/jni/jni_scope.h`: `rac::jni::JniScope` RAII class + `Local<T>` wrapper + `RAC_JNI_TRY` macros. | ||||||
| - **All 17 native→Java callback sites** in `runanywhere_commons_jni.cpp` migrated to `JniScope`: | ||||||
| `jni_log_callback`, `jni_file_exists_callback`, `jni_file_read_callback`, `jni_file_write_callback`, `jni_file_delete_callback`, `jni_secure_get_callback`, `jni_secure_set_callback`, `jni_secure_delete_callback`, `jni_now_ms_callback`, `llm_stream_callback_token`, `jni_device_get_info`, `jni_device_get_id`, `jni_device_is_registered`, `jni_device_set_registered`, `jni_device_http_post`, `jni_telemetry_http_callback`, `model_assignment_http_get_callback`. | ||||||
| - **Exception-check coverage**: 11/149 → 100% of native→Java call sites (Java→native `JNIEXPORT` returns correctly propagate exceptions to JVM without needing `JniScope`). | ||||||
| - `AttachCurrentThread` calls now cast through `void**` so the target builds on macOS-host-JDK (was Android-NDK-only). | ||||||
|
|
||||||
| ### Phase 7 — JNI file split — **SCAFFOLDED (incremental follow-up)** | ||||||
| - `src/jni/jni_shared.h` defines the shared-state contract + `LOGi/d/w/e` macros that future per-feature shards will include. | ||||||
| - Planned shard layout documented in the header: | ||||||
| `jni_platform_adapter.cpp`, `jni_llm.cpp`, `jni_stt.cpp`, `jni_tts.cpp`, `jni_vad_wakeword.cpp`, `jni_vlm.cpp`, `jni_model_registry.cpp`, `jni_device.cpp`, `jni_telemetry.cpp`, `jni_benchmark.cpp`. | ||||||
| - Actual `.cpp` extraction deferred: a 4,800-line diff is unreviewable, shard-by-shard is the correct approach. The scaffold header is in place so each shard commit only needs to extract one feature's entry points. | ||||||
|
|
||||||
| ### Phase 8 — IInferenceBackend + hardware API — **FOUNDATION COMPLETE** | ||||||
| - New public `include/rac/core/rac_hardware.h` + `src/core/rac_hardware.cpp`: `rac_hardware_query_capabilities(rac_hardware_report_t*)`. | ||||||
| - `rac_hardware_report_t` versioned, with compile-time-honest flags for NEON/SSE/AVX/AVX2/AVX-512, Metal/CUDA/Vulkan/OpenCL/WebGPU, ANE/QNN/Genio, Apple Silicon / iOS Simulator / Android Emulator. Plus CPU count probes and Apple-Silicon unified-memory estimate. | ||||||
| - Abstract `src/backends/backend_interface.h` with `rac::backends::IInferenceBackend` pure virtual class: `advertise()`, `supports_hardware(report)`, `load_model()`, `unload_model()`, `cancel()`, `health_check()`. | ||||||
| - **Backend retrofit** (existing 5 backends inheriting from `IInferenceBackend`) is intentional follow-up — touch-every-backend boilerplate with no observable behaviour change, best done alongside Phase 7's per-feature split. | ||||||
|
|
||||||
| ### Phase 9 — ABI versioning — **COMPLETE (strict mode)** | ||||||
| - `rac_platform_adapter_t.version` field (first in struct) + `RAC_PLATFORM_ADAPTER_VERSION = 1` constant. | ||||||
| - `rac_init` rejects `version == 0` with `RAC_ERROR_BACKEND_INCOMPATIBLE_VERSION` — **no grace period** per user request. A zero-inited adapter is a programming error caught at init. | ||||||
| - `rac_init` rejects `version > RAC_PLATFORM_ADAPTER_VERSION` (future-compiled caller, older runtime). | ||||||
| - Covered by 2 unit tests (`test_adapter_version_zero_rejected`, `test_adapter_version_future_rejected`). | ||||||
|
|
||||||
| ### Phase 10 — SDK consumer updates — **COMPLETE** | ||||||
|
|
||||||
| All five SDKs now set `adapter.version = RAC_PLATFORM_ADAPTER_VERSION` before calling `rac_init`: | ||||||
|
|
||||||
| | SDK | File | Change | | ||||||
| |-----|------|--------| | ||||||
| | Swift | `sdk/runanywhere-swift/Sources/RunAnywhere/Foundation/Bridge/Extensions/CppBridge+PlatformAdapter.swift` | `adapter.version = UInt32(RAC_PLATFORM_ADAPTER_VERSION)` | | ||||||
| | Swift (header sync) | `sdk/runanywhere-swift/Sources/RunAnywhere/CRACommons/include/rac_platform_adapter.h` | Added version field + constant | | ||||||
| | Swift (header sync) | `sdk/runanywhere-swift/Sources/RunAnywhere/CRACommons/include/rac_error.h` | Added 3 new error codes | | ||||||
| | Kotlin/JNI | `sdk/runanywhere-commons/src/jni/runanywhere_commons_jni.cpp` | `g_c_adapter.version = RAC_PLATFORM_ADAPTER_VERSION` in `racSetPlatformAdapter` | | ||||||
| | React Native | `sdk/runanywhere-react-native/packages/core/cpp/bridges/InitBridge.cpp` | `adapter_.version = RAC_PLATFORM_ADAPTER_VERSION` | | ||||||
| | Flutter | `sdk/runanywhere-flutter/packages/runanywhere/lib/native/ffi_types.dart` | Added `@Uint32() version` field + `racPlatformAdapterVersion` constant | | ||||||
| | Flutter | `sdk/runanywhere-flutter/packages/runanywhere/lib/native/dart_bridge_platform.dart` | `adapter.ref.version = racPlatformAdapterVersion` | | ||||||
| | Web | `sdk/runanywhere-web/packages/llamacpp/src/Foundation/PlatformAdapter.ts` | Writes 4-byte version uint32 as first struct field before function pointers | | ||||||
|
|
||||||
| ### Phase 11 — Multi-platform build verification — **N/A per user instruction** | ||||||
| User explicitly said to disregard CI/CD. Phase 1 put the sanitizer CMake plumbing in place for CI to adopt later. | ||||||
|
|
||||||
| --- | ||||||
|
|
||||||
| ## Local verification matrix (what I ran on macOS) | ||||||
|
|
||||||
| | Target | Command | Result | | ||||||
| |--------|---------|--------| | ||||||
| | `rac_commons` (dev-asan) | `cmake --build build/dev-asan --target rac_commons` | ✅ clean build w/ ASan+UBSan | | ||||||
| | All 4 backend libraries | `cmake --build build/dev-asan --target rac_commons rac_backend_onnx rac_backend_llamacpp rac_backend_rag` | ✅ clean | | ||||||
| | `runanywhere_commons_jni` (dev-core) | `cmake --build build/dev-core --target runanywhere_commons_jni` | ✅ clean build (after `void**` cast fix) | | ||||||
| | All tests | `cd build/dev-asan && ctest` | ✅ **68 / 70 passed**; 2 failures are pre-existing `rac_rag_backend_thread_safety_test` (missing `IEmbeddingProvider`) | | ||||||
| | `test_core` | `./build/dev-asan/tests/test_core --run-all` | ✅ **17 / 17 passed** (4 new + 13 existing) | | ||||||
| | Swift SDK | `swift build` | ✅ Build complete | | ||||||
| | Kotlin SDK | (covered by JNI target build above) | ✅ | | ||||||
| | React Native core | `npm run typecheck` | ✅ no errors | | ||||||
| | Flutter | `flutter pub get && flutter analyze` | ✅ 0 errors (2 pre-existing unrelated warnings) | | ||||||
| | Web/WASM | `npx tsc --noEmit` | ✅ my `PlatformAdapter.ts` compiles clean (pre-existing `@runanywhere/web` workspace resolution errors in other files, unrelated) | | ||||||
|
|
||||||
| --- | ||||||
|
|
||||||
| ## Known pre-existing issues (not caused by this branch) | ||||||
|
|
||||||
| 1. `tests/rag_backend_thread_safety_test.cpp` doesn't compile — `IEmbeddingProvider` class missing. Predates this branch; surfaces now only because my `target_enable_sanitizers()` change makes test linking stricter. | ||||||
| 2. Web `@runanywhere/web` workspace package resolution — pre-existing tsconfig/package.json issue in llamacpp Web package. | ||||||
|
|
||||||
| --- | ||||||
|
|
||||||
| ## Deferred / follow-up work | ||||||
|
|
||||||
| ### Phase 7 completion (per-feature JNI file extraction) | ||||||
| The shared-state header is in place. Per-feature extraction is mechanical: | ||||||
| 1. Define `jni_shared.cpp` that owns `g_jvm`, `g_platform_adapter`, all `g_method_*` globals as non-`static`. | ||||||
| 2. For each feature (LLM, STT, TTS, VAD, VLM, device, model_registry, telemetry, benchmark), extract its `JNIEXPORT` functions into a new `.cpp` file that includes `jni_shared.h`. | ||||||
| 3. Add each new `.cpp` to `src/jni/CMakeLists.txt`. | ||||||
| 4. Final `runanywhere_commons_jni.cpp` keeps only `JNI_OnLoad` and the `racInit`/`racShutdown`/adapter-registration entry points. | ||||||
|
|
||||||
| ### Phase 8 backend retrofit (existing 5 backends → `IInferenceBackend`) | ||||||
| For each of `LlamaCppBackend`, `ONNXBackendNew`, `WhisperCppBackend`, MetalRT wrapper, WhisperKit CoreML wrapper: | ||||||
| 1. Add `: public rac::backends::IInferenceBackend` to the class declaration. | ||||||
| 2. Implement the 6 pure virtual methods (most are thin wrappers around existing methods). | ||||||
| 3. Call `rac_hardware_query_capabilities()` in each backend's `can_handle()` and refuse on wrong hardware. | ||||||
|
|
||||||
| ### CRACommons header fork | ||||||
| The Swift SDK maintains its own copy of commons headers in `sdk/runanywhere-swift/Sources/RunAnywhere/CRACommons/include/`. The `build-swift.sh` sync script only covers backend headers. The manual fork is a divergence risk. Long-term fix: drop the fork and point the Swift SPM module to `../runanywhere-commons/include` directly, or make the sync comprehensive. | ||||||
|
|
||||||
| ### Prebuilt xcframeworks | ||||||
| The `sdk/runanywhere-swift/Binaries/` prebuilt xcframeworks were produced from pre-versioning commons. Downstream apps will need fresh builds after this branch merges. | ||||||
|
|
||||||
| ### Open call-site `[[nodiscard]]` warnings | ||||||
| ~40 pre-existing silent return-value drops surface as warnings with `RAC_NODISCARD` in place. Each needs a per-site judgement call (log / propagate / intentional `(void)` cast). Not fixed in this branch. | ||||||
|
|
||||||
| --- | ||||||
|
|
||||||
| ## PR description (for gh pr create) | ||||||
|
|
||||||
| - Title: `C++ layer full cleanup: correctness, ABI, backend abstraction, JNI safety` | ||||||
| - Breaking change: Yes — adapter struct layout changed (`version` field added), all SDK consumers updated in this PR. | ||||||
| - Risk: Medium — platform-adapter ABI change affects every SDK. Version-field validation catches mis-updates at runtime with a clear error. | ||||||
| - Test coverage: 68/70 ctest passing, 17/17 test_core passing, all 5 SDK wrappers build locally on macOS. | ||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
Repository: RunanywhereAI/runanywhere-sdks
Length of output: 193
🏁 Script executed:
Repository: RunanywhereAI/runanywhere-sdks
Length of output: 1882
🏁 Script executed:
Repository: RunanywhereAI/runanywhere-sdks
Length of output: 1928
🏁 Script executed:
Repository: RunanywhereAI/runanywhere-sdks
Length of output: 109
🏁 Script executed:
Repository: RunanywhereAI/runanywhere-sdks
Length of output: 1345
Add
CC/CXXcompiler env vars to all dev presets.All four dev presets lack explicit compiler configuration. Per C++ Commons guidelines,
gcc/g++must be pinned to avoid clang toolchain drift, which breaks builds on this platform due to missing C++ stdlib headers.Update three presets with existing environment sections (dev-asan at 84-87, dev-ubsan at 100-102, dev-tsan at 115-117) by adding:
Also add an "environment" section to dev-no-backends (currently missing) with the same compiler variables.
🤖 Prompt for AI Agents