4. Re-apply ALL loaded adapters to the new context
5. Clear KV cache

This is handled by `recreate_context()` + `apply_lora_adapters()` in
`llamacpp_backend.cpp`. The approach keeps things simple while ensuring
correctness -- adapter memory overhead is typically 1-5% of the base model,
so the cost of re-applying all adapters is negligible.
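The recreate-and-reapply flow can be sketched with simplified bookkeeping. Note that `Backend`, `LoraAdapter`, and the field names below are illustrative stand-ins, not the actual types in `llamacpp_backend.cpp`; the real code drives the llama.cpp context and adapter APIs where the comments indicate.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical simplified mirror of the backend's adapter bookkeeping:
// each loaded adapter remembers its path, scale, and whether it has been
// applied to the *current* context.
struct LoraAdapter {
    std::string path;
    float scale;
    bool applied;
};

struct Backend {
    int context_generation = 0;          // bumped on every recreate
    std::vector<LoraAdapter> adapters;   // all loaded adapters

    // Destroy and rebuild the context. Any adapter state attached to the
    // old context is gone, so every adapter is marked un-applied.
    void recreate_context() {
        ++context_generation;            // stands in for freeing + re-initializing
        for (auto &a : adapters) a.applied = false;
    }

    // Re-attach every loaded adapter to the fresh context.
    bool apply_lora_adapters() {
        for (auto &a : adapters) {
            // real code would call the llama.cpp adapter-attach API here
            a.applied = true;
        }
        return true;
    }
};
```

The key invariant is that `recreate_context()` always invalidates the `applied` flags, so a subsequent `apply_lora_adapters()` re-applies the full set rather than trusting stale state.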
### KV Cache Invalidation

is in progress. The lock hierarchy is:
- Component layer: `std::lock_guard<std::mutex>` on `component->mtx`
- Kotlin bridge layer: `synchronized(lock)` on the CppBridgeLLM lock object
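A minimal sketch of a two-level hierarchy like this, assuming for illustration that the bridge-level lock is the outer one (the actual acquisition order is defined by the implementation; the point is only that the order is fixed, which is what prevents deadlock):

```cpp
#include <cassert>
#include <mutex>

// Hypothetical component with its own mutex, mirroring component->mtx.
struct Component {
    std::mutex mtx;        // component-layer lock
    int adapter_count = 0;
};

// Stands in for the Kotlin-side synchronized(lock) object.
static std::mutex bridge_lock;

// Every cross-layer operation acquires the locks in the same fixed order:
// bridge lock first, then the component mutex. Because no code path takes
// them in the opposite order, the two-level hierarchy cannot deadlock.
int load_adapter_locked(Component &c) {
    std::lock_guard<std::mutex> outer(bridge_lock);
    std::lock_guard<std::mutex> inner(c.mtx);
    return ++c.adapter_count;
}
```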

### Duplicate Detection

`load_lora_adapter()` checks for duplicate adapter paths before loading. If the
same path is already loaded, it returns an error instead of loading twice.

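A hedged sketch of that duplicate check, with an illustrative container and signature (the real function takes the component state and returns a status code; it is simplified to a `bool` here, and the llama.cpp initialization step is elided):

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Illustrative registry entry for a loaded adapter.
struct Adapter {
    std::string path;
    float scale;
};

// Reject the load when an adapter with the same path is already registered;
// otherwise record it. Real code would call into llama.cpp before recording.
bool load_lora_adapter(std::vector<Adapter> &loaded,
                       const std::string &path, float scale) {
    auto dup = std::find_if(loaded.begin(), loaded.end(),
                            [&](const Adapter &a) { return a.path == path; });
    if (dup != loaded.end()) return false;  // already loaded: error, not a re-load
    loaded.push_back({path, scale});
    return true;
}
```

Rejecting rather than silently re-applying keeps the registry one-entry-per-path, which the re-apply-all step after context recreation depends on.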
### Rollback on Failure

the context and model are freed. This ordering prevents use-after-free.
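The surrounding text implies a strict teardown order in which dependents are released before the objects they reference. A hypothetical sketch of that ordering (the `Teardown` type and its logging are invented for illustration; real code frees the actual llama.cpp handles):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Adapters hold references into the model, and the context references both,
// so teardown must run in reverse dependency order:
// adapters -> context -> model.
struct Teardown {
    std::vector<const char*> log;  // records free order for the sketch

    void free_adapters() { log.push_back("adapters"); }
    void free_context()  { log.push_back("context"); }
    void free_model()    { log.push_back("model"); }

    void shutdown() {
        free_adapters();  // first: adapters reference model weights
        free_context();   // then the context the adapters were applied to
        free_model();     // the base model is freed last
    }
};
```

Freeing the model first would leave the context and adapters holding dangling references, which is exactly the use-after-free this ordering prevents.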
| `sdk/runanywhere-kotlin/src/commonMain/.../RunAnywhere+LoRA.kt` | NEW file. `expect` declarations for 4 public API functions |
| `sdk/runanywhere-kotlin/src/jvmAndroidMain/.../RunAnywhere+LoRA.jvmAndroid.kt` | NEW file. `actual` implementations with init checks, CppBridgeLLM delegation, JSON parsing for adapter info |
| 2026-02-19 | Claude | Initial implementation of LoRA adapter support across all 6 layers (C++ through Kotlin public API). C++ desktop build verified. |
| 2026-02-19 | Claude | Fixed architecture: Component layer now dispatches LoRA ops through vtable (`rac_llm_service_ops_t`) instead of calling backend directly. This decouples `librac_commons.so` from `librac_backend_llamacpp.so`. Added 4 vtable entries and wrapper functions. Fixed `AttachCurrentThread` cast for Android NDK C++ build. Android native build verified. |
| 2026-02-19 | Claude | Added detailed Kotlin SDK usage guide with data types, code examples, error handling, Android ViewModel pattern, and table of contents with section links. Updated "How to Extend" to include vtable step. |