5. Re-apply ALL loaded adapters via `llama_set_adapters_lora()`
6. KV cache is already empty from fresh context — no explicit clear needed

This is handled by `recreate_context()` + `apply_lora_adapters()` in
`llamacpp_backend.cpp`.
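
The recreate-and-reapply flow above can be captured in a short sketch. This is
illustrative only: the `LoadedAdapter` bookkeeping struct and function shape
are hypothetical, and the b8201 batch `llama_set_adapters_lora()` call is left
as a comment rather than spelled out, since its exact signature is not
reproduced in this document.

```cpp
#include <vector>

#include "llama.h"

// Hypothetical bookkeeping entry for one loaded adapter.
struct LoadedAdapter {
    llama_adapter_lora * handle  = nullptr;
    float                scale   = 1.0f;
    bool                 applied = false;  // true once set on the live context
};

// Sketch of recreate_context(): drop the old context (its KV cache and its
// adapter set go with it), build a fresh one, and mark every adapter as
// needing re-application.
llama_context * recreate_context(llama_model * model,
                                 llama_context * old_ctx,
                                 llama_context_params cparams,
                                 std::vector<LoadedAdapter> & adapters) {
    if (old_ctx) {
        llama_free(old_ctx);  // context-scoped adapter state is freed with it
    }
    llama_context * ctx = llama_init_from_model(model, cparams);
    if (!ctx) {
        return nullptr;
    }
    for (auto & a : adapters) {
        a.applied = false;  // adapter application is per-context
    }
    // apply_lora_adapters() would now re-apply the whole set in one
    // llama_set_adapters_lora() batch call and flip `applied` back to true.
    return ctx;
}
```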

### Pre-Generation Adapter Verification

Before each `generate_stream()` call, the implementation checks that all loaded
adapters have `applied == true`. If any adapter is not applied (e.g., due to a
prior failure), it attempts to re-apply via `apply_lora_adapters()`. If the
re-apply fails, generation is aborted with an error rather than silently
running without the adapter.
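
A sketch of that guard, reusing the hypothetical `LoadedAdapter` struct from
the previous snippet; `apply_lora_adapters()` stands in for the backend's
re-apply routine:

```cpp
#include <string>
#include <vector>

bool apply_lora_adapters();  // backend routine; re-applies the full set

// Pre-generation check: if any adapter lost its applied state (e.g. after a
// prior failure), attempt one re-apply; on failure, abort rather than
// silently generating with the base model only.
bool ensure_adapters_applied(std::vector<LoadedAdapter> & adapters,
                             std::string & err) {
    bool all_applied = true;
    for (const auto & a : adapters) {
        if (!a.applied) {
            all_applied = false;
            break;
        }
    }
    if (all_applied) {
        return true;
    }
    if (!apply_lora_adapters()) {
        err = "LoRA adapter not applied and re-apply failed; aborting generation";
        return false;
    }
    for (auto & a : adapters) {
        a.applied = true;  // the batch re-apply covers every loaded adapter
    }
    return true;
}
```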

### KV Cache Invalidation

…is in progress. The lock hierarchy (sketched in code below) is:

- Component layer: `std::lock_guard<std::mutex>` on `component->mtx`
- Kotlin bridge layer: `synchronized(lock)` on the CppBridgeLLM lock object
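
A minimal sketch of the component-layer half of this hierarchy; the struct and
function names are hypothetical, and only the `std::lock_guard` on
`component->mtx` comes from the list above:

```cpp
#include <mutex>

// Hypothetical component type; only the mutex field is taken from this doc.
struct rac_llm_component {
    std::mutex mtx;
    // ... vtable pointer, adapter registry, context handle ...
};

// Component-layer rule: every LoRA entry point holds component->mtx for the
// whole operation, so adapter ops and generation never interleave at this
// layer. The Kotlin bridge adds its own synchronized(lock) above this.
int component_load_lora_adapter(rac_llm_component * component,
                                const char * path, float scale) {
    std::lock_guard<std::mutex> guard(component->mtx);
    // ... validate, then dispatch through the rac_llm_service_ops_t vtable ...
    (void) path;
    (void) scale;
    return 0;
}
```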

### Input Validation

`load_lora_adapter()` performs multi-stage validation before touching llama.cpp
(see the sketch after this list):

1. **Scale validation** — must be positive and finite (`scale > 0.0f && isfinite(scale)`)
2. **Duplicate detection** — rejects if same path already loaded
3. **File existence** — opens file with `std::ifstream` to verify it exists
4. **GGUF magic check** — reads first 4 bytes and verifies `0x46554747` ("GGUF" LE)
5. **Tensor match validation** — after `llama_adapter_lora_init()`, checks `adapter->ab_map.size() > 0` to ensure the adapter actually matched model tensors (catches wrong-base-model errors)
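
Stages 1-4 are plain C++ and can be sketched standalone; stage 5 needs
llama.cpp's internal `llama-adapter.h`, so it is only indicated in a comment.
The function name and error convention here are illustrative, not the actual
implementation:

```cpp
#include <cmath>
#include <cstdint>
#include <fstream>
#include <string>

// Illustrative sketch of validation stages 1-4. Stage 2 (duplicate path)
// would consult the backend's adapter registry and is elided here.
bool validate_lora_file(const std::string & path, float scale,
                        std::string & err) {
    // 1. Scale must be positive and finite.
    if (!(scale > 0.0f) || !std::isfinite(scale)) {
        err = "scale must be positive and finite";
        return false;
    }
    // 3. File existence: open for reading.
    std::ifstream f(path, std::ios::binary);
    if (!f) {
        err = "adapter file not found: " + path;
        return false;
    }
    // 4. GGUF magic: bytes "GGUF" == 0x46554747 when read little-endian.
    uint32_t magic = 0;
    f.read(reinterpret_cast<char *>(&magic), sizeof(magic));
    if (!f || magic != 0x46554747u) {
        err = "not a GGUF file: " + path;
        return false;
    }
    // 5. After llama_adapter_lora_init(), check adapter->ab_map.size() > 0
    //    (internal llama-adapter.h) to catch adapters built for a different
    //    base model; omitted here because it needs llama.cpp internals.
    return true;
}
```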

…the context and model are freed. This ordering prevents use-after-free.

| `sdk/runanywhere-kotlin/src/commonMain/.../RunAnywhere+LoRA.kt` | NEW file. `expect` declarations for 4 public API functions |
| `sdk/runanywhere-kotlin/src/jvmAndroidMain/.../RunAnywhere+LoRA.jvmAndroid.kt` | NEW file. `actual` implementations with init checks, CppBridgeLLM delegation, JSON parsing for adapter info |

| 2026-02-19 | Claude | Initial implementation of LoRA adapter support across all 6 layers (C++ through Kotlin public API). C++ desktop build verified. |
| 2026-02-19 | Claude | Fixed architecture: Component layer now dispatches LoRA ops through vtable (`rac_llm_service_ops_t`) instead of calling backend directly. This decouples `librac_commons.so` from `librac_backend_llamacpp.so`. Added 4 vtable entries and wrapper functions. Fixed `AttachCurrentThread` cast for Android NDK C++ build. Android native build verified. |
| 2026-02-19 | Claude | Added detailed Kotlin SDK usage guide with data types, code examples, error handling, Android ViewModel pattern, and table of contents with section links. Updated "How to Extend" to include vtable step. |
| 2026-03-09 | Claude | **LoRA fix & hardening.** Fixed LoRA adapter having no effect — root cause: wrong adapter file (4.3MB generic vs 17.6MB abliterated F16). Updated Android app to use `qwen2.5-0.5b-abliterated-lora-f16.gguf`. Added C++ validation: scale check, GGUF magic verification, tensor match count via `ab_map` (internal header `llama-adapter.h`), adapter metadata logging, pre-generation adapter state verification. Updated API from deprecated `llama_set_adapter_lora` to `llama_set_adapters_lora` (batch API, b8201). Updated docs to reflect llama.cpp b8201 API changes. |