Issues
Issue creation is restricted in this repository
Open issues in ggml-org/llama.cpp:

- #25259
- #25257: Feature Request: SSD streaming of MoE routed expert weights (run MoE models larger than system RAM). Label: enhancement (New feature or request)
- #25254
- #25253
- #25252: Feature Request: Add chat UX parity flags to llama-mtmd-cli (reasoning, template, IO and display controls). Label: enhancement (New feature or request)
- #25250: metal : small-batch mul_mat is compute-bound between the mv_ext and mm paths (bs 4..16), ~2x headroom; affects speculative decoding. Label: enhancement (New feature or request)
- #25248
- #25227
- #25224
- #25221
- #25213
- #25210