Issues · ggml-org/llama.cpp · GitHub

changelog : libllama API
#9289 · ggerganov opened on Sep 3, 2024
12
changelog : llama-server REST API
#9291 · ggerganov opened on Sep 3, 2024
19
tutorials : list for llama.cpp
#13523 · ggerganov opened on May 14, 2025
22

Eval bug: DeepSeek V4 Flash forgot context when hit cache or use checkpoint in long context
bug-unconfirmed
#25259 · wenyifancc opened on Jul 3, 2026

Feature Request: SSD streaming of MoE routed expert weights (run MoE models larger than system RAM)
#25257 · freedomljc opened on Jul 3, 2026

Misc. bug: Thinking tags before tool call cause cache miss on third request
bug-unconfirmed
#25254 · sunhy0316 opened on Jul 3, 2026

New start
#25253 · PCorpCProNova opened on Jul 3, 2026

Feature Request: Add chat UX parity flags to llama-mtmd-cli (reasoning, template, IO and display controls)
#25252 · CaioLimaViana opened on Jul 3, 2026

metal : small-batch mul_mat is compute-bound between the mv_ext and mm paths (bs 4..16), ~2x headroom; affects speculative decoding
#25250 · discobot opened on Jul 2, 2026

Eval bug: Qwen 3.6 crashes llama-server with prompts more than 132 tokens
bug-unconfirmed
#25248 · samteezy opened on Jul 2, 2026

webui: model selector — org-less models visually attach to the previous org group
#25227 · erusev opened on Jul 2, 2026

Eval bug: MoE models: regression in auto-fit layer allocation on 8GB VRAM after #24180 (granularity rounded to 128)
bug-unconfirmed
#25224 · chahualao opened on Jul 2, 2026

Misc. bug: Overflow warning when quantizing separated MTP head to Q8_0 for Qwen3.6-35B-A3B
bug-unconfirmed
#25221 · congson1293 opened on Jul 2, 2026

Misc. bug: Substantial prefill speed regression when resuming long session from a coding agent
bug-unconfirmed
#25213 · pinkfluid opened on Jul 1, 2026

Misc. bug: (OpenAI compat) unsupported "dimensions" parameter in embeddings endpoint
bug-unconfirmed
#25210 · Irelynx opened on Jul 1, 2026