Pull requests: ggml-org/llama.cpp

[SYCL] support OP cross_entropy_loss, cross_entropy_loss_back (labels: documentation, ggml, SYCL)
#25236 opened Jul 2, 2026 by arthw (Contributor)
common,server : fix custom preset dedup against cached models (labels: server)
#25235 opened Jul 2, 2026 by angt (Member)
[UT] enhance UT to show all real unsupported backends (labels: testing)
#25234 opened Jul 2, 2026 by arthw (Contributor)
chat: sanitize invalid UTF-8 before peg-native parsing (labels: devops)
#25233 opened Jul 2, 2026 by iaa2005
llama : clear error when MTP draft shares KV cache across backends
#25232 opened Jul 2, 2026 by liminfei-amd (Contributor); 1 task done
[SYCL] fix unsupported UT cases of CONT & CPY (labels: documentation, ggml, SYCL)
#25231 opened Jul 2, 2026 by arthw (Contributor)
Ensure unique node names and add org_src to track the org tensor for OpenVINO backend (labels: ggml, testing)
#25230 opened Jul 2, 2026 by zhaixuejun1993 (Contributor)
vulkan: when using transfer queue for async copies, sync on event_wait to avoid race (labels: ggml, Vulkan)
#25229 opened Jul 2, 2026 by 0cc4m (Contributor)
CUDA: Support CUDA Virtual Devices (labels: CUDA, ggml)
#25228 opened Jul 2, 2026 by anavp-nvidia (Contributor)
server : don't list cached models when a preset is used (labels: server)
#25226 opened Jul 2, 2026 by angt (Member)
[SYCL] Flash Attention with XMX engine via oneDNN graph API (SDPA) on KV f16; Qwen3.6-27b-Q8_0 prefill speed up x1.21 at p=512 and x4.26 at p=80k (labels: ggml, SYCL)
#25222 opened Jul 2, 2026 by hmscider
common : add missing <fstream> include in common.h
#25220 opened Jul 2, 2026 by zhangrunda; 1 task done
vendor : update cpp-httplib to 0.49.0 (labels: vendor)
#25218 opened Jul 2, 2026 by cabelo (Contributor)
sycl: add fused top-k MoE (labels: ggml, SYCL)
#25217 opened Jul 2, 2026 by newjordan (Contributor)
hexagon: add VISION RoPE support (labels: ggml, Hexagon)
#25216 opened Jul 2, 2026 by aparmp-quic (Contributor)
llama : skip K/V rotation input when its buffer is unallocated
#25215 opened Jul 2, 2026 by liminfei-amd (Contributor); 1 task done
server: add --no-sleep flag for GPU heartbeat on headless GPUs (labels: CUDA, ggml, server, SYCL, Vulkan)
#25214 opened Jul 1, 2026 by johnkarlhill
rocm: fix mmap loading of large models (labels: CUDA, ggml)
#25212 opened Jul 1, 2026 by pwilkin (Member)
Optimize RWKV7 inference by fusing some graph operators (labels: Apple Metal, CUDA, ggml, model, SYCL, testing, Vulkan)
#25206 opened Jul 1, 2026 by MollySophia (Collaborator); Draft
sycl: add GGML_SYCL_FATTN_VEC_NTHREADS build option (labels: ggml, SYCL)
#25205 opened Jul 1, 2026 by Titaniumtown
llama: fix quantized kv-cache for dsv4 (labels: model)
#25202 opened Jul 1, 2026 by am17an (Contributor)