triton-kernels: add vendor-neutral Triton skill by jaygala223 · Pull Request #675 · huggingface/kernels

jaygala223 · 2026-06-24T05:52:12Z

Related issue

Closes #302

What does this PR do?

Adds a new triton-kernels skill to kernel-builder/skills/, providing vendor-neutral guidance for writing portable Triton kernels that run on NVIDIA and AMD GPUs without modification.

Motivation

The existing skills are either raw CUDA (cuda-kernels) or backend-specific Triton (rocm-kernels, xpu-kernels). There's no shared base covering portable Triton patterns that the backend-specific skills can build on. This was identified in #302 and the skills/cuda-kernels link in the original issue is now dead since skills moved to kernel-builder/skills/ in #456.

Changes

Add kernel-builder/skills/triton-kernels/SKILL.md: core DSL patterns (program_id, masked loads, 2D tiling, reductions, tl.dot), autotune guidance, numerics (fp32 accumulation, safe softmax, dtype handling), reference kernel implementations (softmax, matmul, rmsnorm), benchmarking patterns, get_kernel integration, transformers patching, and common pitfalls table
Add references/autotune-guide.md: deep dive on configs, key parameter, num_warps/num_stages selection, when NOT to autotune, practical config sets
Add references/kernel-patterns.md: dropout (in-SRAM mask), fused add+rmsnorm, SiLU, SwiGLU, group-major PID ordering, pointer sliding
Add references/benchmarking-guide.md: GB/s vs TFLOPS, roofline model, do_bench usage, when to expect gains over PyTorch
Add scripts/benchmark_template.py: reusable benchmark harness with correctness check
Add scripts/correctness_template.py: multi-shape, multi-dtype test suite template
Add examples/fused_softmax.py: complete working kernel with test + benchmark
Add manifest.txt and CHANGELOG.md
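
As a rough illustration of two patterns named in the list above (masked loads and safe softmax), here is a minimal pure-Python emulation of what one Triton program would do for a single row. It is not taken from SKILL.md: the helper names are hypothetical, and the `-inf` padding only mimics what passing `mask=` and `other=` to `tl.load` achieves. A non-empty row of finite values is assumed.

```python
import math


def next_power_of_2(n: int) -> int:
    """Smallest power of two >= n; Triton block sizes must be powers of two."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def masked_safe_softmax_row(row):
    """Emulate one program instance computing softmax over a single row.

    Hypothetical helper for illustration only; a real kernel would use
    tl.load with a mask instead of the list comprehension below.
    """
    n_cols = len(row)
    block_size = next_power_of_2(n_cols)  # e.g. 781 columns -> block of 1024

    # "Masked load": lanes past n_cols read -inf rather than touching memory.
    block = [row[i] if i < n_cols else -math.inf for i in range(block_size)]

    # Safe softmax: subtract the row max so exp() never overflows.
    row_max = max(block)
    exps = [math.exp(x - row_max) for x in block]  # padded lanes become 0.0
    denom = sum(exps)

    # "Masked store": only the first n_cols lanes are written back.
    return [e / denom for e in exps[:n_cols]]


probs = masked_safe_softmax_row([1000.0, 1001.0, 1002.0])
```

A naive `math.exp(1000.0)` would raise `OverflowError`, whereas the max-subtracted version stays finite, and the padded lanes contribute nothing to the denominator.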

Testing

The example kernel (examples/fused_softmax.py) passes correctness tests against torch.softmax on V100 with irregular shapes (1823x781) at atol=1e-3
Benchmark template runs and produces GB/s measurements
Verified the skill follows the same directory structure as rocm-kernels (SKILL.md, manifest.txt, CHANGELOG.md, references/, scripts/, examples/)

Checklist

This PR is linked to an issue that was discussed and approved
I have tested these changes locally
New/changed functionality has test coverage
LLM disclosure:
- I did not use an LLM to create this PR.
- I used an LLM for assistance while creating this PR.
- This PR was mostly or completely generated by an LLM.

Add a new skill for writing portable Triton kernels that run on NVIDIA and AMD GPUs without modification. Covers core DSL patterns, autotune, numerics, benchmarking, and HuggingFace Kernels Hub integration. Complements cuda-kernels (raw CUDA) and sits alongside rocm-kernels/xpu-kernels as the vendor-neutral base they can specialize from.

Contents:
- SKILL.md: main instruction file with reference kernels (softmax, matmul, rmsnorm)
- references/: autotune guide, kernel patterns, benchmarking guide
- scripts/: benchmark and correctness test templates
- examples/: fused softmax with test + benchmark harness

Closes huggingface#302

github-actions · 2026-06-24T05:52:28Z

Hi @jaygala223, thanks for your interest in contributing!

This project requires that pull request authors are vouched, and you are not in the list of vouched users.

This PR will be closed automatically. See https://github.com/huggingface/kernels/blob/main/CONTRIBUTING.md for more details.

sayakpaul · 2026-06-24T05:57:32Z

Can you provide an example kernel that you wrote with this skill and tested? Feel free to apply to be able to publish kernel repo types on the Hub.

jaygala223 · 2026-06-24T06:23:45Z

Hi @sayakpaul I tested the skill by using it as context for an AI agent and asking it to write a LayerNorm kernel (which is not one of the reference kernels already in the skill). The agent followed the patterns from the SKILL.md: row-wise reduction, fp32 accumulation, masked loads, contiguity assert, num_warps heuristic.
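
For readers unfamiliar with the row-wise reduction mentioned above, the per-row math a LayerNorm kernel has to reproduce can be sketched in plain Python. This is not the generated kernel; `layer_norm_row` is a hypothetical reference helper. It uses the biased variance and the default `eps=1e-5` of `torch.nn.functional.layer_norm`, and Python floats are double precision, so it only stands in for the "upcast before reducing" idea behind fp32 accumulation.

```python
import math


def layer_norm_row(x, weight, bias, eps=1e-5):
    """Reference LayerNorm over one row (hypothetical helper, not the PR's kernel)."""
    n = len(x)
    # Reduction 1: mean over the row.
    mean = sum(x) / n
    # Reduction 2: biased variance (divide by n, not n - 1).
    var = sum((v - mean) ** 2 for v in x) / n
    # Reciprocal standard deviation, with eps for numerical stability.
    rstd = 1.0 / math.sqrt(var + eps)
    # Elementwise normalize, then apply the affine parameters.
    return [(v - mean) * rstd * w + b for v, w, b in zip(x, weight, bias)]


y = layer_norm_row([1.0, 2.0, 3.0, 4.0], [1.0] * 4, [0.0] * 4)
```

In a Triton kernel, each of the two `sum(...)` calls would typically become a reduction over a masked block of the row, accumulated in fp32 even when the input is fp16 or bf16.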

Results on V100 (fp32, M=4096):

Correctness: All shapes pass against torch.nn.functional.layer_norm at atol=1e-3, including irregular shapes (1823x781), single element, and bf16.

Performance: roughly 1.3x to 1.45x faster than PyTorch LayerNorm across the tested N values:

   N    Triton (GB/s)    PyTorch (GB/s)
 256    497              365
1024    727              548
2048    771              534
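
To make the GB/s figures easier to interpret, the sketch below shows one common way such numbers are derived and the speedup ratios implied by the table. The bytes-moved model (one read plus one write of the M x N tensor) is an assumption, since the comment does not state how bandwidth was computed, and `effective_bandwidth_gbps` is a hypothetical helper.

```python
def effective_bandwidth_gbps(m, n, elem_bytes, elapsed_ms, passes=2):
    """Effective memory bandwidth in GB/s for an elementwise or row-wise op.

    Assumes the tensor is read once and written once (passes=2); this
    traffic model is an assumption, not something stated in the PR.
    """
    bytes_moved = passes * m * n * elem_bytes
    return bytes_moved / (elapsed_ms * 1e-3) / 1e9


# Reported (Triton, PyTorch) bandwidth in GB/s, keyed by N, from the table.
reported = {256: (497, 365), 1024: (727, 548), 2048: (771, 534)}

# Speedup is simply the ratio of the two bandwidth figures.
speedups = {n: triton / torch for n, (triton, torch) in reported.items()}

# Example: an fp32 4096 x 1024 tensor processed in a made-up 0.05 ms.
example = effective_bandwidth_gbps(4096, 1024, 4, 0.05)
```

With these inputs the ratios come out at about 1.36, 1.33, and 1.44 for N = 256, 1024, and 2048 respectively.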

The generated kernel + test is here: examples/skill_test_layernorm.py (8d7da2e)

I also applied for kernel repo access on the Hub.

sayakpaul · 2026-06-24T06:29:13Z

What is your Hub username?

jaygala223 · 2026-06-24T06:33:17Z

jaygala223

sayakpaul · 2026-06-24T06:44:20Z

You have access to publish the kernels as proper kernel repos on the Hub. Could we please see the kernels generated with this skill on the Hub?

jaygala223 · 2026-06-24T08:20:50Z

Yes. Published the layernorm kernel generated using the skill as a kernel repo: https://huggingface.co/kernels/jaygala223/triton-layernorm

sayakpaul · 2026-06-24T14:05:05Z

Are the results in #675 (comment) for https://huggingface.co/kernels/jaygala223/triton-layernorm?

jaygala223 · 2026-06-24T14:57:45Z

Yes, they are from the same kernel.

sayakpaul

We should also add the skill to our docs.

jaygala223 · 2026-06-25T05:03:48Z

Hi @sayakpaul, I added triton-kernels to the supported skills list in docs/source/cli-skills.md.

sayakpaul

There should be a couple of changes akin to #614.

jaygala223 · 2026-06-26T15:49:40Z

Okay done!

Added a tip and example kernel.

HuggingFaceDocBuilderDev · 2026-06-26T22:12:23Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

github-actions Bot closed this Jun 24, 2026

jaygala223 mentioned this pull request Jun 24, 2026

[FEATURE] Triton Skill #302

Open

triton-kernels: add LayerNorm skill test (agent-generated kernel)

8d7da2e

sayakpaul reopened this Jun 24, 2026

sayakpaul reviewed Jun 24, 2026

docs: add triton kernels to supported skills list

7511a14

sayakpaul reviewed Jun 26, 2026

docs: add triton-kernels tip and example kernel to cli-skills

5b499cc

sayakpaul reviewed Jun 26, 2026

Comment thread docs/source/cli-skills.md

sayakpaul requested changes Jun 26, 2026

jaygala223 added 2 commits June 27, 2026 04:36

Merge branch 'upstream-main' into triton-kernels-skill

405bc12

triton-kernels: register skill in CLI, builder docs, and agents guide

7bfd50d

jaygala223 requested a review from sayakpaul June 27, 2026 04:40

3 participants