Add hwmon infrastructure #1169
Open
hurryman2212 wants to merge 1 commit into
Integrate the hwmon implementation into the in-tree nvidia.ko build. Add per-GPU hwmon lifetime management, RM/RUSD helpers, fan control, thermal sensor, temperature limit, power, and sysfs registration support. Hook initialization into module load/unload and GPU probe/remove so hwmon devices are created and torn down with the NVIDIA device lifetime.
Pull request overview
Note: Copilot was unable to run its full agentic suite in this review.
Adds an in-kernel hwmon subsystem to the open NVIDIA driver to expose temperature, power, fan, and PWM telemetry/control for NVIDIA GPUs via standard /sys/class/hwmon interfaces. The hwmon code is built into nvidia.ko and bound to RM via the modeset interface ops.
Changes:
- New `hwmon-*` sources/headers implementing the hwmon driver (RM glue, RUSD shared-memory reader, thermal sensors, temp limits, fan control, device registration).
- Build integration in `nvidia-sources.Kbuild`/`nvidia.Kbuild` to compile the new sources, apply per-object CFLAGS, link them into `nvidia.ko`, and include them in `nv-interface.o`.
- Hook hwmon init/exit and per-GPU add/remove from `nv.c` and `nv-modeset-interface.c`.
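
For readers who want to check the exported values from userspace, here is a minimal sketch of reading and scaling hwmon attributes. It relies only on the standard Linux hwmon sysfs units (`temp*_input` in millidegrees Celsius, `power*_input` in microwatts, `pwm*` in the range 0 to 255). The helper names are hypothetical and are not part of this PR, and which attributes this driver actually exposes is not confirmed here.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helpers for illustration; none of these names come from the PR. */

/* Parse the text of a hwmon attribute such as "45000\n". Returns 0 on success, -1 on error. */
int hwmon_parse_value(const char *text, long *out)
{
    char *end = NULL;
    long value;

    assert(text != NULL && out != NULL);
    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || end == text)
        return -1;                      /* overflow or no digits at all */
    if (*end != '\0' && *end != '\n')
        return -1;                      /* trailing garbage after the number */
    *out = value;
    return 0;
}

/* Read one attribute file, e.g. a temp1_input path under /sys/class/hwmon/. */
int hwmon_read_attr(const char *path, long *out)
{
    char buf[32];
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return hwmon_parse_value(buf, out);
}

/* Unit conversions defined by the hwmon sysfs ABI. */
double hwmon_temp_to_celsius(long millidegrees) { return millidegrees / 1000.0; }
double hwmon_power_to_watts(long microwatts)    { return microwatts / 1e6; }
double hwmon_pwm_to_percent(long pwm)           { return pwm * 100.0 / 255.0; }
```

A caller would pass a path such as `/sys/class/hwmon/hwmonN/temp1_input` to `hwmon_read_attr()` and then apply the matching conversion. The index `N` varies per system, so matching on the device's `name` attribute is the usual way to locate the right directory.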
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| kernel-open/nvidia/nvidia.Kbuild | Add HWMON_OBJECTS to nvidia-y, per-obj CFLAGS, and conftest deps |
| kernel-open/nvidia/nvidia-sources.Kbuild | Declare HWMON_SOURCES and list new hwmon source files |
| kernel-open/nvidia/nv.c | Call hwmon_driver_init/exit from module init/exit |
| kernel-open/nvidia/nv-modeset-interface.c | Call hwmon GPU add/remove during modeset probe/remove |
| kernel-open/nvidia/hwmon-entry.h | Public entry points exported to nvidia.ko |
| kernel-open/nvidia/hwmon-nvidia.h | Centralized include of NVIDIA private RMAPI/class headers |
| kernel-open/nvidia/hwmon-main.[ch] | Per-GPU object lifecycle and driver init/shutdown |
| kernel-open/nvidia/hwmon-rm.[ch] | Binding to RM ops and RM alloc/control/map wrappers |
| kernel-open/nvidia/hwmon-rusd.[ch] | RUSD shared-memory telemetry reader with caching |
| kernel-open/nvidia/hwmon-thermal.[ch] | Thermal sensor probing/reads via RM control |
| kernel-open/nvidia/hwmon-temp-limit.[ch] | Thermal threshold and power-policy max-temp limits |
| kernel-open/nvidia/hwmon-fan.[ch] | Fan info/status/control via RM, PWM enable/mode |
| kernel-open/nvidia/hwmon-device.[ch] | Linux hwmon core registration and ops dispatch |

    struct hwmon_gpu *gpu;
    int ret;

    gpu = kzalloc_obj(*gpu, GFP_KERNEL);

Comment on lines +142 to +157

    void hwmon_rm_close_gpu(struct hwmon_gpu *gpu)
    {
        nvidia_modeset_stack_ptr stack = NULL;

        if (!gpu->opened)
            return;

        if (rm_ops.alloc_stack(&stack) == 0) {
            rm_ops.close_gpu(gpu->gpu_id, stack, NV_FALSE);
            rm_ops.free_stack(stack);
            gpu->opened = false;
        } else {
            pr_err("%s: failed to allocate NVIDIA stack while closing GPU 0x%x\n",
                   NVHWMON_DRIVER_NAME, gpu->gpu_id);
        }
    }

Comment on lines +998 to +1004

    rc = hwmon_driver_init();
    if (rc < 0) {
        nv_printf(NV_DBG_WARNINGS,
                  "NVRM: hwmon initialization failed: %d\n", rc);
    } else {
        hwmon_initialized = true;
    }

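
The snippet above treats hwmon as optional: a failed init is logged and module load continues, with a flag recording whether teardown is needed later. A userspace sketch of that pattern follows. All names are hypothetical stand-ins, not the PR's functions, and the guarded teardown in `module_unload()` is an assumption about how `hwmon_initialized` is consumed, since the exit path is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for hwmon_driver_init()/hwmon_driver_exit(). */
int teardown_count;

int optional_subsys_init(bool fail)
{
    return fail ? -12 : 0;   /* -12 mirrors -ENOMEM; chosen only for illustration */
}

void optional_subsys_exit(void)
{
    teardown_count++;
}

struct module_state {
    bool optional_ready;     /* plays the role of hwmon_initialized */
};

/* Load always succeeds; the optional subsystem is allowed to fail. */
int module_load(struct module_state *st, bool optional_fails)
{
    int rc;

    assert(st != NULL);
    rc = optional_subsys_init(optional_fails);
    if (rc < 0) {
        fprintf(stderr, "optional subsystem init failed: %d (continuing)\n", rc);
        st->optional_ready = false;
    } else {
        st->optional_ready = true;
    }
    return 0;
}

/* Tear down only what was brought up, and at most once. */
void module_unload(struct module_state *st)
{
    assert(st != NULL);
    if (st->optional_ready) {
        optional_subsys_exit();
        st->optional_ready = false;
    }
}
```

Clearing the flag in the unload path makes teardown idempotent, which matters if the exit routine can be reached from more than one error path.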
Comment on lines +68 to +70

    static int read_field(const void *src, void *dst, size_t size)
    {
        const struct field_header *header = src;

Comment on lines 107 to 119

    void nvidia_modeset_probe(const nv_linux_state_t *nvl)
    {
        nv_gpu_info_t gpu_info;

        nvidia_modeset_get_gpu_info(&gpu_info, nvl);

        if (nv_modeset_callbacks && nv_modeset_callbacks->probe)
        {
            nv_gpu_info_t gpu_info;

            nvidia_modeset_get_gpu_info(&gpu_info, nvl);
            nv_modeset_callbacks->probe(&gpu_info);
        }

        hwmon_driver_gpu_add(&gpu_info);
    }

        hwmon_rm_unbind();
    }

    int __init hwmon_driver_init(void)

        return ret;
    }

    void __exit hwmon_driver_exit(void)
