WIP: NRT: cache: fix assumedResource drift #958

Draft
ffromani wants to merge 4 commits into kubernetes-sigs:master from ffromani:assumed-res-unbounded

Conversation

@ffromani
Contributor

What type of PR is this?

/kind bug
/kind cleanup

What this PR does / why we need it:

Prevent entries in the cache's assumedResources from accumulating indefinitely, and remove an interaction between Resync and Reserve that can likewise cause assumedResources to be initialized and grow, and to drift without ever being resynced correctly.

Which issue(s) this PR fixes:

Fixes #N/A

Special notes for your reviewer:

TBD

Does this PR introduce a user-facing change?

Fix noderesourcetopology cache drift caused by TOCTOU race with its resync loop

Previously, when a node was marked dirty, a successful Reserve()
would mark it clean again. The reasoning was that if a node gets
selected, and thus Reserve() is called for it, then the suspected
overallocation was evidently wrong and the node could run more workload.

The problem with this approach is that it compounds badly
with Resync(). Resync() is a long, complex operation which needs to
perform many steps and call multiple APIs (usually served by the
client-go cache, but APIs nonetheless). For that reason, Resync() only
holds the OverReserve cache lock for short intervals, not continuously:
holding it for the whole cycle would be both dangerous and critically slow.

It's possible that Reserve() runs in the middle of a Resync() cycle,
because Resync() is, and must be, asynchronous with respect to the
scheduling cycles. So Reserve() can clear a node's dirty flag and
corrupt the internal state. If that happens, unflushed nodes become
effectively invisible to future resync cycles, and their
assumedResources become stuck (never flushed).

The safest fix is to remove the dirty-node-clearing shortcut in
Reserve(). This causes a node to stay dirty longer, but makes the
scheduler plugin operate more safely and predictably.

With this change, a node may stay dirty for one extra resync cycle,
but it ensures the dirty flag and the stale data are always resolved together.

Signed-off-by: Francesco Romani <fromani@redhat.com>
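The race and the fix described above can be sketched with a minimal dirty-set cache. The types and method names below (overReserve, MarkDirty, and so on) are illustrative stand-ins, not the plugin's actual code; the point is only that Reserve no longer touches the dirty set, so a node marked dirty stays visible to Resync even if it gets reserved mid-cycle:

```go
package main

import (
	"fmt"
	"sync"
)

// overReserve is a hypothetical, minimal model of the dirty-node
// bookkeeping: a set of nodes whose cached NRT data is suspect.
type overReserve struct {
	mu    sync.Mutex
	dirty map[string]bool
}

func newOverReserve() *overReserve {
	return &overReserve{dirty: map[string]bool{}}
}

// MarkDirty flags a node for the next resync cycle.
func (c *overReserve) MarkDirty(node string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.dirty[node] = true
}

// Reserve records an assumed placement. Crucially, it no longer
// clears the dirty flag: only Resync, which also flushes the stale
// data, may do that. Clearing it here could race with an in-flight
// Resync that already snapshotted the dirty set.
func (c *overReserve) Reserve(node string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	// track assumed resources for node (elided in this sketch)
	_ = node
}

// Resync flushes the nodes that were dirty when the cycle started.
// The lock is held only to snapshot and later to clear, never across
// the (potentially slow) API work in between.
func (c *overReserve) Resync() []string {
	c.mu.Lock()
	snapshot := make([]string, 0, len(c.dirty))
	for node := range c.dirty {
		snapshot = append(snapshot, node)
	}
	c.mu.Unlock()

	// ... slow work: fetch fresh NRT objects, rebuild state ...

	c.mu.Lock()
	for _, node := range snapshot {
		delete(c.dirty, node)
	}
	c.mu.Unlock()
	return snapshot
}

func main() {
	c := newOverReserve()
	c.MarkDirty("node-1")
	c.Reserve("node-1") // does NOT clear the dirty flag anymore
	fmt.Println(c.Resync())
}
```

With the old shortcut, the Reserve call between snapshot and clear could have wiped the flag, leaving node-1 dirty-in-fact but never flushed.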
ReserveNodeResources() was unconditionally tracking assumedResources,
regardless of whether a node actually had NRT data.
This caused `assumedResources` to grow unbounded in these cases.

The reason the data is never cleared is that all the paths that
mark a node for resync (NodeMaybeOverReserved, NodeHasForeignPods,
nodesWithAttrUpdate) require the NRT object to already be in the store;
otherwise the assumed resources are never flushed.

Nodes are not expected to lack cached NRT data, but this can
happen in practice, and the filter code already accounts for
this case and always allows such nodes to be used.

Furthermore, if NRT objects are *created* after the scheduler starts,
which is not recommended but can happen, there is a time window in
which nodes don't have corresponding NRT data.

The fix is simple: exit early in ReserveNodeResources() when
processing a node that doesn't have NRT data.

Signed-off-by: Francesco Romani <fromani@redhat.com>
Signed-off-by: Francesco Romani <fromani@redhat.com>
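The early-exit guard can be sketched as follows. The nrtCache type and its fields are hypothetical stand-ins for the plugin's actual structures; the guard simply refuses to track a node that has no NRT object to reconcile against, since no resync path would ever flush that entry:

```go
package main

import "fmt"

// nrtCache is an illustrative stand-in: a set of nodes with cached
// NRT data, and a per-node count of assumed reservations.
type nrtCache struct {
	nrtObjects       map[string]bool
	assumedResources map[string]int
}

// ReserveNodeResources tracks assumed resources for a node, but only
// if the node has NRT data in the store. Without NRT data there is
// nothing to resync against, so the entry would leak forever.
func (c *nrtCache) ReserveNodeResources(node string) {
	if !c.nrtObjects[node] {
		return // no NRT data: exit early instead of leaking an entry
	}
	c.assumedResources[node]++
}

func main() {
	c := &nrtCache{
		nrtObjects:       map[string]bool{"node-1": true},
		assumedResources: map[string]int{},
	}
	c.ReserveNodeResources("node-1") // tracked: NRT data is present
	c.ReserveNodeResources("node-2") // ignored: no NRT data yet
	fmt.Println(len(c.assumedResources)) // prints 1
}
```

This matches the filter-side behavior described above: nodes without NRT data are still schedulable, they just aren't tracked by the cache.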
Add thread-safe test helpers. Don't expose, or allow unguarded
access to, internal data. Even though we're *probably* safe in
the controlled test environment, it's not worth the intrinsic risk.

Signed-off-by: Francesco Romani <fromani@redhat.com>
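A thread-safe test helper of the kind described above might look like this. The cache type and the helper name are illustrative, not the PR's actual code: the idea is that tests get a copy of the internal map taken under the lock, rather than a reference to the live, unguarded data:

```go
package main

import (
	"fmt"
	"sync"
)

// cache is a hypothetical stand-in holding the internal map tests
// want to inspect.
type cache struct {
	mu               sync.Mutex
	assumedResources map[string]int
}

// AssumedResourcesSnapshot is a test-only accessor: it copies the map
// under the lock, so callers can never observe (or cause) a
// concurrent mutation of the internal state.
func (c *cache) AssumedResourcesSnapshot() map[string]int {
	c.mu.Lock()
	defer c.mu.Unlock()
	out := make(map[string]int, len(c.assumedResources))
	for k, v := range c.assumedResources {
		out[k] = v
	}
	return out
}

func main() {
	c := &cache{assumedResources: map[string]int{"node-1": 2}}
	snap := c.AssumedResourcesSnapshot()
	snap["node-1"] = 99                                 // mutating the copy...
	fmt.Println(c.AssumedResourcesSnapshot()["node-1"]) // ...leaves the cache untouched: prints 2
}
```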
@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. labels Apr 17, 2026
@netlify

netlify bot commented Apr 17, 2026

Deploy Preview for kubernetes-sigs-scheduler-plugins canceled.

Name Link
🔨 Latest commit 7474d68
🔍 Latest deploy log https://app.netlify.com/projects/kubernetes-sigs-scheduler-plugins/deploys/69e24c3f2a3e2100083aba0f

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 17, 2026

Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
do-not-merge/work-in-progress: Indicates that a PR should not merge because it is a work in progress.
kind/bug: Categorizes issue or PR as related to a bug.
kind/cleanup: Categorizes issue or PR as related to cleaning up code, process, or technical debt.
release-note: Denotes a PR that will be considered when it comes time to generate release notes.
size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.

2 participants