WIP: NRT: cache: fix assumedResource drift#958
Draft
ffromani wants to merge 4 commits intokubernetes-sigs:masterfrom
Draft
WIP: NRT: cache: fix assumedResource drift#958ffromani wants to merge 4 commits intokubernetes-sigs:masterfrom
ffromani wants to merge 4 commits intokubernetes-sigs:masterfrom
Conversation
Previously, when a node was marked dirty, a successful Reserve() would have marked it clean. The reasoning was that if a node gets selected - thus Reserve() called upon it, then obviously the overallocation was wrong and the node could have run more workload. The problem with this approach is that it compounds badly with Resync(). Resync() is a complex long operation which needs to do many steps and call multiple APIs, usually through client-go cache but still. For that reason, Resync() only holds the OverReserve cache lock for intervals, not continuously - that would be both dangerous AND critically slow. It's possible that Reserve() happens in between a Resync() cycle, because Resync() is, and must be, asynchronous wrt the scheduling cycles. So, Reserve() can clear the dirty node flag and corrupt the internal state. If that happens, unflushed nodes become effectively invisible to future resync cycles. The assumedResources become stuck (never flushed) on the affected nodes. The safest fix is remove the dirty node clearing shortcut in Reserve(). This would cause a node to stay dirty longer, but makes the scheduler plugin operating more safely and predictably. With this change, a node may stay dirty for one extra resync cycle, but this ensures dirty flag and stale data are always resolved together. Signed-off-by: Francesco Romani <fromani@redhat.com>
ReserveNodeResources() was unconditionally tracking assumedResources regardless if a node actually had NRT data or not. This caused `assumedResources` to grow unbounded in these cases. The reason why the data is never cleared is that all the paths that mark a node for resync (NodeMaybeOverReserved, NodeHasForeignPods, nodesWithAttrUpdate) require the NRT to already be in the store, otherwise the assumed resources are never flushed. Nodes however are not expected to lack cached NRT data. But this can happen in practice, and the filter code actually already accounts for this case and always allows the nodes to be used. Furthermore, if NRT objects are *created* after the scheduler starts, which is not recommended but can happen, we have a time window on which nodes don't have corresponding NRT data. The fix is simple: exit early in ReserveNodeResources() if we process a node which doesn't have NRT data. Signed-off-by: Francesco Romani <fromani@redhat.com>
Signed-off-by: Francesco Romani <fromani@redhat.com>
add thread safe test helpers. Don't expose, or allow access, to internal data unguarded. Even though we're *probably* safe in the controlled test environment, not worth the intrinsic risk. Signed-off-by: Francesco Romani <fromani@redhat.com>
Contributor
|
Skipping CI for Draft Pull Request. |
✅ Deploy Preview for kubernetes-sigs-scheduler-plugins canceled.
|
Contributor
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ffromani The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What type of PR is this?
/kind bug
/kind cleanup
What this PR does / why we need it:
Prevent entries into cache's
assumedResourcesto be added indefinitely, and remove interaction betweenResyncandReservethat also may causeassumedResourcesto be initialized and grow, plus it may drift and don't be resynced correctlyWhich issue(s) this PR fixes:
Fixes #N/A
Special notes for your reviewer:
TBD
Does this PR introduce a user-facing change?