You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
- `dryRunResults`: Impact analysis for dry-run rules
44
+
## Configuring the Rule
87
45
88
-
### Dry Run Mode
46
+
### 1. Select Target Nodes
47
+
Use the `nodeSelector` to target specific nodes (eg., GPU nodes).
89
48
90
-
Test rules safely before applying:
49
+
> **Note**: These labels could be configured at node registration (e.g., via Kubelet `--node-labels`). Relying on labels added asynchronously by addons (like Node Feature Discovery) can create a race condition where the node remains schedulable until the labels appear.
91
50
92
-
```yaml
93
-
spec:
94
-
dryRun: true # Enable dry run mode
95
-
conditions:
96
-
- type: "csi.example.net/NodePluginRegistered"
97
-
requiredStatus: "True"
98
-
# ... rest of spec
99
-
```
51
+
### 2. Define Readiness Conditions
52
+
The `conditions` list defines the criteria. The controller watches the Node's status for these conditions.
53
+
* `type`: The exact string matching the NodeCondition type.
54
+
* `requiredStatus`: The status required (`True`, `False`, or `Unknown`).
100
55
101
-
Check dry run results:
56
+
### 3. Choose an Enforcement Mode
57
+
The `enforcementMode` determines how the controller manages the taint lifecycle.
102
58
103
-
```sh
104
-
kubectl get nodereadinessrule <rule-name> -o jsonpath='{.status.dryRunResults}'
105
-
```
59
+
* **`bootstrap-only`**: Use this for one-time initialization tasks (e.g., installing a kernel module or driver). Once the conditions are met once, the taint is removed and never reapplied.
60
+
* **`continuous`**: Use this for ongoing health checks (e.g., network connectivity). If the condition fails at any time, the taint is reapplied.
106
61
107
-
### Rule Validation and Constraints
62
+
> For more details on these modes, see [Concepts](./concepts.md#enforcement-modes).
108
63
109
-
#### NoExecute Taint Effect Warning
64
+
### 4. Configure the Taint
65
+
Define the taint that will block scheduling.
66
+
* **Key**: Must start with `readiness.k8s.io/` prefix.
67
+
* **Effect**:
68
+
* `NoSchedule`: Prevents new pods from scheduling (Recommended).
69
+
* `PreferNoSchedule`: Tries to avoid scheduling.
70
+
* `NoExecute`: Evicts running pods if they don't tolerate the taint.
110
71
111
-
**`NoExecute` with `continuous` enforcement mode will evict existing workloads when conditions fail.**
72
+
> **Note**: To eliminate startup race conditions, register nodes with this taint (e.g., via Kubelet `--register-with-taints`). The controller will remove it once conditions are met.
112
73
113
-
If a readiness condition on the node is failing temporarily (eg., the component restarted), all pods without matching tolerations are immediately evicted from the node, if configured with a `NoExecute` taint. Use `NoSchedule` to prevent new scheduling without disrupting running workloads.
74
+
> **Caution**: When using `NoExecute` with `continuous` mode: if a condition
75
+
> fails momentarily, all workloads on the node (without tolerations) will be
76
+
> immediately evicted, which can cause service disruption.
114
77
115
-
The admission webhook warns when using `NoExecute`:
116
78
117
-
```sh
118
-
# NoExecute + continuous enforcement
119
-
$ kubectl apply -f rule.yaml
120
-
Warning: CAUTION: NoExecute with continuous mode evicts pods when conditions fail, risking workload disruption. Consider NoSchedule or bootstrap-only
121
-
nodereadinessrule.readiness.node.x-k8s.io/my-rule created
79
+
The admission webhook warns when using `NoExecute` taint:
122
80
123
-
# NoExecute + bootstrap-only enforcement
81
+
```bash
124
82
$ kubectl apply -f rule.yaml
125
83
Warning: NOTE: NoExecute will evict existing pods without tolerations. Ensure critical system pods have appropriate tolerations
126
84
nodereadinessrule.readiness.node.x-k8s.io/my-rule created
127
85
```
128
86
129
-
See [Kubernetes taints documentation](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/) for taint behavior details.
87
+
### Rule Validations
130
88
131
89
#### Avoiding Taint Key Conflicts
132
90
@@ -159,13 +117,11 @@ spec:
159
117
effect: "NoSchedule"
160
118
```
161
119
162
-
Use unique, descriptive taint keys for different readiness checks.
163
-
164
120
#### Taint Key Naming
121
+
Taint keys must have the `readiness.k8s.io/` prefix to clearly identify
122
+
readiness-related taints and avoid conflicts with other controllers.
123
+
Use unique, descriptive taint keys for different readiness checks. Follow [Kubernetes naming conventions](https://kubernetes.io/docs/concepts/overview/working-with-objects/names/).
Check the `status` of the rule to follow the results:
154
+
155
+
```sh
156
+
kubectl get nodereadinessrule my-rule -o yaml
212
157
```
158
+
159
+
Look for `dryRunResults` in the output to see which nodes would be tainted.
160
+
161
+
## Reporting Node Conditions
162
+
163
+
The Node Readiness Controller only 'reacts' to observed conditions on the Node object. These conditions can be set by various tools:
164
+
165
+
1. **Node Problem Detector (NPD)**: You can configure NPD with [custom plugins](https://github.com/kubernetes/node-problem-detector/blob/master/docs/custom_plugin_monitor.md) to monitor system state and report conditions.
166
+
2. **Custom Health-Checkers or Sidecars**: You can run a daemonset or a small sidecar (eg., [Readiness Condition Reporter](../examples/security-agent-readiness.md#1-deploy-the-readiness-condition-reporter)) that checks your application or driver and updates the Node status.
167
+
3. **External Controllers**: Any tool that can patch Node status can trigger these rules.
168
+
169
+
For a full example of setting up a custom condition for a security agent, see the [Security Agent Readiness Example](../examples/security-agent-readiness.md).
0 commit comments