
Commit 243aa89

update cni readiness example to use daemonset for reporter (#116)

1 parent 13a3729 · commit 243aa89

File tree

9 files changed: +109 −91 lines

docs/TEST_README.md

Lines changed: 2 additions & 1 deletion

````diff
@@ -88,7 +88,8 @@ hack/test-workloads/apply-calico.sh
 1. **Check for the new node condition on the application node:**
    ```bash
-   # Look for 'network.k8s.io/CalicoReady True'
+   kubectl get node $NODE -o json | jq '.status.conditions[] | select(.type=="projectcalico.org/CalicoReady")'
+   # Look for 'projectcalico.org/CalicoReady True'
    kubectl get node nrr-test-worker2 -o jsonpath='Conditions:{"\n"}{range .status.conditions[*]}{.type}{"\t"}{.status}{"\n"}{end}'
    ```
````

docs/book/src/examples/cni-readiness.md

Lines changed: 36 additions & 37 deletions

````diff
@@ -9,7 +9,7 @@ This guide demonstrates how to use the Node Readiness Controller to prevent pods
 
 The high-level steps are:
 1. Node is bootstrapped with a [startup taint](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/) `readiness.k8s.io/NetworkReady=pending:NoSchedule` immediately upon joining.
-2. A sidecar is patched to the cni-agent to monitor the CNI's health and report it to the API server as node-condition (`network.k8s.io/CalicoReady`).
+2. A reporter DaemonSet is deployed to monitor the CNI's health and report it to the API server as node-condition (`projectcalico.org/CalicoReady`).
 3. Node Readiness Controller will untaint the node only when the CNI reports it is ready.
 
 ## Step-by-Step Guide
@@ -20,43 +20,42 @@ This example uses **Calico**, but the pattern applies to any CNI.
 
 ### 1. Deploy the Readiness Condition Reporter
 
-We need to bridge Calico's internal health status to a Kubernetes Node Condition. We will add a **sidecar container** to the Calico DaemonSet.
+We need to bridge Calico's internal health status to a Kubernetes Node Condition. We will deploy a **reporter DaemonSet** that runs on every node.
 
-This sidecar checks Calico's local health endpoint (`http://localhost:9099/readiness`) and updates a node condition `network.k8s.io/CalicoReady`.
+This reporter checks Calico's local health endpoint (`http://localhost:9099/readiness`) and updates a node condition `projectcalico.org/CalicoReady`.
 
-**Patch your Calico DaemonSet:**
+Using a separate DaemonSet instead of a sidecar ensures that readiness reporting works even if the CNI pod is crashlooping or failing to start containers.
+
+**Deploy the Reporter DaemonSet:**
 
 ```yaml
-# cni-patcher-sidecar.yaml
-- name: cni-status-patcher
-  image: registry.k8s.io/node-readiness-controller/node-readiness-reporter:v0.1.1
-  imagePullPolicy: IfNotPresent
-  env:
-  - name: NODE_NAME
-    valueFrom:
-      fieldRef:
-        fieldPath: spec.nodeName
-  - name: CHECK_ENDPOINT
-    value: "http://localhost:9099/readiness" # update to your CNI health endpoint
-  - name: CONDITION_TYPE
-    value: "network.k8s.io/CalicoReady" # update this node condition
-  - name: CHECK_INTERVAL
-    value: "15s"
-  resources:
-    limits:
-      cpu: "10m"
-      memory: "32Mi"
-    requests:
-      cpu: "10m"
-      memory: "32Mi"
+# cni-reporter-ds.yaml
+apiVersion: apps/v1
+kind: DaemonSet
+metadata:
+  name: cni-reporter
+  namespace: kube-system
+spec:
+  # ...
+  template:
+    spec:
+      hostNetwork: true
+      serviceAccountName: cni-reporter
+      tolerations:
+      - operator: Exists
+      containers:
+      - name: cni-status-patcher
+        image: registry.k8s.io/node-readiness-controller/node-readiness-reporter:v0.1.1
+        env:
+        - name: CHECK_ENDPOINT
+          value: "http://localhost:9099/readiness"
+        - name: CONDITION_TYPE
+          value: "projectcalico.org/CalicoReady"
 ```
 
-> Note: In this example, the CNI pod health is monitored by a side-car, so watcher's lifecycle is same as the pod lifecycle.
-If the Calico pod is crashlooping, the sidecar will not run and cannot report readiness. For robust 'continuous' readiness reporting, the watcher should be 'external' to the pod.
-
 ### 2. Grant Permissions (RBAC)
 
-The sidecar needs permission to update the Node object's status.
+The reporter needs permission to update the Node object's status.
 
 ```yaml
 # calico-rbac-node-status-patch-role.yaml
@@ -78,15 +77,15 @@ roleRef:
   kind: ClusterRole
   name: node-status-patch-role
 subjects:
-# Bind to CNI's ServiceAccount
+# Bind to CNI Reporter's ServiceAccount
 - kind: ServiceAccount
-  name: calico-node
+  name: cni-reporter
   namespace: kube-system
 ```
 
 ### 3. Create the Node Readiness Rule
 
-Now define the rule that enforces the requirement. This tells the controller: *"Keep the `readiness.k8s.io/NetworkReady` taint on the node until `network.k8s.io/CalicoReady` is True."*
+Now define the rule that enforces the requirement. This tells the controller: *"Keep the `readiness.k8s.io/NetworkReady` taint on the node until `projectcalico.org/CalicoReady` is True."*
 
 ```yaml
 # network-readiness-rule.yaml
@@ -97,7 +96,7 @@ metadata:
 spec:
   # The condition(s) to monitor
   conditions:
-  - type: "network.k8s.io/CalicoReady"
+  - type: "projectcalico.org/CalicoReady"
     requiredStatus: "True"
 
   # The taint to manage
@@ -139,8 +138,8 @@ To test this, add a new node to the cluster.
    `readiness.k8s.io/NetworkReady=pending:NoSchedule`.
 
 2. **Check Node Conditions**:
-   Watch the node conditions. You will initially see `network.k8s.io/CalicoReady` as `False` or missing.
-   Once Calico starts, the sidecar will update it to `True`.
+   Watch the node conditions. You will initially see `projectcalico.org/CalicoReady` as `False` or missing.
+   Once Calico starts, the reporter will update it to `True`.
 
 3. **Check Taint Removal**:
-   As soon as the condition becomes `True`, the Node Readiness Controller will remove the taint, and workloads will be scheduled.
+   As soon as the condition becomes `True`, the Node Readiness Controller will remove the taint, and workloads will be scheduled.
````
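The commit's documented flow is: probe the CNI's local health endpoint, then patch the matching node condition. The `node-readiness-reporter` image is prebuilt and its source is not part of this commit, but conceptually each iteration does something like the following sketch (`build_condition_patch` and `report_once` are hypothetical names, not the real implementation):

```shell
# Hypothetical sketch of one reporter iteration: map CNI health-endpoint
# reachability to a node condition. Illustrative only.
CHECK_ENDPOINT="${CHECK_ENDPOINT:-http://localhost:9099/readiness}"
CONDITION_TYPE="${CONDITION_TYPE:-projectcalico.org/CalicoReady}"

# Build a strategic-merge patch body for a single node condition.
build_condition_patch() {
  status="$1"
  now=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  printf '{"status":{"conditions":[{"type":"%s","status":"%s","reason":"CNIHealthCheck","lastHeartbeatTime":"%s","lastTransitionTime":"%s"}]}}' \
    "$CONDITION_TYPE" "$status" "$now" "$now"
}

# One iteration: HTTP success maps to condition True, anything else to False.
report_once() {
  if curl -fsS --max-time 2 "$CHECK_ENDPOINT" >/dev/null 2>&1; then
    status="True"
  else
    status="False"
  fi
  kubectl patch node "$NODE_NAME" --subresource=status --type=strategic \
    -p "$(build_condition_patch "$status")"
}
```

Node status conditions merge by `type` under a strategic-merge patch, so repeated patches update the same condition in place rather than appending duplicates.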

examples/cni-readiness/README.md

Lines changed: 12 additions & 0 deletions

```diff
@@ -0,0 +1,12 @@
+# CNI Readiness Example (Calico)
+
+This example demonstrates how to use the Node Readiness Controller to ensure nodes are only marked ready for workloads after the CNI (Calico) has fully initialized.
+
+### How it works:
+1. Nodes join with a `readiness.k8s.io/NetworkReady=pending:NoSchedule` taint.
+2. A lightweight DaemonSet (`cni-reporter-ds.yaml`)
+monitors Calico's health endpoint (`localhost:9099/readiness`) and updates a
+node condition `projectcalico.org/CalicoReady`.
+3. The `NodeReadinessRule` (`network-readiness-rule.yaml`) instructs the controller to remove the startup taint once the `projectcalico.org/CalicoReady` condition becomes `True`.
+4. The reporter is deployed with `hostNetwork: true` to reach Calico's local health endpoint.
+5. The reporter needs a dedicated ServiceAccount (`cni-reporter`) with permissions to patch node status.
```

examples/cni-readiness/apply-calico.sh

Lines changed: 4 additions & 29 deletions

```diff
@@ -18,41 +18,16 @@ set -e
 
 KUBECTL_ARGS="$@"
 
-YQ_VERSION="v4.48.1"
-YQ_PATH="/tmp/yq"
-
-# Check if yq is installed, if not download it.
-if [ ! -f "$YQ_PATH" ]; then
-  echo "yq not found at $YQ_PATH, downloading..."
-  OS=$(uname -s | tr '[:upper:]' '[:lower:]')
-  ARCH=$(uname -m)
-  case $ARCH in
-    x86_64)
-      ARCH="amd64"
-      ;;
-    aarch64|arm64)
-      ARCH="arm64"
-      ;;
-    *)
-      echo "Unsupported architecture: $ARCH"
-      exit 1
-      ;;
-  esac
-  YQ_BINARY="yq_${OS}_${ARCH}"
-  curl -sL "https://github.com/mikefarah/yq/releases/download/${YQ_VERSION}/${YQ_BINARY}" -o "$YQ_PATH"
-  chmod +x "$YQ_PATH"
-fi
-
 # Download the Calico manifest
 curl -sL https://raw.githubusercontent.com/projectcalico/calico/v3.30.1/manifests/calico.yaml -o calico.yaml
 
-# Add the cni-status-patcher sidecar
-"$YQ_PATH" e -i 'select(.kind == "DaemonSet" and .metadata.name == "calico-node").spec.template.spec.containers += [load("hack/test-workloads/cni-patcher-sidecar.yaml")]' calico.yaml
-
 # Apply the manifest twice. The first time, it will create the CRDs and ServiceAccounts.
 # The second time, it will create the rest of the resources, which should now be able to find the ServiceAccount.
 kubectl apply $KUBECTL_ARGS -f calico.yaml || true
 kubectl apply $KUBECTL_ARGS -f calico.yaml
 
+# Apply the CNI readiness reporter DaemonSet
+kubectl apply $KUBECTL_ARGS -f ./cni-reporter-ds.yaml
+
 # Apply the RBAC rules
-kubectl apply $KUBECTL_ARGS -f ./calico-rbac-node-status-patch-role.yaml
+kubectl apply $KUBECTL_ARGS -f ./calico-rbac-node-status-patch-role.yaml
```
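The "apply twice" trick in the script works because the first `kubectl apply` creates the CRDs and ServiceAccounts (and is allowed to fail), so the second pass succeeds once they exist. The general form of this pattern is a small retry loop; a sketch (the `retry` helper is hypothetical, not part of the script, which hard-codes two passes):

```shell
# Generic retry helper: run a command up to N times until it succeeds.
# Sketch only; apply-calico.sh above uses two fixed passes instead.
retry() {
  attempts="$1"; shift
  i=1
  while ! "$@"; do
    if [ "$i" -ge "$attempts" ]; then
      return 1
    fi
    i=$((i + 1))
    sleep 1
  done
}

# Hypothetical usage:
# retry 2 kubectl apply $KUBECTL_ARGS -f calico.yaml
```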

examples/cni-readiness/calico-rbac-node-status-patch-role.yaml

Lines changed: 5 additions & 1 deletion

```diff
@@ -3,6 +3,9 @@ kind: ClusterRole
 metadata:
   name: node-status-patch-role
 rules:
+- apiGroups: [""]
+  resources: ["nodes"]
+  verbs: ["get"]
 - apiGroups: [""]
   resources: ["nodes/status"]
   verbs: ["patch", "update"]
@@ -16,6 +19,7 @@ roleRef:
   kind: ClusterRole
   name: node-status-patch-role
 subjects:
+# Bind to CNI Reporter's ServiceAccount
 - kind: ServiceAccount
-  name: calico-node
+  name: cni-reporter
   namespace: kube-system
```

examples/cni-readiness/cni-patcher-sidecar.yaml

Lines changed: 0 additions & 21 deletions
This file was deleted.

examples/cni-readiness/cni-reporter-ds.yaml

Lines changed: 48 additions & 0 deletions

```diff
@@ -0,0 +1,48 @@
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: cni-reporter
+  namespace: kube-system
+---
+apiVersion: apps/v1
+kind: DaemonSet
+metadata:
+  name: cni-reporter
+  namespace: kube-system
+  labels:
+    app: cni-reporter
+spec:
+  selector:
+    matchLabels:
+      app: cni-reporter
+  template:
+    metadata:
+      labels:
+        app: cni-reporter
+    spec:
+      hostNetwork: true
+      serviceAccountName: cni-reporter
+      tolerations:
+      - operator: Exists
+      containers:
+      - name: cni-status-patcher
+        image: registry.k8s.io/node-readiness-controller/node-readiness-reporter:v0.1.1
+        imagePullPolicy: IfNotPresent
+        env:
+        - name: NODE_NAME
+          valueFrom:
+            fieldRef:
+              fieldPath: spec.nodeName
+        - name: CHECK_ENDPOINT
+          value: "http://localhost:9099/readiness"
+        - name: CONDITION_TYPE
+          value: "projectcalico.org/CalicoReady"
+        - name: CHECK_INTERVAL
+          value: "5s"
+        resources:
+          limits:
+            cpu: "10m"
+            memory: "32Mi"
+          requests:
+            cpu: "10m"
+            memory: "32Mi"
```
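Note the blanket toleration in the DaemonSet's pod spec: without it, the reporter itself would be blocked by the very `readiness.k8s.io/NetworkReady=pending:NoSchedule` startup taint it exists to clear, and the node could never become ready. The relevant fragment from the manifest above:

```yaml
# Tolerate all taints, including readiness.k8s.io/NetworkReady=pending:NoSchedule,
# so the reporter can be scheduled onto a node that is still tainted as not-ready.
tolerations:
- operator: Exists
```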

examples/cni-readiness/network-readiness-dryrun-rule.yaml

Lines changed: 1 addition & 1 deletion

```diff
@@ -5,7 +5,7 @@ metadata:
 spec:
   dryRun: true
   conditions:
-  - type: "network.k8s.io/CalicoReady"
+  - type: "projectcalico.org/CalicoReady"
     requiredStatus: "True"
   taint:
     key: "readiness.k8s.io/NetworkReady"
```

examples/cni-readiness/network-readiness-rule.yaml

Lines changed: 1 addition & 1 deletion

```diff
@@ -4,7 +4,7 @@ metadata:
 name: network-readiness-rule
 spec:
   conditions:
-  - type: "network.k8s.io/CalicoReady"
+  - type: "projectcalico.org/CalicoReady"
     requiredStatus: "True"
   taint:
     key: "readiness.k8s.io/NetworkReady"
```
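The rule's semantics can be checked by hand: the controller keeps the taint until every listed condition reaches its `requiredStatus`. A rough jq-based equivalent of that predicate (illustrative only; `condition_met` is a hypothetical helper, not part of the controller):

```shell
# Evaluate whether a node's condition matches the required status, i.e. the
# same predicate the NodeReadinessRule expresses. Sketch only; requires jq.
condition_met() {
  node_json="$1"; ctype="$2"; required="$3"
  echo "$node_json" | jq -e --arg t "$ctype" --arg s "$required" \
    '[.status.conditions[]? | select(.type == $t) | .status == $s] | any' \
    >/dev/null
}

# Hypothetical usage against a live cluster:
# if condition_met "$(kubectl get node "$NODE" -o json)" \
#     "projectcalico.org/CalicoReady" "True"; then
#   echo "controller would remove the readiness.k8s.io/NetworkReady taint"
# fi
```

A missing condition counts as not met, matching the documented behavior that the condition may initially be `False` or absent.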
