Support for attachRequired in CSI driver#1826
Support for attachRequired in CSI driver#1826vmishra22 wants to merge 2 commits intokubernetes-sigs:masterfrom
Conversation
|
Hi @vmishra22. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Regular contributors should join the org to skip this step. Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: vmishra22 The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Is this a bug fix or adding new feature?
new feature
What is this PR about? / Why do we need it?
Pods mounting EFS PVCs can start before the EFS CSI node driver daemonset pod is running on the target node, causing mount failures at pod startup. This is a known Kubernetes race condition (kubernetes/kubernetes#95911) where the kubelet's volume manager attempts NodePublishVolume before the CSI driver socket is registered.
The existing workaround — manually tainting nodes with efs.csi.aws.com/agent-not-ready — requires users to modify their node bootstrap process and is opt-in. This PR implements a CSI-native alternative via attachRequired: true on the CSIDriver object, also as an opt-in feature, giving users a choice between the two approaches.
Implementation:
Default Behavior Unchanged
attachRequired still defaults to false in both the Helm chart (values.yaml) and the kustomize base manifest (
csidriver.yaml
). Existing installations are unaffected by this change. Users must explicitly opt in by setting controller.attachRequired: true.
What Enabling It Does
When controller.attachRequired: true is set, Kubernetes creates a VolumeAttachment object for each PV-node pair before allowing NodePublishVolume. The kubelet waits for the attachment to be confirmed before mounting, ensuring the CSI node driver is registered and ready.
Changes
CSIDriver template: attachRequired is now driven by .Values.controller.attachRequired instead of being hardcoded to false
csi-attacher sidecar: Added to the controller deployment, conditionally enabled only when controller.attachRequired: true. Uses registry.k8s.io/sig-storage/csi-attacher:v4.8.0
ControllerPublishVolume: Implemented as a validated no-op — validates required inputs and volume ID format, returns success immediately with no EFS API calls
ControllerUnpublishVolume: Implemented as a no-op with input validation
RBAC: volumeattachments and volumeattachments/status permissions added to the controller ClusterRole, conditionally applied when attachRequired is enabled
What testing is done?
Unit Tests
TestControllerPublishVolume (6 cases): success with access point ID, success with filesystem ID only, empty volume ID, empty node ID, nil volume capability, invalid volume ID format
TestControllerUnpublishVolume (2 cases): success with valid volume ID, empty volume ID
Full driver test suite: all tests pass (ok github.com/kubernetes-sigs/aws-efs-csi-driver/pkg/driver)
Manual Testing (live EKS cluster, 3-node, 3-AZ)
Verified CSIDriver.spec.attachRequired: true after Helm install
Verified csi-attacher sidecar running (4/4 containers in controller pods)
Verified VolumeAttachment objects created per node-PV pair, all marked ATTACHED: true
Verified pods mount successfully across all 3 AZs
Verified AttachVolume.Attach succeeded event in pod describe output, confirming the attach/detach controller is coordinating mounts
Risks and Considerations
These risks apply only when controller.attachRequired: true is explicitly enabled.
High Risk
Controller unavailability blocks all new mounts: With attachRequired: false (default), controller downtime only affects dynamic provisioning. With attachRequired: true, if both controller replicas are unavailable, no new pod can mount an EFS volume — even for pre-existing PVs. Users enabling this feature should ensure the controller has sufficient replicas and resource headroom
Immutable field migration: attachRequired is immutable on the CSIDriver object. Users switching from false to true (or vice versa) must manually delete the CSIDriver object before upgrading, as helm.sh/resource-policy: keep prevents Helm from deleting it automatically
Medium Risk
Increased mount latency: Every new mount requires a VolumeAttachment reconciliation cycle before NodePublishVolume, adding 1–5 seconds per mount
VolumeAttachment object churn: For ReadWriteMany EFS volumes on large clusters, one VolumeAttachment per node-PV pair increases API server load