feat: CSI volume health monitoring and failover by abeowlu · Pull Request #1816 · kubernetes-sigs/aws-efs-csi-driver

abeowlu · 2026-03-11T18:57:07Z

Is this a bug fix or adding new feature? Feature enhancement

What is this PR about? / Why do we need it?

This PR should close #1675 and #1676, and address related requests for CSI volume health monitoring, high availability, and self-healing.

At a minimum, the CSI driver should monitor EFS volume mounts for health issues and surface the volumeCondition status in the volume information reported to kubelet and other operators.
At best, it should self-recover a dead volume mount (caused by an EFS service AZ failure, a network failure, etc.) and, failing that, trigger pod failover (possibly through a separate operator; that behaviour would be out of scope for the CSI driver).

This PR implements the minimum use case: monitor volume mounts and report their health and condition, along with stat info, when kubelet probes them.
As a first attempt at the best-case scenario, it also implements an asynchronous volume remount attempt, with failover if an EFS server issue or hang is encountered.

What testing is done?
Unit testing for the health check and the async recovery attempt flow.

k8s-ci-robot · 2026-03-11T18:57:17Z

Hi @abeowlu. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot · 2026-03-11T18:57:18Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: abeowlu
Once this PR has been reviewed and has the lgtm label, please assign davidxu12345 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot · 2026-04-08T13:12:45Z

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

feat: vol_health_monitoring

b898893

k8s-ci-robot added the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) Mar 11, 2026

k8s-ci-robot requested review from samuhale and wongma7 March 11, 2026 18:57

k8s-ci-robot added the size/L (denotes a PR that changes 100-499 lines, ignoring generated files) and cncf-cla: yes (indicates the PR's author has signed the CNCF CLA) labels Mar 11, 2026

abeowlu added 2 commits March 16, 2026 15:46

feat: vol_health_updte

b1843f3

feat: vol_health_updte

1de30b3

k8s-ci-robot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) Apr 8, 2026

abeowlu mentioned this pull request Apr 10, 2026

Volume condition reporting on EBS CSI volumes enhancement kubernetes-sigs/aws-ebs-csi-driver#2917

Open

abeowlu wants to merge 3 commits into kubernetes-sigs:master from abeowlu:enhance_health_mon

2 participants