Description
We encountered a scenario where a managed VPC was updated outside of CAPA to add IPv6 CIDR blocks (to enable communication with an IPv6-configured AWS service). After the change, CAPA's route table reconciliation fails with:
```
failed to discover routes on route table subnet-xxxxx: ipv6 block missing for ipv6 enabled subnet, can't create route for egress only internet gateway
```
We've traced through the code and believe we understand what is happening, but wanted to check whether there is context behind the current behavior that we are missing.
How the broken state occurs
- A cluster is created with an IPv4-only managed VPC (`spec.network.vpc.ipv6` is nil).
- IPv6 CIDR blocks are added to the VPC and subnets outside of CAPA, directly in AWS.
- During subnet discovery (`subnets.go:406-410`), CAPA unconditionally reads the IPv6 CIDR associations from AWS and sets `IsIPv6 = true` on the subnet specs:

  ```go
  for _, set := range ec2sn.Ipv6CidrBlockAssociationSet {
  	if set.Ipv6CidrBlockState.State == types.SubnetCidrBlockStateCodeAssociated {
  		spec.IPv6CidrBlock = aws.ToString(set.Ipv6CidrBlock)
  		spec.IsIPv6 = true
  	}
  }
  ```
- During VPC discovery (`vpc.go:60-62`), the IPv6 info is NOT adopted because of the guard added in #3914:

  ```go
  if s.scope.VPC().IsIPv6Enabled() {
  	s.scope.VPC().IPv6 = vpc.IPv6
  }
  ```
- Route table reconciliation sees `sn.IsIPv6 == true` on private subnets, checks `!s.scope.VPC().IsIPv6Enabled()`, and returns the error (`routetables.go:418-422`).
Subnet discovery and VPC discovery guard IPv6 differently, which creates an inconsistent state that the route table reconciler can't handle: the subnets are marked as IPv6 while the VPC spec still reports IPv6 as disabled.
What we'd like to understand
The webhook validation in `awsmanagedcontrolplane_webhook.go:188-191` prevents any change to `IsIPv6Enabled()`:

```go
if oldAWSManagedControlplane.Spec.NetworkSpec.VPC.IsIPv6Enabled() != r.Spec.NetworkSpec.VPC.IsIPv6Enabled() {
	// "changing IP family is not allowed after it has been set"
}
```
We understand this was introduced in #3513 and the vpc.go guard was added in #3914 to fix #3912, where CAPA was auto-discovering IPv6 from AWS and then the webhook would reject the update, bricking clusters.
Our questions:
- Was the immutability intended to be bidirectional? The `vpc.go` guard from #3914 already prevents the auto-discovery problem that motivated it. Is there a known reason the webhook also needs to block the nil → non-nil direction (intentionally enabling IPv6), or was this a conservative default that hasn't been revisited?
- Is it safe to work around this by temporarily disabling the webhook and adding `ipv6: {}` to the VPC spec? From tracing through the reconciliation, it looks like:
  - `reconcileVPC()` would adopt the existing IPv6 CIDR from AWS
  - `reconcileEgressOnlyInternetGateways()` would create an EIGW
  - `reconcileRouteTables()` would succeed with the `::/0` → EIGW route
  - No attempt would be made to re-associate IPv6 CIDRs (that only happens during VPC creation)

  Are there any side effects we're not seeing?
- Should the subnet discovery also be guarded? The asymmetry between subnet discovery (unconditionally sets `IsIPv6 = true`) and VPC discovery (guarded by `IsIPv6Enabled()`) seems like it could cause issues beyond route tables. Would it make sense to either:
  - Guard the subnet discovery the same way, or
  - Remove the VPC discovery guard and instead allow the reconciler to adopt IPv6 when it's present in AWS?
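The two alternatives in the last question can be compared with a minimal Go sketch. Everything here is hypothetical and simplified (`discoverSubnetGuarded`, `discoverSubnetUnguarded`, `discoverVPCAdopting`, and `consistent` are illustrative names, not CAPA functions); it only checks that each option removes the subnet/VPC mismatch that the route table reconciler trips over.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins; not CAPA's real types.
type IPv6Params struct {
	CidrBlock string
}

type VPCSpec struct {
	IPv6 *IPv6Params
}

func (v *VPCSpec) IsIPv6Enabled() bool { return v.IPv6 != nil }

type SubnetSpec struct {
	ID            string
	IPv6CidrBlock string
	IsIPv6        bool
}

// consistent reports whether the route table check could pass: every subnet
// marked IPv6 needs an IPv6-enabled VPC. (Simplified: the real check only
// looks at private subnets.)
func consistent(vpc *VPCSpec, subnets []SubnetSpec) bool {
	for _, sn := range subnets {
		if sn.IsIPv6 && !vpc.IsIPv6Enabled() {
			return false
		}
	}
	return true
}

// discoverSubnetUnguarded models today's behavior: adopt whatever AWS reports.
func discoverSubnetUnguarded(id, awsIPv6 string) SubnetSpec {
	spec := SubnetSpec{ID: id}
	if awsIPv6 != "" {
		spec.IPv6CidrBlock = awsIPv6
		spec.IsIPv6 = true
	}
	return spec
}

// Option 1 (hypothetical): guard subnet discovery the same way as VPC discovery.
func discoverSubnetGuarded(vpc *VPCSpec, id, awsIPv6 string) SubnetSpec {
	if !vpc.IsIPv6Enabled() {
		return SubnetSpec{ID: id} // ignore the out-of-band IPv6 block
	}
	return discoverSubnetUnguarded(id, awsIPv6)
}

// Option 2 (hypothetical): drop the vpc.go guard and adopt IPv6 when AWS reports it.
func discoverVPCAdopting(vpc *VPCSpec, awsIPv6 *IPv6Params) {
	if awsIPv6 != nil {
		vpc.IPv6 = awsIPv6
	}
}

func main() {
	const vpcCIDR, subnetCIDR = "2001:db8::/56", "2001:db8::/64"

	// Option 1: the VPC stays IPv4-only and the subnet ignores its IPv6 block.
	vpc1 := &VPCSpec{}
	sn1 := discoverSubnetGuarded(vpc1, "subnet-xxxxx", subnetCIDR)
	fmt.Println("option 1:", consistent(vpc1, []SubnetSpec{sn1}), sn1.IsIPv6) // option 1: true false

	// Option 2: the VPC adopts IPv6, so the unguarded subnet is consistent with it,
	// but the IP family recorded in the spec has changed.
	vpc2 := &VPCSpec{}
	discoverVPCAdopting(vpc2, &IPv6Params{CidrBlock: vpcCIDR})
	sn2 := discoverSubnetUnguarded("subnet-xxxxx", subnetCIDR)
	fmt.Println("option 2:", consistent(vpc2, []SubnetSpec{sn2}), vpc2.IsIPv6Enabled()) // option 2: true true
}
```

Either option restores consistency, but with different outcomes: option 1 leaves the out-of-band IPv6 blocks unmanaged by CAPA, while option 2 changes the IP family recorded in the spec, which on its own would reintroduce the situation from #3912 unless the webhook is relaxed as well.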
Environment
- CAPA version: v2.x
- Cluster type: AWSManagedControlPlane (EKS)
- VPC: Managed by CAPA, IPv6 added after initial creation