systemvm: ipv6 fw_input — accept return traffic from established,rela…#13173

Open
agronaught wants to merge 1 commit into apache:4.22 from agronaught:fix-13171-v6-fw-input-established-related
@agronaught agronaught commented May 17, 2026

This PR adds the IPv6 equivalent of fw_router_routing() to the systemvm Virtual Router's network configuration, so that return traffic for VR-initiated IPv6 connections (BGP to upstream PE peers, NTP, DNS lookups, etc.) is allowed back through the ip6_firewall fw_input chain.

Problem

The systemvm VR's nftables ip6 ip6_firewall fw_input chain is created with policy=drop and only ICMPv6 accept rules. The IPv4 INPUT chain has the equivalent iifname "eth2" ct state established,related accept rule (added by fw_router_routing() in CsAddress.py); the IPv6 path has no such rule.

Effect: any v6 connection the VR itself initiates outbound has its return traffic silently dropped at the v6 INPUT hook before TCP processes it. For Isolated IPv6 ROUTED networks this is fatal — BGP IPv6 sessions cannot reach Established, tenant /64 prefixes are never advertised upstream, and VMs in the network are unreachable from the IPv6 internet.
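
The drop can be illustrated with a toy model of a policy-drop chain. This is only a sketch for reasoning about the rule order: it does not talk to nftables, and `Packet`, `verdict` and the two rule functions are hypothetical names invented for the illustration.

```python
# Toy model of a policy-drop input chain; it does not talk to nftables.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Packet:
    iifname: str   # interface the packet arrived on
    proto: str     # e.g. "tcp" or "icmpv6"
    ct_state: str  # conntrack state: "new", "established" or "related"


Rule = Callable[[Packet], bool]  # returns True when the rule accepts the packet


def icmpv6_accept(pkt: Packet) -> bool:
    return pkt.proto == "icmpv6"


def established_related_accept(pkt: Packet) -> bool:
    return pkt.iifname == "eth2" and pkt.ct_state in ("established", "related")


def verdict(chain: List[Rule], pkt: Packet) -> str:
    """Any matching accept rule wins; otherwise the drop policy applies."""
    return "accept" if any(rule(pkt) for rule in chain) else "drop"


# SYN-ACK from the upstream PE, answering a BGP SYN that the VR sent out.
syn_ack = Packet(iifname="eth2", proto="tcp", ct_state="established")

before = [icmpv6_accept]
after = [icmpv6_accept, established_related_accept]

print(verdict(before, syn_ack))  # drop
print(verdict(after, syn_ack))   # accept
```

A packet in state `new` arriving on eth2 is still dropped in both cases, so the model also shows that the rule only admits replies to connections the VR opened.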

#10970 added the equivalent rule to the FORWARD chain (covering tenant VM return traffic) but explicitly removed it from the INPUT chain in its second commit. This PR completes that fix for VR-originated traffic.

Behavioural change

Before this PR, IPv6 BGP sessions from VRs in IsolatedV6RoutedFiltered (and similar Routed v6) network offerings stay in Connect state indefinitely. After this PR, sessions reach Established within seconds of VR start and prefix advertisements work normally.

The change is additive and behind the existing is_routed() / is_vpc() gating — only routed, non-VPC networks see new INPUT rules. No change for existing v4 paths, v4 NATted networks, or VPC networks.
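
As a rough sketch of that gating (not the code in this PR): `FakeConfig` and `build_v6_input_rules` are hypothetical names, and the rule dict layout is an assumption made for the example rather than something copied from CsAddress.py.

```python
# Sketch only: FakeConfig and build_v6_input_rules are hypothetical names,
# and the rule dict layout is an assumption, not copied from CsAddress.py.
from typing import Dict, List


class FakeConfig:
    """Stand-in for the VR config object, exposing only the two gating checks."""

    def __init__(self, routed: bool, vpc: bool) -> None:
        self._routed = routed
        self._vpc = vpc

    def is_routed(self) -> bool:
        return self._routed

    def is_vpc(self) -> bool:
        return self._vpc


def build_v6_input_rules(config: FakeConfig) -> List[Dict[str, str]]:
    """Return the extra fw_input rules, or nothing for VPC / non-routed networks."""
    if config.is_vpc() or not config.is_routed():
        return []
    return [{
        "chain": "fw_input",
        "rule": 'iifname "eth2" ct state established,related counter accept',
    }]
```

With this shape, `build_v6_input_rules(FakeConfig(routed=True, vpc=False))` yields one rule, and every other combination yields an empty list, which is the "no change for existing paths" property described above.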

Fixes: #13171

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • Build/CI
  • Test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Justifying Major: any operator wanting to ship the IsolatedV6RoutedFiltered offering (or any v6 Routed isolated network with Firewall service) for production tenant workloads is blocked. Workaround requires per-VR nft injection that wipes on every tenant FW rule change, making the offering unusable as a customer product without a downstream patch like this one.

Screenshots (if appropriate)

N/A — kernel-level firewall change.

How Has This Been Tested?

Verified end-to-end on Apache CloudStack 4.22.0.0, KVM hypervisor (Ubuntu 24.04 hosts), with:

  • Zone configured for BGP Routed networks (ASN range, BGP peers, IPv6 guest prefix /48)
  • Tenant network using IsolatedV6RoutedFiltered offering
  • Two independent fresh VRs in two different tenant networks

Before the patch:

vtysh -c "show bgp ipv6 unicast summary"
Neighbor State/PfxRcd
2400:88e0:ffff:258::2 Connect 0
2400:88e0:ffff:258::3 Connect 0

Hypervisor-side packet capture on the underlay confirms PE responds with SYN-ACK, but the VR's TCP stack never delivers it to FRR. Kernel TCPMD5* counters stay at zero — drop happens at netfilter before TCP processes the segment. Inside the VR:

$ nft list table ip6 ip6_firewall
table ip6 ip6_firewall {
    chain fw_input {
        type filter hook input priority filter; policy drop;
        icmpv6 type { ... } accept
    }
    ...
}

No ct state established,related accept rule.
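
For anyone wanting to script this check, a minimal sketch follows. It assumes the listing text has already been captured from `nft list table ip6 ip6_firewall`, and it only recognises the `ct state established,related` spelling shown in this PR; `chain_body` and `has_conntrack_accept` are hypothetical helper names.

```python
import re


def chain_body(listing: str, chain: str) -> str:
    """Return the text between the braces of `chain <name> { ... }`."""
    match = re.search(r"chain\s+" + re.escape(chain) + r"\s*\{", listing)
    if match is None:
        return ""
    depth, start = 1, match.end()
    # Track nested braces so sets such as `icmpv6 type { ... }` do not end the chain.
    for pos in range(start, len(listing)):
        if listing[pos] == "{":
            depth += 1
        elif listing[pos] == "}":
            depth -= 1
            if depth == 0:
                return listing[start:pos]
    return ""


def has_conntrack_accept(listing: str, chain: str = "fw_input") -> bool:
    """True if the chain holds an accept rule matching established,related."""
    pattern = re.compile(r"ct state\s+(established,related|related,established)\b")
    for line in chain_body(listing, chain).splitlines():
        if pattern.search(line) and line.rstrip().endswith("accept"):
            return True
    return False
```

Run against the listing above it returns False; once the chain contains the `iifname "eth2" ct state established,related ... accept` line it returns True.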

After the patch:

vtysh -c "show bgp ipv6 unicast summary"
Neighbor State/PfxRcd
2400:88e0:ffff:258::2 Established 1
2400:88e0:ffff:258::3 Established 1
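
The before/after comparison could be automated along these lines. This sketch parses only the abbreviated two-column layout pasted in this description; full `vtysh` output has more columns and varies between FRR versions, so the parsing is an assumption and the function names are hypothetical.

```python
from typing import Dict


def neighbor_states(summary: str) -> Dict[str, str]:
    """Map neighbor address -> state for the abbreviated layout shown above."""
    states: Dict[str, str] = {}
    for line in summary.splitlines():
        fields = line.split()
        # Neighbor rows start with an IPv6 address, which always contains ':'.
        if len(fields) >= 2 and ":" in fields[0]:
            states[fields[0]] = fields[1]
    return states


def all_established(summary: str) -> bool:
    """True when at least one neighbor is listed and all are Established."""
    states = neighbor_states(summary)
    return bool(states) and all(state == "Established" for state in states.values())
```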

fw_input now includes the new rule with active counters:

iifname "eth2" ct state established,related counter packets ... bytes ... accept

Verified end-to-end: SSH from public IPv6 internet to a VM inside the v6-routed network succeeds. Reachability survives subsequent tenant firewall rule updates (the rule is rebuilt from nft_ipv6_fw on every IpTablesExecutor.process() cycle).

How did you try to break this feature and the system with this change?

  • Tenant firewall rule churn: added/removed tenant ingress rules via cmk createIpv6FirewallRule / deleteIpv6FirewallRule repeatedly after the patch. IpTablesExecutor.process() flushes and rebuilds the v6 table each time; the new INPUT rule is re-emitted on every cycle because it's now in nft_ipv6_fw. Counters resume; BGP stays Established.
  • VR reboot: rebooted the VR (cmk rebootRouter). After the reboot pulls fresh cloud-scripts.tgz, the patched CsAddress.py runs in the rebuilt VR and the rule is in place from boot. BGP establishes within ~30s of VR ready.
  • Non-routed networks: confirmed is_routed() gating means standard Isolated v4 networks and VPC networks see no new rules in either chain — no behaviour change for them.
  • Cross-account / cross-domain: verified the rule fires per-VR (each tenant network's VR gets its own rule with its own eth2 reference and per-VR counter), with no cross-tenant traffic leakage.
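
Why the rule survives rule churn while a hand-injected workaround does not can be shown with a small model of the flush-and-rebuild cycle. This is a sketch of the behaviour described above, not of `IpTablesExecutor.process()` itself; `rebuild_chain` and the list names are hypothetical.

```python
from typing import List


def rebuild_chain(source_rules: List[str]) -> List[str]:
    """Model one flush-and-rebuild cycle: the live chain becomes a copy of the source list."""
    return list(source_rules)


RULE = 'iifname "eth2" ct state established,related counter accept'

# Workaround: the rule is injected by hand into the live chain only.
source = ["icmpv6 type { ... } accept"]
live = rebuild_chain(source) + [RULE]
live = rebuild_chain(source)  # a tenant firewall change triggers a rebuild
print(RULE in live)           # False: the hand-injected rule is gone

# Patched behaviour: the rule is part of the source list.
source.append(RULE)
live = rebuild_chain(source)
live = rebuild_chain(source)  # further rebuilds keep re-emitting it
print(RULE in live)           # True
```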

Tested with both single-tenant and multi-tenant network deployments. Validated the change on ACS 4.22.0.0; by inspection, the same code path exists at HEAD of the 4.20 branch.

boring-cyborg Bot commented May 17, 2026

Congratulations on your first Pull Request and welcome to the Apache CloudStack community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/cloudstack/blob/main/CONTRIBUTING.md).
Here are some useful points:

@agronaught agronaught force-pushed the fix-13171-v6-fw-input-established-related branch from 0a11f6e to 992fbf5 Compare May 17, 2026 23:59
Contributor

Copilot AI left a comment

Copilot wasn't able to review any files in this pull request.



@weizhouapache weizhouapache added this to the 4.23.0 milestone May 18, 2026
@weizhouapache
Member

@agronaught
thanks for creating the PR
I will verify it

@agronaught
Author

any chance of also getting this backported into the next 4.22 release ?
another pull request ?

@weizhouapache
Member

> any chance of also getting this backported into the next 4.22 release ? another pull request ?

@agronaught
you can rebase your branch onto the 4.22 branch.
all 4.22 commits will be merged forward into the main branch

@agronaught
Author

agronaught commented May 18, 2026 via email

…ted connections

The systemvm Virtual Router's nftables `ip6 ip6_firewall fw_input`
chain is created with policy=drop and only ICMPv6 accept rules.
The IPv4 INPUT chain has the equivalent `iifname "eth2" ct state
established,related accept` rule (added by `fw_router_routing()`);
the IPv6 path has no such rule.

Effect: any v6 connection the VR itself initiates outbound (BGP
to upstream PE peers, NTP, DNS lookups, etc.) has its return
traffic silently dropped at the v6 INPUT hook before TCP processes
it. For Isolated v6 ROUTED networks this is fatal — BGP IPv6
sessions cannot establish, tenant /64 prefixes are never
advertised upstream, and VMs in the network are unreachable from
the IPv6 internet.

PR apache#10970 added the equivalent rule to the FORWARD chain only
(covering tenant VM return traffic). This commit adds the matching
rule to the INPUT chain (covering VR-originated return traffic) by
introducing `fw_router_routing_v6()` as the IPv6 mirror of
`fw_router_routing()`.

Verified end-to-end on ACS 4.22.0.0 KVM: before the patch, v6 BGP
sessions stay in `Connect` indefinitely; tcpdump confirms PE
responds with SYN-ACK but VR's TCP stack never sees the SYN-ACK
(MD5 counters zero — drop happens at netfilter). After the patch,
v6 BGP sessions reach `Established` within seconds and remain
stable across subsequent tenant firewall rule updates.

Fixes: apache#13171
Signed-off-by: Jason Ball <jball@resetdata.com>
@agronaught agronaught force-pushed the fix-13171-v6-fw-input-established-related branch from 992fbf5 to 8dcc070 Compare May 18, 2026 10:23
@agronaught agronaught changed the base branch from main to 4.22 May 18, 2026 10:28
@agronaught
Author

rebased to 4.22
thank you.

@weizhouapache
Member

@blueorangutan package

@blueorangutan

@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@codecov

codecov Bot commented May 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 18.08%. Comparing base (a289bb0) to head (8dcc070).

Additional details and impacted files
@@             Coverage Diff              @@
##               4.22   #13173      +/-   ##
============================================
+ Coverage     17.67%   18.08%   +0.40%     
- Complexity    15792    16717     +925     
============================================
  Files          5922     6037     +115     
  Lines        533123   542584    +9461     
  Branches      65201    66427    +1226     
============================================
+ Hits          94246    98133    +3887     
- Misses       428236   433423    +5187     
- Partials      10641    11028     +387     
Flag Coverage Δ
uitests 3.51% <ø> (-0.18%) ⬇️
unittests 19.25% <ø> (+0.49%) ⬆️


@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✖️ debian ✔️ suse15. SL-JID 17892


Development

Successfully merging this pull request may close these issues.

IPv6 ROUTED Filtered networks: VR's ip6_firewall fw_input chain missing 'ct state established,related accept' rule — IPv6 BGP cannot establish

4 participants