
IPv6 ROUTED Filtered networks: VR's ip6_firewall fw_input chain missing 'ct state established,related accept' rule — IPv6 BGP cannot establish #13171

@jball-resetdata

IPv6 BGP-routed Isolated network: missing ct state established,related INPUT rule on VR's IPv6 firewall

Summary

When creating a tenant network using an IPv6-only ROUTED + Filtered offering (internetprotocol=ipv6, networkmode=ROUTED, services including Firewall), the Virtual Router's nftables ip6 ip6_firewall fw_input chain has policy drop and only ICMPv6 accept rules. There is no ct state established,related accept rule on the public NIC.

Because the VR initiates BGP outbound to upstream PE peers, the return SYN-ACK is silently dropped at the v6 INPUT hook, before TCP's MD5 verification ever runs. BGP IPv6 sessions cannot reach Established.

The equivalent IPv4 INPUT chain on the same VR DOES have iifname "eth2" ct state related,established counter accept, and IPv4 BGP works correctly.

Environment

  • Apache CloudStack 4.22.0.0 (live install on staging mgmt host)
  • Source analysis cross-checked against 4.20 branch HEAD a7c2a05 — same bug visible in source on both branches
  • Hypervisor: KVM on Ubuntu 24.04
  • Hosts: 2-node staging cluster
  • VR systemvm template: ACS 4.20 stock
  • FRR on VR: 8.4.4
  • Network offering: IsolatedV6RoutedFiltered (internetprotocol=ipv6, routingmode=Dynamic, networkmode=ROUTED, services [UserData, Firewall, Dhcp, Dns], egressdefaultpolicy=true)
  • BGP peer ASN: 999999 (external)
  • ACS ASN range: 4200000001-4200000099 (32-bit private)
  • IPv6 guest prefix: /48
  • Reproduced on two independent VRs (r-276-VM ASN 4200000052, r-278-VM ASN 4200000081) — identical symptom, identical fix.

Steps to reproduce

  1. Configure zone with IPv6 BGP routing: ASN range, BGP peers (dual-stack), IPv6 guest prefix /48.
  2. Create a network offering matching the above shape, then enable it.
  3. createNetwork using the offering.
  4. Deploy a VM into the network — VR is provisioned.
  5. SSH into the VR via its link-local IP (port 3922, systemvm key from /root/.ssh/id_rsa.cloud).
  6. Check BGP state.

Expected

$ vtysh -c "show bgp ipv6 unicast summary"
Neighbor                         State/PfxRcd
2400:88e0:ffff:258::2  Established     1
2400:88e0:ffff:258::3  Established     1

VR advertises tenant /64 upstream; VMs in the network are reachable from the IPv6 internet.

Actual

$ vtysh -c "show bgp ipv6 unicast summary"
Neighbor                         State/PfxRcd
2400:88e0:ffff:258::2  Connect          0
2400:88e0:ffff:258::3  Connect          0

The IPv4 sessions on the SAME VR work normally:

10.25.12.2  Established  PfxRcd=1
10.25.12.3  Established  PfxRcd=1
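The comparison above can be automated when checking several VRs. The following is a minimal sketch that parses only the abbreviated neighbor/state layout quoted in this issue, not the full multi-column table that vtysh prints; the function name `non_established` is hypothetical.

```python
def non_established(summary_text):
    """Return {neighbor: state} for peers whose state is not Established.

    Assumes the abbreviated layout used in this issue: the neighbor
    address in the first column and the session state in the second.
    """
    stuck = {}
    for line in summary_text.splitlines():
        fields = line.split()
        # Skip blank lines, the shell prompt line, and the column header.
        if len(fields) < 2 or fields[0] in ("$", "Neighbor"):
            continue
        neighbor, state = fields[0], fields[1]
        if state != "Established":
            stuck[neighbor] = state
    return stuck


SAMPLE = """\
Neighbor                         State/PfxRcd
2400:88e0:ffff:258::2  Connect          0
2400:88e0:ffff:258::3  Connect          0
"""

print(non_established(SAMPLE))
```

An empty result means every listed peer is Established; here both IPv6 peers are reported as stuck in Connect.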

Diagnostic

Packet capture on the hypervisor's underlay (bond0, VLAN 258):

VR → PE: TCP SYN (port 179) with MD5
PE → VR: TCP SYN-ACK with MD5
VR → PE: TCP SYN retransmit (VR never sent ACK)
PE → VR: TCP SYN-ACK retransmit
... cycle repeats until VR's connect timeout ...

The PE responds correctly and the return packet reaches the VR's eth2, but the VR's nftables ruleset drops it before TCP processes it.

Inside the VR, the v6 firewall table:

$ nft list table ip6 ip6_firewall
table ip6 ip6_firewall {
    chain fw_input {
        type filter hook input priority filter; policy drop;
        icmpv6 type { echo-request, echo-reply, nd-router-advert,
                       nd-neighbor-solicit, nd-neighbor-advert } accept
    }
    chain fw_forward {
        type filter hook forward priority filter; policy accept;
        ct state established,related accept
        ip6 saddr <tenant-/64> jump fw_chain_egress
        ip6 daddr <tenant-/64> jump fw_chain_ingress
    }
    chain fw_chain_egress { counter accept }
    chain fw_chain_ingress {
        # tenant-configured ingress rules
        ip6 saddr ::/0 ip6 daddr ::/0 icmpv6 type { ... } accept
        ip6 saddr ::/0 ip6 daddr ::/0 tcp dport 22 accept
        counter drop
    }
}

For comparison, the IPv4 table on the same VR:

$ nft list table ip ip4_firewall
table ip ip4_firewall {
    chain INPUT {
        type filter hook input priority filter; policy drop;
        ...
        iifname "eth2" ct state established,related counter packets ... accept
        ...
    }
    ...
}

The IPv4 INPUT chain has the rule on eth2; the IPv6 fw_input chain does not.
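The same comparison can be scripted against saved `nft list table` output. This is a rough sketch, not part of CloudStack: the helper names `chain_body` and `accepts_established` are hypothetical, and the matching is plain text search rather than a real nftables parser.

```python
import re


def chain_body(nft_output, chain):
    """Return the text between the braces of `chain <name> { ... }`, or None."""
    m = re.search(r"\bchain\s+%s\s*\{" % re.escape(chain), nft_output)
    if m is None:
        return None
    depth, start = 1, m.end()
    # Track brace depth so nested sets like `icmpv6 type { ... }` are skipped.
    for i in range(start, len(nft_output)):
        if nft_output[i] == "{":
            depth += 1
        elif nft_output[i] == "}":
            depth -= 1
            if depth == 0:
                return nft_output[start:i]
    return None  # unbalanced braces


def accepts_established(nft_output, chain):
    """True if the chain has a rule accepting established/related traffic."""
    body = chain_body(nft_output, chain)
    if body is None:
        return False
    pattern = r"\bct state (established,related|related,established)\b"
    return any(re.search(pattern, line) and line.rstrip().endswith("accept")
               for line in body.splitlines())


V6_TABLE = """table ip6 ip6_firewall {
    chain fw_input {
        type filter hook input priority filter; policy drop;
        icmpv6 type { echo-request, echo-reply } accept
    }
    chain fw_forward {
        type filter hook forward priority filter; policy accept;
        ct state established,related accept
    }
}"""

print(accepts_established(V6_TABLE, "fw_input"))
print(accepts_established(V6_TABLE, "fw_forward"))
```

With the abridged table above, the check fails for fw_input and passes for fw_forward, which matches the listing in this issue.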

Kernel TCPMD5 counters are all zero, confirming that the drop happens before the TCP state machine, i.e., at netfilter.
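The issue does not say how the counters were read. One way, assuming they come from the `TcpExt` rows of /proc/net/netstat (where Linux exposes `TCPMD5NotFound`, `TCPMD5Unexpected` and `TCPMD5Failure`), is sketched below; the function name `md5_counters` is hypothetical and the sample text is abridged.

```python
def md5_counters(netstat_text):
    """Extract the TCPMD5* counters from /proc/net/netstat-style text.

    The file is laid out in line pairs: a row of counter names followed
    by a row of values, both starting with the same prefix (e.g. "TcpExt:").
    """
    lines = netstat_text.splitlines()
    counters = {}
    for header, values in zip(lines[0::2], lines[1::2]):
        names, nums = header.split(), values.split()
        # Ignore malformed pairs whose prefixes do not line up.
        if not names or not nums or names[0] != nums[0]:
            continue
        for name, num in zip(names[1:], nums[1:]):
            if name.startswith("TCPMD5"):
                counters[name] = int(num)
    return counters


# Abridged sample; a real file has many more columns per row.
SAMPLE = """\
TcpExt: SyncookiesSent TCPMD5NotFound TCPMD5Unexpected TCPMD5Failure
TcpExt: 0 0 0 0
IpExt: InNoRoutes InOctets
IpExt: 0 12345
"""

counters = md5_counters(SAMPLE)
print(counters, all(v == 0 for v in counters.values()))
```

On a VR the input would be the contents of /proc/net/netstat. All-zero values are consistent with the SYN-ACK never reaching the TCP layer.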

Source code root cause

In systemvm/debian/opt/cloud/bin/cs/CsAddress.py, fw_router_routing() writes the default INPUT and FORWARD rules for IPv4 only:

def fw_router_routing(self):
    if self.config.is_vpc() or not self.config.is_routed():
        return

    # Add default rules for INPUT chain
    self.nft_ipv4_fw.append({'type': "", 'chain': 'INPUT',
                             'rule': "iifname lo counter accept"})
    self.nft_ipv4_fw.append({'type': "", 'chain': 'INPUT',
                             'rule': "iifname eth2 ct state related,established counter accept"})  # <-- this rule
    # Add default rules for FORWARD chain
    self.nft_ipv4_fw.append({'type': "", 'chain': 'FORWARD',
               'rule': 'iifname "eth2" oifname "eth0" ct state related,established counter accept'})
    # ... more v4-only rules ...

There is no IPv6 equivalent of this function: nft_ipv6_fw is not appended to anywhere. The IPv6 firewall's INPUT chain default rules are entirely missing for ROUTED-mode Isolated networks.

CsNetfilter.py:add_ip6_chain() adds the ct state established,related accept rule only to FORWARD-hooked chains, not INPUT:

def add_ip6_chain(self, address_family, table, chain, hook, action):
    ...
    if hook == "input" or hook == "output":
        CsHelper.execute("nft add rule %s %s %s icmpv6 type { ... } accept" % ...)
    elif hook == "forward":
        CsHelper.execute("nft add rule %s %s %s ct state established,related accept" % ...)

So for v6 INPUT (fw_input chain), only ICMPv6 is allowed and the chain inherits policy drop. The return BGP traffic never matches anything → dropped.
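To make the failure mode concrete, here is a toy model of chain evaluation. It is not nftables and not CloudStack code; `Packet`, `verdict` and the rule helpers are hypothetical names, and the model only captures "first matching accept rule wins, otherwise the chain policy applies". It relies on the fact that conntrack classifies the reply to a locally initiated SYN as established.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    iifname: str   # interface the packet arrived on
    proto: str     # "tcp" or "icmpv6"
    ct_state: str  # "new", "established" or "related"


def icmpv6_accept(pkt):
    """Stand-in for the ICMPv6 accept rule present in fw_input today."""
    return pkt.proto == "icmpv6"


def established_on(dev):
    """Stand-in for `iifname <dev> ct state established,related accept`."""
    def rule(pkt):
        return pkt.iifname == dev and pkt.ct_state in ("established", "related")
    return rule


def verdict(rules, policy, pkt):
    """Accept on the first matching rule; otherwise fall back to the policy."""
    for rule in rules:
        if rule(pkt):
            return "accept"
    return policy


# The PE's SYN-ACK answering the VR's outbound BGP SYN.
syn_ack = Packet(iifname="eth2", proto="tcp", ct_state="established")

print(verdict([icmpv6_accept], "drop", syn_ack))
print(verdict([icmpv6_accept, established_on("eth2")], "drop", syn_ack))
```

The first call models the current fw_input chain and yields drop; the second adds the missing rule and yields accept.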

Reproduction confirmed across multiple VRs

Tested independently on two fresh VRs in two different tenant networks. Both showed:

  • IPv4 BGP works (Established)
  • IPv6 BGP stuck at Connect (PfxRcd=0)
  • Same fw_input chain layout with same missing rule
  • Same fix applies

Workaround

On the running VR, apply the missing rule and restart FRR:

nft 'add rule ip6 ip6_firewall fw_input iifname "eth2" ct state established,related counter accept'
systemctl restart frr

Within seconds, both IPv6 BGP sessions reach Established, tenant /64 is advertised, VMs become reachable from IPv6 internet. Verified end-to-end with SSH from public IPv6 internet to VM inside the v6-only routed network.

Caveat: the workaround is in-memory only. It is lost on:

  • VR reboot
  • Any subsequent cmk createIpv6FirewallRule / cmk deleteIpv6FirewallRule call (ACS regenerates the chain from its own config DB, wiping the manually-added rule)
  • Any other event that triggers a v6 firewall reconfiguration on the VR

Each tenant FW rule change wipes the workaround. The operator has to SSH back into the VR and re-apply the nft rule after every FW change. This makes the offering effectively unusable as a customer product without the upstream fix.
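Until a fix lands, re-applying the rule can at least be made idempotent. The sketch below is a stopgap under assumptions, not a supported mechanism: `ensure_rule` and `run_nft` are hypothetical names, the rule text is the one from the workaround above, and the check is a plain substring match on `nft list chain` output. It does not restart FRR.

```python
import subprocess

# Rule text taken from the workaround above.
RULE = 'iifname "eth2" ct state established,related counter accept'
CHAIN = ["ip6", "ip6_firewall", "fw_input"]


def run_nft(args):
    """Run nft with the given argument list and return its stdout."""
    result = subprocess.run(["nft"] + args, check=True,
                            capture_output=True, text=True)
    return result.stdout


def ensure_rule(run=run_nft):
    """Add the conntrack accept rule to fw_input unless it is already listed.

    `run` is injectable so the logic can be exercised without nft.
    Returns True if a rule was added, False if one was already present.
    """
    listing = run(["list", "chain"] + CHAIN)
    for line in listing.splitlines():
        if ('iifname "eth2"' in line
                and "ct state established,related" in line
                and line.rstrip().endswith("accept")):
            return False
    run(["add", "rule"] + CHAIN + RULE.split())
    return True
```

Calling `ensure_rule()` as root on the VR after each firewall change adds the rule only when it is missing, so repeated runs do not stack duplicates. The workaround above also restarts FRR, which remains a separate step.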

Proposed fix — VALIDATED on a live VR

Add a v6 equivalent of fw_router_routing() in systemvm/debian/opt/cloud/bin/cs/CsAddress.py plus expose nft_ipv6_fw on CsIP. nft_ipv6_fw already exists on CsConfig (line 43); we just need to plumb it through CsIP and write into it.

Three changes in CsAddress.py:

1. Add reference in CsIP.__init__ (around line 312):

         self.nft_ipv4_fw = config.get_nft_ipv4_fw()
         self.nft_ipv4_acl = config.get_nft_ipv4_acl()
+        self.nft_ipv6_fw = config.get_ipv6_fw()

2. Add new fw_router_routing_v6() method (immediately before fw_vpcrouter_routing at line 674):

def fw_router_routing_v6(self):
    if self.config.is_vpc() or not self.config.is_routed():
        return
    # IPv6 INPUT chain defaults — mirror of fw_router_routing() for v4.
    # Without these, return traffic for VR-initiated v6 connections (BGP etc) 
    # is silently dropped by the default-DROP policy.
    self.nft_ipv6_fw.append({'type': "", 'chain': 'fw_input',
                             'rule': "iifname lo counter accept"})
    self.nft_ipv6_fw.append({'type': "", 'chain': 'fw_input',
                             'rule': "iifname eth2 ct state established,related counter accept"})
    if self.get_type() in ["guest"]:
        self.nft_ipv6_fw.append({'type': "", 'chain': 'fw_input',
                                 'rule': "iifname %s ct state established,related counter accept" % self.dev})

3. Call it from CsIP.configure() (line 756-757):

         self.fw_router_routing()
         self.fw_vpcrouter_routing()
+        self.fw_router_routing_v6()

Note: eth2 is hardcoded matching the v4 convention (and PUBLIC_INTERFACES["router"] in CsHelper.py). A more robust fix could reference that constant.
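As a rough illustration of that more robust variant, the rule list could be built from the interface mapping instead of a literal. This is a standalone sketch, not a patch: `default_v6_input_rules` is a hypothetical helper, and the `PUBLIC_INTERFACES` dict here is a local stand-in containing only the "router" entry implied by the note above.

```python
# Local stand-in for the mapping the note above says lives in CsHelper.py.
# Only the "router" entry is taken from this issue.
PUBLIC_INTERFACES = {"router": "eth2"}


def default_v6_input_rules(router_type="router", guest_dev=None):
    """Build the default fw_input entries in the dict shape used by nft_ipv6_fw.

    The public interface comes from PUBLIC_INTERFACES rather than a
    hardcoded "eth2"; guest_dev, if given, gets the same conntrack rule.
    """
    public_dev = PUBLIC_INTERFACES[router_type]
    accept = "iifname %s ct state established,related counter accept"
    rules = [
        {'type': "", 'chain': 'fw_input', 'rule': "iifname lo counter accept"},
        {'type': "", 'chain': 'fw_input', 'rule': accept % public_dev},
    ]
    if guest_dev is not None:
        rules.append({'type': "", 'chain': 'fw_input',
                      'rule': accept % guest_dev})
    return rules


for entry in default_v6_input_rules(guest_dev="eth0"):
    print(entry['rule'])
```

Inside CsIP the returned entries would be appended to self.nft_ipv6_fw, with self.dev passed as guest_dev for guest interfaces.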

Validation

Applied this patch in-place on a running VR (r-278-VM, ACS 4.22.0.0) on 2026-05-16:

  1. Pre-patch: v6 BGP stuck in Connect; v6 fw_input chain had only ICMPv6 accept
  2. Patch applied; /opt/cloud/bin/configure.py cmd_line.json triggered re-process
  3. fw_input chain now includes iifname "eth2" ct state established,related counter accept
  4. v6 BGP sessions Established within seconds, PfxRcd=1, PfxSnt=2

Survival test (the key one): After patch, ran cmk createIpv6FirewallRule networkid=<net> traffictype=Ingress protocol=tcp startport=80 endport=80 — this pushes ipv6_firewall_rules.json to the VR and triggers the full IpTablesExecutor flush+rebuild path that previously wiped the manual nft workaround. After the FW change:

  • iifname "eth2" ct state established,related accept rule persists in fw_input (with active counters)
  • Both v6 BGP sessions still Established
  • End-to-end SSH from public IPv6 internet to VM in the network still works

This confirms the fix is correct and durable. The bug is in CsAddress.py / nft_ipv6_fw not being populated; the rest of the pipeline handles the v6 list correctly once it has content.

VPC equivalent

The same gap likely exists in the VPC routed path (fw_vpcrouter_routing at line 674). Not tested here (our setup is non-VPC Isolated) but worth a symmetric audit.

Affected versions

Verified on Apache CloudStack 4.22.0.0 (latest LTS at time of filing). PR #10970, which added the equivalent FORWARD-chain rule, is present and active in this build, but the INPUT-chain rule appears to have been removed in the PR's second commit ("Remove rule from input chain"), leaving this regression.

Affected versions (by code inspection + PR #10970 history):

Severity

High for anyone wanting to deploy IPv6-only ROUTED Isolated networks at scale. The feature appears to work (offering enables, network creates, VR provisions, BGP-v4 establishes) but tenant v6 traffic doesn't route because BGP-v6 silently fails. Diagnosis requires packet captures on the underlay — not obvious from the VR's own view.

Related

  • PR #10970 ("IPv6 firewall: accept packets from related and established connections"), which landed in 4.20.2 and 4.22.0.0, added the equivalent rule to the FORWARD chain only. This fixed the VM-return-traffic case (downloads, etc.) but did NOT add the rule to the INPUT chain, so return traffic for the VR's own outbound BGP sessions is still dropped. The PR discussion mentions a second commit, "Remove rule from input chain", suggesting that an earlier draft did add the INPUT rule but it was removed in review. The bug described here is the consequence of that removal: VR-originated v6 connections (BGP, but also NTP, DNS lookups, etc., that the systemvm itself initiates outbound) fail on the return.
  • IsolatedV6RoutedFiltered offering — affected
  • IsolatedV6RoutedOffering (no Firewall service) — not affected (no firewall service means no ip6_firewall table; v6 BGP works there because no nftables drop happens)
  • IPv4 ROUTED with same offering shape — works as expected (different code path: fw_router_routing() in CsAddress.py writes the INPUT iifname "eth2" ct state related,established rule for v4)
