Pod source creates CNAME records with empty target when PodIP is not yet assigned #6375

@dixneuf19

Description

What happened:

After upgrading external-dns (from a very old version) with the azure-private-dns provider and --source=pod, we observed a large number of Private DNS zones ending up with stale CNAME records with empty targets, which then caused a permanent error loop once the same pods were later assigned a PodIP.

When a pod carries the external-dns.alpha.kubernetes.io/internal-hostname annotation but is still in Pending state (not yet scheduled, no PodIP assigned), external-dns creates a CNAME record with an empty target ("") instead of skipping the pod.

In source/pod.go, addInternalHostnameAnnotationEndpoints calls endpoint.SuitableType(pod.Status.PodIP) without first checking whether PodIP is empty. Since "" is not a valid IP address, SuitableType returns CNAME, and an endpoint with an empty CNAME target is created.
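To illustrate why the empty string falls through to CNAME, here is a minimal standalone sketch that mirrors (but is not) the documented behavior of endpoint.SuitableType — a valid IP maps to A/AAAA, anything else to CNAME:

```go
package main

import (
	"fmt"
	"net"
)

// suitableType mirrors the documented behavior of endpoint.SuitableType:
// a parseable IP yields an A (or AAAA) record type, anything else a CNAME.
func suitableType(target string) string {
	if ip := net.ParseIP(target); ip != nil {
		if ip.To4() != nil {
			return "A"
		}
		return "AAAA"
	}
	return "CNAME"
}

func main() {
	fmt.Println(suitableType("10.0.0.10")) // A
	// An unassigned PodIP is the empty string, which is not a valid IP,
	// so the type-selection logic falls through to CNAME — the bug.
	fmt.Println(suitableType("")) // CNAME
}
```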

Once the pod gets scheduled and receives an IP, external-dns tries to create an A record at the same DNS name. The provider rejects this because A and CNAME records cannot coexist at the same name — Azure returns 409 Conflict with code CannotCreateRecordDueToCNameNamingRestriction. The error repeats on every reconciliation cycle (every 3 minutes) and the A record is never created until the stale CNAME record is manually deleted from the DNS zone.

The same bug exists in two other code paths in source/pod.go that also pass pod.Status.PodIP to SuitableType without an empty-IP guard:

  • addKopsDNSControllerEndpoints (kops-dns-controller compatibility mode)
  • addPodSourceDomainEndpoints (--pod-source-domain)

What you expected to happen:

Pods with an empty PodIP should be skipped when generating endpoints, consistent with the existing guard in hostsFromTemplate (source/pod.go line 261-264):

if address.IP == "" {
    log.Debugf("skipping pod %q. PodIP is empty with phase %q", pod.Name, pod.Status.Phase)
    continue
}

No CNAME record with an empty target should ever be created.

DNS records — actual vs. expected:

For a pod foo with annotation external-dns.alpha.kubernetes.io/internal-hostname: foo.example.internal:

Actual (buggy) behavior: while the pod is Pending, external-dns writes:

foo.example.internal.         CNAME   ""
cname-foo.example.internal.   TXT     "heritage=external-dns,external-dns/owner=example-cluster,external-dns/resource=pod/..."

Once the pod reaches Running with PodIP 10.0.0.10, external-dns tries to add foo.example.internal. A 10.0.0.10 but the empty-target CNAME at the same name blocks it, and the A record is never created.

Expected behavior — nothing is written while the pod is Pending. Once the pod reaches Running with PodIP 10.0.0.10:

foo.example.internal.         A     10.0.0.10
a-foo.example.internal.       TXT   "heritage=external-dns,external-dns/owner=example-cluster,external-dns/resource=pod/..."

No CNAME record is ever created.

How to reproduce it (as minimally and precisely as possible):

  1. Configure external-dns with --source=pod and an Azure Private DNS provider
  2. Create a pod with the annotation external-dns.alpha.kubernetes.io/internal-hostname: foo.example.com
  3. Ensure the pod stays in Pending state (e.g. unschedulable due to resource requests or node selector)
  4. Observe external-dns creating a CNAME record with an empty target for foo.example.com
  5. Let the pod become Running (normal scheduling once resources free up)
  6. Observe the permanent 409 Conflict error loop on every reconciliation cycle
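The Pending state in step 3 can be forced with an unsatisfiable resource request. A hypothetical repro manifest (pod name, image, and the oversized request are all illustrative, not taken from our environment):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pending-repro
  annotations:
    external-dns.alpha.kubernetes.io/internal-hostname: foo.example.com
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
    resources:
      requests:
        memory: "10000Gi"  # intentionally unsatisfiable so the pod stays Pending with no PodIP
```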

We see occurrences of this issue on ephemeral environments, where pods carrying the internal-hostname annotation can spawn but stay in Pending for a few minutes while new nodes are provisioned.

Alternatively, the bug can be reproduced with unit tests.

Adding the following table entries to TestPodSource in source/pod_test.go (each asserts that an empty endpoint list is produced) reproduces the bug on master — all three fail with expected 0 endpoints, got 1:

{
    "pending pod with empty PodIP and internal-hostname annotation should not create CNAME",
    "",
    "",
    false,
    "",
    []*endpoint.Endpoint{},
    false,
    nil,
    []*corev1.Pod{
        {
            ObjectMeta: metav1.ObjectMeta{
                Name:      "pending-pod",
                Namespace: "kube-system",
                Annotations: map[string]string{
                    annotations.InternalHostnameKey: "foo.example.com",
                },
            },
            Spec: corev1.PodSpec{
                HostNetwork: false,
            },
            Status: corev1.PodStatus{
                Phase: corev1.PodPending,
                PodIP: "",
            },
        },
    },
},
{
    "pending pod with empty PodIP and pod-source-domain should not create CNAME",
    "",
    "",
    false,
    "example.org",
    []*endpoint.Endpoint{},
    false,
    nil,
    []*corev1.Pod{
        {
            ObjectMeta: metav1.ObjectMeta{
                Name:      "pending-pod",
                Namespace: "kube-system",
            },
            Spec: corev1.PodSpec{HostNetwork: false},
            Status: corev1.PodStatus{
                Phase: corev1.PodPending,
                PodIP: "",
            },
        },
    },
},
{
    "pending pod with empty PodIP and kops-dns-controller annotation should not create CNAME",
    "",
    "kops-dns-controller",
    false,
    "",
    []*endpoint.Endpoint{},
    false,
    nil,
    []*corev1.Pod{
        {
            ObjectMeta: metav1.ObjectMeta{
                Name:      "pending-pod",
                Namespace: "kube-system",
                Annotations: map[string]string{
                    kopsDNSControllerInternalHostnameAnnotationKey: "foo.example.com",
                },
            },
            Spec: corev1.PodSpec{HostNetwork: false},
            Status: corev1.PodStatus{
                Phase: corev1.PodPending,
                PodIP: "",
            },
        },
    },
},

Run with:

go test ./source/ -run "TestPodSource/pending_pod" -v

Output on current master:

--- FAIL: TestPodSource (0.34s)
    --- FAIL: TestPodSource/pending_pod_with_empty_PodIP_and_internal-hostname_annotation_should_not_create_CNAME (0.13s)
        pod_test.go:861: expected 0 endpoints, got 1
    --- FAIL: TestPodSource/pending_pod_with_empty_PodIP_and_pod-source-domain_should_not_create_CNAME (0.10s)
        pod_test.go:861: expected 0 endpoints, got 1
    --- FAIL: TestPodSource/pending_pod_with_empty_PodIP_and_kops-dns-controller_annotation_should_not_create_CNAME (0.10s)
        pod_test.go:861: expected 0 endpoints, got 1

Each "got 1" endpoint is a CNAME with an empty target, confirming the bug.

External-dns deployment (live object from the API server, trimmed for brevity):

apiVersion: v1
kind: Pod
metadata:
  name: external-dns-xxxxxxxxxx-xxxxx
  namespace: external-dns
spec:
  containers:
  - args:
    - --log-level=info
    - --log-format=json
    - --interval=3m
    - --source=service
    - --source=pod
    - --policy=sync
    - --registry=txt
    - --txt-owner-id=example-cluster
    - --domain-filter=example.internal
    - --managed-record-types=A
    - --provider=azure-private-dns
    image: registry.k8s.io/external-dns/external-dns:v0.20.0 # latest version available through the chart; the bug is still present in the latest version
    name: external-dns
status:
  phase: Running
  podIP: 10.0.0.1

Logs (excerpt):

{"level":"info","msg":"Updating CNAME record named 'foo-1.primary' to '' for Azure Private DNS zone 'example.internal'."}
{"level":"info","msg":"Updating TXT record named 'cname-foo-1.primary' to '\"heritage=external-dns,...\"' for Azure Private DNS zone 'example.internal'."}
{"level":"info","msg":"Updating A record named 'foo-1.primary' to '10.0.0.10' for Azure Private DNS zone 'example.internal'.","time":"2026-04-16T16:42:39Z"}
{"level":"error","msg":"Failed to update A record named 'foo-1.primary' to '10.0.0.10' for Azure Private DNS zone 'example.internal': PUT https://management.azure.com/.../privateDnsZones/example.internal/A/foo-1.primary\nRESPONSE 409: 409 Conflict\nERROR CODE: Conflict\n{\n  \"code\": \"Conflict\",\n  \"message\": \"The record could not be created because a CNAME record with the same name already exists in this zone.\",\n  \"details\": [\n    {\n      \"code\": \"CannotCreateRecordDueToCNameNamingRestriction\",\n      \"message\": \"The record could not be created because a CNAME record with the same name already exists in this zone.\"\n    }\n  ]\n}\n","time":"2026-04-16T16:42:39Z"}

The same Updating A record / Failed to update A record pattern then repeats for ~150 distinct record names on every reconciliation cycle.

Note: --managed-record-types=A is our current workaround; it prevents external-dns from managing CNAMEs at all, so no new empty-target CNAMEs are created. The stale CNAMEs created before the workaround still exist in the zone and continue to conflict with A record creation until they are deleted manually.

Anything else we need to know?:

Possible fix: add the same empty-PodIP guard that hostsFromTemplate already uses, before each of the three SuitableType(pod.Status.PodIP) call sites in source/pod.go:

if pod.Status.PodIP == "" {
    log.Debugf("skipping pod %q: PodIP is empty with phase %q", pod.Name, pod.Status.Phase)
    continue  // or return, depending on the enclosing control flow
}
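A self-contained sketch of how this guard changes endpoint generation (the pod/ep types and endpointsFor helper are illustrative, not the actual external-dns code):

```go
package main

import (
	"fmt"
	"net"
)

type pod struct {
	Name, IP, Phase, Hostname string
}

type ep struct{ Name, Type, Target string }

// endpointsFor sketches endpoint generation with the proposed guard:
// pods without an assigned IP are skipped instead of letting the empty
// string fall through to an empty-target CNAME.
func endpointsFor(pods []pod) []ep {
	var out []ep
	for _, p := range pods {
		if p.Hostname == "" {
			continue
		}
		// The proposed guard, mirroring hostsFromTemplate.
		if p.IP == "" {
			fmt.Printf("skipping pod %q: PodIP is empty with phase %q\n", p.Name, p.Phase)
			continue
		}
		recType := "CNAME"
		if net.ParseIP(p.IP) != nil {
			recType = "A"
		}
		out = append(out, ep{p.Hostname, recType, p.IP})
	}
	return out
}

func main() {
	pods := []pod{
		{Name: "pending-pod", IP: "", Phase: "Pending", Hostname: "foo.example.internal"},
		{Name: "running-pod", IP: "10.0.0.10", Phase: "Running", Hostname: "bar.example.internal"},
	}
	for _, e := range endpointsFor(pods) {
		fmt.Printf("%s %s %s\n", e.Name, e.Type, e.Target)
	}
}
```

With the guard in place, the Pending pod produces no endpoint at all, so no empty-target CNAME ever reaches the provider.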

I have a PR ready that adds this check; I am holding off on opening it until the issue is confirmed as valid.


Also, full disclosure: I used AI to help confirm this bug and draft this issue. I did not find a specific guideline forbidding it, and I hope I have injected enough of my own review to keep it intelligible.

See also: #5277 (thematically related — SuitableType returning CNAME inappropriately).

Environment:

  • External-DNS version (use external-dns --version): v0.20.0 (a quick read of the code suggests the bug is still present in the latest version)
  • DNS provider: azure-private-dns
  • Kubernetes: AKS 1.33
  • Source: pod
  • Scale: affects every pod that stays Pending for long; 150+ pods in our case, though this depends on the environment

Checklist

  • I have searched existing issues and tried to find a fix myself
  • I am using the latest release, or have checked the staging image to confirm the bug is still reproducible
  • I have provided the actual process flags (not Helm values)
  • I have provided kubectl get <resource> -o yaml output including status
  • I have provided full external-dns debug logs
  • I have described what DNS records exist and what I expected
