Managed domains #4

Draft
siddhsuresh wants to merge 20 commits into main from managed-domains

No description provided.

siddhsuresh and others added 20 commits May 27, 2026 17:10
use_ravion_managed_domains toggle: ravion_certificate (shared_wildcard) issues
*.<name>-<hash>.<apex>; this module owns the public ALB HTTPS listener with that
cert as default (alb submodule skips its HTTPS listener + cert ARNs); opens SG
443. Outputs the wildcard fqdn + listener arn + cert arn + aws account/region
for ecs_service to nest under. Provider pinned ravion.com/ravion/ravion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cluster_parent_fqdn enables Ravion domains. Mode A (no domains): ravion_domain
auto-FQDN <name>.<cluster-apex> rides the cluster wildcard via a listener rule.
Mode B (domains): per-service ravion_certificate (<=10 SANs) attached to the
cluster listener + ravion_domain custom routing records; auto-FQDN retires once
customs are healthy (ravion_auto_domain_status). Skips caller listener rules in
Ravion mode. Outputs auto fqdn/url + custom cert arn.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
use_ravion_managed_domains: ravion_certificate (instance, target_arn = the
CloudFront distribution ARN, region us-east-1) covering custom domains or a
generated auto-FQDN; ravion_domain custom routing records (ALIAS to the
distribution, CloudFront zone Z2FDTNDATAQYW2). Configure var.distributions
without aliases/cert in this mode (Ravion sets them server-side).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The cluster only wired the Ravion wildcard cert + 443 HTTPS listener on the
public ALB, so private services (web-private / private-network-server) had no
Ravion-owned listener to attach to. Mirror the public wiring onto the private
ALB: one wildcard cert backs both listeners (a single ACM certificate ARN can
be the default certificate on many listeners).

ecs_cluster:
- enable_ravion_domain now triggers on public OR private ALB (was public-only);
  precondition requires at least one ALB instead of mandating the public one.
- Add aws_lb_listener.ravion_https_private on the private ALB (same cluster
  cert) + a private-ALB 443 ingress rule; private alb submodule now skips its
  own HTTPS listener/cert in Ravion mode, same as the public submodule.
- public/private_alb_https_listener_arn outputs surface the Ravion-owned
  listener when present (length()-guarded so a private-only cluster is valid).

ecs_service:
- Generalize cluster_https_listener_arn / cluster_alb_dns_name / cluster_alb_zone_id
  descriptions: pipe the public OR private ALB outputs per the service's
  visibility. The resources were already visibility-agnostic; only the docs
  hardcoded "public".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…iven id

The cluster wildcard FQDN used var.name (the project-environment slug, e.g.
testttsss-prod-modules), not the user-facing instance given id (elysia-ecs-cluster).
Declare module_instance_given_id (injected by the runner as
TF_VAR_module_instance_given_id) and default the cert leaf to it, so the
wildcard becomes <given-id>-<hash>.<ravion-apex>.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Match the ecs_cluster fix: the service auto-domain (<leaf>.<cluster-wildcard>)
used var.name (project-env slug) instead of the user-facing instance given id.
Declare module_instance_given_id (runner-injected) and default the leaf to it,
so the auto-FQDN becomes <given-id>.<cluster-apex>.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ravion_certificate.cluster now passes target_dns_name/target_zone_id
(public ALB if present, else private) so Ravion publishes a *.<apex>
ALIAS to the cluster ALB. Service auto-FQDNs riding the wildcard then
resolve with no per-service DNS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ollide

Fixed name + create_before_destroy = true means any attribute change that
forces TG replacement (e.g. container_port change) fails apply because the
new TG can't share the same name as the existing one. Switching to name_prefix
lets AWS allocate a unique suffix on each create.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Existing 'spa' and 'filesystem' modes append /index.html to extensionless
paths, which is wrong for object sites whose URLs ARE the S3 keys —
namely a Terraform provider registry, where the viewer requests
`/v1/providers/<ns>/<type>/versions` and must get back the literal JSON
file at that key, not `/<...>/versions/index.html`.

`raw` is a 1:1 viewer-URI → /<version>/<URI> mapping. KVS-driven
versioning still applies. Use for terraform registries, S3-like content
APIs, or any case where viewer URLs must equal S3 keys 1:1.
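
The difference between `raw` and the index-appending modes can be sketched in a few lines. This is an illustrative Python model only, not the edge function shipped in this commit: `rewrite_uri`, the `version` argument, and the simplified non-raw branch are assumptions, and the KVS lookup that selects the version is not modelled.

```python
def rewrite_uri(uri: str, version: str, mode: str) -> str:
    """Map a viewer URI to an S3 key under a version prefix (hypothetical helper)."""
    if mode == "raw":
        # 1:1 mapping: the viewer path is the S3 key; only the version is prepended.
        return f"/{version}{uri}"

    # Simplified stand-in for the index-appending behaviour described above:
    # a path whose last segment has no extension gets /index.html appended.
    last_segment = uri.rsplit("/", 1)[-1]
    if "." not in last_segment:
        uri = uri.rstrip("/") + "/index.html"
    return f"/{version}{uri}"


registry_path = "/v1/providers/ns/type/versions"
raw_key = rewrite_uri(registry_path, "v3", "raw")
spa_key = rewrite_uri(registry_path, "v3", "spa")
```

With these inputs `raw_key` keeps the literal registry path under the version prefix, while `spa_key` ends in `/versions/index.html`, which is the mismatch the commit avoids.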

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move all required_providers source references from
`providers.siddharthsuresh.dev/ravion/ravion` (cloudflared → local
registry on a laptop) to `provider-cf.siddharthsuresh.dev/ravion/ravion`
(CloudFront → S3, multi-version, KMS-signed). The local-cloudflared
path stays alive during the cutover so existing stacks pinning the old
hostname keep working until they're migrated or destroyed.

Migration note: existing stacks have provider addresses recorded in
their cloud-backend state under the old hostname. Before their next apply
they need a `terraform state replace-provider` pass, or the change must
ship as part of a destroy+recreate. New stacks pick up the new hostname
automatically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The runner's terraformrc network-mirror config is hardcoded for the
canonical hostname (providers.siddharthsuresh.dev). Pointing modules at
provider-cf.siddharthsuresh.dev triggers terraform's
"requires authentication credentials" error since the runner doesn't
emit a credentials/mirror block for that host. The proper migration is
stage B — keep the source unchanged and point the canonical hostname's
DNS at CloudFront — not a per-module source rewrite. Reverting `2f78c63`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ains state

ecs_cluster always owns the public/private HTTPS listener at a stable TF address
so toggling use_ravion_managed_domains is an in-place certificate swap, not a
destroy+create across two addresses. Only the default cert (Ravion wildcard vs
the customer's first ARN), the SNI cert set, and ravion_certificate.cluster
change on toggle.

- alb submodule: additive force_http_to_https_redirect keeps the HTTP->HTTPS
  redirect when a parent owns port 443; redirect deduped into locals
- new ravion_managed_domains_enabled output for service-level show/hide
- moved blocks for the in-root renames + submodule->root migration
- focused tests/listeners.tftest.hcl (8/8 pass)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…sification

Each domains entry is classified per-entry instead of all-or-nothing:
<leaf>.<apex> (one label under the cluster apex) rides the cluster wildcard
cert via SNI (no per-service cert, no DNS record); everything else gets one
per-service instance cert + a customer routing record. Empty list falls back
to the auto-FQDN <given-id>.<apex>. Removes the ravion_auto_domain_status
retirement flow — the domains list is the single source of truth.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Lowercase + strip trailing dot/whitespace + drop empties, and require a
non-empty leaf for the wildcard bucket, so mixed-case / trailing-dot /
empty-leaf entries classify correctly and never yield an invalid ALB host
header.
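
A rough model of that normalization, written in Python for illustration (the module itself does this in Terraform expressions; `normalize_domains` is a hypothetical name):

```python
def normalize_domains(domains):
    """Lowercase, strip whitespace and a trailing dot, and drop empty entries."""
    cleaned = []
    for entry in domains:
        # Trim whitespace, lowercase, then remove any trailing dot(s).
        name = entry.strip().lower().rstrip(".").strip()
        if name:  # entries that normalize to "" are dropped
            cleaned.append(name)
    return cleaned


normalized = normalize_domains([" App.Example.COM. ", "", ".", "api.example.com"])
```

Here `normalized` keeps only `app.example.com` and `api.example.com`; the empty string and the lone dot are discarded before classification.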

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Domains under the cluster apex that aren't a single-label <leaf>.<apex>
(the bare apex, or names more than one label deep) can't ride the
*.<apex> wildcard cert and can't be satisfied with a customer record
(the record would live in the Ravion-managed zone). Add an
invalid_apex_domains local + a lifecycle.precondition on
ravion_certificate.svc that fails the plan with the offending entries
and the fix, instead of silently mis-routing them into a per-service
cert + an unwritable routing record. Pairs with the server-side
RejectCustomDomainUnderApex backstop for direct-API callers.
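
The resulting three-way split can be sketched as follows. This is an illustrative Python model, not the module's HCL: `classify` and the example hostnames are hypothetical, and entries are assumed to be already lowercased with any trailing dot removed.

```python
def classify(domains, apex):
    """Split domains into (wildcard, custom, invalid) relative to the cluster apex."""
    wildcard, custom, invalid = [], [], []
    suffix = "." + apex
    for name in domains:
        if name == apex:
            # The bare apex is not covered by *.<apex>.
            invalid.append(name)
        elif name.endswith(suffix):
            leaf = name[: -len(suffix)]
            if leaf and "." not in leaf:
                # Exactly one non-empty label under the apex: rides the wildcard cert.
                wildcard.append(name)
            else:
                # Empty leaf or more than one label deep: cannot be served.
                invalid.append(name)
        else:
            # Outside the Ravion-managed zone: per-service cert + customer record.
            custom.append(name)
    return wildcard, custom, invalid


apex = "demo-1a2b.ravion.example"  # hypothetical cluster apex
wildcard, custom, invalid = classify(
    [
        "api.demo-1a2b.ravion.example",
        "demo-1a2b.ravion.example",
        "a.b.demo-1a2b.ravion.example",
        "shop.customer.example",
    ],
    apex,
)
```

A non-empty `invalid` list corresponds to the case where the precondition fails the plan and reports the offending entries.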

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- B3: mock the ravion provider in the pre-existing ecs_cluster + ecs_service
  basic.tftest.hcl so the suites stop aborting on "Missing Ravion API key"
  (declaring the provider configures it even when all ravion resources are
  count=0 on the BYO path).
- M13: split the ravion listener rule's host headers into chunks of <=5 values
  (AWS ALB's per-rule condition-value quota), each chunk with its own priority.
- M14: keep the rolling target group's pre-branch stable name
  substr(var.name,0,28)+"-tg" instead of name_prefix — avoids the one-time
  ForceNew that deadlocks against the listener rule's ignore_changes=[action],
  and stays within ALB's 32-char TG-name limit.
- #39: gate the ravion listener rule (and cert/domain) on enable_load_balancer
  so it can't be created with a null target_group_arn.
- #40: widen the auto-derived rule-priority hash entropy to cut collisions on
  the shared cluster listener.
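
For M13, the chunking can be pictured like this. A minimal Python sketch under assumptions: `chunk_host_headers` is a hypothetical helper, and giving the chunks consecutive priorities from a base value is an illustrative choice, not necessarily how the module derives them.

```python
def chunk_host_headers(hosts, base_priority, chunk_size=5):
    """Split host headers into rules of at most chunk_size values each."""
    rules = []
    for start in range(0, len(hosts), chunk_size):
        rules.append(
            {
                # Assumption: each chunk takes the next priority after the base.
                "priority": base_priority + start // chunk_size,
                "host_headers": hosts[start : start + chunk_size],
            }
        )
    return rules


hosts = [f"svc{i}.example.com" for i in range(12)]
rules = chunk_host_headers(hosts, base_priority=100)
```

Twelve hosts yield three rules holding 5, 5, and 2 host headers, each with a distinct priority on the shared listener.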

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…in-place rotation

Rotating the shared_wildcard cert (any RequiresReplace change, e.g. a renamed
apex) destroyed the old cert before swapping the listener, hitting ACM
ResourceInUse and deadlocking. create_before_destroy issues the new cert and
swaps it onto the listener in-place before deleting the old one.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>