From 59f299b25a765b221239792688f40fefe4bbf166 Mon Sep 17 00:00:00 2001
From: Jon Froehlich <jonf@cs.uw.edu>
Date: Wed, 17 Jun 2026 15:28:52 -0700
Subject: [PATCH 1/2] docs(deployment): document sitemap + Google Search
 Console workflow (#1313)

Add a "Search Engine Indexing" subsection to DEPLOYMENT.md covering the
dynamic, DB-generated sitemap (nothing to regenerate per content change),
a copy-paste prod health check, the one-time Google Search Console
registration/verification steps, and the (near-zero) ongoing cadence.
Also note that production pages carry no X-Robots-Tag, with a verify
command. Builds on the routing notes from #1252.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 docs/DEPLOYMENT.md | 32 +++++++++++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)
diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index cec49723..b8477ed7 100644
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -17,6 +17,7 @@ This document covers the Makeability Lab website's production infrastructure, de
     - [Configuration File](#configuration-file)
     - [Environment Variables](#environment-variables)
     - [Static Files vs. Dynamic Requests (Apache routing)](#static-files-vs-dynamic-requests-apache-routing)
+    - [Search Engine Indexing (Sitemap & Google Search Console)](#search-engine-indexing-sitemap--google-search-console)
   - [Debugging \& Logging](#debugging--logging)
     - [Log Files](#log-files)
     - [Accessing Logs via Web](#accessing-logs-via-web)
@@ -135,7 +136,36 @@ On both servers, Apache sits in front of the Django container. It serves any URL
 - **`/robots.txt` is a static file** — it is the top-level [`robots.txt`](../robots.txt) committed in the repo root, served by Apache from the project checkout. To change crawler rules or the advertised sitemap, edit that file and deploy. A Django view/route for `/robots.txt` would be dead code on the servers (it only runs under local `runserver`, which diverges from production).
 - **`/sitemap.xml` is dynamic** — no such file exists, so Apache proxies it to Django's `django.contrib.sitemaps` (see `website/sitemaps.py`), which builds the XML from the database on each request.
 - **Django sees requests as HTTP, not HTTPS.** Apache terminates TLS and proxies to Django over plain HTTP, so `request.scheme` is `http`. Any code that builds absolute URLs from the request (e.g. the sitemap) must force `https` explicitly — the sitemaps do this via `protocol = "https"`.
-- **The test server is never indexed.** Apache stamps `X-Robots-Tag: noindex, nofollow` on every response from the test host, so staging stays out of search engines regardless of its `robots.txt`.
+- **The test server is never indexed.** Apache stamps `X-Robots-Tag: noindex, nofollow` on every response from the test host, so staging stays out of search engines regardless of its `robots.txt`. (Production pages carry no such header — verify with `curl -sI https://makeabilitylab.cs.washington.edu/ | grep -i x-robots-tag`, which should return nothing.)
+
+### Search Engine Indexing (Sitemap & Google Search Console)
+
+The production sitemap is **dynamically generated from the database** (`website/sitemaps.py`, served at [`/sitemap.xml`](https://makeabilitylab.cs.washington.edu/sitemap.xml)) and advertised in the repo-root [`robots.txt`](../robots.txt). New people, news items, publications, projects, etc. appear in it **automatically** — there is nothing to regenerate or re-upload when content changes. (Sitemap/robots work landed in #1252; the related prod-deploy stall it surfaced is #1313.)
+
+**Quick health check** (anytime — all should be true):
+
+```bash
+curl -sI https://makeabilitylab.cs.washington.edu/sitemap.xml | head -1   # 200, served by Django (WSGIServer)
+curl -s  https://makeabilitylab.cs.washington.edu/robots.txt              # allow-all + a "Sitemap:" line
+curl -s  https://makeabilitylab.cs.washington.edu/sitemap.xml | grep -c '<loc>'  # count of URLs (~700+)
+```
+
+The `X-Robots-Tag: noindex` that the sitemap *file* returns is intentional and harmless — it keeps the XML out of search results without affecting the URLs listed inside.
+
+#### One-time: register the sitemap with Google Search Console
+
+You only do this once per property (not per content change):
+
+1. Go to [Google Search Console](https://search.google.com/search-console) → **Add property** → **URL prefix** (not *Domain* — that needs a DNS record we can't add for `cs.washington.edu`).
+2. Enter `https://makeabilitylab.cs.washington.edu/` exactly.
+3. **Verify ownership.** Easiest if it works: the **Google Analytics** method (prod already serves an Analytics snippet). Otherwise use the **HTML file** method — commit Google's `google<token>.html` to the **repo root** (Apache serves it statically, exactly like `robots.txt`) and ship it to prod with a SemVer tag, then click *Verify*. **Leave the verification asset (GA snippet or HTML file) in place permanently** — removing it un-verifies the property.
+4. In the left sidebar → **Sitemaps** → enter `sitemap.xml` → **Submit**. Status moves to *Success* once Google fetches it.
+
+#### Ongoing maintenance: essentially none
+
+- **Per new person / news item / publication: do nothing.** The dynamic sitemap updates itself and Google re-crawls `/sitemap.xml` on its own schedule (days–weeks).
+- **Re-submit only if** the sitemap URL changes or you restructure the site's URL scheme.
+- **Optional:** glance at Search Console's *Pages* (indexing) report ~quarterly for crawl errors, or use *URL Inspection → Request Indexing* to fast-track an important new page.
 
 ## Debugging & Logging
 

From 2a05bc6908cd68de77bb1b74efa71871c259c84f Mon Sep 17 00:00:00 2001
From: Jon Froehlich <jonf@cs.uw.edu>
Date: Wed, 17 Jun 2026 15:42:30 -0700
Subject: [PATCH 2/2] docs(deployment): record sitemap already
 registered/verified in Search Console (#1313)

The prod site is already a verified URL-prefix property in Google Search
Console (verified via the pre-existing Google Analytics property), and the
sitemap was submitted 2026-06-17. Add a "Current status" note so nobody
re-runs verification, and reframe the registration steps as re-setup-only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 docs/DEPLOYMENT.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index b8477ed7..fb085d6d 100644
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -152,7 +152,11 @@ curl -s  https://makeabilitylab.cs.washington.edu/sitemap.xml | grep -c '<loc>'
 
 The `X-Robots-Tag: noindex` that the sitemap *file* returns is intentional and harmless — it keeps the XML out of search results without affecting the URLs listed inside.
 
-#### One-time: register the sitemap with Google Search Console
+#### Current status: already registered & verified
+
+The production site is **already a verified property** in Google Search Console (`https://makeabilitylab.cs.washington.edu/`, URL-prefix), with the `sitemap.xml` **submitted on 2026-06-17** ("Sitemap submitted successfully — Google will periodically process it and look for changes"). Ownership was verified via the site's **pre-existing Google Analytics property** (the Analytics snippet served on every page), so no verification file lives in the repo. **You do not need to re-do any of the steps below** under normal operation — see "Ongoing maintenance" for what little there is. The steps are retained only for re-setup (e.g. registering a new property or recovering after the Search Console / Analytics account access is lost).
+
+#### Setting up from scratch (only if re-registering)
 
 You only do this once per property (not per content change):