Check Django migrations into Git (stop generating them at runtime) #1317


`website/migrations/` is **gitignored** and **zero migration files are tracked in
Git**. Instead, every environment — each laptop, CI, the test server, and the
production server — generates its *own* migration history at container start by
running `makemigrations` followed by `migrate` in `docker-entrypoint.sh`.

This means our database schema has **no single source of truth in version
control**. Each environment's `website/migrations/` directory is a private,
divergent history that lives only on that machine's disk. This is contrary to
Django's design and to standard practice, and it has already cost us real
debugging time and accreted a set of workarounds.

This issue documents the problem, why it breaks best practices, and proposes a
path to make committed migrations the source of truth.

## Background / how we got here

- Migrations were removed from the repo on **2016-05-17** (commit `f149094`,
  "Removed migrations from repo," referencing the old issue
  `jonfroehlich/makeabilitylabwebsite#30`). They have stayed gitignored ever
  since: roughly **a decade**, ~29 contributors, and 2,200+ commits. The
  project itself predates this repo (it goes back to ~2013 at UMD), so the
  schema has a very long, multi-contributor lineage.
- `.gitignore` currently contains:
  ```
  website/migrations/
  website/migrations_old/
  website/migrations/__pycache__
  ```
- `docker-entrypoint.sh` runs, on **every container start**:
  ```
  python manage.py makemigrations
  python manage.py migrate
  python manage.py makemigrations website   # repeated on purpose ("fixes first-run issues")
  python manage.py migrate website
  ```
  Both `docker-compose.yml` (test/prod) and `docker-compose-local-dev.yml`
  bind-mount the project root to `/code` (`.:/code`), so the migrations Django
  auto-generates on a server are written into *that server's* checkout and
  persist there — but they are never committed, and there is no reason to expect
  them to match what any other environment generated.
- Locally there are currently 35 auto-generated migrations (`0001_initial.py` …
  `0035_project_is_visible_…py`) plus a leftover `website/migrations_old/`
  directory with a parallel, partly-`.not_used` history. None of this is in Git.

## Why this breaks best practices

Django migrations are **schema-as-code**: they are meant to be committed,
code-reviewed, and replayed deterministically so that every environment converges
on the same schema. Generating them at runtime defeats all of that:

1. **No single source of truth for the schema.** The "real" schema is whatever
   has accreted on each machine. There is no artifact in the repo that answers
   "what is the current database shape and how did it get there?"
2. **Schema drift between environments.** Because `makemigrations` runs
   independently on each host, laptops/CI/test/prod can produce migrations with
   different names, ordering, or contents. This is the direct cause of the
   intermittent **`column "..." already exists`** failures when building a fresh
   test DB (issue #1267).
3. **Schema changes bypass code review.** A model change's *data-layer
   consequence* (the generated DDL) is never seen in a PR. Reviewers can't catch a
   destructive or ambiguous migration (e.g. a column rename Django guesses wrong,
   or a `RemoveField` that silently drops data) before it hits a server.
4. **Non-reproducible builds.** A fresh checkout cannot reconstruct the schema
   from the repo alone; it depends on `makemigrations` making the same guesses
   today that it made historically. Auto-named migrations such as
   `0007_auto_20151221_1218` embed their generation timestamp, so two hosts
   that detect the same model change at different times do not even agree on
   the file name.
5. **No safe data migrations or rollbacks.** Real schema changes often need
   paired data backfills, and Django expresses those as migration operations
   (`RunPython`). Without committed migrations there's nowhere to put them and no
   reliable `migrate <app> <number>` to roll back to.
6. **`makemigrations` at runtime on a server is itself an anti-pattern.** Servers
   should only ever *apply* reviewed migrations (`migrate`), never *invent* them.
   Auto-generating schema changes on production start-up means a model edit can
   silently alter the prod schema with no review gate.
7. **This is especially dangerous given our prod access model.** Per CLAUDE.md,
   the maintainer has **no shell/SSH access to the prod Docker host and no direct
   DB connection** — everything must run inside the container via the entrypoint.
   So if prod schema drifts or a runtime `makemigrations` does something
   unexpected, we have very limited ability to inspect or repair it. We are
   trusting an unreviewed, auto-generated process on a system we can't directly
   reach.

## Workarounds this has already forced

Several pieces of code exist only to compensate for the missing committed
migrations, which is itself evidence of the cost:

- **`makeabilitylab/settings_test.py`** sets `MIGRATION_MODULES = {"website":
  None}` so tests build tables directly from the models (`run_syncdb`) instead of
  replaying the gitignored history. This is the documented durable fix for the
  `column already exists` flakiness (#1267). Tests literally cannot trust the
  migration history, so they bypass it.
- **`fix_sortedm2m_columns`** management command runs raw `ALTER TABLE` at every
  startup to add `sort_value` columns that "may have been migrated before the
  field was changed to `SortedManyToManyField`" — i.e. patching schema drift by
  hand because the migrations can't be relied on.
- **`backfill_num_pages`** and **`backfill_project_visibility`** are data
  backfills wired into the entrypoint instead of being expressed as `RunPython`
  data migrations.
- The **repeated `makemigrations website`** in the entrypoint ("often fixes some
  first-time run issues") is a symptom of the same fragility.
- A stale **`website/migrations_old/`** tree with `.not_used` / `.old_not_used`
  files is the archaeological residue of past manual history surgery.

## Proposed path forward

The goal: **committed migrations become the source of truth, and servers only
apply them.** Because we have long-lived prod/test databases with real data and
limited prod access, the delicate part is *baselining* the existing databases
onto a fresh committed history without rebuilding tables.

Suggested phased approach (to be validated with someone comfortable with Django
migrations before touching prod):

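1. **Establish a clean baseline migration.**
   - From the current models, generate a single fresh `0001_initial` (and any
     necessary follow-ups) that exactly represents today's schema, and commit
     it. This is a regenerated baseline rather than a `squashmigrations` run,
     since there is no committed history to squash.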
   - Before trusting it, **request a prod DB schema snapshot from CSE IT** (the
     sanctioned way to inspect prod data per CLAUDE.md) and diff it against what
     the models/baseline produce, so we know prod actually matches the models and
     there's no hidden drift.
2. **Un-ignore migrations.** Remove the `website/migrations/` and
   `website/migrations_old/` entries from `.gitignore` (keep the `__pycache__`
   entry), then delete the dead `migrations_old/` tree.
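3. **Fake-apply the baseline on existing environments.** For databases whose
   tables already match the baseline, record it as applied without re-running DDL:
   `python manage.py migrate website --fake-initial` (or `migrate website 0001
   --fake`). Note that `--fake-initial` only checks that the tables exist by
   name, not that their columns match, so it is safe only once the schema diff
   in step 1 comes back clean. Each existing database also already holds rows in
   `django_migrations` for its old per-host files, and those need to be
   reconciled as part of this step.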
4. **Stop generating migrations on servers.** Remove `makemigrations` /
   `makemigrations website` from `docker-entrypoint.sh`; keep only `migrate`.
   Going forward, migrations are authored locally, reviewed in a PR, committed,
   and merely *applied* on deploy.
5. **Add CI drift detection.** Run `python manage.py makemigrations --check
   --dry-run` in `.github/workflows/test.yml` so a model change without a
   committed migration fails CI. (We can then likely drop the
   `MIGRATION_MODULES = {"website": None}` test shim, or keep it as belt-and-
   suspenders.)
6. **Fold the workarounds back into real migrations** where appropriate: convert
   `backfill_num_pages` / `backfill_project_visibility` into `RunPython` data
   migrations; once the schema is reliably reproduced, retire
   `fix_sortedm2m_columns` and the duplicated `makemigrations` step.
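
Step 3 hinges on Django's bookkeeping table, `django_migrations`, which holds one row per applied migration (`app`, `name`, `applied`). Faking a migration only writes such a row and runs no DDL. Before adopting the baseline, it may help to list which recorded names no longer exist on disk. The following is a minimal sketch against an in-memory SQLite stand-in; the helper name `stale_migration_rows` and the sample rows are hypothetical, and a real check would have to run against each environment's actual database:

```python
import sqlite3


def stale_migration_rows(conn, app, names_on_disk):
    """Return recorded migration names for `app` that have no file on disk."""
    rows = conn.execute(
        "SELECT name FROM django_migrations WHERE app = ? ORDER BY name",
        (app,),
    ).fetchall()
    on_disk = set(names_on_disk)
    return [name for (name,) in rows if name not in on_disk]


# In-memory stand-in for a server database (sample rows are made up).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE django_migrations ("
    "id INTEGER PRIMARY KEY, app TEXT, name TEXT, applied TEXT)"
)
conn.executemany(
    "INSERT INTO django_migrations (app, name, applied) VALUES (?, ?, ?)",
    [
        ("website", "0001_initial", "2016-05-17"),
        ("website", "0002_auto_20160601_0900", "2016-06-01"),
        ("auth", "0001_initial", "2016-05-17"),
    ],
)

# Assume only the new committed baseline exists on disk.
stale = stale_migration_rows(conn, "website", ["0001_initial"])
print(stale)  # ['0002_auto_20160601_0900']
```

In this sample the host has already recorded `website.0001_initial`, so Django would treat a new baseline of the same name as applied whether or not its contents match the old file. That is one more reason to verify the schema independently. Django 4.1 and later also provide `migrate --prune` for deleting rows whose migration files no longer exist; whether that fits here depends on the Django version the project runs.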

### Open questions / risks

- **Reconciling divergent existing histories.** Prod, test, and local have each
  accreted their own migration files. The `--fake-initial` baseline only works
  cleanly if their *table shapes* already agree with the new baseline; we need to
  confirm that (hence the schema-diff step) before deploying.
- **No direct prod access** makes step 3 a one-shot, entrypoint-driven operation
  we can only verify through logs — it needs to be scripted carefully and tested
  on the test server first.
- **Decide the fate of `migrations_old/`** and the `.not_used` files (almost
  certainly safe to delete, but worth a deliberate call).
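
The schema-diff check mentioned in the first bullet could be as simple as comparing two table-to-columns mappings: one derived from the CSE IT snapshot and one from a scratch database built from the baseline. A minimal sketch, assuming both have already been reduced to plain dicts by a parsing step that is not shown; `diff_schemas` and the sample table are hypothetical:

```python
def diff_schemas(expected, actual):
    """Compare two {table: {column: type}} mappings and list every mismatch."""
    problems = []
    for table in sorted(set(expected) | set(actual)):
        if table not in actual:
            problems.append(f"missing table: {table}")
            continue
        if table not in expected:
            problems.append(f"unexpected table: {table}")
            continue
        exp_cols, act_cols = expected[table], actual[table]
        for col in sorted(set(exp_cols) | set(act_cols)):
            if col not in act_cols:
                problems.append(f"{table}: missing column {col}")
            elif col not in exp_cols:
                problems.append(f"{table}: unexpected column {col}")
            elif exp_cols[col] != act_cols[col]:
                problems.append(
                    f"{table}.{col}: type {act_cols[col]} != {exp_cols[col]}"
                )
    return problems


# Sample data only; real input would come from the snapshot and the baseline.
baseline = {"website_project": {"id": "integer", "is_visible": "boolean"}}
snapshot = {"website_project": {"id": "integer"}}

for line in diff_schemas(baseline, snapshot):
    print(line)  # website_project: missing column is_visible
```

An empty result for every environment would be the precondition for fake-applying the baseline. Any reported mismatch would need a real follow-up migration, or a corrected baseline, first.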

## Why now

We're actively cleaning up tech debt and just stood up CI + a tests-first
workflow. Committed migrations are a prerequisite for trustworthy CI of
schema-affecting changes and remove a whole class of "works on my machine /
breaks on the server" failures. It's also far easier to baseline a decade-old
schema deliberately now than after the next contributor's model change quietly
diverges prod again.

---

*Drafted from a study of `.gitignore`, `docker-entrypoint.sh`,
`makeabilitylab/settings_test.py`, the `website/management/commands/` workarounds,
the compose files, and the git history of commit `f149094`.*

