Check Django migrations into Git (stop generating them at runtime) #1317

@jonfroehlich

website/migrations/ is gitignored and zero migration files are tracked in Git.
Instead, every environment (each laptop, CI, the test server, and the
production server) generates its own migration history at container start by
running makemigrations followed by migrate in docker-entrypoint.sh.

This means our database schema has no single source of truth in version
control. Each environment's website/migrations/ directory is a private,
divergent history that lives only on that machine's disk. This is contrary to
Django's design and to standard practice, and it has already cost us real
debugging time and accreted a set of workarounds.

This issue documents the problem, why it breaks best practices, and proposes a
path to make committed migrations the source of truth.

Background / how we got here

  • Migrations were removed from the repo on 2016-05-17 (commit f149094,
    "Removed migrations from repo," referencing the old issue
    jonfroehlich/makeabilitylabwebsite#30). They have been gitignored ever
    since — roughly a decade and ~29 contributors / 2,200+ commits later. The
    project itself predates this repo (it goes back to ~2013 at UMD), so the
    schema has a very long, multi-contributor lineage.
  • .gitignore currently contains:
    website/migrations/
    website/migrations_old/
    website/migrations/__pycache__
    
  • docker-entrypoint.sh runs, on every container start:
    python manage.py makemigrations
    python manage.py migrate
    python manage.py makemigrations website   # repeated on purpose ("fixes first-run issues")
    python manage.py migrate website
    
    Both docker-compose.yml (test/prod) and docker-compose-local-dev.yml
    bind-mount the project root to /code (.:/code), so the migrations Django
    auto-generates on a server are written into that server's checkout and
    persist there — but they are never committed, and there is no reason to expect
    them to match what any other environment generated.
  • Locally there are currently 35 auto-generated migrations (0001_initial.py
    through 0035_project_is_visible_…py) plus a leftover website/migrations_old/
    directory with a parallel, partly-.not_used history. None of this is in Git.

Why this breaks best practices

Django migrations are schema-as-code: they are meant to be committed,
code-reviewed, and replayed deterministically so that every environment converges
on the same schema. Generating them at runtime defeats all of that:

  1. No single source of truth for the schema. The "real" schema is whatever
    has accreted on each machine. There is no artifact in the repo that answers
    "what is the current database shape and how did it get there?"
  2. Schema drift between environments. Because makemigrations runs
    independently on each host, laptops/CI/test/prod can produce migrations with
    different names, ordering, or contents. This is the direct cause of the
    intermittent column "..." already exists failures when building a fresh
    test DB (issue #1267, "Need much better testing infracture on local host").
  3. Schema changes bypass code review. A model change's data-layer
    consequence (the generated DDL) is never seen in a PR. Reviewers can't catch a
    destructive or ambiguous migration (e.g. a column rename Django guesses wrong,
    or a RemoveField that silently drops data) before it hits a server.
  4. Non-reproducible builds. A fresh checkout cannot reconstruct the schema
    from the repo alone; it depends on makemigrations making the same guesses
    today that it made historically. Auto-named migrations
    (0007_auto_20151221_1218) are exactly the ambiguous-history case Django
    warns about.
  5. No safe data migrations or rollbacks. Real schema changes often need
    paired data backfills, and Django expresses those as migration operations
    (RunPython). Without committed migrations there's nowhere to put them and no
    reliable migrate <app> <number> to roll back to.
  6. makemigrations at runtime on a server is itself an anti-pattern. Servers
    should only ever apply reviewed migrations (migrate), never invent them.
    Auto-generating schema changes on production start-up means a model edit can
    silently alter the prod schema with no review gate.
  7. This is especially dangerous given our prod access model. Per CLAUDE.md,
    the maintainer has no shell/SSH access to the prod Docker host and no direct
    DB connection; everything must run inside the container via the entrypoint.
    So if prod schema drifts or a runtime makemigrations does something
    unexpected, we have very limited ability to inspect or repair it. We are
    trusting an unreviewed, auto-generated process on a system we can't directly
    reach.

Workarounds this has already forced

Several pieces of code exist only to compensate for the absence of committed
migrations, which is evidence of the cost:

  • makeabilitylab/settings_test.py sets MIGRATION_MODULES = {"website": None}
    so tests build tables directly from the models (run_syncdb) instead of
    replaying the gitignored history. This is the documented durable fix for the
    column already exists flakiness (issue #1267, "Need much better testing
    infracture on local host"). Tests literally cannot trust the migration
    history, so they bypass it.
  • fix_sortedm2m_columns management command runs raw ALTER TABLE at every
    startup to add sort_value columns that "may have been migrated before the
    field was changed to SortedManyToManyField" — i.e. patching schema drift by
    hand because the migrations can't be relied on.
  • backfill_num_pages and backfill_project_visibility are data
    backfills wired into the entrypoint instead of being expressed as RunPython
    data migrations.
  • The repeated makemigrations website in the entrypoint ("often fixes some
    first-time run issues") is a symptom of the same fragility.
  • A stale website/migrations_old/ tree with .not_used / .old_not_used
    files is the archaeological residue of past manual history surgery.

Proposed path forward

The goal: committed migrations become the source of truth, and servers only
apply them. Because we have long-lived prod/test databases with real data and
limited prod access, the delicate part is baselining the existing databases
onto a fresh committed history without rebuilding tables.

Suggested phased approach (to be validated with someone comfortable with Django
migrations before touching prod):

  1. Establish a clean baseline migration.
    • From the current models, generate a single squashed 0001_initial (and any
      necessary follow-ups) that exactly represents today's schema. Commit it.
    • Before trusting it, request a prod DB schema snapshot from CSE IT (the
      sanctioned way to inspect prod data per CLAUDE.md) and diff it against what
      the models/baseline produce, so we know prod actually matches the models and
      there's no hidden drift.
  2. Un-ignore migrations. Remove the website/migrations/ and
    website/migrations_old/ entries from .gitignore (keep the __pycache__
    entry), then delete the dead migrations_old/ tree.
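  3. Fake-apply the baseline on existing environments. For databases whose
    tables already match the baseline, record it as applied without re-running DDL:
    python manage.py migrate website --fake-initial (or migrate website 0001 --fake).
    This is the standard way to adopt a new history on a populated DB, but
    --fake-initial only checks that the tables exist by name, not that their
    columns match, so it is safe only after the schema diff in step 1 comes back
    clean.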
  4. Stop generating migrations on servers. Remove makemigrations /
    makemigrations website from docker-entrypoint.sh; keep only migrate.
    Going forward, migrations are authored locally, reviewed in a PR, committed,
    and merely applied on deploy.
  5. Add CI drift detection. Run python manage.py makemigrations --check --dry-run
    in .github/workflows/test.yml so a model change without a committed migration
    fails CI. (We can then likely drop the MIGRATION_MODULES = {"website": None}
    test shim, or keep it as belt-and-suspenders.)
  6. Fold the workarounds back into real migrations where appropriate: convert
    backfill_num_pages / backfill_project_visibility into RunPython data
    migrations; once the schema is reliably reproduced, retire
    fix_sortedm2m_columns and the duplicated makemigrations step.
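
The schema comparison in step 1 could be prototyped along these lines. This is only a sketch under stated assumptions: it uses in-memory SQLite databases as stand-ins so it runs anywhere, whereas the real comparison would read the CSE IT snapshot and a freshly migrated database on whatever backend prod uses. The helper names (snapshot_schema, diff_schemas) and the sample website_project table are hypothetical, not taken from the codebase.

```python
import sqlite3


def snapshot_schema(conn):
    """Return {table_name: {column_name: declared_type}} for a SQLite connection."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = {col[1]: col[2].upper() for col in columns}
    return schema


def diff_schemas(expected, actual):
    """List human-readable differences between two schema snapshots."""
    problems = []
    for table in sorted(set(expected) | set(actual)):
        if table not in actual:
            problems.append(f"missing table: {table}")
            continue
        if table not in expected:
            problems.append(f"unexpected table: {table}")
            continue
        exp_cols, act_cols = expected[table], actual[table]
        for column in sorted(set(exp_cols) | set(act_cols)):
            if column not in act_cols:
                problems.append(f"missing column: {table}.{column}")
            elif column not in exp_cols:
                problems.append(f"unexpected column: {table}.{column}")
            elif exp_cols[column] != act_cols[column]:
                problems.append(
                    f"type mismatch: {table}.{column} "
                    f"({exp_cols[column]} vs {act_cols[column]})"
                )
    return problems


# Hypothetical stand-in for the schema the committed baseline would create.
baseline = sqlite3.connect(":memory:")
baseline.execute(
    "CREATE TABLE website_project (id INTEGER PRIMARY KEY, name TEXT, is_visible BOOL)"
)

# Hypothetical stand-in for an existing database where one column was never added.
existing = sqlite3.connect(":memory:")
existing.execute("CREATE TABLE website_project (id INTEGER PRIMARY KEY, name TEXT)")

drift = diff_schemas(snapshot_schema(baseline), snapshot_schema(existing))
print(drift)
```

An empty list would be the signal that fake-applying the baseline is reasonable for that environment; any entry means the drift has to be repaired or encoded in a follow-up migration first.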

Open questions / risks

  • Reconciling divergent existing histories. Prod, test, and local have each
    accreted their own migration files. The --fake-initial baseline only works
    cleanly if their table shapes already agree with the new baseline; we need to
    confirm that (hence the schema-diff step) before deploying.
  • No direct prod access makes step 3 a one-shot, entrypoint-driven operation
    we can only verify through logs — it needs to be scripted carefully and tested
    on the test server first.
  • Decide the fate of migrations_old/ and the .not_used files (almost
    certainly safe to delete, but worth a deliberate call).
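
One way to gauge how far an environment's recorded history diverges from the committed files is to compare the rows in Django's django_migrations bookkeeping table with the migration names in the repo. The following is a minimal sketch, assuming a SQLite stand-in table so it runs without the project. The function name compare_history, the sample rows, and the sample migration names are all made up for illustration; on prod a check like this would have to run inside the container via the entrypoint and report through logs.

```python
import sqlite3


def compare_history(conn, app_label, committed_names):
    """Compare migrations recorded as applied with the names committed in Git.

    Returns (applied_but_not_committed, committed_but_not_applied), both sorted.
    """
    rows = conn.execute(
        "SELECT name FROM django_migrations WHERE app = ?", (app_label,)
    )
    applied = {row[0] for row in rows}
    committed = set(committed_names)
    return sorted(applied - committed), sorted(committed - applied)


# Build a stand-in for the table Django uses to record applied migrations.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE django_migrations ("
    "id INTEGER PRIMARY KEY, app TEXT, name TEXT, applied TEXT)"
)
# Made-up sample rows; real environments will each have their own names.
conn.executemany(
    "INSERT INTO django_migrations (app, name, applied) VALUES (?, ?, ?)",
    [
        ("website", "0001_initial", "2020-01-01"),
        ("website", "0002_hypothetical_local_change", "2020-01-02"),
        ("auth", "0001_initial", "2020-01-01"),
    ],
)

# Hypothetical committed history: only the new baseline exists in the repo.
stale, pending = compare_history(conn, "website", ["0001_initial"])
print("applied but not in repo:", stale)
print("in repo but not applied:", pending)
```

Stale rows matter because Django identifies an applied migration by its app label and name: an old 0001_initial row would make a new baseline with the same name look already applied, whether or not the tables actually match it.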

Why now

We're actively cleaning up tech debt and just stood up CI + a tests-first
workflow. Committed migrations are a prerequisite for trustworthy CI of
schema-affecting changes and remove a whole class of "works on my machine /
breaks on the server" failures. It's also far easier to baseline a decade-old
schema deliberately now than after the next contributor's model change quietly
diverges prod again.


Drafted from a study of .gitignore, docker-entrypoint.sh,
makeabilitylab/settings_test.py, the website/management/commands/ workarounds,
the compose files, and the git history of commit f149094.
