website/migrations/ is gitignored and zero migration files are tracked in
Git. Instead, every environment — each laptop, CI, the test server, and the
production server — generates its own migration history at container start by
running makemigrations followed by migrate in docker-entrypoint.sh.
This means our database schema has no single source of truth in version
control. Each environment's website/migrations/ directory is a private,
divergent history that lives only on that machine's disk. This is contrary to
Django's design and to standard practice, and it has already cost us real
debugging time and accreted a set of workarounds.
This issue documents the problem, why it breaks best practices, and proposes a
path to make committed migrations the source of truth.
Background / how we got here
Migrations were removed from the repo on 2016-05-17 (commit f149094,
"Removed migrations from repo," referencing the old issue
jonfroehlich/makeabilitylabwebsite#30). They have been gitignored ever since:
roughly a decade, ~29 contributors, and 2,200+ commits. The
project itself predates this repo (it goes back to ~2013 at UMD), so the
schema has a very long, multi-contributor lineage.
Both docker-compose.yml (test/prod) and docker-compose-local-dev.yml
bind-mount the project root to /code (.:/code), so the migrations Django
auto-generates on a server are written into that server's checkout and
persist there — but they are never committed, and there is no reason to expect
them to match what any other environment generated.
Locally there are currently 35 auto-generated migrations (0001_initial.py …
0035_project_is_visible_…py) plus a leftover website/migrations_old/ directory
holding a parallel history, some of it in files marked .not_used. None of this
is in Git.
Why this breaks best practices
Django migrations are schema-as-code: they are meant to be committed,
code-reviewed, and replayed deterministically so that every environment converges
on the same schema. Generating them at runtime defeats all of that:
No single source of truth for the schema. The "real" schema is whatever
has accreted on each machine. There is no artifact in the repo that answers
"what is the current database shape and how did it get there?"
Schema drift between environments. Because makemigrations runs
independently on each host, laptops/CI/test/prod can produce migrations with
different names, ordering, or contents. This is the direct cause of the
intermittent column "..." already exists failures when building a fresh
test DB (issue #1267, "Need much better testing infracture on local host").
Schema changes bypass code review. A model change's data-layer
consequence (the generated DDL) is never seen in a PR. Reviewers can't catch a
destructive or ambiguous migration (e.g. a column rename Django guesses wrong,
or a RemoveField that silently drops data) before it hits a server.
Non-reproducible builds. A fresh checkout cannot reconstruct the schema
from the repo alone; it depends on makemigrations making the same guesses
today that it made historically. Auto-named migrations
(0007_auto_20151221_1218) make this worse: the name says nothing about what
the migration changes, so such a history is hard to audit or compare.
No safe data migrations or rollbacks. Real schema changes often need
paired data backfills, and Django expresses those as migration operations
(RunPython). Without committed migrations there's nowhere to put them and no
reliable migrate <app> <number> to roll back to.
makemigrations at runtime on a server is itself an anti-pattern. Servers
should only ever apply reviewed migrations (migrate), never invent them.
Auto-generating schema changes on production start-up means a model edit can
silently alter the prod schema with no review gate.
This is especially dangerous given our prod access model. Per CLAUDE.md,
the maintainer has no shell/SSH access to the prod Docker host and no direct
DB connection — everything must run inside the container via the entrypoint.
So if prod schema drifts or a runtime makemigrations does something
unexpected, we have very limited ability to inspect or repair it. We are
trusting an unreviewed, auto-generated process on a system we can't directly
reach.
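The drift failure mode described above can be reproduced in miniature. The sketch below uses SQLite from the Python standard library as a stand-in for the project's real database, and the table name, column name, and both "histories" are hypothetical, chosen only to mirror the kind of change named in 0035_project_is_visible_…py. It shows what happens when a database built from one machine's history is handed the next step from another machine's history.

```python
import sqlite3


def replay(conn, steps):
    """Apply DDL statements in order, the way a migration history is replayed."""
    for ddl in steps:
        conn.execute(ddl)


# Hypothetical history from machine A: the column is part of the initial table.
history_a = [
    "CREATE TABLE website_project "
    "(id INTEGER PRIMARY KEY, name TEXT, is_visible INTEGER)",
]

# Hypothetical history from machine B: the same column arrives in a later step.
history_b = [
    "CREATE TABLE website_project (id INTEGER PRIMARY KEY, name TEXT)",
    "ALTER TABLE website_project ADD COLUMN is_visible INTEGER",
]

conn = sqlite3.connect(":memory:")
replay(conn, history_a)  # database built from machine A's history

try:
    replay(conn, history_b[1:])  # machine B's "next" step is then applied
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

print(error)
```

Both histories describe the same final schema, yet mixing them fails because the column is added twice. With one committed history, every database replays the same steps in the same order and this class of error cannot arise from divergence.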
Workarounds this has already forced
The absence of committed migrations is load-bearing for several pieces of code —
evidence of the cost:
makeabilitylab/settings_test.py sets MIGRATION_MODULES = {"website": None} so
tests build tables directly from the models (run_syncdb) instead of
replaying the gitignored history. This is the documented durable fix for the
column already exists flakiness (issue #1267, "Need much better testing
infracture on local host"). Tests literally cannot trust the
migration history, so they bypass it.
fix_sortedm2m_columns management command runs raw ALTER TABLE at every
startup to add sort_value columns that "may have been migrated before the
field was changed to SortedManyToManyField" — i.e. patching schema drift by
hand because the migrations can't be relied on.
backfill_num_pages and backfill_project_visibility are data
backfills wired into the entrypoint instead of being expressed as RunPython
data migrations.
The repeated makemigrations website in the entrypoint ("often fixes some
first-time run issues") is a symptom of the same fragility.
A stale website/migrations_old/ tree with .not_used / .old_not_used
files is the archaeological residue of past manual history surgery.
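For the backfill workarounds, the shape of a RunPython data migration can be sketched as follows. This is only an illustration under assumptions: the model name Project, the field is_visible, and the rule "rows with no value become visible" are hypothetical and are not taken from the real backfill_project_visibility command. The Django wiring is shown as a comment so the function itself stays importable without Django.

```python
def backfill_project_visibility(apps, schema_editor):
    """Forward step with the (apps, schema_editor) signature RunPython calls."""
    # apps.get_model returns the historical model for this point in the
    # migration history; a migration should not import website.models directly.
    Project = apps.get_model("website", "Project")  # hypothetical model name

    # Only touch rows that still lack a value, so running it twice is harmless.
    # (Hypothetical rule: unset rows are treated as visible.)
    return Project.objects.filter(is_visible__isnull=True).update(is_visible=True)


# In a committed migration file this would be wired up roughly as:
#
#   from django.db import migrations
#
#   class Migration(migrations.Migration):
#       dependencies = [("website", "0001_initial")]
#       operations = [
#           migrations.RunPython(backfill_project_visibility,
#                                migrations.RunPython.noop),
#       ]
```

Expressed this way, the backfill runs exactly once per database, in a known position relative to the schema change it depends on, instead of on every container start.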
Proposed path forward
The goal: committed migrations become the source of truth, and servers only
apply them. Because we have long-lived prod/test databases with real data and
limited prod access, the delicate part is baselining the existing databases
onto a fresh committed history without rebuilding tables.
Suggested phased approach (to be validated with someone comfortable with Django
migrations before touching prod):
1. Establish a clean baseline migration.
   From the current models, generate a single consolidated 0001_initial (and
   any necessary follow-ups) that exactly represents today's schema. Commit it.
   Before trusting it, request a prod DB schema snapshot from CSE IT (the
   sanctioned way to inspect prod data per CLAUDE.md) and diff it against what
   the models/baseline produce, so we know prod actually matches the models and
   there's no hidden drift.
2. Un-ignore migrations. Remove website/migrations/ (keep __pycache__ ignored)
   and website/migrations_old/ from .gitignore; delete the dead migrations_old/
   tree.
3. Fake-apply the baseline on existing environments. For databases whose
   tables already match the baseline, record it as applied without re-running
   DDL: python manage.py migrate website --fake-initial (or migrate website
   0001 --fake). Note that --fake-initial only checks that tables with the
   expected names exist; it does not compare the rest of the schema, so this is
   safe only once the schema diff in step 1 confirms the database matches the
   baseline.
4. Stop generating migrations on servers. Remove makemigrations /
   makemigrations website from docker-entrypoint.sh; keep only migrate.
   Going forward, migrations are authored locally, reviewed in a PR, committed,
   and merely applied on deploy.
5. Add CI drift detection. Run python manage.py makemigrations --check
   --dry-run in .github/workflows/test.yml so a model change without a
   committed migration fails CI. (We can then likely drop the
   MIGRATION_MODULES = {"website": None} test shim, or keep it as
   belt-and-suspenders.)
6. Fold the workarounds back into real migrations where appropriate: convert
   backfill_num_pages / backfill_project_visibility into RunPython data
   migrations; once the schema is reliably reproduced, retire
   fix_sortedm2m_columns and the duplicated makemigrations step.
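The CI drift check can be sketched as a small wrapper around the makemigrations --check --dry-run command named above. The function name and the injectable run parameter are hypothetical, added only so the logic can be exercised without a Django project; the workflow could equally invoke the command directly and rely on its exit code.

```python
import subprocess
import sys


def migrations_are_committed(run=subprocess.run):
    """Return True when makemigrations finds nothing left to generate."""
    result = run(
        [sys.executable, "manage.py", "makemigrations", "--check", "--dry-run"],
        capture_output=True,
        text=True,
    )
    # With --check, makemigrations exits non-zero when the models have changes
    # that no migration file covers. Any other failure to run the command also
    # yields a non-zero code, which is treated as a failed check here.
    if result.returncode != 0:
        print(result.stdout + result.stderr)
    return result.returncode == 0
```

A CI step would call migrations_are_committed() from the directory containing manage.py and fail the job when it returns False, so a model edit without its migration never reaches a server.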
Open questions / risks
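Reconciling divergent existing histories. Prod, test, and local have each
accreted their own migration files, and each database's django_migrations
table already records that environment's old migration names for the website
app. The --fake-initial baseline only works cleanly if their table shapes
already agree with the new baseline, and a new migration whose name matches
an old recorded row (such as 0001_initial) will look already applied there.
We need to confirm the table shapes (hence the schema-diff step) and decide
how to handle the stale rows before deploying.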
No direct prod access makes the fake-apply step (step 3) a one-shot,
entrypoint-driven operation we can only verify through logs, so it needs to be
scripted carefully and tested on the test server first.
Decide the fate of migrations_old/ and the .not_used files (almost
certainly safe to delete, but worth a deliberate call).
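The schema-diff check mentioned in the first open question could start as a column-level comparison like the sketch below. It is a stand-in under assumptions: it introspects SQLite through the standard library, whereas the real comparison would be run against whatever snapshot CSE IT provides, and the expected mapping, table name, and column names are hypothetical. It compares column names only, not types or constraints.

```python
import sqlite3


def live_columns(conn, table):
    """Return the column names SQLite reports for `table` (empty if absent)."""
    # The table name is interpolated directly, so pass trusted names only.
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}


def schema_drift(conn, expected):
    """Map each drifting table to (missing, unexpected) column-name sets."""
    drift = {}
    for table, columns in expected.items():
        actual = live_columns(conn, table)
        missing = set(columns) - actual
        unexpected = actual - set(columns)
        if missing or unexpected:
            drift[table] = (missing, unexpected)
    return drift


# Hypothetical live database that lacks a column the baseline expects.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE website_project (id INTEGER PRIMARY KEY, name TEXT)")

# Hypothetical columns the committed baseline would create.
expected = {"website_project": {"id", "name", "is_visible"}}

print(schema_drift(conn, expected))
```

An empty result is the precondition for recording the baseline as applied; anything else means the database needs a real, reviewed migration first rather than a faked one.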
Why now
We're actively cleaning up tech debt and just stood up CI + a tests-first
workflow. Committed migrations are a prerequisite for trustworthy CI of
schema-affecting changes and remove a whole class of "works on my machine /
breaks on the server" failures. It's also far easier to baseline a decade-old
schema deliberately now than after the next contributor's model change quietly
diverges prod again.
Drafted from a study of .gitignore, docker-entrypoint.sh, makeabilitylab/settings_test.py, the website/management/commands/ workarounds,
the compose files, and the git history of commit f149094.