fix(auth): cascade user deletion across all owned data on PostgreSQL #9702

Open
localai-bot wants to merge 2 commits into master from fix/distributed-user-delete-postgres

Summary

In distributed mode the auth DB is PostgreSQL, which strictly enforces foreign keys. Deleting a user from the admin UI returned "Failed to delete user: user not found" even when the user clearly existed.

Three issues:

  1. invite_codes.created_by / used_by reference users(id), but the InviteCode model declares the FKs without ON DELETE CASCADE. With NO ACTION in effect, PostgreSQL therefore rejected the user delete whenever the user had ever issued or consumed an invite. On SQLite (single-node default) FKs are not enforced, so the bug never showed up there.
  2. The handler ignored result.Error and only checked RowsAffected, so the FK violation surfaced as a misleading 404.
  3. Several owned tables were never cleaned up regardless of dialect: user_permissions and quota_rules relied on CASCADE that doesn't fire under SQLite, and usage_records have no FK at all and were left orphaned in every dialect.

Fix

Introduce auth.DeleteUserCascade running the full cleanup in a single transaction:

  • drop invites authored by the user,
  • NULL used_by on invites they consumed (preserves the audit trail),
  • explicitly wipe sessions, API keys, permissions, quota rules, and usage metrics,
  • delete the user, and
  • invalidate the in-memory quota cache after commit.
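
The ordering above can be sketched with the standard library alone. This is a minimal illustration, not the PR's implementation: the real helper is auth.DeleteUserCascade, and the statements below are hand-written SQL for readability. The table names sessions and api_keys, every user_id column, the `?` placeholder style, and the errUserNotFound sentinel are assumptions; invite_codes, user_permissions, quota_rules, usage_records and users(id) come from the description.

```go
package main

import (
	"database/sql"
	"errors"
	"fmt"
)

// errUserNotFound is a hypothetical sentinel standing in for the
// ErrRecordNotFound that the specs expect for a missing user.
var errUserNotFound = errors.New("user not found")

// execer is the subset of *sql.Tx this sketch needs, so a fake can
// stand in for a real transaction.
type execer interface {
	Exec(query string, args ...any) (sql.Result, error)
}

// cleanupSteps runs before the user row is deleted, so nothing still
// references users(id) when the final DELETE executes.
// Assumed names: sessions, api_keys, and all user_id columns.
var cleanupSteps = []string{
	"DELETE FROM invite_codes WHERE created_by = ?",
	"UPDATE invite_codes SET used_by = NULL WHERE used_by = ?",
	"DELETE FROM sessions WHERE user_id = ?",
	"DELETE FROM api_keys WHERE user_id = ?",
	"DELETE FROM user_permissions WHERE user_id = ?",
	"DELETE FROM quota_rules WHERE user_id = ?",
	"DELETE FROM usage_records WHERE user_id = ?",
}

const deleteUserStmt = "DELETE FROM users WHERE id = ?"

// deleteUserCascade issues every cleanup statement and then the user
// delete. The caller owns Begin, Commit and Rollback.
func deleteUserCascade(tx execer, userID string) error {
	for _, stmt := range cleanupSteps {
		if _, err := tx.Exec(stmt, userID); err != nil {
			return fmt.Errorf("cleanup %q: %w", stmt, err)
		}
	}
	res, err := tx.Exec(deleteUserStmt, userID)
	if err != nil {
		return fmt.Errorf("delete user: %w", err)
	}
	n, err := res.RowsAffected()
	if err != nil {
		return fmt.Errorf("rows affected: %w", err)
	}
	if n == 0 {
		return errUserNotFound
	}
	return nil
}

// fakeResult and recorder form an in-memory fake used only to show
// the statement order without a database driver.
type fakeResult int64

func (r fakeResult) LastInsertId() (int64, error) { return 0, nil }
func (r fakeResult) RowsAffected() (int64, error) { return int64(r), nil }

type recorder struct {
	queries  []string
	userRows int64 // rows the fake reports for the final user delete
}

func (r *recorder) Exec(query string, args ...any) (sql.Result, error) {
	r.queries = append(r.queries, query)
	if query == deleteUserStmt {
		return fakeResult(r.userRows), nil
	}
	return fakeResult(0), nil
}

func main() {
	rec := &recorder{userRows: 1}
	if err := deleteUserCascade(rec, "user-1"); err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, q := range rec.queries {
		fmt.Printf("%d. %s\n", i+1, q)
	}
}
```

With a real database, a *sql.Tx satisfies execer directly. The caller would roll back on any returned error, including errUserNotFound, and invalidate the quota cache only after Commit succeeds, which matches the last bullet above.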

The HTTP handler now maps the helper's errors to proper status codes — real failures surface as 500 with the cause instead of being swallowed as 404.
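
To make the status-code change concrete, here is a small sketch contrasting the two checks. It is illustrative only: deleteResult mimics the Error and RowsAffected fields the description refers to, while statusBefore, statusAfter, both sentinel errors, and the 200 success code are hypothetical names and values chosen for this example, not the handler's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Hypothetical sentinels: one for a genuinely missing user, one
// standing in for a foreign-key violation reported by the database.
var (
	errUserNotFound = errors.New("user not found")
	errFKViolation  = errors.New("violates foreign key constraint")
)

// deleteResult mimics the two result fields involved in the bug.
type deleteResult struct {
	Err          error
	RowsAffected int64
}

// statusBefore reproduces the described bug: the error is never
// inspected, so a rejected DELETE (zero rows affected) looks like a
// missing user.
func statusBefore(res deleteResult) int {
	if res.RowsAffected == 0 {
		return http.StatusNotFound
	}
	return http.StatusOK
}

// statusAfter inspects the error first. Only the not-found sentinel
// maps to 404; every other failure is reported as 500.
func statusAfter(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, errUserNotFound):
		return http.StatusNotFound
	default:
		return http.StatusInternalServerError
	}
}

func main() {
	// A rejected DELETE: the database returns an error and touches no rows.
	rejected := deleteResult{Err: fmt.Errorf("delete user: %w", errFKViolation)}
	fmt.Println("before:", statusBefore(rejected))    // before: 404
	fmt.Println("after:", statusAfter(rejected.Err)) // after: 500
}
```

Using errors.Is rather than a direct comparison keeps the 404 path working even when the helper wraps the sentinel with extra context.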

Test plan

  • New Ginkgo specs in core/http/auth/users_test.go cover: ErrRecordNotFound on missing user, invite-author cleanup, used_by null-out (audit row preserved), full data wipe (sessions / api keys / permissions / quotas / usage metrics), and the FK-enforced original failure mode via PRAGMA foreign_keys=ON to mirror PostgreSQL behavior on SQLite.
  • Verified red→green by temporarily reverting the production code: 4 of 5 specs failed without the fix, all 5 pass with it.
  • HTTP-level regression specs added in core/http/routes/auth_test.go (build-clean, but the routes package currently has an unrelated build break on master from "feat: support word-level timestamps for faster-whisper" #9621: TranscriptSegment.Words is in the .proto but not in the generated pb.go).
  • Full auth package suite (99 specs) passes.
  • Validate end-to-end against the live distributed cluster after a rebuild + redeploy.

Refs the user-not-found bug observed on the distributed PostgreSQL auth deployment.

mudler and others added 2 commits May 6, 2026 23:06
Deleting a user from the admin UI in distributed mode (PostgreSQL auth
DB) returned "user not found" even when the user clearly existed. The
old handler ignored result.Error and only checked RowsAffected, so a
foreign-key constraint violation surfaced as a misleading 404.

Two issues drove this:

1. invite_codes.created_by / used_by reference users(id) but the
   InviteCode model declared the FKs without ON DELETE CASCADE. On
   PostgreSQL the engine therefore rejected the user delete with NO
   ACTION whenever the user had ever issued or consumed an invite. On
   SQLite (default in single-node mode) FKs are not enforced, so the
   bug never appeared there.
2. Several owned tables were never cleaned up regardless of dialect:
   user_permissions and quota_rules relied on CASCADE that does not
   fire under SQLite, and usage_records have no FK at all and were
   left orphaned in every dialect.

Introduce auth.DeleteUserCascade which runs the full cleanup in a
single transaction: drop invites authored by the user, NULL used_by on
invites they consumed (preserves the audit trail), and explicitly wipe
sessions, API keys, permissions, quota rules, and usage metrics before
deleting the user. The in-memory quota cache is invalidated after
commit so a recreated user with the same id never sees stale entries.
The HTTP handler now maps the helper's errors to proper status codes —
real failures surface as 500 with the cause instead of being swallowed
as "not found".

Add Ginkgo regression coverage in core/http/auth/users_test.go and
core/http/routes/auth_test.go covering invite cleanup, used_by
null-out, full data wipe, and the FK-enforced original failure mode
(via PRAGMA foreign_keys=ON to mirror PostgreSQL behavior on SQLite).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7 [Claude Code]

Pulls LocalAGI@main (facd888) and LocalRecall@v0.6.0. The latter
swaps PDF text extraction from dslipak/pdf to gen2brain/go-fitz
(libmupdf bindings) and wraps it in a 60s goroutine timeout —
previously certain PDFs (broken xref tables, encrypted, image-only
without OCR) would hang indefinitely inside r.GetPlainText() and
poison the upload queue.

Pure dep bump, no LocalAI source changes. Indirect graph picks up
go-fitz + purego + ffi; drops dslipak/pdf.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>