fix(wiki): eliminate duplicate-page bugs (path normalization, move_page, FTS triggers) #36

Merged
aniongithub merged 3 commits into main from fixes/page-path
May 17, 2026

@aniongithub
Owner

Fixes #35.

Three related fixes for the duplicate-page class of bugs.

1. Path normalization (2d640d4)

CreatePage/UpdatePage/DeletePage/GetPage/ListPages used the raw input string as the SQLite primary key while resolving the file via filepath.Join, which normalizes. Equivalent denormalized spellings (/projects/foo, projects/foo, projects//foo, projects/foo.md, etc.) produced two index rows for the same on-disk file — surfacing as duplicates in search and list until the next process restart cleaned them up via Reindex Phase 4.

Adds normalizePagePath() and applies it at every public entry point. Rejects empty, ., /, .., and any ../ traversal.

2. move_page tool (cd6cfeb)

Adds Wiki.MovePage and the corresponding move_page MCP tool, so agents can atomically rename/relocate a page instead of doing create_page at the new path + (often forgotten) delete_page at the old path — the pattern that was leaving duplicate files on disk.

  • Normalizes both from and to.
  • Refuses if normalized paths are equal, source is missing, or destination already exists.
  • Acquires per-page locks in sorted order to avoid deadlocks.
  • Renames the file with os.Rename, then refreshes the index (drops the old row + outgoing-link rows; re-indexes under the new path).
  • Backlinks (links.target = old_path) are intentionally not rewritten — those rows reflect [[wikilink]] text in other pages' markdown that still says the old name. Rewriting would make the index lie.

3. FTS recursive_triggers + rebuild (60f35ad)

SQLite fires AFTER DELETE triggers on the implicit row removal inside INSERT OR REPLACE only when PRAGMA recursive_triggers = ON. With the default OFF, every indexPage / Reindex write left an orphan docid in pages_fts pointing at the old revision. Search's JOIN masked it, but the FTS index grew unbounded, and a direct probe demonstrates the leak:

INSERT OR REPLACE INTO pages (path, body) VALUES ('foo', 'uniqueoldtoken');
INSERT OR REPLACE INTO pages (path, body) VALUES ('foo', 'uniquenewtoken');
SELECT rowid FROM pages_fts WHERE pages_fts MATCH 'uniqueoldtoken';
-- before: returns rowid=1  (orphan)
-- after:  returns nothing

Sets _pragma=recursive_triggers(1) in the DSN (applies to every pooled connection) and runs an INSERT INTO pages_fts(pages_fts) VALUES('rebuild') once on Open() to purge orphans left behind by prior versions.
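
As a rough sketch of those two steps: the helper names buildDSN and purgeOrphans are hypothetical, and the file: prefix and the driver name "sqlite" in the comment are assumptions about the driver in use. Only the _pragma parameter and the rebuild statement come from the description above.

```go
package main

import (
	"database/sql"
	"fmt"
)

// rebuildFTS is the FTS 'rebuild' special command quoted above; it
// discards the current index content and rebuilds it from the table.
const rebuildFTS = `INSERT INTO pages_fts(pages_fts) VALUES('rebuild')`

// buildDSN appends the pragma as a DSN parameter, so each connection the
// pool opens gets it, not just the first one. The "file:" prefix is an
// assumption; paths with special characters are not escaped here.
func buildDSN(dbPath string) string {
	return "file:" + dbPath + "?_pragma=recursive_triggers(1)"
}

// purgeOrphans runs the one-shot rebuild on an already opened database.
func purgeOrphans(db *sql.DB) error {
	_, err := db.Exec(rebuildFTS)
	return err
}

func main() {
	fmt.Println(buildDSN(".mind-map.db"))
	// With a SQLite driver registered (driver name assumed to be "sqlite"):
	//   db, err := sql.Open("sqlite", buildDSN(".mind-map.db"))
	//   if err == nil { err = purgeOrphans(db) }
}
```

Issuing PRAGMA recursive_triggers = ON through db.Exec after opening would affect only whichever pooled connection happened to run it, which is why the setting goes in the DSN.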

Backward compatibility

None of the three changes breaks existing .mind-map.db files, and there are no schema changes. Denormalized rows are removed by the existing Reindex Phase 4 on first startup; FTS orphans are removed by the one-shot rebuild. Tables, columns, triggers, and the FTS virtual table are unchanged.

Tests

Full go test ./... ✅, including new regression tests:

  • TestNormalizePagePath
  • TestDuplicateIndexRowsViaDenormalizedPaths
  • TestRejectPathEscapingWikiRoot
  • TestMovePage / …NormalizesPaths / …FailsWhenDestinationExists / …FailsWhenSourceMissing / …FailsWhenSamePath
  • MCP-level TestMovePage
  • TestFTSDoesNotLeakOrphansOnUpdate (probes pages_fts directly to bypass the masking JOIN)

CreatePage/UpdatePage/DeletePage/GetPage/ListPages stored the raw
input string as the SQLite primary key while resolving the file on
disk via filepath.Join, which normalizes the path. Equivalent
denormalized spellings ("/projects/foo", "projects/foo",
"projects//foo", "projects/foo.md", etc.) thus produced two index
rows for the same on-disk file, surfacing as duplicates in search
and list results until the next process restart.

Add normalizePagePath() and apply it at every public entry point.
It rejects empty, ".", "/", and any path that escapes the wiki
root via "..".

Fixes #35

Adds Wiki.MovePage and a corresponding move_page MCP tool. The
operation:

  - Normalizes both source and destination paths.
  - Refuses if normalized paths are equal, source is missing, or
    destination already exists.
  - Acquires per-page locks in sorted order to avoid deadlocks
    against concurrent moves.
  - Renames the file on disk, then refreshes the index: removes
    the old row (which drops the page's outgoing-link rows) and
    re-indexes under the new path.
  - Leaves backlinks (other pages' [[wikilink]] text pointing to
    the old name) untouched — those rows reflect what is actually
    in the source markdown.

This replaces the common create_page + delete_page workaround
pattern that was leaving duplicate pages behind when agents
forgot the second step.

Refs #35

SQLite fires AFTER DELETE triggers on the implicit row removal inside
INSERT OR REPLACE only when PRAGMA recursive_triggers is ON. With the
default OFF, every indexPage / Reindex write left an orphan docid in
pages_fts pointing at the old revision. The Search() JOIN masked it,
but the FTS index grew unbounded and queries that bypass the JOIN
(probed directly: SELECT rowid FROM pages_fts WHERE pages_fts MATCH
'old-token') returned ghost hits.

  - Set _pragma=recursive_triggers(1) in the DSN so it applies to
    every pooled connection.
  - Run an FTS rebuild once on Open() to purge orphans left behind by
    previous versions.

Adds a regression test that probes pages_fts directly for the old
token.

Refs #35
@aniongithub aniongithub merged commit 7d12cfb into main May 17, 2026
1 check passed
Development

Successfully merging this pull request may close these issues.

Page path normalization: equivalent denormalized paths produce duplicate index rows

1 participant