backend: pull Wikipedia song intro into the research pipeline #217

Merged: dprodger merged 3 commits into main from song-wikipedia-intro-import on Jun 14, 2026

@dprodger (Owner)

Summary

The Wikipedia "biography/summary" shown on a song's detail page lives in songs.structure, fetched from the song's Wikipedia URL via the MediaWiki extracts API. That logic was only ever run as a one-time backfill (scripts/onetime_scripts/one_time_song_wiki_intro.py) — it was never wired into the ongoing import/research pipeline. So newly imported songs (e.g. Afro Blue, Beautiful Love) got their wikipedia_url set but never the intro text.

This wires the intro fetch into the research pipeline so every imported/refreshed song with a wikipedia_url gets its intro pulled in.
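
For readers unfamiliar with the extracts API, a lead-section request and response can be sketched as below. This is a minimal illustration, not the code in this PR: `MEDIAWIKI_API`, `build_intro_query`, and `extract_intro` are hypothetical names, and the real module may choose different parameters or derive the endpoint from the URL's host.

```python
from typing import Any, Dict, Optional

# Assumed endpoint for English Wikipedia; the real code may build this per host.
MEDIAWIKI_API = "https://en.wikipedia.org/w/api.php"


def build_intro_query(title: str) -> Dict[str, str]:
    """Query parameters asking the TextExtracts API for a plain-text lead section."""
    return {
        "action": "query",
        "prop": "extracts",
        "exintro": "1",      # only the content before the first section heading
        "explaintext": "1",  # plain text instead of HTML
        "redirects": "1",    # follow page redirects
        "format": "json",
        "titles": title,
    }


def extract_intro(payload: Dict[str, Any]) -> Optional[str]:
    """Pull the first non-empty extract out of a decoded API response."""
    pages = payload.get("query", {}).get("pages", {})
    for page in pages.values():
        text = (page.get("extract") or "").strip()
        if text:
            return text
    return None  # missing page or empty lead section
```

The parameters would be sent as a GET to `MEDIAWIKI_API`; the parsing step is kept separate so it can be exercised without network access.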

Changes

  • New backend/integrations/wikipedia/song_intro.py: parse_wikipedia_url, fetch_wikipedia_intro, and update_song_wikipedia_intro(song_id). The updater reads the wikipedia_url already on the song, fetches the lead section, and writes it to songs.structure. Idempotent like the sibling MusicBrainz updaters (skips if structure is already set) — except force_refresh re-pulls so a deep refresh picks up Wikipedia edits.
  • Wired as Step 1.8 in core.song_research.research_song, right after the Wikipedia-URL step it depends on, passing through force_refresh.
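
The URL parsing and the skip/refresh rule described above could look roughly like this. It is a sketch under assumptions: the function name `parse_wikipedia_url` comes from the PR, but its return shape here (a host/title tuple) is a guess, and `should_fetch_intro` is a hypothetical helper that only illustrates the idempotency rule, not the actual updater.

```python
from typing import Optional, Tuple
from urllib.parse import unquote, urlparse


def parse_wikipedia_url(url: str) -> Optional[Tuple[str, str]]:
    """Return (host, page title) for a /wiki/<Title> URL, else None.

    Assumed return shape; the real function may differ.
    """
    # Tolerate scheme-less URLs such as "en.wikipedia.org/wiki/Afro_Blue".
    parsed = urlparse(url if "://" in url else "https://" + url)
    host = (parsed.hostname or "").lower()
    if not host.endswith(".wikipedia.org") or not parsed.path.startswith("/wiki/"):
        return None
    title = unquote(parsed.path[len("/wiki/"):]).replace("_", " ").strip()
    return (host, title) if title else None


def should_fetch_intro(
    wikipedia_url: Optional[str],
    structure: Optional[str],
    force_refresh: bool = False,
) -> bool:
    """Hypothetical helper: decide whether the updater should call Wikipedia."""
    if not wikipedia_url:
        return False  # nothing to fetch from
    if force_refresh:
        return True   # deep refresh re-pulls even when text exists
    return not (structure or "").strip()  # otherwise only fill empty values
```

Keeping the decision in a pure function makes the "skip unless empty or forced" behaviour easy to test without a database or network call.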

Placement note

It lives under integrations/wikipedia/ rather than the MusicBrainz updaters in song_updates.py — that module is explicitly scoped to MusicBrainz lookups, and this fetch talks to MediaWiki, consuming the wikipedia_url MB already resolved.

Verification

  • New module + the wired core.song_research import resolve cleanly; both files compile.
  • Live fetch confirmed against en.wikipedia.org/wiki/Afro_Blue → "Afro Blue" is a jazz standard composed by Mongo Santamaría.
  • Confirmed there's no parallel song-research path — the durable research_worker Wikipedia handler is performer-only — so this is the single place that needed it.

Follow-ups (not in this PR)

  • structure is doing double duty for the intro; the backfill's own docstring notes it's temporary scaffolding pending a dedicated column.
  • A song only gets an intro once it has a wikipedia_url, which depends on MusicBrainz carrying a Wikipedia/Wikidata relation on the work.
  • Backfilling already-imported songs (one_time_song_wiki_intro.py --only-empty) is run separately by the maintainer.

🤖 Generated with Claude Code

dprodger and others added 3 commits June 14, 2026 15:16
Add inline editors to the admin song detail page:

- MB Work ID and Secondary MB Work ID are both shown and editable.
  Entering an ID runs a MusicBrainz work lookup (cached via
  MusicBrainzSearcher) that surfaces the id, title, and composer/
  writer/lyricist credits; Save is gated on a successful lookup so
  an unverified ID cannot be persisted. Slots with a value get Clear.
- Alternate titles render under the song title as chips with an
  add/remove editor that saves to songs.alt_titles.

Backing endpoints:
- GET  /admin/musicbrainz/work/<id>/lookup
- POST /admin/songs/<id>/mb-id        (slot: primary|second)
- POST /admin/songs/<id>/alt-titles

CSRF is handled by the existing admin.js fetch wrapper.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The song intro shown in the app lives in songs.structure, fetched from
the song's Wikipedia URL via the MediaWiki extracts API. That logic only
ever ran as a one-time backfill (scripts/onetime_scripts/
one_time_song_wiki_intro.py); it was never wired into ongoing import, so
newly imported songs got a wikipedia_url but never the intro text.

- Add integrations/wikipedia/song_intro.py: parse_wikipedia_url,
  fetch_wikipedia_intro, and update_song_wikipedia_intro(song_id). The
  updater reads the wikipedia_url already on the song, fetches the lead
  section, and writes it to songs.structure. Idempotent like the sibling
  MB updaters (skips if structure is set) except force_refresh re-pulls.
- Wire it as Step 1.8 in core.song_research.research_song, after the
  Wikipedia-URL step it depends on, passing through force_refresh.

Lives under integrations/wikipedia (not the MusicBrainz updaters in
song_updates.py) because it talks to MediaWiki, not MusicBrainz.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Introduce core/http_client.py with HTTP_USER_AGENT and make_session(), a
single home for the outbound User-Agent that's currently copy-pasted into
~20 files. song_intro.py now builds its session via make_session() instead
of hardcoding the UA string. A follow-up PR will sweep the remaining call
sites onto the factory.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
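
The idea behind the shared factory in the last commit can be sketched with the standard library alone. This is an illustration under assumptions: the names `HTTP_USER_AGENT` and `make_session` come from the commit message, but the User-Agent value is a placeholder, and the real `make_session()` may well return a different session type than the urllib opener used here.

```python
import urllib.request

# Placeholder value; the project's actual User-Agent string is not shown in this PR.
HTTP_USER_AGENT = "ExampleApp/0.0 (contact: admin@example.org)"


def make_session() -> urllib.request.OpenerDirector:
    """Return an HTTP opener whose requests all carry the shared User-Agent."""
    opener = urllib.request.build_opener()
    # Replace urllib's default "Python-urllib/x.y" header with the shared one.
    opener.addheaders = [("User-Agent", HTTP_USER_AGENT)]
    return opener
```

Call sites would then use `make_session().open(url)` instead of building their own headers, so a change to the User-Agent happens in one place.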
@dprodger dprodger merged commit 60a4aac into main Jun 14, 2026
3 checks passed