backend: pull Wikipedia song intro into the research pipeline #217
Merged
Add inline editors to the admin song detail page:

- MB Work ID and Secondary MB Work ID are both shown and editable. Entering an ID runs a MusicBrainz work lookup (cached via MusicBrainzSearcher) that surfaces the id, title, and composer/writer/lyricist credits; Save is gated on a successful lookup so an unverified ID cannot be persisted. Slots with a value get Clear.
- Alternate titles render under the song title as chips with an add/remove editor that saves to songs.alt_titles.

Backing endpoints:

- GET /admin/musicbrainz/work/<id>/lookup
- POST /admin/songs/<id>/mb-id (slot: primary|second)
- POST /admin/songs/<id>/alt-titles

CSRF is handled by the existing admin.js fetch wrapper.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The song intro shown in the app lives in songs.structure, fetched from the song's Wikipedia URL via the MediaWiki extracts API. That logic only ever ran as a one-time backfill (scripts/onetime_scripts/one_time_song_wiki_intro.py); it was never wired into ongoing import, so newly imported songs got a wikipedia_url but never the intro text.

- Add integrations/wikipedia/song_intro.py: parse_wikipedia_url, fetch_wikipedia_intro, and update_song_wikipedia_intro(song_id). The updater reads the wikipedia_url already on the song, fetches the lead section, and writes it to songs.structure. Idempotent like the sibling MB updaters (skips if structure is set), except that force_refresh re-pulls.
- Wire it as Step 1.8 in core.song_research.research_song, after the Wikipedia-URL step it depends on, passing through force_refresh.

Lives under integrations/wikipedia (not the MusicBrainz updaters in song_updates.py) because it talks to MediaWiki, not MusicBrainz.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
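As a rough illustration of the flow this commit describes, the sketch below shows one way to turn a Wikipedia URL into MediaWiki extracts API query parameters and to decide whether a fetch is needed. It is not the code from song_intro.py: the signatures, return shapes, and the helper names build_extract_params and should_fetch_intro are assumptions made for the example, and the network call and database write are left out.

```python
from typing import Optional, Tuple
from urllib.parse import unquote, urlparse


def parse_wikipedia_url(url: str) -> Optional[Tuple[str, str]]:
    """Return (host, page_title) for a /wiki/<Title> URL, else None.

    Hypothetical signature; the PR's real function may differ.
    """
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    is_wikipedia = host == "wikipedia.org" or host.endswith(".wikipedia.org")
    if not is_wikipedia or not parsed.path.startswith("/wiki/"):
        return None
    # Decode percent-escapes and turn underscores back into spaces.
    title = unquote(parsed.path[len("/wiki/"):]).replace("_", " ")
    return (host, title) if title else None


def build_extract_params(title: str) -> dict:
    """Query parameters asking the extracts API for the plain-text lead section."""
    return {
        "action": "query",
        "prop": "extracts",
        "exintro": 1,       # lead section only
        "explaintext": 1,   # plain text instead of HTML
        "redirects": 1,     # follow page redirects
        "format": "json",
        "titles": title,
    }


def should_fetch_intro(existing_structure: Optional[str], force_refresh: bool = False) -> bool:
    """Skip when an intro is already stored, unless a refresh is forced."""
    return force_refresh or not (existing_structure or "").strip()
```

For example, parse_wikipedia_url("https://en.wikipedia.org/wiki/Afro_Blue") yields the host en.wikipedia.org and the title "Afro Blue", which would then be sent to that host's /w/api.php endpoint with the parameters above.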
Introduce core/http_client.py with HTTP_USER_AGENT and make_session(), a single home for the outbound User-Agent that's currently copy-pasted into ~20 files. song_intro.py now builds its session via make_session() instead of hardcoding the UA string. A follow-up PR will sweep the remaining call sites onto the factory.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
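The factory idea can be sketched as follows. This is a stand-in, not the contents of core/http_client.py: it uses the standard library's urllib.request opener because the HTTP library behind the real make_session() is not stated here, and the User-Agent value is a placeholder.

```python
import urllib.request

# Placeholder value; the project's real User-Agent string is not shown in this PR text.
HTTP_USER_AGENT = "ExampleApp/1.0 (contact@example.com)"


def make_session() -> urllib.request.OpenerDirector:
    """Return an opener whose requests carry the shared User-Agent.

    Stand-in for the PR's make_session(); the return type is an assumption.
    """
    opener = urllib.request.build_opener()
    # Replace the default "Python-urllib/x.y" header with the shared one.
    opener.addheaders = [("User-Agent", HTTP_USER_AGENT)]
    return opener
```

Callers would build their opener with make_session() rather than setting the header themselves, so a change to the User-Agent happens in one place.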
Summary
The Wikipedia "biography/summary" shown on a song's detail page lives in songs.structure, fetched from the song's Wikipedia URL via the MediaWiki extracts API. That logic was only ever run as a one-time backfill (scripts/onetime_scripts/one_time_song_wiki_intro.py); it was never wired into the ongoing import/research pipeline. So newly imported songs (e.g. Afro Blue, Beautiful Love) got their wikipedia_url set but never the intro text.

This wires the intro fetch into the research pipeline so every imported/refreshed song with a wikipedia_url gets its intro pulled in.

Changes
- backend/integrations/wikipedia/song_intro.py: parse_wikipedia_url, fetch_wikipedia_intro, and update_song_wikipedia_intro(song_id). The updater reads the wikipedia_url already on the song, fetches the lead section, and writes it to songs.structure. Idempotent like the sibling MusicBrainz updaters (skips if structure is already set), except that force_refresh re-pulls so a deep refresh picks up Wikipedia edits.
- core.song_research.research_song: wired in as Step 1.8, right after the Wikipedia-URL step it depends on, passing through force_refresh.

Placement note
It lives under integrations/wikipedia/ rather than the MusicBrainz updaters in song_updates.py: that module is explicitly scoped to MusicBrainz lookups, and this fetch talks to MediaWiki, consuming the wikipedia_url MB already resolved.

Verification
- core.song_research imports resolve cleanly; both files compile.
- en.wikipedia.org/wiki/Afro_Blue → "Afro Blue" is a jazz standard composed by Mongo Santamaría.
- The research_worker Wikipedia handler is performer-only, so this is the single place that needed it.

Follow-ups (not in this PR)
- structure is doing double duty for the intro; the backfill's own docstring notes it's temporary scaffolding pending a dedicated column.
- Coverage is limited to songs with a wikipedia_url, which depends on MusicBrainz carrying a Wikipedia/Wikidata relation on the work.
- The backfill (one_time_song_wiki_intro.py --only-empty) is run separately by the maintainer.

🤖 Generated with Claude Code