Scripts and data for archiving KB-managed websites to the Internet Archive's Wayback Machine.
Maintained by the KB, the national library of the Netherlands.
This repository has a companion GitHub Pages website with screenshot galleries, interactive navigation, and comprehensive documentation.
Some websites managed by the KB have been discontinued. To preserve their content for Wikipedia sourcing and cultural heritage purposes, the KB actively archives websites to the Wayback Machine at web.archive.org.
| Site | Archive date | # URLs | Link to dataset (.tsv, .txt, .xlsx) |
|---|---|---|---|
| Medieval Manuscripts in Dutch Collections (catalog records) | Apr 2026 | 11,738 | Excel file (sheets catalog-pages and catalog-pages-full-metadata) |
| Medieval Manuscripts in Dutch Collections (static pages, PDFs, assets) | Dec 2025 | 466 | Excel file (sheet non-catalog-pages) |
| Medieval Illuminated Manuscripts (manuscripts.kb.nl) | Dec 2025 | 7,460 | Excel file |
| kb.nl (new) | Mar 2022 | 1,915 | Excel file and CSV |
| Literatuurgeschiedenis.org | Mar 2022 | 465 | Excel file and CSV |
| kb.nl (old) | Dec 2021 | 5,720 | Excel file and CSV |
| Literatuurplein.nl | Dec 2019 | 69,599 | See this Data overview |
| Gidsvoornederland.nl | Nov 2018 | 1,300 | TXT |
| Literaireprijzen.nl | Oct 2018 | 452 | TXT |
| Lezenvoordelijst.nl | Aug 2018 | 12,456 | TXT |
| Leesplein.nl | Jun 2018 | 23,785 | TXT |
Read the stories behind some of these archiving projects: narratives of how (parts of) KB websites were rescued from the digital memory hole, and the role AI assistants played along the way.
This project was transformed in December 2025 through an intensive AI-human collaboration:
- 10+ hours of development across Dec 2-3, 2025
- 33+ commits reorganizing and enhancing the repository
- Built using the Claude Opus 4.5 AI assistant via the Claude Code CLI
- Repository reorganization - Clean hierarchical folder structure
- Screenshot galleries - 36 Wayback Machine screenshots captured via Python/Playwright
- GitHub Pages website - Responsive site with navigation, lightbox, and breadcrumbs
- AI vision recognition - Used multimodal AI to extract meaningful captions from screenshots
- EU compliance - GDPR, WCAG 2.1 Level AA, comprehensive accessibility features
Location: scripts/wbm-archiver/
Python script with three modes:
- Save pages to the Wayback Machine
- Retrieve the latest archived version
- Retrieve the oldest archived version
Requirements: Python 3.x, waybackpy
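The three modes can be illustrated with a minimal sketch. This is not the repository's script, which relies on waybackpy; it uses only the Python standard library and the Wayback Machine's public Save Page Now and availability endpoints. The function names are hypothetical, and using the timestamp `19960101` to find the oldest capture is an assumption: the availability API returns the snapshot closest to the given date, which for a very early date should be the first capture.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Wayback Machine endpoints (not taken from the repository's script).
SAVE_ENDPOINT = "https://web.archive.org/save/"
AVAILABILITY_API = "https://archive.org/wayback/available"


def save_url(page_url):
    """Return the Save Page Now URL; requesting it asks for a new capture."""
    return SAVE_ENDPOINT + page_url


def availability_url(page_url, timestamp=None):
    """Build an availability API query; without a timestamp the API
    reports the most recent capture."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp  # 1-14 digits, YYYYMMDDhhmmss
    return AVAILABILITY_API + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Extract the snapshot URL from an availability API response,
    or None when no capture is available."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def lookup(page_url, mode="latest"):
    """Fetch the latest or oldest archived version (performs a network call)."""
    # Assumption: a very early date makes "closest" resolve to the oldest capture.
    timestamp = "19960101" if mode == "oldest" else None
    with urlopen(availability_url(page_url, timestamp), timeout=30) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))


# Example usage (requires network access):
#   urlopen(save_url("https://www.kb.nl/"), timeout=120)   # save a page
#   print(lookup("kb.nl", mode="latest"))
#   print(lookup("kb.nl", mode="oldest"))
```

For bulk archiving of thousands of URLs, a real script would also need rate limiting and retries, which this sketch leaves out.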
Archive pages without Python, using the Internet Archive's Google Sheets service: archive.org/services/wayback-gsheets/
The companion website meets European standards:
- GDPR/AVG - No cookies, no tracking, no personal data
- WCAG 2.1 Level AA - Full accessibility compliance
- Responsive design - Desktop, tablet, mobile support
- SEO optimized - Schema.org, Open Graph, Twitter Cards
View compliance documentation →
The source code and text content of this project are dedicated to the public domain under CC0 1.0.
Note: This license does not apply to:
- Wayback Machine screenshots (third-party copyrights)
- KB logo (CC BY-SA 3.0)
- Social media brand icons (respective trademarks)
See Image credits & copyrights for details.