This repository is part of an NLP course for humanities and cultural studies. The course uses historical newspapers as its source material and applies NLP methods to them. Tasks covered: tokenization, lemmatization, TF-IDF, part-of-speech tagging, semantic search with transformers, article extraction and OCR post-correction with LLMs, NER, and text classification.
GNewsScraper is a TypeScript package that scrapes article data from Google News based on a keyword or phrase. It returns the results as an array of JSON objects, making the scraped information easy to access and use.
A configurable pipeline for extracting and filtering articles from large corpora, tailored for the Delpher Kranten corpus, with support for keyword filtering and TF-IDF-based relevance scoring.
Chrome extension that yoinks webpages into clean markdown. Supports article extraction, full-page capture, YouTube transcripts, and visual element picking.
Capture exactly what the user sees and turn any page into structured JSON or clean Markdown. Built for read-later and bookmarking apps, and for AI agents that need token-efficient input. Readability-style, zero dependencies, single file.
Chrome/Edge extension that estimates an article's word count and reading time, and lets you double-click any word to update the toolbar badge with your reading progress.
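As a rough illustration of the word count, reading time, and progress figures mentioned in the last entry, here is a minimal Python sketch. It is not taken from the extension: the 200 words-per-minute reading speed, the whitespace-based word count, and the function names `count_words`, `reading_time_minutes`, and `progress_percent` are all assumptions made for this example.

```python
import math

# Assumed average reading speed; the extension's real value is not stated.
WORDS_PER_MINUTE = 200


def count_words(text: str) -> int:
    """Count words by splitting on whitespace (a simplification)."""
    return len(text.split())


def reading_time_minutes(word_count: int, wpm: int = WORDS_PER_MINUTE) -> int:
    """Estimate reading time in whole minutes, rounded up."""
    if wpm <= 0:
        raise ValueError("wpm must be positive")
    return math.ceil(word_count / wpm)


def progress_percent(words_read: int, word_count: int) -> int:
    """Share of the article read so far, as a whole percentage (0-100)."""
    if word_count <= 0:
        return 0
    clamped = max(0, min(words_read, word_count))
    return round(100 * clamped / word_count)


if __name__ == "__main__":
    article = "word " * 450
    total = count_words(article)
    print(total, reading_time_minutes(total), progress_percent(225, total))
```

In this sketch, the index of the double-clicked word would be passed as `words_read`, and the resulting percentage is the kind of value a toolbar badge could display.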