Language Experiments turns books into visual fingerprints. It tokenizes a text, computes a linguistic metric, maps the metric to colors, and writes PNG images and HTML viewers in which each pixel represents a token or a sliding window of tokens.
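The pipeline can be sketched in plain Python. This is a minimal, standard-library-only illustration, not the project's code. The regex tokenizer, the raw-count frequency metric, the black-red-yellow-white `heat` ramp, the one-pixel-per-token row layout, and every function name are assumptions made for the sketch.

```python
import re
import struct
import zlib
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase words, keeping internal apostrophes.
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_freq(tokens):
    # Assumed metric: one value per token, the word's count in the whole text.
    counts = Counter(tokens)
    return [counts[t] for t in tokens]


def normalize(values, lo=None, hi=None):
    # Scale to [0, 1]; passing shared lo/hi puts several books on one scale.
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    span = (hi - lo) or 1
    return [(v - lo) / span for v in values]


def heat(x):
    # Hypothetical heat ramp: black -> red -> yellow -> white.
    x = max(0.0, min(1.0, x))
    r = min(1.0, 3 * x)
    g = min(1.0, max(0.0, 3 * x - 1))
    b = max(0.0, 3 * x - 2)
    return (round(255 * r), round(255 * g), round(255 * b))


def write_png(path, pixels, width):
    # Pad the last row with black, then emit a minimal 8-bit RGB PNG.
    height = -(-len(pixels) // width)
    pixels = pixels + [(0, 0, 0)] * (width * height - len(pixels))
    # Each scanline is a filter-type byte (0 = none) followed by RGB bytes.
    raw = b"".join(
        b"\x00" + bytes(c for px in pixels[y * width:(y + 1) * width] for c in px)
        for y in range(height)
    )

    def chunk(tag, data):
        # Length, type, data, then a CRC over type + data.
        return (struct.pack(">I", len(data)) + tag + data
                + struct.pack(">I", zlib.crc32(tag + data) & 0xFFFFFFFF))

    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    with open(path, "wb") as f:
        f.write(b"\x89PNG\r\n\x1a\n" + chunk(b"IHDR", ihdr)
                + chunk(b"IDAT", zlib.compress(raw)) + chunk(b"IEND", b""))
    return width, height


def render(text, path, width=64):
    # Tokenize, score each token, map scores to colors, one pixel per token.
    tokens = tokenize(text)
    colors = [heat(v) for v in normalize(word_freq(tokens))]
    return write_png(path, colors, width)
```

For example, `render(open("books/dubliners.txt").read(), "dubliners.png")` would write a 64-pixel-wide image with one pixel per token. Calling `normalize` with the same `lo` and `hi` for every book is one simple way to get the shared color normalization that the comparison view describes.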
Install the dependencies:
uv sync

List available metrics and color maps:
uv run bookviz list

Render one book:
uv run bookviz render books/dubliners.txt --metric word-freq --color heat --html

The legacy form still works:
uv run python book_png.py books/dubliners.txt --metric word-freq --html

Render a sliding-window view:
uv run bookviz render books/ulysses.txt \
--metric lexical-diversity \
--window-size 200 \
--window-step 50 \
--html

Compare books with shared color normalization:
uv run bookviz compare \
books/dubliners.txt books/ulysses.txt books/moby-dick.txt \
--metric lexical-diversity \
--window-size 200 \
--output outputs/comparison.png \
--html

Download a book from Project Gutenberg by its ebook number:
uv run bookviz gutenberg 2701 --title moby-dick

Generate the GitHub Pages gallery:
uv run bookviz gallery \
--input books \
--output site \
--metrics word-freq lexical-diversity bigram-diversity \
--window-size 200

The gallery is a static browser app. It publishes the book text files, and the browser computes the metrics itself, so readers can change the window size and step, switch books, and redraw the visualization without regenerating images in CI.
Token metrics:
word-freq
word-freq-linear
bigram-prob
bigram-diversity
word-length
word-position
unique-word
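To make the token metrics concrete, here is a hedged sketch of three of them. The definitions are not given in this README, so each one below is an assumption: `word-length` as the character count of a token, `bigram-prob` as the maximum-likelihood probability of a token given the token before it, and `unique-word` as a first-occurrence flag. The function names are hypothetical.

```python
from collections import Counter


def word_length(tokens):
    # Assumed definition: number of characters in each token.
    return [len(t) for t in tokens]


def bigram_prob(tokens):
    # Assumed definition: maximum-likelihood P(token | previous token),
    # i.e. count(prev, tok) / count(prev as the first word of a bigram).
    pairs = list(zip(tokens, tokens[1:]))
    pair_counts = Counter(pairs)
    prev_counts = Counter(prev for prev, _ in pairs)
    # The first token has no predecessor; this sketch assigns it 0.0.
    return [0.0] + [pair_counts[p] / prev_counts[p[0]] for p in pairs]


def unique_word(tokens):
    # Assumed definition: 1 the first time a word appears, 0 on repeats.
    seen = set()
    out = []
    for t in tokens:
        out.append(0 if t in seen else 1)
        seen.add(t)
    return out
```

Each function returns one value per token, which is what lets a token metric be drawn one pixel per token or averaged over windows.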
Window metrics:
avg-word-length
lexical-diversity
punctuation-density
repetition-density
sentence-length
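Window metrics summarize a whole window rather than scoring single tokens. The sketch below assumes a tokenizer that keeps punctuation marks as separate tokens and guesses at three definitions: `lexical-diversity` as the type-token ratio, `avg-word-length` as the mean length of word tokens, and `punctuation-density` as the share of tokens that are punctuation. All names are hypothetical.

```python
import re


def tokenize_with_punct(text):
    # Hypothetical tokenizer: runs of word characters, or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)


def is_word(token):
    return re.fullmatch(r"\w+", token) is not None


def lexical_diversity(window):
    # Assumed definition: type-token ratio (distinct words / total words).
    words = [t.lower() for t in window if is_word(t)]
    return len(set(words)) / len(words) if words else 0.0


def avg_word_length(window):
    # Mean character length of the word tokens in the window.
    words = [t for t in window if is_word(t)]
    return sum(map(len, words)) / len(words) if words else 0.0


def punctuation_density(window):
    # Assumed definition: share of tokens in the window that are punctuation.
    return sum(not is_word(t) for t in window) / len(window) if window else 0.0
```

For "The cat saw the dog." the window has five words with four distinct forms, so the type-token ratio is 0.8, and one of its six tokens is punctuation.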
Token metrics can also be used with --window-size; their values are averaged
inside each window.
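Averaging a token metric over sliding windows can be sketched as follows, assuming a window starts every `--window-step` tokens and spans `--window-size` tokens. How the real tool handles a final partial window is not stated here; this sketch simply leaves out trailing tokens that no full window reaches. The function names are hypothetical.

```python
def windows(values, size, step):
    # A window starts every `step` items and covers `size` items. A text
    # shorter than `size` yields one window holding everything; trailing
    # items past the end of the last window are not covered.
    last_start = max(len(values) - size, 0)
    return [values[i:i + size] for i in range(0, last_start + 1, step)]


def averaged(token_values, size, step):
    # Average per-token metric values (e.g. word lengths) inside each window.
    return [sum(w) / len(w) for w in windows(token_values, size, step) if w]
```

With `size=200` and `step=50`, a 1,000-token text yields (1000 - 200) // 50 + 1 = 17 windows, so the image has 17 cells instead of 1,000 pixels.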
The workflow at .github/workflows/pages.yml builds the client-side gallery
with uv and publishes the generated site/ directory to GitHub Pages. Enable
Pages in the repository settings and choose GitHub Actions as the source.