# Language Experiments

Language Experiments turns books into visual fingerprints. It tokenizes a text,
computes a linguistic metric, maps the metric values to colors, and writes PNG
images and HTML viewers in which each pixel represents a token or a sliding
window of tokens.
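
The pipeline described above can be sketched in a few lines of Python. This is a minimal illustration and not the project's actual code: the function names, the log-count scoring, and the black-to-yellow ramp are all assumptions, and the sketch stops at an in-memory pixel grid instead of writing a PNG.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out word tokens (a deliberate simplification)."""
    return re.findall(r"[a-z']+", text.lower())


def word_freq_metric(tokens):
    """Score each token by the log of its count in the whole text (assumed scoring)."""
    counts = Counter(tokens)
    return [math.log(counts[t]) for t in tokens]


def normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    return [(v - lo) / span for v in values]


def heat_color(x):
    """Map a value in [0, 1] to an RGB triple: black, then red, then yellow."""
    r = round(255 * min(1.0, 2 * x))
    g = round(255 * max(0.0, 2 * x - 1))
    return (r, g, 0)


def to_rows(pixels, width):
    """Wrap the flat pixel list into image rows of at most `width` pixels."""
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


text = "the cat sat on the mat and the dog sat too"
tokens = tokenize(text)
pixels = [heat_color(x) for x in normalize(word_freq_metric(tokens))]
rows = to_rows(pixels, width=4)
```

In this toy text, the most frequent word ("the") lands at the bright end of the ramp and words that occur once stay black, so repeated vocabulary shows up as warm streaks in the grid.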

## Setup

```bash
uv sync
```

## Commands

List available metrics and color maps:

```bash
uv run bookviz list
```

Render one book:

```bash
uv run bookviz render books/dubliners.txt --metric word-freq --color heat --html
```

The legacy form still works:

```bash
uv run python book_png.py books/dubliners.txt --metric word-freq --html
```

Render a sliding-window view:

```bash
uv run bookviz render books/ulysses.txt \
  --metric lexical-diversity \
  --window-size 200 \
  --window-step 50 \
  --html
```
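
To make the two window flags concrete, here is a rough sketch of how a window of `--window-size` tokens advancing by `--window-step` tokens could be scored. It assumes that `lexical-diversity` is a type-token ratio and that only full windows are scored; neither is confirmed by this README, and the function names are hypothetical.

```python
def window_starts(n_tokens, size, step):
    """Start offsets of each full window of `size` tokens, advancing by `step`."""
    return range(0, max(n_tokens - size, 0) + 1, step)


def lexical_diversity(window):
    """Type-token ratio: distinct tokens divided by total tokens (assumed definition)."""
    return len(set(window)) / len(window)


def windowed_diversity(tokens, size, step):
    """One diversity score per window; an empty text yields no scores."""
    if not tokens:
        return []
    return [
        lexical_diversity(tokens[start:start + size])
        for start in window_starts(len(tokens), size, step)
    ]


tokens = "a b a b c d c d e f".split()
scores = windowed_diversity(tokens, size=4, step=2)

# With the flags from the command above, a 1000-token text yields this many windows.
n_windows = len(window_starts(1000, 200, 50))
```

Because the step is smaller than the size, neighboring windows overlap, which smooths the resulting image at the cost of producing more pixels than a non-overlapping split would.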

Compare books with shared color normalization:

```bash
uv run bookviz compare \
  books/dubliners.txt books/ulysses.txt books/moby-dick.txt \
  --metric lexical-diversity \
  --window-size 200 \
  --output outputs/comparison.png \
  --html
```
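
Shared color normalization presumably means that every book is scaled against one common minimum and maximum, so the same color stands for the same metric value in every panel. The sketch below shows that idea with made-up values; the function names are hypothetical and the tool's actual scaling may differ.

```python
def shared_range(series_by_book):
    """Smallest and largest metric value across all books."""
    lo = min(min(values) for values in series_by_book.values())
    hi = max(max(values) for values in series_by_book.values())
    return lo, hi


def normalize(values, lo, hi):
    """Rescale values to [0, 1] against a range supplied by the caller."""
    span = (hi - lo) or 1.0  # avoid dividing by zero for a flat range
    return [(v - lo) / span for v in values]


# Made-up metric values for two books, for illustration only.
series = {
    "book-a": [0.25, 0.5],
    "book-b": [0.5, 0.75],
}

lo, hi = shared_range(series)
shared = {name: normalize(values, lo, hi) for name, values in series.items()}
per_book = {
    name: normalize(values, min(values), max(values))
    for name, values in series.items()
}
```

With per-book scaling both books would span the full color range and look alike; with the shared range, `book-a` occupies the lower half and `book-b` the upper half, which is what makes a side-by-side comparison meaningful.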

Download a book from Project Gutenberg:

```bash
uv run bookviz gutenberg 2701 --title moby-dick
```

Generate a static gallery:

```bash
uv run bookviz gallery \
  --input books \
  --output site \
  --metrics word-freq lexical-diversity bigram-diversity \
  --window-size 200
```

## Metrics

Token metrics:

- `word-freq`
- `word-freq-linear`
- `bigram-prob`
- `bigram-diversity`
- `word-length`
- `word-position`
- `unique-word`

Window metrics:

- `avg-word-length`
- `lexical-diversity`
- `punctuation-density`
- `repetition-density`
- `sentence-length`

Token metrics can also be used with `--window-size`; their values are averaged
inside each window.
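
As an illustration of that averaging, the sketch below computes a per-token value (word length) and then takes the mean inside each window. The helper name is hypothetical, and handling only full windows is an assumption.

```python
from statistics import fmean


def window_means(values, size, step):
    """Mean of the per-token values inside each full window."""
    if not values:
        return []
    last_start = max(len(values) - size, 0)
    return [fmean(values[s:s + size]) for s in range(0, last_start + 1, step)]


tokens = "call me ishmael some years ago".split()
word_lengths = [len(t) for t in tokens]  # one value per token
means = window_means(word_lengths, size=3, step=1)
```

Each entry in `means` would then be colored as a single pixel, so a token metric rendered this way trades per-word detail for a smoother picture of the text.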

## GitHub Pages

The workflow at `.github/workflows/pages.yml` builds the gallery with `uv` and
publishes the generated `site/` directory to GitHub Pages. Enable Pages in the
repository settings and choose GitHub Actions as the source.