Commit e29f288

Add bookviz CLI and gallery workflow
1 parent 868e8cc commit e29f288

18 files changed

Lines changed: 1522 additions & 592 deletions

.github/workflows/pages.yml

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+name: Deploy Gallery
+
+on:
+  push:
+    branches: [main]
+  workflow_dispatch:
+
+permissions:
+  contents: read
+  pages: write
+  id-token: write
+
+concurrency:
+  group: pages
+  cancel-in-progress: false
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+      - name: Install dependencies
+        run: uv sync --locked
+      - name: Build gallery
+        run: uv run bookviz gallery --input books --output site --metrics word-freq lexical-diversity --window-size 200
+      - name: Upload artifact
+        uses: actions/upload-pages-artifact@v3
+        with:
+          path: site
+
+  deploy:
+    environment:
+      name: github-pages
+      url: ${{ steps.deployment.outputs.page_url }}
+    runs-on: ubuntu-latest
+    needs: build
+    steps:
+      - name: Deploy to GitHub Pages
+        id: deployment
+        uses: actions/deploy-pages@v4
+

.gitignore

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+.venv/
+.pytest_cache/
+__pycache__/
+*.pyc
+outputs/
+site/
+books/gutenberg/
+

README.md

Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
+# Language Experiments
+
+Language Experiments turns books into visual fingerprints. It tokenizes a text,
+computes a linguistic metric, maps the metric to colors, and writes PNG and
+HTML viewers where each pixel represents a token or a sliding window of tokens.
+
+## Setup
+
+```bash
+uv sync
+```
+
+## Commands
+
+List available metrics and color maps:
+
+```bash
+uv run bookviz list
+```
+
+Render one book:
+
+```bash
+uv run bookviz render books/dubliners.txt --metric word-freq --color heat --html
+```
+
+The legacy form still works:
+
+```bash
+uv run python book_png.py books/dubliners.txt --metric word-freq --html
+```
+
+Render a sliding-window view:
+
+```bash
+uv run bookviz render books/ulysses.txt \
+  --metric lexical-diversity \
+  --window-size 200 \
+  --window-step 50 \
+  --html
+```
+
+Compare books with shared color normalization:
+
+```bash
+uv run bookviz compare \
+  books/dubliners.txt books/ulysses.txt books/moby-dick.txt \
+  --metric lexical-diversity \
+  --window-size 200 \
+  --output outputs/comparison.png \
+  --html
+```
+
+Download a book from Project Gutenberg:
+
+```bash
+uv run bookviz gutenberg 2701 --title moby-dick
+```
+
+Generate a static gallery:
+
+```bash
+uv run bookviz gallery \
+  --input books \
+  --output site \
+  --metrics word-freq lexical-diversity bigram-diversity \
+  --window-size 200
+```
+
+## Metrics
+
+Token metrics:
+
+- `word-freq`
+- `word-freq-linear`
+- `bigram-prob`
+- `bigram-diversity`
+- `word-length`
+- `word-position`
+- `unique-word`
+
+Window metrics:
+
+- `avg-word-length`
+- `lexical-diversity`
+- `punctuation-density`
+- `repetition-density`
+- `sentence-length`
+
+Token metrics can also be used with `--window-size`; their values are averaged
+inside each window.
+
+## GitHub Pages
+
+The workflow at `.github/workflows/pages.yml` builds the gallery with `uv` and
+publishes the generated `site/` directory to GitHub Pages. Enable Pages in the
+repository settings and choose GitHub Actions as the source.
+
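
Note on the README's sliding-window options: the diff shown here does not include the metric implementations, so the following is only a minimal sketch of how `--window-size` and `--window-step` might behave. It assumes `lexical-diversity` means type-token ratio (unique tokens divided by total tokens in the window) and that a token metric such as `word-length` is averaged with a plain arithmetic mean. The names `tokenize`, `window_starts`, `lexical_diversity`, and `windowed_mean` are hypothetical and are not taken from `bookviz`.

```python
import re
from statistics import fmean


def tokenize(text):
    """Lowercase word tokens; a stand-in for whatever tokenizer bookviz uses."""
    return re.findall(r"[a-z']+", text.lower())


def window_starts(n, window_size, window_step):
    """Start offsets of every full window over a sequence of length n."""
    if window_size <= 0 or window_step <= 0:
        raise ValueError("window_size and window_step must be positive")
    if n == 0:
        return []
    if n <= window_size:
        # Text shorter than one window: treat the whole text as a single window.
        return [0]
    return list(range(0, n - window_size + 1, window_step))


def lexical_diversity(tokens, window_size=200, window_step=50):
    """Type-token ratio per window (assumed definition of lexical-diversity)."""
    values = []
    for start in window_starts(len(tokens), window_size, window_step):
        window = tokens[start:start + window_size]
        values.append(len(set(window)) / len(window))
    return values


def windowed_mean(values, window_size=200, window_step=50):
    """Average a per-token metric inside each window."""
    return [
        fmean(values[start:start + window_size])
        for start in window_starts(len(values), window_size, window_step)
    ]
```

Under these assumptions, `windowed_mean([len(t) for t in tokens])` would give a windowed view of `word-length`, which matches the README's statement that token metrics are averaged inside each window. The defaults of 200 and 50 mirror the README example and are not confirmed defaults of the CLI.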

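Note on "shared color normalization" in `bookviz compare`: the README does not say how the shared scale is computed. One plausible reading, sketched below, is that the minimum and maximum are taken over all books together, so that the same metric value maps to the same color in every image. The function names `shared_range` and `normalize` are hypothetical, and min-max scaling is an assumption rather than the confirmed behavior of the tool.

```python
def shared_range(series_by_book):
    """Global (min, max) across every book's metric values."""
    values = [v for series in series_by_book.values() for v in series]
    if not values:
        raise ValueError("no metric values to normalize")
    return min(values), max(values)


def normalize(series, lo, hi):
    """Scale values into [0, 1] using a range shared by all books."""
    span = hi - lo
    if span == 0:
        # Every value is identical; map them all to the bottom of the scale.
        return [0.0 for _ in series]
    return [(v - lo) / span for v in series]


# Hypothetical per-window metric values for two books.
books = {"dubliners": [1, 3], "ulysses": [5, 9]}
lo, hi = shared_range(books)
scaled = {title: normalize(series, lo, hi) for title, series in books.items()}
```

With per-book normalization each image would use its full color range regardless of the underlying values; a shared range keeps the images directly comparable at the cost of less contrast within any single book.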