
docs: add Join Order Benchmark (JOB) example dataset page#6417

Open
ayakovlev-clickhouse wants to merge 5 commits into main from docs/job-benchmark



ayakovlev-clickhouse commented Jun 18, 2026

Summary

Add a new example dataset page documenting the Join Order Benchmark (JOB), a snapshot of IMDb used to stress-test query optimizers. The page covers schema creation, loading the 21 tables from the public S3 Parquet files (with either a shell loop or explicit per-table INSERT statements), per-table row counts and compressed sizes, preparing the data from the original CSV files, and links to the init script, queries, and settings in the ClickHouse repository.
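The shell-loop and explicit-INSERT variants both come down to one INSERT ... SELECT per table. As a minimal sketch, assuming the standard JOB table names and that each table is stored as a single <table>.parquet file under one S3 prefix, the Python below generates those statements using ClickHouse's s3 table function with the Parquet format. The base URL is a hypothetical placeholder, not the real bucket; the actual location is given on the docs page.

```python
# The 21 tables of the Join Order Benchmark (IMDb snapshot schema).
JOB_TABLES = [
    "aka_name", "aka_title", "cast_info", "char_name", "comp_cast_type",
    "company_name", "company_type", "complete_cast", "info_type", "keyword",
    "kind_type", "link_type", "movie_companies", "movie_info",
    "movie_info_idx", "movie_keyword", "movie_link", "name",
    "person_info", "role_type", "title",
]

# Hypothetical placeholder; replace with the bucket URL from the docs page.
S3_BASE_URL = "https://example.invalid/job"


def insert_statements(database: str = "job", base_url: str = S3_BASE_URL) -> list[str]:
    """Build one INSERT ... SELECT FROM s3() statement per JOB table.

    Assumes (hypothetically) that each table lives at <base_url>/<table>.parquet.
    """
    return [
        f"INSERT INTO {database}.{table} "
        f"SELECT * FROM s3('{base_url}/{table}.parquet', 'Parquet');"
        for table in JOB_TABLES
    ]


if __name__ == "__main__":
    # Print the statements; they can be piped into clickhouse-client.
    for stmt in insert_statements():
        print(stmt)
```

Generating the statements from a single table list keeps the loop and the explicit form in sync: printing them gives the per-table INSERT variant, and iterating over them is the loop variant.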

This page is similar to the existing TPC-H example dataset page: https://clickhouse.com/docs/getting-started/example-datasets/tpch

Checklist


Note

Low Risk
Documentation-only changes with no runtime or product code impact.

Overview
Adds a new Getting Started → example datasets guide for the Join Order Benchmark (JOB) (IMDb snapshot used to stress join ordering and cardinality estimation).

The page documents creating the job database from init_cloud.sql, loading all 21 tables from public S3 Parquet (shell loop or per-table INSERT), a per-table row/compressed-size table, where to find the 113 benchmark queries and settings.json, and an optional path to rebuild from original PostgreSQL-style CSVs via convert_csv.py and init.sql with data_type_default_nullable=1.

Also adds Join Order Benchmark to the Vale Headings.yml capitalization exceptions so headings on the new page pass style checks.

Reviewed by Cursor Bugbot for commit 0f4c001. Bugbot is set up for automated code reviews on this repo.

@ayakovlev-clickhouse ayakovlev-clickhouse requested a review from a team as a code owner June 18, 2026 18:16
vercel Bot commented Jun 18, 2026

The latest updates on your projects.

Project             Deployment  Updated (UTC)
clickhouse-docs     Ready       Jun 19, 2026 4:41pm

4 Skipped Deployments
Project             Deployment  Updated (UTC)
clickhouse-docs-jp  Ignored     Jun 19, 2026 4:41pm
clickhouse-docs-ko  Ignored     Jun 19, 2026 4:41pm
clickhouse-docs-ru  Ignored     Jun 19, 2026 4:41pm
clickhouse-docs-zh  Ignored     Jun 19, 2026 4:41pm


ayakovlev-clickhouse (Author) commented

Note that links to the core GitHub repository, such as tests/benchmarks/job/init.sql, might not work yet; they depend on a separate core PR, ClickHouse/ClickHouse#107756.
This PR should be merged only after those links work.

dhtclk (Collaborator) commented Jun 18, 2026

@ayakovlev-clickhouse I made a small commit adding the heading to our exceptions list so that CI passes; it was being flagged for sentence casing.

dhtclk (Collaborator) left a comment

LGTM, left a few very minor non-blocking comments.

Three review comment threads on docs/getting-started/example-datasets/job.md (outdated)
ayakovlev-clickhouse and others added 2 commits June 19, 2026 17:58
Co-authored-by: Dominic Tran <dominic.tran@clickhouse.com>
Co-authored-by: Dominic Tran <dominic.tran@clickhouse.com>

Labels: Don't Merge (Don't merge yet)


3 participants