docs: add Join Order Benchmark (JOB) example dataset page #6417
Open
ayakovlev-clickhouse wants to merge 5 commits into
Vercel: The latest updates on your projects. 4 skipped deployments.
Author
Note that links to GH Core repo like
Collaborator
@ayakovlev-clickhouse - I made a small commit to add the header to our exceptions list so the CI passes because it's flagged for sentence casing.
dhtclk approved these changes on Jun 18, 2026.
dhtclk left a comment
Collaborator
LGTM, left a few very minor non-blocking comments.
Co-authored-by: Dominic Tran <dominic.tran@clickhouse.com>
Summary
Adds a new example dataset page documenting the Join Order Benchmark (JOB), a snapshot of IMDb used to stress-test query optimizers. The page covers schema creation, loading the 21 tables from the public S3 Parquet files (shell loop and explicit INSERT variants), per-table row counts and compressed sizes, preparing the data from the original CSV files, and links to the init script, queries, and settings in the ClickHouse repository.
The new page is similar to the TPC-H example dataset page: https://clickhouse.com/docs/getting-started/example-datasets/tpch
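For reviewers, here is a minimal sketch of the explicit per-table INSERT variant described above. It is not the page's actual script. The bucket URL is a hypothetical placeholder (the real location is on the docs page), and the sketch assumes one file per table named <table>.parquet, read with ClickHouse's s3 table function into the job database.

```python
# Minimal sketch: build one INSERT ... SELECT FROM s3(...) statement per JOB table.
# ASSUMPTIONS (hypothetical, not taken from the docs page):
#   - BASE_URL is a placeholder; substitute the real public bucket path.
#   - Each table is stored as a single file named <table>.parquet.

BASE_URL = "https://example-bucket.s3.amazonaws.com/job"  # hypothetical placeholder
DATABASE = "job"

# The 21 tables of the Join Order Benchmark (IMDb) schema, alphabetical order.
JOB_TABLES = [
    "aka_name", "aka_title", "cast_info", "char_name", "comp_cast_type",
    "company_name", "company_type", "complete_cast", "info_type", "keyword",
    "kind_type", "link_type", "movie_companies", "movie_info", "movie_info_idx",
    "movie_keyword", "movie_link", "name", "person_info", "role_type", "title",
]


def insert_statement(table: str, base_url: str = BASE_URL, database: str = DATABASE) -> str:
    """Return an INSERT that reads one table's Parquet file via the s3 table function."""
    url = f"{base_url}/{table}.parquet"
    return f"INSERT INTO {database}.{table} SELECT * FROM s3('{url}', 'Parquet');"


def all_insert_statements() -> list[str]:
    """Return one INSERT statement per JOB table, in the order of JOB_TABLES."""
    return [insert_statement(t) for t in JOB_TABLES]


if __name__ == "__main__":
    # Print the statements for review or to feed into a ClickHouse client.
    for stmt in all_insert_statements():
        print(stmt)
```

Generating the statements from a single table list, much like a shell loop over table names, avoids repeating the same INSERT 21 times by hand.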
Checklist
Note
Low Risk
Documentation-only changes with no runtime or product code impact.
Overview
Adds a new Getting Started → example datasets guide for the Join Order Benchmark (JOB), an IMDb snapshot used to stress-test join ordering and cardinality estimation.
The page documents creating the job database from init_cloud.sql, loading all 21 tables from public S3 Parquet files (shell loop or per-table INSERT), a table of per-table row counts and compressed sizes, where to find the 113 benchmark queries and settings.json, and an optional path to rebuild the data from the original PostgreSQL-style CSVs via convert_csv.py and init.sql with data_type_default_nullable=1.
Also adds Join Order Benchmark to the Vale Headings.yml capitalization exceptions so headings on the new page pass style checks.
Reviewed by Cursor Bugbot for commit 0f4c001.