
Antalya 26.3 Backport of #100645 - Parse record_count and size_bytes fields from iceberg manifest file #1776

Open
mkmkme wants to merge 2 commits into antalya-26.3 from backports/antalya-26.3/100645
Open

Antalya 26.3 Backport of #100645 - Parse record_count and size_bytes fields from iceberg manifest file#1776
mkmkme wants to merge 2 commits into
antalya-26.3from
backports/antalya-26.3/100645

Conversation

@mkmkme (Collaborator) commented May 9, 2026

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Object information used for parsing data files in iceberg now contains the number of file rows and file size in bytes parsed from manifest file (ClickHouse#100645 by @divanik).

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

divanik and others added 2 commits May 9, 2026 13:53
…s_and_rows_count_to_iceberg_data_object

Parse record_count and size_bytes fields from iceberg manifest file
github-actions Bot commented May 9, 2026

Workflow [PR], commit [3cb64e5]

    if (info.record_count.has_value())
        LOG_TEST(log, "Iceberg record_count for '{}': {}", object_info->getPath(), *info.record_count);
    if (info.file_size_in_bytes.has_value())
        LOG_TEST(log, "Iceberg file_size_in_bytes for '{}': {}", object_info->getPath(), *info.file_size_in_bytes);
Am I right that logging at the 'test' level is the only place where the new data is used?

@alsugiliazova (Member)

Audit: PR #1776 — Antalya 26.3 Backport of #100645 — Parse record_count and size_bytes fields from iceberg manifest file

AI audit note: generated by AI (Cursor agent, audit-review skill).

Confirmed defects

No confirmed defects in reviewed scope.

@ianton-ru left a comment
LGTM, but I don't understand the value of this PR. It does a lot of work only to write two lines to the log at the 'test' level. Or maybe I missed something.

@alsugiliazova (Member)

Verification: PR #1776

New integration test test_iceberg_file_stats_logging (storage `s3` × format-version `1/2` × `use_view` `False/True` = 4 cases).

PR-added tests — all GREEN

4 parametrized cases × 3 integration jobs = 12 OK runs, 0 failures.

| Job | Cases run | Status |
| --- | --- | --- |
| Integration tests (amd_asan, db disk, old analyzer, 4/6) | 4 | OK |
| Integration tests (amd_binary, 5/5) | 4 | OK |
| Integration tests (arm_binary, distributed plan, 2/4) | 4 | OK |

All four parametrizations pass on every job:

  • [s3-1-False], [s3-1-True] — format-version 1, with/without view
  • [s3-2-False], [s3-2-True] — format-version 2, with/without view

The new manifest-file stats path has clean positive coverage on both Iceberg spec versions and on plain-table / view-wrapped reads.

Note: the new gtest gtest_datalake_table_state_serde is built into the unit-tests binary; the unit-tests job ran green in this CI rollup.

CI overview (head commit)

  • PR test workflow: 46 success / 50 skipped / 0 failure — fully GREEN at the PR test workflow level.
  • Regression workflow: 29 success / 67 skipped / 4 failure (chronic baseline).
  • One pending action_required job (queue/auth).

Test-level failures in DB

Zero. No test_status='FAIL' rows on this commit.

Regression-workflow failures (chronic baseline on antalya-26.3)

| Suite | Fails |
| --- | --- |
| Swarms (Aarch64 + Release) | 227 |
| Parquet (Aarch64 + Release) | 34 |
| S3Export partition (Aarch64 + Release) | 20 |
| S3Export part (Aarch64 + Release) | 16 |

Same fingerprint as sibling antalya-26.3 PRs (1783, 1775, 1773, 1772, 1771, 1770, 1769, …). No new failure modes.

Caveat — partial frontport

PR lands on antalya-26.3 while companion features from antalya-26.1 are still being frontported in parallel. Final re-verify recommended once the rest of the bundle lands.

Verdict

Safe to merge.

  • New integration test test_iceberg_file_stats_logging passes 100% (12/12 integration runs) across all 4 parametrizations and 3 integration jobs.
  • New gtest for datalake_table_state serde compiles and runs green.
  • Zero test-level FAIL rows on this head.
  • All remaining red checks are the recurring antalya-26.3 chronic regression baseline (Swarms / Parquet / S3Export), shared with sibling PRs.

@alsugiliazova alsugiliazova added the verified Approved for release label May 15, 2026
