
Antalya 26.3 Backport of #100645 - Parse record_count and size_bytes fields from iceberg manifest file #1776

Open
mkmkme wants to merge 2 commits into antalya-26.3 from backports/antalya-26.3/100645
Open

Antalya 26.3 Backport of #100645 - Parse record_count and size_bytes fields from iceberg manifest file#1776
mkmkme wants to merge 2 commits into
antalya-26.3from
backports/antalya-26.3/100645

Conversation

@mkmkme (Collaborator) commented May 9, 2026

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Object information used for parsing data files in iceberg now contains the number of file rows and file size in bytes parsed from manifest file (ClickHouse#100645 by @divanik).

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

divanik and others added 2 commits May 9, 2026 13:53
…s_and_rows_count_to_iceberg_data_object

Parse record_count and size_bytes fields from iceberg manifest file
github-actions Bot commented May 9, 2026

Workflow [PR], commit [3cb64e5]

    if (info.record_count.has_value())
        LOG_TEST(log, "Iceberg record_count for '{}': {}", object_info->getPath(), *info.record_count);
    if (info.file_size_in_bytes.has_value())
        LOG_TEST(log, "Iceberg file_size_in_bytes for '{}': {}", object_info->getPath(), *info.file_size_in_bytes);
Am I right that logging at the 'test' level is the only place where the new data is used?

@alsugiliazova (Member)

Audit: PR #1776 — Antalya 26.3 Backport of #100645 — Parse record_count and size_bytes fields from iceberg manifest file

AI audit note: generated by AI (Cursor agent, audit-review skill).

Confirmed defects

No confirmed defects in reviewed scope.

@ianton-ru left a comment
LGTM, but I don't understand the value of this PR. It does a lot of work only to write two lines to the log at the 'test' level. Or maybe I missed something.

@alsugiliazova (Member)

Verification: PR #1776

New integration test test_iceberg_file_stats_logging (storage `s3` × format-version `1/2` × `use_view` `False/True` = 4 cases).

PR-added tests — all GREEN

4 parametrized cases × 3 integration jobs = 12 OK runs, 0 failures.

| Job | Cases run | Status |
| --- | --- | --- |
| Integration tests (amd_asan, db disk, old analyzer, 4/6) | 4 | OK |
| Integration tests (amd_binary, 5/5) | 4 | OK |
| Integration tests (arm_binary, distributed plan, 2/4) | 4 | OK |

All four parametrizations pass on every job:

  • [s3-1-False], [s3-1-True] — format-version 1, with/without view
  • [s3-2-False], [s3-2-True] — format-version 2, with/without view

The new manifest-file stats path has clean positive coverage on both Iceberg spec versions and on plain-table / view-wrapped reads.

Note: the new gtest gtest_datalake_table_state_serde is built into the unit-tests binary; the unit-tests job ran green in this CI rollup.

CI overview (head commit)

  • PR test workflow: 46 success / 50 skipped / 0 failure — fully GREEN at the PR test workflow level.
  • Regression workflow: 29 success / 67 skipped / 4 failure (chronic baseline).
  • One pending action_required job (queue/auth).

Test-level failures in DB

Zero. No test_status='FAIL' rows on this commit.

Regression-workflow failures (chronic baseline on antalya-26.3)

| Suite | Fails |
| --- | --- |
| Swarms (Aarch64 + Release) | 227 |
| Parquet (Aarch64 + Release) | 34 |
| S3Export partition (Aarch64 + Release) | 20 |
| S3Export part (Aarch64 + Release) | 16 |

Same fingerprint as sibling antalya-26.3 PRs (1783, 1775, 1773, 1772, 1771, 1770, 1769, …). No new failure modes.

Caveat — partial frontport

PR lands on antalya-26.3 while companion features from antalya-26.1 are still being frontported in parallel. Final re-verify recommended once the rest of the bundle lands.

Verdict

Safe to merge.

  • New integration test test_iceberg_file_stats_logging passes 100% (12/12 integration runs) across all 4 parametrizations and 3 integration jobs.
  • New gtest for datalake_table_state serde compiles and runs green.
  • Zero test-level FAIL rows on this head.
  • All remaining red checks are the recurring antalya-26.3 chronic regression baseline (Swarms / Parquet / S3Export), shared with sibling PRs.

@alsugiliazova alsugiliazova added the verified Approved for release label May 15, 2026
