feat(storage): Add full object read checksum validation for Open #16120
v-pratap wants to merge 4 commits.
Code Review
This pull request implements checksum validation (CRC32C and MD5) for asynchronous reads by integrating hash validators into ObjectDescriptorImpl and ReadRange. However, several critical issues were identified during the review. First, the checksum validation in OnRead is performed prematurely before the final chunk of data is processed, which will cause false checksum mismatches. Second, the entire integration test suite was accidentally deleted and replaced with a single test containing hardcoded credentials. Finally, multiple debugging std::cout statements were left in the production code and should be removed.
I am having trouble creating individual review comments, so my feedback is collected below, grouped by file and line range.
google/cloud/storage/internal/async/read_range.cc (74-89)
Correctness Bug: Incorrect Checksum Validation Order
The range_end checksum validation is performed at the very beginning of OnRead, before the current message's checksummed_data is processed and incorporated into the hash_function_.
Since the last message in a GCS read stream typically contains both the final chunk of data and range_end = true, calling hash_function_->Finish() here will exclude the last chunk from the computed hash, resulting in a false checksum mismatch error.
Fix: Defer the range_end check and object-level checksum validation until after the chunk's data has been successfully processed and added to the hash function (i.e., after the chunk validation and offset updates).
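The ordering problem can be illustrated with a minimal, self-contained sketch. It does not use the library's real types: `Message`, `Crc32c`, `ValidateBeforeUpdate`, and `ValidateAfterUpdate` are hypothetical stand-ins for the read response, `hash_function_`, and the two possible orderings inside `OnRead`. The CRC32C here is a plain bitwise implementation chosen only to keep the example standalone.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for one streamed read response.
struct Message {
  std::string data;  // payload of this chunk
  bool range_end;    // true on the last message of the range
};

// Bitwise CRC32C (Castagnoli), used only to keep the sketch self-contained.
class Crc32c {
 public:
  void Update(std::string const& data) {
    for (unsigned char c : data) {
      crc_ ^= c;
      for (int i = 0; i != 8; ++i) {
        crc_ = (crc_ >> 1) ^ (0x82F63B78u & (0u - (crc_ & 1u)));
      }
    }
  }
  std::uint32_t Finish() const { return crc_ ^ 0xFFFFFFFFu; }

 private:
  std::uint32_t crc_ = 0xFFFFFFFFu;
};

enum class ReadResult { kMoreData, kOk, kMismatch };

// Order described in the review: the hash is finalized before the final
// chunk is folded in, so the last chunk is missing from the checksum.
ReadResult ValidateBeforeUpdate(Crc32c& hash, Message const& m,
                                std::uint32_t expected) {
  if (m.range_end) {
    return hash.Finish() == expected ? ReadResult::kOk : ReadResult::kMismatch;
  }
  hash.Update(m.data);
  return ReadResult::kMoreData;
}

// Suggested order: fold the chunk into the hash first, then finalize only
// when the message marks the end of the range.
ReadResult ValidateAfterUpdate(Crc32c& hash, Message const& m,
                               std::uint32_t expected) {
  hash.Update(m.data);
  if (!m.range_end) return ReadResult::kMoreData;
  return hash.Finish() == expected ? ReadResult::kOk : ReadResult::kMismatch;
}
```

Feeding both functions the same two-message stream, where the second message carries data and `range_end = true`, makes the first report a mismatch for a valid object while the second accepts it.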
google/cloud/storage/tests/async_client_integration_test.cc (234-242)
Critical Issue: Accidental Deletion of Integration Tests & Hardcoded Credentials
It appears that the entire integration test suite for AsyncClient was accidentally deleted and replaced with a single manual test (StartAppendableUploadEmpty) containing a hardcoded project ID ("bajajnehaa-devrel-test") and bucket name.
This will break the CI/CD pipeline and prevent other developers from running the tests.
Fix: Please revert the changes to async_client_integration_test.cc to restore the full integration test suite, and avoid committing hardcoded project/bucket names.
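One common way to avoid hardcoded project and bucket names is to read them from the environment and skip the test when they are missing. The sketch below is only an illustration of that pattern: `GetEnvOrEmpty`, `TestConfig`, and `LoadTestConfig` are hypothetical helpers, and the variable names `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_CPP_STORAGE_TEST_BUCKET_NAME` are assumed here rather than taken from this PR.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: returns the variable's value, or "" when it is unset.
std::string GetEnvOrEmpty(char const* name) {
  char const* value = std::getenv(name);
  return value == nullptr ? std::string{} : std::string{value};
}

// Hypothetical container for the settings an integration test needs.
struct TestConfig {
  std::string project_id;
  std::string bucket_name;
  bool valid;  // false when any required variable is missing
};

// The environment variable names below are assumptions for illustration.
TestConfig LoadTestConfig() {
  TestConfig config;
  config.project_id = GetEnvOrEmpty("GOOGLE_CLOUD_PROJECT");
  config.bucket_name =
      GetEnvOrEmpty("GOOGLE_CLOUD_CPP_STORAGE_TEST_BUCKET_NAME");
  config.valid = !config.project_id.empty() && !config.bucket_name.empty();
  return config;
}
```

A test fixture could call `LoadTestConfig()` during setup and skip the test when `valid` is false, so the suite runs against whichever project the CI environment or the developer configures.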
google/cloud/storage/internal/async/object_descriptor_impl.cc (159-166)
Please remove these debugging std::cout statements before merging. They are not suitable for production code.
if (options_.get<storage::EnableCrc32cValidationOption>()) {
  hash_function =
      std::make_shared<storage::internal::Crc32cMessageHashFunction>(
          std::make_unique<storage::internal::Crc32cHashFunction>());
}
google/cloud/storage/internal/async/object_descriptor_impl.cc (188-203)
Please remove the debugging std::cout statements from the metadata checksum processing block.
if (metadata_->has_checksums()) {
  auto const& checksums = metadata_->checksums();
  if (checksums.has_crc32c()) {
    hashes = Merge(std::move(hashes),
                   storage::internal::HashValues{
                       storage_internal::Crc32cFromProto(checksums.crc32c()), {}});
  }
  if (!checksums.md5_hash().empty()) {
    hashes = Merge(std::move(hashes),
                   storage::internal::HashValues{
                       {}, storage_internal::MD5FromProto(checksums.md5_hash())});
  }
}
google/cloud/storage/internal/async/read_range.cc (99-101)
Please remove this debugging std::cout statement.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #16120   +/-   ##
=======================================
  Coverage   92.71%   92.71%
=======================================
  Files        2353     2353
  Lines      219274   219335   +61
=======================================
+ Hits       203303   203362   +59
- Misses      15971    15973    +2