feat: metrics reporting for scan and commit#589
Conversation
5fa02cb to
0a85180
Compare
|
Let me know when ready for review :) |
for sure ! most likely tomorrow |
5241044 to
d26fb96
Compare
|
Thanks for updating this! I still need to take some time to get familiar with the Java implementation and its API in the rest spec before reviewing it. |
wgtmac
left a comment
There was a problem hiding this comment.
Thanks for working on this! I have some general questions on the design, especially on the capability of customization. IMO we can focus on the API design for now and later integrate with other classes. Please let me know what you think.
| /// | ||
| /// \param reporter_type Case-insensitive type identifier (e.g., "noop"). | ||
| /// \param factory Factory function that produces the reporter. | ||
| static void Register(std::string_view reporter_type, MetricsReporterFactory factory); |
There was a problem hiding this comment.
How do we support the Java parity CompositeMetricsReporter? It would be useful if we want to report metrics to multiple sinks.
This is fair, I just wanted to show the end to end picture for ease of understanding. |
wgtmac
left a comment
There was a problem hiding this comment.
Follow-up findings from a deeper design/parity pass over the metrics reporting changes.
|
I have a design question about the |
The plan is still to keep parity as a next step I was thinking of doing serialization and then introduce the utilities. I initially elected to do it this way because I wanted to have an idea of the end to end view. Now I can probably update the first initial PR to just on the utilities if it will make your review easier I also don't want the change to be too big. |
|
@evindj Sorry I've been busy these days. I think it would be good to focus on the API design first. |
| std::shared_ptr<FileIO> file_io_; | ||
| std::string warehouse_location_; | ||
| std::unique_ptr<class InMemoryNamespace> root_namespace_; | ||
| std::shared_ptr<MetricsReporter> reporter_; |
There was a problem hiding this comment.
please add comment about this new member
| namespace { | ||
|
|
||
| /// \brief Registry type for MetricsReporter factories. | ||
| using MetricsReporterRegistry = std::unordered_map<std::string, MetricsReporterFactory>; |
There was a problem hiding this comment.
Do we have any use case that std::string fit better?
| /// \param[in] metadata The metadata for the table. | ||
| /// \param[in] metadata_location The location of the table metadata file. | ||
| /// \param[in] io The FileIO to read and write table data and metadata files. | ||
| /// \param[in] catalog The catalog that this table belongs to. |
There was a problem hiding this comment.
update comment of parameter list
| std::shared_ptr<TableMetadata> table_metadata, std::shared_ptr<FileIO> file_io, | ||
| std::shared_ptr<MetricsReporter> reporter, const std::string& table_name) | ||
| : metadata_(std::move(table_metadata)), io_(std::move(file_io)) { | ||
| context_.reporter = std::move(reporter); |
There was a problem hiding this comment.
should we also check port reporter like table.h above?
There was a problem hiding this comment.
Just want to make all usage of reporter to be consistent
| report.scan_metrics.total_data_manifests = | ||
| static_cast<int64_t>(data_manifests.size()); | ||
| report.scan_metrics.total_delete_manifests = | ||
| static_cast<int64_t>(delete_manifests.size()); |
There was a problem hiding this comment.
if the source of the data are not int64_t, why do we declare them as int64_t ?
It looke like unnecessary type casting?
|
@evindj Do you have any chance to revive this? |
Sorry for the long wait, I have been a bit busy lately. My schedule should clear toward the end of the week. |
f787939 to
dbf02c0
Compare
wgtmac
left a comment
There was a problem hiding this comment.
I haven't finished my review yet. Just post my findings so far and I'll finish it in two days. I think this change looks great. Thank you for polishing the design!
| class ICEBERG_EXPORT ScanMetrics { | ||
| public: | ||
| /// \brief Create a ScanMetrics instance backed by the given MetricsContext. | ||
| static ScanMetrics Of(MetricsContext& context); |
There was a problem hiding this comment.
Should we return std::unique_ptr<ScanMetrics> instead? Downstream is free to choose unique_ptr or shared_ptr then.
There was a problem hiding this comment.
Do we want to make the default ctor private?
wgtmac
left a comment
There was a problem hiding this comment.
I've finished reviewing on non-test files. Great work! Let me know what you think. Thanks!
b87f96a to
b87c145
Compare
64c2353 to
f600898
Compare
Initial commit for addressing #533