chore(hive): vendor Hive 3.1 metastore + fb303 Thrift IDL #694
MisterRaindrop wants to merge 1 commit
Vendor the Apache Hive 3.1 standalone-metastore IDL and the fb303 helper IDL it includes into third_party/hive_metastore/. These files are the input for the C++ HMS client bindings, generated by a follow-up commit that invokes `thrift --gen cpp` at build time.

Provenance:
* hive_metastore.thrift - apache/hive @ branch-3.1, standalone-metastore
* share/fb303/if/fb303.thrift - apache/thrift @ master, contrib/fb303

Both upstream files retain their Apache 2.0 license headers; only trailing whitespace and final newlines were normalized by the repository's pre-commit hooks. third_party/hive_metastore/NOTICE records the upstream sources, and the project root NOTICE references it. .github/.licenserc.yaml gains third_party/** to paths-ignore so the license-eye check skips the vendored tree.

Part of the iceberg-cpp HiveCatalog port that follows iceberg-rust's iceberg-catalog-hms crate as a blueprint.
wgtmac left a comment:
Thanks for importing the Hive-related files. I've left some minor comments. BTW, this change is not really a chore. Perhaps rename the title to feat(hive): vendor Hive 3.1 metastore + fb303 Thrift IDL
    - 'requirements.txt'
    - 'src/iceberg/util/murmurhash3_internal.*'
    - 'src/iceberg/test/resources/**'
    - 'third_party/**'
Should we rename third_party to thirdparty, which is more widely used?
    ================

    * hive_metastore.thrift
    Apache Hive 3.1 standalone-metastore.
Why this specific version? How do we want to upgrade or maintain multiple versions in the future?
Good question. I selected Hive 3.1.3 because it is a mature and widely deployed HMS
version. I also considered the precedent from iceberg-rust, which maintains a single
client generated from a Hive 2.3 IDL and integration-tests it against Hive 3.1.3.
For future maintenance, I propose keeping a single vendored IDL pinned to an
immutable Hive release tag or commit, rather than maintaining separate generated
clients for each Hive version. The implementation should use RPCs shared across the
supported versions whenever possible.
If future Hive releases introduce incompatible RPC changes, we can add narrowly
scoped runtime adapters or fallback logic, allowing one build of iceberg-cpp to
support multiple Hive versions. We should validate and document the supported
versions through a CI compatibility matrix before claiming compatibility.
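The fallback idea above could be sketched roughly as follows. This is a minimal, self-contained illustration and not iceberg-cpp code: `MetastoreRpc`, `TableFetcher`, `UnknownMethodError`, and the two callback names are hypothetical, and the exception type only stands in for whatever error a real Thrift client reports when the server does not implement an RPC.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for the error raised when the server does not
// implement the requested RPC. A real client would map its transport-level
// error onto something like this.
struct UnknownMethodError : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Hypothetical minimal view of a metastore client. Both callbacks return a
// table description as a string purely for illustration.
struct MetastoreRpc {
  std::function<std::string(const std::string&)> get_table_new;     // newer RPC
  std::function<std::string(const std::string&)> get_table_legacy;  // older RPC
};

// Tries the newer RPC first and falls back to the older one if the server
// rejects it. The outcome is remembered so later calls skip the failed attempt.
class TableFetcher {
 public:
  explicit TableFetcher(MetastoreRpc rpc) : rpc_(std::move(rpc)) {}

  std::string GetTable(const std::string& name) {
    if (use_new_) {
      try {
        return rpc_.get_table_new(name);
      } catch (const UnknownMethodError&) {
        use_new_ = false;  // assumption: the server will not gain the RPC mid-session
      }
    }
    return rpc_.get_table_legacy(name);
  }

  bool uses_new_rpc() const { return use_new_; }

 private:
  MetastoreRpc rpc_;
  bool use_new_ = true;
};
```

Keeping the version probe inside one small adapter like this is what would let a single build talk to several server versions; only unknown-method errors trigger the fallback, so other failures still propagate to the caller.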
    This product includes software developed at
    The Apache Software Foundation (http://www.apache.org/).

    Third-party Thrift IDLs vendored under third_party/hive_metastore/ are
IIUC, ASF projects are exempted here. cc expert @jbonofre