feat: add SQL catalog #693
Open: zhjwpku wants to merge 2 commits into apache:main from zhjwpku:sql_catalog
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -46,6 +46,12 @@ option(ICEBERG_BUILD_BUNDLE "Build the battery included library" ON) | |
| option(ICEBERG_BUILD_REST "Build rest catalog client" ON) | ||
| option(ICEBERG_BUILD_REST_INTEGRATION_TESTS "Build rest catalog integration tests" OFF) | ||
| option(ICEBERG_BUILD_HIVE "Build hive (HMS) catalog client" OFF) | ||
| option(ICEBERG_BUILD_SQL_CATALOG "Build SQL catalog client" ON) | ||
|
Member
Should we make it off by default? This follows the pattern used by rest catalog library.
||
| # Built-in SQL catalog database connectors. Disable all of them to build a SQL | ||
| # catalog that only works with a user-supplied CatalogStore. | ||
| option(ICEBERG_SQL_SQLITE "Build the SQLite connector for the SQL catalog" ON) | ||
| option(ICEBERG_SQL_POSTGRESQL "Build the PostgreSQL connector for the SQL catalog" OFF) | ||
| option(ICEBERG_SQL_MYSQL "Build the MySQL connector for the SQL catalog" OFF) | ||
| option(ICEBERG_S3 "Build with S3 support" OFF) | ||
| option(ICEBERG_ENABLE_ASAN "Enable Address Sanitizer" OFF) | ||
| option(ICEBERG_ENABLE_UBSAN "Enable Undefined Behavior Sanitizer" OFF) | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,98 @@ | ||
| # Licensed to the Apache Software Foundation (ASF) under one | ||
| # or more contributor license agreements. See the NOTICE file | ||
| # distributed with this work for additional information | ||
| # regarding copyright ownership. The ASF licenses this file | ||
| # to you under the Apache License, Version 2.0 (the | ||
| # "License"); you may not use this file except in compliance | ||
| # with the License. You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, | ||
| # software distributed under the License is distributed on an | ||
| # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| # KIND, either express or implied. See the License for the | ||
| # specific language governing permissions and limitations | ||
| # under the License. | ||
|
|
||
| # Select the built-in database connectors to compile. The catalog logic itself | ||
| # is database-agnostic; each connector is an optional `CatalogStore` implementation | ||
| # built on sqlpp23 and linked against its native client library. | ||
| # | ||
| # sqlpp23 is a build-time-only (header-only) dependency: it is compiled into the | ||
| # connector translation units but never appears in the installed interface, so | ||
| # downstream consumers only need the native client libraries. | ||
| set(ICEBERG_SQL_CATALOG_SOURCES connection_uri.cc sql_catalog.cc) | ||
| # Targets used while building (header-only sqlpp23 connector targets). | ||
| set(ICEBERG_SQL_CATALOG_CONNECTOR_BUILD_LIBS) | ||
| # Native client libraries required at link time, including by installed consumers. | ||
| set(ICEBERG_SQL_CATALOG_CONNECTOR_INSTALL_LIBS) | ||
|
|
||
| if(ICEBERG_SQL_SQLITE) | ||
| set(BUILD_SQLITE3_CONNECTOR ON) | ||
| list(APPEND ICEBERG_SQL_CATALOG_SOURCES catalog_store_sqlite3.cc) | ||
| list(APPEND ICEBERG_SQL_CATALOG_CONNECTOR_BUILD_LIBS sqlpp23::sqlite3) | ||
| list(APPEND ICEBERG_SQL_CATALOG_CONNECTOR_INSTALL_LIBS SQLite::SQLite3) | ||
| endif() | ||
|
|
||
| if(ICEBERG_SQL_POSTGRESQL) | ||
| set(BUILD_POSTGRESQL_CONNECTOR ON) | ||
| list(APPEND ICEBERG_SQL_CATALOG_SOURCES catalog_store_postgresql.cc) | ||
| list(APPEND ICEBERG_SQL_CATALOG_CONNECTOR_BUILD_LIBS sqlpp23::postgresql) | ||
| list(APPEND ICEBERG_SQL_CATALOG_CONNECTOR_INSTALL_LIBS PostgreSQL::PostgreSQL) | ||
| endif() | ||
|
|
||
| if(ICEBERG_SQL_MYSQL) | ||
| set(BUILD_MYSQL_CONNECTOR ON) | ||
| list(APPEND ICEBERG_SQL_CATALOG_SOURCES catalog_store_mysql.cc) | ||
| list(APPEND ICEBERG_SQL_CATALOG_CONNECTOR_BUILD_LIBS sqlpp23::mysql) | ||
| list(APPEND ICEBERG_SQL_CATALOG_CONNECTOR_INSTALL_LIBS MySQL::MySQL) | ||
| endif() | ||
|
|
||
| # config.h.in uses #cmakedefine for the BUILD_*_CONNECTOR variables set above, | ||
| # so it must be configured after the connectors are selected. | ||
| configure_file("${CMAKE_CURRENT_SOURCE_DIR}/config.h.in" | ||
| "${CMAKE_CURRENT_BINARY_DIR}/config.h") | ||
|
|
||
| install(FILES "${CMAKE_CURRENT_BINARY_DIR}/config.h" | ||
| DESTINATION "${ICEBERG_INSTALL_INCLUDEDIR}/iceberg/catalog/sql") | ||
|
|
||
| set(ICEBERG_SQL_CATALOG_STATIC_BUILD_INTERFACE_LIBS) | ||
| set(ICEBERG_SQL_CATALOG_SHARED_BUILD_INTERFACE_LIBS) | ||
| set(ICEBERG_SQL_CATALOG_STATIC_INSTALL_INTERFACE_LIBS) | ||
| set(ICEBERG_SQL_CATALOG_SHARED_INSTALL_INTERFACE_LIBS) | ||
|
|
||
| # The sqlpp23 connector targets are header-only and used only while building. | ||
| # The installed interface exposes only the native client libraries. | ||
| list(APPEND | ||
| ICEBERG_SQL_CATALOG_STATIC_BUILD_INTERFACE_LIBS | ||
| "$<IF:$<TARGET_EXISTS:iceberg_static>,iceberg_static,iceberg_shared>" | ||
| ${ICEBERG_SQL_CATALOG_CONNECTOR_BUILD_LIBS} | ||
| ${ICEBERG_SQL_CATALOG_CONNECTOR_INSTALL_LIBS}) | ||
| list(APPEND | ||
| ICEBERG_SQL_CATALOG_SHARED_BUILD_INTERFACE_LIBS | ||
| "$<IF:$<TARGET_EXISTS:iceberg_shared>,iceberg_shared,iceberg_static>" | ||
| ${ICEBERG_SQL_CATALOG_CONNECTOR_BUILD_LIBS} | ||
| ${ICEBERG_SQL_CATALOG_CONNECTOR_INSTALL_LIBS}) | ||
| list(APPEND | ||
| ICEBERG_SQL_CATALOG_STATIC_INSTALL_INTERFACE_LIBS | ||
| "$<IF:$<TARGET_EXISTS:iceberg::iceberg_static>,iceberg::iceberg_static,iceberg::iceberg_shared>" | ||
| ${ICEBERG_SQL_CATALOG_CONNECTOR_INSTALL_LIBS}) | ||
| list(APPEND | ||
| ICEBERG_SQL_CATALOG_SHARED_INSTALL_INTERFACE_LIBS | ||
| "$<IF:$<TARGET_EXISTS:iceberg::iceberg_shared>,iceberg::iceberg_shared,iceberg::iceberg_static>" | ||
| ${ICEBERG_SQL_CATALOG_CONNECTOR_INSTALL_LIBS}) | ||
|
|
||
| add_iceberg_lib(iceberg_sql_catalog | ||
| SOURCES | ||
| ${ICEBERG_SQL_CATALOG_SOURCES} | ||
| SHARED_LINK_LIBS | ||
| ${ICEBERG_SQL_CATALOG_SHARED_BUILD_INTERFACE_LIBS} | ||
| STATIC_LINK_LIBS | ||
| ${ICEBERG_SQL_CATALOG_STATIC_BUILD_INTERFACE_LIBS} | ||
| STATIC_INSTALL_INTERFACE_LIBS | ||
| ${ICEBERG_SQL_CATALOG_STATIC_INSTALL_INTERFACE_LIBS} | ||
| SHARED_INSTALL_INTERFACE_LIBS | ||
| ${ICEBERG_SQL_CATALOG_SHARED_INSTALL_INTERFACE_LIBS}) | ||
|
|
||
| iceberg_install_all_headers(iceberg/catalog/sql) |
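The CMake comments above say that `config.h.in` uses `#cmakedefine` for the `BUILD_*_CONNECTOR` variables. As a rough illustration of what that means for C++ code, the sketch below assumes the generated `config.h` contains one `#define` per enabled connector and a commented-out `#undef` for each disabled one, which is what `#cmakedefine` normally produces. The macros are written by hand here so the snippet compiles on its own, and `EnabledConnectors` is a hypothetical helper, not something taken from the PR.

```cpp
#include <cassert>
#include <string>
#include <vector>

// In a real build these lines would come from the generated config.h.
// They are written by hand here (an assumption) and mirror the default
// options: SQLite ON, PostgreSQL OFF, MySQL OFF.
#define BUILD_SQLITE3_CONNECTOR
/* #undef BUILD_POSTGRESQL_CONNECTOR */
/* #undef BUILD_MYSQL_CONNECTOR */

// Hypothetical helper: lists the connectors compiled into this build.
inline std::vector<std::string> EnabledConnectors() {
  std::vector<std::string> names;
#ifdef BUILD_SQLITE3_CONNECTOR
  names.emplace_back("sqlite3");
#endif
#ifdef BUILD_POSTGRESQL_CONNECTOR
  names.emplace_back("postgresql");
#endif
#ifdef BUILD_MYSQL_CONNECTOR
  names.emplace_back("mysql");
#endif
  return names;
}
```

Because the header is configured after the `if(ICEBERG_SQL_*)` blocks, code guarded this way only sees connectors that were actually selected at configure time.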
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,158 @@ | ||
| <!-- | ||
| ~ Licensed to the Apache Software Foundation (ASF) under one | ||
| ~ or more contributor license agreements. See the NOTICE file | ||
| ~ distributed with this work for additional information | ||
| ~ regarding copyright ownership. The ASF licenses this file | ||
| ~ to you under the Apache License, Version 2.0 (the | ||
| ~ "License"); you may not use this file except in compliance | ||
| ~ with the License. You may obtain a copy of the License at | ||
| ~ | ||
| ~ http://www.apache.org/licenses/LICENSE-2.0 | ||
| ~ | ||
| ~ Unless required by applicable law or agreed to in writing, | ||
| ~ software distributed under the License is distributed on an | ||
| ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| ~ KIND, either express or implied. See the License for the | ||
| ~ specific language governing permissions and limitations | ||
| ~ under the License. | ||
| --> | ||
|
|
||
| # SQL Catalog | ||
|
Member
Perhaps it is good to add to the iceberg-cpp site as well.
||
|
|
||
| `SqlCatalog` implements the Iceberg `Catalog` API on top of a relational | ||
| database. Its on-disk schema is compatible with the Apache Iceberg Java | ||
| `JdbcCatalog`: two tables, `iceberg_tables` and `iceberg_namespace_properties`, | ||
| scoped by a catalog name so multiple catalogs can share one database. | ||
|
|
||
| ## Design | ||
|
|
||
| `SqlCatalog` owns the Iceberg catalog behavior. It validates namespaces, reads | ||
| and writes table metadata files, and performs optimistic-concurrency commits. | ||
| Database access is delegated to a small storage interface: | ||
|
|
||
| ``` | ||
| Application | ||
| | | ||
| v | ||
| SqlCatalog | ||
| | | ||
| | CatalogStore API | ||
| v | ||
| CatalogStore implementation | ||
| | | ||
| v | ||
| SQL database | ||
| - iceberg_tables | ||
| - iceberg_namespace_properties | ||
| ``` | ||
|
|
||
| `CatalogStore` (see [`catalog_store.h`](catalog_store.h)) exposes typed row | ||
| operations such as `InsertTable`, `GetTableMetadataLocation`, | ||
| `UpdateTableMetadataLocation(expected_current)`, namespace-property CRUD, and | ||
| `RunInTransaction`. It exposes no SQL strings or driver-specific types. | ||
|
|
||
| The project provides built-in `CatalogStore` implementations for SQLite, | ||
| PostgreSQL, and MySQL. They are implemented with | ||
| [sqlpp23](https://github.com/rbock/sqlpp23), and the shared query code lives in | ||
| `catalog_store_sqlpp23_internal.h`. Users can also provide their own | ||
| `CatalogStore` implementation for another database, driver, or connection pool. | ||
|
|
||
| sqlpp23 is a build-time-only dependency for the built-in stores. It is compiled | ||
| into the connector translation units and does not appear in the installed | ||
| interface, so downstream consumers only need the native client libraries. | ||
|
|
||
| > The built-in sqlpp23 connectors require CMake >= 3.28 and C++23; sqlpp23 is | ||
| > fetched automatically via `FetchContent` when at least one built-in connector | ||
| > is enabled. A SQL catalog backed only by a user-supplied `CatalogStore` does not | ||
| > need sqlpp23. The SQL catalog is currently wired into the CMake build only; | ||
| > the Meson build does not build or install it yet. | ||
|
|
||
| ## Out-of-the-box usage | ||
|
|
||
| Built-in connectors pull in their native client libraries via sqlpp23. Enable | ||
| them at configure time: | ||
|
|
||
| | CMake option | Default | sqlpp23 target | Native dependency | | ||
| |-------------------------|---------|-----------------------|------------------------| | ||
| | `ICEBERG_SQL_SQLITE` | `ON` | `sqlpp23::sqlite3` | SQLite3 | | ||
| | `ICEBERG_SQL_POSTGRESQL`| `OFF` | `sqlpp23::postgresql` | libpq (PostgreSQL) | | ||
| | `ICEBERG_SQL_MYSQL` | `OFF` | `sqlpp23::mysql` | libmysqlclient (MySQL) | | ||
|
|
||
| ```cpp | ||
| #include "iceberg/catalog/sql/sql_catalog.h" | ||
|
|
||
| using iceberg::sql::SqlCatalog; | ||
| using iceberg::sql::SqlCatalogConfig; | ||
|
|
||
| SqlCatalogConfig config{ | ||
| .name = "prod", | ||
| .uri = "/var/lib/iceberg/catalog.db", // SQLite file path | ||
| .warehouse_location = "s3://my-bucket/warehouse", | ||
| }; | ||
|
|
||
| auto catalog = SqlCatalog::MakeSqliteCatalog(config, file_io).value(); | ||
| // catalog->CreateNamespace(...), CreateTable(...), LoadTable(...), ... | ||
| ``` | ||
|
|
||
| `MakePostgreSqlCatalog` and `MakeMySqlCatalog` are available when the matching | ||
| connector is enabled. Their URI is parsed as | ||
| `[scheme://][user[:password]@]host[:port][/database]`. Each factory creates the | ||
| schema if it does not yet exist. | ||
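To make the URI shape `[scheme://][user[:password]@]host[:port][/database]` concrete, here is a minimal parsing sketch. It is not the code in `connection_uri.cc`: `ParsedUri` and `ParseConnectionUri` are hypothetical names, and the sketch assumes no percent-decoding, no IPv6 literal hosts, and no `/` inside the user or password.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical result type for illustration only.
struct ParsedUri {
  std::string scheme, user, password, host, database;
  std::optional<int> port;
};

// Splits a URI of the form [scheme://][user[:password]@]host[:port][/database].
// Returns std::nullopt if the host is missing or the port is not a valid number.
inline std::optional<ParsedUri> ParseConnectionUri(std::string_view uri) {
  ParsedUri out;
  // Optional "scheme://" prefix.
  if (auto pos = uri.find("://"); pos != std::string_view::npos) {
    out.scheme = std::string(uri.substr(0, pos));
    uri.remove_prefix(pos + 3);
  }
  // Optional "/database" suffix: everything after the first '/'.
  if (auto pos = uri.find('/'); pos != std::string_view::npos) {
    out.database = std::string(uri.substr(pos + 1));
    uri = uri.substr(0, pos);
  }
  // Optional "user[:password]@" prefix; split on the last '@'.
  if (auto pos = uri.rfind('@'); pos != std::string_view::npos) {
    std::string_view userinfo = uri.substr(0, pos);
    uri.remove_prefix(pos + 1);
    if (auto colon = userinfo.find(':'); colon != std::string_view::npos) {
      out.user = std::string(userinfo.substr(0, colon));
      out.password = std::string(userinfo.substr(colon + 1));
    } else {
      out.user = std::string(userinfo);
    }
  }
  // Optional ":port" suffix on the host.
  if (auto pos = uri.rfind(':'); pos != std::string_view::npos) {
    std::string_view port_text = uri.substr(pos + 1);
    if (port_text.empty() || port_text.size() > 5) return std::nullopt;
    int port = 0;
    for (char c : port_text) {
      if (c < '0' || c > '9') return std::nullopt;
      port = port * 10 + (c - '0');
    }
    if (port > 65535) return std::nullopt;
    out.port = port;
    uri = uri.substr(0, pos);
  }
  if (uri.empty()) return std::nullopt;  // host is the only mandatory part
  out.host = std::string(uri);
  return out;
}
```

With this sketch, `postgresql://iceberg:secret@db.example.com:5432/catalog` yields all six components, while a bare `localhost` yields only a host and leaves the other fields empty.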
|
|
||
| The PostgreSQL and MySQL stores use a single sqlpp23 connection when | ||
| `max_connections <= 1` and a bounded sqlpp23 connection pool otherwise. | ||
| Transaction bodies reuse the same leased connection for every store operation | ||
| issued inside `RunInTransaction`. The SQLite store ignores `max_connections` and | ||
| always uses a single connection: a file database only allows one writer (a pool | ||
| of write connections would just hit `SQLITE_BUSY`) and a `:memory:` database is | ||
| private to each connection. | ||
|
|
||
| The backing schema follows the Java/Rust-compatible `iceberg_tables` layout, | ||
| including the optional `iceberg_type` column. New table rows write | ||
| `iceberg_type = 'TABLE'` as the record type; existing rows with `NULL` remain | ||
| readable for compatibility. | ||
|
|
||
| ## Bring your own store | ||
|
|
||
| To use a database, driver, or connection pool that is not built in, implement | ||
| `CatalogStore` and inject it. No catalog code changes are required: | ||
|
|
||
| ```cpp | ||
| class MyCatalogStore : public iceberg::sql::CatalogStore { | ||
| public: | ||
| iceberg::Status Initialize() override { /* CREATE TABLE IF NOT EXISTS ... */ } | ||
| iceberg::Result<std::optional<std::string>> GetTableMetadataLocation( | ||
| std::string_view ns, std::string_view name) override { /* ... */ } | ||
| iceberg::Status InsertTable(std::string_view ns, std::string_view name, | ||
| std::string_view metadata_location) override { /* ... */ } | ||
| // ... the remaining CatalogStore operations ... | ||
| iceberg::Status RunInTransaction( | ||
| const std::function<iceberg::Status()>& body) override { /* ... */ } | ||
| }; | ||
|
|
||
| auto store = std::make_shared<MyCatalogStore>(/* ... */); | ||
| auto catalog = SqlCatalog::Make(config, file_io, std::move(store)).value(); | ||
| ``` | ||
|
|
||
| ### Implementation contract | ||
|
|
||
| - **Catalog scope**: a store instance is bound to one catalog name; every row it | ||
| reads or writes must be scoped by that name. | ||
| - **Namespace identifiers**: namespace levels must not be empty and must not | ||
| contain `.` because the backing schema stores a namespace as a dot-joined | ||
| string. | ||
| - **Table rows**: new table inserts should write `iceberg_type = 'TABLE'` as | ||
| the record type. | ||
| Reads should treat both `TABLE` and `NULL` as table rows so older databases | ||
| remain readable. | ||
| - **Unique violations**: `InsertTable`, `InsertNamespaceProperty`, and | ||
| `RenameTable` must report a primary-key collision as `ErrorKind::kAlreadyExists`. | ||
| The catalog relies on this as the authoritative signal for concurrent creates. | ||
| - **Affected rows**: `UpdateTableMetadataLocation` performs the optimistic | ||
| compare-and-set; it must return the number of rows updated (0 on a stale base). | ||
| - **Atomicity**: `RunInTransaction` must commit on success and roll back on any | ||
| error so the database is left unchanged. | ||
| - **Threading**: a store may be called from multiple threads; serialize | ||
| internally or use one connection per concurrent operation. The built-in | ||
| PostgreSQL and MySQL stores use a bounded sqlpp23 connection pool when | ||
| `max_connections > 1`; the SQLite store always uses a single connection. | ||
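The "unique violations" and "affected rows" rules in the contract above can be illustrated with a toy in-memory store. This is only a sketch under simplifying assumptions: `InMemoryTableStore` is a hypothetical class that does not derive from `iceberg::sql::CatalogStore`, it returns `bool` and `int` instead of `iceberg::Status` and `iceberg::Result`, and it omits catalog-name scoping and transactions.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <optional>
#include <string>
#include <utility>

// Toy stand-in for a store; signatures are simplified assumptions.
class InMemoryTableStore {
 public:
  // Returns false on a key collision, where a real store would report
  // ErrorKind::kAlreadyExists.
  bool InsertTable(const std::string& ns, const std::string& name,
                   const std::string& metadata_location) {
    std::lock_guard<std::mutex> lock(mutex_);
    return rows_.emplace(std::make_pair(ns, name), metadata_location).second;
  }

  std::optional<std::string> GetTableMetadataLocation(
      const std::string& ns, const std::string& name) const {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = rows_.find(std::make_pair(ns, name));
    if (it == rows_.end()) return std::nullopt;
    return it->second;
  }

  // Optimistic compare-and-set: the row is updated only if its current
  // location equals expected_current. Returns the number of rows updated,
  // so 0 signals a stale base (or a missing table).
  int UpdateTableMetadataLocation(const std::string& ns, const std::string& name,
                                  const std::string& expected_current,
                                  const std::string& new_location) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = rows_.find(std::make_pair(ns, name));
    if (it == rows_.end() || it->second != expected_current) return 0;
    it->second = new_location;
    return 1;
  }

 private:
  mutable std::mutex mutex_;  // serializes access, per the threading rule
  std::map<std::pair<std::string, std::string>, std::string> rows_;
};
```

A caller that sees 0 rows updated should treat the commit as lost to a concurrent writer, reload the current metadata location, and retry or fail. In a SQL-backed store the same effect typically comes from an `UPDATE ... WHERE metadata_location = <expected>` statement and its affected-row count.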
Is it better to add a dedicated ci workflow for the sql catalog? We can trigger it only when files related to sql catalog have been changed to reduce resource usage.