feat(gooddata-sdk): [AUTO] Add refresh-partition endpoint for AI Lake pipe tables #1626

Open
yenkins-admin wants to merge 3 commits into master from auto/openapi-sync-C003-20260524-r46494

Conversation

@yenkins-admin
Contributor

Summary

Added a refresh_partition method to CatalogAILakeService. Because RefreshPartitionRequest is not yet in the generated gooddata-api-client, the method issues a raw requests.post call, following the same URL-construction pattern as GoodDataApiClient._do_post_request. It seeds a client-side UUID as the operation-id (matching the analyze_statistics pattern), sends it as an HTTP header, and returns it so callers can poll via get_operation / wait_for_operation.
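
The flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the standalone function shape, the `post` injection parameter, the `Bearer` authorization header, the 30-second timeout, and the URL-quoting of path segments are all assumptions made for the example. Only the endpoint path, the `operation-id` header, and the `partitionSpec` body come from the OpenAPI diff.

```python
import uuid
from typing import Callable, Optional
from urllib.parse import quote


def refresh_partition(
    hostname: str,
    token: str,
    instance_id: str,
    table_name: str,
    partition_spec: dict,
    operation_id: Optional[str] = None,
    post: Optional[Callable] = None,  # hypothetical hook so the sketch can be tested offline
) -> str:
    """Request a partition refresh and return the operation ID to poll with."""
    if post is None:
        import requests  # third-party; imported lazily so a fake `post` needs no network stack

        post = requests.post

    # Seed the operation ID on the client side unless the caller supplied one.
    op_id = operation_id or str(uuid.uuid4())

    # Path taken from the OpenAPI diff; quoting the segments is an assumption.
    url = (
        f"{hostname.rstrip('/')}/api/v1/ailake/database/instances/"
        f"{quote(instance_id, safe='')}/pipeTables/{quote(table_name, safe='')}/refresh"
    )

    response = post(
        url,
        json={"partitionSpec": partition_spec},
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "operation-id": op_id,
        },
        timeout=30,  # assumed value
    )
    # The endpoint is expected to answer 202 with no body, so only the status is checked.
    response.raise_for_status()
    return op_id
```

Passing a fixed `operation_id` gives a deterministic request, which is what the integration test relies on for cassette recording; omitting it exercises the UUID auto-generation path.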

Impact: new_feature | Services: afm-exec-api

Source commits (gdc-nas):

  • ec3f4d3 by Dan Homola — Merge pull request #23181 from gooddata/dho/refresh-partition

Files changed

  • packages/gooddata-sdk/src/gooddata_sdk/catalog/ai_lake/service.py
  • packages/gooddata-sdk/tests/catalog/unit_tests/test_ai_lake_service.py
  • packages/gooddata-sdk/tests/catalog/test_catalog_ai_lake.py

Agent decisions

Decisions (4)

raw HTTP instead of generated client — Use requests.post directly in CatalogAILakeService.refresh_partition

  • Alternatives: Wait for api-client regeneration with RefreshPartitionRequest, Hand-roll a generated-client stub (blocked by security hook)
  • Why: RefreshPartitionRequest is absent from gooddata-api-client; the task instructions explicitly recommend wrapping raw HTTP for genuinely new endpoints not yet in the generated client. The raw-HTTP layer survives the next api-client regeneration cleanly.

operation-id seeding strategy — Seed a UUID on the client side and send it as the operation-id request header, returning the same UUID to the caller

  • Alternatives: Let server generate the ID and read it from the 202 response headers
  • Why: Matches the existing analyze_statistics pattern: caller gets a known polling handle without inspecting response headers, enabling deterministic cassette recording.

no new __init__.py export — refresh_partition is a method on the already-exported CatalogAILakeService; no new class or alias needs re-exporting

  • Alternatives: Export a RefreshPartitionRequest wrapper class
  • Why: The method signature uses only primitive types (str, dict[str, str]); no new public class is introduced, so no __init__.py change is required.

integration test scope — Single test (test_refresh_partition) that supplies a fixed operation_id for cassette determinism and asserts the returned ID matches

  • Alternatives: Also call get_operation in the same test to verify the operation is trackable, Add a second test for UUID auto-generation
  • Why: One cassette per test (caps rule). UUID auto-generation is covered by the unit test. The integration test validates the full HTTP round-trip; the unit tests validate the logic.
Assumptions to verify (3)
  • The AI Lake staging instance named 'demo-db' exists and 'fact_orders' is a valid pipe table when the cassette is recorded.
  • The pre-existing ty check failures in execution.py (pyarrow not installed in this environment) and filter.py (EmptyValueHandling type mismatch) are baseline issues unrelated to this cluster's changes — they appear in git-unchanged files.
  • The refresh-partition endpoint returns HTTP 202 with no body; raise_for_status() is sufficient error handling.
Risks (2)
  • Raw requests.post bypasses the api-client's default-headers injection (X-Requested-With, X-GDC-VALIDATE-RELATIONS, Accept-Encoding). If the server requires these headers, the call will fail.
  • If the server does not honor the client-supplied operation-id header and generates its own, the returned op_id will not match the server's tracking ID and polling via get_operation(op_id) would fail with 404.
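
One possible mitigation for the second risk, sketched under assumptions: since the OpenAPI diff marks the `operation-id` response header as required on the 202 response, the caller could prefer the server's value and fall back to the client-seeded one only when the header is missing. The helper name `resolve_operation_id` is hypothetical and is not part of this PR.

```python
from typing import Mapping


def resolve_operation_id(sent_id: str, response_headers: Mapping[str, str]) -> str:
    """Prefer the server-reported operation ID; fall back to the one we sent."""
    # HTTP header names are case-insensitive, so normalize before looking up.
    lowered = {name.lower(): value for name, value in response_headers.items()}
    return lowered.get("operation-id") or sent_id
```

With this in place, a mismatch between the sent and returned IDs would surface as a different return value rather than as a 404 during polling.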
Layers touched (2)
  • public_api — New method refresh_partition on CatalogAILakeService; no new public class — method return type is str (operation ID)
    • packages/gooddata-sdk/src/gooddata_sdk/catalog/ai_lake/service.py
  • tests — Updated _make_service factory to include _hostname and _token on fake_client; added TestRefreshPartition unit tests and test_refresh_partition integration test
    • packages/gooddata-sdk/tests/catalog/unit_tests/test_ai_lake_service.py
    • packages/gooddata-sdk/tests/catalog/test_catalog_ai_lake.py
OpenAPI diff
--- a/gooddata-afm-client.json
+++ b/gooddata-afm-client.json
@@ -4547,14 +4791,15 @@
           "kind": {
-            "description": "...* `analyze-statistics` — Running ANALYZE TABLE...",
+            "description": "...* `refresh-partition` — Refreshing a specific Hive partition (delete + re-load from S3).",
             "enum": [
               "provision-database",
               "deprovision-database",
               "run-service-command",
               "create-pipe-table",
               "delete-pipe-table",
-              "analyze-statistics"
+              "analyze-statistics",
+              "refresh-partition"
             ]
@@ -5416,6 +5661,23 @@
+      "RefreshPartitionRequest": {
+        "description": "Request to refresh a specific Hive partition in a pipe-backed OLAP table",
+        "properties": {
+          "partitionSpec": {
+            "additionalProperties": { "type": "string" },
+            "description": "Partition column values identifying the partition to refresh.",
+            "type": "object"
+          }
+        },
+        "required": ["partitionSpec"],
+        "type": "object"
+      },
@@ -9874,6 +10215,95 @@
+    "/api/v1/ailake/database/instances/{instanceId}/pipeTables/{tableName}/refresh": {
+      "post": {
+        "description": "(BETA) Deletes all rows for the specified Hive partition and re-loads them from S3.",
+        "operationId": "refreshAiLakePipeTablePartition",
+        "parameters": [
+          { "name": "instanceId", "in": "path", "required": true },
+          { "name": "tableName", "in": "path", "required": true },
+          { "name": "operation-id", "in": "header", "required": false }
+        ],
+        "requestBody": { "$ref": "#/components/schemas/RefreshPartitionRequest" },
+        "responses": {
+          "202": {
+            "headers": {
+              "operation-id": { "description": "Operation ID to use for polling.", "required": true },
+              "operation-location": { "description": "Operation location URL.", "required": true }
+            }
+          }
+        },
+        "summary": "(BETA) Refresh a pipe table partition",
+        "tags": ["AI Lake", "AI Lake - Pipe Tables"]
+      }
+    }
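
As an illustration of the RefreshPartitionRequest schema above, the following sketch builds the request body and rejects values the schema would not accept (`partitionSpec` is a required object whose values must be strings). The helper name `build_refresh_partition_body` and the choice to raise `TypeError` are assumptions for the example; the PR itself may not validate on the client side at all.

```python
def build_refresh_partition_body(partition_spec: dict) -> dict:
    """Return a JSON-serializable body matching the RefreshPartitionRequest schema."""
    if not isinstance(partition_spec, dict):
        raise TypeError("partition_spec must be a dict of partition column -> value")
    for column, value in partition_spec.items():
        # The schema declares additionalProperties of type string.
        if not isinstance(column, str) or not isinstance(value, str):
            raise TypeError(f"partition_spec entry {column!r} must map a str to a str")
    # Copy so later mutation of the caller's dict does not change the payload.
    return {"partitionSpec": dict(partition_spec)}
```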

Workflow run


Generated by SDK OpenAPI Sync workflow
