Overriding profile endpoint with analyze endpoint with operator tree and profiling by Krish-Gandhi · Pull Request #5568 · opensearch-project/sql

Krish-Gandhi · 2026-06-19T00:06:17Z

Description

Introduces the analyze endpoint for PPL queries, which can be activated by passing "analyze": true as a request body parameter.
Overrides the existing profile endpoint to run analyze functionality.
Propagate PPLanalyze flag through transport and request parsing.
Combines logical plan and physical plan nodes by walking each tree and grouping nodes into an operator_tree.
Maps PPL query segments and estimated/actual rows to operator_tree nodes.
Combined current profile plan response with operator_tree nodes to calculate time taken by each node.

Note: This PR provides a simple implementation of creating the operator_tree, which works for simpler queries. The logic doesn't hold for queries that produce non-linear physical plan trees (for example, JOINs). This only affects the operator_tree, meaning the subset of the response that corresponds to profile is 100% functional. Therefore, in this scenario, the analyze endpoint will "fallback" to the profile output by returning a response mirroring the profile endpoint (by only including the profile, schema, datarows, total, and size fields in the response). This operator_tree issue will be addressed in a later PR.

Important

This PR overrides the existing profile endpoint by routing all requests with either "analyze": true or "profile": true (or both) to pplService.analyze() in TransportPPLQueryAction.java. For current end-users of the profile endpoint, this will make little difference, as the response from profile is a subset of the response from analyze. In simpler terms, current end-users can make no changes and have similar results.

All of the existing code for profile still exists and is in place. This makes it extremely simple to separate analyze and profile again. This can be done by removing the || transformedRequest.profile() part of the boolean expression in line 200 of TransportPPLQueryAction.java.

plugin/src/main/java/org/opensearch/sql/plugin/transport/TransportPPLQueryAction.java:193-209:

    if (transformedRequest.isExplainRequest()) {
      pplService.explain(
          transformedRequest, createExplainResponseListener(transformedRequest, clearingListener));
    /**
     * Removing  `|| transformedRequest.profile()` from line 200 will
     * separate the `profile` and `analyze` endpoints. See PR #5568.
     */
    } else if (transformedRequest.analyze() || transformedRequest.profile()) {
      pplService.analyze(
        transformedRequest, createAnalyzeResponseListener(transformedRequest, clearingListener));
    } else {
      pplService.execute(
          transformedRequest,
          createListener(transformedRequest, clearingListener),
          createExplainResponseListener(transformedRequest, clearingListener));
    }
  }

Example Query and Response

The following curl command will run the query source=accounts | where age < 30 | eval full_name = firstname + \" \" + lastname | fields full_name, email, age on the analyze endpoint:

curl -X POST "localhost:9200/_plugins/_ppl" \
  -H "Content-Type: application/json" \
  -d '{"query": "source=accounts | where age < 30 | eval full_name = firstname + \" \" + lastname | fields full_name, email, age", "analyze": true}'

The response of this will be as follows. (NOTE: The "logicalPlan" and "physicalPlan" fields are included for debugging purposes and should not be included in the final version of this endpoint.)

{
  "query": "source=accounts | where age < 30 | eval full_name = firstname + \" \" + lastname | fields full_name, email, age",
  "querySegments": [
    {
      "nodeType": "SearchFrom",
      "source": "source=accounts"
    },
    {
      "nodeType": "WhereCommand",
      "source": "where age < 30"
    },
    {
      "nodeType": "EvalCommand",
      "source": "eval full_name = firstname + \" \" + lastname"
    },
    {
      "nodeType": "FieldsCommand",
      "source": "fields full_name, email, age"
    }
  ],
  "logicalPlan": [
    "LogicalSystemLimit(fetch=[10000], type=[QUERY_SIZE_LIMIT]): rowcount = 5000.0, cumulative cost = {114000.0 rows, 145000.0 cpu, 0.0 io}, id = 103",
    "LogicalProject(full_name=[||(||($0, ' '), $4)], email=[$3], age=[$2]): rowcount = 5000.0, cumulative cost = {109000.0 rows, 25000.0 cpu, 0.0 io}, id = 102",
    "LogicalFilter(condition=[<($2, 30)]): rowcount = 5000.0, cumulative cost = {104000.0 rows, 10000.0 cpu, 0.0 io}, id = 100",
    "CalciteLogicalIndexScan(table=[[OpenSearch, accounts]]): rowcount = 10000.0, cumulative cost = {99000.0 rows, 0.0 cpu, 0.0 io}, id = 99"
  ],
  "physicalPlan": [
    "EnumerableCalc(expr#0..3=[{inputs}], expr#4=[' '], expr#5=[||($t0, $t4)], expr#6=[||($t5, $t3)], full_name=[$t6], email=[$t2], age=[$t1]): rowcount = 5000.0, cumulative cost = {22996.4 rows, 50000.0 cpu, 0.0 io}, id = 193",
    "CalciteEnumerableIndexScan(table=[[OpenSearch, accounts]], PushDownContext=[[PROJECT->[firstname, age, email, lastname], FILTER-><($1, 30), LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={\"from\":0,\"size\":10000,\"timeout\":\"1m\",\"query\":{
  ],
  "profile": {
    "summary": {
      "total_time_ms": 777.18
    },
    "phases": {
      "analyze": {
        "time_ms": 477.57
      },
      "optimize": {
        "time_ms": 225.46
      },
      "execute": {
        "time_ms": 73.41
      },
      "format": {
        "time_ms": 0.11
      }
    },
    "plan": {
      "node": "EnumerableCalc",
      "time_ms": 64.19,
      "rows": 3,
      "children": [
        {
          "node": "CalciteEnumerableIndexScan",
          "time_ms": 64.0,
          "rows": 3
        }
      ]
    }
  },
  "operator_tree": [
    {
      "source": "source=accounts | where age < 30",
      "node_type": [
        "SearchFrom",
        "WhereCommand"
      ],
      "description": [
        "CalciteLogicalIndexScan(table=[[OpenSearch, accounts]]): rowcount = 10000.0, cumulative cost = {99000.0 rows, 0.0 cpu, 0.0 io}, id = 99",
        "LogicalFilter(condition=[<($2, 30)]): rowcount = 5000.0, cumulative cost = {104000.0 rows, 10000.0 cpu, 0.0 io}, id = 100"
      ],
      "estimated_rows": 5000,
      "actual_time_ms": "64.00 ms",
      "actual_rows": 3,
      "is_pushed_down": true
    },
    {
      "source": "eval full_name = firstname + \" \" + lastname | fields full_name, email, age",
      "node_type": [
        "EvalCommand",
        "FieldsCommand"
      ],
      "description": [
        "LogicalProject(full_name=[||(||($0, ' '), $4)], email=[$3], age=[$2]): rowcount = 5000.0, cumulative cost = {109000.0 rows, 25000.0 cpu, 0.0 io}, id = 102"
      ],
      "estimated_rows": 5000,
      "actual_time_ms": "0.19 ms",
      "actual_rows": 3
    }
  ],
  "recommendations": [
    {
      "serverity": "INFO",
      "rule": "Pushdown visibility",
      "message": "1 of 2 stages pushed down; 1 ran in-memory"
    },
    {
      "serverity": "INFO",
      "rule": "Bottleneck stage",
      "message": "87% of time is in the *SearchFrom, WhereCommand* stage",
      "affected_node": "source=accounts | where age < 30",
      "suggestion": "Consider optimizing the SearchFrom, WhereCommand operation"
    }
  ],
  "schema": [
    {
      "name": "full_name",
      "type": "STRING"
    },
    {
      "name": "email",
      "type": "STRING"
    },
    {
      "name": "age",
      "type": "LONG"
    }
  ],
  "datarows": [
    [
      "Jane Smith",
      "jane@example.com",
      28
    ],
    [
      "Kyle Miller",
      "goat@example.com",
      22
    ],
    [
      "Joette Kap",
      "coast2coast@example.com",
      22
    ]
  ],
  "total": 3,
  "size": 3
}

Performance of `analyze`

After writing a benchmarking script to run 20 queries on a sample 10 GB dataset, the results were as follows. This shows that analyze has a small enough overhead to justify running it by default on the Discover tab on the OpenSearch Dashboard.

=== PPL Query Benchmark ===

--- Results ---

normal      avg=4489.8ms  p50=206.9ms  p95=1083.1ms  min=28.5ms  max=84080.9ms
profile     avg=4407.9ms  p50=197.9ms  p95=749.1ms  min=19.1ms  max=83598.1ms
analyze     avg=4410.2ms  p50=213.3ms  p95=801.3ms  min=38.3ms  max=83146.3ms

Related Issues

#5500
Resolves #4343

Check List

New functionality includes testing.
New functionality has been documented.
New functionality has javadoc added.
New functionality has a user manual doc added.
New PPL command checklist all confirmed.
API changes companion pull request created.
Commits are signed per the DCO using --signoff or -s.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…and profiling Signed-off-by: Krish Gandhi <kjg2352@gmail.com>

Signed-off-by: Krish Gandhi <kjg2352@gmail.com>

github-actions · 2026-06-25T17:52:36Z

PR Reviewer Guide 🔍

(Review updated until commit `ff87829`)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review Possible Issue In `visitChildren`, when tracking is enabled, the method iterates over children and calls `child.accept(this, context)` for each, but only the last child's result is returned. If a node has multiple children, all but the last child's results are discarded. This breaks the visitor pattern for multi-child nodes (e.g., joins, unions). The non-tracking path calls `super.visitChildren(node, context)` which properly handles multiple children. if (context.isTrackingEnabled() && node instanceof UnresolvedPlan) { // Track each child's total contribution (the subtree it produces) RelNode result = null; for (org.opensearch.sql.ast.Node child : node.getChild()) { int idBefore = context.relBuilder.size() > 0 ? context.relBuilder.peek().getId() : -1; RelNode childResult = child.accept(this, context); result = childResult; // After child.accept returns, the child's visit* method has fully completed, // so all RelNodes produced by that child (including ITS children) are on the stack. int idAfter = context.relBuilder.peek().getId(); if (child instanceof UnresolvedPlan) { java.util.List<Integer> producedIds = new java.util.ArrayList<>(); for (int id = idBefore + 1; id <= idAfter; id++) { producedIds.add(id); } context.recordMapping(child.getClass().getSimpleName(), producedIds); } } if (node instanceof UnresolvedPlan plan) { mapPathMaterializer.materializePaths(plan, context); } return result; } Thread Safety `QueryContext.setProfile(true)` at line 256 modifies a shared context without synchronization. If multiple threads call `analyzeWithCalcite` concurrently, they will race on this global flag. One thread's profile setting can leak into another thread's execution, causing incorrect profiling behavior or missing profile data. QueryContext.setProfile(true); Resource Leak At line 367, `OpenSearchRelRunners.run(context, calcitePlan)` returns a `PreparedStatement` that is immediately closed in a try-with-resources block. However, if an exception occurs between line 367 and the try block's opening, the statement is never closed. The `Hook.Closeable` at line 359 is properly closed, but the PreparedStatement is not guaranteed cleanup if `run()` throws before the try block executes. OpenSearchRelRunners.run(context, calcitePlan)) { } catch (java.sql.SQLException e) { throw new RuntimeException(e); Possible Issue `buildOperatorTree` assumes a linear plan structure (single-child chains). When `isLinearPlanTree` returns false (line 310), the method returns early with a fallback response. However, the fallback response omits `query`, `querySegments`, `logicalPlan`, `physicalPlan`, and `operator_tree` fields (lines 330-336), which breaks the documented API contract. Clients expecting these fields will receive incomplete responses for non-linear queries like JOINs. if (profile != null && profile.getPlan() != null && !isLinearPlanTree(profile)) { List<AnalyzeResponse.SchemaColumn> schema = new ArrayList<>(); if (queryResponse.getSchema() != null) { for (ExecutionEngine.Schema.Column col : queryResponse.getSchema().getColumns()) { schema.add( AnalyzeResponse.SchemaColumn.builder() .name(col.getName()) .type(col.getExprType().typeName()) .build()); } } Object[][] datarows = new Object[queryResponse.getResults().size()][]; int rowIdx = 0; for (var exprValue : queryResponse.getResults()) { datarows[rowIdx++] = exprValue.tupleValue().entrySet().stream() .map(e -> e.getValue().value()) .toArray(Object[]::new); } listener.onResponse( AnalyzeResponse.builder() // .query(query) .profile(profile) .schema(schema) .datarows(datarows) .total(datarows.length) .size(datarows.length) .build()); return; } Possible Issue In `buildOperatorTree`, the calculation of `pushedNodeCount` at line 486 assumes `logicalDepth - physicalDepth` equals the number of pushed-down logical nodes. This assumption breaks when the physical plan has a different structure than expected (e.g., additional physical operators inserted by the optimizer). If `pushedNodeCount` is negative or exceeds the actual number of segments, the loop at line 503 will produce incorrect `pushedSegments`, leading to wrong pushdown attribution in the operator tree. int physicalDepth = getLinearDepth(physicalPlan); int logicalDepth = getLinearDepth(logicalPlan); int pushedNodeCount = logicalDepth - physicalDepth;

github-actions · 2026-06-25T17:53:07Z

PR Code Suggestions ✨

Latest suggestions up to ff87829

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Guard against empty builder stack Check if `context.relBuilder` is empty before calling `peek()` after `accept()` returns. If the builder is empty, `peek()` will throw `NoSuchElementException`, causing the analyze operation to fail unexpectedly. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [222-232] if (context.isTrackingEnabled()) { int idBefore = context.relBuilder.size() > 0 ? context.relBuilder.peek().getId() : -1; RelNode result = unresolved.accept(this, context); + if (context.relBuilder.size() == 0) { + return result; + } int idAfter = context.relBuilder.peek().getId(); java.util.List<Integer> producedIds = new java.util.ArrayList<>(); for (int id = idBefore + 1; id <= idAfter; id++) { producedIds.add(id); } context.recordMapping(unresolved.getClass().getSimpleName(), producedIds); return result; } Suggestion importance[1-10]: 8 __ Why: This addresses a potential `NoSuchElementException` when calling `peek()` on an empty `relBuilder` stack after `accept()` returns. The check is necessary to prevent runtime failures during tracking.	Medium
Possible issue	Add timeout to latch await Add a timeout to `latch.await()` to prevent indefinite blocking if the async callback never completes. Without a timeout, the thread could hang forever if `executeWithCalcite` fails to invoke either callback, leading to resource exhaustion. core/src/main/java/org/opensearch/sql/executor/QueryService.java [292-297] -CountDownLatch latch = new CountDownLatch(1); - -executeWithCalcite( - plan, - queryType, - null, - new ResponseListener<>() { - @Override - public void onResponse(ExecutionEngine.QueryResponse response) { - ... - latch.countDown(); - } - - @Override - public void onFailure(Exception e) { - errorRef.set(e); - latch.countDown(); - } - }); - try { - latch.await(); + if (!latch.await(60, TimeUnit.SECONDS)) { + listener.onFailure(new RuntimeException("Timeout waiting for query execution")); + return; + } } catch (InterruptedException e) { Thread.currentThread().interrupt(); listener.onFailure(new RuntimeException("Interrupted while waiting for query execution", e)); return; } Suggestion importance[1-10]: 7 __ Why: Adding a timeout prevents indefinite blocking if the async callback fails to complete, which is a valid concern for production systems. However, the 60-second timeout value is arbitrary and may need tuning based on actual query execution times.	Medium
General	Prevent negative pushed node count Validate that `pushedNodeCount` is non-negative before using it as a loop boundary. If `physicalDepth` exceeds `logicalDepth` due to unexpected plan structure, the negative value could cause incorrect operator tree grouping or infinite loops. core/src/main/java/org/opensearch/sql/executor/QueryService.java [486-508] -int pushedNodeCount = logicalDepth - physicalDepth; +int pushedNodeCount = Math.max(0, logicalDepth - physicalDepth); ... long pushedLogicalNodes = 0; int pushedSegments = 0; for (int idx = 0; idx < querySegments.size() && pushedLogicalNodes < pushedNodeCount; idx++) { Set<Integer> ids = idx < exclusiveIds.size() ? exclusiveIds.get(idx) : Set.of(); long planNodeCount = ids.stream().filter(idToDescription::containsKey).count(); pushedLogicalNodes += planNodeCount; pushedSegments++; } Suggestion importance[1-10]: 6 __ Why: Using `Math.max(0, ...)` prevents negative values that could cause incorrect operator tree grouping. While this is a defensive programming practice, the scenario where `physicalDepth` exceeds `logicalDepth` may indicate a deeper issue that should be investigated rather than silently handled.	Low
General	Include empty operator tree field Set `operator_tree` to an empty list instead of omitting it when falling back to profile-only mode. Consumers expecting the field to always be present may encounter null pointer exceptions or deserialization errors if it's missing. core/src/main/java/org/opensearch/sql/executor/QueryService.java [329-338] -if (profile != null && profile.getPlan() != null && !isLinearPlanTree(profile)) { - ... - listener.onResponse( - AnalyzeResponse.builder() - .profile(profile) - .schema(schema) - .datarows(datarows) - .total(datarows.length) - .size(datarows.length) - .build()); - return; -} +listener.onResponse( + AnalyzeResponse.builder() + .profile(profile) + .operator_tree(List.of()) + .schema(schema) + .datarows(datarows) + .total(datarows.length) + .size(datarows.length) + .build()); Suggestion importance[1-10]: 5 __ Why: Setting `operator_tree` to an empty list instead of omitting it improves API consistency and prevents potential null pointer exceptions in consumers. However, the `AnalyzeResponse.builder()` may already handle null fields appropriately, and the documentation should clarify the field's optionality.	Low

Previous suggestions

Suggestions up to commit 7f164a0

Category	Suggestion	Impact
Possible issue	Guard against empty builder stack Accessing `context.relBuilder.peek()` after `unresolved.accept()` can throw `NoSuchElementException` if the builder stack is empty. Verify the builder has elements before calling `peek()` to prevent runtime crashes during tracking. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [222-232] if (context.isTrackingEnabled()) { int idBefore = context.relBuilder.size() > 0 ? context.relBuilder.peek().getId() : -1; RelNode result = unresolved.accept(this, context); + if (context.relBuilder.size() == 0) { + return result; + } int idAfter = context.relBuilder.peek().getId(); java.util.List<Integer> producedIds = new java.util.ArrayList<>(); for (int id = idBefore + 1; id <= idAfter; id++) { producedIds.add(id); } context.recordMapping(unresolved.getClass().getSimpleName(), producedIds); return result; } Suggestion importance[1-10]: 8 __ Why: This is a critical safety check. After `unresolved.accept()` at line 224, the builder stack could be empty, and calling `peek()` at line 225 would throw `NoSuchElementException`. The guard prevents a potential runtime crash during tracking.	Medium
	Add timeout to latch await Add a timeout to `latch.await()` to prevent indefinite blocking if the async callback never completes. Without a timeout, the thread could hang forever if `executeWithCalcite` fails to invoke either callback, leading to resource exhaustion. core/src/main/java/org/opensearch/sql/executor/QueryService.java [291-297] -CountDownLatch latch = new CountDownLatch(1); - -executeWithCalcite( - plan, - queryType, - null, - new ResponseListener<>() { - @Override - public void onResponse(ExecutionEngine.QueryResponse response) { - ... - latch.countDown(); - } - - @Override - public void onFailure(Exception e) { - errorRef.set(e); - latch.countDown(); - } - }); - try { - latch.await(); + if (!latch.await(60, TimeUnit.SECONDS)) { + listener.onFailure(new RuntimeException("Query execution timed out after 60 seconds")); + return; + } } catch (InterruptedException e) { Thread.currentThread().interrupt(); listener.onFailure(new RuntimeException("Interrupted while waiting for query execution", e)); return; } Suggestion importance[1-10]: 7 __ Why: Adding a timeout prevents indefinite blocking if the async callback fails to complete, which is a valid concern for production code. However, the 60-second timeout value is arbitrary and may need tuning based on actual query execution patterns.	Medium
	Prevent negative pushed node count When `pushedNodeCount` is negative (if `physicalDepth > logicalDepth`), the loop condition `pushedLogicalNodes < pushedNodeCount` will never be satisfied, causing all segments to be marked as pushed down incorrectly. Add validation to ensure `pushedNodeCount` is non-negative. core/src/main/java/org/opensearch/sql/executor/QueryService.java [484-508] -int pushedNodeCount = logicalDepth - physicalDepth; +int pushedNodeCount = Math.max(0, logicalDepth - physicalDepth); ... long pushedLogicalNodes = 0; int pushedSegments = 0; for (int idx = 0; idx < querySegments.size() && pushedLogicalNodes < pushedNodeCount; idx++) { Set<Integer> ids = idx < exclusiveIds.size() ? exclusiveIds.get(idx) : Set.of(); long planNodeCount = ids.stream().filter(idToDescription::containsKey).count(); pushedLogicalNodes += planNodeCount; pushedSegments++; } Suggestion importance[1-10]: 7 __ Why: The suggestion correctly identifies that `pushedNodeCount` could be negative if `physicalDepth > logicalDepth`, which would cause incorrect pushdown marking. Using `Math.max(0, ...)` at line 486 ensures the value is non-negative and prevents logic errors in the subsequent loop.	Medium
General	Remove or uncomment query field The commented-out `.query(query)` field should either be included or removed entirely. Leaving commented code in production suggests incomplete implementation and may confuse future maintainers about whether this field should be present. core/src/main/java/org/opensearch/sql/executor/QueryService.java [329-338] if (profile != null && profile.getPlan() != null && !isLinearPlanTree(profile)) { ... listener.onResponse( AnalyzeResponse.builder() - // .query(query) + .query(query) .profile(profile) .schema(schema) .datarows(datarows) .total(datarows.length) .size(datarows.length) .build()); return; } Suggestion importance[1-10]: 3 __ Why: While removing commented code improves maintainability, the commented field at line 332 appears intentional (possibly for debugging). The suggestion correctly identifies that the field should be included in the fallback response, as it is included in the main response at line 419.	Low

Suggestions up to commit 727e3da

Category	Suggestion	Impact
Possible issue	Prevent NoSuchElementException on empty builder The code assumes `context.relBuilder.peek()` is always valid after `unresolved.accept()`, but if the builder is empty, this will throw `NoSuchElementException`. Add a null/empty check before accessing `peek()` to prevent runtime failures when the builder stack is unexpectedly empty. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [222-232] if (context.isTrackingEnabled()) { int idBefore = context.relBuilder.size() > 0 ? context.relBuilder.peek().getId() : -1; RelNode result = unresolved.accept(this, context); + if (context.relBuilder.size() == 0) { + return result; + } int idAfter = context.relBuilder.peek().getId(); java.util.List<Integer> producedIds = new java.util.ArrayList<>(); for (int id = idBefore + 1; id <= idAfter; id++) { producedIds.add(id); } context.recordMapping(unresolved.getClass().getSimpleName(), producedIds); return result; } Suggestion importance[1-10]: 8 __ Why: The code assumes `context.relBuilder.peek()` is valid after `unresolved.accept()`, but if the builder is empty, this will throw `NoSuchElementException`. This is a critical runtime safety issue that could cause crashes during tracking.	Medium
Possible issue	Guard against empty builder after child processing Similar to the `analyze` method, this code accesses `context.relBuilder.peek()` without verifying the builder is non-empty after `child.accept()`. If the child processing empties the builder, `peek()` will throw `NoSuchElementException`. Add a size check before calling `peek()`. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [242-247] int idBefore = context.relBuilder.size() > 0 ? context.relBuilder.peek().getId() : -1; RelNode childResult = child.accept(this, context); result = childResult; +if (context.relBuilder.size() == 0) { + continue; +} int idAfter = context.relBuilder.peek().getId(); Suggestion importance[1-10]: 8 __ Why: Similar to the first suggestion, this code accesses `context.relBuilder.peek()` without verifying the builder is non-empty after `child.accept()`. This is a critical safety issue that could cause `NoSuchElementException` at runtime.	Medium
General	Count only segments with plan nodes The loop increments `pushedSegments` even when `planNodeCount` is zero, potentially counting segments that contribute no logical nodes. This can lead to incorrect operator tree grouping. Only increment `pushedSegments` when `planNodeCount > 0` to accurately track segments with actual plan nodes. core/src/main/java/org/opensearch/sql/executor/QueryService.java [461-467] int pushedSegments = 0; for (int idx = 0; idx < querySegments.size() && pushedLogicalNodes < pushedNodeCount; idx++) { Set<Integer> ids = idx < exclusiveIds.size() ? exclusiveIds.get(idx) : Set.of(); long planNodeCount = ids.stream().filter(idToDescription::containsKey).count(); - pushedLogicalNodes += planNodeCount; - pushedSegments++; + if (planNodeCount > 0) { + pushedLogicalNodes += planNodeCount; + pushedSegments++; + } } Suggestion importance[1-10]: 7 __ Why: The loop increments `pushedSegments` even when `planNodeCount` is zero, which can lead to incorrect operator tree grouping. Only counting segments with actual plan nodes improves the accuracy of the pushdown detection logic.	Medium
General	Clean up profiling state on interruption After catching `InterruptedException` and re-interrupting the thread, the method returns without cleaning up `QueryProfiling` state. This could leave profiling in an inconsistent state. Ensure `QueryProfiling.noop()` or similar cleanup is called before returning to prevent resource leaks. core/src/main/java/org/opensearch/sql/executor/QueryService.java [284-290] try { latch.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); + QueryProfiling.noop(); listener.onFailure(new RuntimeException("Interrupted while waiting for query execution", e)); return; } Suggestion importance[1-10]: 6 __ Why: After catching `InterruptedException`, the method returns without cleaning up `QueryProfiling` state. Adding `QueryProfiling.noop()` before returning ensures consistent state management and prevents potential resource leaks.	Low

Suggestions up to commit 127dfe7

Category	Suggestion	Impact
Possible issue	Prevent peek on empty builder Check if `context.relBuilder` is empty before calling `peek()` after `accept()`. The `accept()` call might not push any nodes onto the builder, causing `peek()` to throw `NoSuchElementException`. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [223-225] int idBefore = context.relBuilder.size() > 0 ? context.relBuilder.peek().getId() : -1; RelNode result = unresolved.accept(this, context); -int idAfter = context.relBuilder.peek().getId(); +int idAfter = context.relBuilder.size() > 0 ? context.relBuilder.peek().getId() : -1; Suggestion importance[1-10]: 8 __ Why: The code calls `context.relBuilder.peek()` after `accept()` without checking if the builder is empty, which could throw `NoSuchElementException`. This is a potential runtime error that should be prevented.	Medium
Possible issue	Add timeout to latch await Add a timeout to `latch.await()` to prevent indefinite blocking if the async execution never completes. Without a timeout, the thread could hang forever if the callback is never invoked due to an unexpected error path. core/src/main/java/org/opensearch/sql/executor/QueryService.java [283-290] -CountDownLatch latch = new CountDownLatch(1); - -executeWithCalcite( - plan, - queryType, - null, - new ResponseListener<>() { - @Override - public void onResponse(ExecutionEngine.QueryResponse response) { - ... - latch.countDown(); - } - - @Override - public void onFailure(Exception e) { - errorRef.set(e); - latch.countDown(); - } - }); - try { - latch.await(); + if (!latch.await(60, TimeUnit.SECONDS)) { + listener.onFailure(new RuntimeException("Query execution timed out after 60 seconds")); + return; + } } catch (InterruptedException e) { Thread.currentThread().interrupt(); listener.onFailure(new RuntimeException("Interrupted while waiting for query execution", e)); return; } Suggestion importance[1-10]: 7 __ Why: Adding a timeout to `latch.await()` prevents indefinite blocking if the async callback never completes. This is a valid defensive programming practice that improves robustness, though the system may have other safeguards in place.	Medium
General	Validate ID range before iteration Validate that `idAfter >= idBefore` before creating the range. If node IDs are not monotonically increasing or if the builder state is inconsistent, this could produce incorrect or negative ranges. core/src/main/java/org/opensearch/sql/executor/QueryService.java [227-229] -for (int id = idBefore + 1; id <= idAfter; id++) { - producedIds.add(id); +if (idAfter >= idBefore) { + for (int id = idBefore + 1; id <= idAfter; id++) { + producedIds.add(id); + } } Suggestion importance[1-10]: 6 __ Why: While validating `idAfter >= idBefore` adds safety, the logic assumes monotonically increasing IDs. The check prevents incorrect ranges but may indicate a deeper issue if the condition fails. This is a reasonable defensive check.	Low
General	Restore profiling state after tracking Ensure `QueryProfiling.noop()` is properly restored after the tracking phase completes. If an exception occurs during the second phase, profiling state might remain in noop mode, affecting subsequent queries. core/src/main/java/org/opensearch/sql/executor/QueryService.java [305-313] QueryProfiling.noop(); -CalciteClassLoaderHelper.withCalciteClassLoader( - () -> { - CalcitePlanContext context = - CalcitePlanContext.create( - buildFrameworkConfig(), SysLimit.fromSettings(settings), queryType); - context.setTrackingEnabled(true); - RelNode relNode = analyze(plan, context); - ... - }, - QueryService.class); +try { + CalciteClassLoaderHelper.withCalciteClassLoader( + () -> { + CalcitePlanContext context = + CalcitePlanContext.create( + buildFrameworkConfig(), SysLimit.fromSettings(settings), queryType); + context.setTrackingEnabled(true); + RelNode relNode = analyze(plan, context); + ... + }, + QueryService.class); +} finally { + QueryProfiling.current(); +} Suggestion importance[1-10]: 5 __ Why: The suggestion to restore profiling state in a `finally` block is reasonable, but the `improved_code` calls `QueryProfiling.current()` which may not properly restore the previous state. The concern is valid but the solution needs verification.	Low

Suggestions up to commit 50ddb92

Category	Suggestion	Impact
Possible issue	Add null safety check The code assumes `context.relBuilder.peek()` is always non-null after `unresolved.accept()`, which may not be true if the visitor empties the builder stack. Add a null check before calling `getId()` to prevent potential `NullPointerException`. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [222-232] if (context.isTrackingEnabled()) { int idBefore = context.relBuilder.size() > 0 ? context.relBuilder.peek().getId() : -1; RelNode result = unresolved.accept(this, context); - int idAfter = context.relBuilder.peek().getId(); - java.util.List<Integer> producedIds = new java.util.ArrayList<>(); - for (int id = idBefore + 1; id <= idAfter; id++) { - producedIds.add(id); + if (context.relBuilder.size() > 0) { + int idAfter = context.relBuilder.peek().getId(); + java.util.List<Integer> producedIds = new java.util.ArrayList<>(); + for (int id = idBefore + 1; id <= idAfter; id++) { + producedIds.add(id); + } + context.recordMapping(unresolved.getClass().getSimpleName(), producedIds); } - context.recordMapping(unresolved.getClass().getSimpleName(), producedIds); return result; } Suggestion importance[1-10]: 8 __ Why: The code assumes `context.relBuilder.peek()` is non-null after `unresolved.accept()`, which could cause a `NullPointerException` if the visitor empties the builder stack. This is a critical safety issue that should be addressed.	Medium
Possible issue	Prevent null pointer access Similar to the `analyze` method, `context.relBuilder.peek()` may be null after `child.accept()` completes. Verify the builder is non-empty before accessing `peek().getId()` to avoid `NullPointerException`. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [242-247] int idBefore = context.relBuilder.size() > 0 ? context.relBuilder.peek().getId() : -1; RelNode childResult = child.accept(this, context); result = childResult; -int idAfter = context.relBuilder.peek().getId(); +if (context.relBuilder.size() > 0) { + int idAfter = context.relBuilder.peek().getId(); + ... +} Suggestion importance[1-10]: 8 __ Why: Similar to the first suggestion, `context.relBuilder.peek()` may be null after `child.accept()` completes. This is a critical safety issue that could cause a `NullPointerException` in production.	Medium
General	Validate hook callback type The hook callback casts `obj` to `RelRoot` without validation. If the hook receives an unexpected object type, this will throw `ClassCastException`. Add type checking before casting to handle unexpected hook invocations gracefully. core/src/main/java/org/opensearch/sql/executor/QueryService.java [316-330] -AtomicReference<RelNode> physicalRelRef = new AtomicReference<>(); try (Hook.Closeable closeable = Hook.PLAN_BEFORE_IMPLEMENTATION.addThread( obj -> { - RelRoot relRoot = (RelRoot) obj; - physicalRelRef.set(relRoot.rel); - physicalPlanRef.set( - RelOptUtil.toString(relRoot.rel, SqlExplainLevel.ALL_ATTRIBUTES)); + if (obj instanceof RelRoot relRoot) { + physicalRelRef.set(relRoot.rel); + physicalPlanRef.set( + RelOptUtil.toString(relRoot.rel, SqlExplainLevel.ALL_ATTRIBUTES)); + } })) { - try (java.sql.PreparedStatement ignored = - OpenSearchRelRunners.run(context, calcitePlan)) { - } catch (java.sql.SQLException e) { - throw new RuntimeException(e); - } + ... } Suggestion importance[1-10]: 7 __ Why: The unchecked cast to `RelRoot` could throw `ClassCastException` if the hook receives an unexpected object type. Adding type validation improves robustness and prevents unexpected runtime failures.	Medium
General	Cleanup profiling on interruption After catching `InterruptedException` and re-interrupting the thread, the method returns without cleaning up `QueryProfiling` state. Ensure `QueryProfiling.noop()` or equivalent cleanup is called before returning to prevent resource leaks. core/src/main/java/org/opensearch/sql/executor/QueryService.java [284-290] try { latch.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); + QueryProfiling.noop(); listener.onFailure(new RuntimeException("Interrupted while waiting for query execution", e)); return; } Suggestion importance[1-10]: 6 __ Why: Adding `QueryProfiling.noop()` cleanup before returning on interruption prevents potential resource leaks and ensures consistent state management, though the impact is moderate since interruption is an exceptional case.	Low

Suggestions up to commit 726b7e6

Category	Suggestion	Impact
Possible issue	Ensure latch countdown on all exceptions If `executeWithCalcite` throws an exception before invoking the listener callback, the latch will never count down, causing the subsequent `latch.await()` to block indefinitely. Wrap the `executeWithCalcite` call in a try-catch block and ensure `latch.countDown()` is called in all error paths. core/src/main/java/org/opensearch/sql/executor/QueryService.java [255-282] AtomicReference<ExecutionEngine.QueryResponse> queryResponseRef = new AtomicReference<>(); AtomicReference<QueryProfile> profileRef = new AtomicReference<>(); AtomicReference<Exception> errorRef = new AtomicReference<>(); CountDownLatch latch = new CountDownLatch(1); -executeWithCalcite( - plan, - queryType, - null, - new ResponseListener<>() { - @Override - public void onResponse(ExecutionEngine.QueryResponse response) { - ... - latch.countDown(); - } +try { + executeWithCalcite( + plan, + queryType, + null, + new ResponseListener<>() { + @Override + public void onResponse(ExecutionEngine.QueryResponse response) { + ... + latch.countDown(); + } - @Override - public void onFailure(Exception e) { - errorRef.set(e); - latch.countDown(); - } - }); + @Override + public void onFailure(Exception e) { + errorRef.set(e); + latch.countDown(); + } + }); +} catch (Exception e) { + errorRef.set(e); + latch.countDown(); +} Suggestion importance[1-10]: 9 __ Why: If `executeWithCalcite` throws before invoking the listener, the latch never counts down, causing `latch.await()` to block indefinitely. Wrapping in try-catch and ensuring countdown in all error paths prevents deadlock.	High
	Prevent null pointer on empty builder The code assumes `context.relBuilder.peek()` will not be null after `unresolved.accept()`, but if the accept method doesn't push any nodes, this will throw a `NoSuchElementException`. Add a null/empty check before accessing `peek()` to prevent crashes when tracking is enabled for nodes that don't produce RelNodes. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [222-232] if (context.isTrackingEnabled()) { int idBefore = context.relBuilder.size() > 0 ? context.relBuilder.peek().getId() : -1; RelNode result = unresolved.accept(this, context); - int idAfter = context.relBuilder.peek().getId(); - java.util.List<Integer> producedIds = new java.util.ArrayList<>(); - for (int id = idBefore + 1; id <= idAfter; id++) { - producedIds.add(id); + if (context.relBuilder.size() > 0) { + int idAfter = context.relBuilder.peek().getId(); + java.util.List<Integer> producedIds = new java.util.ArrayList<>(); + for (int id = idBefore + 1; id <= idAfter; id++) { + producedIds.add(id); + } + context.recordMapping(unresolved.getClass().getSimpleName(), producedIds); } - context.recordMapping(unresolved.getClass().getSimpleName(), producedIds); return result; } Suggestion importance[1-10]: 8 __ Why: The code assumes `context.relBuilder.peek()` will not throw after `unresolved.accept()`, but if no nodes are pushed, `peek()` will throw `NoSuchElementException`. Adding a size check prevents a potential crash in tracking mode.	Medium
	Add timeout to prevent indefinite blocking The `latch.await()` call blocks indefinitely without a timeout, which can cause the thread to hang forever if the async callback never completes. Add a timeout to `await()` and handle the timeout case to prevent resource exhaustion and improve system resilience. core/src/main/java/org/opensearch/sql/executor/QueryService.java [284-290] try { - latch.await(); + if (!latch.await(60, java.util.concurrent.TimeUnit.SECONDS)) { + listener.onFailure(new RuntimeException("Query execution timed out after 60 seconds")); + return; + } } catch (InterruptedException e) { Thread.currentThread().interrupt(); listener.onFailure(new RuntimeException("Interrupted while waiting for query execution", e)); return; } Suggestion importance[1-10]: 7 __ Why: The `latch.await()` blocks indefinitely without a timeout. If the async callback never completes, the thread hangs forever. Adding a timeout improves resilience and prevents resource exhaustion.	Medium

Signed-off-by: Krish Gandhi <kjg2352@gmail.com>

github-actions · 2026-06-25T18:00:09Z

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit ff87829.

Path	Line	Severity	Description
core/src/main/java/org/opensearch/sql/executor/QueryService.java	280	medium	The analyze endpoint silently routes existing `profile: true` requests to the new analyze handler (`transformedRequest.analyze() \|\| transformedRequest.profile()`), changing the response format for all current profile consumers. While the profile sub-object is preserved, additional fields (logicalPlan, physicalPlan, operator_tree, datarows, schema, etc.) are now returned unconditionally. Consumers expecting the old profile-only response shape may inadvertently expose or log internal query plan details (table structures, filter predicates, cost estimates, row counts) that were not previously surfaced.
core/src/main/java/org/opensearch/sql/executor/QueryService.java	302	low	CountDownLatch.await() is called with no timeout. If the inner executeWithCalcite callback never fires (e.g., thread pool exhaustion or a hung query), the calling thread blocks indefinitely, potentially exhausting the thread pool and causing a denial-of-service condition.
core/src/main/java/org/opensearch/sql/executor/QueryService.java	358	low	The analyze path executes the user query twice — once via executeWithCalcite for profiling and a second time via OpenSearchRelRunners.run() for plan capture — doubling datastore load per analyze request. On expensive queries this doubles resource consumption, which could be exploited to amplify resource usage compared to normal query execution.

The table above displays the top 10 most important findings.

Total: 3 | Critical: 0 | High: 0 | Medium: 1 | Low: 2

Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.

⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

github-actions · 2026-06-25T18:01:12Z

Persistent review updated to latest commit 50ddb92

Signed-off-by: Krish Gandhi <kjg2352@gmail.com>

github-actions · 2026-06-25T18:13:32Z

Persistent review updated to latest commit 127dfe7

Signed-off-by: Krish Gandhi <kjg2352@gmail.com>

github-actions · 2026-06-30T18:12:43Z

Persistent review updated to latest commit 727e3da

ahkcs · 2026-06-30T20:29:13Z

Let's update the endpoint.md(https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/interfaces/endpoint.md) and integ-test/src/yamlRestTest/resources/rest-api-spec/test/api/ppl.profile.yml file for our profile API change

ahkcs · 2026-06-30T21:27:20Z

Let's also document the current limitation for join, also we can provide fallback option to original profile API when we meet limitations

…' on complex queries Signed-off-by: Krish Gandhi <kjg2352@gmail.com>

github-actions · 2026-07-01T19:15:18Z

Persistent review updated to latest commit 7f164a0

…, which is not a part of this PR Signed-off-by: Krish Gandhi <kjg2352@gmail.com>

github-actions · 2026-07-01T20:35:02Z

Persistent review updated to latest commit ff87829

Krish-Gandhi · 2026-07-01T22:25:27Z

Updated docs/user/ppl/interfaces/endpoint.md to add documentation for current state of analyze and update note for profile endpoint, as well as a note about when the logic of analyze falls apart and fallback to profile. Also updated integ-test/src/yamlRestTest/resources/rest-api-spec/test/api/ppl.profile.yml and added integ-test/src/yamlRestTest/resources/rest-api-spec/test/api/ppl.analyze.yml.

Overriding profile endpoint with analyze endpoint with operator tree …

b925ffa

…and profiling Signed-off-by: Krish Gandhi <kjg2352@gmail.com>

Krish-Gandhi requested review from LantaoJin, RyanL1997, Swiddis, acarbonetto, ahkcs, anirudha, dai-chen, joshuali925, mengweieric, noCharger, penghuo, ps48, qianheng-aws, songkant-aws, vamsimanohar, ykmr1224 and yuancu as code owners June 19, 2026 00:06

Krish-Gandhi marked this pull request as draft June 19, 2026 18:21

Updating AnalyzeResponse format

2a645bf

Signed-off-by: Krish Gandhi <kjg2352@gmail.com>

Krish-Gandhi marked this pull request as ready for review June 19, 2026 22:16

ahkcs added the enhancement New feature or request label Jun 24, 2026

github-advanced-security AI found potential problems Jun 25, 2026

View reviewed changes

Comment thread core/src/main/java/org/opensearch/sql/executor/QueryService.java Fixed

Krish-Gandhi added 3 commits June 25, 2026 10:56

Cleaner method to combine profile and analyze

a3a4191

Signed-off-by: Krish Gandhi <kjg2352@gmail.com>

Adding tests and CI test fixes

ffe4972

Signed-off-by: Krish Gandhi <kjg2352@gmail.com>

Ran ./gradlew spotlessApply for formatting

50ddb92

Signed-off-by: Krish Gandhi <kjg2352@gmail.com>

Krish-Gandhi force-pushed the feature/analyze-endpoint branch from 726b7e6 to 50ddb92 Compare June 25, 2026 17:58

Fixing narrowing type conversion

127dfe7

Signed-off-by: Krish Gandhi <kjg2352@gmail.com>

Krish-Gandhi marked this pull request as draft June 25, 2026 21:56

Krish-Gandhi marked this pull request as ready for review June 30, 2026 16:21

Adding querySegments back to AnalyzeResponse body

727e3da

Signed-off-by: Krish Gandhi <kjg2352@gmail.com>

ahkcs reviewed Jun 30, 2026

View reviewed changes

Comment thread plugin/src/main/java/org/opensearch/sql/plugin/transport/TransportPPLQueryAction.java

ahkcs reviewed Jun 30, 2026

View reviewed changes

Comment thread core/src/main/java/org/opensearch/sql/executor/execution/AnalyzePlan.java

Adding docs, integ-test, handling when Calcite is disabled, 'fallback…

7f164a0

…' on complex queries Signed-off-by: Krish Gandhi <kjg2352@gmail.com>

Updating integ-test correct version, previously tested recommendation…

ff87829

…, which is not a part of this PR Signed-off-by: Krish Gandhi <kjg2352@gmail.com>

ahkcs approved these changes Jul 1, 2026

View reviewed changes

Uh oh!

Conversation

Krish-Gandhi commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Important

Example Query and Response

Performance of analyze

Related Issues

Check List

Uh oh!

Uh oh!

github-actions Bot commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit ff87829)

Uh oh!

github-actions Bot commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

github-actions Bot commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Analyzer ❗

Uh oh!

github-actions Bot commented Jun 25, 2026

Uh oh!

github-actions Bot commented Jun 25, 2026

Uh oh!

github-actions Bot commented Jun 30, 2026

Uh oh!

ahkcs commented Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ahkcs commented Jun 30, 2026

Uh oh!

github-actions Bot commented Jul 1, 2026

Uh oh!

github-actions Bot commented Jul 1, 2026

Uh oh!

Krish-Gandhi commented Jul 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Krish-Gandhi commented Jun 19, 2026 •

edited

Loading

Performance of `analyze`

github-actions Bot commented Jun 25, 2026 •

edited

Loading

(Review updated until commit `ff87829`)

github-actions Bot commented Jun 25, 2026 •

edited

Loading

github-actions Bot commented Jun 25, 2026 •

edited

Loading

ahkcs commented Jun 30, 2026 •

edited

Loading