Skip to content

[Feature] Add PPL rest command#5599

Open
noCharger wants to merge 6 commits into
opensearch-project:mainfrom
noCharger:feature/ppl-rest-command
Open

[Feature] Add PPL rest command#5599
noCharger wants to merge 6 commits into
opensearch-project:mainfrom
noCharger:feature/ppl-rest-command

Conversation

@noCharger

@noCharger noCharger commented Jun 30, 2026

Copy link
Copy Markdown
Collaborator

Description

Add a leading rest <endpoint> command that exposes a curated, read-only, fixed-schema set of in-cluster management endpoints as a PPL table, modeled as a system row source bridged through visitRelation (the same seam as describe and the system-index family), so it runs on the Calcite engine without the unsupported table-function path.

  • Grammar/AST: REST/TIMEOUT tokens, restCommand rule, RestRelation, AstBuilder visit, query anonymizer.
  • Execution: RestSourceTable -> CalciteLogicalRestScan / CalciteEnumerableRestScan; RestEndpointRegistry (read-only allow-list + fixed schema + accepted args); RestEnumerator/RestRequest dispatch via OpenSearchClient (NodeClient in-cluster, RestClient standalone).
  • 9 endpoints: cluster health/state/settings, cat indices/nodes/cluster_manager/ plugins/shards, resolve/index.
  • Output shaping: numeric type normalization, id-to-name resolution, role-name expansion, structural flattening, graceful null degrade.
  • Args: count caps emitted rows; timeout reserved but rejected with 400; get-args applied server-side with per-arg value validation (local on health, health on cat/indices, expand_wildcards on resolve/index). Undeclared arg or out-of-domain value is rejected with a 400. level and include_defaults are deferred to a later release; flat_settings is dropped as redundant.
  • Error handling: blank endpoint, negative count, disallowed arg, and uncoercible value all surface clean 400s rather than 500s.

Related Issues

Resolves #5597

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Add a leading `rest <endpoint>` command that exposes a curated, read-only,
fixed-schema set of in-cluster management endpoints as a PPL table, modeled as
a system row source bridged through visitRelation (the same seam as describe
and the system-index family), so it runs on the Calcite engine without the
unsupported table-function path.

- Grammar/AST: REST/TIMEOUT tokens, restCommand rule, RestRelation, AstBuilder
  visit, query anonymizer.
- Execution: RestSourceTable -> CalciteLogicalRestScan / CalciteEnumerableRestScan;
  RestEndpointRegistry (read-only allow-list + fixed schema + accepted args);
  RestEnumerator/RestRequest dispatch via OpenSearchClient (NodeClient in-cluster,
  RestClient standalone).
- 9 endpoints: cluster health/state/settings, cat indices/nodes/cluster_manager/
  plugins/shards, resolve/index.
- Output shaping: numeric type normalization, id-to-name resolution, role-name
  expansion, structural flattening, graceful null degrade.
- Args: count caps emitted rows; timeout reserved but rejected with 400; get-args
  applied server-side with per-arg value validation (local on health, health on
  cat/indices, expand_wildcards on resolve/index). Undeclared arg or out-of-domain
  value is rejected with a 400. level and include_defaults are deferred to a later
  release; flat_settings is dropped as redundant.
- Error handling: blank endpoint, negative count, disallowed arg, and uncoercible
  value all surface clean 400s rather than 500s.

Tests: CalcitePPLRestIT 25/25, RestEndpointRegistryTest 16, RestSourceTableTest 10.

Signed-off-by: Louis Chu <clingzhi@amazon.com>
@github-actions

github-actions Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 42f77a5.

PathLineSeverityDescription
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/rest/RestEndpointRegistry.java242medium/_cluster/settings endpoint exposes all persistent and transient cluster settings to any user with cluster-monitor privilege. Redaction depends on a static volatile SettingsFilter initialized at startup; the fail-closed guard (throws if null) is correct, but the endpoint still surfaces cluster-wide configuration including tuning knobs and plugin settings not matched by filter patterns, which could aid reconnaissance.
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/rest/RestEndpointRegistry.java263medium/_cat/plugins endpoint reveals the full installed plugin inventory (name, component, version) per node to any user with cluster-monitor privilege. This exposes the security plugin stack (e.g., which version of the security plugin is installed), which is operationally useful for targeted attacks against known plugin CVEs.
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/rest/RestEndpointRegistry.java228low/_cat/nodes endpoint exposes node IP addresses, heap/RAM/CPU utilization, and role assignments. The documentation acknowledges this as intentional, but surfacing internal network addresses through a SQL/PPL interface increases the reconnaissance surface for authorized users.
core/src/main/java/org/opensearch/sql/utils/SystemIndexUtils.java63lowEndpoint specs (including user-supplied get-arg values) are hex-encoded into reserved table names and decoded at storage-engine resolution time. The decode path (fromHex -> split on newline -> key=value parse) trusts that the token was produced only by restTable(); a malformed or hand-crafted token that passes the isRestSource() prefix/suffix check could inject unexpected key-value pairs. The validate() call in RestSourceTable's constructor partially mitigates this, but the decode happens before full validation in decodeRestSpec.

The table above displays the top 10 most important findings.

Total: 4 | Critical: 0 | High: 0 | Medium: 2 | Low: 2


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions

github-actions Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

PR Reviewer Guide 🔍

(Review updated until commit 42f77a5)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 Security concerns

Sensitive information exposure:
The /_cluster/settings endpoint can expose sensitive configuration values (e.g., API keys, passwords, cloud credentials) if not properly redacted. The PR mitigates this by requiring a SettingsFilter to be published via RestSettingsFilterHolder and failing closed if the filter is unavailable (line 458-464 in OpenSearchNodeClient.java). However, this relies on SQLPlugin.getRestHandlers always running successfully and setting the filter before any query executes. If there's a race condition or initialization failure where the filter is not set, the fail-closed behavior is correct, but the error message could be clearer to operators. Additionally, the standalone OpenSearchRestClient path (line 450-466 in OpenSearchRestClient.java) relies on the server-side GET /_cluster/settings endpoint to apply redaction via the flat_settings=true parameter, which assumes the server's REST layer is correctly configured. If the server's SettingsFilter is misconfigured or bypassed, sensitive settings could leak through the standalone client path.

✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Possible Issue

The clusterSettings method fails closed when RestSettingsFilterHolder.get() returns null, throwing IllegalStateException. However, the holder is only set in SQLPlugin.getRestHandlers, which runs once at plugin initialization. If that initialization fails or is delayed, or if the holder is cleared (e.g., via the test cleanup in OpenSearchNodeClientClusterSettingsFilterTest.clearHolder), subsequent calls to clusterSettings will fail. In a production cluster, if getRestHandlers completes successfully, this is safe. But if there's any scenario where the holder remains unset after plugin startup (e.g., plugin reload, race condition during initialization), this will cause runtime failures for legitimate queries.

org.opensearch.common.settings.SettingsFilter filter =
    org.opensearch.sql.opensearch.storage.rest.RestSettingsFilterHolder.get();
if (filter == null) {
  throw new IllegalStateException(
      "cluster settings redaction filter is not initialized; refusing to return unredacted"
          + " settings");
}
Possible Issue

The decodeRestSpec method parses integer values from the hex-decoded string using Integer.parseInt(v) at line 106. If the encoded count value is malformed or outside the valid integer range, this will throw NumberFormatException, which is not caught. While the encoding side (restTable) should only write valid integers, a corrupted or manually crafted token could cause an uncaught exception here, resulting in an HTTP 500 instead of a clear client error (HTTP 400). The method should wrap this parse in a try-catch and throw IllegalArgumentException with a clear message.

} else if (k.equals("count")) {
  count = Integer.parseInt(v);
} else if (k.equals("timeout")) {
Possible Issue

The toNumber method at line 365 attempts to parse numeric strings by checking for the presence of ., e, or E to decide between Double.parseDouble and Long.parseLong. However, this logic does not handle edge cases like "1e10" (scientific notation without a decimal point) correctly—it would attempt Long.parseLong("1e10"), which throws NumberFormatException. While the catch block at line 347 converts this to IllegalArgumentException, the detection logic is fragile. A more robust approach would be to always try parsing as a double first, or use a more reliable numeric detection method.

private static Number toNumber(Object value) {
  if (value instanceof Number n) {
    return n;
  }
  String s = String.valueOf(value).trim();
  if (s.isEmpty()) {
    throw new NumberFormatException("empty string");
  }
  if (s.indexOf('.') >= 0 || s.indexOf('e') >= 0 || s.indexOf('E') >= 0) {
    return Double.parseDouble(s);
  }
  return Long.parseLong(s);
}

@github-actions

github-actions Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Latest suggestions up to 42f77a5

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Prevent silent type coercion fallback

The coercion logic silently converts all non-matching types to STRING by falling
through to the final return statement. This can mask type mismatches and produce
unexpected string values for columns that should be numeric or boolean. Consider
throwing an exception for unsupported type coercions to fail fast and provide clear
feedback.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/rest/RestEndpointRegistry.java [330-362]

 private static ExprValue coerce(String column, ExprType type, Object value) {
   if (value == null) {
     return ExprNullValue.of();
   }
   try {
     if (type == INTEGER) {
       return integerValue(toNumber(value).intValue());
     }
     if (type == LONG) {
       return longValue(toNumber(value).longValue());
     }
     if (type == DOUBLE) {
       return doubleValue(toNumber(value).doubleValue());
     }
     if (type == BOOLEAN) {
       return booleanValue(toBoolean(value));
     }
+    if (type == STRING) {
+      return stringValue(String.valueOf(value));
+    }
+    throw new IllegalArgumentException(
+        "rest endpoint value for column ["
+            + column
+            + "] has unsupported type "
+            + type);
   } catch (IllegalArgumentException | ClassCastException e) {
     throw new IllegalArgumentException(
         "rest endpoint value for column ["
             + column
             + "] could not be coerced to "
             + type
             + ": ["
             + value
             + "]");
   }
-  return stringValue(String.valueOf(value));
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that the coerce method silently falls through to return a STRING for any type not explicitly handled. Adding an explicit check for STRING type and throwing an exception for unsupported types would make the code more robust and fail-fast, improving error detection.

Medium
Remove redundant count validation check

The count truncation logic checks spec.getCount() >= 0 after already validating
non-negativity in RestEndpointRegistry.validate(). The redundant check is
unnecessary. More critically, if spec.getCount() is 0, the code returns an empty
sublist, but the validation allows 0 as valid. Ensure the behavior for count=0 is
intentional and documented.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/rest/RestRequest.java [34-41]

 @Override
 public List<ExprValue> search() {
   List<ExprValue> rows = endpoint.toRows(client, spec);
-  if (spec.getCount() != null && spec.getCount() >= 0 && rows.size() > spec.getCount()) {
+  if (spec.getCount() != null && rows.size() > spec.getCount()) {
     return rows.subList(0, spec.getCount());
   }
   return rows;
 }
Suggestion importance[1-10]: 4

__

Why: The suggestion identifies a redundant check for spec.getCount() >= 0 since validation already occurs in RestEndpointRegistry.validate(). However, the improvement is minor as the redundant check doesn't cause incorrect behavior and may serve as defensive programming. The concern about count=0 behavior is valid but doesn't constitute a bug.

Low

Previous suggestions

Suggestions up to commit 343e93a
CategorySuggestion                                                                                                                                    Impact
Possible issue
Validate hex string length before decoding

The fromHex method does not validate that the hex string has an even length before
dividing by 2. An odd-length string will silently truncate the last character,
potentially causing data corruption. Add a length check to fail fast with a clear
error message.

core/src/main/java/org/opensearch/sql/utils/SystemIndexUtils.java [127-136]

 private static String fromHex(String h) {
+  if (h.length() % 2 != 0) {
+    throw new IllegalArgumentException("hex string must have even length, got: " + h.length());
+  }
   byte[] bytes = new byte[h.length() / 2];
   for (int i = 0; i < bytes.length; i++) {
     bytes[i] =
         (byte)
             ((Character.digit(h.charAt(2 * i), 16) << 4)
                 + Character.digit(h.charAt(2 * i + 1), 16));
   }
   return new String(bytes, StandardCharsets.UTF_8);
 }
Suggestion importance[1-10]: 8

__

Why: This is a valid bug fix. The fromHex method should validate that the input string has even length before attempting to decode it. An odd-length string would cause silent data corruption or incorrect decoding, and the fix provides a clear error message.

Medium
Add null check for iterator

The resource health check only occurs every 1000 rows. If the iterator is null
(after close() is called), calling hasNext() will throw a NullPointerException. Add
a null check for the iterator before accessing it to prevent potential crashes.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/rest/RestEnumerator.java [60-72]

 @Override
 public boolean moveNext() {
+  if (iterator == null) {
+    return false;
+  }
   boolean shouldCheck = (queryCount % NUMBER_OF_NEXT_CALL_TO_CHECK == 0);
   if (shouldCheck && !this.monitor.isHealthy()) {
     throw new NonFallbackCalciteException("insufficient resources to load next row, quit.");
   }
   if (iterator.hasNext()) {
     current = iterator.next();
     queryCount++;
     return true;
   } else {
     return false;
   }
 }
Suggestion importance[1-10]: 6

__

Why: The suggestion identifies a potential NullPointerException after close() is called. While adding a null check improves robustness, the enumerator contract may already prevent calling moveNext() after close(), making this a defensive improvement rather than a critical bug fix.

Low
General
Handle unsupported type conversions explicitly

The coercion logic silently converts all non-matching types to STRING by falling
through to the final return statement. This can mask type mismatches and produce
unexpected results. Consider explicitly handling the STRING type case and throwing
an exception for truly unsupported types to make type conversion failures more
visible.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/rest/RestEndpointRegistry.java [330-362]

 private static ExprValue coerce(String column, ExprType type, Object value) {
   if (value == null) {
     return ExprNullValue.of();
   }
   try {
     if (type == INTEGER) {
       return integerValue(toNumber(value).intValue());
     }
     if (type == LONG) {
       return longValue(toNumber(value).longValue());
     }
     if (type == DOUBLE) {
       return doubleValue(toNumber(value).doubleValue());
     }
     if (type == BOOLEAN) {
       return booleanValue(toBoolean(value));
     }
+    if (type == STRING) {
+      return stringValue(String.valueOf(value));
+    }
+    throw new IllegalArgumentException(
+        "rest endpoint value for column ["
+            + column
+            + "] has unsupported type "
+            + type);
   } catch (IllegalArgumentException | ClassCastException e) {
     throw new IllegalArgumentException(
         "rest endpoint value for column ["
             + column
             + "] could not be coerced to "
             + type
             + ": ["
             + value
             + "]");
   }
-  return stringValue(String.valueOf(value));
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that the coerce method silently converts all non-matching types to STRING by falling through. Making this explicit improves code clarity and helps catch unexpected type mismatches, though the current behavior may be intentional for flexibility.

Medium
Suggestions up to commit e4c63e0
CategorySuggestion                                                                                                                                    Impact
Possible issue
Prevent numeric overflow during type coercion

The coercion logic does not handle potential overflow when converting numeric
values. For example, converting a large Long to Integer via intValue() can silently
truncate or overflow. Add explicit range checks before narrowing conversions to
prevent data loss and surface clear errors when values exceed the target type's
bounds.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/rest/RestEndpointRegistry.java [330-362]

 private static ExprValue coerce(String column, ExprType type, Object value) {
   if (value == null) {
     return ExprNullValue.of();
   }
   try {
     if (type == INTEGER) {
-      return integerValue(toNumber(value).intValue());
+      Number n = toNumber(value);
+      long longVal = n.longValue();
+      if (longVal < Integer.MIN_VALUE || longVal > Integer.MAX_VALUE) {
+        throw new IllegalArgumentException("value out of INTEGER range: " + value);
+      }
+      return integerValue(n.intValue());
     }
     if (type == LONG) {
       return longValue(toNumber(value).longValue());
     }
     if (type == DOUBLE) {
       return doubleValue(toNumber(value).doubleValue());
     }
     if (type == BOOLEAN) {
       return booleanValue(toBoolean(value));
     }
   } catch (IllegalArgumentException | ClassCastException e) {
-    ...
+    throw new IllegalArgumentException(
+        "rest endpoint value for column ["
+            + column
+            + "] could not be coerced to "
+            + type
+            + ": ["
+            + value
+            + "]");
   }
   return stringValue(String.valueOf(value));
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a potential overflow issue when converting numeric values to INTEGER. Adding range checks before narrowing conversions prevents silent data loss and provides clearer error messages. However, this is a defensive improvement rather than a critical bug fix, as the endpoints are expected to return values within reasonable ranges.

Medium
Security
Validate decoded token line format

The decoder silently skips malformed lines (missing '=' or empty lines) which could
mask encoding corruption or injection attempts. Consider logging warnings or
throwing exceptions for unexpected line formats to detect potential security issues
or data corruption early.

core/src/main/java/org/opensearch/sql/utils/SystemIndexUtils.java [78-117]

 public static RestSpec decodeRestSpec(String indexName) {
   if (!isRestSource(indexName)) {
     throw new IllegalArgumentException("not a valid rest source token: " + indexName);
   }
   String body =
       indexName.substring(
           REST_SOURCE_PREFIX.length(), indexName.length() - REST_SOURCE_SUFFIX.length());
   String decoded = fromHex(body);
   ...
   for (String line : decoded.split("\n")) {
     if (line.isEmpty()) {
       continue;
     }
     int eq = line.indexOf('=');
     if (eq < 0) {
-      continue;
+      throw new IllegalArgumentException("malformed rest source token line (missing '='): " + line);
     }
     ...
   }
Suggestion importance[1-10]: 6

__

Why: The suggestion to throw an exception for malformed lines instead of silently skipping them improves security and debugging. However, the current behavior of skipping empty lines is intentional and reasonable. The suggestion should only apply to non-empty lines missing the = delimiter, making it a moderate improvement rather than a critical fix.

Low
General
Improve resource monitoring frequency

The resource health check is performed only every 1000 rows, which may allow
resource exhaustion to occur before detection. For REST endpoints that can return
large result sets, consider reducing the check interval or implementing adaptive
checking based on result size to ensure timely resource monitoring.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/rest/RestEnumerator.java [60-72]

 @Override
 public boolean moveNext() {
   boolean shouldCheck = (queryCount % NUMBER_OF_NEXT_CALL_TO_CHECK == 0);
   if (shouldCheck && !this.monitor.isHealthy()) {
     throw new NonFallbackCalciteException("insufficient resources to load next row, quit.");
   }
   if (iterator.hasNext()) {
     current = iterator.next();
     queryCount++;
+    // Additional check after large batches
+    if (queryCount > 0 && queryCount % 10000 == 0 && !this.monitor.isHealthy()) {
+      throw new NonFallbackCalciteException("insufficient resources to load next row, quit.");
+    }
     return true;
   } else {
     return false;
   }
 }
Suggestion importance[1-10]: 4

__

Why: While the suggestion to add more frequent resource checks has merit, the proposed implementation adds a redundant check at 10000 rows when the existing check already runs every 1000 rows. The concern about resource exhaustion is valid, but the solution is not optimal and the impact is relatively minor for typical REST endpoint result sets.

Low
Suggestions up to commit 8b0a147
CategorySuggestion                                                                                                                                    Impact
Security
Validate filter before retrieving sensitive data

The filter null-check happens after retrieving cluster state, which could expose
unredacted settings in logs or error paths if the filter is unexpectedly null. Move
the filter validation before the cluster state request to fail fast.

opensearch/src/main/java/org/opensearch/sql/opensearch/client/OpenSearchNodeClient.java [454-486]

 public List<Map<String, Object>> clusterSettings(Map<String, String> params) {
-  ...
   org.opensearch.common.settings.SettingsFilter filter =
       org.opensearch.sql.opensearch.storage.rest.RestSettingsFilterHolder.get();
   if (filter == null) {
     throw new IllegalStateException(
         "cluster settings redaction filter is not initialized; refusing to return unredacted"
             + " settings");
   }
+  org.opensearch.action.admin.cluster.state.ClusterStateResponse response =
+      client
+          .admin()
+          .cluster()
+          .state(new org.opensearch.action.admin.cluster.state.ClusterStateRequest())
+          .actionGet();
+  List<Map<String, Object>> rows = new java.util.ArrayList<>();
+  org.opensearch.common.settings.Settings persistent =
+      response.getState().metadata().persistentSettings();
+  org.opensearch.common.settings.Settings transientSettings =
+      response.getState().metadata().transientSettings();
   persistent = filter.filter(persistent);
   transientSettings = filter.filter(transientSettings);
   collectSettings(persistent, "persistent", rows);
   collectSettings(transientSettings, "transient", rows);
   return rows;
 }
Suggestion importance[1-10]: 7

__

Why: Moving the filter validation before retrieving cluster state is a good security practice that prevents potential exposure of unredacted settings in error paths or logs if the filter is unexpectedly null.

Medium
General
Explicit type handling prevents silent fallback

The coercion logic silently converts all non-matching types to STRING as a fallback.
This can mask type mismatches and lead to unexpected behavior. Consider throwing an
exception for unsupported type conversions to fail fast and provide clearer error
messages.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/rest/RestEndpointRegistry.java [330-362]

 private static ExprValue coerce(String column, ExprType type, Object value) {
   if (value == null) {
     return ExprNullValue.of();
   }
   try {
     if (type == INTEGER) {
       return integerValue(toNumber(value).intValue());
     }
     if (type == LONG) {
       return longValue(toNumber(value).longValue());
     }
     if (type == DOUBLE) {
       return doubleValue(toNumber(value).doubleValue());
     }
     if (type == BOOLEAN) {
       return booleanValue(toBoolean(value));
     }
+    if (type == STRING) {
+      return stringValue(String.valueOf(value));
+    }
+    throw new IllegalArgumentException(
+        "rest endpoint value for column ["
+            + column
+            + "] has unsupported type "
+            + type);
   } catch (IllegalArgumentException | ClassCastException e) {
     throw new IllegalArgumentException(
         "rest endpoint value for column ["
             + column
             + "] could not be coerced to "
             + type
             + ": ["
             + value
             + "]");
   }
-  return stringValue(String.valueOf(value));
 }
Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies that the fallback to STRING is implicit. Making it explicit improves code clarity and maintainability, though the current behavior is intentional (coerce unknown types to string).

Low
Validate token structure during decoding

The decoder silently skips malformed lines (empty or missing '='). A corrupted token
could pass validation but produce incomplete specs. Consider throwing an exception
for malformed lines to detect token corruption early.

core/src/main/java/org/opensearch/sql/utils/SystemIndexUtils.java [93-100]

 public static RestSpec decodeRestSpec(String indexName) {
   if (!isRestSource(indexName)) {
     throw new IllegalArgumentException("not a valid rest source token: " + indexName);
   }
   String body =
       indexName.substring(
           REST_SOURCE_PREFIX.length(), indexName.length() - REST_SOURCE_SUFFIX.length());
   String decoded = fromHex(body);
   ...
   for (String line : decoded.split("\n")) {
     if (line.isEmpty()) {
       continue;
     }
     int eq = line.indexOf('=');
     if (eq < 0) {
-      continue;
+      throw new IllegalArgumentException("malformed rest source token line: " + line);
     }
     ...
   }
Suggestion importance[1-10]: 4

__

Why: The suggestion proposes throwing an exception for malformed lines, but the current code intentionally skips empty lines and lines without = as a defensive measure. The change would make the decoder more strict but could break valid edge cases.

Low
Optimize memory usage for count limits

The count limit is applied after fetching all rows from the endpoint, which could
cause memory issues for large responses. Consider applying the limit during row
collection if the endpoint supports pagination or streaming.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/rest/RestRequest.java [34-41]

 @Override
 public List<ExprValue> search() {
   List<ExprValue> rows = endpoint.toRows(client, spec);
-  if (spec.getCount() != null && spec.getCount() >= 0 && rows.size() > spec.getCount()) {
-    return rows.subList(0, spec.getCount());
+  if (spec.getCount() != null && spec.getCount() >= 0) {
+    int limit = Math.min(spec.getCount(), rows.size());
+    return rows.subList(0, limit);
   }
   return rows;
 }
Suggestion importance[1-10]: 3

__

Why: The improved_code uses Math.min which is slightly cleaner, but doesn't address the core issue mentioned (fetching all rows before limiting). The suggestion's rationale about pagination is valid but the proposed code doesn't implement it.

Low
Suggestions up to commit eeb4e5b
CategorySuggestion                                                                                                                                    Impact
Possible issue
Validate token format before decoding

The decodeRestSpec method does not validate that indexName actually ends with
REST_SOURCE_SUFFIX before calling substring, which can cause
StringIndexOutOfBoundsException if an invalid token is passed. Add a guard to verify
the suffix exists before decoding.

core/src/main/java/org/opensearch/sql/utils/SystemIndexUtils.java [78-111]

 public static RestSpec decodeRestSpec(String indexName) {
+  if (!isRestSource(indexName)) {
+    throw new IllegalArgumentException("not a valid rest source token: " + indexName);
+  }
   String body =
       indexName.substring(
           REST_SOURCE_PREFIX.length(), indexName.length() - REST_SOURCE_SUFFIX.length());
   String decoded = fromHex(body);
   ...
   if (endpoint == null) {
     throw new IllegalArgumentException("rest source token is missing the endpoint: " + indexName);
   }
   return new RestSpec(endpoint, args, count, timeout);
 }
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies that decodeRestSpec should validate the token format before attempting to decode it. Adding the isRestSource check prevents potential StringIndexOutOfBoundsException and provides a clearer error message. This is a good defensive programming practice that improves robustness and error handling.

Medium
Validate numeric range before casting

The coerce method does not handle numeric overflow when converting from Number to
intValue() or longValue(). If the source value exceeds the target type's range,
silent truncation or unexpected values may occur. Add explicit range validation
before casting to prevent data corruption.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/rest/RestEndpointRegistry.java [330-362]

 private static ExprValue coerce(String column, ExprType type, Object value) {
   if (value == null) {
     return ExprNullValue.of();
   }
   try {
     if (type == INTEGER) {
-      return integerValue(toNumber(value).intValue());
+      Number n = toNumber(value);
+      long longVal = n.longValue();
+      if (longVal < Integer.MIN_VALUE || longVal > Integer.MAX_VALUE) {
+        throw new IllegalArgumentException("value out of INTEGER range");
+      }
+      return integerValue(n.intValue());
     }
     if (type == LONG) {
       return longValue(toNumber(value).longValue());
     }
     if (type == DOUBLE) {
       return doubleValue(toNumber(value).doubleValue());
     }
     if (type == BOOLEAN) {
       return booleanValue(toBoolean(value));
     }
   } catch (IllegalArgumentException | ClassCastException e) {
-    ...
+    throw new IllegalArgumentException(
+        "rest endpoint value for column ["
+            + column
+            + "] could not be coerced to "
+            + type
+            + ": ["
+            + value
+            + "]");
   }
   return stringValue(String.valueOf(value));
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a potential overflow issue when converting Number to intValue(). However, the existing code already catches IllegalArgumentException and ClassCastException, which would surface as a client error. The improvement adds explicit range validation for INTEGER type, which is a good defensive practice but not critical since the error handling is already in place.

Medium
Handle race condition in filter initialization

The clusterSettings method relies on a static holder for the SettingsFilter, which
introduces a race condition during initialization. If a query executes before
SQLPlugin#getRestHandlers publishes the filter, the method throws an exception.
Consider using dependency injection or a thread-safe lazy initialization pattern to
ensure the filter is always available.

opensearch/src/main/java/org/opensearch/sql/opensearch/client/OpenSearchNodeClient.java [453-486]

 @Override
 public List<Map<String, Object>> clusterSettings(Map<String, String> params) {
   ...
   org.opensearch.common.settings.SettingsFilter filter =
       org.opensearch.sql.opensearch.storage.rest.RestSettingsFilterHolder.get();
   if (filter == null) {
+    // Log the error for debugging and fail closed
     throw new IllegalStateException(
         "cluster settings redaction filter is not initialized; refusing to return unredacted"
-            + " settings");
+            + " settings. Ensure SQLPlugin#getRestHandlers has been called.");
   }
   persistent = filter.filter(persistent);
   transientSettings = filter.filter(transientSettings);
   ...
 }
Suggestion importance[1-10]: 3

__

Why: The suggestion asks to verify initialization timing, but the code already has a fail-closed guard (if (filter == null) throw) with a clear error message. The comment in the code explicitly states this guards against "unexpected uninitialized state" and that "the filter is published at startup before any query runs." The suggested improvement only adds a log message, which provides minimal value.

Low
General
Optimize count limit application

The search method applies the count limit after fetching all rows from the endpoint,
which is inefficient for large result sets. If the endpoint supports server-side
pagination or limiting, pass the count parameter to the client call to reduce memory
usage and network overhead.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/rest/RestRequest.java [34-41]

 @Override
 public List<ExprValue> search() {
+  // If the endpoint supports server-side limiting, pass spec.getCount() to the client call
+  // to avoid fetching unnecessary rows. For now, apply client-side truncation.
   List<ExprValue> rows = endpoint.toRows(client, spec);
   if (spec.getCount() != null && spec.getCount() >= 0 && rows.size() > spec.getCount()) {
     return rows.subList(0, spec.getCount());
   }
   return rows;
 }
Suggestion importance[1-10]: 5

__

Why: The suggestion correctly identifies that count is applied client-side after fetching all rows, which could be inefficient for large result sets. However, the improvement only adds a comment acknowledging this limitation without implementing server-side limiting. The suggestion is valid but the improved code doesn't actually implement the optimization, making it more of a documentation improvement than a functional enhancement.

Low
Suggestions up to commit a7a8a20
CategorySuggestion                                                                                                                                    Impact
Possible issue
Validate endpoint presence after decoding

The decodeRestSpec method does not validate that the decoded endpoint is non-null
before returning. If the encoded spec is malformed or missing the endpoint line,
this will return a RestSpec with a null endpoint, which could cause issues
downstream.

core/src/main/java/org/opensearch/sql/utils/SystemIndexUtils.java [78-108]

 public static RestSpec decodeRestSpec(String indexName) {
   String body =
       indexName.substring(
           REST_SOURCE_PREFIX.length(), indexName.length() - REST_SOURCE_SUFFIX.length());
   String decoded = fromHex(body);
   String endpoint = null;
   Integer count = null;
   String timeout = null;
   LinkedHashMap<String, String> args = new LinkedHashMap<>();
   for (String line : decoded.split("\n")) {
     if (line.isEmpty()) {
       continue;
     }
     int eq = line.indexOf('=');
     if (eq < 0) {
       continue;
     }
     String k = line.substring(0, eq);
     String v = line.substring(eq + 1);
     if (k.equals("endpoint")) {
       endpoint = v;
     } else if (k.equals("count")) {
       count = Integer.parseInt(v);
     } else if (k.equals("timeout")) {
       timeout = v;
     } else if (k.startsWith("arg.")) {
       args.put(k.substring("arg.".length()), v);
     }
   }
+  if (endpoint == null) {
+    throw new IllegalArgumentException("Decoded rest spec missing endpoint: " + indexName);
+  }
   return new RestSpec(endpoint, args, count, timeout);
 }
Suggestion importance[1-10]: 8

__

Why: The suggestion identifies a potential issue where decodeRestSpec could return a RestSpec with a null endpoint if the encoded data is malformed. Adding validation to throw an exception when endpoint is null prevents downstream errors and improves robustness.

Medium
General
Handle unsupported types explicitly

The coercion logic does not handle all declared ExprType values. If a column has a
type other than INTEGER, LONG, DOUBLE, or BOOLEAN, the method falls through to
stringValue(String.valueOf(value)) without validation. This could mask type
mismatches and produce incorrect results for unsupported types.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/rest/RestEndpointRegistry.java [330-362]

 private static ExprValue coerce(String column, ExprType type, Object value) {
   if (value == null) {
     return ExprNullValue.of();
   }
   try {
     if (type == INTEGER) {
       return integerValue(toNumber(value).intValue());
     }
     if (type == LONG) {
       return longValue(toNumber(value).longValue());
     }
     if (type == DOUBLE) {
       return doubleValue(toNumber(value).doubleValue());
     }
     if (type == BOOLEAN) {
       return booleanValue(toBoolean(value));
     }
+    if (type == STRING) {
+      return stringValue(String.valueOf(value));
+    }
+    throw new IllegalArgumentException(
+        "rest endpoint column [" + column + "] has unsupported type: " + type);
   } catch (IllegalArgumentException | ClassCastException e) {
     throw new IllegalArgumentException(
         "rest endpoint value for column ["
             + column
             + "] could not be coerced to "
             + type
             + ": ["
             + value
             + "]");
   }
-  return stringValue(String.valueOf(value));
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that the coerce method falls through to stringValue for any type not explicitly handled. Adding explicit validation for STRING type and throwing an exception for unsupported types improves type safety and makes the code more maintainable.

Medium
Remove redundant count validation check

The count parameter truncation logic checks spec.getCount() >= 0 after already
validating non-negativity in RestEndpointRegistry.validate(). The redundant check is
unnecessary and the truncation should handle the case where count is zero correctly
by returning an empty list.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/rest/RestRequest.java [34-41]

 @Override
 public List<ExprValue> search() {
   List<ExprValue> rows = endpoint.toRows(client, spec);
-  if (spec.getCount() != null && spec.getCount() >= 0 && rows.size() > spec.getCount()) {
+  if (spec.getCount() != null && rows.size() > spec.getCount()) {
     return rows.subList(0, spec.getCount());
   }
   return rows;
 }
Suggestion importance[1-10]: 4

__

Why: The suggestion correctly notes that spec.getCount() >= 0 is redundant since validation already occurs in RestEndpointRegistry.validate(). However, the check rows.size() > spec.getCount() already handles the zero case correctly (returns empty list), so the improvement is minor.

Low

@noCharger noCharger added the v3.8.0 Issues and PRs related to version v3.8.0 label Jun 30, 2026
@noCharger noCharger moved this from Todo to In progress in PPL 2026 Roadmap Jun 30, 2026
@noCharger noCharger added feature calcite calcite migration releated labels Jun 30, 2026
…xes; comment cleanup

- /_cluster/settings: run the persistent and transient tiers through the node SettingsFilter
  (published via RestSettingsFilterHolder from SQLPlugin#getRestHandlers) so Property.Filtered
  and plugin-registered pattern settings are redacted exactly as the native
  GET /_cluster/settings endpoint. Remove the dead secretFields column-filter, which was the
  wrong shape for the (setting, value, tier) rows.
- Parser: add TIMEOUT to searchableKeyWord so a bare 'timeout' term still matches searchLiteral.
- coerce(): narrow the catch to IllegalArgumentException | ClassCastException; add empty-string
  guards in toNumber/toBoolean.
- spotlessApply formatting; drop outdated and redundant comments.

Tests: RestEndpointRegistryTest, RestSourceTableTest, OpenSearchNodeClientClusterSettingsFilterTest green.

Signed-off-by: Louis Chu <clingzhi@amazon.com>
@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit a7a8a20

- clusterSettings: fail closed (throw IllegalStateException) when the node
  SettingsFilter is unavailable, instead of returning unredacted settings
- collectSettings: handle list-type settings via getAsList fallback
- decodeRestSpec: reject a blank/missing endpoint with a clear error
- docs: correct rest.md allow-list table (9 endpoints + accepted args),
  quote endpoint literals, fix timeout/get-arg descriptions, add security note
- register docs/user/ppl/cmd/rest.md in docs/category.json (deterministic
  single-node examples: number_of_nodes=1, cluster_manager count=1)

Signed-off-by: Louis Chu <clingzhi@amazon.com>
@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit eeb4e5b

decodeRestSpec is only ever called behind an isRestSource gate today, but as
a public decoder it must not assume its precondition. Without the guard a
malformed token would surface an opaque StringIndexOutOfBoundsException from
substring; now it throws a clear IllegalArgumentException instead. Addresses
the PR opensearch-project#5599 Code Suggestions finding (importance 8).

Signed-off-by: Louis Chu <clingzhi@amazon.com>
@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 8b0a147

@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit e4c63e0

…fetch

- CalciteExplainIT.explainRestCommand: single-quote the endpoint literal; the
  explain harness inlines the query into a JSON body without escaping, so a
  double-quoted literal produced an invalid payload and a 400 (integration CI
  failure).
- OpenSearchNodeClient.clusterSettings: resolve the SettingsFilter and fail
  closed BEFORE fetching cluster state, so settings are never read into memory
  when the redaction filter is unavailable.
- OpenSearchRestClient.clusterState: narrow filter_path to nodes.*.name so node
  IPs/attributes are not over-fetched; manager-name resolution is preserved.

Signed-off-by: Louis Chu <clingzhi@amazon.com>
@noCharger noCharger force-pushed the feature/ppl-rest-command branch from e4c63e0 to 343e93a Compare July 1, 2026 06:06
@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 343e93a

… decode

- AnalyticsEngineCompatIT: assert | rest '/_cluster/health' behaves identically
  with the analytics engine enabled (rest is never routed to DataFusion).
- SystemIndexUtils.fromHex: reject an odd-length hex body so a crafted source name
  that passes the isRestSource suffix check fails clearly rather than silently
  dropping the trailing half-byte.

Signed-off-by: Louis Chu <clingzhi@amazon.com>
@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 42f77a5

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

calcite calcite migration releated feature v3.8.0 Issues and PRs related to version v3.8.0

Projects

Status: In progress

Development

Successfully merging this pull request may close these issues.

[RFC] PPL rest command

1 participant