Skip to content

Detect long (BIGINT) arithmetic overflow instead of silently wrapping (#5164)#5604

Open
ahkcs wants to merge 2 commits into
opensearch-project:mainfrom
ahkcs:fix/5164-checked-arithmetic
Open

Detect long (BIGINT) arithmetic overflow instead of silently wrapping (#5164)#5604
ahkcs wants to merge 2 commits into
opensearch-project:mainfrom
ahkcs:fix/5164-checked-arithmetic

Conversation

@ahkcs

@ahkcs ahkcs commented Jul 1, 2026

Copy link
Copy Markdown
Collaborator

Description

long (BIGINT) arithmetic (+, -, *) in PPL/SQL eval/SELECT expressions silently wrapped on overflow in the Calcite engine — e.g. long_field * 9999999999 at the top of the 64-bit range returned a negative value with HTTP 200. SqlStdOperatorTable.PLUS/MINUS/MULTIPLY generate plain Java +/-/* in the Enumerable code path, which wrap.

This rewrites long +/-/* to their overflow-checked variants (CHECKED_PLUS / CHECKED_MINUS / CHECKED_MULTIPLY), which generate Math.addExact / subtractExact / multiplyExact and throw ArithmeticException on overflow. The exception is caught in QueryService and surfaced as a 4xx client error (NonFallbackCalciteException) instead of wrapping or falling back to the V2 engine. (The V2 engine already uses Math.*Exact.)

Scope: BIGINT-only

The checked rewrite is gated to long (BIGINT) operands. Narrower integer arithmetic (byte/short/int) is widened to a type that cannot overflow before this rewrite runs — PPLFuncImpTable promotes byte/shortint and any int/long operand → long for +/-/* (#5603) — so long, which has no wider integer type to widen into, is the only remaining overflow case on the Calcite engine. float/double/decimal follow IEEE 754 / decimal semantics and have no CHECKED_* runtime, so they are left untouched.

The rewrite runs after the analytics-engine fork in convertToCalcitePlan, so only the Calcite/Enumerable path is affected. The DataFusion analytics backend handles its own long overflow (checked arrow kernels, opensearch-project/OpenSearch#22378) — the two engines now converge on erroring for long overflow.

It runs before pushdown so both coordinator-executed and pushed-down (script) arithmetic are checked. PPLAggregateConvertRule, OpenSearchRelOptUtil, and ExtendedRelJson recognize the CHECKED_* kinds so aggregate-pushdown, sort-expression pushdown, and script serialization keep working for pushed-down checked arithmetic.

Stacked on #5603

This PR is stacked on #5603 (operand widening) and depends on it: without the widening underneath, short/int overflow would slip through un-checked on the Calcite path. The first commit here is #5603's widening; it drops out of this diff once #5603 merges. Together the two PRs close integer-arithmetic overflow on the Calcite engine (#5603 widens the narrow tiers; this errors the long tier), matching the analytics-engine behavior.

Testing

test behavior
issues/5164.yml (REST) int/short overflow → widened correct value (200); long overflow (+/-/*) → 4xx; normal arithmetic unaffected
CalciteExplainIT (pushdown) 258/258 pass — explain fixtures regenerated to show CHECKED_* in pushed scripts
CalciteNoPushdownIT (CalciteExplainIT) 258/258 pass — pushdown-only arithmetic tests skipped as expected

Related Issues

Resolves #5164. Stacked on #5603. Analytics-engine counterpart: opensearch-project/OpenSearch#22378.

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • Commits are signed per the DCO using --signoff or -s.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

PR Reviewer Guide 🔍

(Review updated until commit 447b958)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Possible Issue

The coerceNumericComparisonOperands method widens integer operands when one side is a scalar subquery, but it does not recursively check nested subqueries within RexCall nodes. If a subquery is nested inside a function call (e.g., CAST(subquery) or +(subquery, 1)), containsSubQuery will find it, but the widening logic only casts the top-level operands. This could leave the nested subquery's result unwidened, potentially causing the same join-key type mismatch the method is designed to prevent.

private static RexNode[] coerceNumericComparisonOperands(
    RexBuilder builder, BuiltinFunctionName functionName, RexNode... args) {
  if (args.length != 2 || !BuiltinFunctionName.COMPARATORS.contains(functionName)) {
    return args;
  }
  if (!containsSubQuery(args[0]) && !containsSubQuery(args[1])) {
    return args;
  }
  RelDataType leftType = args[0].getType();
  RelDataType rightType = args[1].getType();
  if (!SqlTypeName.INT_TYPES.contains(leftType.getSqlTypeName())
      || !SqlTypeName.INT_TYPES.contains(rightType.getSqlTypeName())
      || leftType.getSqlTypeName() == rightType.getSqlTypeName()) {
    return args;
  }
  RelDataType commonType =
      builder.getTypeFactory().leastRestrictive(List.of(leftType, rightType));
  if (commonType == null) {
    return args;
  }
  return new RexNode[] {
    builder.makeCast(
        TYPE_FACTORY.createTypeWithNullability(commonType, leftType.isNullable()), args[0]),
    builder.makeCast(
        TYPE_FACTORY.createTypeWithNullability(commonType, rightType.isNullable()), args[1])
  };
}
Possible Issue

The findArithmeticOverflow method walks the cause chain to detect ArithmeticException, but it does not guard against infinite loops if a cause chain is circular (e.g., cause.getCause() == cause after some indirection). While the condition cause != cause.getCause() prevents immediate self-loops, a cycle like A -> B -> A would loop indefinitely. This is unlikely in practice but could hang the error-handling path if a malformed exception chain is encountered.

private static ArithmeticException findArithmeticOverflow(@Nullable Throwable t) {
  for (Throwable cause = t;
      cause != null && cause != cause.getCause();
      cause = cause.getCause()) {
    if (cause instanceof ArithmeticException arithmeticException) {
      return arithmeticException;
    }
  }
  return null;
}
Possible Issue

The coerce method converts numeric values to the target element type, but it does not handle overflow or precision loss. For example, coercing a Long value that exceeds Integer.MAX_VALUE to INTEGER via num.intValue() will silently wrap (e.g., 2147483648L becomes -2147483648). This contradicts the PR's goal of detecting overflow and could produce incorrect array elements when widened operands are flattened into a narrower array type.

private static Object coerce(Object value, SqlTypeName elementType) {
  if (elementType == null || !(value instanceof Number)) {
    return value;
  }
  Number num = (Number) value;
  return switch (elementType) {
    case TINYINT -> num.byteValue();
    case SMALLINT -> num.shortValue();
    case INTEGER -> num.intValue();
    case BIGINT -> num.longValue();
    case FLOAT, REAL -> num.floatValue();
    case DOUBLE -> num.doubleValue();
    case DECIMAL -> num instanceof BigDecimal ? num : BigDecimal.valueOf(num.doubleValue());
    default -> value;
  };
}

@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Latest suggestions up to 447b958

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Prevent infinite loop in exception chain

The loop condition cause != cause.getCause() may not detect all circular references
if the cause chain forms a cycle at a deeper level. Add a visited set or depth limit
to prevent infinite loops in malformed exception chains.

core/src/main/java/org/opensearch/sql/executor/QueryService.java [477-486]

 private static ArithmeticException findArithmeticOverflow(@Nullable Throwable t) {
-  for (Throwable cause = t;
-      cause != null && cause != cause.getCause();
-      cause = cause.getCause()) {
+  Set<Throwable> visited = new HashSet<>();
+  for (Throwable cause = t; cause != null; cause = cause.getCause()) {
+    if (!visited.add(cause)) {
+      break;
+    }
     if (cause instanceof ArithmeticException arithmeticException) {
       return arithmeticException;
     }
   }
   return null;
 }
Suggestion importance[1-10]: 6

__

Why: The suggestion addresses a valid edge case where a circular exception chain could cause an infinite loop. The existing cause != cause.getCause() check only catches immediate self-references, not deeper cycles. Using a visited set is a robust solution to prevent this issue.

Low
Validate numeric coercion bounds

Narrowing conversions (e.g., longValue() to byteValue()) may silently truncate
values outside the target type's range. Validate that the numeric value fits within
the target type's bounds before coercing to prevent data loss.

core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/MVAppendCore.java [60-75]

 private static Object coerce(Object value, SqlTypeName elementType) {
   if (elementType == null || !(value instanceof Number)) {
     return value;
   }
   Number num = (Number) value;
   return switch (elementType) {
-    case TINYINT -> num.byteValue();
-    case SMALLINT -> num.shortValue();
-    case INTEGER -> num.intValue();
-    case BIGINT -> num.longValue();
+    case TINYINT -> {
+      long val = num.longValue();
+      if (val < Byte.MIN_VALUE || val > Byte.MAX_VALUE) {
+        throw new ArithmeticException("Value out of range for TINYINT: " + val);
+      }
+      yield num.byteValue();
+    }
+    case SMALLINT -> {
+      long val = num.longValue();
+      if (val < Short.MIN_VALUE || val > Short.MAX_VALUE) {
+        throw new ArithmeticException("Value out of range for SMALLINT: " + val);
+      }
+      yield num.shortValue();
+    }
     ...
   };
 }
Suggestion importance[1-10]: 4

__

Why: The suggestion identifies a potential data loss issue during narrowing conversions. However, the coerce method is designed to align boxed numeric types for array elements after widening has already occurred upstream (in PPLFuncImpTable), so out-of-range values should not reach this point. Adding bounds checks would be defensive but may be unnecessary given the context.

Low
Possible issue
Prevent stack overflow in recursion

The recursive lambda parameter check may cause a StackOverflowError on deeply nested
expressions. Consider adding a depth limit or using an iterative approach with an
explicit stack to prevent unbounded recursion.

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java [847-855]

 private static boolean referencesLambdaParameter(RexNode node) {
+  return referencesLambdaParameterWithDepth(node, 0, 100);
+}
+
+private static boolean referencesLambdaParameterWithDepth(RexNode node, int depth, int maxDepth) {
+  if (depth > maxDepth) {
+    return false;
+  }
   if (node instanceof RexLambdaRef) {
     return true;
   }
   if (node instanceof RexCall call) {
-    return call.getOperands().stream().anyMatch(AbstractBuilder::referencesLambdaParameter);
+    return call.getOperands().stream().anyMatch(op -> referencesLambdaParameterWithDepth(op, depth + 1, maxDepth));
   }
   return false;
 }
Suggestion importance[1-10]: 5

__

Why: The suggestion correctly identifies a potential stack overflow risk in the recursive referencesLambdaParameter method. However, the risk is low in practice since SQL expression trees are typically shallow. The proposed depth limit is a reasonable safeguard, though the hardcoded 100 could be configurable.

Low

Previous suggestions

Suggestions up to commit 2d6e9a1
CategorySuggestion                                                                                                                                    Impact
Possible issue
Prevent infinite loop in cause chain

The loop condition cause != cause.getCause() is insufficient to prevent infinite
loops if the cause chain contains a cycle where a cause references itself
indirectly. Add a visited set or depth limit to guard against circular cause chains
that could lead to infinite loops.

core/src/main/java/org/opensearch/sql/executor/QueryService.java [451-460]

 private static ArithmeticException findArithmeticOverflow(@Nullable Throwable t) {
-  for (Throwable cause = t;
-      cause != null && cause != cause.getCause();
-      cause = cause.getCause()) {
+  Set<Throwable> visited = new HashSet<>();
+  for (Throwable cause = t; cause != null; cause = cause.getCause()) {
+    if (!visited.add(cause)) {
+      break;
+    }
     if (cause instanceof ArithmeticException arithmeticException) {
       return arithmeticException;
     }
   }
   return null;
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a potential infinite loop risk if the cause chain contains indirect cycles. While the condition cause != cause.getCause() prevents direct self-reference, it doesn't guard against indirect cycles (A→B→A). Adding a visited set is a valid defensive improvement, though such cycles are rare in practice.

Medium
Suggestions up to commit 2d2ef9f
CategorySuggestion                                                                                                                                    Impact
General
Prevent infinite loop on circular exceptions

The loop condition cause != cause.getCause() prevents infinite loops but doesn't
guard against circular cause chains where A->B->A. This could cause an infinite loop
if the exception chain forms a cycle that doesn't include self-reference.

core/src/main/java/org/opensearch/sql/executor/QueryService.java [452-461]

 private static ArithmeticException findArithmeticOverflow(@Nullable Throwable t) {
-  for (Throwable cause = t;
-      cause != null && cause != cause.getCause();
-      cause = cause.getCause()) {
+  Set<Throwable> seen = new HashSet<>();
+  for (Throwable cause = t; cause != null; cause = cause.getCause()) {
+    if (!seen.add(cause)) {
+      break; // Circular reference detected
+    }
     if (cause instanceof ArithmeticException arithmeticException) {
       return arithmeticException;
     }
   }
   return null;
 }
Suggestion importance[1-10]: 4

__

Why: The suggestion addresses a theoretical edge case of circular exception chains. While the current cause != cause.getCause() check handles self-references, circular chains (A->B->A) are rare in practice. Using a HashSet adds overhead but improves robustness for this uncommon scenario.

Low
Add null check for exception message

The overflow detection logic should be placed before the general fallback check to
ensure arithmetic overflow errors are always surfaced as client errors. However, the
current placement already achieves this. Consider adding a null check for
overflow.getMessage() to prevent potential NPE if the message is null.

core/src/main/java/org/opensearch/sql/executor/QueryService.java [186-194]

 ArithmeticException overflow = findArithmeticOverflow(t);
 if (overflow != null) {
   // Checked arithmetic detected integer/long overflow. Surface as a client error
   // instead of wrapping (silently) or falling back to the V2 engine.
+  String message = overflow.getMessage() != null ? overflow.getMessage() : "integer overflow";
   propagateCalciteError(
       new NonFallbackCalciteException(
-          "Arithmetic overflow: " + overflow.getMessage(), overflow),
+          "Arithmetic overflow: " + message, overflow),
       listener);
 } else if (isCalciteFallbackAllowed(t) && !(t instanceof NonFallbackCalciteException)) {
Suggestion importance[1-10]: 3

__

Why: While adding a null check for overflow.getMessage() is a defensive practice, ArithmeticException.getMessage() typically returns a non-null message. This is a minor improvement that enhances robustness but has low impact.

Low
Validate operands list is non-empty

The method checks if all operands are exact integer types but doesn't verify that
the call has operands. If getOperands() returns an empty list, the stream will
vacuously return true, potentially causing incorrect behavior for operators without
operands.

core/src/main/java/org/opensearch/sql/executor/QueryService.java [433-438]

 private static boolean isCheckableIntegerArithmetic(RexCall call) {
-  if (!isExactIntegerType(call.getType())) {
+  if (!isExactIntegerType(call.getType()) || call.getOperands().isEmpty()) {
     return false;
   }
   return call.getOperands().stream().allMatch(op -> isExactIntegerType(op.getType()));
 }
Suggestion importance[1-10]: 2

__

Why: The suggestion to check for empty operands is overly cautious. Arithmetic operators (+, -, *) always have operands by definition, so this check is unnecessary in this context and adds minimal value.

Low
Suggestions up to commit 7e0a74d
CategorySuggestion                                                                                                                                    Impact
General
Add null-safety for exception message

The overflow detection logic should be placed before the general fallback check to
ensure overflow errors are always caught. However, the current placement is correct.
Consider adding a null-safety check for overflow.getMessage() to prevent potential
NPE if the exception message is null.

core/src/main/java/org/opensearch/sql/executor/QueryService.java [186-194]

 ArithmeticException overflow = findArithmeticOverflow(t);
 if (overflow != null) {
   // Checked arithmetic detected integer/long overflow. Surface as a client error
   // instead of wrapping (silently) or falling back to the V2 engine.
+  String message = overflow.getMessage() != null ? overflow.getMessage() : "integer overflow";
   propagateCalciteError(
       new NonFallbackCalciteException(
-          "Arithmetic overflow: " + overflow.getMessage(), overflow),
+          "Arithmetic overflow: " + message, overflow),
       listener);
 } else if (isCalciteFallbackAllowed(t) && !(t instanceof NonFallbackCalciteException)) {
Suggestion importance[1-10]: 4

__

Why: While adding null-safety for overflow.getMessage() is a defensive programming practice, ArithmeticException typically provides a message, making this edge case unlikely. The suggestion is valid but addresses a minor robustness concern rather than a critical issue.

Low
Validate operands list is non-empty

The method checks if all operands are exact integer types, but doesn't verify that
the call has operands. If getOperands() returns an empty list, allMatch will return
true, potentially causing incorrect behavior for operators without operands.

core/src/main/java/org/opensearch/sql/executor/QueryService.java [433-438]

 private static boolean isCheckableIntegerArithmetic(RexCall call) {
   if (!isExactIntegerType(call.getType())) {
     return false;
   }
-  return call.getOperands().stream().allMatch(op -> isExactIntegerType(op.getType()));
+  List<RexNode> operands = call.getOperands();
+  return !operands.isEmpty() && operands.stream().allMatch(op -> isExactIntegerType(op.getType()));
 }
Suggestion importance[1-10]: 3

__

Why: The suggestion correctly identifies that allMatch returns true for empty streams. However, arithmetic operators (+, -, *) always have operands in Calcite's RexCall, making this scenario unlikely in practice. The check adds defensive validation but addresses a theoretical edge case with low practical impact.

Low

@ahkcs ahkcs added the enhancement New feature or request label Jul 1, 2026
@ahkcs ahkcs force-pushed the fix/5164-checked-arithmetic branch from 7e0a74d to 2d2ef9f Compare July 1, 2026 22:37
@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 2d2ef9f

@ahkcs ahkcs force-pushed the fix/5164-checked-arithmetic branch from 2d2ef9f to 2d6e9a1 Compare July 2, 2026 20:53
@ahkcs ahkcs changed the title Detect integer/long arithmetic overflow instead of silently wrapping (#5164) Detect long (BIGINT) arithmetic overflow instead of silently wrapping (#5164) Jul 2, 2026
@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 2d6e9a1

@ahkcs ahkcs force-pushed the fix/5164-checked-arithmetic branch from 2d6e9a1 to fc2e0f2 Compare July 2, 2026 22:04
@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit fc2e0f2

ahkcs added 2 commits July 2, 2026 15:14
PPL integer arithmetic (`+`, `-`, `*` and the named add/subtract/multiply
functions) inferred `SMALLINT op SMALLINT -> SMALLINT` (and likewise for
TINYINT) because the operators were registered with Calcite's stock
`SqlStdOperatorTable.PLUS/MINUS/MULTIPLY`, whose return-type inference
(`ReturnTypes.NULLABLE_SUM` / `PRODUCT_NULLABLE`) falls through to
`LEAST_RESTRICTIVE` for non-decimal integers.

As a result the product/sum of two narrow-integer columns overflowed the
inferred type on every backend, just differently:
  - DataFusion / analytics-engine: silently wraps the i16 result, so e.g.
    `eval area = ResolutionWidth * ResolutionHeight | where area > 2000000`
    returned 0 rows instead of the matching row.
  - Calcite Enumerable engine: throws `ArithmeticException: value out of
    range`.
  - v2 legacy engine: `ExprShortValue` narrows via `shortValue()`, wrapping.

Fix in the SQL-plugin lowering so all backends are corrected at once: widen
the operands (byte/short -> INTEGER, any int/long -> BIGINT) before applying
the operator. Casting the operands rather than only relabelling the result
type is required, otherwise DataFusion still computes the narrow multiply and
wraps before any outer cast. Non-integral operands (float/double/decimal/
datetime/mixed) are left untouched and defer to Calcite's default inference.
The string-concat `ADD` variant and the DATETIME-DATETIME `SUBTRACT` variant
are unchanged.

mvindex's internal array-index arithmetic now uses the raw Calcite
PLUS/MINUS operators so array indices stay INTEGER for ITEM/ARRAY_SLICE
codegen (the widened result would otherwise be rejected as a long index).

Note: this changes user-visible result column types for integer arithmetic
(int-operand expressions now report bigint), which is the intended trade-off
for overflow-safe, backend-consistent results.

Adds CalcitePPLBuiltinFunctionIT coverage for the short->int and int->bigint
widening tiers and updates the affected logical-plan / Spark-SQL snapshots.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
PPL/SQL long arithmetic (+, -, *) in eval/SELECT expressions silently
wrapped on overflow in the Calcite engine (e.g. long_field * 999...9 wrapped
to a negative value with HTTP 200). SqlStdOperatorTable.PLUS/MINUS/MULTIPLY
generate plain Java +/-/* in the Enumerable code path, which wrap.

Rewrite long +/-/* to their overflow-checked variants (CHECKED_PLUS /
CHECKED_MINUS / CHECKED_MULTIPLY), which generate Math.addExact etc. and throw
ArithmeticException on overflow. The exception is caught in QueryService and
surfaced as a 4xx client error instead of wrapping or falling back to V2.

Scoped to BIGINT operands only: narrower integer arithmetic (byte/short/int)
is widened to a type that cannot overflow before this rewrite runs (PPLFuncImpTable
promotes byte/short to INT and any int/long to BIGINT for +/-/*), so long — which
has no wider integer type — is the sole remaining overflow case on the Calcite
engine. Float/double/decimal follow IEEE 754 / decimal semantics and have no
CHECKED_* runtime, so they are left untouched. The rewrite runs after the
analytics-engine fork, so only the Calcite path is affected (the DataFusion
backend handles its own overflow).

The rewrite runs before pushdown so both coordinator-executed and pushed-down
(script) arithmetic are checked; PPLAggregateConvertRule, OpenSearchRelOptUtil,
and ExtendedRelJson recognize the CHECKED_* kinds so aggregate/sort pushdown and
script serialization keep working. Adds a REST yaml test (issues/5164.yml) and
updates the affected explain fixtures.

Resolves opensearch-project#5164

Signed-off-by: Kai Huang <ahkcs@amazon.com>
@ahkcs ahkcs force-pushed the fix/5164-checked-arithmetic branch from fc2e0f2 to 447b958 Compare July 2, 2026 22:19
@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 447b958

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Silent integer and long overflow wrapping in Calcite arithmetic path

1 participant