Widen narrow integer operands in PPL +/-/* to prevent overflow by ahkcs · Pull Request #5603 · opensearch-project/sql

ahkcs · 2026-07-01T16:51:39Z

Description

Integer arithmetic (+, -, *) on narrow integer types silently overflowed because the operators kept Calcite's LEAST_RESTRICTIVE result type (SMALLINT * SMALLINT -> SMALLINT, INTEGER * INTEGER -> INTEGER). For example eval area = ResolutionWidth * ResolutionHeight (both short) computed 28,777,836 but wrapped into the 16-bit range, so where area > 2000000 returned 0 rows. The reported workaround was cast(... as long).

This widens narrow integer operands before the operation so the result cannot overflow the (mis-inferred) narrow type:

operands	result
`byte`/`short`	`int`
any `int`/`long` involved	`long` (`bigint`)
any `float`/`double`/`decimal`/datetime	unchanged (defers to Calcite)

The widening is a CAST injected on the operands in the SQL-plugin lowering (PPLFuncImpTable), applied to +, -, * and the named add/subtract/multiply functions. The string-concat + and the DATETIME - DATETIME variant are unchanged. Arithmetic inside higher-order-function lambda bodies (reduce/mvmap) is left unwidened — those operands are lambda-parameter placeholders resolved by the HOF's own type inference, and they execute JVM-side where operands already promote to int.

Why widening (and not error-on-overflow)

An alternative is to detect overflow and error (Calcite's CHECKED_PLUS/CHECKED_MINUS/CHECKED_MULTIPLY → Math.addExact, explored in #5604). Widening was chosen because it is the only approach that fixes both execution engines from a single change:

The CAST is baked into the RelNode before the engine fork. PPL runs on two paths — the Calcite/Enumerable engine and the analytics-engine (DataFusion) backend. Because widening happens in PPLFuncImpTable (shared lowering, upstream of the fork), the widened operand types flow through to both — including through Substrait to DataFusion. Verified live: short * short (30000×30000) returns int 900000000 on the Calcite path and on the DataFusion route (profile confirms *(CAST($n):INTEGER, ...) with backends=[datafusion]); the reported | where area > 2000000 now returns the matching row on both.
Error-on-overflow cannot be done consistently across engines. The checked operators are a Calcite-codegen mechanism (Math.addExact in generated Java). CHECKED_* has no Substrait representation, so on the analytics-engine route the arithmetic executes natively in DataFusion and the checked rewrite never reaches it — leaving the overflow unfixed there. Making DataFusion error would require separate backend work (a custom Rust checked-arithmetic UDF). So erroring would need two independent implementations (Calcite checked-arith + a DataFusion UDF) and still diverge; widening needs one and is consistent.

Widening fully resolves the reported short/int overflow on both engines. It does not address long * long overflow (no wider primitive type) — that residual is out of scope here and, if needed, is a separate error-on-overflow effort.

Verification

query	before	after (both engines)
`eval area = short * short` (30000²)	wraps (`smallint`, negative)	`int` `900000000`
`eval area = short * short \| where area > 2000000`	0 rows	1 row
`eval x = int * int` (2e9²)	wraps	`bigint`, correct

test	result
`CalcitePPLBuiltinFunctionIT` widening ITs (new)	pass
`CalcitePPLEvalTest`/`ArrayFunctionTest`/`DedupTest`/etc. + `UnifiedQueryPlannerSqlV2Test` (plan snapshots)	pass (updated for widening CAST)
`CalciteExplainIT` / `CalciteNoPushdownIT` (explain fixtures)	pass (regenerated)
`collection.md` doctest (reduce/mvmap unaffected)	pass

Related Issues

Resolves the "eval integer arithmetic overflow (no type widening)" bug. See #5164 / #5604 for the alternative error-on-overflow approach (Calcite-path only).

Check List

New functionality includes testing.
New functionality has been documented.
Commits are signed per the DCO using --signoff or -s.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

github-actions · 2026-07-01T16:52:59Z

PR Reviewer Guide 🔍

(Review updated until commit `a2e7706`)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review Possible Issue The `coerceNumericComparisonOperands` method casts both sides of a comparison to their least-restrictive common integer type when one side is a scalar subquery. However, the method does not verify that the common type returned by `leastRestrictive` is actually an integer type. If `leastRestrictive` returns a non-integer type (e.g., DOUBLE when mixing INT and DOUBLE), the cast proceeds anyway, potentially causing incorrect comparisons or runtime errors. This can occur when a subquery returns a floating-point result compared against an integer column. private static RexNode[] coerceNumericComparisonOperands( RexBuilder builder, BuiltinFunctionName functionName, RexNode... args) { if (args.length != 2 \|\| !BuiltinFunctionName.COMPARATORS.contains(functionName)) { return args; } if (!containsSubQuery(args[0]) && !containsSubQuery(args[1])) { return args; } RelDataType leftType = args[0].getType(); RelDataType rightType = args[1].getType(); if (!SqlTypeName.INT_TYPES.contains(leftType.getSqlTypeName()) \|\| !SqlTypeName.INT_TYPES.contains(rightType.getSqlTypeName()) \|\| leftType.getSqlTypeName() == rightType.getSqlTypeName()) { return args; } RelDataType commonType = builder.getTypeFactory().leastRestrictive(List.of(leftType, rightType)); if (commonType == null) { return args; } return new RexNode[] { builder.makeCast( TYPE_FACTORY.createTypeWithNullability(commonType, leftType.isNullable()), args[0]), builder.makeCast( TYPE_FACTORY.createTypeWithNullability(commonType, rightType.isNullable()), args[1]) }; } Possible Issue The `coerce` method performs numeric conversions without checking for overflow or precision loss. For example, converting a BIGINT value to TINYINT via `num.byteValue()` silently truncates the value if it exceeds the byte range (-128 to 127). This can produce incorrect array elements when widened integer operands (e.g., from a BIGINT arithmetic result) are coerced down to a narrower target element type. The issue arises when `mvappend` is called with mixed-width integer operands and the inferred element type is narrower than the actual operand values. private static Object coerce(Object value, SqlTypeName elementType) { if (elementType == null \|\| !(value instanceof Number)) { return value; } Number num = (Number) value; return switch (elementType) { case TINYINT -> num.byteValue(); case SMALLINT -> num.shortValue(); case INTEGER -> num.intValue(); case BIGINT -> num.longValue(); case FLOAT, REAL -> num.floatValue(); case DOUBLE -> num.doubleValue(); case DECIMAL -> num instanceof BigDecimal ? num : BigDecimal.valueOf(num.doubleValue()); default -> value; }; }

RyanL1997

Hi @ahkcs , thanks for the change, and here are somer reviews.

RyanL1997 · 2026-07-01T21:04:59Z

+      // Register ADDFUNCTION for numeric addition only. Widen narrow integer operands so the sum
+      // cannot overflow the (mis-)inferred SMALLINT/TINYINT result type; see
+      // registerWideningIntegerOperator.
+      registerWideningIntegerOperator(ADDFUNCTION, SqlStdOperatorTable.PLUS);


Does this means the behavior-change note in the PR description applies to SQL too, not just PPL eval?

RyanL1997 · 2026-07-01T21:04:59Z

    RelNode root = getRelNode(ppl);
    String expectedLogical =
-        "LogicalProject(EMPNO=[$0], total=[+(1, +(2, 3))])\n"
+        "LogicalProject(EMPNO=[$0], total=[+(1, +(2:BIGINT, 3:BIGINT))])\n"


This snapshot is the one I couldn't reconcile: the inner +(2:BIGINT, 3:BIGINT) widens both literals, but the outer operand stays plain 1 (INTEGER), whereas selectExpressionWithoutFrom's 1 + 1 widens both to 1:BIGINT. If the outer + in the sum reduction goes through the same widening operator, I'd expect +(1:BIGINT, +(2:BIGINT, 3:BIGINT)). Can you confirm whether the outer add is intentionally not widened (and that a single + mixing INTEGER + BIGINT operands is expected)?

ahkcs · 2026-07-01T22:25:57Z

Superseded by #5604. After discussion on #5164/#5204, the approach changed from operand-widening (silently promoting short→int, int→long, which also changed result column types) to overflow detection via Calcite checked arithmetic (error on overflow, matching the maintainer direction in #5204). Closing this in favor of #5604.

ahkcs · 2026-07-01T23:45:03Z

Reopening. Verified this widening approach fixes the overflow on both execution paths from a single change in the SQL-plugin lowering (PPLFuncImpTable), because the operand CASTs are baked into the RelNode before the engine fork:

Calcite/v2 path: short * short (30000×30000) → result type int, value 900000000 (correct, no wrap).
Analytics-engine / DataFusion path: same query routes to DataFusion as *(CAST($n):INTEGER, CAST($m):INTEGER) (confirmed via profile backends=[datafusion]) → int 900000000; the reported | where area > 2000000 now returns the matching row instead of 0. int * int widens to bigint.

Note: this is the widening approach; #5604 explores the alternative of erroring on overflow via Calcite checked arithmetic. The checked approach does not reach the DataFusion backend (CHECKED_* has no Substrait representation), so it only fixes the Calcite path. Flagging both for a direction decision.

github-actions · 2026-07-01T23:45:51Z

Persistent review updated to latest commit 6d53547

github-actions · 2026-07-01T23:46:15Z

PR Code Suggestions ✨

Latest suggestions up to a2e7706

Explore these optional code suggestions:

Category	Suggestion	Impact
General	Verify scalar subquery before widening The method only checks if either operand contains a subquery but doesn't verify that the comparison actually involves a scalar subquery. Non-scalar subqueries (e.g., `IN` with a multi-row result) should not trigger this widening logic, as they have different semantics. core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java [633-640] private static RexNode[] coerceNumericComparisonOperands( RexBuilder builder, BuiltinFunctionName functionName, RexNode... args) { if (args.length != 2 \|\| !BuiltinFunctionName.COMPARATORS.contains(functionName)) { return args; } - if (!containsSubQuery(args[0]) && !containsSubQuery(args[1])) { + boolean hasScalarSubquery = (args[0] instanceof RexSubQuery && ((RexSubQuery) args[0]).getKind() == SqlKind.SCALAR_QUERY) + \|\| (args[1] instanceof RexSubQuery && ((RexSubQuery) args[1]).getKind() == SqlKind.SCALAR_QUERY); + if (!hasScalarSubquery) { return args; } ... } Suggestion importance[1-10]: 7 __ Why: The suggestion identifies a valid concern that the method doesn't distinguish between scalar and non-scalar subqueries. The improved code adds a check for `SqlKind.SCALAR_QUERY`, which would make the logic more precise. However, the existing `containsSubQuery` helper may already handle this correctly through Calcite's decorrelation logic, so the impact is moderate.	Medium
	Preserve precision in DECIMAL coercion The `DECIMAL` case uses `doubleValue()` which can lose precision for large integers. For numeric types that aren't already `BigDecimal`, consider using `BigDecimal.valueOf(num.longValue())` for integer types or checking the actual runtime type to preserve precision. core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/MVAppendCore.java [60-75] private static Object coerce(Object value, SqlTypeName elementType) { if (elementType == null \|\| !(value instanceof Number)) { return value; } Number num = (Number) value; return switch (elementType) { case TINYINT -> num.byteValue(); case SMALLINT -> num.shortValue(); case INTEGER -> num.intValue(); case BIGINT -> num.longValue(); case FLOAT, REAL -> num.floatValue(); case DOUBLE -> num.doubleValue(); - case DECIMAL -> num instanceof BigDecimal ? num : BigDecimal.valueOf(num.doubleValue()); + case DECIMAL -> num instanceof BigDecimal ? num : + (num instanceof Float \|\| num instanceof Double) ? + BigDecimal.valueOf(num.doubleValue()) : + BigDecimal.valueOf(num.longValue()); default -> value; }; } Suggestion importance[1-10]: 6 __ Why: The suggestion correctly identifies a potential precision loss when coercing to `DECIMAL` using `doubleValue()` for integer types. However, the impact is moderate since this affects edge cases with large integers, and the improved code adds complexity. The suggestion is valid but not critical.	Low
	Handle nested lambda expressions The method recursively checks for lambda parameters but doesn't handle `RexLambda` nodes themselves. If a lambda expression is nested within the arithmetic operation, the check should also inspect the lambda body to avoid missing lambda parameter references. core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java [847-855] private static boolean referencesLambdaParameter(RexNode node) { if (node instanceof RexLambdaRef) { return true; + } + if (node instanceof RexLambda lambda) { + return referencesLambdaParameter(lambda.getExpression()); } if (node instanceof RexCall call) { return call.getOperands().stream().anyMatch(AbstractBuilder::referencesLambdaParameter); } return false; } Suggestion importance[1-10]: 5 __ Why: The suggestion points out that `RexLambda` nodes are not explicitly handled in the recursive check. While this is a valid observation, the existing code already handles `RexCall` which would include lambda expressions as operands. The improvement is minor and may not be necessary given the current implementation context.	Low

Previous suggestions

Suggestions up to commit 82ed0c1

Category	Suggestion	Impact
General	Preserve integer precision in DECIMAL coercion The `DECIMAL` case uses `doubleValue()` which can lose precision for large integers. When coercing integer types to `DECIMAL`, use `BigDecimal.valueOf(num.longValue())` to preserve exact integer values without floating-point rounding errors. core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/MVAppendCore.java [60-75] private static Object coerce(Object value, SqlTypeName elementType) { if (elementType == null \|\| !(value instanceof Number)) { return value; } Number num = (Number) value; return switch (elementType) { case TINYINT -> num.byteValue(); case SMALLINT -> num.shortValue(); case INTEGER -> num.intValue(); case BIGINT -> num.longValue(); case FLOAT, REAL -> num.floatValue(); case DOUBLE -> num.doubleValue(); - case DECIMAL -> num instanceof BigDecimal ? num : BigDecimal.valueOf(num.doubleValue()); + case DECIMAL -> num instanceof BigDecimal ? num : + (num instanceof Byte \|\| num instanceof Short \|\| num instanceof Integer \|\| num instanceof Long) + ? BigDecimal.valueOf(num.longValue()) + : BigDecimal.valueOf(num.doubleValue()); default -> value; }; } Suggestion importance[1-10]: 7 __ Why: The suggestion correctly identifies a precision loss issue when coercing large integers to `DECIMAL` via `doubleValue()`. Using `longValue()` for integer types preserves exact values. However, the improved code could be simplified and the impact is moderate since `DECIMAL` coercion is a less common path.	Medium
General	Document subquery-specific coercion rationale The method only coerces when a subquery is present, but the join key type mismatch can occur in other scenarios. Consider broadening the condition or documenting why this specific case requires special handling while others don't. core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java [633-640] private static RexNode[] coerceNumericComparisonOperands( RexBuilder builder, BuiltinFunctionName functionName, RexNode... args) { if (args.length != 2 \|\| !BuiltinFunctionName.COMPARATORS.contains(functionName)) { return args; } + // Only coerce for subquery comparisons where decorrelation creates join keys if (!containsSubQuery(args[0]) && !containsSubQuery(args[1])) { return args; } ... } Suggestion importance[1-10]: 3 __ Why: The suggestion asks for additional documentation, but the existing comment at lines 617-632 already explains why subquery comparisons require special handling (decorrelation creates join keys with mismatched types). The improved code only adds a redundant comment without substantive improvement.	Low

Suggestions up to commit 6d53547

Category	Suggestion	Impact
General	Skip redundant casts for matching types The cast operation may fail or produce unexpected results if the source type is already wider than the target. Add a check to skip casting when `args[i]` already has the promoted type to avoid redundant operations and potential issues. core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java [761-772] private static RexNode[] widenIntegerOperands(RexBuilder builder, RexNode... args) { SqlTypeName promoted = promotedIntegerType(args); if (promoted == null) { return args; } RexNode[] widened = new RexNode[args.length]; for (int i = 0; i < args.length; i++) { - RelDataType target = TYPE_FACTORY.createSqlType(promoted, args[i].getType().isNullable()); - widened[i] = builder.makeCast(target, args[i]); + if (args[i].getType().getSqlTypeName() == promoted) { + widened[i] = args[i]; + } else { + RelDataType target = TYPE_FACTORY.createSqlType(promoted, args[i].getType().isNullable()); + widened[i] = builder.makeCast(target, args[i]); + } } return widened; } Suggestion importance[1-10]: 5 __ Why: The suggestion correctly identifies that redundant casts could be avoided when the operand already has the promoted type. However, this is a minor optimization that doesn't fix a bug or significantly improve performance. The current implementation is correct and the suggested optimization would only marginally improve efficiency in specific cases.	Low

github-actions · 2026-07-02T09:31:43Z

Persistent review updated to latest commit 2ea4a2d

github-actions · 2026-07-02T22:03:50Z

Persistent review updated to latest commit 82ed0c1

PPL integer arithmetic (`+`, `-`, `*` and the named add/subtract/multiply functions) inferred `SMALLINT op SMALLINT -> SMALLINT` (and likewise for TINYINT) because the operators were registered with Calcite's stock `SqlStdOperatorTable.PLUS/MINUS/MULTIPLY`, whose return-type inference (`ReturnTypes.NULLABLE_SUM` / `PRODUCT_NULLABLE`) falls through to `LEAST_RESTRICTIVE` for non-decimal integers. As a result the product/sum of two narrow-integer columns overflowed the inferred type on every backend, just differently: - DataFusion / analytics-engine: silently wraps the i16 result, so e.g. `eval area = ResolutionWidth * ResolutionHeight | where area > 2000000` returned 0 rows instead of the matching row. - Calcite Enumerable engine: throws `ArithmeticException: value out of range`. - v2 legacy engine: `ExprShortValue` narrows via `shortValue()`, wrapping. Fix in the SQL-plugin lowering so all backends are corrected at once: widen the operands (byte/short -> INTEGER, any int/long -> BIGINT) before applying the operator. Casting the operands rather than only relabelling the result type is required, otherwise DataFusion still computes the narrow multiply and wraps before any outer cast. Non-integral operands (float/double/decimal/ datetime/mixed) are left untouched and defer to Calcite's default inference. The string-concat `ADD` variant and the DATETIME-DATETIME `SUBTRACT` variant are unchanged. mvindex's internal array-index arithmetic now uses the raw Calcite PLUS/MINUS operators so array indices stay INTEGER for ITEM/ARRAY_SLICE codegen (the widened result would otherwise be rejected as a long index). Note: this changes user-visible result column types for integer arithmetic (int-operand expressions now report bigint), which is the intended trade-off for overflow-safe, backend-consistent results. Adds CalcitePPLBuiltinFunctionIT coverage for the short->int and int->bigint widening tiers and updates the affected logical-plan / Spark-SQL snapshots. Signed-off-by: Kai Huang <ahkcs@amazon.com>

github-actions · 2026-07-02T22:16:09Z

Persistent review updated to latest commit a2e7706

ahkcs requested review from LantaoJin, RyanL1997, Swiddis, acarbonetto, anirudha, dai-chen, joshuali925, mengweieric, noCharger, penghuo, ps48, qianheng-aws, songkant-aws, vamsimanohar, ykmr1224 and yuancu as code owners July 1, 2026 16:51

RyanL1997 reviewed Jul 1, 2026

View reviewed changes

ahkcs closed this Jul 1, 2026

ahkcs reopened this Jul 1, 2026

ahkcs force-pushed the pr/integer-arithmetic-overflow-widening branch from 6d53547 to 2ea4a2d Compare July 2, 2026 09:29

ahkcs added the bug Something isn't working label Jul 2, 2026

ahkcs added the enhancement New feature or request label Jul 2, 2026

This was referenced Jul 2, 2026

[Analytics Engine] Error on BIGINT arithmetic overflow instead of wrapping opensearch-project/OpenSearch#22378

Open

Detect long (BIGINT) arithmetic overflow instead of silently wrapping (#5164) #5604

Open

ahkcs force-pushed the pr/integer-arithmetic-overflow-widening branch from 2ea4a2d to 82ed0c1 Compare July 2, 2026 22:02

ahkcs force-pushed the pr/integer-arithmetic-overflow-widening branch from 82ed0c1 to a2e7706 Compare July 2, 2026 22:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Widen narrow integer operands in PPL +/-/* to prevent overflow#5603

Widen narrow integer operands in PPL +/-/* to prevent overflow#5603
ahkcs wants to merge 1 commit into
opensearch-project:mainfrom
ahkcs:pr/integer-arithmetic-overflow-widening

ahkcs commented Jul 1, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jul 1, 2026 •

edited

Loading

Uh oh!

RyanL1997 left a comment

Uh oh!

RyanL1997 Jul 1, 2026

Uh oh!

RyanL1997 Jul 1, 2026

Uh oh!

ahkcs commented Jul 1, 2026

Uh oh!

ahkcs commented Jul 1, 2026

Uh oh!

github-actions Bot commented Jul 1, 2026

Uh oh!

github-actions Bot commented Jul 1, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jul 2, 2026

Uh oh!

github-actions Bot commented Jul 2, 2026

Uh oh!

github-actions Bot commented Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

ahkcs commented Jul 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Why widening (and not error-on-overflow)

Verification

Related Issues

Check List

Uh oh!

github-actions Bot commented Jul 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit a2e7706)

Uh oh!

RyanL1997 left a comment

Choose a reason for hiding this comment

Uh oh!

RyanL1997 Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

RyanL1997 Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

ahkcs commented Jul 1, 2026

Uh oh!

ahkcs commented Jul 1, 2026

Uh oh!

github-actions Bot commented Jul 1, 2026

Uh oh!

github-actions Bot commented Jul 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

github-actions Bot commented Jul 2, 2026

Uh oh!

github-actions Bot commented Jul 2, 2026

Uh oh!

github-actions Bot commented Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ahkcs commented Jul 1, 2026 •

edited

Loading

github-actions Bot commented Jul 1, 2026 •

edited

Loading

(Review updated until commit `a2e7706`)

github-actions Bot commented Jul 1, 2026 •

edited

Loading