Widen narrow integer operands in PPL +/-/* to prevent overflow#5603
Widen narrow integer operands in PPL +/-/* to prevent overflow#5603ahkcs wants to merge 1 commit into
Conversation
PR Reviewer Guide 🔍(Review updated until commit a2e7706)Here are some key observations to aid the review process:
|
| // Register ADDFUNCTION for numeric addition only. Widen narrow integer operands so the sum | ||
| // cannot overflow the (mis-)inferred SMALLINT/TINYINT result type; see | ||
| // registerWideningIntegerOperator. | ||
| registerWideningIntegerOperator(ADDFUNCTION, SqlStdOperatorTable.PLUS); |
There was a problem hiding this comment.
Does this means the behavior-change note in the PR description applies to SQL too, not just PPL eval?
| RelNode root = getRelNode(ppl); | ||
| String expectedLogical = | ||
| "LogicalProject(EMPNO=[$0], total=[+(1, +(2, 3))])\n" | ||
| "LogicalProject(EMPNO=[$0], total=[+(1, +(2:BIGINT, 3:BIGINT))])\n" |
There was a problem hiding this comment.
This snapshot is the one I couldn't reconcile: the inner +(2:BIGINT, 3:BIGINT) widens both literals, but the outer operand stays plain 1 (INTEGER), whereas selectExpressionWithoutFrom's 1 + 1 widens both to 1:BIGINT. If the outer + in the sum reduction goes through the same widening operator, I'd expect +(1:BIGINT, +(2:BIGINT, 3:BIGINT)). Can you confirm whether the outer add is intentionally not widened (and that a single + mixing INTEGER + BIGINT operands is expected)?
|
Superseded by #5604. After discussion on #5164/#5204, the approach changed from operand-widening (silently promoting short→int, int→long, which also changed result column types) to overflow detection via Calcite checked arithmetic (error on overflow, matching the maintainer direction in #5204). Closing this in favor of #5604. |
|
Reopening. Verified this widening approach fixes the overflow on both execution paths from a single change in the SQL-plugin lowering (
Note: this is the widening approach; #5604 explores the alternative of erroring on overflow via Calcite checked arithmetic. The checked approach does not reach the DataFusion backend ( |
|
Persistent review updated to latest commit 6d53547 |
PR Code Suggestions ✨Latest suggestions up to a2e7706 Explore these optional code suggestions:
Previous suggestionsSuggestions up to commit 82ed0c1
Suggestions up to commit 6d53547
|
6d53547 to
2ea4a2d
Compare
|
Persistent review updated to latest commit 2ea4a2d |
2ea4a2d to
82ed0c1
Compare
|
Persistent review updated to latest commit 82ed0c1 |
PPL integer arithmetic (`+`, `-`, `*` and the named add/subtract/multiply
functions) inferred `SMALLINT op SMALLINT -> SMALLINT` (and likewise for
TINYINT) because the operators were registered with Calcite's stock
`SqlStdOperatorTable.PLUS/MINUS/MULTIPLY`, whose return-type inference
(`ReturnTypes.NULLABLE_SUM` / `PRODUCT_NULLABLE`) falls through to
`LEAST_RESTRICTIVE` for non-decimal integers.
As a result the product/sum of two narrow-integer columns overflowed the
inferred type on every backend, just differently:
- DataFusion / analytics-engine: silently wraps the i16 result, so e.g.
`eval area = ResolutionWidth * ResolutionHeight | where area > 2000000`
returned 0 rows instead of the matching row.
- Calcite Enumerable engine: throws `ArithmeticException: value out of
range`.
- v2 legacy engine: `ExprShortValue` narrows via `shortValue()`, wrapping.
Fix in the SQL-plugin lowering so all backends are corrected at once: widen
the operands (byte/short -> INTEGER, any int/long -> BIGINT) before applying
the operator. Casting the operands rather than only relabelling the result
type is required, otherwise DataFusion still computes the narrow multiply and
wraps before any outer cast. Non-integral operands (float/double/decimal/
datetime/mixed) are left untouched and defer to Calcite's default inference.
The string-concat `ADD` variant and the DATETIME-DATETIME `SUBTRACT` variant
are unchanged.
mvindex's internal array-index arithmetic now uses the raw Calcite
PLUS/MINUS operators so array indices stay INTEGER for ITEM/ARRAY_SLICE
codegen (the widened result would otherwise be rejected as a long index).
Note: this changes user-visible result column types for integer arithmetic
(int-operand expressions now report bigint), which is the intended trade-off
for overflow-safe, backend-consistent results.
Adds CalcitePPLBuiltinFunctionIT coverage for the short->int and int->bigint
widening tiers and updates the affected logical-plan / Spark-SQL snapshots.
Signed-off-by: Kai Huang <ahkcs@amazon.com>
82ed0c1 to
a2e7706
Compare
|
Persistent review updated to latest commit a2e7706 |
Description
Integer arithmetic (
+,-,*) on narrow integer types silently overflowed because the operators kept Calcite'sLEAST_RESTRICTIVEresult type (SMALLINT * SMALLINT -> SMALLINT,INTEGER * INTEGER -> INTEGER). For exampleeval area = ResolutionWidth * ResolutionHeight(bothshort) computed28,777,836but wrapped into the 16-bit range, sowhere area > 2000000returned 0 rows. The reported workaround wascast(... as long).This widens narrow integer operands before the operation so the result cannot overflow the (mis-inferred) narrow type:
byte/shortintint/longinvolvedlong(bigint)float/double/decimal/datetimeThe widening is a CAST injected on the operands in the SQL-plugin lowering (
PPLFuncImpTable), applied to+,-,*and the namedadd/subtract/multiplyfunctions. The string-concat+and theDATETIME - DATETIMEvariant are unchanged. Arithmetic inside higher-order-function lambda bodies (reduce/mvmap) is left unwidened — those operands are lambda-parameter placeholders resolved by the HOF's own type inference, and they execute JVM-side where operands already promote to int.Why widening (and not error-on-overflow)
An alternative is to detect overflow and error (Calcite's
CHECKED_PLUS/CHECKED_MINUS/CHECKED_MULTIPLY→Math.addExact, explored in #5604). Widening was chosen because it is the only approach that fixes both execution engines from a single change:PPLFuncImpTable(shared lowering, upstream of the fork), the widened operand types flow through to both — including through Substrait to DataFusion. Verified live:short * short(30000×30000) returnsint900000000on the Calcite path and on the DataFusion route (profile confirms*(CAST($n):INTEGER, ...)withbackends=[datafusion]); the reported| where area > 2000000now returns the matching row on both.Math.addExactin generated Java).CHECKED_*has no Substrait representation, so on the analytics-engine route the arithmetic executes natively in DataFusion and the checked rewrite never reaches it — leaving the overflow unfixed there. Making DataFusion error would require separate backend work (a custom Rust checked-arithmetic UDF). So erroring would need two independent implementations (Calcite checked-arith + a DataFusion UDF) and still diverge; widening needs one and is consistent.Widening fully resolves the reported
short/intoverflow on both engines. It does not addresslong * longoverflow (no wider primitive type) — that residual is out of scope here and, if needed, is a separate error-on-overflow effort.Verification
eval area = short * short(30000²)smallint, negative)int900000000eval area = short * short | where area > 2000000eval x = int * int(2e9²)bigint, correctCalcitePPLBuiltinFunctionITwidening ITs (new)CalcitePPLEvalTest/ArrayFunctionTest/DedupTest/etc. +UnifiedQueryPlannerSqlV2Test(plan snapshots)CalciteExplainIT/CalciteNoPushdownIT(explain fixtures)collection.mddoctest (reduce/mvmap unaffected)Related Issues
Resolves the "eval integer arithmetic overflow (no type widening)" bug. See #5164 / #5604 for the alternative error-on-overflow approach (Calcite-path only).
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.