Skip to content

fix: codegen dispatcher returns NULL for invalid try_make_timestamp inputs (#4554)#4579

Open
andygrove wants to merge 2 commits into
apache:mainfrom
andygrove:fix/issue-4554
Open

fix: codegen dispatcher returns NULL for invalid try_make_timestamp inputs (#4554)#4579
andygrove wants to merge 2 commits into
apache:mainfrom
andygrove:fix/issue-4554

Conversation

@andygrove
Copy link
Copy Markdown
Member

Which issue does this PR close?

Closes #4554.

Rationale for this change

try_make_timestamp rewrites to MakeTimestamp(failOnError = false), which Spark's doGenCode implements with a try { ... } catch (DateTimeException e) { ev.isNull = true; } block. With Comet's codegen dispatcher routing MakeTimestamp through Spark's own codegen, the kernel was honoring the input-null short-circuit but skipping the post-eval ev.isNull guard, so invalid date/time components (month=13, day=32, hour=25, ...) returned stale ev.value bytes instead of NULL.

The dispatcher's defaultBody conflated NullIntolerant's "null in → null out" guarantee with the converse "non-null in → non-null out", which MakeTimestamp(failOnError=false) does not satisfy. The same path also affects non-ANSI make_timestamp directly, since that is the same expression with the same flag.

What changes are included in this PR?

  • CometBatchKernelCodegen.defaultBody: in the NullIntolerant && allNullIntolerant branch, when the bound expression is nullable, wrap the post-ev.code write in if (ev.isNull) setNull else write. Non-nullable roots keep the direct write (the existing optimization).
  • CometCodegenSourceSuite: lock in the fix with a generated-source assertion that MakeTimestamp(failOnError=false) emits two setNull sites — input short-circuit + post-eval guard.
  • make_timestamp.sql: add column and literal queries with invalid components covering month/day/hour/minute out-of-range. These ride the default failOnError=false path (ANSI off in CometSqlFileTestSuite).
  • try_make_timestamp.sql (new, Spark 4.0+): direct coverage for try_make_timestamp with both column and literal invalid inputs, plus a sentinel valid query so the dispatcher must run natively.

How are these changes tested?

  • CometCodegenSourceSuite (Spark 3.5): 51/51 pass; the new test fails on the unmodified codegen and passes with the fix.
  • CometSqlFileTestSuite make_timestamp filter (Spark 3.5 and 4.0): all three fixtures pass; on 3.5 try_make_timestamp.sql is skipped via MinSparkVersion: 4.0.
  • All datetime SQL fixtures (Spark 3.5): 89/89 pass.
  • CometCodegenSuite, CometCodegenHOFSuite, CometTemporalExpressionSuite: pass.

…roots (apache#4554)

`MakeTimestamp(failOnError=false)` is `NullIntolerant=true` but its `doGenCode`
catches `DateTimeException` for invalid year/month/day/hour/min/sec components
and sets `ev.isNull = true`. The codegen dispatcher's NullIntolerant short-
circuit conflated "null in -> null out" with "non-null in -> non-null out" and
dropped the post-eval guard, so invalid rows received stale `ev.value` bytes
instead of NULL. `try_make_timestamp` (which rewrites to MakeTimestamp with
`failOnError=false`) and the non-ANSI MakeTimestamp path were both affected.

Wrap the post-`ev.code` write in `if (ev.isNull) setNull else write` whenever
the bound expression is nullable; non-nullable roots keep the direct write.
Copy link
Copy Markdown
Contributor

@mbutrovich mbutrovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix, @andygrove! I really like seeing these types of fixes to codegen since I'm sure it fixes more than just this one expression that you tried.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

try_make_timestamp returns wrong values for invalid inputs instead of NULL

2 participants