Skip to content

feat: route aes_encrypt / aes_decrypt / try_aes_decrypt through codegen dispatcher#4557

Open
andygrove wants to merge 8 commits into
apache:mainfrom
andygrove:test-aes-coverage
Open

feat: route aes_encrypt / aes_decrypt / try_aes_decrypt through codegen dispatcher#4557
andygrove wants to merge 8 commits into
apache:mainfrom
andygrove:test-aes-coverage

Conversation

@andygrove
Copy link
Copy Markdown
Member

@andygrove andygrove commented Jun 1, 2026

Which issue does this PR close?

Closes #4558.

Rationale for this change

The AES functions (aes_encrypt, aes_decrypt, try_aes_decrypt) previously fell back to Spark. They are RuntimeReplaceable and lower to a StaticInvoke of ExpressionImplUtils.aesEncrypt / aesDecrypt (and, for try_aes_decrypt, TryEval(StaticInvoke(aesDecrypt, ...))). Comet's StaticInvoke serde only allowlists a small set of target methods (readSidePadding, isLuhnNumber, URL encode / decode), so AES reported Static invoke expression: aesEncrypt is not supported.

This PR adds native execution by routing these expressions through Comet's existing JVM codegen dispatcher (gated by spark.comet.exec.scalaUDF.codegen.enabled, default true). The dispatcher compiles Spark's own doGenCode into a per-batch Janino kernel, so behavior matches Spark exactly across versions without reimplementing AES in Rust.

What changes are included in this PR?

SQL fixtures (spark/src/test/resources/sql-tests/expressions/misc/):

  • aes.sql — GCM (default), ECB; 16/24/32-byte keys; column and literal arguments; NULL inputs.
  • aes_cbc.sql — CBC round-trip (Spark 3.5+, gated with MinSparkVersion: 3.5).
  • aes_try_decrypt.sqltry_aes_decrypt on invalid ciphertext (Spark 3.5+, gated).

aes_encrypt uses a random IV for GCM/CBC (nondeterministic output), so those cases test via round-trip (aes_decrypt(aes_encrypt(x, k), k) = x); ECB output is deterministic and compared directly. All fixtures use the default query mode (full result + Comet-coverage check).

Serde (spark/src/main/scala/org/apache/comet/serde/):

  • statics.scala: extend staticInvokeExpressions with aesEncrypt / aesDecrypt, both routed through a new CometStaticInvokeCodegenDispatch (a CometCodegenDispatch[StaticInvoke]). Define CometTryEval extends CometCodegenDispatch[TryEval] to handle try_aes_decrypt (and any other TryEval-wrapped expression, e.g. try_to_binary, try_add for non-numeric inputs).
  • QueryPlanSerde.scala: register classOf[TryEval] -> CometTryEval in miscExpressions.

Kernel fix (spark/src/main/scala/org/apache/comet/codegen/CometBatchKernelCodegen.scala):

Adding TryEval exposed a latent bug. In Spark 4.x, StaticInvoke is NullIntolerant when propagateNull=true, so allNullIntolerant(TryEval(StaticInvoke(...))) returns true. The kernel's null-intolerant short-circuit then skips the ev.isNull check and feeds a null ev.value to the binary writer (NPE on .length). TryEval is NullIntolerant for input nulls (null in -> null out) but can produce null on non-null inputs when its child throws, violating the short-circuit's "non-null in -> non-null out" invariant. Reject TryEval in allNullIntolerant so the default body re-checks ev.isNull. This was masked on Spark 3.5 (where StaticInvoke is not NullIntolerant) but visible on 4.0 with try_aes_decrypt(garbage, key).

Docs:

docs/source/user-guide/latest/expressions.md: aes_encrypt, aes_decrypt, try_aes_decrypt flipped from 🔜 (planned) to ✅ (supported), with notes pointing at the codegen dispatcher.

How are these changes tested?

Verified across Spark 3.4, 3.5, 4.0:

./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite aes" -Dtest=none
JAVA_HOME=$(/usr/libexec/java_home -v 17) ./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite aes" -Dtest=none -Pspark-4.0

All three SQL files pass on every supported Spark version. Existing CometCodegenSuite, CometCodegenFuzzSuite, all try_* SQL tests, and all misc/ SQL tests continue to pass on Spark 3.5 and 4.0.

@andygrove andygrove marked this pull request as draft June 2, 2026 14:08
andygrove added 2 commits June 2, 2026 14:41
…en dispatcher

The three AES functions are RuntimeReplaceable expressions that lower to
StaticInvoke(ExpressionImplUtils.aesEncrypt/aesDecrypt) and (for
try_aes_decrypt) TryEval(StaticInvoke(...)). Comet's StaticInvoke serde
allowlists target methods, so AES previously fell back to Spark with
"Static invoke expression: aesEncrypt is not supported".

Add codegen dispatch handlers:
- staticInvokeExpressions allowlist gains aesEncrypt and aesDecrypt,
  routed via CometStaticInvokeCodegenDispatch
- TryEval gets a top-level CometTryEval handler so try_aes_decrypt and
  any other TryEval-wrapped expression compile through the dispatcher

Fix a latent kernel bug exposed by TryEval. Spark 4.x marks StaticInvoke
as nullIntolerant when propagateNull=true, which makes
allNullIntolerant(TryEval(StaticInvoke(...))) return true. The kernel's
short-circuit then skips the ev.isNull check and feeds a null ev.value
to the binary writer (NPE on .length). TryEval is NullIntolerant for
input nulls but can produce null on non-null inputs when its child
throws, breaking the short-circuit's invariant. Reject TryEval in
allNullIntolerant so the default body re-checks ev.isNull.

Promote the AES SQL fixtures from `query spark_answer_only` to default
`query` mode now that they execute inside Comet, and mark the three
functions as supported in expressions.md.
@andygrove andygrove changed the title test: add SQL file tests for aes_encrypt, aes_decrypt, try_aes_decrypt feat: route aes_encrypt / aes_decrypt / try_aes_decrypt through codegen dispatcher Jun 2, 2026
Match the existing `CometCodegenDispatch[T]` one-liner style in
datetime.scala (`CometAddMonths`, `CometGetTimestamp`, etc.), which
carry no scaladoc. The fact that these route through the codegen
dispatcher is already documented on `CometCodegenDispatch` itself.
@andygrove andygrove marked this pull request as ready for review June 2, 2026 21:35
The longer "Notes" column from the AES rows widened the table; prettier
wants the column padding minimized to the now-longest cell.
private def allNullIntolerant(expr: Expression): Boolean =
!expr.exists {
case _: BoundReference | _: Literal => false
case _: TryEval => true
Copy link
Copy Markdown
Contributor

@mbutrovich mbutrovich Jun 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and #4579 are fixing the same latent bug in the codegen dispatcher's NullIntolerant short-circuit. Both expose the case where an expression reports nullIntolerant=true but can still emit null on non-null inputs: TryEval here via the caught exception, MakeTimestamp(failOnError=false) in #4579 via the caught DateTimeException. In both the short-circuit's else branch wrote ev.value without re-checking ev.isNull.

I just approved #4579. Its fix is the general one: it keeps the null-input fast skip and re-adds the post-eval if (ev.isNull) setNull else write guard for any nullable NullIntolerant root, so it already covers try_aes_decrypt (the TryEval root is nullable, and TryEval.doGenCode sets ev.isNull=true on the caught exception, so the guarded write skips the binary writer and the NPE never happens).

Once #4579 lands, the allNullIntolerant-rejects-TryEval change in CometBatchKernelCodegen.scala here is redundant. Worse, it pushes TryEval trees out of the now-correct short-circuit onto the default path, which loses the null-input fast skip for no correctness benefit. Could you rebase on #4579 and drop the CometBatchKernelCodegen.scala edit, keeping the serde additions (statics.scala, QueryPlanSerde.scala) and the SQL fixtures? That keeps the two PRs from conflicting on CometBatchKernelCodegen.scala.

One thing I checked: the only AES path that produces null on non-null input is the TryEval one, which is nullable, so #4579's guard engages. The non-try aes_decrypt / aes_encrypt throw on bad input rather than returning null, matching Spark, so the non-nullable direct-write path stays correct for those.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support aes_encrypt / aes_decrypt / try_aes_decrypt natively

2 participants