feat: route aes_encrypt / aes_decrypt / try_aes_decrypt through codegen dispatcher#4557
feat: route aes_encrypt / aes_decrypt / try_aes_decrypt through codegen dispatcher#4557andygrove wants to merge 8 commits into
Conversation
…en dispatcher The three AES functions are RuntimeReplaceable expressions that lower to StaticInvoke(ExpressionImplUtils.aesEncrypt/aesDecrypt) and (for try_aes_decrypt) TryEval(StaticInvoke(...)). Comet's StaticInvoke serde allowlists target methods, so AES previously fell back to Spark with "Static invoke expression: aesEncrypt is not supported". Add codegen dispatch handlers: - staticInvokeExpressions allowlist gains aesEncrypt and aesDecrypt, routed via CometStaticInvokeCodegenDispatch - TryEval gets a top-level CometTryEval handler so try_aes_decrypt and any other TryEval-wrapped expression compile through the dispatcher Fix a latent kernel bug exposed by TryEval. Spark 4.x marks StaticInvoke as nullIntolerant when propagateNull=true, which makes allNullIntolerant(TryEval(StaticInvoke(...))) return true. The kernel's short-circuit then skips the ev.isNull check and feeds a null ev.value to the binary writer (NPE on .length). TryEval is NullIntolerant for input nulls but can produce null on non-null inputs when its child throws, breaking the short-circuit's invariant. Reject TryEval in allNullIntolerant so the default body re-checks ev.isNull. Promote the AES SQL fixtures from `query spark_answer_only` to default `query` mode now that they execute inside Comet, and mark the three functions as supported in expressions.md.
Match the existing `CometCodegenDispatch[T]` one-liner style in datetime.scala (`CometAddMonths`, `CometGetTimestamp`, etc.), which carry no scaladoc. The fact that these route through the codegen dispatcher is already documented on `CometCodegenDispatch` itself.
The longer "Notes" column from the AES rows widened the table; prettier wants the column padding minimized to the now-longest cell.
| private def allNullIntolerant(expr: Expression): Boolean = | ||
| !expr.exists { | ||
| case _: BoundReference | _: Literal => false | ||
| case _: TryEval => true |
There was a problem hiding this comment.
This and #4579 are fixing the same latent bug in the codegen dispatcher's NullIntolerant short-circuit. Both expose the case where an expression reports nullIntolerant=true but can still emit null on non-null inputs: TryEval here via the caught exception, MakeTimestamp(failOnError=false) in #4579 via the caught DateTimeException. In both the short-circuit's else branch wrote ev.value without re-checking ev.isNull.
I just approved #4579. Its fix is the general one: it keeps the null-input fast skip and re-adds the post-eval if (ev.isNull) setNull else write guard for any nullable NullIntolerant root, so it already covers try_aes_decrypt (the TryEval root is nullable, and TryEval.doGenCode sets ev.isNull=true on the caught exception, so the guarded write skips the binary writer and the NPE never happens).
Once #4579 lands, the allNullIntolerant-rejects-TryEval change in CometBatchKernelCodegen.scala here is redundant. Worse, it pushes TryEval trees out of the now-correct short-circuit onto the default path, which loses the null-input fast skip for no correctness benefit. Could you rebase on #4579 and drop the CometBatchKernelCodegen.scala edit, keeping the serde additions (statics.scala, QueryPlanSerde.scala) and the SQL fixtures? That keeps the two PRs from conflicting on CometBatchKernelCodegen.scala.
One thing I checked: the only AES path that produces null on non-null input is the TryEval one, which is nullable, so #4579's guard engages. The non-try aes_decrypt / aes_encrypt throw on bad input rather than returning null, matching Spark, so the non-nullable direct-write path stays correct for those.
Which issue does this PR close?
Closes #4558.
Rationale for this change
The AES functions (
aes_encrypt,aes_decrypt,try_aes_decrypt) previously fell back to Spark. They areRuntimeReplaceableand lower to aStaticInvokeofExpressionImplUtils.aesEncrypt/aesDecrypt(and, fortry_aes_decrypt,TryEval(StaticInvoke(aesDecrypt, ...))). Comet'sStaticInvokeserde only allowlists a small set of target methods (readSidePadding,isLuhnNumber, URLencode/decode), so AES reportedStatic invoke expression: aesEncrypt is not supported.This PR adds native execution by routing these expressions through Comet's existing JVM codegen dispatcher (gated by
spark.comet.exec.scalaUDF.codegen.enabled, defaulttrue). The dispatcher compiles Spark's owndoGenCodeinto a per-batch Janino kernel, so behavior matches Spark exactly across versions without reimplementing AES in Rust.What changes are included in this PR?
SQL fixtures (
spark/src/test/resources/sql-tests/expressions/misc/):aes.sql— GCM (default), ECB; 16/24/32-byte keys; column and literal arguments; NULL inputs.aes_cbc.sql— CBC round-trip (Spark 3.5+, gated withMinSparkVersion: 3.5).aes_try_decrypt.sql—try_aes_decrypton invalid ciphertext (Spark 3.5+, gated).aes_encryptuses a random IV for GCM/CBC (nondeterministic output), so those cases test via round-trip (aes_decrypt(aes_encrypt(x, k), k) = x); ECB output is deterministic and compared directly. All fixtures use the defaultquerymode (full result + Comet-coverage check).Serde (
spark/src/main/scala/org/apache/comet/serde/):statics.scala: extendstaticInvokeExpressionswithaesEncrypt/aesDecrypt, both routed through a newCometStaticInvokeCodegenDispatch(aCometCodegenDispatch[StaticInvoke]). DefineCometTryEval extends CometCodegenDispatch[TryEval]to handletry_aes_decrypt(and any otherTryEval-wrapped expression, e.g.try_to_binary,try_addfor non-numeric inputs).QueryPlanSerde.scala: registerclassOf[TryEval] -> CometTryEvalinmiscExpressions.Kernel fix (
spark/src/main/scala/org/apache/comet/codegen/CometBatchKernelCodegen.scala):Adding
TryEvalexposed a latent bug. In Spark 4.x,StaticInvokeisNullIntolerantwhenpropagateNull=true, soallNullIntolerant(TryEval(StaticInvoke(...)))returnstrue. The kernel's null-intolerant short-circuit then skips theev.isNullcheck and feeds a nullev.valueto the binary writer (NPE on.length).TryEvalisNullIntolerantfor input nulls (null in -> null out) but can produce null on non-null inputs when its child throws, violating the short-circuit's "non-null in -> non-null out" invariant. RejectTryEvalinallNullIntolerantso the default body re-checksev.isNull. This was masked on Spark 3.5 (whereStaticInvokeis notNullIntolerant) but visible on 4.0 withtry_aes_decrypt(garbage, key).Docs:
docs/source/user-guide/latest/expressions.md:aes_encrypt,aes_decrypt,try_aes_decryptflipped from 🔜 (planned) to ✅ (supported), with notes pointing at the codegen dispatcher.How are these changes tested?
Verified across Spark 3.4, 3.5, 4.0:
All three SQL files pass on every supported Spark version. Existing
CometCodegenSuite,CometCodegenFuzzSuite, alltry_*SQL tests, and allmisc/SQL tests continue to pass on Spark 3.5 and 4.0.