fix: honor ANSI mode for make_date/next_day and stop next_day trimming#4566
Open
andygrove wants to merge 1 commit into
Open
fix: honor ANSI mode for make_date/next_day and stop next_day trimming#4566andygrove wants to merge 1 commit into
andygrove wants to merge 1 commit into
Conversation
make_date and next_day previously returned NULL for invalid input even when spark.sql.ansi.enabled=true, and next_day trimmed whitespace from the dayOfWeek argument, all of which diverge from Spark. - make_date: SparkMakeDate now carries the ANSI flag (fail_on_error) and throws a java.time-style message (MonthOfYear / DayOfMonth / Invalid date) on invalid input under ANSI, matching the message Spark surfaces via ansiDateTimeArgumentOutOfRange / ansiDateTimeError. - next_day: replace the upstream datafusion-spark::SparkNextDay (which trims dayOfWeek and never throws) with a Comet-native implementation that matches Spark's DateTimeUtils.getDayOfWeekFromString exactly: no whitespace trimming, and throws "Illegal input for day of week" under ANSI while returning NULL otherwise. The resolved ANSI flag is read from MakeDate.failOnError / NextDay.failOnError in the Scala serde and passed to native through the existing ScalarFunc.fail_on_error proto field. Closes apache#4451 Closes apache#4450 Closes apache#4449
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Closes #4451
Closes #4450
Closes #4449
Rationale for this change
These three correctness bugs were surfaced by the date/time audit (#4448). All three cause Comet to diverge from Spark for
make_dateandnext_day:make_date(year, month, day)returnsNULLfor an invalid triple even whenspark.sql.ansi.enabled=true, where Spark throws viaansiDateTimeArgumentOutOfRange(4.0) /ansiDateTimeError(3.4/3.5).next_day(date, dayOfWeek)trims leading/trailing whitespace fromdayOfWeekbefore matching. Spark'sDateTimeUtils.getDayOfWeekFromStringdoes not trim, so values like' MO 'succeed in Comet but are invalid in Spark.next_day(date, dayOfWeek)returnsNULLfor an unrecogniseddayOfWeekeven under ANSI, where Spark throwsSparkIllegalArgumentException(3.5+) /IllegalArgumentException(3.4).What changes are included in this PR?
make_date:SparkMakeDatenow carries the ANSI flag (fail_on_error). On invalid input under ANSI it throws ajava.time-style message (MonthOfYear/DayOfMonthrange, orInvalid date '...'), reproducing the text Spark wraps inDATETIME_FIELD_OUT_OF_BOUNDS/_LEGACY_ERROR_TEMP_2000. Non-ANSI behaviour (returnNULL) is unchanged.next_day: replaced the upstreamdatafusion-spark::SparkNextDay(which trimsdayOfWeekand never throws) with a Comet-native implementation that matchesDateTimeUtils.getDayOfWeekFromStringexactly: no whitespace trimming, and throwsIllegal input for day of weekunder ANSI while returningNULLotherwise.MakeDate.failOnError/NextDay.failOnErrorin the Scala serde and passed to native through the existingScalarFunc.fail_on_errorproto field. No new proto fields or native shim traits were needed.How are these changes tested?
next_day(no-trim day-of-week parsing, next-date math).make_date_ansi.sqlandnext_day_ansi.sqlassert the ANSI throw paths viaexpect_error, each with a non-error sentinel query so the assertions cannot pass vacuously through a Spark fallback;next_day.sqlgains a non-ANSI whitespace regression case. Verified passing on Spark 3.5 and Spark 4.1.Note: #4448 (still open) adds these same fixtures as
ignore(...)reproducers; if that merges first this PR will need a trivial rebase to drop the duplicated files.