[CLAUDE] [OPUS 4.7] feat(hive): register NAMED_STRUCT in parser so struct fields annotate #7561
Open
RichardHughes-amp wants to merge 4 commits into tobymao:main from
named_struct(name1, val1, ...) is a Hive built-in (since 0.8.0, HIVE-1360)
and the standard Spark/Databricks way to construct a struct with named
fields. It wasn't registered in HiveParser.FUNCTIONS, so it fell through
to exp.Anonymous and _annotate_struct never ran — leaving downstream
Dot(struct, field) access annotated as UNKNOWN even when every input
type was known.
Map named_struct(...) to exp.Struct with PropertyEQ children so it
annotates identically to the equivalent STRUCT(v AS k, ...) alias form.
Also fix Hive.struct_sql to emit NAMED_STRUCT('k', v, ...) when all
children are named. The prior branch warned 'Hive does not support
named structs' and dropped field names — that message conflated Hive
(which has the function) with the STRUCT(v AS k) alias syntax (which
is Spark-only). Three BigQuery→Hive test expectations are updated to
reflect the preserved names.
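The emission rule described above can be sketched roughly as follows. This is a standalone illustration, not SQLGlot's actual `struct_sql`: `Field` and `hive_struct_sql` are hypothetical names, values are assumed to be pre-rendered SQL strings, field names are assumed to contain no single quotes, and the positional `STRUCT(...)` fallback for partially named structs is an assumption modelled on the prior name-dropping behaviour.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for one struct child (not a SQLGlot class)."""
    name: Optional[str]  # None when the child is unnamed
    value_sql: str       # value already rendered as SQL text


def hive_struct_sql(fields: Sequence[Field]) -> str:
    """Render a struct for Hive, keeping names only when every child has one."""
    if fields and all(f.name is not None for f in fields):
        # Hive spells named structs as NAMED_STRUCT('k1', v1, 'k2', v2, ...).
        args = []
        for f in fields:
            args.extend([f"'{f.name}'", f.value_sql])
        return f"NAMED_STRUCT({', '.join(args)})"
    # Assumed fallback: a positional STRUCT(...) call, which cannot carry names.
    return f"STRUCT({', '.join(f.value_sql for f in fields)})"


print(hive_struct_sql([Field("a", "1"), Field("b", "x")]))
# NAMED_STRUCT('a', 1, 'b', x)
```

The all-or-nothing check reflects the "when all children are named" condition: a mix of named and unnamed children has no direct NAMED_STRUCT spelling, so the sketch falls back rather than inventing names.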
I'm relatively certain this is an actual fix; we monkey-patched this into our SQLGlot implementation and it resolved issues locally. I made no changes to the generator, to keep the impact on the tests minimal and to avoid surprising other users of SQLGlot, but it's worthwhile to consider generating NAMED_STRUCTs rather than STRUCTs when the underlying elements are named.
Everything above is from Richard. Everything below is from Claude.
Problem
named_struct(name1, val1, ...) is a Hive built-in (LanguageManual UDF, since Hive 0.8.0 / HIVE-1360) and the standard Spark/Databricks way to build a named struct. It isn't in HiveParser.FUNCTIONS, so it falls through to exp.Anonymous. _annotate_struct only runs on exp.Struct, so the call stays UNKNOWN and any downstream Dot(struct, field) inherits that, even when every input type is known and the equivalent STRUCT(v AS k, ...) alias form annotates correctly today.
Repro (schema: src(id INT, n VARCHAR)):
```sql
WITH t AS (SELECT id, named_struct('n', n) AS s FROM src)
SELECT s.n FROM t -- before: UNKNOWN; after: VARCHAR
```
Fix
HiveParser.FUNCTIONS["NAMED_STRUCT"]: map named_struct('k', v, ...) to exp.Struct with PropertyEQ children, the same shape STRUCT(v AS k, ...) already produces. _annotate_struct then works unmodified.
Parser-only: no generator changes, no changes to emitted SQL for any existing input.
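As a rough, library-free sketch of the reshaping described here: the flat argument list `k1, v1, k2, v2, ...` is paired into named children, after which a field lookup can recover the value's type. `NamedField`, `pair_named_struct_args`, and `field_type` are hypothetical stand-ins, not SQLGlot's `exp.Struct`, `exp.PropertyEQ`, or `_annotate_struct`, and types are modelled as plain strings.

```python
from dataclasses import dataclass
from typing import List, Sequence

UNKNOWN = "UNKNOWN"


@dataclass(frozen=True)
class NamedField:
    """Hypothetical analogue of a name = value child inside a struct node."""
    name: str
    value_type: str


def pair_named_struct_args(flat_args: Sequence[str]) -> List[NamedField]:
    """Pair alternating (name, value type) arguments into named fields."""
    if len(flat_args) % 2 != 0:
        raise ValueError("named_struct expects an even number of arguments")
    names = flat_args[0::2]        # even positions: field names
    value_types = flat_args[1::2]  # odd positions: already-known value types
    return [NamedField(n, t) for n, t in zip(names, value_types)]


def field_type(fields: Sequence[NamedField], name: str) -> str:
    """Resolve struct.field access; a missing field stays UNKNOWN."""
    for f in fields:
        if f.name == name:
            return f.value_type
    return UNKNOWN


# Mirrors the repro: named_struct('n', n), where column n is VARCHAR.
s = pair_named_struct_args(["n", "VARCHAR"])
print(field_type(s, "n"))
# VARCHAR
```

Once the arguments are in this paired shape, field resolution needs no knowledge of how the struct was written, which is why producing the same shape as the alias form lets existing annotation logic apply unchanged.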
Tests
tests/dialects/test_spark.py, test_named_struct: confirms Spark/Databricks generate the canonical aliased form from the new AST.
tests/fixtures/optimizer/annotate_types.sql: three fixtures covering flat, multi-field, and nested named_struct.
Full suite: 1072 passed, 17644 subtests.