
[CLAUDE] [OPUS 4.7] feat(hive): register NAMED_STRUCT in parser so struct fields annotate #7561

Open
RichardHughes-amp wants to merge 4 commits into tobymao:main from RichardHughes-amp:named-struct-parser
@RichardHughes-amp RichardHughes-amp commented Apr 24, 2026

I'm fairly confident this is a genuine fix: we monkey-patched it into our SQLGlot deployment and it resolved the issue locally. I left the generator unchanged to keep the impact on the tests minimal and to avoid surprising other SQLGlot users, but it's worth considering generating NAMED_STRUCT rather than STRUCT when the underlying elements are named.

Everything above is from Richard.


Everything below is from Claude.

Problem

named_struct(name1, val1, ...) is a Hive built-in (LanguageManual UDF, since Hive 0.8.0 / HIVE-1360) and the standard Spark/Databricks way to build a named struct. It isn't in HiveParser.FUNCTIONS, so it falls through to exp.Anonymous. _annotate_struct only runs on exp.Struct, so the call stays UNKNOWN and any downstream Dot(struct, field) inherits that — even when every input type is known and the equivalent STRUCT(v AS k, ...) alias form annotates correctly today.

Repro (schema: src(id INT, n VARCHAR)):

```sql
WITH t AS (SELECT id, named_struct('n', n) AS s FROM src)
SELECT s.n FROM t -- before: UNKNOWN; after: VARCHAR
```

Fix

HiveParser.FUNCTIONS["NAMED_STRUCT"]: map named_struct('k', v, ...) to exp.Struct with PropertyEQ children, the same shape STRUCT(v AS k, ...) already produces. _annotate_struct then works unmodified.
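
The pairing step described above can be sketched in isolation. This is a minimal standalone model, not the code in this PR: `Field`, `Struct`, and `build_named_struct` are hypothetical stand-ins for `PropertyEQ`, `exp.Struct`, and the parser callback, and field names are assumed to arrive as plain strings.

```python
from dataclasses import dataclass
from typing import Any, Sequence, Tuple


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a PropertyEQ child: a name bound to a value."""
    name: str
    value: Any


@dataclass(frozen=True)
class Struct:
    """Hypothetical stand-in for exp.Struct: an ordered tuple of named fields."""
    fields: Tuple[Field, ...]


def build_named_struct(args: Sequence[Any]) -> Struct:
    """Pair up (name1, val1, name2, val2, ...) into named struct fields."""
    if len(args) % 2 != 0:
        raise ValueError("named_struct expects an even number of arguments")

    fields = []
    # Even positions hold field names, odd positions hold the matching values.
    for name, value in zip(args[0::2], args[1::2]):
        if not isinstance(name, str):
            raise TypeError("field names are assumed to be string literals")
        fields.append(Field(name, value))

    return Struct(tuple(fields))
```

Because the result has the same shape as the alias form, anything that already walks a struct's named children (type annotation in this PR's case) should need no special handling for the function-call spelling.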

Parser-only: no generator changes, no changes to emitted SQL for any existing input.

Tests

  • tests/dialects/test_spark.py (test_named_struct): confirms Spark/Databricks generate the canonical aliased form from the new AST.
  • tests/fixtures/optimizer/annotate_types.sql — three fixtures covering flat, multi-field, and nested named_struct.

Full suite: 1072 passed, 17644 subtests.

named_struct(name1, val1, ...) is a Hive built-in (since 0.8.0, HIVE-1360)
and the standard Spark/Databricks way to construct a struct with named
fields. It wasn't registered in HiveParser.FUNCTIONS, so it fell through
to exp.Anonymous and _annotate_struct never ran — leaving downstream
Dot(struct, field) access annotated as UNKNOWN even when every input
type was known.

Map named_struct(...) to exp.Struct with PropertyEQ children so it
annotates identically to the equivalent STRUCT(v AS k, ...) alias form.

Also fix Hive.struct_sql to emit NAMED_STRUCT('k', v, ...) when all
children are named. The prior branch warned 'Hive does not support
named structs' and dropped field names — that message conflated Hive
(which has the function) with the STRUCT(v AS k) alias syntax (which
is Spark-only). Three BigQuery→Hive test expectations are updated to
reflect the preserved names.
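
The generator rule in the commit message above (emit NAMED_STRUCT only when every child is named) can be illustrated with a small standalone sketch. It is not the actual Hive.struct_sql implementation: `render_hive_struct` is a hypothetical helper, values are assumed to be already-rendered SQL strings, and field names are assumed to need no escaping. Falling back to a positional STRUCT for mixed or unnamed children is also an assumption here.

```python
from typing import Optional, Sequence, Tuple


def render_hive_struct(fields: Sequence[Tuple[Optional[str], str]]) -> str:
    """Render (name or None, rendered SQL value) pairs as a Hive struct call."""
    all_named = bool(fields) and all(name is not None for name, _ in fields)

    if all_named:
        args = []
        for name, value in fields:
            # Names become string literals, each followed by its value.
            args.extend([f"'{name}'", value])
        return f"NAMED_STRUCT({', '.join(args)})"

    # Assumption: with any unnamed child, fall back to a positional STRUCT,
    # which cannot carry field names.
    return f"STRUCT({', '.join(value for _, value in fields)})"
```

For example, `render_hive_struct([("k", "v")])` yields `NAMED_STRUCT('k', v)`, while `render_hive_struct([(None, "1"), (None, "2")])` yields `STRUCT(1, 2)`.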
@RichardHughes-amp RichardHughes-amp marked this pull request as draft April 24, 2026 18:37
@RichardHughes-amp RichardHughes-amp changed the title from "feat(hive): register NAMED_STRUCT in parser so struct fields annotate" to "feat(hive): register NAMED_STRUCT in parser so struct fields annotate [CLAUDE]" Apr 24, 2026
@RichardHughes-amp RichardHughes-amp changed the title from "feat(hive): register NAMED_STRUCT in parser so struct fields annotate [CLAUDE]" to "[CLAUDE] [OPUS 4.7] feat(hive): register NAMED_STRUCT in parser so struct fields annotate" Apr 24, 2026
@RichardHughes-amp RichardHughes-amp marked this pull request as ready for review April 24, 2026 19:07