Add support for Equality Deletes on DeleteFileIndex #3285
@rambleraptor I think we should add a regression test for schema evolution here. This pruning path assumes that the current table type for an equality field is the same type that was used when the data file and equality delete were written, which is not always true after a legal type promotion. For reference, Iceberg Java had to address the same schema-evolution issue in apache/iceberg#15268, where the fix was to avoid assuming the current schema is always the right one for equality-delete field resolution.
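To make the concern concrete, here is a minimal sketch of the failure mode such a test could guard against. It is plain Python, not PyIceberg's API: `decode_bound` is a hypothetical helper, and the int-to-long promotion is only an assumed example of a legal promotion. It relies on Iceberg's little-endian encoding of integer bounds (4 bytes for int, 8 bytes for long). Whether the pruning path in this PR fails in exactly this way is not something the sketch demonstrates.

```python
import struct


def decode_bound(raw: bytes, type_name: str) -> int:
    """Hypothetical helper: decode a little-endian integer bound for a type name."""
    fmt = {"int": "<i", "long": "<q"}[type_name]
    return struct.unpack(fmt, raw)[0]


# Bound written while the column was still an int (4 bytes, little-endian).
written_bound = struct.pack("<i", 42)

# Resolving the field with the type in effect at write time works.
value_with_write_type = decode_bound(written_bound, "int")

# Resolving with the current (promoted) table type expects 8 bytes and fails.
try:
    decode_bound(written_bound, "long")
    decoded_with_current_type = True
except struct.error:
    decoded_with_current_type = False
```

A regression test along these lines would write the data file and the equality delete under the old schema, promote the column, and then check that planning still matches the delete to the data file.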

@geruh @kevinjqliu @Fokko please take a look when you can!

I've successfully tried this out with Flink (thanks @Fokko for the tip!) and it's working as I expect it to. Is it worth checking in the files created by Flink?
geruh left a comment
Nice, thanks for opening @rambleraptor!!! Left some comments below.
Also, +1 to adding Flink testing, and I believe there were talks about this being added to the TCK! While working on #2255, we tested all delete file combinations with Flink.
78800f7 to 3a7413a

@geruh thanks for the review! Could not agree more on the Flink testing! I'll leave that for a follow-up PR if that's alright, since we haven't stood up Flink testing yet. I don't want to pollute this PR too much.

@geruh thanks so much for your review! ptal
7c3dac2 to 7ed36b5
Part of #3270
Rationale for this change
This adds support for getting equality deletes in the DeleteFileIndex.
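As a rough illustration of the lookup an index like this has to answer, here is a minimal sketch. It is plain Python with hypothetical names (`EqDelete`, `EqualityDeleteIndex`), not the code in this PR. It ignores partitions and equality field IDs and only models the Iceberg spec's sequence-number rule: an equality delete applies to a data file whose data sequence number is strictly less than the delete's.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EqDelete:
    """Hypothetical stand-in for an equality delete file entry."""
    path: str
    sequence_number: int


@dataclass
class EqualityDeleteIndex:
    """Hypothetical index of equality deletes, kept sorted by sequence number."""
    _deletes: list = field(default_factory=list)

    def add(self, delete: EqDelete) -> None:
        self._deletes.append(delete)
        self._deletes.sort(key=lambda d: d.sequence_number)

    def for_data_file(self, data_sequence_number: int) -> list:
        # Keep only deletes whose sequence number is strictly greater
        # than the data file's data sequence number.
        seqs = [d.sequence_number for d in self._deletes]
        start = bisect_right(seqs, data_sequence_number)
        return self._deletes[start:]


index = EqualityDeleteIndex()
index.add(EqDelete("eq-delete-b.parquet", 5))
index.add(EqDelete("eq-delete-a.parquet", 2))

# A data file written at sequence number 2 is only affected by the later delete.
applicable = [d.path for d in index.for_data_file(2)]
```

Position deletes differ here: they also apply to data files with an equal sequence number, which is one reason the two kinds are tracked separately.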
I'm very purposefully ignoring them in _read_all_delete_files because they will crash.
Are these changes tested?
I made some equality deletes by hand and had PyIceberg read them to see the indexes. It worked as expected. If you know a way to create equality deletes, I can test those as well.
Are there any user-facing changes?