fix: Remove catalog access from SparkSQLWriter #14083
Merged
nsivabalan merged 4 commits on Oct 17, 2025
Contributor
Can you link the offending commit, @linliu-code?
Contributor
Hey @linliu-code: can you follow up on the test failures on this?
56cab12 to 7aafc3e
linliu-code
commented
Oct 14, 2025
Contributor
Hey @linliu-code: did you get to triage the test failures?
Contributor
Hey @linliu-code: any leads on the test failures?
3c45a03 to ebdc1a3
ebdc1a3 to c21b437
Contributor
I am not yet convinced that we should contact the catalog even for spark-sql writes. If not, why can't we remove it completely? In other words, within deduceWriterSchema, why can't we handle empty tables, i.e. no schema for
nsivabalan
approved these changes
Oct 17, 2025
yihua
pushed a commit
that referenced
this pull request
Oct 19, 2025
Describe the issue this Pull Request addresses
#14081
HoodieSparkSqlWriter would access the enabled catalog during schema resolution in Spark Datasource operations when it could not get the schema from the commit metadata, the table config, or the data files. This behavior can cause confusion, since a Spark Datasource operation may accidentally access the catalog and pick up the schema of a table with the same name, which may be an unrelated table.
Summary and Changelog
We remove the catalog access from the writer and instead pass the schema in from the SQL command. As a result, Spark Datasource operations can no longer access the catalog.
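To make the intended behavior concrete, here is a minimal, language-neutral sketch in Python. It is not the Hudi implementation: `resolve_schema`, `schema_from_sql`, and the `Schema` alias are hypothetical names, and placing the SQL-supplied schema last in the lookup order is an assumption made for illustration. The point is only that the resolver never looks anything up by table name; the only schema it can fall back to is one the caller hands in.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for a table schema: column name -> type name.
Schema = Dict[str, str]


def resolve_schema(
    sources: List[Callable[[], Optional[Schema]]],
    schema_from_sql: Optional[Schema] = None,
) -> Optional[Schema]:
    """Return the first schema found in `sources`, else the caller-supplied one.

    `sources` models the lookups named in the description (commit metadata,
    table config, data files). There is deliberately no catalog lookup here.
    """
    for read_schema in sources:
        schema = read_schema()
        if schema is not None:
            return schema
    # Assumed fallback: only a schema passed explicitly by the SQL command.
    return schema_from_sql


def nothing() -> Optional[Schema]:
    """Models a source that has no schema, e.g. a brand-new empty table."""
    return None


# Datasource write to an empty table: nothing is found, nothing is guessed.
datasource_result = resolve_schema([nothing, nothing, nothing])

# SQL write: the command passes the schema it already knows.
sql_schema = {"id": "int", "name": "string"}
sql_result = resolve_schema([nothing, nothing, nothing], schema_from_sql=sql_schema)
```

In this sketch a Datasource write to an empty table resolves to no schema at all, instead of silently picking up the schema of a same-named catalog table.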
Impact
Removes unexpected behavior in the Spark writer: Spark Datasource writes no longer consult the catalog during schema resolution.
Risk Level
Medium.
Documentation Update
Contributor's checklist