Aquileo | fix: Remove catalog access from SparkSQLWriter by linliu-code · Pull Request #14083 · apache/hudi

linliu-code · 2025-10-13T17:57:00Z

Describe the issue this Pull Request addresses

During schema resolution in Spark Datasource operations, HoodieSparkSqlWriter falls back to the enabled catalog when it cannot get the schema from the commit metadata, table config, or data files.

This behavior can be confusing: a Spark Datasource operation may accidentally access the catalog and pick up the schema of a table that has the same name but is unrelated.

Summary and Changelog

We remove the catalog access from the writer and pass the schema in from the SQL command instead. As a result, Spark Datasource operations no longer access the catalog.
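To illustrate the change, here is a minimal, Spark-free sketch of the resolution order described above. It is not the Hudi implementation: every function name and the dict-based `Schema` stand-in are hypothetical, and treating the SQL-supplied schema as the last fallback (in the slot the catalog lookup used to occupy) is an assumption.

```python
from typing import Callable, Optional, Sequence

Schema = dict  # hypothetical stand-in for a real table schema
SchemaSource = Callable[[], Optional[Schema]]


def resolve_table_schema(sources: Sequence[SchemaSource]) -> Optional[Schema]:
    """Return the first schema that any source yields, or None if all are empty."""
    for source in sources:
        schema = source()
        if schema is not None:
            return schema
    return None


# Hypothetical sources for a table that has no commits or data files yet.
def from_commit_metadata() -> Optional[Schema]:
    return None


def from_table_config() -> Optional[Schema]:
    return None


def from_data_files() -> Optional[Schema]:
    return None


# A lookup by table name can return the schema of an unrelated table.
def from_catalog() -> Optional[Schema]:
    return {"table": "unrelated_table", "fields": ["x"]}


# Before: the writer could fall through to the catalog on any write path.
schema_before = resolve_table_schema(
    [from_commit_metadata, from_table_config, from_data_files, from_catalog]
)


# After (assumed shape): the writer never consults the catalog. A SQL command
# may hand its schema to the writer; a datasource write passes nothing.
def writer_table_schema(schema_from_sql: Optional[Schema] = None) -> Optional[Schema]:
    resolved = resolve_table_schema(
        [from_commit_metadata, from_table_config, from_data_files]
    )
    return resolved if resolved is not None else schema_from_sql
```

In this sketch, a datasource write against an empty table gets no table schema at all instead of one borrowed from a same-named catalog entry, while a SQL write still sees the schema its command supplied.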

Impact

Removes an unexpected behavior of the Spark writer.

Risk Level

Medium.

Documentation Update

Contributor's checklist

Read through contributor's guide
Enough context is provided in the sections above
Adequate tests were added if applicable

nsivabalan · 2025-10-13T19:04:53Z

can you link the offending commit @linliu-code

nsivabalan · 2025-10-14T15:59:28Z

hey @linliu-code : can you follow up on test failures on this

nsivabalan · 2025-10-15T03:48:36Z

hey @linliu-code : did you get to triage the test failures?

nsivabalan · 2025-10-15T14:31:45Z

hey @linliu-code : any leads on test failures.

hudi-bot · 2025-10-16T11:51:30Z

CI report:

d9aa891 Azure: SUCCESS

Bot commands

@hudi-bot supports the following commands:

@hudi-bot run azure: re-run the last Azure build

nsivabalan · 2025-10-16T14:59:28Z

I am not convinced yet that we should contact the catalog even for spark-sql writes.
Can you get to the bottom of the changes done in https://github.com/apache/hudi/pull/6358/files#r2425086622
and understand what we were doing prior to that patch,
and why we had to poll the catalog? E.g., prior to Alexey's patch, was INSERT_INTO polling the catalog for the schema?

If not, why can't we remove it completely?

In other words, within deduceWriterSchema, why can't we handle empty tables, i.e. no schema for latestTableSchema?

nsivabalan

LGTM

github-actions bot added the size:S label (PR with lines of changes in (10, 100]) Oct 13, 2025

linliu-code force-pushed the fix_catalog_access_spark_datasource branch from 56cab12 to 7aafc3e Compare October 14, 2025 20:48

linliu-code marked this pull request as ready for review October 14, 2025 20:48

linliu-code commented Oct 14, 2025

Comment thread .../src/test/scala/org/apache/spark/sql/hudi/analysis/TestHoodiePruneFileSourcePartitions.scala Outdated

linliu-code force-pushed the fix_catalog_access_spark_datasource branch from 3c45a03 to ebdc1a3 Compare October 15, 2025 17:18

linliu-code added 3 commits October 15, 2025 18:37

Remove catalog access from SparkSQLWriter

1ffcef2

Fix tests

30bf94d

Remove some changes

c21b437

linliu-code force-pushed the fix_catalog_access_spark_datasource branch from ebdc1a3 to c21b437 Compare October 16, 2025 01:37

Fix the test from spark4.0 module

d9aa891

nsivabalan reviewed Oct 17, 2025

nsivabalan approved these changes Oct 17, 2025

nsivabalan merged commit 8491d6a into apache:master Oct 17, 2025
70 checks passed

yihua pushed a commit that referenced this pull request Oct 19, 2025

fix: Remove catalog access from SparkSQLWriter (#14083)

3e8b566

nsivabalan merged 4 commits into apache:master from linliu-code:fix_catalog_access_spark_datasource

3 participants
