perf: reduce unnecessary row group metadata loading #14208

Merged
danny0405 merged 2 commits into apache:master from TheR1sing3un:perf_reduce_rg_load on Nov 7, 2025

@TheR1sing3un

Member

Describe the issue this Pull Request addresses

closes #14207

Summary and Changelog

  1. reduce unnecessary row group metadata loading

Impact

Improves Parquet metadata-related performance.

Risk Level

none

Documentation Update

none


Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@github-actions github-actions Bot added the size:S PR with lines of changes in (10, 100] label Nov 5, 2025
Comment thread hudi-hadoop-common/src/main/java/org/apache/hudi/common/util/ParquetUtils.java Outdated
1. reduce unnecessary row group metadata loading

Signed-off-by: TheR1sing3un <chaoyang@apache.org>
…dFileMetadataOnly`

1. rename `ParquetUtils#readMetadataWithSkipRowGroups` to `readFileMetadataOnly`

Signed-off-by: TheR1sing3un <chaoyang@apache.org>

@danny0405 danny0405 left a comment

Contributor

+1, nice improvement. Currently the footer is always read to fetch the file schema, so even if this only reduces the footer parsing cost, it is still meaningful.
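
To illustrate the point about footer parsing cost, here is a minimal, self-contained sketch. It is a toy model only, not Hudi's `ParquetUtils` code and not the parquet-java API: the class `FooterSketch`, its text-based footer encoding, and the `parse` method are all hypothetical names invented for this example. The idea it models is that the footer bytes are still read in full, but the per-row-group entries are not decoded when the caller only needs file-level metadata such as the schema. parquet-java provides a `ParquetMetadataConverter.SKIP_ROW_GROUPS` metadata filter for this kind of read; whether this PR uses it is not shown in this conversation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model of a Parquet-style footer. Hypothetical; not Hudi or parquet-java code. */
final class FooterSketch {

    /** Parsed footer: the schema plus whichever row-group entries were decoded. */
    static final class Footer {
        final String schema;
        final List<Long> rowGroupRowCounts;

        Footer(String schema, List<Long> rowGroupRowCounts) {
            this.schema = schema;
            this.rowGroupRowCounts = Collections.unmodifiableList(rowGroupRowCounts);
        }
    }

    /** Encodes a footer as text: the schema on the first line, then one row count per line. */
    static String encode(String schema, long[] rowCounts) {
        StringBuilder sb = new StringBuilder(schema);
        for (long count : rowCounts) {
            sb.append('\n').append(count);
        }
        return sb.toString();
    }

    /**
     * Parses the raw footer. When skipRowGroups is true, only the schema is decoded
     * and the row-group entries are left untouched, which is the saving modeled here.
     */
    static Footer parse(String raw, boolean skipRowGroups) {
        int schemaEnd = raw.indexOf('\n');
        String schema = schemaEnd < 0 ? raw : raw.substring(0, schemaEnd);
        List<Long> rowCounts = new ArrayList<>();
        if (!skipRowGroups && schemaEnd >= 0) {
            for (String line : raw.substring(schemaEnd + 1).split("\n")) {
                rowCounts.add(Long.parseLong(line));
            }
        }
        return new Footer(schema, rowCounts);
    }

    public static void main(String[] args) {
        String raw = encode("message m { required int64 id; }", new long[] {100L, 200L, 300L});
        Footer full = parse(raw, false);
        Footer schemaOnly = parse(raw, true);
        System.out.println("row groups decoded (full read): " + full.rowGroupRowCounts.size());
        System.out.println("row groups decoded (file metadata only): " + schemaOnly.rowGroupRowCounts.size());
        System.out.println("same schema: " + full.schema.equals(schemaOnly.schema));
    }
}
```

In this model both calls return the same schema, and the second simply never builds the row-group list. The saving grows with the number of row groups in the file.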

@hudi-bot

hudi-bot commented Nov 6, 2025

Collaborator

CI report:

Bot commands
@hudi-bot supports the following commands:
  • `@hudi-bot run azure`: re-run the last Azure build

@TheR1sing3un

Member Author

@danny0405 @yihua Hi Danny, Ethan, are there any other suggestions? If not, let's land it.

@danny0405

Contributor

This is a known flaky test:

TestFiltersInFileGroupReader.testFiltersInFileFormat:70->runComparison:80->TestBootstrapReadBase.compareDf:222 expected: <0> but was: <1>

@danny0405 danny0405 merged commit 355f19c into apache:master Nov 7, 2025
67 of 70 checks passed
nsivabalan pushed a commit to nsivabalan/hudi that referenced this pull request Dec 6, 2025
* perf: reduce unnecessary row group metadata loading

---------

Signed-off-by: TheR1sing3un <chaoyang@apache.org>

Development

Successfully merging this pull request may close these issues.

Reduce unnecessary row group metadata loading

4 participants