Sure, a smaller value for fs.s3a.readahead.range makes sense. But you would have to modify the Spark source code to switch to the small value for any metadata read, and then switch back to the big value for data block reads, right?

Or would you create a dedicated metadata reader connection pool, separate from the data block access connections?

I assume the former approach. Since this issue applies to both ORC and Parquet, I am still wondering whether there should already be a PR for both S3 and Azure Storage that does exactly what you are doing.
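To make the two-profile idea concrete, here is a minimal sketch of what distinct S3A settings for metadata reads versus data block reads might look like. The property names come from hadoop-aws (fs.s3a.readahead.range and fs.s3a.experimental.input.fadvise are real S3A options), but the specific values and the split into two profiles are illustrative assumptions, not the implementation discussed above:

```python
# Illustrative S3A tuning profiles; values are assumptions, not measured optima.
# Small readahead + random fadvise suits footer/metadata seeks;
# large readahead + sequential fadvise suits column-chunk scans.
metadata_conf = {
    "fs.s3a.readahead.range": "64K",                   # short-range seeks for file footers
    "fs.s3a.experimental.input.fadvise": "random",     # avoid discarding the HTTP stream
}
data_block_conf = {
    "fs.s3a.readahead.range": "1M",                    # long sequential column reads
    "fs.s3a.experimental.input.fadvise": "sequential",
}

def conf_for(read_type: str) -> dict:
    """Pick the profile for a read type ('metadata' or 'data')."""
    return metadata_conf if read_type == "metadata" else data_block_conf

print(conf_for("metadata")["fs.s3a.readahead.range"])
```

Absent a change to the readers themselves, these would have to be chosen once per filesystem instance, which is exactly the switching problem raised above.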
