
@beliefer (Contributor) commented Oct 22, 2025

What changes were proposed in this pull request?

This PR proposes adding a more flexible getAllWithPrefix overload to SparkConf.
We need to set some S3-related configurations in our internal Spark deployment.
The requirement is to replace the prefix spark.fs.s3a with the new prefix spark.hadoop.fs.s3a. The current implementation is shown below.

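    val S3A_PREFIX = "spark.fs.s3a"
    val SPARK_HADOOP_S3A_PREFIX = "spark.hadoop.fs.s3a"
    // getAllWithPrefix strips the prefix, so each key comes back as a suffix
    // such as ".endpoint" (note the leading dot, since the prefix has none).
    val s3aConf = conf.getAllWithPrefix(S3A_PREFIX)
    val newConf = s3aConf.map { case (keyWithoutPrefix, value) =>
      // Re-attach both prefixes to rebuild the original key and the target key.
      val oldKey = S3A_PREFIX + keyWithoutPrefix
      val newKey = SPARK_HADOOP_S3A_PREFIX + keyWithoutPrefix
      ((newKey, oldKey), value)
    }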
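
To make the pain point concrete, here is a minimal, Spark-free sketch of the prefix-stripping behavior that the snippet above works around. It models the configuration as a plain `Map[String, String]`. The object `PrefixStripDemo` and its helper `getAllWithPrefix` are hypothetical stand-ins that assume the existing `SparkConf.getAllWithPrefix(prefix)` returns `(key without prefix, value)` pairs.

```scala
// Hypothetical, Spark-free model: the configuration is a plain key -> value map.
object PrefixStripDemo {
  val S3A_PREFIX = "spark.fs.s3a"
  val SPARK_HADOOP_S3A_PREFIX = "spark.hadoop.fs.s3a"

  // Assumed semantics of the existing API: keep entries whose key starts with
  // `prefix`, and return them with the prefix stripped from the key.
  def getAllWithPrefix(conf: Map[String, String], prefix: String): Array[(String, String)] =
    conf.toArray.collect {
      case (k, v) if k.startsWith(prefix) => (k.substring(prefix.length), v)
    }

  // Mirrors the "before" snippet: the caller has to glue both prefixes back
  // onto the stripped suffix to rebuild the new key and the old key.
  def remap(conf: Map[String, String]): Array[((String, String), String)] =
    getAllWithPrefix(conf, S3A_PREFIX).map { case (suffix, value) =>
      ((SPARK_HADOOP_S3A_PREFIX + suffix, S3A_PREFIX + suffix), value)
    }
}
```

Because `S3A_PREFIX` has no trailing dot, a key such as `spark.fs.s3a.endpoint` comes back as the suffix `.endpoint`. That is why plain concatenation restores a well-formed key.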

Because getAllWithPrefix strips the prefix from each key, developers must concatenate the prefix and the suffix to restore the original key.
The new getAllWithPrefix overload adds flexibility: it takes a function that is applied to each matching full key, so developers can build whatever key they need.
After this change, the code shown above can be simplified as follows.

    val S3A_PREFIX = "spark.fs.s3a"
    val SPARK_HADOOP_S3A_PREFIX = "spark.hadoop.fs.s3a"
    // f receives the full, unstripped key and returns (newKey, originalKey).
    val f = (k: String) => {
      val keyWithoutPrefix = k.substring(S3A_PREFIX.length)
      val newKey = SPARK_HADOOP_S3A_PREFIX + keyWithoutPrefix
      (newKey, k)
    }
    val newConf = conf.getAllWithPrefix(S3A_PREFIX, f)
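
Below is a minimal sketch of what the new overload does. It again models the configuration as a `Map[String, String]`. The body is an assumption based only on the signature in the PR title, `getAllWithPrefix(String, String => K)`: it keeps the entries whose full key starts with the prefix and applies `f` to the unstripped key. The names `PrefixMapDemo` and its helpers are hypothetical. The sketch returns a `Seq` to stay self-contained, and it makes no claim about the real method's return type or ordering.

```scala
// Hypothetical, Spark-free model of SparkConf.getAllWithPrefix(String, String => K).
object PrefixMapDemo {
  val S3A_PREFIX = "spark.fs.s3a"
  val SPARK_HADOOP_S3A_PREFIX = "spark.hadoop.fs.s3a"

  // Filter by prefix, then let the caller map the FULL key to any type K.
  def getAllWithPrefix[K](conf: Map[String, String], prefix: String, f: String => K): Seq[(K, String)] =
    conf.toSeq.collect {
      case (k, v) if k.startsWith(prefix) => (f(k), v)
    }

  // The "after" mapping function: derive the new key from the full key and
  // keep the original key alongside it.
  val f: String => (String, String) = k => {
    val keyWithoutPrefix = k.substring(S3A_PREFIX.length)
    (SPARK_HADOOP_S3A_PREFIX + keyWithoutPrefix, k)
  }
}
```

Because `K` is generic, the same call is not limited to producing pairs. For example, passing `(k: String) => k.stripPrefix(S3A_PREFIX)` reproduces the old stripping behavior. In that sense, the function-taking overload generalizes the existing one.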

Why are the changes needed?

The new getAllWithPrefix API makes SparkConf more flexible: callers can transform each matching key directly instead of reconstructing it from a stripped suffix.

Does this PR introduce any user-facing change?

No. This PR only adds a new API.

How was this patch tested?

GitHub Actions (GA) tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions bot added the CORE label Oct 22, 2025
@beliefer (Contributor Author)

@dongjoon-hyun changed the title [SPARK-53980][CORE] Add more flexible getAllWithPrefix for SparkConf to [SPARK-53980][CORE] AddSparkConf.getAllWithPrefix(String, String => K) API Oct 24, 2025

@dongjoon-hyun (Member) left a comment


+1, LGTM. Thank you so much, @beliefer.

Merged to master for Apache Spark 4.1.0-preview3.

@dongjoon-hyun changed the title [SPARK-53980][CORE] AddSparkConf.getAllWithPrefix(String, String => K) API to [SPARK-53980][CORE] Add SparkConf.getAllWithPrefix(String, String => K) API Oct 24, 2025
@ueshin (Member) commented Oct 24, 2025

Do we need to add this to Python as well? cc @HyukjinKwon
Update: Actually, Python doesn't have getAllWithPrefix at all, not even the variant that doesn't take a function.
