Releases: IBM/spark-s3-shuffle

Maintenance release: Create builds for Spark 3.5.x

02 Apr 11:08

What's Changed

Full Changelog: v0.9.5...v0.9.6

Move files using NIO if the shuffle dir is mounted as a local file system.

22 Sep 05:33

What's Changed

  • Add block sizes to benchmarks and use an array-backed buffer for uploading to S3. by @pspoerri in #80
  • Move files using NIO if the shuffle dir is mounted as a local file system. by @pspoerri in #81

Full Changelog: v0.9.4...v0.9.5

Fix timeout issue on S3A filesystem.

14 Sep 10:50

What's Changed

  • Fix S3A timeout issue and avoid memory leak. by @pspoerri in #75
  • Increase default maxConcurrencyTask to 10. by @pspoerri in #76
  • Improve documentation and add a tuning guide and disable fallback fetch in the benchmarks. by @pspoerri in #77
  • Bump the default version to 0.9.4 by @pspoerri in #78
  • Revert: Modify the multipart upload size. by @pspoerri in #79

Full Changelog: v0.9.3...v0.9.4

Automatically adapt shuffle fetch concurrency based on I/O wait time.

13 Sep 12:46

What's Changed

  • Increase example block size to 128 MiB. by @pspoerri in #67
  • Dynamically adapt the number of threads based on the I/O wait time. by @pspoerri in #68
  • Measure I/O statistics when writing shuffle data. by @pspoerri in #69
  • Enable Prometheus integration for non-NFS examples. by @pspoerri in #71
  • Improve shuffle storage path for efficient lookup and delete. by @pspoerri in #70
  • Enable Spark fetching mechanism (optional) and improve reading/writing of index and checksum files. by @pspoerri in #72
  • Bump Version in the Dockerfiles and Config to 0.9.3. by @pspoerri in #73
  • Log Stage and Task ID to understand I/O bottlenecks. by @pspoerri in #74
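
The idea behind adapting the number of fetch threads to the I/O wait time (#68) can be illustrated with a small sketch. This is not the project's implementation: the function name `next_thread_count`, the thresholds, the bounds, and the step size are all hypothetical. It assumes the reader measures which fraction of its time the consumer spent blocked waiting for data, then adds a thread when that fraction is high and releases one when it is low.

```python
def next_thread_count(current, wait_fraction, min_threads=1, max_threads=10,
                      high=0.5, low=0.1):
    """Return an adjusted thread count (hypothetical policy, not the plugin's).

    wait_fraction is the share of wall time the consumer spent blocked on
    I/O, in the range [0.0, 1.0]. All default values are arbitrary.
    """
    if not 0.0 <= wait_fraction <= 1.0:
        raise ValueError("wait_fraction must be between 0.0 and 1.0")
    if wait_fraction > high:
        # The consumer is starved for data: prefetch with one more thread.
        current += 1
    elif wait_fraction < low:
        # Data arrives faster than it is consumed: release one thread.
        current -= 1
    # Keep the result within the allowed range.
    return max(min_threads, min(max_threads, current))
```

A caller would re-evaluate this after each measurement window, for example `threads = next_thread_count(threads, waited / elapsed)`. Changing the count by one step at a time keeps the adjustment from oscillating on a single noisy measurement.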

Full Changelog: v0.9.2...v0.9.3

v0.9.2

07 Sep 07:55

What's Changed

  • Optimize buffers when accessing index and checksum files. by @pspoerri in #66
  • Prefetch using multiple threads. by @pspoerri in #66

Full Changelog: v0.9.1...v0.9.2

Remove unused configuration options.

04 Sep 12:51

What's Changed

  • Remove unused configuration options. Fix NFS example config. Create test harness. by @pspoerri in #65

Full Changelog: v0.9...v0.9.1

Buffer streams in parallel.

31 Aug 14:10

What's Changed

  • WIP: Fix performance regression and rename maxBufferSize to bufferInputSize. by @pspoerri in #54
  • Enable local testing. by @pspoerri in #56
  • Buffer streams in parallel up to a threshold. by @pspoerri in #57
  • Configure readahead if the filesystem supports it. by @pspoerri in #62
  • Enable tests if the Scala version is 2.12. by @pspoerri in #63
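
Buffering streams in parallel up to a threshold (#57) can be pictured as prefetching with a byte budget. The sketch below is only an illustration under stated assumptions, not the plugin's code: `prefetch`, `read`, `max_buffered_bytes`, and `max_threads` are hypothetical names, and it assumes each block's size is known before it is read. Reads run in a thread pool, and a new read starts only while the blocks that were started but not yet consumed fit within the budget.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def prefetch(blocks, read, max_buffered_bytes, max_threads=4):
    """Yield read(block) for each (block, size) pair, in input order.

    New reads are started only while the total size of started-but-not-yet-
    consumed blocks stays within max_buffered_bytes. A single block larger
    than the budget is still read, on its own, so progress is guaranteed.
    """
    pending = deque()   # (future, size) pairs in submission order
    buffered = 0        # bytes started but not yet handed to the caller
    blocks = iter(blocks)
    upcoming = next(blocks, None)
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        while upcoming is not None or pending:
            # Start as many reads as the byte budget allows.
            while upcoming is not None and (
                not pending or buffered + upcoming[1] <= max_buffered_bytes
            ):
                block, size = upcoming
                pending.append((pool.submit(read, block), size))
                buffered += size
                upcoming = next(blocks, None)
            # Hand the oldest block to the caller and free its budget.
            future, size = pending.popleft()
            data = future.result()
            buffered -= size
            yield data
```

For example, `for data in prefetch(block_list, fetch_block, 64 * 1024 * 1024): ...` would consume blocks in order while later blocks are fetched in the background, where `block_list` and `fetch_block` stand in for whatever the caller uses to describe and read a block.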

Full Changelog: v0.8.2...v0.9

v0.9-spark3.1

31 Aug 14:30

0.9 release for Spark 3.1. The following features were omitted:

  • f5fd647 (Use the FutureDataInputStreamBuilder to open files).
  • 2ec4122 (Configure readahead).

What's Changed

Full Changelog: v0.8-spark3.1...v0.9-spark3.1

Allow configuration of buffer sizes to optimize I/O on distributed file systems.

17 Aug 07:05

What's Changed

  • Use SBT Build Info plugin to populate version number. by @pspoerri in #48
  • Integrate a JVM-Profiler. by @pspoerri in #49
  • Allow configuration of buffer sizes to optimize I/O on distributed file systems. by @pspoerri in #50
  • Fix Scala 2.13 builds in SBT. by @pspoerri in #52

Full Changelog: v0.8.1...v0.8.2

Fix MetadataFetchFailedException when pods are crashing.

28 Jul 14:46

What's Changed

  • Register blocks in FallbackStorage by default to avoid the MetadataFetchFailedException. by @pspoerri in #45
  • Disable caching when listing shuffle indices. by @pspoerri in #44
  • Enable/disable caching of partition lengths with a configuration variable by @pspoerri in #46

Full Changelog: v0.8...v0.8.1