
[FEA][AUDIT][SPARK-51756][CORE] Computes RowBasedChecksum in ShuffleWriters #14085

@abellina

Description


This was added to Spark 4.1, and it seems to me like a bug fix, really, but it's not being backported and is off by default:
apache/spark@9ab693c01ed

The issue looks to be in non-deterministic stages, where a task failure could result in a recompute producing the wrong values. The idea is to provide a checksum per partition that can be used to determine whether the partition is correct. On recomputation, if the partition's checksum was previously known to the driver (via MapStatus) and it doesn't match the new checksum, the stage will be marked as failed.
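To illustrate the idea, here is a minimal sketch of an order-independent per-partition checksum. This is not Spark's actual `RowBasedChecksum` implementation; it's a hypothetical example that combines per-row CRC32 values with a commutative operation, so the result is stable even if a recompute emits rows in a different order:

```java
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical sketch, not Spark's implementation: a per-partition
// checksum that is insensitive to row order within the partition.
public class PartitionChecksum {
    public static long checksum(List<byte[]> rows) {
        long acc = 0L;
        for (byte[] row : rows) {
            CRC32 crc = new CRC32();
            crc.update(row);
            // Addition is commutative, so row order does not affect the result.
            acc += crc.getValue();
        }
        return acc;
    }
}
```

If a recomputed partition yields a different set of rows (as a non-deterministic stage might), the checksum diverges and the mismatch can be detected.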

What would we need to do here? We would need to provide the checksum value as part of MapStatus in our shuffle manager, if we want to participate in this feature. Note that it is off by default.
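The driver-side bookkeeping this implies can be sketched as follows. `ChecksumRegistry` is a hypothetical stand-in for the driver's view of checksums reported via MapStatus, not Spark's actual class: the first report for a partition is recorded, and any later report (e.g. from a recompute) is verified against it:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of driver-side checksum tracking, not Spark's API.
public class ChecksumRegistry {
    private final Map<Integer, Long> known = new HashMap<>(); // partitionId -> checksum

    // Records the checksum on first sight; on later reports, returns false
    // if the recomputed partition's checksum diverges from the recorded one
    // (the situation where the stage would be failed).
    public boolean recordOrVerify(int partitionId, long checksum) {
        Long prev = known.putIfAbsent(partitionId, checksum);
        return prev == null || prev == checksum;
    }
}
```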

Alternatively, we could disable the shuffle manager when this feature is turned on, since we don't support it.
