
[Bug] There are code risks in tag expiration, which can cause data corruption #6479

@yungkei


Search before asking

  • I searched in the issues and found nothing similar.

Paimon version

I found this bug in version 1.9.0, and it still exists on the master branch.

Compute Engine

Flink 1.16, Spark 3.3.1

Minimal reproduce step

If the baseManifestList or deltaManifestList associated with a tag has been deleted in advance, data files are deleted by mistake during tag cleaning. This can cause data corruption, especially when those data files are still referenced by the earliest snapshot.

Step 1: Delete the baseManifestList or deltaManifestList associated with the tag. Precondition: the tag expiration time is greater than the snapshot expiration time.
Step 2: Run the tag expiration job.
Step 3: Query the current snapshot or the earliest snapshot. The query fails with a FileNotFoundException for an ORC file.

What doesn't meet your expectations?

This issue results in data file loss and makes Paimon unavailable.

Anything else?

When a tag expires, its left neighbor tag and its nearest right neighbor tag are collected into a skipping set so that the data files they still reference are not deleted. However, if the baseManifestList of the nearest right neighbor tag does not exist, the relevant data files are deleted by mistake. I therefore suggest that the skipping set collect the earliest snapshot in addition to the left neighbor tag and the nearest right neighbor tag.
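To make the failure mode concrete, here is a minimal, self-contained model of the skipping-set logic in Python. It is not Paimon's code: the names `Ref`, `data_files`, `build_skipping_set`, and `expire_tag`, the file names, and the choice to treat a missing manifest list as "contributes no data files" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """A tag or snapshot, reduced to a name and the manifest list it points at."""
    name: str
    manifest_list: str


def data_files(ref, manifest_lists):
    # Assumption: a manifest list that no longer exists contributes no data files.
    return set(manifest_lists.get(ref.manifest_list, ()))


def build_skipping_set(protectors, manifest_lists):
    # Union of the data files still referenced by every protecting tag/snapshot.
    skip = set()
    for ref in protectors:
        if ref is not None:
            skip |= data_files(ref, manifest_lists)
    return skip


def expire_tag(expired, protectors, manifest_lists):
    """Return the data files that would be deleted when `expired` is cleaned."""
    skip = build_skipping_set(protectors, manifest_lists)
    return data_files(expired, manifest_lists) - skip


# Hypothetical storage state: the manifest list of tag-2 ("ml-tag-2") was
# deleted in advance, so it is absent from this mapping.
manifest_lists = {
    "ml-tag-1": {"f1.orc", "f2.orc"},
    "ml-snap-5": {"f2.orc", "f3.orc"},
}

tag_1 = Ref("tag-1", "ml-tag-1")         # the tag being expired
tag_2 = Ref("tag-2", "ml-tag-2")         # nearest right neighbor, manifest list missing
earliest = Ref("snapshot-5", "ml-snap-5")  # earliest snapshot, still needs f2.orc

# Behaviour described in this issue: only the neighbor tags protect data files.
deleted_current = expire_tag(tag_1, [None, tag_2], manifest_lists)

# Proposed behaviour: the earliest snapshot is also added to the skipping set.
deleted_proposed = expire_tag(tag_1, [None, tag_2, earliest], manifest_lists)

print(sorted(deleted_current))   # f2.orc is deleted although snapshot-5 needs it
print(sorted(deleted_proposed))  # only f1.orc is deleted
```

In this model, adding the earliest snapshot to the skipping set keeps `f2.orc` alive even though the right neighbor tag can no longer vouch for it.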

Are you willing to submit a PR?

  • I'm willing to submit a PR!
