
Add configuration to disable reprocess files cleanup by v1Cleaner #1041


Closed


@nirutgupta commented Jan 13, 2025

Overview

Issue: Records go missing in Snowflake when data is streamed with the Snowflake Sink Connector. Customers who map multiple topics to a single table may see data loss.

Cause: When the v1 cleaner cleans up reprocessFiles, it may remove files that have not yet been fully loaded, which leads to data loss.

Solution: To avoid this, enable the snowflake.snowpipe.v1Cleaner.disable.reprocessFiles.cleanup flag. This stops the cleanup of reprocessFiles and prevents the data loss described above.

Pre-review checklist

  • This change should be part of a Behavior Change Release. See go/behavior-change.
  • This change has passed Merge gate tests
  • Snowpipe Changes
  • Snowpipe Streaming Changes
  • This change is TEST-ONLY
  • This change is README/Javadocs only
  • This change is protected by a config parameter <PARAMETER_NAME>, e.g. snowflake.ingestion.method.
    • Yes - Added end to end and Unit Tests.
    • No - Suggest why it is not param protected
  • Is this change protected by parameter <PARAMETER_NAME> on the server side?
    • The parameter/feature is not yet active in production (partial rollout or PrPr, see Changes for Unreleased Features and Fixes).
    • If there is an issue, it can be safely mitigated by turning the parameter off. This is also verified by a test (See go/ppp).

@nirutgupta marked this pull request as ready for review January 15, 2025 06:42
@nirutgupta requested a review from a team as a code owner January 15, 2025 06:42
@sfc-gh-mbobowski
Contributor

Hi @nirutgupta, thanks for your contribution!

The bug that caused the cleaner to delete files belonging to other topics was fixed (link to PR) last week and will be released in 3.0.1. The filename is now prefixed with the topic name when topicsToTableMap is used, so the cleaner can no longer remove another topic's files and this data loss should not recur.

With that in mind, there is no need to add a separate toggle flag for the cleaner itself.
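The reasoning behind the topic-name prefix can be illustrated with a minimal Python sketch. The file-name layout and the helpers `legacy_name`, `prefixed_name`, and `cleanup_candidates` are hypothetical and do not reproduce the connector's actual naming scheme or cleaner code; the sketch only shows why scoping a cleaner to a topic prefix keeps it away from another topic's files when several topics share one table.

```python
def legacy_name(partition: int, start: int, end: int) -> str:
    # Hypothetical pre-fix scheme: the topic is not part of the file name.
    return f"{partition}/{start}_{end}.json.gz"


def prefixed_name(topic: str, partition: int, start: int, end: int) -> str:
    # Hypothetical post-fix scheme: the topic name leads the file name.
    return f"{topic}/{partition}/{start}_{end}.json.gz"


def cleanup_candidates(stage_files: list[str], prefix: str) -> list[str]:
    # A cleaner only considers files whose names start with its own prefix.
    return [name for name in stage_files if name.startswith(prefix)]


# Two hypothetical topics ("orders", "refunds") mapped to the same table,
# both writing from partition 0.
legacy_stage = [legacy_name(0, 0, 99), legacy_name(0, 0, 49)]
prefixed_stage = [prefixed_name("orders", 0, 0, 99), prefixed_name("refunds", 0, 0, 49)]

# Without the topic in the name, the "orders" cleaner also matches "refunds" files.
print(cleanup_candidates(legacy_stage, "0/"))
# → ['0/0_99.json.gz', '0/0_49.json.gz']

# With the topic prefix, it only matches its own file.
print(cleanup_candidates(prefixed_stage, "orders/0/"))
# → ['orders/0/0_99.json.gz']
```

Under this assumed layout, the second call can never return a "refunds" file, which is the property that makes a separate cleanup toggle unnecessary.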

@sfc-gh-mbobowski
Contributor

Merged with some changes in #1045
