
[Bug] [Connector-V2] connector-maxcompute: The source reader may cause data duplication #8379

@liangcw1111

Description

Search before asking

  • I had searched in the issues and found no similar issues.

What happened

When the MaxCompute source split enumerator assigns pending splits, it sends an assignSplitOperation to the task group worker, and the source reader runs pollNext(Collector output) to read those splits. If the enumerator's signalNoMoreSplits operation has not arrived by the time pollNext(Collector output) completes, pollNext may be executed again, and the same set of splits is read more than once. This happens easily when the cluster's system load is high.
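To make the race easier to follow, here is a minimal, self-contained Java sketch. It is not SeaTunnel code: the Reader interface, RereadingReader, ConsumingReader and simulate are hypothetical names, and the sketch assumes the reader keeps its assigned splits in a set that pollNext iterates without removing them, which is what the description above suggests. The ConsumingReader shows one possible way to make each split readable at most once (dequeue before reading); it is an illustration, not the project's actual fix.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

// Hypothetical, simplified model of the race in this issue; none of these types are SeaTunnel APIs.
public class SplitReplayDemo {

    interface Reader {
        void addSplits(List<String> splits);

        void handleNoMoreSplits();

        /** Emits rows into output; returns true when the reader would signal "no element to read". */
        boolean pollNext(List<String> output);
    }

    /** Assumed buggy pattern: assigned splits stay in a set and are re-read on every poll. */
    static class RereadingReader implements Reader {
        private final Set<String> splits = new LinkedHashSet<>();
        private boolean noMoreSplits;

        public void addSplits(List<String> newSplits) { splits.addAll(newSplits); }

        public void handleNoMoreSplits() { noMoreSplits = true; }

        public boolean pollNext(List<String> output) {
            for (String split : splits) {
                output.add("row-from-" + split); // the same split is read again on each call
            }
            return noMoreSplits;
        }
    }

    /** Safer pattern: a split is dequeued before it is read, so it is read at most once. */
    static class ConsumingReader implements Reader {
        private final Queue<String> pending = new ArrayDeque<>();
        private boolean noMoreSplits;

        public void addSplits(List<String> newSplits) { pending.addAll(newSplits); }

        public void handleNoMoreSplits() { noMoreSplits = true; }

        public boolean pollNext(List<String> output) {
            String split;
            while ((split = pending.poll()) != null) {
                output.add("row-from-" + split);
            }
            return noMoreSplits && pending.isEmpty();
        }
    }

    /** Assigns two splits, lets pollNext run pollsBeforeSignal times, then delivers the signal. */
    static List<String> simulate(Reader reader, int pollsBeforeSignal) {
        List<String> output = new ArrayList<>();
        reader.addSplits(Arrays.asList("split-0", "split-1"));
        for (int i = 0; i < pollsBeforeSignal; i++) {
            reader.pollNext(output); // signalNoMoreSplits has not arrived yet (e.g. high load)
        }
        reader.handleNoMoreSplits();
        while (!reader.pollNext(output)) {
            // keep polling until the reader reports that it is finished
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println("re-reading, fast signal: " + simulate(new RereadingReader(), 0).size());
        System.out.println("re-reading, slow signal: " + simulate(new RereadingReader(), 2).size());
        System.out.println("consuming,  slow signal: " + simulate(new ConsumingReader(), 2).size());
    }
}
```

Running main prints 2 rows for the re-reading reader when the signal arrives immediately, but 6 rows when two polls happen before the signal, while the consuming reader emits exactly 2 rows in both cases. This matches the report that duplication depends on timing and shows up mainly under load.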

SeaTunnel Version

2.3.7

SeaTunnel Config

seatunnel:
  engine:
    classloader-cache-mode: true
    history-job-expire-minutes: 1440
    backup-count: 1
    print-execution-info-interval: 60
    print-job-metrics-info-interval: 60
    queue-type: blockingqueue
    slot-service:
      dynamic-slot: false
      slot-num: 20
    checkpoint:
      interval: 30000
      timeout: 2147483647
      max-concurrent: 5
      tolerable-failure: 2
      storage:
        type: oss

Running Command

sh /alidata1/za-seatunnel/seatunnel-2.3.7/bin/seatunnel-cluster.sh -d -r master/worker

Error Exception

There is no exception, but the data of one or more splits is read repeatedly.

Zeta or Flink or Spark Version

zeta

Java or Scala Version

java 1.8

Screenshots

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests