Provide better visibility into migrations #180

Open
pdbossman opened this issue Jul 19, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@pdbossman
Contributor

It would be useful to have the following progress information during migrations:

  1. Number of items read from the source (DynamoDB/Alternator)
  2. Number of items inserted into the target (DynamoDB/Alternator)
  3. If we implement discarding of expired items, the total number of items discarded
@julienrf julienrf added the enhancement New feature or request label Jul 19, 2024
@julienrf julienrf self-assigned this Jul 23, 2024
@julienrf
Collaborator

AFAICT, there is no direct way to access such information.

There is a class AbstractReadManager that tracks the number of items read, but it is used only internally by emr-dynamodb-connector to adjust the read throughput, and this information is not publicly exposed. We would need to investigate further whether this class actually tracks the total number of items read (and not just the number of items read per partition). If it does, a basic way to expose that information would be through logs. Otherwise, we would have to use a Spark accumulator to keep track of the items read across all partitions.
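
For illustration, here is a minimal sketch of the accumulator approach. It is not taken from the migrator code base: the object name, the accumulator names, and the in-memory `source` RDD standing in for the real DynamoDB/Alternator read are all assumptions, and the write to the target is only marked by a comment.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical stand-alone example; names are illustrative only.
object MigrationCountersSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("migration-counters-sketch")
      .master("local[2]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Named accumulators are aggregated on the driver across all partitions.
    val itemsRead = sc.longAccumulator("itemsRead")
    val itemsWritten = sc.longAccumulator("itemsWritten")

    // Stand-in for the RDD of items read from the source table (assumption).
    val source = sc.parallelize(1 to 1000, numSlices = 4)

    // Updating accumulators inside an action (here foreachPartition) means
    // each successful task's updates are applied once, even if tasks are retried.
    source.foreachPartition { items =>
      items.foreach { _ =>
        itemsRead.add(1L)
        // The write to the target table would happen here (omitted).
        itemsWritten.add(1L)
      }
    }

    // The totals are only reliable on the driver, after the action completes.
    println(s"Items read: ${itemsRead.value}, items written: ${itemsWritten.value}")

    spark.stop()
  }
}
```

Because the accumulators are named, their running values should also be visible in the Spark web UI for the stage that updates them, which would give some visibility while the migration is still in progress. A third accumulator for discarded items could follow the same pattern if discarding expired items is implemented.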

@julienrf julienrf removed their assignment Aug 15, 2024