
Support DynamoDB JSON for import/export table #179

Open
StoneDot opened this issue Sep 30, 2023 · 3 comments
Labels: enhancement (New feature or request)
Milestone: v0.4.0

Comments

@StoneDot (Contributor)
Currently, we do not support the DynamoDB JSON format for importing and exporting, as mentioned in #66. This prevents us from copying table data as-is. We should provide an import/export option that uses the DynamoDB JSON format for such a use case.

@StoneDot StoneDot added the enhancement New feature or request label Sep 30, 2023
@StoneDot StoneDot added this to the v0.4.0 milestone Sep 30, 2023
@mlafeldt (Contributor) commented Dec 1, 2023

@StoneDot I started implementing this here: mlafeldt/dynein@main...raw-export

The first question I encountered was whether to use JSONL or not.

Personally, I'm a fan of the DynamoDB S3 export format, which uses JSONL but also has a somewhat annoying top-level Item field. Item attributes are also sorted, which I still think is the right thing to do (see #85), especially with dump/export files that could be hashed/compared.

Anyway, let's focus on JSONL vs JSON for now. Given a large number of items, only JSONL makes sense IMHO, but then again, dynein isn't really meant to be a proper backup tool for huge tables. Let me know what you think.
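For reference, the S3 export format I mentioned wraps each item in that top-level Item field, one object per line, roughly like this (attribute values are illustrative):

{"Item":{"like":{"N":"1"},"pk":{"S":"pk1"},"sk":{"S":"sk1"},"title":{"S":"item1"}}}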

@StoneDot (Contributor, Author) commented Dec 6, 2023

@mlafeldt Thank you for reaching out before implementing the feature.

First, as you mentioned, dynein's export/import function is not intended for huge tables. It is especially useful for small tables, for which DynamoDB's native import/export can incur significant overhead. Additionally, the output data is not intended to be read by humans; generated files are mainly consumed by programs, whether local tools or distributed systems. This implies the following points:

  • We do not need to consider the educational perspective.
  • We prefer stable and reproducible output.
  • The output format should be easy to use with downstream services.

Thus, I agree with you that the export function should serialize attributes in a stable order. Also, I would rather not use the top-level Item field, because that schema is not a good fit for distributed systems.

Based on the above considerations, I believe

{"pk":{"S":"pk1"},"sk":{"S":"sk1"},"title":{"S":"item1"},"like":{"N":"1"}}
{"pk":{"S":"pk1"},"sk":{"S":"sk2"},"title":{"S":"item2"},"like":{"N":"3"}}

or,

{"like":{"N":"1"},"pk":{"S":"pk1"},"sk":{"S":"sk1"},"title":{"S":"item1"}}
{"like":{"N":"3"},"pk":{"S":"pk1"},"sk":{"S":"sk2"},"title":{"S":"item2"}}

would be the preferable formats. I am not sure whether people prefer the primary key attributes to appear before the other attributes.
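As a minimal sketch of how the second (alphabetical) ordering could fall out for free in Rust: serde_json backs its Map with a BTreeMap unless the preserve_order feature is enabled, so keys serialize in sorted order automatically (the item below is illustrative):

use serde_json::json;

fn main() {
    // Without the "preserve_order" feature, serde_json stores object
    // keys in a BTreeMap, so they are emitted in sorted (stable) order
    // regardless of insertion order.
    let item = json!({
        "title": {"S": "item1"},
        "pk":    {"S": "pk1"},
        "sk":    {"S": "sk1"},
        "like":  {"N": "1"}
    });
    // One compact object per line (JSONL). This prints:
    // {"like":{"N":"1"},"pk":{"S":"pk1"},"sk":{"S":"sk1"},"title":{"S":"item1"}}
    println!("{}", serde_json::to_string(&item).unwrap());
}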

@StoneDot (Contributor, Author) commented Apr 9, 2024

I apologize for not providing a clear answer initially; I believe you understood the gist of my previous reply. To be explicit, the JSONL format is the preferred choice in this case. The top-level 'Item' field is unnecessary and a poor fit for big-data frameworks, since JSONL already uses line breaks to separate individual records.
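To illustrate why line-delimited records are convenient for downstream consumers, here is a sketch of a line-by-line reader (export.jsonl is a hypothetical file name; no top-level Item wrapper is assumed):

use std::fs::File;
use std::io::{BufRead, BufReader};

fn main() -> std::io::Result<()> {
    // Each line is a complete JSON object, so a consumer (or a
    // distributed framework splitting the file by line) can parse
    // records independently, without a streaming JSON parser.
    let reader = BufReader::new(File::open("export.jsonl")?);
    for line in reader.lines() {
        let item: serde_json::Value = serde_json::from_str(&line?)
            .expect("each line should be a valid JSON object");
        println!("{}", item["pk"]["S"]);
    }
    Ok(())
}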
