Support DynamoDB JSON for import/export table #179
Comments
@StoneDot I started implementing this here: mlafeldt/dynein@main...raw-export

The first question I encountered was whether to use JSONL or not. Personally, I'm a fan of the DynamoDB S3 export format, which uses JSONL but also has a somewhat annoying top-level 'Item' field.

Anyway, let's focus on JSONL vs JSON for now. Given a large number of items, only JSONL makes sense IMHO, but then again, dynein isn't really meant to be a proper backup tool for huge tables. Let me know what you think.
@mlafeldt Thank you for reaching out before implementing the feature. First, as you mentioned, dynein's export/import function is not intended for huge tables. Dynein's export is especially useful for small tables, which may experience a large overhead on native import/export. Additionally, the output data is not intended to be read by humans. Generated files are mainly consumed by programs, either local programs or distributed systems. This implies the following points:
Thus, I think the export function should store attributes in a stable order, the same as you do. Also, I do not want to use the top-level 'Item' field. Based on the above considerations, I believe output such as

{"pk":{"S":"pk1"},"sk":{"S":"sk1"},"title":{"S":"item1"},"like":{"N":"1"}}
{"pk":{"S":"pk1"},"sk":{"S":"sk2"},"title":{"S":"item2"},"like":{"N":"3"}}

or,

{"like":{"N":"1"},"pk":{"S":"pk1"},"sk":{"S":"sk1"},"title":{"S":"item1"}}
{"like":{"N":"3"},"pk":{"S":"pk1"},"sk":{"S":"sk2"},"title":{"S":"item2"}}

is the preferable format. I don't know whether people prefer the primary key to appear before other attributes.
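To make the two ordering options concrete, here is a minimal Python sketch (not dynein's implementation, which is written in Rust) that writes DynamoDB JSON items as JSONL with a stable attribute order. The function name `to_jsonl` and the `key_attrs` parameter are hypothetical and exist only for illustration. Without `key_attrs`, attributes are sorted by name, which matches the second form above. With `key_attrs`, the primary key comes first and the remaining attributes are sorted, which is one possible way to pin down the first form.

```python
import json


def to_jsonl(items, key_attrs=()):
    """Serialize DynamoDB JSON items as JSONL, one compact object per line.

    key_attrs (hypothetical parameter) names the primary-key attributes to
    emit first; all remaining attributes follow in sorted order.
    """
    lines = []
    for item in items:
        first = [k for k in key_attrs if k in item]
        rest = sorted(k for k in item if k not in key_attrs)
        ordered = {k: item[k] for k in first + rest}
        # Compact separators keep each record on one line without padding.
        lines.append(json.dumps(ordered, separators=(",", ":")))
    return "".join(line + "\n" for line in lines)


items = [
    {"title": {"S": "item1"}, "like": {"N": "1"}, "sk": {"S": "sk1"}, "pk": {"S": "pk1"}},
    {"title": {"S": "item2"}, "like": {"N": "3"}, "sk": {"S": "sk2"}, "pk": {"S": "pk1"}},
]

print(to_jsonl(items), end="")                           # attributes sorted by name
print(to_jsonl(items, key_attrs=("pk", "sk")), end="")   # primary key first
```

Either ordering is deterministic for a given item, so repeated exports of unchanged data produce identical lines.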
I apologize for not providing a clear answer initially. Based on my previous response, I believe you were able to understand the gist of my reply. However, I would like to clarify that the JSONL format is the preferred choice in this case. The top-level 'Item' field is unnecessary and not suitable for big data frameworks, as they use line breaks to separate individual records.
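For the consuming side, the following Python sketch shows why line-delimited records are convenient: each line parses independently, so a reader can stream the file without loading it whole. The `read_items` function and its `wrapped` flag are hypothetical names. The flag covers input in the S3 export style, where every line wraps the item in a top-level 'Item' field.

```python
import io
import json


def read_items(stream, wrapped=False):
    """Yield one DynamoDB JSON item per non-empty line of a JSONL stream.

    wrapped (hypothetical flag): set to True when each line has the form
    {"Item": {...}}, as in the DynamoDB S3 export format.
    """
    for line in stream:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        yield record["Item"] if wrapped else record


plain = io.StringIO('{"pk":{"S":"pk1"},"sk":{"S":"sk1"}}\n{"pk":{"S":"pk1"},"sk":{"S":"sk2"}}\n')
for item in read_items(plain):
    print(item["sk"]["S"])
```

Making the wrapper an explicit option avoids guessing from the data, since a table attribute could itself be named 'Item'.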
Currently, we do not support DynamoDB JSON type for importing and exporting, as mentioned in #66. This prevents us from copying the table data as it is. We should provide an import/export option to use DynamoDB JSON format for such a use case.
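As an illustration of why a plain JSON export cannot reproduce the table as it is, the Python sketch below strips DynamoDB type descriptors from attribute values. It is a generic, hypothetical conversion (`to_plain` is an invented name) and not a description of what dynein currently does. It only shows that distinct DynamoDB types can collapse into the same plain JSON value.

```python
def to_plain(attr):
    """Convert one DynamoDB JSON attribute value to a plain JSON-compatible value."""
    (type_tag, value), = attr.items()  # each attribute value has exactly one type tag
    if type_tag in ("S", "B", "BOOL"):
        return value
    if type_tag == "NULL":
        return None
    if type_tag == "N":
        # DynamoDB JSON carries numbers as strings.
        return int(value) if value.lstrip("-").isdigit() else float(value)
    if type_tag == "L":
        return [to_plain(v) for v in value]
    if type_tag == "M":
        return {k: to_plain(v) for k, v in value.items()}
    if type_tag in ("SS", "BS"):
        return list(value)
    if type_tag == "NS":
        return [to_plain({"N": v}) for v in value]
    raise ValueError(f"unknown DynamoDB type tag: {type_tag}")


string_set = {"SS": ["a", "b"]}
string_list = {"L": [{"S": "a"}, {"S": "b"}]}
# Both become the same plain list, so the original type cannot be recovered.
print(to_plain(string_set) == to_plain(string_list))
```

A string set and a list of strings end up indistinguishable, as do a binary value and a string holding the same base64 text, which is why keeping the type descriptors in the exported file matters for a faithful copy.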