
Support DynamoDB JSON for import/export table #179

Open
StoneDot opened this issue Sep 30, 2023 · 3 comments
Labels: enhancement (New feature or request)
Milestone: v0.4.0

Comments

@StoneDot (Contributor)
Currently, we do not support the DynamoDB JSON format for importing and exporting, as mentioned in #66. This prevents us from copying table data as-is. We should provide an import/export option that uses the DynamoDB JSON format for such a use case.

@StoneDot StoneDot added the enhancement New feature or request label Sep 30, 2023
@StoneDot StoneDot added this to the v0.4.0 milestone Sep 30, 2023
@mlafeldt (Contributor) commented Dec 1, 2023

@StoneDot I started implementing this here: mlafeldt/dynein@main...raw-export

The first question I encountered was whether to use JSONL or not.

Personally, I'm a fan of the DynamoDB S3 export format, which uses JSONL but also has a somewhat annoying top-level Item field. Item attributes are also sorted, which I still think is the right thing to do (see #85), especially with dump/export files that could be hashed/compared.

Anyway, let's focus on JSONL vs JSON for now. Given a large number of items, only JSONL makes sense IMHO, but then again, dynein isn't really meant to be a proper backup tool for huge tables. Let me know what you think.
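For reference, the S3 export format I mentioned wraps each item in that top-level Item field, one object per line, roughly like this (attribute values are illustrative):

{"Item":{"like":{"N":"1"},"pk":{"S":"pk1"},"sk":{"S":"sk1"},"title":{"S":"item1"}}}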

@StoneDot (Contributor, Author) commented Dec 6, 2023

@mlafeldt Thank you for reaching out before implementing the feature.

First, as you mentioned, dynein's export/import function is not intended for huge tables. It is especially useful for small tables, for which DynamoDB's native import/export can incur significant overhead. Additionally, the output data is not intended to be read by humans; generated files are mainly consumed by programs, whether local tools or distributed systems. This implies the following points:

  • We do not need to consider the educational perspective.
  • We prefer stable and reproducible output.
  • The output format should be easy to use with downstream services.

Thus, I agree with you that the export function should serialize attributes in a stable order. Also, I would rather not use the top-level Item field, because that schema is not a good fit for distributed systems.

Based on the above considerations, I believe

{"pk":{"S":"pk1"},"sk":{"S":"sk1"},"title":{"S":"item1"},"like":{"N":"1"}}
{"pk":{"S":"pk1"},"sk":{"S":"sk2"},"title":{"S":"item2"},"like":{"N":"3"}}

or,

{"like":{"N":"1"},"pk":{"S":"pk1"},"sk":{"S":"sk1"},"title":{"S":"item1"}}
{"like":{"N":"3"},"pk":{"S":"pk1"},"sk":{"S":"sk2"},"title":{"S":"item2"}}

would be the preferable formats. I am not sure whether people prefer the primary key attributes to appear before the other attributes.
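As a minimal sketch of how the second (alphabetical) ordering could fall out for free in Rust: serde_json backs its Map with a BTreeMap unless the preserve_order feature is enabled, so keys serialize in sorted order automatically (the item below is illustrative):

use serde_json::json;

fn main() {
    // Without the "preserve_order" feature, serde_json stores object
    // keys in a BTreeMap, so they are emitted in sorted (stable) order
    // regardless of insertion order.
    let item = json!({
        "title": {"S": "item1"},
        "pk":    {"S": "pk1"},
        "sk":    {"S": "sk1"},
        "like":  {"N": "1"}
    });
    // One compact object per line (JSONL). This prints:
    // {"like":{"N":"1"},"pk":{"S":"pk1"},"sk":{"S":"sk1"},"title":{"S":"item1"}}
    println!("{}", serde_json::to_string(&item).unwrap());
}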

@StoneDot (Contributor, Author) commented Apr 9, 2024

I apologize for not providing a clear answer initially; I believe you understood the gist of my previous reply. To be explicit, the JSONL format is the preferred choice in this case. The top-level 'Item' field is unnecessary and a poor fit for big-data frameworks, since JSONL already uses line breaks to separate individual records.
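To illustrate why line-delimited records are convenient for downstream consumers, here is a sketch of a line-by-line reader (export.jsonl is a hypothetical file name; no top-level Item wrapper is assumed):

use std::fs::File;
use std::io::{BufRead, BufReader};

fn main() -> std::io::Result<()> {
    // Each line is a complete JSON object, so a consumer (or a
    // distributed framework splitting the file by line) can parse
    // records independently, without a streaming JSON parser.
    let reader = BufReader::new(File::open("export.jsonl")?);
    for line in reader.lines() {
        let item: serde_json::Value = serde_json::from_str(&line?)
            .expect("each line should be a valid JSON object");
        println!("{}", item["pk"]["S"]);
    }
    Ok(())
}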
