Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: set bootstrap data locally instead of downloading #153

Merged
merged 1 commit into from
Sep 28, 2023

Conversation

wafuwafu13
Copy link
Contributor

Issue #, if available:

#149, #143

Description of changes:

#149 (comment)

Downloading json files each time is redundant and may cause errors in the unzip process.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@wafuwafu13
Copy link
Contributor Author

wafuwafu13 commented Aug 17, 2023

discuss with @StoneDot

dy should work in single binary so using like https://doc.rust-lang.org/std/macro.include_str.html or https://doc.rust-lang.org/std/macro.include_bytes.html is better.

cargo install --locked --path . | ./target/release/dy bootstrap is success but the following error occurs.

~/desktop/dynein
$ cp ./target/release/dy ~/desktop/

~/desktop
$ ./dy bootstrap                   
Bootstrapping - dynein will creates 4 sample tables defined here:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/AppendixSampleTables.html

'ProductCatalog' - simple primary key table
    Id (N)

'Forum' - simple primary key table
    Name (S)

'Thread' - composite primary key table
    ForumName (S)
    Subject (S)

'Reply' - composite primary key table, with GSI named 'PostedBy-Message-Index'
    Id (S)
    ReplyDateTime (S)

[skip] Table 'ProductCatalog' already exists, skipping to create new one.
[skip] Table 'Forum' already exists, skipping to create new one.
[skip] Table 'Thread' already exists, skipping to create new one.
[skip] Table 'Reply' already exists, skipping to create new one.
Still CREATING following tables: []
All tables are in ACTIVE.
Tables are ready and retrieved sample data locally. Now start writing data into samle tables...
Error: LoadData(Os { code: 2, kind: NotFound, message: "No such file or directory" })

@wafuwafu13 wafuwafu13 marked this pull request as draft August 17, 2023 21:42
@wafuwafu13 wafuwafu13 marked this pull request as ready for review August 18, 2023 16:34
@StoneDot StoneDot self-requested a review August 24, 2023 09:39
Copy link
Contributor

@StoneDot StoneDot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for creating pull request. I think this PR great because it opens a path to use bootstrap command in an isolated network.

I think the code quality is good. However, I have a concern regarding the size of binary, which is why I share the result of my testing in the bottom of this review.

It seems that binary size is almost doubled. It is acceptable level, however it would be nice if we can shrink this. Could you try to put compressed files on resources folder and uncompress these content at runtime? I think LZMA/LZMA2 is a good option for compressions because it is well-known by its compression ratio and there is pure rust library, lzma_rs. Also, we should measure the effectiveness. An article I found indicates that brotli may have good characteristics.

Additionally, could you change the type of pull request from cleanup to refactor?

❯ git show --no-patch
commit 96fb355bf8a08555215b54970c864344b24ffdab (HEAD -> main, upstream/main)
Author: Hirotaka Tagawa <[email protected]>
Date:   Thu Aug 17 22:08:18 2023 +0100

    fix: prevent infinite loop when DynamoDbClient connection fails

❯ cargo build -j 8 --release
**snip**

❯ ls -lh ./target/release/dy
Permissions Size User   Date Modified Name
.rwxr-xr-x   12M hiroag 24 8 18:42    ./target/release/dy

❯ gh pr checkout 153
**snip**

❯ git show --no-patch
commit 2e706dc36f531bfb67f3a2e36e63fb88c006062e (HEAD -> bootstrap-local)
Author: Hirotaka Tagawa <[email protected]>
Date:   Fri Aug 18 17:31:23 2023 +0100

    fix: use include_str macro

❯ cargo build -j 8 --release
**snip**

❯ ls -lh ./target/release/dy
Permissions Size User   Date Modified Name
.rwxr-xr-x   21M hiroag 24 8 18:45    ./target/release/dy

@StoneDot
Copy link
Contributor

StoneDot commented Aug 25, 2023

It seems that bzip2 achieves best compression ratio. brotli and lzma also achieve good compression rate. Based on the maturity of the library and the compression rate, I think rust-brotli is a best choice. Please feel free to comment your opinion.

❯ lzma -k -e moviedata.json
❯ bzip2 -9 -k moviedata.json
❯ brotli -k moviedata.json
❯ gzip -9 -k moviedata.json
❯ zstd -19 -k moviedata.json
❯ lz4 -9 -k moviedata.json
❯ ls -l | grep moviedata
.rw-r--r-- 3.7M hiroag 25 8 10:43 moviedata.json
.rw-r--r-- 480k hiroag 25 8 10:43 moviedata.json.br
.rw-r--r-- 423k hiroag 25 8 10:43 moviedata.json.bz2
.rw-r--r-- 731k hiroag 25 8 10:43 moviedata.json.gz
.rw-r--r-- 823k hiroag 25 8 10:43 moviedata.json.lz4
.rw-r--r-- 486k hiroag 25 8 10:43 moviedata.json.lzma
.rw-r--r-- 504k hiroag 25 8 10:43 moviedata.json.zst

@wafuwafu13 wafuwafu13 changed the title cleanup: set bootstrap data locally instead of download and unzip refactor: set bootstrap data locally instead of download and unzip Aug 29, 2023
@wafuwafu13 wafuwafu13 changed the title refactor: set bootstrap data locally instead of download and unzip refactor: set bootstrap data locally instead of downloading Aug 29, 2023
// Step 3. decompress data
let compressed_data = include_bytes!("./resources/bootstrap/moviedata.json.br");
let cursor = Cursor::new(compressed_data);
let mut decompressor = Decompressor::new(cursor, 4096);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in this example, set the buffer size to 4096

@wafuwafu13
Copy link
Contributor Author

wafuwafu13 commented Aug 31, 2023

$ uname -a
Darwin c889f3a95659 22.6.0 Darwin Kernel Version 22.6.0: Wed Jul  5 22:22:05 PDT 2023; root:xnu-8796.141.3~6/RELEASE_ARM64_T6000 arm64

$ cargo build -j 8 --release

$ ls -lh ./target/release/dy
-rwxr-xr-x  1 herotaka  staff    11M 31 Aug 10:24 ./target/release/dy

@wafuwafu13
Copy link
Contributor Author

wafuwafu13 commented Sep 5, 2023

Pros

  • Avoid network problems

Cons

  • Increase in binary size
    • Little or no impact on users

@StoneDot
Copy link
Contributor

StoneDot commented Sep 8, 2023

I confirmed that the difference in the size of the binary is negligible.

❯ git show --no-patch

commit 1264580374ba26c60d79068eff33569838d9b655 (HEAD -> bootstrap-local)
Author: Hirotaka Tagawa <[email protected]>
Date:   Tue Aug 29 22:43:44 2023 +0100

    refactor: decompress by brotli

❯ ls -lh ./target/release/dy
Permissions Size User   Date Modified Name
.rwxr-xr-x   12M hiroag  8 9 13:47    ./target/release/dy

Copy link
Contributor

@StoneDot StoneDot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wafuwafu13 Thank you for implementing compression. It looks good to me. Could you squash commit into a single one?

Copy link
Contributor

@StoneDot StoneDot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me. Thank you!

@StoneDot StoneDot merged commit ed46dc1 into awslabs:main Sep 28, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants