Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store data path hash instead of string in binary #5886

Closed
wants to merge 11 commits into from

Conversation

sffc
Copy link
Member

@sffc sffc commented Dec 10, 2024

Proof-of-concept for #4991

@sffc
Copy link
Member Author

sffc commented Dec 10, 2024

@Manishearth What do you think about this approach?

The actual path strings are used in two places:

  1. Error messages, including impl Debug
  2. FsDataProvider

This PR makes the Debug impl use the hash and FsDataProvider get the path from the registry.

One thing that puzzles me is the size of tutorials_buffer.wasm.

Commit Description Size
9e5a25b Main 250697
68fcc7e Add data marker hash 246851
f8d1505 Remove data marker path 251087
8f34fd5 Simplify Debug impl 251888

My expectation is that the size should go up on row 2 and down on rows 3 and 4, but it is basically the opposite of my expectation. I've inspected the assembly and I have no explanation for this.

@sffc sffc force-pushed the tdmh branch 2 times, most recently from 279a8d3 to 759930e Compare December 10, 2024 22:17
@Manishearth
Copy link
Member

I like this, as long as datagen is able to speak in terms of the regular paths and debugging hashes is not too hard. Which I think will be fine with the registry.

My expectation is that the size should go up on row 2 and down on rows 3 and 4, but it is basically the opposite of my expectation. I've inspected the assembly and I have no explanation for this.

Is the path included in the binary elsewhere? I bet something else includes it so it gets reused.

@sffc
Copy link
Member Author

sffc commented Dec 12, 2024

Briefly discussed in the WG call. @Manishearth and @hsivonen are supportive of the change, and @sffc will continue to investigate the binary size issue.

@sffc sffc added this to the ICU4X 2.0 Stretch ⟨P2⟩ milestone Dec 12, 2024
@sffc
Copy link
Member Author

sffc commented Jan 7, 2025

@robertbastian I still have a lot on my plate for 2.0. Would you mind taking a look at this PR, reproducing and possibly fixing the binary size regression shown above?

@robertbastian
Copy link
Member

robertbastian commented Jan 13, 2025

tutorials_buffer.wasm is now at 243150

Edit: due to the rebase the numbers from Shane's post are no longer accurate. New baselines

Commit Size
Main 242744
Add data marker hash 240770
Remove data marker path 244179
Simplify Debug impl 244847
... ...
265e9c4 243150
attributes domain 242321

@sffc
Copy link
Member Author

sffc commented Jan 14, 2025

Your latest changes look good; thanks. Still no idea why the binary size changes in the opposite direction than expected.

How do you suggest we proceed?

@robertbastian
Copy link
Member

I will stare at this more.

@sffc
Copy link
Member Author

sffc commented Jan 17, 2025

Replaced by #5981

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants