Skip to content

Conversation

@byroot
Copy link
Contributor

@byroot byroot commented Nov 24, 2025

Ref: 1eb715d

It seems like this data was initially YAML, but 13 years ago various version of libyaml had issues, so Marshal was used.

The problem with marshal is it's a binary format, so while this data has no reason to change, it's hard to see what's in it.

I think a simple Ruby constant would work just as well. On my machine, both take as long to load, but many apps have bootsnap setup, so would probably load the ruby version faster than the marshal one.

Also as a sidenote, as far as I can tell, COMPOSITION_TABLE is unused, getting rid of it would save a bit of memory and boot time.

Ref: 1eb715d

It seems like this data was initially YAML, but 13 years ago
various version of libyaml had issues, so Marshal was used.

The problem with marshal is it's a binary format, so while this
data has no reason to change, it's hard to see what's in it.

I think a simple Ruby constant would work just as well.
On my machine, both take as long to load, but many apps have
bootsnap setup, so would probably load the ruby version faster
than the marshal one.
@dentarg
Copy link
Collaborator

dentarg commented Nov 24, 2025

I like this change. Thanks for doing it. Hopefully people use editors with code folding if they need to interact with this file. :)

Also as a sidenote, as far as I can tell, COMPOSITION_TABLE is unused, getting rid of it would save a bit of memory and boot time.

Do you mind adding a issue about it or making a PR?

@dentarg dentarg merged commit 2252813 into sporkmonger:main Nov 24, 2025
33 checks passed
@dentarg
Copy link
Collaborator

dentarg commented Nov 24, 2025

I'll note that I did Marshal.load myself and things checked out.

If someone else does what I did, the naive copy-paste from irb, do note that 12644 and 65440 are a bit special (they did not carry over in the copy-paste).

Screenshot 2025-11-24 at 23 24 04
Screenshot 2025-11-24 at 23 24 16

Comment on lines -36 to -38
UNICODE_TABLE = File.expand_path(
File.join(File.dirname(__FILE__), '../../..', 'data/unicode.data')
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it is a breaking change that Addressable::IDNA::UNICODE_TABLE no longer exist. Looks like it only existed if you required addressable/idna/pure.

@byroot
Copy link
Contributor Author

byroot commented Nov 25, 2025

Hopefully people use editors with code folding if they need to interact with this file. :)

Initially I did put the constant in its own file, but then I discovered the test suite use load and const_remove, so the slit file was breaking lots of tests :/.

Do you mind adding a issue about it or making a PR?

Will do tomorrow.

I wonder if it is a breaking change that Addressable::IDNA::UNICODE_TABLE no longer exist.

Depends how you define a breaking change. On paper it was a public constant, in practice, it had 0 usefulness, so I can't imagine its removal breaking any reasonable code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants