Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Scratch to hold downloads and compiled tzdata #384

Merged
merged 10 commits into from
Jul 9, 2022

Conversation

staticfloat
Copy link
Contributor

This should address many of the problems listed in #343. It still contains a Pkg.build() step to do the serialization, but it stores things in a scratch space, rather than in the package directory, and it marks the default tzdata as an eager artifact, rather than a lazy one, so it should download along with the rest of the artifacts during package installation.

The one thing I'm not sure about is whether the "version" that it checks against the IANA servers is properly cached; I see many different ways that the version can be queried, and I'm not sure if the "happy path" still makes a request against the IANA servers. I would prefer that it doesn't, and no network accesses outside of the normal Pkg flow occur, as that helps this package be used in restrictive environments where only an internal PkgServer is whitelisted.

We don't want to mutate the package directory; let's use a scratchspace
instead, and adjust all `const` variables to do runtime lookups, for
maximum relocatability.

We shift around the data a bit here, to make it more clear that the
package directory is not to be mutated.  The ideal case here now is
that, on a fresh install, nothing is downloaded for the default happy
path, the package directory is never mutated, and the build step is
minimal in cost.
@codecov-commenter
Copy link

codecov-commenter commented Jul 7, 2022

Codecov Report

Merging #384 (9c49fba) into master (4cae0ca) will increase coverage by 0.12%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #384      +/-   ##
==========================================
+ Coverage   95.12%   95.24%   +0.12%     
==========================================
  Files          35       36       +1     
  Lines        1743     1746       +3     
==========================================
+ Hits         1658     1663       +5     
+ Misses         85       83       -2     
Impacted Files Coverage Δ
src/TimeZones.jl 100.00% <100.00%> (ø)
src/discovery.jl 86.51% <100.00%> (ø)
src/types/timezone.jl 97.22% <100.00%> (+0.07%) ⬆️
src/tzdata/TZData.jl 100.00% <100.00%> (ø)
src/tzdata/build.jl 84.61% <100.00%> (+7.69%) ⬆️
src/tzdata/compile.jl 95.42% <100.00%> (-0.02%) ⬇️
src/tzdata/download.jl 91.17% <100.00%> (+0.26%) ⬆️
src/winzone/WindowsTimeZoneIDs.jl 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4cae0ca...9c49fba. Read the comment docs.

@staticfloat
Copy link
Contributor Author

As a side note, I did experiment with eliminating the local serialization step completely; I used JLD2 as a substitute, which seems to work, although it does add a few KB to each serialized file.

We could have a GitHub Actions job that automatically serializes all tzdata sets and uploads them to a GitHub release or similar, and opens a pull request if a new tzdata set is added without the paired serialized artifact. If someone wants to do this, I'd be happy to guide them through it, but I can't do it myself at the moment.

@omus
Copy link
Member

omus commented Jul 8, 2022

Thanks for this @staticfloat. I'll give this a thorough review tomorrow but I'm already sure I'd like to pair this with defaulting to the TZJFile format introduced in #380.

Copy link
Member

@omus omus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall seems good to me. I've modified some of the names to be more consistent. e.g. mixing "compiled"/"serialized"/"serialized cache" or "tz_source"/"source". I'll mention at first I was thinking using Ref constants for the directories was the correct approach but the use of functions reduces the chance of mutations there so I think it's the right direction. I'll make a follow up PR to rename "custom_tzsource_regions" and I think we can merge this. I'll integrate defaulting to the TZJFile format before making a release though

src/tzdata/build.jl Outdated Show resolved Hide resolved
src/TimeZones.jl Outdated Show resolved Hide resolved
src/discovery.jl Outdated Show resolved Hide resolved
src/types/timezone.jl Outdated Show resolved Hide resolved
src/types/timezone.jl Outdated Show resolved Hide resolved
test/ci.jl Outdated Show resolved Hide resolved
test/winzone/WindowsTimeZoneIDs.jl Outdated Show resolved Hide resolved
src/tzdata/build.jl Outdated Show resolved Hide resolved
src/tzdata/build.jl Outdated Show resolved Hide resolved
src/tzdata/build.jl Outdated Show resolved Hide resolved
@omus
Copy link
Member

omus commented Jul 8, 2022

I should call out that this could be a breaking change if people were referencing the now removed constants. I'm not too worried about this but I may use some globals defined at init-time to ensure this is non-breaking.

@omus
Copy link
Member

omus commented Jul 8, 2022

❗ We'll need to use different directories for each tzdata version. Otherwise having multiple versions of TimeZones installed at the same time with different tzdata versions will interfere with each other
– #384 (comment)

With this in mind I think it makes sense to merge #385 first as that will eliminate us having to specify the Julia version number to ensure serialization compatibility.

@omus
Copy link
Member

omus commented Jul 8, 2022

I'll mention at first I was thinking using Ref constants for the directories was the correct approach but the use of functions reduces the chance of mutations there so I think it's the right direction

After sleeping on this there may be an advantage in the Ref approach under certain conditions. For example if we continue to allow specifying the tzdata version used via the env variable JULIA_TZ_VERSION we may want to reference the path to the tzdata version loaded into memory currently instead just reflecting the current JULIA_TZ_VERSION value. Using a Ref would allow us to update this directory under specific conditions only.

src/tzdata/build.jl Outdated Show resolved Hide resolved
src/tzdata/compile.jl Outdated Show resolved Hide resolved
@omus
Copy link
Member

omus commented Jul 8, 2022

Alright in order to get things moving I'll merge this as it exists now and follow up on the issues I pointed out in a follow up PR.

@omus
Copy link
Member

omus commented Jul 8, 2022

The changes in this PR remove the constants:

TimeZones.PKG_DIR
TimeZones.DEPS_DIR
TimeZones.TZData.TZ_SOURCE_DIR
TimeZones.TZData.COMPILED_DIR
TimeZones.TZData.LATEST
TimeZones.TZData.LATEST_FILE_PATH
TimeZones.WindowsTimeZoneIDs.WINDOWS_XML_DIR
TimeZones.WindowsTimeZoneIDs.WINDOWS_XML_FILE

All of these should only be used internally by TimeZones.jl so I won't bother with trying to gracefully deprecate external usages. In the future I'll try to be more diligent in this package to mark internal contants and functions with leading underscores.

Copy link
Member

@omus omus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I agree with using nothing for this purpose this change could be breaking and doesn't seem worth it at this time. I've left a comment to address this in the future.

src/tzdata/build.jl Outdated Show resolved Hide resolved
src/tzdata/build.jl Outdated Show resolved Hide resolved
src/tzdata/build.jl Outdated Show resolved Hide resolved
@omus
Copy link
Member

omus commented Jul 9, 2022

This PR is also running into #386. Looking into this before merging here

@omus
Copy link
Member

omus commented Jul 9, 2022

This PR is also running into #386. Looking into this before merging here

Failure definitely appears unrelated.

Copy link
Member

@omus omus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few follow up changes I'd like to make but I think the fastest path forward is to merge this PR as it exists and make the additional changes in another PR

omus added a commit that referenced this pull request Jul 10, 2022
* Specify Scratch compat

* Rename custom tzsource directory

* Refactor tzdata copying logic

* Restore region selection behaviour for compile
kpamnany pushed a commit to RelationalAI-oss/TimeZones.jl that referenced this pull request May 5, 2023
* Eagerly download the default TZData

* Use `Scratch` to hold downloads and compiled tzdata

We don't want to mutate the package directory; let's use a scratchspace
instead, and adjust all `const` variables to do runtime lookups, for
maximum relocatability.

We shift around the data a bit here, to make it more clear that the
package directory is not to be mutated.  The ideal case here now is
that, on a fresh install, nothing is downloaded for the default happy
path, the package directory is never mutated, and the build step is
minimal in cost.

* Fix Windows as well

* Rename `serialized_cache_dir` to `compiled_dir`

* Rename `get_windows_xml_file_path` to `windows_xml_file_path`

* Formatting changes

* Improve backwards compatibility in `build`

Co-authored-by: Curtis Vogt <[email protected]>
kpamnany pushed a commit to RelationalAI-oss/TimeZones.jl that referenced this pull request May 5, 2023
* Specify Scratch compat

* Rename custom tzsource directory

* Refactor tzdata copying logic

* Restore region selection behaviour for compile
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants