Add support for writing to the tzfile format #379
Codecov Report
@@ Coverage Diff @@
## master #379 +/- ##
==========================================
+ Coverage 94.09% 94.48% +0.39%
==========================================
Files 30 32 +2
Lines 1540 1649 +109
==========================================
+ Hits 1449 1558 +109
Misses 91 91
Continue to review full report at Codecov.
Part of the work here was also to determine if the
The tzfile format is for the most part compatible with the internal representation we use for
Let's validate that there exist time zones which change both of these offsets at once:
julia> using TimeZones
julia> tzif_incompatible(::FixedTimeZone) = false
tzif_incompatible (generic function with 1 method)
julia> tzif_incompatible(tz::VariableTimeZone) = tzif_incompatible_index(tz) !== nothing
tzif_incompatible (generic function with 2 methods)
julia> function tzif_incompatible_index(tz::VariableTimeZone)
t, remaining = Iterators.peel(tz.transitions)
std = t.zone.offset.std
dst = t.zone.offset.dst
for (i, t) in enumerate(remaining)
if std != t.zone.offset.std && dst != t.zone.offset.dst
return i
elseif std != t.zone.offset.std
std = t.zone.offset.std
else
dst = t.zone.offset.dst
end
end
return nothing
end
tzif_incompatible_index (generic function with 1 method)
julia> filter(tzif_incompatible, all_timezones())
154-element Vector{TimeZone}:
Africa/Algiers (UTC+1)
Africa/Casablanca (UTC+0/UTC+1)
Africa/El_Aaiun (UTC+0/UTC+1)
Africa/Tripoli (UTC+2)
⋮
US/Alaska (UTC-9/UTC-8)
US/Aleutian (UTC-10/UTC-9)
US/Indiana-Starke (UTC-6/UTC-5)
W-SU (UTC+3)
julia> tz = first(ans)
Africa/Algiers (UTC+1)
julia> i = tzif_incompatible_index(tz)
28
julia> tz.transitions[i:i + 1]
2-element Vector{TimeZones.Transition}:
1977-05-06T00:00:00 UTC+0/+1 (WEST)
1977-10-20T23:00:00 UTC+1/+0 (CET)
Since there exist transitions where both of these offsets change at once, we'll need to represent the DST offset as a value instead of a boolean to ensure a lossless conversion. As the
julia> dst_negative(::FixedTimeZone) = false
dst_negative (generic function with 1 method)
julia> function dst_negative(tz::VariableTimeZone)
for t in tz.transitions
if t.zone.offset.dst < Second(0)
return true
end
end
return false
end
dst_negative (generic function with 2 methods)
julia> filter(dst_negative, all_timezones()) # Validate that negative DST offsets exist
7-element Vector{TimeZone}:
Africa/Casablanca (UTC+0/UTC+1)
Africa/El_Aaiun (UTC+0/UTC+1)
Africa/Windhoek (UTC+2)
Eire (UTC+0/UTC+1)
Europe/Bratislava (UTC+1/UTC+2)
Europe/Dublin (UTC+0/UTC+1)
Europe/Prague (UTC+1/UTC+2)
julia> dst_offsets(tz::FixedTimeZone) = Set{Second}([tz.offset.dst])
dst_offsets (generic function with 2 methods)
julia> function dst_offsets(tz::VariableTimeZone)
results = Set{Second}()
for t in tz.transitions
if t.zone.offset.dst in results
@show tz t.zone.offset.dst
end
push!(results, t.zone.offset.dst)
end
return results
end
dst_offsets (generic function with 2 methods)
julia> offsets = mapreduce(dst_offsets, union, all_timezones())
Set{Second} with 7 elements:
Second(-3600)
Second(1200)
Second(5400)
Second(1800)
Second(3600)
Second(7200)
Second(0)
julia> sort!(Minute.(offsets)) # All unique DST offsets that exist in tzdata
7-element Vector{Minute}:
-60 minutes
0 minutes
20 minutes
30 minutes
60 minutes
90 minutes
120 minutes
Sticking to the one-byte value, we could use 1-minute steps to represent the DST offset, which would allow us to store offsets of about ±2 hours, or 10-minute steps for about ±21 hours. After thinking about compatibility more: the tzfile format stores both the UTC and DST offset as a single value and uses the
dst_offset = tt_isdst
utc_offset = tt_utoff - dst_offset
Encoding the DST offset this way should allow us to extract the UTC/DST offsets separately and still maintain backwards compatibility. However, I'd need to dig into this further, as we may need some sort of bit flag in the tzfile which lets us determine whether we are interpreting the file in this new way or using our existing heuristic, which attempts to extract the DST offset from past information and the
As the tzfile also always includes the v1 in addition to the new version, we can reduce both size and complexity by using a similar but custom file format. This also has the advantage of letting us encode additional information such as the
Since the tzdata will be stored in a separate Julia repo that TimeZones.jl depends on, we can make use of typical semver usage to ensure format compatibility. After getting
julia> using TimeZones
julia> tzif_incompatible(::FixedTimeZone) = false
tzif_incompatible (generic function with 1 method)
julia> tzif_incompatible(tz::VariableTimeZone) = tzif_incompatible_index(tz) !== nothing
tzif_incompatible (generic function with 2 methods)
julia> function tzif_incompatible_index(tz::VariableTimeZone)
t, remaining = Iterators.peel(tz.transitions)
prev_std = t.zone.offset.std
prev_dst = t.zone.offset.dst
for (i, t) in enumerate(remaining)
if prev_std != t.zone.offset.std && prev_dst != t.zone.offset.dst && !iszero(prev_dst) && !iszero(t.zone.offset.dst)
return i + 1
end
prev_std = t.zone.offset.std
prev_dst = t.zone.offset.dst
end
return nothing
end
tzif_incompatible_index (generic function with 1 method)
julia> filter(tzif_incompatible, all_timezones())
3-element Vector{TimeZone}:
Europe/Moscow (UTC+3)
Europe/Paris (UTC+1/UTC+2)
W-SU (UTC+3)
julia> i = tzif_incompatible_index(tz"Europe/Moscow")
9
julia> tz"Europe/Moscow".transitions[(i - 1):(i + 1)]
3-element Vector{TimeZones.Transition}:
1919-05-31T19:28:41 UTC+02:31:19/+2 (MDST)
1919-07-01T00:00:00 UTC+3/+1 (MSD)
1919-08-15T20:00:00 UTC+3/+0 (MSK)
In validating the read/write round-trip, it was easy to determine which time zones include some incompatibility between the tzfile format and our internal representation:
julia> function tzif_version(tz)
io = IOBuffer()
TZFile.write(seekstart(io), tz)
return TZFile.read(seekstart(io))(tz.name)
end
tzif_version (generic function with 1 method)
julia> tzif_tz_compatible(tz) = tzif_version(tz) == tz
tzif_tz_compatible (generic function with 1 method)
julia> filter(!tzif_tz_compatible, all_timezones())
113-element Vector{TimeZone}:
America/Argentina/Buenos_Aires (UTC-3)
America/Argentina/Catamarca (UTC-3)
America/Argentina/ComodRivadavia (UTC-3)
America/Argentina/Cordoba (UTC-3)
⋮
Israel (UTC+2/UTC+3)
Pacific/Rarotonga (UTC-10)
Portugal (UTC+0/UTC+1)
W-SU (UTC+3)
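Returning to the one-byte DST offset idea from earlier in this comment, here is a minimal sketch of how that encoding could work, assuming `tt_isdst` is treated as a signed byte (`Int8`) counting 1-minute steps. The names `encode_isdst`, `decode_isdst`, and `split_offsets` are hypothetical and not part of TimeZones.jl; only the two formulas `dst_offset = tt_isdst` and `utc_offset = tt_utoff - dst_offset` come from the discussion above.

```julia
using Dates

# Hypothetical helpers for illustration only; not part of TimeZones.jl.
# Assumption: `tt_isdst` is a signed byte counting `step`-sized units of DST offset.

function encode_isdst(dst::Second, step::Second=Second(60))
    steps, remainder = divrem(Dates.value(dst), Dates.value(step))
    remainder == 0 || throw(ArgumentError("$dst is not a multiple of $step"))
    return Int8(steps)  # throws InexactError when the offset does not fit in one byte
end

# Convert the stored step count back into a DST offset in seconds
decode_isdst(tt_isdst::Int8, step::Second=Second(60)) =
    Second(Int(tt_isdst) * Dates.value(step))

# Recover the separate UTC and DST offsets from the combined `tt_utoff`
function split_offsets(tt_utoff::Second, tt_isdst::Int8, step::Second=Second(60))
    dst_offset = decode_isdst(tt_isdst, step)
    utc_offset = tt_utoff - dst_offset
    return (utc=utc_offset, dst=dst_offset)
end

# The seven distinct DST offsets listed above, checked for a lossless round-trip
tzdata_dst = Second.([-3600, 0, 1200, 1800, 3600, 5400, 7200])
roundtrip_ok = all(d -> decode_isdst(encode_isdst(d)) == d, tzdata_dst)

# A combined offset of UTC+0 carrying a -60 minute DST offset splits into UTC+1/-1
negative_dst = split_offsets(Second(0), encode_isdst(Second(-3600)))
```

A reader that only checks whether `tt_isdst` is nonzero would still see a DST flag for every nonzero offset, which is the backwards compatibility property hoped for above. Whether existing tzfile consumers tolerate values other than 0 and 1 is not verified here.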
I'm going to forge ahead with this and make a release
* Refactor abbreviation function
* Create `transition_min` function
* Rename TZFILE_MAX to TZFILE_CUTOFF
* Create TZFile submodule
* Add deprecations
* Prefer `TZFile.read` to `read_tzfile`
* Initial TZFile.write
* Embrace closure interface
* Lower default write version
* Update to use TZFile in docs
* Rename TZFile.abbreviation to get_designation
As per #359 the plan was to use the `tzfile` format (or something similar) as the binary serialization format we use to store time zone data. The changes here add that support as well as change the current `read_tzfile` interface to be `TZFile.read`.
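For context on what writing a tzfile involves, the sketch below writes and reads only the fixed 44-byte TZif header as described in RFC 8536: the magic `TZif`, a version byte, 15 reserved bytes, and six big-endian 32-bit counts. `TZifHeader`, `write_header`, and `read_header` are illustrative names and should not be taken as the `TZFile.write` / `TZFile.read` implementation in this PR.

```julia
# Illustrative sketch of the TZif header layout (RFC 8536); hypothetical names,
# not the implementation added by this PR.
struct TZifHeader
    version::UInt8   # 0x00 for version 1, otherwise an ASCII digit such as UInt8('2')
    isutcnt::UInt32  # number of UT/local indicators
    isstdcnt::UInt32 # number of standard/wall indicators
    leapcnt::UInt32  # number of leap-second records
    timecnt::UInt32  # number of transition times
    typecnt::UInt32  # number of local time type records
    charcnt::UInt32  # number of bytes of time zone designations
end

function write_header(io::IO, h::TZifHeader)
    n = write(io, "TZif")             # 4-byte magic
    n += write(io, h.version)         # 1-byte version
    n += write(io, zeros(UInt8, 15))  # 15 reserved bytes
    for count in (h.isutcnt, h.isstdcnt, h.leapcnt, h.timecnt, h.typecnt, h.charcnt)
        n += write(io, hton(count))   # counts are stored big-endian
    end
    return n                          # total bytes written
end

function read_header(io::IO)
    String(read(io, 4)) == "TZif" || throw(ArgumentError("missing TZif magic"))
    version = read(io, UInt8)
    skip(io, 15)                      # ignore the reserved bytes
    counts = ntuple(_ -> ntoh(read(io, UInt32)), 6)
    return TZifHeader(version, counts...)
end

# Round-trip a header through an in-memory buffer
io = IOBuffer()
header = TZifHeader(UInt8('2'), 0, 0, 0, 3, 2, 8)
nbytes = write_header(io, header)
decoded = read_header(seekstart(io))
```

The data block that follows the header (transition times, type indices, local time type records, designations) is omitted here; its size is determined by the counts above.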