-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversion of VarName to/from string #100
Conversation
@sunxd3 @yebai I'm not entirely sure what interface we want for this, so would like to ask you all for feedback. You can see what it currently does in the tests I added. Specifically:
Alternatively, we could go the whole way and serialise the entire internal structure of VarName. That would be the most correct thing to do, although I don't currently get the sense we need to go that far. |
Is it sensible to overload
I'm not sure I understand this; can you clarify? |
I had a look at that earlier, but it skips the 'string' bit and goes straight to 'IO stream' serialize(stream::IO, value)
deserialize(stream) meaning that it's not possible to use these methods as part of something larger (e.g. de/serialising a struct that itself contains Other serialisation libraries like Serde are composable, but we probably don't want to bring in another dependency :(
By this, I'm talking about storing the exact types that VarName uses internally, which would let us correctly differentiate between concretised colons and abstract colons. julia> using AbstractPPL; x = ones(15)
julia> dump(@varname(x[:], true))
VarName{:x, Accessors.IndexLens{Tuple{AbstractPPL.ConcretizedSlice{Int64, Base.OneTo{Int64}}}}}
optic: Accessors.IndexLens{Tuple{AbstractPPL.ConcretizedSlice{Int64, Base.OneTo{Int64}}}}
indices: Tuple{AbstractPPL.ConcretizedSlice{Int64, Base.OneTo{Int64}}}
1: AbstractPPL.ConcretizedSlice{Int64, Base.OneTo{Int64}}
range: Base.OneTo{Int64}
stop: Int64 15
julia> dump(@varname(x[:]))
VarName{:x, Accessors.IndexLens{Tuple{Colon}}}
optic: Accessors.IndexLens{Tuple{Colon}}
indices: Tuple{Colon}
1: Colon() (function of type Colon) |
I don't have a strong view on this; any thoughts from @mhauru @sunxd3 @torfjelde? |
I don't really understand the how concretisation gets used, so no strong opinion. I do agree with @penelopeysm's comment that dumping the entire structure, so one can reconstruct an object that passes a |
A julia> x = ones(10)
10-element Vector{Float64}:
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
julia> vn1 = @varname(x[end])
x[10]
julia> x = ones(100)
100-element Vector{Float64}:
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
⋮
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
julia> vn2 = @varname(x[end])
x[100] this is useful to support general Julia program in Turing model. (note |
regarding how we should convert I think saving the whole internal would be good, but might makes it hard to read. It would be nice to have an concret example where this conversion might be used. (an example I can think of is from parameter names in |
This was my use case that came up. I think having the same format both machine-readable and human-readable is great, but if that's hard, we can always have one method for a full string representation and another for pretty-printing a summary. |
Thanks all! The thing with Chains is that the varname will always be concretised: using Turing
@model function demo(y)
x = Vector{Real}(undef, 2) # so that x[:] is concretised
x[:] ~ MvNormal(fill(0, 2), 1)
y ~ Normal(x[end], 1)
end
chain = sample(demo(1), NUTS(), 100, progress=false)
vn = collect(keys(chain.info.varname_to_symbol))[1] # x[:][1]
dump(vn) gives
So for this usecase, it seems sensible to say that roundtrip deserialisation need only work for concretised varnames? |
yeah, that would make sense to me. also |
I'm personally very much in favour of storing a "true" representation of |
if all we want is serialization, then yeah, storing the full thing would be the right way to go. but do we want to use this "string-ify" also for the also remind me of that we need to deal with non-standard variable name like |
Depends on what the application for this is (I'm somewhat OOL here). IMO, there are currently two "use-cases" for
(1) is fair, but requires no notion of "deserialization". (2) is not a good motivation in my opinion. Instead, we should just be using |
Current behaviour is much improved, I think, and works for both unconcretised and concretised slices, as well as the using AbstractPPL
using Printf
y = ones(10)
vns = [
@varname(x),
@varname(x.a),
@varname(x.a.b),
@varname(var"x.a"),
@varname(x[1]),
@varname(x[1:10]),
@varname(x[1, 2]),
@varname(x[:]),
@varname(y[begin:end]),
@varname(y[:], false),
@varname(y[:], true),
]
for vn in vns
@printf "%10s => %s\n" vn vn_to_string(vn)
end
It fails if somebody tries to use custom arrays, – presumably because julia> using OffsetArrays: Origin
julia> z = Origin(4)(ones(10))
10-element OffsetArray(::Vector{Float64}, 4:13) with eltype Float64 with indices 4:13:
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
julia> vn = @varname(z[:], true)
z[:]
julia> vn_from_string(vn_to_string(vn)) == vn
ERROR: UndefVarError: `OffsetArrays` not defined
Stacktrace:
[1] top-level scope
@ none:1
[2] eval
@ ./boot.jl:385 [inlined]
[3] eval
@ ~/ppl/appl/src/AbstractPPL.jl:1 [inlined]
[4] nt_to_optic(nt::@NamedTuple{type::String, indices::String})
@ AbstractPPL ~/ppl/appl/src/varname.jl:783
[5] vn_from_string(str::String)
@ AbstractPPL ~/ppl/appl/src/varname.jl:808
[6] top-level scope
@ REPL[63]:1 |
Co-authored-by: Tor Erlend Fjelde <[email protected]>
Okay, I've reimplemented the {de,}serialisation with In the benchmarks below, Click here to see the benchmarking code.using AbstractPPL
using BenchmarkTools
y = ones(10)
z = ones(10, 5)
vns = [
@varname(x),
@varname(x.a),
@varname(x.a.b),
@varname(var"x.a"),
@varname(x[1]),
@varname(x[1:10]),
@varname(x[1, 2]),
@varname(x[:]),
@varname(y[begin:end]),
@varname(y[:], false),
@varname(y[:], true),
@varname(z[:], false),
@varname(z[:], true),
@varname(z[:,:], false),
@varname(z[:,:], true),
]
# Method 1
println("Old")
strs = map(vn_to_string, vns)
@benchmark map(vn_from_string, strs)
# Method 2
println("New")
strs = map(vn_to_string2, vns)
@benchmark map(vn_from_string2, strs) julia> @benchmark map(vn_from_string, strs)
BenchmarkTools.Trial: 1649 samples with 1 evaluation.
Range (min … max): 2.873 ms … 74.691 ms ┊ GC (min … max): 0.00% … 95.75%
Time (median): 2.920 ms ┊ GC (median): 0.00%
Time (mean ± σ): 3.032 ms ± 1.772 ms ┊ GC (mean ± σ): 1.53% ± 2.80%
Memory estimate: 242.80 KiB, allocs estimate: 4126.
julia> @benchmark map(vn_from_string2, strs)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 135.750 μs … 66.464 ms ┊ GC (min … max): 0.00% … 99.57%
Time (median): 138.666 μs ┊ GC (median): 0.00%
Time (mean ± σ): 149.350 μs ± 665.882 μs ┊ GC (mean ± σ): 5.97% ± 3.66%
Memory estimate: 105.80 KiB, allocs estimate: 1285. The only cost is that this drops support for any Note that the original implementation already didn't permit ranges that were not defined in |
It's only called for ConcretizedSlice now, which could potentially be removed too.
Remaining todos if we're happy with the implementation:
|
c5016aa
to
73a426b
Compare
Great stuff @penelopeysm :)
If we're lucky, they might be open to adding StructTypes.jl as a conditional dependency in Accessors.jl, which would probably make this even nicer:) |
At this point, would it make sense to just move all the varname related stuff to a separate package, e.g. VarNames.jl? This way we could easily use it in MCMCChains for example. |
Hmm yes an AccessorsStructTypesExt would make sense! One problem is that we are only really implementing serialisation for a subset of the types in Accessors (i.e. the types we care about). I could open an issue to see whether they would be interested in the code.
AbstractPPL doesn't really have much code that isn't in |
I think the tradeoff between the new dependency and speed of deserialize is worth it, so the new implementation is good with me. This is fantastic, Penny!
I am probably missing something, why is it difficult now? (I think |
Actually, coming back to this with a fresh pair of eyes, here's an even faster one which just goes via Dict -> JSON instead of using StructTypes: julia> @benchmark map(vn_from_string2, strs)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 68.208 μs … 62.339 ms ┊ GC (min … max): 0.00% … 99.75%
Time (median): 70.583 μs ┊ GC (median): 0.00%
Time (mean ± σ): 77.770 μs ± 623.365 μs ┊ GC (mean ± σ): 8.61% ± 1.89%
Memory estimate: 42.80 KiB, allocs estimate: 677. |
121d1b0
to
fcd81b0
Compare
AbstractPPL.jl has a bunch of features that are completely unnecessary for MCMCChains.jl. There are several places where it would be beneficial to use |
Do we lose some extensibility here though? My "hope" was that if the "serialization interface" is external from AbstractPPL.jl, we'd be able to suggest other packages to also implement it through and extension or something, leading to |
Actually the StructTypes implementation wasn't very extensible either 😅 because it doesn't use StructTypes 'all the way down' – it just uses it for the top layer, and the bottom layers (most importantly, indices) are serialised manually: Lines 866 to 899 in 73a426b
For composability purposes I see a couple of options we can do:
Option 1 would be the correct solution if we believe that there is one true serialisation method for those upstream types, which every consumer should work with. I think this is a noble aim! but I feel very hesitant in proposing so many upstream changes :( As a case study, our ConcretizedSlices (usually) wrap julia> using StructTypes, JSON3
julia> s = JSON3.write(Base.OneTo(5))
"[1,2,3,4,5]" Unfortunately, deserialising this results in a loss of information, because there is no information in the string that says that it should be converted to julia> JSON3.read(s)
5-element JSON3.Array{Int64, Base.CodeUnits{UInt8, String}, Vector{UInt64}}:
1
2
3
4
5 and although these are semantically the same (I discovered that
To fix this, we would have to patch StructTypes to serialise "{\"stop\":5,\"type\":\"Base.OneTo\"}" so that we could use that Let's discuss this on Monday though :) |
34d229c
to
446dd76
Compare
01cba79
to
e0e5322
Compare
I think this is ready for a final review. I wrote up docs here too which show how you can extend the interface to handle custom types: http://turinglang.org/AbstractPPL.jl/previews/PR100/api/#VarName-serialisation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me; thanks @penelopeysm!
Closes #98.