OndaEDF should use EDF-representable encoding parameters (physical min & max) #90

ericphanson · 2023-10-18T11:48:12Z

EDF.jl requires the caller to encode the data
EDF.jl asks the caller for the encoding parameters (physical_minimum and physical_maximum) and then tries to store those in the file as 8 ASCII characters
OndaEDF.jl is one such caller
OndaEDF.jl performs a new re-encoding phase which can choose difficult-to-represent encoding parameters

One approach here is to make EDF.jl better at representing encoding parameters, e.g. using scientific notation. I think improving EDF.jl's representation is a good thing to do. However, since OndaEDF gets to choose it's encoding, and a wide variety of encoding parameters can perform reasonably well for a given recording, I claim it should choose easy-to-represent ones.

That is, we could round physical_minimum down to the nearest EDF representable value, and physical_maximum up to the nearest EDF representable value before encoding. Thus, there should be no truncation/rounding when EDF encodes that parameter, and thus reduces unnecessary quantization noise.

As a very silly example, the current approach can result in 5 orders of magnitude more encoding noise, simply due to the truncation of encoding parameters:

using OndaEDF, Onda
scale = 1e6
info = SamplesInfoV2(; sensor_type="eeg",
    channels=["fp1"],
    sample_unit="microvolt",
    sample_resolution_in_unit=0.25,
    sample_offset_in_unit=scale,
    sample_type=Int16,
    sample_rate=1)

data = reshape([scale + 0.999999, scale + 1.999999], 1,:)
samples = Samples(data, info, false)

edf = onda_to_edf([samples])
io = IOBuffer()
EDF.write(io, edf)
seekstart(io)
f = EDF.File(io)

(samples2,), plan = edf_to_onda_samples(f)
samples2 = Onda.decode(samples2)

yields

julia> # roundtrip through encoding alone, without EDF truncation of encoding parameters
       quantization_error = abs(samples.data[1] - decode(OndaEDF.reencode_samples(samples)).data[1])
1.00000761449337e-6

julia> # roundtrip through EDF encoding
       edf_error = abs(samples.data[1] - samples2.data[1])
0.375044776359573

julia> log10(edf_error / quantization_error)
5.574079814047871

Note in both cases we are using OndaEDF.reencode_samples to re-encode them. The only difference is passing through EDF.jl writing, which rounds/truncates these parameters.

The text was updated successfully, but these errors were encountered:

klaff · 2023-10-18T17:00:31Z

EDF (I presume) was designed as a format for data generated by A/D converters, which (excluding μ-law or other non-linear types) has a natural resolution (how much is 1 lsb worth?) and a span or range of valid values.

On the other hand, if EDF is used as an output format from some complex process, the contextual clues from the data's origin may be unavailable (or never existed) and the problem changes to one of finding the best mapping given the limits of the format (e.g. 8 char ASCII representation of mapping parameters and Int16 samples).

Those two constraints can play against each other. The EDF error reported above disappears if one chooses sample_resolution_in_unit=1, but that's because we are lucky that a favorable set of physical min/max work out for that case (967232.0f0, 1.032767f6).

The error does not disappear if one naïvely chooses sample_resolution_in_unit=1/2^16. This would maximize the resolution within the span but the physical min/max is broken by the 8-char ASCII limitation (999999.5f0, 1.0f6).

Is there an algorithmic way to find the optimal mapping into EDF? Would optimum be defined as minimum maximum error over a whole signal? Minimum RMS error?

Should there be a function to determine the error in a given case?

ericphanson · 2023-10-19T11:01:38Z

I'm not sure if I'm am following 100% but I think "two constraints can play against each other" refers to the two independent sources of error:

quantization error (in general, from encoding & decoding)
truncation-of-encoding-parameter-error which is due to EDF's 8-character representation of the encoding parameters
- note this truncation happens AFTER encoding, meaning the decoding gets slightly-wrong parameters

I don't think these two are really fundamentally in tension however! Why?

minimizing quantization error depends on setting relatively tight bounds to maximize the resolution we can represent with Int16 numbers. That is, we want physical_minimum and physical_maximum to be near the actual min/max of the data
- note: as you pointed out, if the resolution is actually limited by the underlying recording device, we should just use those parameters, which might give less resolution that we can squeeze out of Int16's but represents the actual resolution of the signal. But if we don't know the resolution of the recording device, then maybe it does make sense to squeeze as much resolution as possible out.
minimizing additional error incurred due to truncation of encoding parameters depends on the value being close to an 8-character digit
- but there's a lot of such "exactly representable" values spread throughout the floating point space!

We can use

function _nearest_representable_edf_value(x, rm::RoundingMode=RoundNearest)
    return round(x, rm; digits=(8 - (ndigits(floor(Int, x)) + signbit(x) + !isinteger(x))))
end

to choose one, deciding on whether or not we want to look up or down to get it. (See beacon-biosignals/EDF.jl#76 for the bugfix I've made to this function compared to it's version in EDF.jl).

So I think if we just choose physical_min = _nearest_representable_edf_value(signal_min, RoundDown) and physical_max = _nearest_representable_edf_value(signal_max, RoundUp) we can eliminate truncation error without significantly changing the quantization error.

BTW, I will include some code below I have used while investigating, in case it is useful (and to put it somewhere).

Code

using Printf
function edf_roundtrip(x::Real)
    result = missing
    if isinteger(x)
        str = string(trunc(Int, x))
        if length(str) <= 8
            result = str
        end
    else
        fpart, ipart = modf(x)
        ipart_str = string('-'^signbit(x), Int(abs(ipart))) # handles `-0.0` case
        fpart_str = @sprintf "%.7f" abs(fpart)
        fpart_str = fpart_str[3:end] # remove leading `0.`
        if length(ipart_str) < 7
            result = ipart_str * '.' * fpart_str[1:(7-length(ipart_str))]
        elseif length(ipart_str) <= 8
            result = ipart_str
        end
    end
    if !ismissing(result)
        return parse(Float64, result)
    end
    error("failed to fit header field into EDF's 8 ASCII character limit. Got: $x")
    return nothing
end

edf_roundtrip_error(x) = abs(x - edf_roundtrip(x))

function _nearest_representable_edf_time_value(x, rm::RoundingMode=RoundNearest)
    return round(x, rm; digits=(8 - (ndigits(floor(Int, x)) + signbit(x) + !isinteger(x))))
end
decode(sample, dmin, dmax, pmin, pmax) = ((sample - dmin) / (dmax - dmin)) * (pmax - pmin) + pmin

encode(sample, resolution_in_unit, offset_in_unit) = round(Int16, clamp((sample - offset_in_unit) / resolution_in_unit, typemin(Int16), typemax(Int16)))


function quantization_error(sample, dmin, dmax, pmin, pmax)
    sample_resolution_in_unit = (pmax - pmin) / (dmax - dmin)
    sample_offset_in_unit = pmin - (sample_resolution_in_unit * dmin)

    encoded_sample = encode(sample, sample_resolution_in_unit, sample_offset_in_unit)
    decoded_sample = decode(encoded_sample, dmin, dmax, pmin, pmax)

    return abs(sample - decoded_sample)
end

function quantization_plus_thresholding_error(sample, dmin, dmax, pmin, pmax)
    sample_resolution_in_unit = (pmax - pmin) / (dmax - dmin)
    sample_offset_in_unit = pmin - (sample_resolution_in_unit * dmin)
    encoded_sample = encode(sample, sample_resolution_in_unit, sample_offset_in_unit)
    decoded_sample = decode(encoded_sample, edf_roundtrip(dmin), edf_roundtrip(dmax), edf_roundtrip(pmin), edf_roundtrip(pmax))
    return abs(sample - decoded_sample)
end

additional_error_due_to_truncation = (args...) -> (quantization_plus_thresholding_error(args...) - quantization_error(args...))


additional_quantization_error_due_to_rounding = (sample, dmin, dmax, pmin, pmax) -> (quantization_error(sample, _nearest_representable_edf_time_value(dmin, RoundDown), _nearest_representable_edf_time_value(dmax, RoundUp), _nearest_representable_edf_time_value(pmin, RoundDown), _nearest_representable_edf_time_value(pmax, RoundUp)) - quantization_error(sample, dmin, dmax, pmin, pmax))

function error_curve(f, pmin, pmax)

    dmin, dmax = Float64(typemin(Int16)), Float64(typemax(Int16))
    xs = range(pmin, pmax; length=100)
    ys = [f(sample, dmin, dmax, pmin, pmax) for sample in xs]
    return (xs, ys)
end


using CairoMakie

pmin, pmax = -3200.0, 3200.0 # or pmin, pmax = 1e6 + 0.999999, 1e6 + 1.999999

# periodic, probably an analytic formula
lines(error_curve(quantization_error, pmin, pmax)...)

lines(error_curve(quantization_plus_thresholding_error, pmin, pmax)...)

lines(error_curve(additional_error_due_to_truncation, pmin, pmax)...)

lines(error_curve(additional_quantization_error_due_to_rounding, pmin, pmax)...)

BTW: this is why I've been saying that I don't think stuff like beacon-biosignals/EDF.jl#70 is really getting at the root problem here. If we represent more numbers accurately, that's helpful, but really we should be choosing exactly-representable encoding parameters from the start, since we both get to choose them here and know the target is EDF.jl, and can thus eliminate truncation error completely, just by calling _nearest_representable_edf_value appropriately before encoding. (EDF.jl is too late, it gets the results after encoding).

klaff · 2023-10-20T14:22:02Z

@ericphanson Did you mean for the funcitons _nearest_representable_edf_time_value and _nearest_representable_edf_value to be different in some way?

klaff · 2023-10-20T14:38:49Z

@ericphanson I don't have time today to keep up with all the great detail you've put in here, but let me know if I've got the TL;DR summary correct: The root of the problem in your original example was choosing offset and resolution at the Onda stage without considering that the chosen encoding parameters are not exactly representable in EDF.

ericphanson · 2023-10-20T14:52:46Z

@ericphanson Did you mean for the funcitons _nearest_representable_edf_time_value and _nearest_representable_edf_value to be different in some way?

No, sorry, they are the same. The time one is the name in EDF.jl but it has nothing to do with time so I changed the name posthoc in an attempt at clarity here .

@ericphanson I don't have time today to keep up with all the great detail you've put in here, but let me know if I've got the TL;DR summary correct: The root of the problem in your original example was choosing offset and resolution at the Onda stage without considering that the chosen encoding parameters are not exactly representable in EDF.

Yes! That was actually what I was trying to say with this whole issue.

ericphanson mentioned this issue Oct 18, 2023

RFC: More principled errors beacon-biosignals/EDF.jl#75

Open

kleinschmidt assigned ericphanson Oct 19, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

OndaEDF should use EDF-representable encoding parameters (physical min & max) #90

OndaEDF should use EDF-representable encoding parameters (physical min & max) #90

ericphanson commented Oct 18, 2023

klaff commented Oct 18, 2023

ericphanson commented Oct 19, 2023 •

edited

Loading

klaff commented Oct 20, 2023

klaff commented Oct 20, 2023

ericphanson commented Oct 20, 2023 •

edited

Loading

OndaEDF should use EDF-representable encoding parameters (physical min & max) #90

OndaEDF should use EDF-representable encoding parameters (physical min & max) #90

Comments

ericphanson commented Oct 18, 2023

klaff commented Oct 18, 2023

ericphanson commented Oct 19, 2023 • edited Loading

klaff commented Oct 20, 2023

klaff commented Oct 20, 2023

ericphanson commented Oct 20, 2023 • edited Loading

ericphanson commented Oct 19, 2023 •

edited

Loading

ericphanson commented Oct 20, 2023 •

edited

Loading