Please use my IntegerOrFloat. I added a Negative for you #248

Open
ctrlcctrlv opened this issue Jan 13, 2022 · 16 comments

Comments

@ctrlcctrlv
Contributor

https://github.com/MFEK/integer_or_float.rlib

Please use it instead of your own. It's making merging our implementations more challenging than it needs to be. I think it's obvious my implementation is far more complete, but if you want clarification on any point I'm happy to oblige.

As stated in title, I added a Negative for you:
https://github.com/MFEK/integer_or_float.rlib/blob/22862f7072821e22b404f9a46de34e6ad991f627/src/lib.rs#L23-L29

@madig
Collaborator

madig commented Jan 14, 2022

Oof, that's a lot of code. What made you not just use f64s for stuff?

@ctrlcctrlv
Contributor Author

Because that wouldn't be correct and could cause me to write back data that wasn't as I read it.

@ctrlcctrlv
Contributor Author

Also, IOF is a lot of code, but it has zero hard dependencies and even supports #[no_std]: https://github.com/MFEK/integer_or_float.rlib/blob/main/Cargo.toml

@cmyr
Member

cmyr commented Jan 31, 2022

Sorry for the delay, I had a response to this typed up but I forgot to hit send!

That certainly looks quite thorough, so I want to make sure I understand what the various tradeoffs are. In general, the more non-std code we're using, the more likely we are to have bugs, higher maintenance costs, longer compilation times, and larger binary sizes, so I'd like to better understand how you're approaching this.

Per my understanding, IntegerOrFloat is essentially a convention around serialization. Because XML plists have separate real and integer types, both are supported.

My decision in norad was to only deal with that at the serialization boundary. This saves a lot of complexity inside the library, and by using f64 we can represent up to 52-bit integers losslessly. This should more than cover any possible intentional value in a real font file. (see note below)
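
(For illustration, a minimal sketch of what that serialization-boundary decision could look like. The helper name is hypothetical; this is not norad's actual API.)

    // Hypothetical sketch, not norad's real serializer. An f64 represents every
    // integer with magnitude up to 2^53 exactly, so fract() == 0.0 is a reliable
    // "this value is an integer" test across any plausible font coordinate.
    fn serialize_number(value: f64) -> String {
        const MAX_EXACT_INT: f64 = 9_007_199_254_740_992.0; // 2^53
        if value.fract() == 0.0 && value.abs() <= MAX_EXACT_INT {
            format!("{}", value as i64) // write an <integer>
        } else {
            format!("{}", value) // write a <real>
        }
    }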

Because that wouldn't be correct and could cause me to write back data that wasn't as I read it.

Unless we abandon floating point completely in favour of some true real-number type, this goal is not 100% attainable. What's more, according to the spec, if a value can be represented as an integer (i.e. it has no fractional part) then it should be. So even if you read in 1.000, writing out 1 is what the spec wants. Regardless, assuming we are using a floating-point implementation internally, even if we wanted to perfectly match something on disk, it would always be possible to have a number in a glyph file that was not perfectly representable in floating point, and so we would ultimately end up writing back something different from what we found.

More worryingly, the spec allows for numbers to be of arbitrary length and precision. This means that there is no guarantee that a number can be represented in a float or a native integer type, because it would be "legal" to have a number in your glyph with thousands of digits. If we wanted to handle this 'correctly', we would need to use some sort of fancy real numbers library.

My current position is that we should be pragmatic here: specifically, I mean that we should take the set of values that we can conceivably imagine in a production font file, and we should handle those. I believe using f64 internally more than covers this, even if we imagine a PPEM of 100,000.

In any case, I'm curious to hear your thoughts, and if there are any specific cases you can think of where our approach is inadequate or produces unexpected results. :)

@ctrlcctrlv
Contributor Author

Thanks for the thoughtful reply; I really appreciate it. I'm trying to be nicer on issue trackers; it's a problem for me, mostly because I tend to see my positions as obvious, so I'm learning to be more what Christians call charitable.

I'll try to explain why I made IOF.

I'm aware of what the spec says about shortening 1.000 to 1. The primary thing IntegerOrFloat seeks to enforce is TTF (round-to-int expectation) vs CFF (float-OK expectation). What I mean by "Because that wouldn't be correct and could cause me to write back data that wasn't as I read it" is that my maths operations might otherwise cause me to read, let's say, "1", then do many operations, and now have "3.000003" in memory if I were holding an f64. But if I were holding an IOF, my hope is that a future glifparser (yes, not even the one I have 😆) could see that for this point we used to be holding an integer and are now holding a float, and so, after a series of operations, restore the integer-or-floatness of the value. I would implement this by giving each IOF a private enum field, original_type, being either Integer or Float. Then, when the IOF is getting written, it can force itself back to its "original" type.

And I can't just do this based on the curve type in use, because many CFF fonts have the same int expectation (as they want to be cleanly convertible to TTF). So I'd really want to enforce this at the glifparser level: if all values in this contour were int, we strictly follow the IOF original_type; if any value is float, we allow any value to be float even post-operations (because an IOF can change its type if you multiply it by a float).

For some people this might seem "too magical", I'm sure, but I feel it's justified given the constraints we are working under. (Some glyphs must always retain their integer-ness, but we can't really know which ones in any way besides what I've described.) And glifparser indeed has many modes; this would become just another mode, enabled by default in MFEK applications but changeable by the user.
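
(For illustration, a minimal sketch of the original_type idea described above. The names are hypothetical and this is not the actual integer_or_float implementation.)

    // Hypothetical sketch of "remember what I originally was", not the real IOF.
    #[derive(Clone, Copy, PartialEq)]
    enum OriginalType {
        Integer,
        Float,
    }

    #[derive(Clone, Copy)]
    struct IntegerOrFloat {
        value: f32,
        // Private record of how the value was first parsed, so a writer can
        // restore its "integer-or-floatness" after lossy math operations.
        original_type: OriginalType,
    }

    impl IntegerOrFloat {
        // At write time, force the value back to its original type: a point
        // read as "1" that drifted to 1.000003 in memory is written as "1".
        fn write(&self) -> String {
            match self.original_type {
                OriginalType::Integer => format!("{}", self.value.round() as i32),
                OriginalType::Float => format!("{}", self.value),
            }
        }
    }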

@ctrlcctrlv
Contributor Author

But also, in terms of practicality: because of these plans and concerns of mine, IOF permeates the codebase. It's a vital core type (you might be able to tell given all the work I've done on it and the absurd number of traits it supports; you can have an IOF that reports the number of Units of Least Precision between itself and another IOF via approx_eq::Ulps), and I'm having trouble figuring out how to even interoperate. If I did the work, maybe we could meet in the middle: you could have a non-default feature which would use MFEK's IOF. My IOF is already extremely "magical" and supports, well, hundreds(?) of traits, just from the docs here:

Add<IntegerOrFloat>
Add<f32>
Add<f64>
Add<i16>
Add<i32>
Add<i64>
Add<i8>
Add<isize>
Add<u16>
Add<u32>
Add<u64>
Add<u8>
Add<usize>
ApproxEq
AsMut<IntegerOrFloat>
AsRef<IntegerOrFloat>
Clone
Copy
Debug
Default
Deserialize<'de>
Display
Div<IntegerOrFloat>
Div<f32>
Div<f64>
Div<i16>
Div<i32>
Div<i64>
Div<i8>
Div<isize>
Div<u16>
Div<u32>
Div<u64>
Div<u8>
Div<usize>
From<IntegerOrFloat>
From<f32>
From<f64>
From<i16>
From<i32>
From<i64>
From<i8>
From<isize>
From<u16>
From<u32>
From<u64>
From<u8>
From<usize>
FromStr
Mul<IntegerOrFloat>
Mul<f32>
Mul<f64>
Mul<i16>
Mul<i32>
Mul<i64>
Mul<i8>
Mul<isize>
Mul<u16>
Mul<u32>
Mul<u64>
Mul<u8>
Mul<usize>
Neg
Num
One
PartialEq<IntegerOrFloat>
PartialOrd<IntegerOrFloat>
Rem<IntegerOrFloat>
Rem<f32>
Rem<f64>
Rem<i16>
Rem<i32>
Rem<i64>
Rem<i8>
Rem<isize>
Rem<u16>
Rem<u32>
Rem<u64>
Rem<u8>
Rem<usize>
Serialize
Sub<IntegerOrFloat>
Sub<f32>
Sub<f64>
Sub<i16>
Sub<i32>
Sub<i64>
Sub<i8>
Sub<isize>
Sub<u16>
Sub<u32>
Sub<u64>
Sub<u8>
Sub<usize>
ToPrimitive
TryFrom<&'_ str>
Ulps
Zero
RefUnwindSafe
Send
Sync
Unpin
UnwindSafe
Any
Borrow<T>
BorrowMut<T>
DeserializeOwned
From<T>
Into<U>
NumOps<Rhs, Output>
ToOwned
ToString

Now, if you compile without serde, obviously there's no Deserialize and so on. And your own serde feature, if you have one, can easily be forwarded to IOF, e.g. serde = ["integer_or_float/serde", …]

@madig
Collaborator

madig commented Feb 1, 2022

I'm aware of what the spec says about shortening 1.000 to 1. The primary thing IntegerOrFloat seeks to enforce is TTF (round-to-int expectation) vs CFF (float-OK expectation). What I mean by "Because that wouldn't be correct and could cause me to write back data that wasn't as I read it" is that my maths operations might otherwise cause me to read, let's say, "1", then do many operations, and now have "3.000003" in memory if I were holding an f64.

I appreciate the desire to be thorough, but how realistic is it that you have a CFF font where the precision of 3.000003 is war-deciding? My hunch is that you can actually limit the precision to, I dunno, 4 decimal places and get away with it on asymptotically all fonts in existence.

Addendum: I actually see the presence of floating-point numbers in sources where the output format is TTF as a sign that something is going wrong somewhere, and these numbers tend to stand out in diffs.
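
(For illustration, a rough sketch of the precision clamp suggested above. Four decimal places is an arbitrary choice and the helper name is hypothetical; this is not something norad does today.)

    // Hypothetical helper: round a coordinate to a fixed number of decimal
    // places before writing it out, so float-math noise doesn't reach the file.
    fn clamp_precision(value: f64, decimals: i32) -> f64 {
        let factor = 10f64.powi(decimals);
        (value * factor).round() / factor
    }

    // clamp_precision(3.000003, 4) == 3.0
    // clamp_precision(12.345678, 4) is approximately 12.3457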

@ctrlcctrlv
Contributor Author

I'm thinking more about the case where you want to apply a matrix to an entire font (a skew, scale, etc.), or change the UPM, or raise/lower the ascender/descender: common font-editor operations.

@madig
Collaborator

madig commented Feb 1, 2022

Ah, ok. I think e.g. Glyphs.app handles it with a bit of user intervention. It will round stuff after glyph ops, unless you disable some font-wide option in the font info dialog somewhere. Typically, you really want integers, always. But yes, it's tricky.

@ctrlcctrlv
Contributor Author

Right, meanwhile I want to make the decision based on what the contour already looks like, at write time instead of after the operations, so that no precision is lost until it needs to be.

@cmyr
Member

cmyr commented Feb 2, 2022

Thanks for the clear explanation @ctrlcctrlv, I really appreciate it.

In this particular case (floating-point error introduced during division/multiplication), maybe there is another solution, which is to check this at serialization time. That is, if a given number has a fractional part, we check whether that fractional part is greater than some epsilon, and if it is not, then we serialize as an integer? I'd have to double-check, but there should be some way to ensure we catch those extreme values that might be introduced by floating-point math.
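
(For illustration, a sketch of that epsilon check, with a hypothetical helper name and an arbitrary threshold. Comparing against the nearest integer, rather than the raw fractional part, also catches values that drift slightly below an integer, such as 0.999999.)

    // Hypothetical sketch, not norad's actual serializer.
    const SNAP_EPSILON: f64 = 1e-5;

    fn serialize_snapping(value: f64) -> String {
        let nearest = value.round();
        if (value - nearest).abs() < SNAP_EPSILON {
            // Close enough to an integer: assume float-math noise, write an <integer>.
            format!("{}", nearest as i64)
        } else {
            format!("{}", value) // genuinely fractional: write a <real>
        }
    }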

Ideally we would just have a piece of metadata on the glyph that says "integer only!" and then treat that as the source of truth, and then round everything when serializing? And as a potential short-term solution we could let you optionally pass some 'int only' flag to norad (when creating/loading a font?) that norad would store, and then when serializing we would check this flag and if it was set we would ensure we only serialize integers? Then it would be up to the API client to decide when to use this argument, and they could employ whatever heuristics they want to do that.

@ctrlcctrlv
Contributor Author

I think glyph-level is too extreme; IOF is designed around thinking about it contour-by-contour. The reason is that I have a lot of fonts where some of my glyphs have contours I drew normally, so in a font-level operation (a matrix) I'd want those to keep snapping to int, while others I drew with Spiro (which turns into cubic Béziers with lots of extraneous points that look weird when snapped). Also, another common FontForge operation I'm thinking about is "Merge Fonts": you may merge TTF into CFF and want the TTF glyphs to keep to int, as they only contain int coords.

I know this might sound highly theoretical, but these are all real issues that have either come up in my development in the past or come up routinely in the present.

If you apply a skew, let's say, there's no way an epsilon check would work: a skew is guaranteed to put the majority of points a huge number of ULPs from the nearest int. That's why I want to know "when this contour was originally added, did it consist only of integers?", something only IOF can really conceivably answer with certainty (contour.all(|p| p.x.original_held_integer() && p.y.original_held_integer())). One could of course make that a trait on Point<PD>; just an example. 😄
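
(For illustration, a runnable sketch of that one-liner; Point, original_held_integer, and the surrounding types are hypothetical names, not the actual glifparser/integer_or_float API.)

    // Hypothetical sketch: did every coordinate in this contour start life as an integer?
    #[derive(Clone, Copy, PartialEq)]
    enum OriginalType {
        Integer,
        Float,
    }

    #[derive(Clone, Copy)]
    struct IntegerOrFloat {
        value: f32,
        original_type: OriginalType,
    }

    impl IntegerOrFloat {
        fn original_held_integer(&self) -> bool {
            self.original_type == OriginalType::Integer
        }
    }

    struct Point {
        x: IntegerOrFloat,
        y: IntegerOrFloat,
    }

    fn contour_was_all_integers(contour: &[Point]) -> bool {
        contour
            .iter()
            .all(|p| p.x.original_held_integer() && p.y.original_held_integer())
    }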

@cmyr
Member

cmyr commented Feb 7, 2022

I know this might sound highly theoretical, but these are all real issues that have either come up in my development in the past or come up routinely in the present.

If you apply a skew, let's say, there's no way an epsilon check would work: a skew is guaranteed to put the majority of points a huge number of ULPs from the nearest int. That's why I want to know "when this contour was originally added, did it consist only of integers?", something only IOF can really conceivably answer with certainty (contour.all(|p| p.x.original_held_integer() && p.y.original_held_integer())). One could of course make that a trait on Point<PD>; just an example. 😄

I think this makes sense, but I wonder if there's a slight mixing of concerns here: is there a distinction between "do I want this contour to use integer coordinates" versus "does this contour happen to only contain integer coordinates"? I could imagine situations where we did not want to be limited to integer coordinates, but just happened to have a contour that was all integers.

As a general principle, we should prefer explicit state. So if we want to be able to indicate "this contour should always be integer-rounded", then that should be something that gets set in a lib or similar?

This would make a very nice separation of concerns possible: your client software could check this flag, and if it was present it could just manually go and round all that contour's points?

@ctrlcctrlv
Contributor Author

ctrlcctrlv commented Feb 9, 2022

I could imagine situations where we did not want to be limited to integer coordinates, but just happened to have a contour that was all integers.

Can you explain such a situation? Perhaps scaling a very large contour down very small?

Typically, the times you in a sense "need" float-level precision involve contours the user did not draw, and which are not the result of boolean operations applied to user splines.

In these cases (chiefly: results of stroke algorithms, machine traces of rasters, results of Spiro and maybe Hyperbezier sometimes, and results of "disturb"-like algorithms) you are guaranteed not to have all integers.

I can write a lib, sure (and I do write a lot of lib entries), but this isn't interoperable with other software. Users would have to mark contours manually (or mark a whole glyph/font as appropriate). I don't find that tradeoff acceptable when I can be so sure of the provenance based on this simple heuristic. Another simple heuristic I will use: if the points have a fractional component, is that fractional component equivalent, within f32::EPSILON, across all of them?

However, to handle your case of a contour that will look OK large but degrade when small, I will add bbox calculations and a warning; thanks for the idea:

Matrix→contour warning
"This contour only consists of integers. Its new bounding box will result in significant distortion of n%. Please consider increasing this font's UPM (units per em), or changing MFEK's settings to always allow floats for this font (if you don't plan to compile a TTF).
Mark this contour as being float-allowed / Distort the contour as you must / Cancel".
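
(For illustration, a sketch of the shared-fractional-component heuristic mentioned earlier in this comment; the helper name is hypothetical.)

    // Hypothetical helper: do all coordinates in a contour carry the same
    // fractional component, to within f32::EPSILON?
    fn shared_fractional_component(coords: &[f32]) -> bool {
        let mut fracts = coords.iter().map(|c| c.fract());
        match fracts.next() {
            Some(first) => fracts.all(|f| (f - first).abs() <= f32::EPSILON),
            None => true,
        }
    }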

@ctrlcctrlv
Contributor Author

ctrlcctrlv commented Feb 9, 2022

Another heuristic for setting the value initially is scoring a contour's neighbor-point closeness relative to the size of its bounding box, if integer/float state alone is distasteful.

I think a lot of our discussions have one core theme: how much value I place on interoperability, which is of course entirely a project-level subjective decision and not a technical one. "How much do I want to help the user use UFO sources, and parts thereof, that we did not make?" It's a big concern, given that as a technically adept user (which, by the way, is my target audience) integrates MFEK into their workflow, it will only gradually take on more and more jobs.

@ctrlcctrlv
Contributor Author

FYI, I extended IntegerOrFloat in two ways:

  • MFEK/integer_or_float.rlib@72bca01: impl num_traits::Float for IntegerOrFloat. This allows one to call e.g. abs() or sqrt() without conversion to a primitive float. Of course, since this crate is also no_std, this feature is optional.

  • And, most importantly for this issue, MFEK/integer_or_float.rlib@v0.1.3...v0.1.4: you can now enable the x64-backing-store feature, implemented thus:

    https://github.com/MFEK/integer_or_float.rlib/blob/ae5b52f9f6b08c492d251cb0e8911be4f5ea50f3/src/backing_types.rs#L7

    #[cfg(feature = "x64-backing-store")]
    pub type f_iof = f64;
    #[cfg(feature = "x64-backing-store")]
    pub type i_iof = i64;
    #[cfg(feature = "x64-backing-store")]
    pub type u_iof = u64;
    #[cfg(not(feature = "x64-backing-store"))]
    pub type f_iof = f32;
    #[cfg(not(feature = "x64-backing-store"))]
    pub type i_iof = i32;
    #[cfg(not(feature = "x64-backing-store"))]
    pub type u_iof = u32;
    
    #[cfg(feature = "x64-backing-store")]
    pub const IOF_BACKING_STORE_BITLEN: u8 = 64;
    #[cfg(not(feature = "x64-backing-store"))]
    pub const IOF_BACKING_STORE_BITLEN: u8 = 32;

    Because cargo can compile multiple versions of a crate, you should even be able to use glifparser, which expects IOF_BACKING_STORE_BITLEN to be 32, alongside norad, which would probably want 64.
