Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to harfbuzz funcs #4794

Merged
merged 15 commits into from
Apr 23, 2024
Merged

Conversation

Manishearth
Copy link
Member

@Manishearth Manishearth commented Apr 10, 2024

This isn't quite correct: the design of servo/rust-harfbuzz#197 requires you to use UnicodeFuncsBuilder and pass in six different objects (and it would be inefficient to clone this object five times).

I guess I should have six loadable objects in this crate that people have to manually set? Potentially with a feature-gated helper that automatically builds a UnicodeFuncs.

Alternatively we can update upstream to also allow for a single UnicodeFuncs object.

Thoughts? @sffc

Fixes #3257

@Manishearth Manishearth requested review from hsivonen and a team as code owners April 10, 2024 21:49
@Manishearth Manishearth requested a review from sffc April 10, 2024 21:49
ffi/harfbuzz/src/lib.rs Outdated Show resolved Hide resolved
@Manishearth
Copy link
Member Author

Manishearth commented Apr 11, 2024

Decision on path forward: Add 7 types here: 6 individually-loaded types, and one ZST that implements everything and uses compiled data.

Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Please make separate types as discussed)

@sffc
Copy link
Member

sffc commented Apr 15, 2024

ICU4X-WG discussion:

  • @sffc Teh traits in the harfbuzz crate are directly a result of the HarfBuzz FFI. Maybe there is a way to use an RC...
  • @Manishearth No, there's no receiver
  • @robertbastian HarfBuzz FFi takes a function pointer and an optional data pointer. For compiled data, we don't need to pass data, we can just pass a function pointer.
  • @sffc Yeah, we can avoid that, possibly.
  • @robertbastian We are avoiding it.
  • @sffc The HarfBuzz traits PR was open for 6 months. The bakedata (?) design was new, the idea of not needing to allocate.
  • @Manishearth HarfBuzz allocates. You have to have allocation. There is no situation in which you use this library without allocating. The question I have is: do we change upstream harfbuzz, where we have a separate type for borrowed and non-borrowed for the 6 properties, resulting in 12 types, or do we allow one type that implements the APIs for all 6 properties.
  • @robertbastian Why do we have these traits?
  • @sffc It's because we depend on harfbuzz-sys, which uses CMake to build in order to link to native code. The unsafe code belongs in the harfbuzz crate (harfbuzz-sys?), and that's where it is. This PR allows us to get rid of the unsafe code from our code (icu_harfbuzz ?). That is confusing code, and it is not something that I want to maintain in ICU4X. I think the 12 types is the right way to go, but the way to do it is implement on CodePointSetData.
  • @robertbastian I don't think the current implementation is complicated.
  • @Manishearth It's a whole pile of FFI.
  • @sffc It's a pile of FFI and it's hard to reason about. It casts between a ___ and a ..._script_t. It's complicated code that I dont want to maintain in ICU4X.
  • @Manishearth HarfBuzz needs alloc. It uses allocation.
  • @Manishearth It seems we have a conclusion. We have 12 types. 6 of them are zero-sized.
  • @sffc We can have 7 types. 6 ___ types and 1 ____.
  • @Manishearth 7 types is nice.
  • @sffc The goal of this change was twofold, and this PR addresses those points: getting rid of the CMake dependency, and getting rid of the unsafe code. If you're in an environment with HarfBuzz available, you already have std available, so you already have allocation.
  • @robertbastian No, you might have an allocator in C, but not on the Rust side, you'll have to write that yourself
  • @Manishearth You can also make the change to ______ upstream.
  • @sffc That seems okay, and a small change. But waiting for upstream will take a long time.

@Manishearth
Copy link
Member Author

Updated

@Manishearth
Copy link
Member Author

I don't understand how CI succeeded previously

@Manishearth
Copy link
Member Author

Manishearth commented Apr 19, 2024

Ah it no longer builds native harfbuzz from source. Annoying.

@Manishearth
Copy link
Member Author

How should we proceed for CIing this? We can install libharfbuzz-dev for CI but on other platforms that won't work. We can skip this crate for testing, perhaos.

Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify for anyone unsure about the impact of this change: we are replacing ICU4X code like

        unsafe {
            hb_unicode_funcs_set_mirroring_func(
                ufuncs.raw,
                Some({
                    extern "C" fn cb(
                        _: *mut hb_unicode_funcs_t,
                        unicode: hb_codepoint_t,
                        _: *mut c_void,
                    ) -> hb_codepoint_t {
                        bidi_data::bidi_auxiliary_properties()
                            .get32_mirroring_props(unicode)
                            .mirroring_glyph
                            .map(u32::from)
                            .unwrap_or(unicode) as hb_codepoint_t
                    }
                    cb
                }),
                core::ptr::null_mut(),
                None,
            );
        }

with

builder.set_mirroring_func(Box::new(UnicodeFuncs));

where the impl of that function is

    pub fn set_mirroring_func<F: MirroringFunc>(&mut self, f: Box<F>) {
        let mirroring_ptr: *mut F = Box::into_raw(f);
        extern "C" fn impl_mirroring<F: MirroringFunc>(
            _ufuncs: *mut hb_unicode_funcs_t,
            unicode: hb_codepoint_t,
            user_data: *mut c_void,
        ) -> hb_codepoint_t {
            unsafe { &*(user_data as *mut F) }.mirroring(hb_codepoint_t_to_char(unicode))
                as hb_codepoint_t
        }
        extern "C" fn destroy_mirroring<F>(user_data: *mut c_void) {
            let _ = unsafe { Box::from_raw(user_data as *mut F) };
        }
        unsafe {
            hb_unicode_funcs_set_mirroring_func(
                self.raw,
                Some(impl_mirroring::<F>),
                mirroring_ptr as *mut c_void,
                Some(destroy_mirroring::<F>),
            );
        }
    }

hb_codepoint_t_to_char is a private function defined in the Rust harfbuzz crate:

#[inline]
fn hb_codepoint_t_to_char(input: hb_codepoint_t) -> char {
    unsafe { char::from_u32_unchecked(input) }
}

and the impl of mirroring is

    fn mirroring(&self, ch: char) -> char {
        bidi_data::bidi_auxiliary_properties()
            .get32_mirroring_props(ch.into())
            .mirroring_glyph
            .unwrap_or(ch)
    }

So, after inlining, the "hot path", which is the callback function, changes from

                    extern "C" fn cb(
                        _: *mut hb_unicode_funcs_t,
                        unicode: hb_codepoint_t,
                        _: *mut c_void,
                    ) -> hb_codepoint_t {
                        bidi_data::bidi_auxiliary_properties()
                            .get32_mirroring_props(unicode)
                            .mirroring_glyph
                            .map(u32::from)
                            .unwrap_or(unicode) as hb_codepoint_t
                    }

to

        extern "C" fn impl_mirroring<F: MirroringFunc>(
            _ufuncs: *mut hb_unicode_funcs_t,
            unicode: hb_codepoint_t,
            user_data: *mut c_void,
        ) -> hb_codepoint_t {
            let _self = unsafe { &*(user_data as *mut F) }; // should go away
            let ch = unsafe { char::from_u32_unchecked(unicode) };
            bidi_data::bidi_auxiliary_properties()
                .get32_mirroring_props(ch.into())
                .mirroring_glyph
                .unwrap_or(ch)
            as hb_codepoint_t
        }

Cargo.lock Show resolved Hide resolved
ffi/harfbuzz/src/lib.rs Outdated Show resolved Hide resolved
ffi/harfbuzz/src/lib.rs Outdated Show resolved Hide resolved
ffi/harfbuzz/src/lib.rs Show resolved Hide resolved
ffi/harfbuzz/src/lib.rs Outdated Show resolved Hide resolved
Comment on lines +167 to +179
#[doc = icu_provider::gen_any_buffer_unstable_docs!(ANY, Self::try_new_unstable)]
pub fn try_new_with_any_provider(
provider: &(impl icu_provider::AnyProvider + ?Sized),
) -> Result<Self, HarfBuzzError> {
Self::try_new_unstable(&provider.as_downcasting())
}
#[cfg(feature = "serde")]
#[doc = icu_provider::gen_any_buffer_unstable_docs!(BUFFER,Self::try_new_unstable)]
pub fn try_new_with_buffer_provider(
provider: &(impl icu_provider::BufferProvider + ?Sized),
) -> Result<Self, HarfBuzzError> {
Self::try_new_unstable(&provider.as_deserializing())
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Use the helper macro to generate these. Is there a reason you're not?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no baked constructor so it won't work

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The macro can be made to not generate baked constructors, we do this when we manually write an infallible baked constructor.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't figure out a way to make it work. You're welcome to try.

None,
);
fn convert_gc(gc: GeneralCategory) -> harfbuzz_traits::GeneralCategory {
match gc {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this match result in as efficient and in as compact code as the array? Even if the compiler generates a table, is it smart enough to only spend a byte per item in the table?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It usually generates a jump table, which does typically work that way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This strikes me as a premature optimization but you're welcome to make it as a followup.

ffi/harfbuzz/src/lib.rs Outdated Show resolved Hide resolved
@sffc
Copy link
Member

sffc commented Apr 19, 2024

How should we proceed for CIing this? We can install libharfbuzz-dev for CI but on other platforms that won't work. We can skip this crate for testing, perhaos.

Can just the docstest job install the required dependencies?

Getting harfbuzz-sys out of the main icu4x tree was a goal of this change and I think it should stay that way.

Co-authored-by: Shane F. Carr <[email protected]>
@Manishearth
Copy link
Member Author

@sffc if it's a dependency of the doctest it will be needed in tree. A lot of tasks run tests and pull in dev deps.

@Manishearth
Copy link
Member Author

Instead of a doctest we could have a tutorial that doesn't live in the workspace, and is only run in one of the ffi tasks

@sffc
Copy link
Member

sffc commented Apr 19, 2024

Instead of a doctest we could have a tutorial that doesn't live in the workspace, and is only run in one of the ffi tasks

Works for me. Something running out of the workspace sounds good. Just make a small harfbuzz tutorial crate.

ffi/harfbuzz/src/lib.rs Outdated Show resolved Hide resolved
@Manishearth
Copy link
Member Author

Which CI job should this be a part of?

@Manishearth
Copy link
Member Author

I think we should upgrade the gn task to being an "ffi integrations" task

@sffc
Copy link
Member

sffc commented Apr 19, 2024

I think we should upgrade the gn task to being an "ffi integrations" task

We kept making combined FFI tasks and then splitting them because they were getting too slow. That said, I'm fine if you do this.

tutorials/harfbuzz/Cargo.toml Outdated Show resolved Hide resolved
@sffc
Copy link
Member

sffc commented Apr 19, 2024

What do you think about doing it in the very same job as test-tutorials-local, including the target directory? Maybe even moving the tutorial into the tutorials/rust folder.

@Manishearth
Copy link
Member Author

Yeah!

ffi/harfbuzz/src/lib.rs Outdated Show resolved Hide resolved
Comment on lines +167 to +179
#[doc = icu_provider::gen_any_buffer_unstable_docs!(ANY, Self::try_new_unstable)]
pub fn try_new_with_any_provider(
provider: &(impl icu_provider::AnyProvider + ?Sized),
) -> Result<Self, HarfBuzzError> {
Self::try_new_unstable(&provider.as_downcasting())
}
#[cfg(feature = "serde")]
#[doc = icu_provider::gen_any_buffer_unstable_docs!(BUFFER,Self::try_new_unstable)]
pub fn try_new_with_buffer_provider(
provider: &(impl icu_provider::BufferProvider + ?Sized),
) -> Result<Self, HarfBuzzError> {
Self::try_new_unstable(&provider.as_deserializing())
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The macro can be made to not generate baked constructors, we do this when we manually write an infallible baked constructor.

tutorials/harfbuzz/src/main.rs Outdated Show resolved Hide resolved
Makefile.toml Outdated Show resolved Hide resolved
tools/make/ffi.toml Show resolved Hide resolved
tutorials/rust/harfbuzz/src/main.rs Show resolved Hide resolved
tutorials/rust/harfbuzz/src/main.rs Outdated Show resolved Hide resolved
Comment on lines 4 to 18
let mut b = Buffer::with("مساء الخير");
let mut builder = UnicodeFuncsBuilder::new_with_empty_parent().unwrap();
// Note: AllUnicodeFuncs is zero-sized, so these boxes don't allocate memory.
builder.set_general_category_func(AllUnicodeFuncs::boxed());
builder.set_combining_class_func(AllUnicodeFuncs::boxed());
builder.set_mirroring_func(AllUnicodeFuncs::boxed());
builder.set_script_func(AllUnicodeFuncs::boxed());
builder.set_compose_func(AllUnicodeFuncs::boxed());
builder.set_decompose_func(AllUnicodeFuncs::boxed());
b.set_unicode_funcs(&builder.build());
b.guess_segment_properties();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sffc can we still make changes to the harfbuzz API? It would be nice if the builder setters were chainable:

    let funcs = UnicodeFuncsBuilder::new_with_empty_parent()
       .unwrap()
       .set_general_category_func(AllUnicodeFuncs::boxed())
       .set_combining_class_func(AllUnicodeFuncs::boxed())
       .set_mirroring_func(AllUnicodeFuncs::boxed())
       .set_script_func(AllUnicodeFuncs::boxed())
       .set_compose_func(AllUnicodeFuncs::boxed())
       .set_decompose_func(AllUnicodeFuncs::boxed())
       .build();
    let mut b = Buffer::with("مساء الخير");
    b.set_unicode_funcs(&funcs)

Copy link
Member

@sffc sffc Apr 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API was designed to closely mirror the C FFI. The "builder" is actually a real Harfbuzz C object. Every one of these setters calls FFI: it's not just a figment of Rust. The C FFI takes a mutable pointer and sets stuff on the mutable pointer, so the Rust API should by default do the same. The reason I called it a "builder", despite that term not appearing in the Harfbuzz docs, is because the C FFI has a function hb_unicode_funcs_make_immutable (which seems to be similar to the "freeze" concept in ICU4C), and this was the easiest way to track the state of whether the object was mutable or immutable.

https://harfbuzz.github.io/harfbuzz-hb-unicode.html

Now of course there is room to debate the API design in Rust; I took some liberty by calling the thing a "builder". However, I haven't seen evidence that my design has fatal flaws, and in fact this PR validates my approach. Changing between a mutating builder and a moving builder is a matter of debate that doesn't change what happens under the hood. And, the harfbuzz crate is external to ICU4X, living on its own lifecycle, and the maintainers of that crate already reviewed and had me change some aspects of the API before it landed. It is a crate designed for a narrow set of power users who know what they are doing. Most folks wanting to do text layout in Rust should hopefully be using a more end-to-end package instead of directly interfacing with harfbuzz.

For all these reasons, I would not personally be enthusiastic about a change of this nature.

If I were to make any change, it would be to rename UnicodeFuncsBuilder to MutableUnicodeFuncs and renaming .build() to .make_immutable() and having it return a ImmutableUnicodeFuncs. This would be the thing that most closely mirrors FFI.

tutorials/rust/harfbuzz/Cargo.toml Outdated Show resolved Hide resolved
sffc
sffc previously approved these changes Apr 23, 2024
ffi/harfbuzz/src/lib.rs Outdated Show resolved Hide resolved
sffc
sffc previously approved these changes Apr 23, 2024
Co-authored-by: Robert Bastian <[email protected]>
sffc
sffc previously approved these changes Apr 23, 2024
Copy link
Member

@robertbastian robertbastian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still have a few open comments

@Manishearth Manishearth merged commit 03405c1 into unicode-org:main Apr 23, 2024
30 checks passed
@Manishearth Manishearth deleted the harfbuzz-traits branch April 23, 2024 17:52
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make icu_harfbuzz be safe and no_std
4 participants