Non-C extension support #193

Open
headius opened this issue Jul 9, 2024 · 10 comments

Comments

@headius

headius commented Jul 9, 2024

I would like to be able to support this library in JRuby, but the C extension is a limiting factor. JRuby does not support the CRuby extension API.

We do, however, support some alternative mechanisms for binding native libraries:

  • Ruby FFI: this would require some manual work to write the FFI binding, but since you have a standalone C library it might make the most sense. See sassc for an example library that ships a standalone dynamic library with an FFI binding.
  • Java Native Runtime (JNR) FFI: This is our current Java/C-based FFI backend, and upon this we build Ruby FFI and several other features in JRuby. Supporting JNR would make this library accessible to the entire JVM community, including JRuby.
  • Java Foreign Function and Memory API: This is a new API in recent versions of Java designed to integrate with the JVM memory model and JIT (for better performance). It also supports a code generator called "jextract" that can generate the entire API binding from a header file.

Some combination of these APIs should get us there!

@mohamedhafez
Contributor

I'd be willing to put a bounty of a couple thousand dollars on this, if the trilogy team agrees to allow work on this...

@mohamedhafez
Contributor

@jhawthorn any chance we could get a comment on whether the trilogy team would accept work on this? In conversation with @headius he's mentioned it could be done in a way that doesn't at all change how trilogy works on regular MRI Ruby with the C-extension, and would just make it available to JRuby users like myself, which would be a great help

@matthewd
Contributor

I guess I'm not super clear what advantage this library offers for JRuby users... I would've thought one would prefer to use a native JDBC approach or something along those lines?

I think that any FFI-like interface would eliminate the performance advantages that come from being very aware of where & when any allocations & memory copies have to occur, and so would certainly not be a viable option for the CRuby implementation. At that point, it feels like we'd essentially be talking about a completely separate (JRuby-specific) gem that uses the Trilogy C library?

@mohamedhafez
Contributor

mohamedhafez commented Sep 13, 2024

Regarding the benefit to JRuby users: the native JDBC integration with Rails (ar-jdbc) isn't trivial to maintain and needs a lot of work every time a new Rails minor version is released, so it frequently lags behind the latest Rails releases (Rails 7.0 is the latest release supported right now, for example). In addition, issues or just differences in configuration and database setup can crop up unexpectedly, and it can be a headache to figure out what's going wrong and how to work around it (e.g. jruby/activerecord-jdbc-adapter#897, jruby/activerecord-jdbc-adapter#1091). It would be nice to have something closer to an exact replacement for what CRuby users are using, so we'd have the option of never having to worry about that kind of thing again, even if it came at a bit of a performance cost. (I'm possibly dealing with such an issue right now, and would love to have this option.)

In short, full compatibility that we'd get automatically the minute a new Rails version is released, with much, much less room for error, is why I personally would want this. Man I've never been great at summarizing maybe i should have just left it at that in the first place 😅

As far as whether we are essentially talking about a JRuby-specific gem that uses the Trilogy C library, and whether FFI would result in a performance penalty in the CRuby gem even with YJIT possibly being able to optimize better... I don't have the experience to speak on that unfortunately, but I bet @headius could chime in on that!

@headius
Author

headius commented Sep 18, 2024

Happy to jump in here with some additional info.

First off, the main reason it would help us if Trilogy supported JRuby is as @mohamedhafez mentioned: we'd use exactly the same code from the driver level up, allowing us to track Rails exactly as new releases come out. Currently, we have to do a bit of work to update the JDBC-based adapter for each new Rails release.

Second, the FFI version may introduce more overhead, but it is the most efficient way for this driver to support Ruby implementations other than CRuby. JRuby does not implement the C extension API, and TruffleRuby implements it but with a high-overhead interface that is probably slower than FFI. Both implementations could use an FFI version.

Third, I don't believe it requires a completely separate gem. The interface to the C code appears to be rather compact and easy to abstract for both the C extension and the FFI version. Above that, the code would be the same for all implementations.

My assumptions about the design of this project may be flawed, but I did not see anything in the extension interface that would be difficult to implement via an FFI version.

@headius
Author

headius commented Oct 24, 2024

Did a bit more exploration here and I think there are three options going forward (and a fourth kinda crazy one).

Option #1: Build a JNI (Java Native Interface) extension

The most direct analog here would be to write a JNI extension for the JVM that wraps the Trilogy library in the same way as the CRuby extension. The Trilogy functions are well-isolated from the internals of CRuby, and we could mimic the same structure in the JNI extension.

We would then build some shim code to call that JNI extension from JRuby and use most of the remaining Ruby code as-is. There could be very little or almost no JRuby-specific Java code if we just used JRuby's Java integration.
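To make the shape of this option concrete, here is a minimal sketch of what the Java side of such a JNI shim might look like. The class name, the `trilogy_jni` library name, and the `nativeVersion` method are all hypothetical; nothing here is part of Trilogy today, and the matching C function would still have to be written and compiled per platform.

```java
// Java side of a hypothetical JNI binding. The class name, library name, and
// native method are illustrative assumptions, not part of Trilogy.
class TrilogyJniSketch {
    // Would be implemented in C as Java_TrilogyJniSketch_nativeVersion and
    // would call into the Trilogy C library from there.
    private static native String nativeVersion();

    // Tries to load the per-platform shim library.
    // Returns false when no such library is installed on this machine.
    static boolean tryLoad(String libraryName) {
        try {
            System.loadLibrary(libraryName);
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        boolean loaded = tryLoad("trilogy_jni"); // hypothetical library name
        System.out.println(loaded
                ? nativeVersion()
                : "JNI shim not built for this platform");
    }
}
```

The load step is where the per-platform build requirement shows up: without a compiled shim for the current OS and architecture, the binding cannot be used at all.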

Benefits:

  • It would look and function very similarly to the CRuby extension.
  • The extension, once written, would require few changes.
  • Other JVM languages could make use of the same extension.

Issues:

  • The JNI extension would need to be built for each platform, possibly on install of the gem.
  • JNI is not a particularly efficient interface, especially compared to the CRuby extension API, so it would be difficult to match performance.
  • Nobody related to JRuby would want to write or maintain such an extension.
  • Any OpenSSL exposure would have to be dealt with, since JVM does not use OpenSSL (see below).

Option #2: Use OpenJDK Panama

In this scenario, we would use the Panama jextract tool to generate a Java FFM (Foreign Function and Memory) wrapper around Trilogy as a dynamic library. jextract would create Java APIs and classes to represent the necessary functions and structs, and ideally have much lower overhead than a hand-built JNI extension.

I just ran an experiment using jextract and the Trilogy header files. It appears that all of Trilogy's primary functions and structures generated ok, but the header files include a lot of internal constants, structures, and functions that are probably not needed in the public API. In addition, because the Trilogy header pulls in OpenSSL, jextract also attempted to generate bindings for the entire OpenSSL library. We would want to limit the generation to just the necessary Trilogy functions, but I'm not sure what's included in that list.
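For a sense of what a generated binding boils down to, here is a hand-written FFM downcall, roughly the kind of call jextract would wrap for each function. It is only a sketch: `strlen` from the C standard library stands in for a Trilogy function, since no Trilogy dynamic library exists yet, and it assumes a JDK where `java.lang.foreign` is usable without preview flags (22 or later) and a platform where `size_t` is 64 bits.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

// Hand-written FFM downcall. strlen is a stand-in for a Trilogy function.
class FfmDowncallSketch {
    private static final Linker LINKER = Linker.nativeLinker();

    // C signature: size_t strlen(const char *s)
    // Assumption: size_t is 64 bits wide, so it maps to JAVA_LONG.
    private static final MethodHandle STRLEN = LINKER.downcallHandle(
            LINKER.defaultLookup().find("strlen").orElseThrow(),
            FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));

    static long nativeStrlen(String text) {
        // The confined arena frees its native memory when the try block exits.
        try (Arena arena = Arena.ofConfined()) {
            // Copies the Java string into native memory as NUL-terminated UTF-8.
            MemorySegment cString = arena.allocateFrom(text);
            return (long) STRLEN.invokeExact(cString);
        } catch (Throwable t) {
            throw new IllegalStateException("native call failed", t);
        }
    }

    public static void main(String[] args) {
        System.out.println(nativeStrlen("trilogy"));
    }
}
```

Note the explicit copy in `allocateFrom`: every string crossing the boundary is copied into native memory, which is the sort of allocation cost raised earlier in the thread.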

Benefits

  • Automatic generation of the binding, which could be done on each platform we hope to support and shipped as a public JVM library or as part of the gem.
  • Improved performance over the JNI extension.
  • Easier maintenance compared to the JNI extension.
  • The resulting bindings would be usable by the rest of the JVM ecosystem.

Risks

  • If OpenSSL elements are exposed through the Trilogy API, that could be a complication. The JVM does not use OpenSSL for encryption, so we'd have to negotiate between the JVM security subsystem and any required OpenSSL functions and structures consumed by Trilogy. My hope is that the OpenSSL-dependent portion of Trilogy is well-encapsulated and we don't need to interact with OpenSSL directly to use Trilogy.
  • I have not managed to test this generated binding at all.
  • Panama is only a preview feature even in the latest JDK, so we're on the bleeding edge here. But Trilogy is pretty bleeding edge anyway.
  • It still may be slower than using existing pure-Java JDBC drivers for MySQL (but those require more adaptation to work with ActiveRecord).

Option #3: Ruby FFI bindings

This option would use the existing Ruby FFI support to bind all appropriate Trilogy functions. The remaining logic would live in Ruby to adapt those functions to the rest of the Trilogy Ruby code in the same way as the CRuby extension.

Benefits

  • Pure Ruby! All we need is a Trilogy dynamic library and we can do everything the existing CRuby extension does.
  • Usable by all Ruby implementations that support FFI.
  • No new native or Java code required.

Issues

  • Performance would almost certainly be the worst of these three options, due to the increased amount of Ruby code and the inefficiencies of the FFI interface.
  • There are few fully-working generators for Ruby FFI so in the worst case we'd have to write the bindings manually. But we'd only need to do it once.
  • No exposure of Trilogy to other consumers in the JVM ecosystem.
  • Possibly the most manual work of the three options (though JNI is right up there).
  • OpenSSL exposure is again a concern here.

Bonus crazy option #4: Compile Trilogy to WASM and run it with a JVM WASM runtime

This sounds crazy, but it's actually a fallback option for JRuby's use of the Prism parser. We would compile the Trilogy library to WASM, and then run that through the Chicory JVM WASM runtime. If that runtime can optimize the resulting code down to JVM bytecode, and the JVM can optimize that bytecode to native, it might have good enough performance to be usable without requiring any native library whatsoever. We would then just wrap that WASM version of Trilogy with appropriate Java and Ruby code to adapt it to the rest of the Trilogy Ruby code.

Benefits

  • Nothing to write other than code to adapt the WASM-compiled library to Java or Ruby.
  • Platform independence; no native libraries need to be built for each platform and no new native code is introduced.
  • Usable by the rest of the JVM ecosystem.

Issues

  • It's a little crazy, but we've already got a working example in JRuby's WASM Prism.
  • Performance will be worse than running Trilogy natively. It would probably be better than using pure Ruby FFI. Hard to say if it would compare with JNI or Panama performance.
  • OpenSSL exposure would likely be a deal-breaker, since we're not going to try to run a WASM OpenSSL.
  • Requires an in-development bleeding-edge WASM runtime but likely can run on almost any JDK.

Bottom Line

If Trilogy is as fast as claimed, then it could be a useful library to much more than just the Ruby ecosystem, and it's worth exploring making it a more general-purpose MySQL library. That would mean formalizing the public API and also producing a dynamic library rather than just the static library currently used by the CRuby extension.

If we agree that it could be useful outside of Ruby, then it's also worth exploring ways to bind it for the JVM. This would enable all entities in the JVM ecosystem to take advantage of it, and more importantly it would make it easier for Ruby implementations on JVM like JRuby to keep up with ActiveRecord MySQL support in new versions of Rails (requiring just a bit of binding work when Trilogy is updated, similar effort to maintaining the CRuby extension).

Even if it is not intended to be used outside of Ruby, providing bindings for JRuby would ease the upgrade path for JRuby's ActiveRecord support compared to the JDBC adaptation we do today.

I'm available to chat about this any time. If the work were to go forward, I believe the JRuby team could commit some resources to it, at least to build a proof-of-concept binding we can use to evaluate the overall idea.

@headius
Author

headius commented Oct 24, 2024

I should also add that part of the risk here is that the current JRuby ActiveRecord-JDBC binding for MySQL is not all that bad to maintain. The code diff from Rails is the second-smallest (of the three core databases) and we can usually turn around updated Rails support fairly quickly. The benefit of using Trilogy directly would be that almost no work is required when a new ActiveRecord comes out, and it would "just work" as long as Trilogy didn't need additional changes (which would affect CRuby in the same way).

@bhelx

bhelx commented Oct 25, 2024

Happy to support anyone who wants to try the Chicory Wasm implementation. We're close to stabilizing our API and have our compiler working. I can't step-by-step say how all this will work without spending some more time on it, but I think it will be faster than you think and the whole plan doesn't sound unreasonable to me.

@headius
Author

headius commented Oct 25, 2024

@bhelx We would love to give it a try! I think the two most interesting options are going to be Panama and Chicory. I would like to move forward with a Panama proof of concept once we confirm that OpenSSL is not exposed to consumers of the Trilogy API. The Chicory version could be attempted any time, but I wouldn't have the cycles for it for a couple of weeks at least.

And both of these options require Trilogy to have a dynamic library build target and a list of public API functions to expose.
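Once a dynamic library and a list of public functions exist, a check along these lines could confirm that the build actually exports everything on the list. This is a sketch under assumptions: it runs against the default C library lookup because there is no Trilogy shared library to point at, the symbol names are placeholders, and it needs JDK 22 or later.

```java
import java.lang.foreign.Linker;
import java.lang.foreign.SymbolLookup;
import java.util.ArrayList;
import java.util.List;

// Checks a list of expected function names against a native symbol lookup.
class PublicApiCheckSketch {
    // Returns the names from `expected` that `lookup` cannot resolve,
    // in the order they were given.
    static List<String> missingSymbols(SymbolLookup lookup, List<String> expected) {
        List<String> missing = new ArrayList<>();
        for (String name : expected) {
            if (lookup.find(name).isEmpty()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Stand-in lookup: the platform C library. For a real shared library,
        // SymbolLookup.libraryLookup(path, arena) would be used instead.
        SymbolLookup libc = Linker.nativeLinker().defaultLookup();
        // "hypothetical_missing_fn" is a placeholder that should not resolve.
        System.out.println(missingSymbols(libc,
                List.of("strlen", "hypothetical_missing_fn")));
    }
}
```

A check like this would also give the agreed public API list a machine-readable form that both the Panama and Chicory experiments could share.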

@headius
Author

headius commented Oct 25, 2024

@matthewd Maybe you can answer some of the questions in this thread?

  • What functions are included in the public API of the Trilogy library (not C extension)?
  • Does that API expose OpenSSL to consumers (i.e. consumers have to also work with OpenSSL data and functions)?
