[feature request] Make C-extension threadsafe/Ractor-safe #3283
Replies: 6 comments 5 replies
-
Libxml2 doesn't support concurrent modifications to the same document. See https://gitlab.gnome.org/GNOME/libxml2/-/wikis/Thread-safety
-
So the way Ractors work is that only one of them can access a given document object at a time, so libxml2's limitation of not supporting concurrent modifications on the same document actually shouldn't be an issue: https://ruby-doc.org/core-3.0.0/Ractor.html What I'm hoping to avoid is that accessing different document objects can't be done concurrently, which is currently the case. According to the link you posted, libxml2 explicitly allows this as long as you:
So I'm hoping this should actually be trivial! (I'm addressing only the Ractor use case here, since that's the only way it would happen in canonical, regular C-Ruby.)
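A minimal sketch of the ownership model described above, in plain Ruby with no Nokogiri involved. The hash standing in for a "document" and the helper names `ractor_result` and `build_documents_in_ractors` are hypothetical, chosen only for illustration. Each ractor creates and mutates its own object, so no two ractors ever touch the same one:

```ruby
# Newer Rubies replace Ractor#take with Ractor#value; support both.
def ractor_result(ractor)
  ractor.respond_to?(:value) ? ractor.value : ractor.take
end

# Hypothetical helper: each ractor builds and mutates its own "document".
def build_documents_in_ractors(count)
  ractors = Array.new(count) do |i|
    Ractor.new(i) do |n|
      # This hash exists only inside this ractor while it is being modified.
      doc = { title: "doc #{n}", nodes: [] }
      3.times { |k| doc[:nodes] << "node #{k}" }
      doc # handed back to the caller once the ractor finishes
    end
  end
  ractors.map { |r| ractor_result(r) }
end

docs = build_documents_in_ractors(3)
puts docs.map { |d| d[:title] }.inspect
```

With a C extension in the picture, the same shape only works once the extension's methods are callable from non-main ractors, which is what this request is about.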
-
@eregon perhaps you or someone on the TruffleRuby team could lend a little more gravitas to my argument above? 😅
-
@mohamedhafez Thanks for opening this issue. Earlier this year I spent some time exploring how ractors and the sqlite3 gem interact, so I have questions.
Have you tried parsing and manipulating documents in different ractors? What was your experience like? What worked and what didn't work? Our mental model is that although libxml2 doesn't support concurrent operations within a single document, each ractor should be able to parse and manipulate a separate document, and I'd like to update our mental model if your experience has been something different.
When you say "support for ractors" I'm trying to understand your specific use case, and what specific error message motivated you to open this issue. Passing objects between ractors can be hard for complex object graphs, and so any additional information you can provide would help me form better mental models.
-
@flavorjones the mental model you mentioned, that each ractor should be able to parse and manipulate a separate document, is exactly what I'm hoping for. Currently, if you try to use Nokogiri in a Ractor, it fails with the following error:
~ $ curl 'https://nokogiri.org/tutorials/installing_nokogiri.html' > /tmp/installing_nokogiri.html
~ $ irb
3.3.3 :001 > require 'nokogiri'
=> true
3.3.3 :002 > Ractor.new { puts Nokogiri::HTML(File.open("/tmp/installing_nokogiri.html")).inspect }
(irb):2: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.
=> #<Ractor:#2 (irb):2 blocking>
#<Thread:0x000000011f3d31d0 run> terminated with exception (report_on_exception is true):
/Users/mohamed/.rvm/gems/ruby-3.3.3/gems/nokogiri-1.16.6-arm64-darwin/lib/nokogiri/html4/document.rb:194:in `read_io': ractor unsafe method called from not main ractor (Ractor::UnsafeError)
from /Users/mohamed/.rvm/gems/ruby-3.3.3/gems/nokogiri-1.16.6-arm64-darwin/lib/nokogiri/html4/document.rb:194:in `parse'
from /Users/mohamed/.rvm/gems/ruby-3.3.3/gems/nokogiri-1.16.6-arm64-darwin/lib/nokogiri/html4.rb:11:in `HTML4'
from (irb):2:in `block in <top (required)>'
This is the expected behavior: https://docs.ruby-lang.org/en/3.3/extension_rdoc.html#label-Appendix+F.+Ractor+support. According to that doc, the fix basically boils down to making sure you protect access to global variables with a Mutex, making sure any external libraries like libxml2 are safe to access from different threads, and then call
-
@flavorjones, following @eregon reporting in #3283 (reply in thread) that nokogiri already configures libxml to be multithreaded, I've been running my test workload on TruffleRuby with the C-extension lock turned off, and have seen no issues as far as I can tell! I've got 10 threads cycling through 50 jobs (each job consists of downloading a webpage and processing it with Nokogiri to pick out a bunch of info from it). Then there's a 1-second pause, and then I repeat. I've had that running for a couple of hours now with no problems!
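For reference, the shape of that workload can be sketched roughly as below. This is not the actual test script: `process_page` is a hypothetical stand-in for "download a page and process it with Nokogiri", and `run_batch` is an assumed name. Only the 10-worker, 50-job structure comes from the description above.

```ruby
WORKER_COUNT = 10
JOB_COUNT = 50

# Hypothetical placeholder; the real job would fetch and parse a page.
def process_page(job_id)
  "job-#{job_id}".upcase
end

# Run one batch: a fixed pool of threads drains a shared job queue.
def run_batch(job_count: JOB_COUNT, worker_count: WORKER_COUNT)
  jobs = Queue.new
  job_count.times { |i| jobs << i }
  jobs.close # once drained, #pop returns nil and the workers exit

  results = Queue.new
  workers = Array.new(worker_count) do
    Thread.new do
      while (job_id = jobs.pop)
        results << process_page(job_id)
      end
    end
  end
  workers.each(&:join)

  Array.new(results.size) { results.pop }
end

puts run_batch.size
```

Repeating the batch with a pause in between would just be a loop around `run_batch` followed by `sleep 1`.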
-
Planning ahead for the near future, when TruffleRuby can run C-extensions marked with rb_ext_ractor_safe(true) in parallel, and for when Ractors are no longer just experimental, it would be great if the C-extension could be made threadsafe, or marked as such if it already is!