Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Send for Document #123

Open
zippy2 opened this issue Nov 9, 2023 · 4 comments
Open

Implement Send for Document #123

zippy2 opened this issue Nov 9, 2023 · 4 comments

Comments

@zippy2
Copy link

zippy2 commented Nov 9, 2023

Document is declared as follows:

pub(crate) struct _Document {
  /// pointer to a libxml document
  pub(crate) doc_ptr: xmlDocPtr,
  /// hashed pointer-to-Node bookkeeping table
  nodes: HashMap<xmlNodePtr, Node>,
}

Now, looking into libxml sources, neither struct _xmlDoc nor struct _xmlNode contain any thread local data. Therefore, it should be possible for the Document to have Send trait.

@dginev
Copy link
Member

dginev commented Nov 9, 2023

You're right for a single document object in isolation, but I worry.

Consider a program that builds a document, then extracts some nodes from it for manipulation, then hands off the document to another thread (via Send) while keeping the nodes in the parent thread. It is very easy to cause a race condition - mutating a Node in the parent thread and the same node in the child thread. For Document to be Send I would expect all of its pieces to be sent along with it (or ensure they are safe to live in all threads, via Sync).

I had tried my hand at a Sync interface back in #46 but didn't get too far. But I'm open to ideas.

@dginev
Copy link
Member

dginev commented Nov 9, 2023

If you don't need mutability, I could imagine extending the read-only treatment for nodes, which added a thread-safe RoNode struct mirroring Node also for document. So a new RoDocument struct, which is thread-safe, but disallows mutating anything inside the document (so not too useful beyond getting the root node as a RoNode and passing it on - which you can do already).

@zippy2
Copy link
Author

zippy2 commented Nov 9, 2023

. It is very easy to cause a race condition - mutating a Node in the parent thread and the same node in the child thread.

Correct. And that's exactly the distinction between Sync and Send. My use case is a bit different: I want to have a "cache" of parsed (immutable) XML documents (basically a HashMap<Document>) and to allow multiple threads run XPATH queries I need Sync (solvable by adding a mutex), and Send. Right now, the cache stores str and each thread parses it again and again.

For Document to be Send I would expect all of its pieces to be sent along with it (or ensure they are safe to live in all threads, via Sync).

Agreed. But Sync can be supplied by lib users via Mutex.

I had tried my hand at a Sync interface back in #46 but didn't get too far. But I'm open to ideas.

Agreed. Native Sync is definitely out of question.

If you don't need mutability, I could imagine extending the read-only treatment for nodes, which added a thread-safe RoNode struct mirroring Node also for document. So a new RoDocument struct, which is thread-safe, but disallows mutating anything inside the document (so not too useful beyond getting the root node as a RoNode and passing it on - which you can do already).

My use case is basically evaluating XPATHs, so no mutation. And yeah, I think RoDocument would do the trick.

@dginev
Copy link
Member

dginev commented Nov 20, 2023

It is very easy to cause a race condition - mutating a Node in the parent thread and the same node in the child thread.

Correct. And that's exactly the distinction between Sync and Send

Actually no, I don't think that is the difference. A Send object must be safely transferable to a thread (where no mutability remains possible in the parent thread), a Sync object must have a safe mutability interface that can be used cross-thread. In our libxml2 wrapper's case, neither interface is possible to guarantee if any mutability is allowed at all.

For Document to be safe to Send, the Rust philosophy is to only be able to send it over if no mutability is even theoretically possible on the parent thread.

So yes, RoDocument is the only way this would be possible with the current wrapper design. Until a PR for that lands, you are welcome to grab the root node as an RoNode and then Send that around as needed.

  let root: RoNode = doc.get_root_readonly().unwrap();

nertpinx pushed a commit to nertpinx/virt-lint that referenced this issue Nov 28, 2024
For the time being, I'm keeping both Rust and Lua validators.
I had to switch caps_cache to store Strings instead of parsed
documents, because:

  1) I'm ditching unmaintained sxd_document and sxd_xpath in
     favor of libxml,
  2) I'm working on Python bindings for which I need VirtLint
     struct to satisfy Send trait and neither sxd_document's
     Document, nor libxml's Document have that [1].

To avoid random test failures, warnings obtained via
VirtLint::warnings() are now sorted (readdir() and Lua and
stuff) which means that warning() method now has to take a
mutable reference to VirtLint object.

1: KWARC/rust-libxml#123

Signed-off-by: Michal Privoznik <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants