Question about xpath // #26

Open
bitparity opened this issue Apr 26, 2022 · 4 comments

@bitparity

So at 11:42 of "Advanced Digital Editing: Introduction to XPath II", it says ./descendant::head is the same as //head, which I do find to be the case. But the book XQuery for Humanists (p. 62) says that a double slash (//) stands for /descendant-or-self::node()/.

I know from debugging a problematic query that /descendant-or-self::node()/head is not the same as /descendant-or-self::head (particularly when looking for attributes within /head, which I think are technically siblings, not descendants), but I don't know why, especially since functionally it seems to make // equivalent to ./descendant::head, as mentioned in the video.

Can you possibly explain the difference between the two definitions (yours and XQH's) for // ?

@gabrielbodard
Member

Two differences:

  1. descendant-or-self:: is correct, because // can also find the root node, not only descendants of it;
  2. node() is technically correct, but irrelevant in practice—among other things it allows the XPath to match nodes other than elements, but I don't think attributes, text nodes, processing instructions or comments will ever have child nodes—at least not in the kind of XML we're likely to need to work with.

I think that the two definitions are functionally equivalent though. Can you find an example of an XPath match for /descendant-or-self::node()/element that gives different results or counts from /descendant::element ?

@bitparity
Author

bitparity commented Apr 27, 2022

So I've managed to draw up a test example illustrating the issue.

Sample xml:

<body>
	<p lang="la" id="p-1">
		<s id="s-1">sent 1</s>
		<s id="s-2">sent 2</s>
	</p>
	<p lang="en" id="p-2">para 2</p>
</body>

The goal is to find all elements that have an @id attribute where the parent or self element has a @lang attribute.

According to the aforementioned definitions of // in the XQH book and the workshop YouTube video, the two XPath searches below are identical. However, neither returns the <p> element that has both @lang and @id attributes:

.//*[./@lang = "la"]/descendant-or-self::node()/*[./@id != ""]
.//*[./@lang = "la"]//*[./@id != ""]

returns

<s id="s-1">sent 1</s>
<s id="s-2">sent 2</s>

The XPath search below DOES return the <p> element with both @lang and @id attributes, which shows that it differs from the two above.

.//*[./@lang = "la"]/descendant-or-self::*[./@id != ""]

returns

<p lang="la" id="p-1">
   <s id="s-1">sent 1</s>
   <s id="s-2">sent 2</s>
</p>
<s id="s-1">sent 1</s>
<s id="s-2">sent 2</s>

I'm sure this is just theoretical most of the time, but this is a specific instance where it affected one of my queries. I agree that thinking of // as descendant:: is easier, which is why I was puzzled by the XQH book's full definition, /descendant-or-self::node()/, which appears to be both true AND confusing (since it apparently cancels the self part out).
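
For anyone who wants to reproduce the two result sets without an XPath 1.0 engine, here is a rough sketch in Python. It assumes the standard-library xml.etree.ElementTree module, which supports only a subset of XPath and has no descendant-or-self axis, so Element.iter() stands in for descendant-or-self::* and the predicate is simplified to "has a non-empty id". The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<body>
  <p lang="la" id="p-1">
    <s id="s-1">sent 1</s>
    <s id="s-2">sent 2</s>
  </p>
  <p lang="en" id="p-2">para 2</p>
</body>
""".strip()

root = ET.fromstring(SAMPLE)

# Context elements: every element below <body> whose lang attribute is "la".
contexts = root.findall(".//*[@lang='la']")

# `.//*[@id]` evaluated from each context looks at descendants only,
# so the context element itself is never a candidate.
double_slash = [e.get("id") for c in contexts for e in c.findall(".//*[@id]")]

# Element.iter() yields the element itself, then its descendants in
# document order, which approximates descendant-or-self::*.
descendant_or_self = [
    e.get("id") for c in contexts for e in c.iter() if e.get("id")
]

print(double_slash)        # ['s-1', 's-2']
print(descendant_or_self)  # ['p-1', 's-1', 's-2']
```

The first list matches the // result above and the second matches the descendant-or-self::* result, with the <p> element included.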

@gabrielbodard
Member

gabrielbodard commented Apr 27, 2022

Interestingly, this looks like it has just proved that when you want descendant-or-self::* you can't just use .//*, which in practice means descendant::*

So while I have no doubt the XQH definition is correct, it doesn't look like ours is wrong after all…

@bitparity
Author

bitparity commented Apr 28, 2022

I think I realized what the problem was, from p.53 of the Walmsley XQuery book (which also gave the same node definition for //).

Whenever you type the name of an element after a /, it is technically child::element.

So //element is technically /descendant-or-self::node()/child::element, which pushes the search for <element> down to the descendants of the context but not the context itself, making it different from /descendant-or-self::element.

I think anyways.
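
That reading can be checked with a small model of the two steps. The sketch below is only an approximation in Python's standard-library ElementTree: the helper functions are hypothetical names written for this illustration, they handle element nodes only (which is enough here, since only elements have children when the context is an element), and the nested <p> document is made up for the test.

```python
import xml.etree.ElementTree as ET

def descendant_or_self(elem):
    """Model of descendant-or-self::*: the element, then all its descendants."""
    return list(elem.iter())

def child(elem, name):
    """Model of child::name: direct children with the given tag."""
    return [c for c in elem if c.tag == name]

def double_slash(context, name):
    """Model of context//name, i.e. context/descendant-or-self::node()/child::name."""
    hits = {id(c) for node in descendant_or_self(context) for c in child(node, name)}
    # Re-walk the tree so results come back in document order, without duplicates.
    return [e for e in context.iter() if id(e) in hits]

def descendant_or_self_named(context, name):
    """Model of context/descendant-or-self::name: the name test also sees the context."""
    return [e for e in descendant_or_self(context) if e.tag == name]

# Hypothetical document: a <p> that contains another <p>.
doc = ET.fromstring('<p id="outer"><s id="s-1"/><p id="inner"><s id="s-2"/></p></p>')

print([e.get("id") for e in double_slash(doc, "p")])              # ['inner']
print([e.get("id") for e in descendant_or_self_named(doc, "p")])  # ['outer', 'inner']
```

The context element can be the parent in the child:: step but never its result, because no element is its own child. That is why the "self" in the expansion of // seems to cancel out.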
