Use the # sign to locate the URL of some properties that are not easy to reconcile with Wikidata (to be clarified) #107

Open
candlecao opened this issue Jul 26, 2024 · 17 comments
Labels
Priority: low Low priority

Comments

@candlecao
Contributor

For example,
https://musicbrainz.org/doc/Event#Cancelled.

@candlecao candlecao added the Priority: low Low priority label Jul 26, 2024
@ahankinson
Member

Why did you choose the document fragment delimiter?

@dchiller
Contributor

I understand this is only an example, but I'm also not sure why, if you wanted to reconcile an event status as being cancelled, this would be hard to do in Wikidata?

@candlecao
Contributor Author

Why did you choose the document fragment delimiter?

@ahankinson
AFAIK: (1) The method is common in linked data and information management. (2) In Semantic Web practice, MusicBrainz itself is a well-recognized metadata schema, and many ontologies reuse it directly (see the picture). (3) The # symbol points directly to the attribute "Cancelled" on the page https://musicbrainz.org/doc/Event.
[Image: Screenshot 2024-07-29 at 11 10 07]

@dchiller
It's not easy to find an exact match in Wikidata, so we can directly use https://musicbrainz.org/doc/Event#Cancelled.

I may not be right; this is just based on my experience.
Thank you for the suggestions.

@dchiller
Contributor

It's not easy to find an exact match.

https://www.wikidata.org/wiki/Q114342413 ?

Or if there isn't one we create it? I feel like we should be very, very intentional and careful about departing from the "Wikidata is our schema" principle. And it seems fishy to me that we would want to represent an object as a URI fragment.

@ahankinson
Member

The document fragment component is a complicated identifier form to use correctly.

HTTP clients never send the fragment portion of a URL to the server. Thus if you need to look something up on the server based on the fragment identifier you can’t.

The fragment will prompt browsers to jump to that portion of the page if it is returned as HTML, but this changes if the output form of the HTML changes. Thus it isn’t really that great as a persistent identifier.

If you are only using it for a URI and never expect it to be retrievable, then it is fine as a CURIE value. So you should make sure your only purpose for using this is to have a unique URI.
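
A minimal Python sketch of the point about fragments, using only the standard library. It splits the URL from this issue into the part a client would actually request and the fragment that stays on the client side. The variable names are illustrative, not part of any project code.

```python
from urllib.parse import urldefrag, urlsplit

uri = "https://musicbrainz.org/doc/Event#Cancelled"

# urldefrag separates the fragment from the rest of the URL.
requested, fragment = urldefrag(uri)
print(requested)  # the resource a client would fetch
print(fragment)   # kept by the client, never sent in the request

# The request target of an HTTP request is built from path and query only,
# so the server has no way to see "Cancelled".
parts = urlsplit(uri)
request_target = parts.path + ("?" + parts.query if parts.query else "")
print(request_target)
```

Two URIs that differ only in their fragment therefore resolve to the same document, which is why the fragment works as a unique name but not as something the server can look up.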

@candlecao
Contributor Author

candlecao commented Jul 30, 2024

It's not easy to find an exact match.

https://www.wikidata.org/wiki/Q114342413 ?

Or if there isn't one we create it? I feel like we should be very, very intentional and careful about departing from the "Wikidata is our schema" principle. And it seems fishy to me that we would want to represent an object as a URI fragment.

I believe "https://www.wikidata.org/wiki/Q114342413" is definitely not feasible. It's an entity (an instance or a class), rather than a property. A property’s URI typically contains “P” (e.g., …P…) rather than “Q” (e.g., …Q…).

I recall that ChatGPT understands the distinction between P and Q. So we had better adhere to this differentiation or at least not confuse the two. Subsequently, maintaining this distinction will significantly aid ChatGPT in converting natural language questions into SPARQL queries through unified reconciliation.

Departing from the "Wikidata is our schema" principle is another topic we can discuss someday.

That said, if we did use https://www.wikidata.org/wiki/Q114342413, it wouldn't cause any major disruption either.
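
The P/Q distinction can be checked mechanically. The following is a rough Python sketch, assuming the identifier is the last segment of the URI (as in Wikidata's `/wiki/Q…`, `/wiki/Property:P…`, `/entity/…` and `/prop/direct/…` forms); the function name `wikidata_kind` is hypothetical.

```python
import re

# Matches a trailing Wikidata identifier such as ".../Q114342413" or "...Property:P31".
_WIKIDATA_ID = re.compile(r"[/:]([PQ])\d+$")

def wikidata_kind(uri: str) -> str:
    """Classify a URI as a Wikidata 'property', 'item', or 'unknown'."""
    match = _WIKIDATA_ID.search(uri)
    if match is None:
        return "unknown"
    return "property" if match.group(1) == "P" else "item"

print(wikidata_kind("https://www.wikidata.org/wiki/Q114342413"))
print(wikidata_kind("http://www.wikidata.org/prop/direct/P31"))
print(wikidata_kind("https://musicbrainz.org/doc/Event#Cancelled"))
```

A check like this could flag a reconciliation column where an item (Q) has been placed in a predicate position, or a property (P) in an object position.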

@candlecao
Contributor Author

The document fragment component is a complicated identifier form to use correctly.

HTTP clients never send the fragment portion of a URL to the server. Thus if you need to look something up on the server based on the fragment identifier you can’t.

The fragment will prompt browsers to jump to that portion of the page if it is returned as HTML, but this changes if the output form of the HTML changes. Thus it isn’t really that great as a persistent identifier.

If you are only using it for a URI and never expect it to be retrievable, then it is fine as a CURIE value. So you should make sure your only purpose for using this is to have a unique URI.

OK, thank you. That clarifies a lot. At present, our only purpose for using this is to have a unique URI. @Yueqiao12Zhang, for your reference too.

@fujinaga
Member

What about this?
URI_of_an_event(e.g., concert) wdt:P793 wd:Q114342413

@dchiller
Contributor

I believe "https://www.wikidata.org/wiki/Q114342413" is definitely not feasible. It's an entity (an instance or a class), rather than a property. A property’s URI typically contains “P” (e.g., …P…) rather than “Q” (e.g., …Q…).

Right, but what's the "property" you are trying to represent here? I mean, here's a way I could represent a cancelled event:

graph TD
   A[https://musicbrainz.org/event/d801c6c8-7870-4506-9709-af3890bf1e74] --> B[P31: is instance of]
   B --> C[Q114342413]

I'm not sure what the property would be of what you are doing? This? Why?

graph TD
   A[https://musicbrainz.org/event/d801c6c8-7870-4506-9709-af3890bf1e74] --> B["has cancelled event status"]
   B --> C["True"]

@Yueqiao12Zhang
Contributor

I believe "https://www.wikidata.org/wiki/Q114342413" is definitely not feasible. It's an entity (an instance or a class), rather than a property. A property’s URI typically contains “P” (e.g., …P…) rather than “Q” (e.g., …Q…).

Right, but what's the "property" you are trying to represent here? I mean, here's a way I could represent a cancelled event:

graph TD
   A[https://musicbrainz.org/event/d801c6c8-7870-4506-9709-af3890bf1e74] --> B[P31: is instance of]
   B --> C[Q114342413]

I agree with this one. It makes sense. However, in the original JSON the "cancelled" field can only hold a boolean value, so we would have to modify the conversion script to handle it. Adding this kind of specific customization to the conversion might introduce bugs.
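
As a rough illustration of what such a customization might look like, here is a Python sketch that maps a boolean "cancelled" field to the P31 pattern discussed above. It is not the project's actual conversion script: the function names and the record shape (a dict with a "cancelled" key) are assumptions for this example.

```python
from typing import Optional, Tuple

WDT_P31 = "http://www.wikidata.org/prop/direct/P31"         # "instance of"
WD_CANCELLED = "http://www.wikidata.org/entity/Q114342413"  # item proposed in this thread

def cancelled_triple(event_uri: str, record: dict) -> Optional[Tuple[str, str, str]]:
    """Return a (subject, predicate, object) triple only when the event is cancelled."""
    if record.get("cancelled") is True:
        return (event_uri, WDT_P31, WD_CANCELLED)
    return None  # false or missing: emit no triple

def to_ntriples(triple: Tuple[str, str, str]) -> str:
    """Serialize a triple of URIs as one N-Triples line."""
    s, p, o = triple
    return f"<{s}> <{p}> <{o}> ."

event = "https://musicbrainz.org/event/d801c6c8-7870-4506-9709-af3890bf1e74"
triple = cancelled_triple(event, {"cancelled": True})
if triple is not None:
    print(to_ntriples(triple))
```

Keeping the boolean-to-item mapping in one small function limits the customization to a single place, which may reduce the risk of bugs elsewhere in the conversion.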

@fujinaga
Member

The whole point of this exercise is to find out what needs conversion customization and what can be automatically converted with OpenRefine. What's important is to identify the challenges, such as this example.

@ahankinson
Member

Will you need to model cancelled events that are not in musicbrainz?

@fujinaga
Member

fujinaga commented Aug 2, 2024

Will you need to model cancelled events that are not in musicbrainz?

Yes, for example, thesession.org also has a cancelled field.

@ahankinson
Member

So probably a more generic but pre-defined ontology like Wikidata is more suitable than a bespoke vocabulary from MusicBrainz?

@fujinaga
Member

fujinaga commented Aug 2, 2024

Agreed. So let's go with this one:

graph TD
   A[https://musicbrainz.org/event/d801c6c8-7870-4506-9709-af3890bf1e74] --> B[P31: is instance of]
   B --> C[Q114342413]

@candlecao
Contributor Author

OK, to confirm: for all entities categorized as events that have an attribute indicating whether they are cancelled, we can temporarily apply the following pattern:
<an event> wdt:P31 wd:Q114342413 .
We may want to draft general principles for reconciliation, especially for particular cases like this. @fujinaga
@Yueqiao12Zhang

@candlecao
Contributor Author

So probably a more generic but pre-defined ontology like Wikidata is more suitable than a bespoke vocabulary from MusicBrainz?

Yes, I basically agree.

5 participants