Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Query Experiment for The Session Database: Make a contrast between the webpage query and SPARQL query #1

Open
candlecao opened this issue Aug 5, 2024 · 26 comments
Assignees
Labels
priority: high high priority

Comments

@candlecao
Copy link
Contributor

  1. I will inspect data reconciliation first.

  2. Try seeing the advantage of SPARQL query: like some queries that cannot be realized on webpage but realizable by SPARQL.

  3. Try using LLM2SPARQL.

@fujinaga

@candlecao candlecao self-assigned this Aug 5, 2024
@candlecao candlecao added priority: high high priority and removed priority: high high priority labels Aug 5, 2024
@candlecao
Copy link
Contributor Author

candlecao commented Aug 13, 2024

For example, search under the option "Sessions":

  1. Webpage Query

The returned result includes the geographical info. By clicking on the blue label (highlighted in the search results), you can view more details about a specific session:
image

@candlecao
Copy link
Contributor Author

candlecao commented Aug 13, 2024

  1. Query using SPARQL against RDF in our Virtuoso
    but there is no property to decide whether it's Sunday or Monday or ...
prefix wdt: <http://www.wikidata.org/prop/direct/>
select * from <http://sample/thesession/reconciled>
where{
  ?session a <https://thesession.org/sessions>; # Sometimes you will find it important to assert the type.
          wdt:P17 “Australia”;
          wdt:P2561 ?name;
          ?p ?o
}

(The named graph will be updated from time to time.)
After execution, we get a lot of results. Just take “The last Supper” for example:

<style> </style>
session name p o
https://thesession.org/sessions/8851 The Last Supper http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.wikidata.org/entity/Q932410
https://thesession.org/sessions/8851 The Last Supper http://www.wikidata.org/prop/direct/P2561 The Last Supper
https://thesession.org/sessions/8851 The Last Supper http://www.wikidata.org/prop/direct/P131 https://www.wikidata.org/wiki/Q11568
https://thesession.org/sessions/8851 The Last Supper http://www.wikidata.org/prop/direct/P131 Darwin
https://thesession.org/sessions/8851 The Last Supper http://www.wikidata.org/prop/direct/P17 https://www.wikidata.org/wiki/Q408
https://thesession.org/sessions/8851 The Last Supper http://www.wikidata.org/prop/direct/P17 Australia
https://thesession.org/sessions/8851 The Last Supper http://www.wikidata.org/prop/direct/P276 https://www.wikidata.org/wiki/Q3235
https://thesession.org/sessions/8851 The Last Supper http://www.wikidata.org/prop/direct/P276 Northern Territory
https://thesession.org/sessions/8851 The Last Supper http://www.wikidata.org/prop/direct/P6375 1/35 Cavenagh St
https://thesession.org/sessions/8851 The Last Supper http://www.wikidata.org/prop/direct/P625 Point(-12.46182823 130.84178162)
https://thesession.org/sessions/8851 The Last Supper http://www.wikidata.org/prop/direct/P585 4/14/2024 8:05

@candlecao
Copy link
Contributor Author

candlecao commented Aug 13, 2024

Example 2: Search under the tab "Tunes"

https://thesession.org/tunes

  1. Webpage Query
    -- jigs in D major for "colm".
    image
    The searching result are 2 items also shown above.
    Please refer to:

    https://thesession.org/tunes/search?type=jig&mode=Dmajor&q=colm

  2. Query using SPARQL against RDF in our Virtuoso
    By searching with code:

prefix rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix wd:<http://www.wikidata.org/entity/>
prefix wdt:<http://www.wikidata.org/prop/direct/>
select distinct * from <http://sample/thesession/reconciled>
where {
?Tune rdf:type wd:Q170412; 
           wdt:P2561 ?nameOfTune;
           wdt:P136 "jig";
           wdt:P136 ?Type;
           wdt:P826 "D major".
filter(contains(str(?nameOfTune), "colm")).
}

you can get result like:
image

However, we cannot find the composer Jerry Holland who composed Malcolm's New Fiddle:
Screenshot 2024-08-13 at 11 59 30
The reason is there is no attribute of composer/creator for the entity "tunes" in the original dumped data.

@fujinaga
Copy link
Member

  1. Query using SPARQL against RDF in our Virtuoso
    but there is no property to decide whether it's Sunday or Monday or ...

Is the day of the week information in Virtuoso?
The property for day of week is P2894.

@fujinaga
Copy link
Member

you can get result like:
image

You only get one result? TheSession returns two results. What is causing the difference?

@candlecao
Copy link
Contributor Author

  1. Query using SPARQL against RDF in our Virtuoso
    but there is no property to decide whether it's Sunday or Monday or ...

Is the day of the week information in Virtuoso? The property for day of week is P2894.

No, neither in Virtuoso nor in the original CSV dumped from the database. We will upload again a new version of RDF of TheSession to Virtuoso soon, because we updated some data reconciliation yesterday.

@candlecao
Copy link
Contributor Author

candlecao commented Aug 13, 2024

you can get result like:
image

You only get one result? TheSession returns two results. What is causing the difference?

Hi, Ich. Now the updated results are two.

@candlecao
Copy link
Contributor Author

candlecao commented Aug 13, 2024

Example 3: Find those entities or properties which are not displayed on the webpage.

  • In preparation of this, we should know what are in the database, namely exploring the structure of the database beforehand.

  • For example, we can explore the chains consisting of classes and properties by executing:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?classOfDomain ?property ?classOfRange
FROM <http://sample/thesession/reconciled>
WHERE {
  {?subject a ?classOfDomain.
  ?subject ?property ?object1 .
  FILTER (?property != rdf:type)
  }
  union
  {?subject a ?classOfDomain.
  ?subject ?property ?object1 .
  ?object1 a ?classOfRange.}
}

(Please refer to DDMAL/linkedmusic-datalake#88)

The searching results are like:
image
...
image

Then we can see a class that is "playlist" https://www.wikidata.org/wiki/Q1569406 that matches the "sets" of The Session.

So, we can query against sets:

Such as entity "sets".

...

@candlecao
Copy link
Contributor Author

candlecao commented Aug 13, 2024

  • And we can see its(sets') relationship with tunes:
    (wd:Q1569406, playlist/set)-[https://thesession.org/tunes]->(wd:Q170412, melody).# I just use this pattern format to denote class in () and property in [] and arrow for the direction of "predicate";
  • We can also see what attributes does set have, such as name(wdt:P2561), type(wdt:P136), meter(wdt:P3440).
  • Above all, for example, if you want to see the related tunes, type, meter of a tune_set whose name is “12 Weeks And A Day”, you can query using SPARQL:
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix wd: <http://www.wikidata.org/entity/>
prefix wdt: <http://www.wikidata.org/prop/direct/>
select * from <http://sample/thesession/reconciled>
where{
?entity rdf:type wd:Q1569406; # The assertion of its type/class is somtimes very important, especially when the properties of this entity are commonly used such as wdt:P2561.
      wdt:P2561 "12 Weeks And A Day";
      <https://thesession.org/tunes> ?tune;
      wdt:P136 ?type;
      wdt:P136 ?type_wiki;
      wdt:P3440 ?meter
}

@fujinaga
Copy link
Member

Did you find out how to get day of week into RDF?
Are day of the week in the original CSV?

@candlecao
Copy link
Contributor Author

candlecao commented Aug 14, 2024

(1)Not yet. But we may use SPARQL to transfer the date into day of week, such as:

prefix wdt: <http://www.wikidata.org/prop/direct/>
select * from <http://sample/thesession/reconciled/updated>
where {
  ?session a <https://thesession.org/sessions>;  # Asserting the type is important
           wdt:P17 "Australia";
           wdt:P2561 ?name;
           wdt:P585 ?date.
  
  # Convert the date to xsd:dateTime
  BIND(xsd:dateTime(?date) AS ?dateTime)
  
  # Filter for Sundays (DAYOFWEEK returns 7 for Sunday)
  FILTER(DAYOFWEEK(?dateTime) = 7)
}
limit 20

It needs to be adjusted a little to be executable. I may try it tomorrow.
(2)

Are day of the week in the original CSV?

No.

@candlecao
Copy link
Contributor Author

candlecao commented Aug 14, 2024

Example 4: Search the chains of entity1->property1->entity2->propery2..., which can not be directly searched on the webpage.

Regarding the aforementioned "chain", we can find quite a long chain of properties or classes in the structure of the database:

(value of type/genre)<-[type/genre, wdt:P136]-(tune_set, wd:Q1569406)-[corresponds to, https://thesession.org/tunes]->(melody/tune, wd:Q170412)<-[corresponds to, https://thesession.org/tunes]-(Recording, Q49017950)-[artist, https://musicbrainz.org/doc/Recording#Artist]->(value of artist)

thus, we may query with “a chain of quesiton”:

Please find some tune_set with type/genre “polka”. The tune_set should have tunes which "Monks Of The Screw Trio"’s recording also has.(Monks Of The Screw Trio is the name of (an) artist(s) .)

The corresponding SPARQL can be:

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix wd: <http://www.wikidata.org/entity/>
prefix wdt: <http://www.wikidata.org/prop/direct/>
select * from <http://sample/thesession/reconciled>
where{
  ?tune_set rdf:type wd:Q1569406;# We possibly don’t need this triples...
                    wdt:P136 "polka";
                    <https://thesession.org/tunes> ?tune.
  ?Recording rdf:type wd:Q49017950;
                      <https://musicbrainz.org/doc/Recording#Artist> "Monks Of The Screw Trio";
                      <https://thesession.org/tunes> ?tune.
}

The query result includes:

<style> </style>
tune_set tune recording
https://thesession.org/members/104992/sets/12112 https://thesession.org/tunes/259 https://thesession.org/recordings/1001
https://thesession.org/members/101858/sets/52599 https://thesession.org/tunes/1650 https://thesession.org/recordings/1001
https://thesession.org/members/101941/sets/22891 https://thesession.org/tunes/17936 https://thesession.org/recordings/1001
https://thesession.org/members/105138/sets/3268 https://thesession.org/tunes/259 https://thesession.org/recordings/1001

@fujinaga
Copy link
Member

Are day of the week in the original CSV?

No.

Interesting. I guess they are finding the day of the week on the fly. We probably should store that in the RDF.

@candlecao
Copy link
Contributor Author

candlecao commented Aug 16, 2024

For the Oct 26 meeting, I may conduct experiments through these steps(draft):

  • Session

eg: questions which can not be answered directly via The session webpage; questions combining various filtering condition

  • Session + wikidata
  • Session + wikidata + musicBrainz
  • LLM2SPARQL--especially make it flexible.

eg:

  1. ... Who played on a particular album/recording in The Session Database. We will have to refer to MusicBrainz.
  2. Something that MusicBrainz doesn’t have.(Like MusicBrainz doesn’t have concert info) That needs information from The session. Eg: I want to find whether a certain artist played in Ireland / in Ontario Canada.

I will appeal to our colleagues in the lab to contribute with all kinds of questions.

@candlecao
Copy link
Contributor Author

Hi, Ich. I didn't have enough time to note down the example questions on the meeting yesterday. If you could have a look at the above outline or samples, I would appreciate it. @fujinaga

@fujinaga
Copy link
Member

What we need are several examples of the following:

  1. a query on the Session webpage (screenshot), a one-page result of the query (screenshot).
  2. an equivalent natural language query.
  3. result of asking that query on ChatGPT to create a SPARQL query
  4. show the SPARQL query in Virtuoso (a screenshot)
  5. show the result of the SPARQL query from Virtuoso
    The result should be same as the result from the webpage.

After that we need queries that cannot be answered on the webpage, but can be answered with Virtuoso, which includes data from Wikidata, e.g.: "Find Canadian performers over 60 years old on the Session database."

We would also like to find queries that involve input from MusicBrainz, e.g.:
"What are the keys of the tunes that John Doyle recorded with the fiddler Liz Carroll?"

I'll solicit other queries from people; maybe on Slack.

@candlecao
Copy link
Contributor Author

Thank you, Ich. That's very clear, especially for me to make slides of the power point for the Oct. 26 meeting.

@candlecao
Copy link
Contributor Author

candlecao commented Aug 19, 2024

Example 5-1 Query across TheSession and Wikidata

Question: In The Session database, what tunes are of "Scottish folk dance"(which corresponds to wd:Q110558158)?

For the syntax of query between an local database and an external database, please refer to
https://github.com/DDMAL/linkedmusic-datalake/wiki/How-to-Query-across-different-Databases(Different-RDF-Data-Source)

#Question: In The Session database, what tunes are of Scottish folk dance? 
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix wd: <http://www.wikidata.org/entity/>
prefix wdt: <http://www.wikidata.org/prop/direct/>
select distinct ?Tune_id ?name ?Tune_Type ?label 
where {
     service <https://query.wikidata.org/sparql> {
         ?Tune_Type wdt:P279 wd:Q110558158; #P279: subclass of; Q110558158: Scottish folk dance
                              rdfs:label ?label
     }     
     graph <http://sample/thesession/reconciled> {
         ?Tune_id wdt:P136 ?Tune_Type;
                         wdt:P2561 ?name
     }
}
limit 100

Be careful, we had better not swap the positions of snippets service... and graph..., what if you do, it would increase the complexity of traversal of algorithm then cause delayed response.

The query result is:
image

Hi Ich, Yueqiao tried again to reconcile the performers of The Session with those performers of Wikidata. But less than 1% can be reconciled.(Later, I will help check too.) So, for
e.g.: "Find Canadian performers over 60 years old on the Session database."
It's not feasible.
No worry, right now, we have tried other cases instead, just to illustrate query across The Session and Wikiddata.

@candlecao
Copy link
Contributor Author

Example 5-2 Query across TheSession and Wikidata

Question: In The Session database, how many tunes respectively for each genre which is of "Scottish folk dance"(which corresponds to wd:Q110558158)?

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix wd: <http://www.wikidata.org/entity/>
prefix wdt: <http://www.wikidata.org/prop/direct/>
select distinct ?Tune_Type (COUNT(?Tune_id) as ?TuneCount)
where {
  service <https://query.wikidata.org/sparql> {
    ?Tune_Type wdt:P279 wd:Q110558158.  # P279: subclass of; Q110558158: Scottish folk dance
                         #rdfs:label ?label
  }
  graph <http://sample/thesession/reconciled> {
    ?Tune_id wdt:P136 ?Tune_Type.
                    #wdt:P2561 ?name
  }     
}
GROUP BY ?Tune_Type

The query result is:

SPARQL | HTML5 table Tune_Type | TuneCount -- | -- http://www.wikidata.org/entity/Q1079270 | 32090 http://www.wikidata.org/entity/Q118120 | 41472

SPARQL | HTML5 table
Tune_Type TuneCount
http://www.wikidata.org/entity/Q1079270
32090
http://www.wikidata.org/entity/Q118120
41472

@candlecao
Copy link
Contributor Author

There are 32090 tunes of the genre Q1079270(jig) and 41472 tunes of the genre Q118120(reel).

@candlecao
Copy link
Contributor Author

candlecao commented Aug 22, 2024

I list the prospective questions for query on the webpage or using SPARQL.

  1. Queries executable both on TheSession webpage and using SPARQL against Virtuoso

1.1 Find sessions that took place (on Sundays and) in Australia

1.2 Find a tune that is of the type "gigs," in D major, and has "colm" in its name

1.3 Find a recording that is performed by the folk bank "The Chieftains"

The Chieftains: http://www.wikidata.org/entity/Q662577

  1. Queries not executable on TheSession webpage but executable using SPARQL

2.1 What tunes are contained in the recordings of "The Chieftains"?

2.2 Show me the sessions that took place before the session "Steampacket Hotel".(sorted from distant past to recent past)

2.3 Please find the session that took place within a 50-kilometer radius centered on the session "Canadian Legion"

2.4 What is the third tune(or last tune) in the tune set created on March 31st 2019 by a user called Delattre?

  1. Queries across TheSession and WikiData

3.0 From TheSession, please find recordings that were published between 2007 and 2012

We will use a property wdt:P577 from Wikidata.

3.1 Which session took place in Eastern Europe?

3.2 Please find other melodies(tunes, wd:Q170412) on Wikidata, which have the same genres with the tune "100 Watt Reels" in The Session

3.3 Give us a general introduction (from Wikidata) of the performer of the recording "Celtic Wedding"

3.4 A spare question: In The Session database, what tunes are of Scottish folk dance?

  1. Queries across TheSession and MusicBrainz

4.1 Has the label "Adrienne Young" ever issued some album that can be found in TheSession?

4.2 Find 3 albums in MusicbBrainz respectively corresponding to each different genre appearing in the TheSession

4.3 Which country did the event "Celtic Festival 2006" take place in? At the same year of its occurrence, is there some album published in this country which can be found in MusicBrainz?

@candlecao
Copy link
Contributor Author

@fujinaga Hi Ich, please refer to the questions as above. Your advice would be appreciated.

@fujinaga
Copy link
Member

I list the prospective questions for query on the webpage or using SPARQL.

  1. Queries executable both on TheSession webpage and using SPARQL against Virtuoso

1.1 Find sessions that took place (on Sundays and) in Australia

1.2 Find a tune that is of the type "gigs," in D major, and has "colm" in its name

1.3 Find a recording that is performed by the folk bank "The Chieftains"

The Chieftains: http://www.wikidata.org/entity/Q662577

The Chieftains are well-known group and one doesn't need to TheSession for that.

  1. Queries not executable on TheSession webpage but executable using SPARQL

2.1 What tunes are contained in the recordings of "The Chieftains"?

You can find this using Wikipedia.

2.2 Show me the sessions that took place before the session "Steampacket Hotel".(sorted from distant past to recent past)

What is ""Steampacket Hotel"? How do I find this on TheSession?

2.3 Please find the session that took place within a 50-kilometer radius centered on the session "Canadian Legion"

What is "Canadian Legion"?

2.4 What is the third tune(or last tune) in the tune set created on March 31st 2019 by a user called Delattre?

What is the significance of this? Who is Delattre?

  1. Queries across TheSession and WikiData

3.0 From TheSession, please find recordings that were published between 2007 and 2012

We will use a property wdt:P577 from Wikidata.

Doesn't TheSession database include recording dates?

3.1 Which session took place in Eastern Europe?

Yes!

3.2 Please find other melodies(tunes, wd:Q170412) on Wikidata, which have the same genres with the tune "100 Watt Reels" in The Session

3.3 Give us a general introduction (from Wikidata) of the performer of the recording "Celtic Wedding"

This the recording by The Chieftains? You can get this information from Wikipedia.

3.4 A spare question: In The Session database, what tunes are of Scottish folk dance?

I don't understand the significance of this. What are you looking for?

  1. Queries across TheSession and MusicBrainz

4.1 Has the label "Adrienne Young" ever issued some album that can be found in TheSession?

I think the reverse would be more interesting: Which albums in TheSession are published by "Adrienne Young"?

4.2 Find 3 albums in MusicbBrainz respectively corresponding to each different genre appearing in the TheSession

You don't need TheSession for this. The genres, such as jigs and reels are well known.

4.3 Which country did the event "Celtic Festival 2006" take place in?
Is this the "Hebridean Celtic Festival 2006"? How do I find this in TheSession?

At the same year of its occurrence, is there some album published in this country which can be found in MusicBrainz?

There would be hundreds of albums published in Scotland in 2006.

Note that we are looking for useful questions for the users of TheSession.

@fujinaga
Copy link
Member

Here's one (TheSession+Wikidata):
Find all performers in TheSession who were born today.

@candlecao
Copy link
Contributor Author

candlecao commented Aug 26, 2024

OK, got it. Thank you. I will polish them or may consult with you in person, sooner.
I answer part of your questions:

  • Query 1.3 is replaced by:
    Find the recordings(Albums) whose performers are named "Adam Agee & Jon Sousa".

  • Query 2.1 is replaced by:
    What genres appear in the recording of "Adam Agee & Jon Sousa"?

 Find the genres corresponding to the tunes in a recording and return the genre with their frequency.

  • Reply 2.2: "Steampacket Hotel" is the name of a session. You cannot directly find the session from the webpage of TheSession unless using SPARQL. So we can add a simple query--Find a session named "Steampacket Hotel."

  • Query 2.2: To clarify, we can update the query as:
    Show me the sessions that took place before the session named "Steampacket Hotel".(sorted from distant past to recent past)

  • Reply 2.3: "Canadian Legion" is the name of a session.

  • Reply 3.0: No, TheSession database indeed doesn't include recording dates.

  • Reply 4.2: OK.

  • Reply 4.3: it's probably. You can not find it directly on TheSession webpage. the URL for this is https://thesession.org/events/60

I found it from the CSV:

<style> </style>
events_id event dtstart dtend venue venue_wiki address town town_wiki area area_wiki country country_wiki coordinate
https://thesession.org/events/60 Celtic Festival 2006 9/9/2006 21:00 9/10/2006 19:00 Chemin De Feuquières     Grandvilliers https://www.wikidata.org/wiki/Q585338 Île-de-France https://www.wikidata.org/wiki/Q13917 France https://www.wikidata.org/wiki/Q142 Point(48.53992844 2.88414598)

@candlecao
Copy link
Contributor Author

@fujinaga Hi, Ich. I began to make the slides on the power point. You can find the draft here being updated from time to time.
https://github.com/DDMAL/linkedmusic-queries/blob/main/QueryContrast.ppt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: high high priority
Projects
None yet
Development

No branches or pull requests

2 participants