Not able to query lots of strings in RDF(TheSession) #9

Open
candlecao opened this issue Sep 8, 2024 · 6 comments
Assignees
Labels
priority: high high priority

Comments

@candlecao
Contributor

For example, the following query does not return the session named "Hurley’s Irish Pub":

SELECT ?session
WHERE {
  ?session wdt:P2561 "Hurley’s Irish Pub" .
  ?session rdf:type <https://thesession.org/sessions> .
}

The query does work once the literal carries an "@en" language tag: ?session wdt:P2561 "Hurley’s Irish Pub"@en .
The cause is the following modification:
[image]
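
The mismatch follows from RDF term equality: a plain string literal and a language-tagged literal with the same characters are distinct terms, so a triple pattern written with one does not match a triple stored with the other. Below is a toy Python model of that behaviour. The `Literal` class, the `STORE` contents, and the `match` helper are hypothetical illustrations, not part of this project or of any RDF library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    """Toy stand-in for an RDF literal: lexical form plus optional language tag."""
    lexical: str
    lang: Optional[str] = None


# Hypothetical stored triple, mirroring the data after the modification
# (the subject identifier is made up for this example).
STORE = [
    ("session:1", "wdt:P2561", Literal("Hurley’s Irish Pub", "en")),
]


def match(predicate, obj):
    """Return subjects whose (predicate, object) pair equals the pattern exactly."""
    return [s for s, p, o in STORE if p == predicate and o == obj]


untagged = match("wdt:P2561", Literal("Hurley’s Irish Pub"))
tagged = match("wdt:P2561", Literal("Hurley’s Irish Pub", "en"))
print(untagged)  # the pattern without a language tag finds nothing
print(tagged)    # the pattern with "en" finds the stored session
```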

@candlecao
Contributor Author

candlecao commented Sep 8, 2024

I don't quite agree with this rendering, because:
(1) We cannot guarantee that all of these strings are actually in English.
(2) It places an extra burden on LLM2SPARQL and makes its output less accurate.
(3) We could treat English as the default language, so that no tag is needed; for other languages, we could add tags such as @zh for Chinese, @fr for French, and so on.

@fujinaga Hi, Ich, do you agree?

@Yueqiao12Zhang

@fujinaga

@fujinaga
Member

There should always be a language tag in every string. We can always instruct ChatGPT to append the language tags in SPARQL queries.
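
As a complement to prompt instructions, a generated query could also be post-processed so that untagged string literals receive a default tag. The sketch below is one possible way to do that, not something this project is confirmed to use; `add_language_tags` is a hypothetical helper. It assumes simple triple patterns with double-quoted literals. It would also tag strings inside FILTER or REGEX arguments, and it does not handle single-quoted or triple-quoted literals, so it is only a starting point.

```python
import re

# A double-quoted SPARQL string, optionally followed by a language tag (@en)
# or a datatype (^^xsd:string, ^^<iri>). Matching the optional suffix as part
# of the literal keeps already-annotated strings from being rewritten.
_LITERAL = re.compile(r'"((?:[^"\\]|\\.)*)"(@[A-Za-z0-9-]+|\^\^\S+)?')


def add_language_tags(query: str, lang: str = "en") -> str:
    """Append @lang to every double-quoted literal that has no tag or datatype."""

    def repl(m: "re.Match[str]") -> str:
        if m.group(2):  # already has @tag or ^^datatype: leave untouched
            return m.group(0)
        return f"{m.group(0)}@{lang}"

    return _LITERAL.sub(repl, query)


query = 'SELECT ?session WHERE { ?session wdt:P2561 "Hurley’s Irish Pub" . }'
print(add_language_tags(query))
```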

@Yueqiao12Zhang

Ok. Does this mean that I have to automatically detect the language of every string in my script?

@fujinaga
Member

No. For each database we import, we should know which language it is in.
For now you can always default to @en. If we store chant text from CantusDB, that would be tagged as Latin.
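
A per-database default like this could be kept in a small lookup table in the import script. The following is a minimal sketch under that assumption; the `DATABASE_LANG` table and the `tagged_literal` function are hypothetical names, and `la` is the ISO 639-1 code for Latin.

```python
# Hypothetical mapping from source database to the language of its strings.
DATABASE_LANG = {
    "TheSession": "en",
    "CantusDB": "la",  # Latin chant text
}


def tagged_literal(text: str, database: str, fallback: str = "en") -> str:
    """Serialize text as a language-tagged literal for the given database.

    Databases missing from the table fall back to "en", following the
    "default to @en for now" suggestion above.
    """
    lang = DATABASE_LANG.get(database, fallback)
    # Escape backslashes first, then quotes and newlines.
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"@{lang}'


print(tagged_literal("Hurley’s Irish Pub", "TheSession"))
print(tagged_literal("Ave Maria gratia plena", "CantusDB"))
```

With this approach no per-string language detection is needed; the tag is a property of the source database.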

@ahankinson
Member

There are several codes that you can use when the language or script is uncoded or undetermined:

Type: script
Subtag: Zyyy
Description: Code for undetermined script
Added: 2005-10-16
%%
Type: script
Subtag: Zzzz
Description: Code for uncoded script
Added: 2005-10-16
%%
Type: language
Subtag: und
Description: Undetermined
Added: 2005-10-16
Scope: special

https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry

Note: "und should not be used unless a language tag is required and language information is not available or cannot be determined. Omitting the language tag (where permitted) is preferred. This subtag may also be useful when matching language tags in certain situations. Where xml:lang="" is allowed by the markup, it is better to use that rather than und"

From a search for "und" here: https://r12a.github.io/app-subtags/

See: https://www.w3.org/International/questions/qa-no-language#undetermined
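
The guidance quoted above can be read as a small decision rule: use the known language if there is one, omit the tag where that is permitted, and fall back to `und` only when a tag is mandatory. A hedged Python sketch of that rule, with hypothetical function names:

```python
from typing import Optional


def choose_language_tag(lang: Optional[str], tag_required: bool) -> Optional[str]:
    """Pick a language tag following the quoted note on `und`.

    Returns the known language, "und" if the language is unknown but a tag
    is required, or None if the tag may simply be omitted.
    """
    if lang:
        return lang
    return "und" if tag_required else None


def format_literal(text: str, lang: Optional[str], tag_required: bool = True) -> str:
    """Serialize text as a quoted literal, with a tag only when one is chosen."""
    tag = choose_language_tag(lang, tag_required)
    quoted = '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return f"{quoted}@{tag}" if tag else quoted


print(format_literal("Hurley’s Irish Pub", "en"))
print(format_literal("Hurley’s Irish Pub", None))
print(format_literal("Hurley’s Irish Pub", None, tag_required=False))
```

Under a policy where every string must carry a tag, `tag_required` stays `True` and unknown languages end up as `@und`.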
