Prompt_Engineering_NL2SPARQL: Experiments from June 2024, and Further Proposal
(Please refer to https://github.com/DDMAL/linkedmusic-datalake/tree/PromptEngineering_on_simssaDB/Prompt_Engineering on the "PromptEngineering_on_simssaDB" branch. The code snippets are from the trial.py file.)
Assume the user is interacting with the UI of a LinkedMusic product; we implement the procedure in Python, roughly as follows. Some snippets from trial.py:

from openai import OpenAI

# The client reads the OPENAI_API_KEY environment variable by default.
client = OpenAI()

def callGPT(prompt):
    # Send a single prompt to the chat model and return the reply text.
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo-16k",
        max_tokens=500,
        temperature=0.1,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ]
    )
    return completion.choices[0].message.content
from SPARQLWrapper import SPARQLWrapper, JSON

def query_sparql(endpoint, query, graph_iri):
    # Run a SPARQL query against the given endpoint, restricted to one named graph,
    # and return the results in the standard SPARQL JSON format.
    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    sparql.addDefaultGraph(graph_iri)
    results = sparql.query().convert()
    return results
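As a minimal usage sketch (not from trial.py), the helper above can be pointed at the SimssaDB endpoint and graph IRI quoted further down this page; here sparql_query is assumed to hold the generated query produced by the next snippet:

```python
# Illustrative only: the endpoint and graph IRI are the SimssaDB values given below.
sparql_endpoint = "https://virtuoso.staging.simssa.ca/sparql"
graph_iri = "urn:simssadb"

results = query_sparql(sparql_endpoint, sparql_query, graph_iri)

# The returned JSON follows the standard SPARQL results structure.
for binding in results["results"]["bindings"]:
    print({var: value["value"] for var, value in binding.items()})
```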
sparql_query_prompt = f"""
Given the following RDF context and question, generate a SPARQL query to retrieve the relevant information:
Context:
{context}
Question:
{question}
Please provide only the SPARQL query without any additional text.
Because it is not querying over the Wikidata SPARQL endpoint, don't use the "SERVICE" keyword in the SPARQL query you generate!!!
"""

sparql_query = callGPT(sparql_query_prompt).strip()
# Strip the Markdown code fences that the model often wraps around the query.
sparql_query = sparql_query.replace("```sparql", "").strip("```")
print(sparql_query)
For the 5 questions asked over SimssaDB, please check SimssaDB_5_questions, and refer to SimssaDB_context for the context.
We are given a SPARQL endpoint and a graph IRI:

sparql_endpoint = "https://virtuoso.staging.simssa.ca/sparql"
graph_iri = "urn:simssadb"
This is in the area of music. Note that the corresponding RDF database has already been reconciled with Wikidata as much as possible, but not with DBpedia, so we do not need to use ontology assertions from DBpedia.
Important references are as below:

@PREFIX wd: <http://www.wikidata.org/entity/> .
@PREFIX wdt: <http://www.wikidata.org/prop/direct/> .
wdt:P1476 rdfs:domain wd:Q2188189 .
wdt:P136 wdt:P2561
Don't forget to add the necessary namespace declarations. Because the query is not run against the Wikidata SPARQL endpoint, do not use the "SERVICE" keyword in the generated SPARQL. In any case, make sure the syntax of the SPARQL query is absolutely correct.
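One way to fold this guidance into the {context} placeholder of the prompt template above is sketched below; the exact context string used in trial.py is documented in SimssaDB_context and may differ:

```python
# Illustrative assembly of the RDF context handed to the prompt template.
# Everything in this string comes from the notes above; the variable name
# `context` matches the placeholder in sparql_query_prompt.
context = """
@PREFIX wd: <http://www.wikidata.org/entity/> .
@PREFIX wdt: <http://www.wikidata.org/prop/direct/> .
wdt:P1476 rdfs:domain wd:Q2188189 .
wdt:P136 wdt:P2561

The data sits in the named graph <urn:simssadb> on https://virtuoso.staging.simssa.ca/sparql .
The database has been reconciled with Wikidata as much as possible, but not with DBpedia.
Always declare the namespaces you use.
Do not use the SERVICE keyword; the query is not run on the Wikidata endpoint.
Make sure the SPARQL syntax is absolutely correct.
"""
```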
(1) Integrate the results in a spreadsheet-like format. (2) Explain the structured results with a natural-language complement, using another round of prompt engineering.
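For point (1), the SPARQL JSON results can be flattened into spreadsheet-like rows and written to CSV while keeping the URIs. A minimal sketch (the helper name results_to_csv is illustrative, not from trial.py):

```python
import csv

def results_to_csv(results, path="results.csv"):
    # Flatten SPARQL JSON results into one row per binding, keeping URI values.
    variables = results["head"]["vars"]
    rows = [
        {var: binding.get(var, {}).get("value", "") for var in variables}
        for binding in results["results"]["bindings"]
    ]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=variables)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```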
Our prompting procedure includes 3 steps: (1) generate the SPARQL query; (2) transform the SPARQL results into a dictionary or CSV format (plus URIs); (3) produce a further explanation using GPT.
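Step (3) can reuse callGPT with a second prompt that explains the tabular results. A sketch, assuming rows comes from the CSV helper above and question is the user's original question (the prompt wording is illustrative):

```python
# Second round of prompt engineering: ask GPT to explain the structured results.
explanation_prompt = f"""
A user asked the following question about the SimssaDB music database:
{question}

A SPARQL query returned these rows (URIs included):
{rows}

Please explain these results in plain natural language for the user.
"""
explanation = callGPT(explanation_prompt)
print(explanation)
```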
Working logs: I did the prompt engineering in Python against ChatGPT (GPT-3.5): over SimssaDB, I asked questions and got SPARQL queries back.
Even though ChatGPT can generate some SPARQL correctly, since it has picked up many assertions about properties and classes, it is not consistently right and accurate.
It took numerous trials, building up the context little by little with guidance, instructions, and especially assertions of some "fragmented ontology" as hints, before I could get consistently correct replies.