
Prompt_Engineering_NL2SPARQL: Experiments on June 2024, and Further Proposal


(Please refer to https://github.com/DDMAL/linkedmusic-datalake/tree/PromptEngineering_on_simssaDB/Prompt_Engineering on the "PromptEngineering_on_simssaDB" branch. The code snippets are from the trial.py file.)

1. Preliminary Workflow

1.1 Embed all the intermediate prompts in the background in order to accommodate most users

Assume the user is interacting with a UI of the LinkedMusic product; we implement the procedures in Python as follows:

1.1.1 Reuse a GPT function specifically for prompt engineering

Just some snippets as below:

from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

def callGPT(prompt):
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo-16k",
        max_tokens=500,
        temperature=0.1,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ]
    )
    return completion.choices[0].message.content
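
For example, a quick sanity check of the helper (a hypothetical call, assuming the OPENAI_API_KEY environment variable is set):

# Hypothetical usage, only to verify that the helper works end to end.
print(callGPT("In one sentence, what is SPARQL?"))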

1.1.2 Define a function for connecting to a SPARQL endpoint

from SPARQLWrapper import SPARQLWrapper, JSON

def query_sparql(endpoint, query, graph_iri):
    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    sparql.addDefaultGraph(graph_iri)
    results = sparql.query().convert()
    return results

# Generate the query with GPT (sparql_query_prompt is built in 1.1.3 below),
# then strip the Markdown code fences GPT tends to wrap around SPARQL.
sparql_query = callGPT(sparql_query_prompt).strip()
sparql_query = sparql_query.replace("```sparql", "").strip("```")
print(sparql_query)

1.1.3 Set patterns of Question

sparql_query_prompt = f"""
Given the following RDF context and question, generate a SPARQL query to retrieve the relevant information:
Context:
{context}
Question:
{question}
Please provide only the SPARQL query without any additional text.
Because it is not querying over the Wikidata SPARQL endpoint, don't use the "SERVICE" keyword in the SPARQL query you generate!!!
"""

For the five questions on SimssaDB, please see: SimssaDB_5_questions

1.1.4 Give context and question

Refer to SimssaDB_context

Given a SPARQL endpoint and a graph IRI:

sparql_endpoint = "https://virtuoso.staging.simssa.ca/sparql"
graph_iri = "urn:simssadb"

This is in the area of music. Please note that the corresponding RDF database has already been reconciled with Wikidata as much as possible, but not with DBpedia, so we do not have to use ontology assertions from DBpedia.

Important references are as below:

@prefix wd: <http://www.wikidata.org/entity/> .
@prefix wdt: <http://www.wikidata.org/prop/direct/> .
wdt:P1476 rdfs:domain wd:Q2188189 .
wdt:P136 wdt:P2561

Don't forget to add the necessary namespace declarations. Because it is not querying over the Wikidata SPARQL endpoint, don't use the "SERVICE" keyword in the SPARQL query you generate!!! In any case, make sure the syntax of the SPARQL code is absolutely correct.
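
Putting the pieces together, here is a minimal sketch of how the context and one question could be fed through callGPT and query_sparql from above. The example question and the shortened context string are placeholders, not the exact text used in the experiments:

# Hypothetical example question; the real experiments used the five SimssaDB questions linked above.
question = "Which musical works have the genre 'fugue'?"

# Placeholder for the real context (SimssaDB_context plus the hints quoted in this subsection).
context = """This is in the area of music. The database has been reconciled with Wikidata.
@prefix wd: <http://www.wikidata.org/entity/> .
@prefix wdt: <http://www.wikidata.org/prop/direct/> .
wdt:P1476 rdfs:domain wd:Q2188189 ."""

sparql_endpoint = "https://virtuoso.staging.simssa.ca/sparql"
graph_iri = "urn:simssadb"

# Build the prompt from the template in 1.1.3, generate the query, and strip Markdown fences.
sparql_query_prompt = f"""
Given the following RDF context and question, generate a SPARQL query to retrieve the relevant information:
Context:
{context}
Question:
{question}
Please provide only the SPARQL query without any additional text.
Don't use the "SERVICE" keyword, because the query is not run over the Wikidata SPARQL endpoint.
"""
sparql_query = callGPT(sparql_query_prompt).strip()
sparql_query = sparql_query.replace("```sparql", "").strip("```")

# Execute the generated query against the SimssaDB graph.
results = query_sparql(sparql_endpoint, sparql_query, graph_iri)
print(len(results["results"]["bindings"]), "rows returned")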

1.1.5 Integrate and Explain the result

(1) Integrate the results in a spreadsheet-like format. (2) Explain the structured results in natural language, with another round of prompt engineering.
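
Continuing from the sketch in 1.1.4, a minimal illustration of both sub-steps; the bindings_to_csv helper and the wording of the explanation prompt are assumptions, not the exact code used in the experiments:

import csv
import io

def bindings_to_csv(results):
    # Flatten the SPARQL JSON results into a CSV string; URIs are kept as-is.
    variables = results["head"]["vars"]
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(variables)
    for row in results["results"]["bindings"]:
        writer.writerow([row.get(var, {}).get("value", "") for var in variables])
    return buffer.getvalue()

# Second round of prompt engineering: ask GPT to explain the table in plain language.
csv_table = bindings_to_csv(results)
explanation_prompt = f"""
The following CSV table is the result of a SPARQL query over a music database.
Question:
{question}
Results:
{csv_table}
Please explain these results in natural language for a non-technical user.
"""
print(callGPT(explanation_prompt))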

1.2 Summary

Our prompt procedure includes 3 steps: (1) generating the SPARQL query; (2) transforming the SPARQL result into a dictionary or CSV (plus URIs); (3) further explanation using GPT.

Working logs: I did the prompt engineering in Python with ChatGPT (3.5): against SimssaDB, I asked questions and got back SPARQL queries.

Even though ChatGPT can generate some SPARQL correctly, since it has absorbed many assertions about properties and classes, it is not consistently right and accurate.

It took numerous trials on the context, adding guidance and instructions little by little, especially assertions of some "fragmented ontology" as hints, before I could get consistently correct replies.

2. Secondary Suggestions for Refinement / Future TODO

2.0 Keep trying over different databases, including MusicBrainz. Use a higher-level GPT model.

2.1 We can use an "iteration" approach to prompt GPT to keep regenerating SPARQL until it has correct syntax and yields the expected results rather than errors. The goal is to obtain at least one right result, or similarly right ones.
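
A minimal sketch of such an iteration loop, reusing callGPT and query_sparql from above; the retry limit and the error-feedback prompt wording are assumptions:

def generate_with_retry(question_prompt, endpoint, graph_iri, max_attempts=3):
    # Keep asking GPT for a query until the endpoint accepts it or we run out of attempts.
    prompt = question_prompt
    for attempt in range(max_attempts):
        query = callGPT(prompt).strip().replace("```sparql", "").strip("```")
        try:
            return query, query_sparql(endpoint, query, graph_iri)
        except Exception as error:
            # Feed the error back to GPT and ask it to fix the query.
            prompt = (
                question_prompt
                + f"\nThe previous query:\n{query}\nfailed with this error:\n{error}\n"
                + "Please correct the SPARQL query and return only the query."
            )
    raise RuntimeError(f"No valid SPARQL query after {max_attempts} attempts")

# Example usage with the prompt, endpoint, and graph IRI from section 1.1.4.
sparql_query, results = generate_with_retry(sparql_query_prompt, sparql_endpoint, graph_iri)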

2.2 We can prompt GPT to generate "relation patterns" or an ontology for the database being queried against, and determine the weights of properties and even classes; that is, to extract the sub-graph of the schema that corresponds to an arbitrary natural language question, so as to generate more accurate SPARQL. Additionally, we can provide examples to instruct it. In fact, we can use certain tools to analyze the structure of Wikidata comprehensively.
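
One possible shape for this idea, as a sketch: first ask GPT to pick the relevant properties and classes from a schema summary, then build the SPARQL-generating prompt from that reduced sub-graph only. The schema_summary string, the weighting prompt, and the example question are hypothetical:

# Hypothetical schema summary; in practice it could be extracted from the database or from Wikidata.
schema_summary = """
wdt:P1476 (title)    domain: wd:Q2188189 (musical work)
wdt:P136  (genre)
wdt:P86   (composer)
"""

def extract_subschema(question, schema_summary):
    # Ask GPT which properties and classes matter for this question, as a weighted shortlist.
    prompt = f"""
From the following schema summary, list only the properties and classes needed to answer
the question, one per line, each with a relevance weight between 0 and 1.
Schema:
{schema_summary}
Question:
{question}
"""
    return callGPT(prompt)

subschema = extract_subschema("Which musical works have the genre 'fugue'?", schema_summary)
# The returned shortlist can then replace the full context in the SPARQL-generating prompt (1.1.3).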

2.3 We can be inspired by recommendation algorithms, in case there is no exact answer for a particular question. It is better to feed back something rather than nothing.

2.4 Use data visualization so that the user can navigate better.

2.5 Otherwise, we can apply to add the needed properties on Wikidata.
