Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update code in nl2sql.ipynb notebook - Changed the approach to explain the SQL tables to the model #10

Merged
merged 2 commits into from
Oct 7, 2024

Conversation

fmquaglia
Copy link
Contributor

It is using a domain-specific language already understood by the model. This results in a more compact system prompt that allows for a more nuanced data model description. This should result in fewer tokens consumed by the system prompts and better maintainability, allowing for a richer data definition.

This might not be ideal for an educational exercise because it adds to the cognitive load (Wait... what? What is this DBML thing?), but I figured I would show it to you, anyway.

From

 # ...

context.append( {'role':'system', 'content':"""
first table:
{
  "tableName": "employees",
  "fields": [
    {
      "nombre": "ID_usr",
      "tipo": "int"
    },
    {
      "nombre": "name",
      "tipo": "string"
    }
  ]
}
"""
})

context.append( {'role':'system', 'content':"""
second table:
{
  "tableName": "salary",
  "fields": [
    {
      "nombre": "ID_usr",
      "type": "int"
    },
    {
      "name": "year",
      "type": "date"
    },
    {
      "name": "salary",
      "type": "float"
    }
  ]
}
"""
})

context.append( {'role':'system', 'content':"""
third table:
{
  "tablename": "studies",
  "fields": [
    {
      "name": "ID",
      "type": "int"
    },
    {
      "name": "ID_usr",
      "type": "int"
    },
    {
      "name": "educational level",
      "type": "int"
    },
    {
      "name": "Institution",
      "type": "string"
    },
    {
      "name": "Years",
      "type": "date"
    }
    {
      "name": "Speciality",
      "type": "string"
    }
  ]
}
"""
})

# ...

To

 # ...

context.append( {'role':'system', 'content':"""
This is the definition of your database tables:

```dbml
Table employees {
  ID_usr int [pk]
  name string
}

Table salary {
  ID_usr int [ref: > employees.ID_usr]
  year date
  salary float
}

Table studies {
  ID int [pk]
  ID_usr int [ref: > employees.ID_usr]
  educational_level int
  Institution string
  Years date
  Speciality string
}

"""
})

...

@fmquaglia
Copy link
Contributor Author

Notice that GitHub gets confused by the markdown in the system prompt, and it does not render the code properly in the PR description.

@fmquaglia fmquaglia force-pushed the dbml-to-save-tokens-in-nl2sql branch from ef7e483 to c917607 Compare February 8, 2024 07:25
@fmquaglia
Copy link
Contributor Author

I rebased on top of the latest additions to main.

@peremartra peremartra self-assigned this Aug 13, 2024
@peremartra
Copy link
Owner

Hi @fmquaglia,

As you know, this notebook is part of a course, and it’s just a basic, initial approach to creating an NL2SQL solution. In a more advanced lesson, we have this notebook:
6_1_nl2sql_prompt_OpenAI.ipynb,
which uses a very similar structure in the prompt to what you are presenting here, based on this paper:
https://arxiv.org/abs/2305.11853.

However, I prefer to keep this more basic and somewhat incorrect approach here, and introduce the more accurate method later when the students have a stronger foundation of knowledge.

@peremartra peremartra closed this Aug 13, 2024
@peremartra peremartra reopened this Aug 13, 2024
@peremartra
Copy link
Owner

Just figured that we can upload the notebook with a different name, and with some explanations in the header, explaining that this is a better solution.

rename the notebook to: 1_2b-Easy_NL2SQL.ipynb.

and please, add a by line in the header with your name and a link to your github profile, or linked profile, the one you prefer.

@fmquaglia
Copy link
Contributor Author

@peremartra I totally missed this message. I'll try to have it done as you suggest by the end of this week, sir. Thank you!

@fmquaglia fmquaglia changed the title Update code in nl2sql.ipynb notebook - Changed the approach to explain the SQL tables to the mode Update code in nl2sql.ipynb notebook - Changed the approach to explain the SQL tables to the model Sep 17, 2024
@peremartra
Copy link
Owner

@peremartra I totally missed this message. I'll try to have it done as you suggest by the end of this week, sir. Thank you!

@peremartra I totally missed this message. I'll try to have it done as you suggest by the end of this week, sir. Thank you!

@peremartra I totally missed this message. I'll try to have it done as you suggest by the end of this week, sir. Thank you!

My Fault @fmquaglia! Thanks to you, waiting for you modifications :-)

Changed the approach to explain the SQL tables to the model, using a domain specific language already understood by the model.
This results in a more compact system prompt that still allows for a more nuanced description of the data model.
This should result in less tokens consumed by the system prompts and better maintainability allowing for a richer data definition.
@fmquaglia fmquaglia force-pushed the dbml-to-save-tokens-in-nl2sql branch from c917607 to cf7b36b Compare October 7, 2024 01:04
Renamed the NL2SQL notebook to a more descriptive filename and updated the author information within the notebook. Added a note about DBML to explain its use in database descriptions.
@peremartra peremartra merged commit 51a54e4 into peremartra:main Oct 7, 2024
1 check passed
@fmquaglia
Copy link
Contributor Author

@peremartra Sorry, my wife called me to have dinner, and I completely forgot to send you a comment letting you know that what you requested has been done. Man, thanks for letting me contribute to this project. It makes me happy. Thank you!

@peremartra
Copy link
Owner

peremartra commented Oct 7, 2024 via email

@peremartra
Copy link
Owner

peremartra commented Oct 11, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants