Commit c5e32f4

feat: Improve system prompt and tool descriptions (#90)
* chore: Move `prompt.md` to `inst/prompts`
* chore: Add default `get_db_type()` of `"standard"`
* feat: Refactor system prompt for clarity and structure
  - Reorganized content with clearer hierarchy and section headings
  - Moved database schema and DuckDB tips earlier for better context
  - Simplified DuckDB percentile guidance with concrete examples and preference for `quantile_*` functions
  - Enhanced "Suggestions" section with comprehensive syntax examples, usage guidelines, and best practices for when to include clickable prompts
  - Streamlined filtering/sorting instructions by removing redundant explanations
  - Clarified that no response is needed after successful dashboard updates
  - Added explicit Markdown table formatting guideline
  - Improved examples to be more concise and realistic
  - Made extra_instructions conditional with proper formatting
  - Overall: more scannable structure with better separation of concerns
* feat(update_dashboard): Improve description
* feat(tool-reset): Improve tool description
* feat(tool-query): Improve tool description
* chore(tool-update): Simplify db_type
* feat(pkg-py): Bring prompt improvements to python package
* refactor(pkg-py): Restructure how tools are created
* chore(pkg-py): Restore tool parameter descriptions
* chore(examples): Update fixed greeting to use suggestions
* fix(pkg-py): Safely get `QUERYCHAT_CLIENT_ARGS` envvar
* `devtools::document()` (GitHub Actions)
* fix(pkg-py): Make `db_engine` a property in the protocol
* chore(pkg-py): Use `.get_db_type()` separate from `.db_engine`

  Also uses DataSourceBase class to get default `get_db_type()` implementation
* chore: move `is_duck_db` delimiters around slightly
* docs: Add changelog items

---------

Co-authored-by: gadenbuie <[email protected]>
1 parent efd1db9 commit c5e32f4

20 files changed: +593 -290 lines changed

pkg-py/CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -13,6 +13,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 * Added `querychat.greeting()` to help you create a greeting message for your querychat bot. (#87)

+* querychat's system prompt and tool descriptions were rewritten for clarity and future extensibility. (#90)
+
 ## [0.2.2] - 2025-09-04

 * Fixed another issue with data sources that aren't already narwhals DataFrames (#83)

pkg-py/examples/greeting.md

Lines changed: 11 additions & 12 deletions
@@ -1,14 +1,13 @@
-Hello! I'm here to assist you with analyzing the Titanic dataset.
-Here are some examples of what you can ask me to do:
+Hello! Welcome to your Titanic data dashboard. I'm here to help you filter, sort, and analyze the data. Here are a few ideas to get you started:

-- **Filtering and Sorting:**
-  - Show only passengers who boarded in Cherbourg.
-  - Sort passengers by age in descending order.
+* Explore the data
+  * <span class="suggestion">Show me all passengers who survived</span>
+  * <span class="suggestion">Show only first class passengers</span>
+* Analyze statistics
+  * <span class="suggestion">What is the average age of passengers?</span>
+  * <span class="suggestion">How many children were on board?</span>
+* Compare and dig deeper
+  * <span class="suggestion">Which class had the highest survival rate?</span>
+  * <span class="suggestion">Show the fare distribution by embarkation town</span>

-- **Data Analysis:**
-  - What is the survival rate for each passenger class?
-  - How many children were aboard the Titanic?
-
-- **General Statistics:**
-  - Calculate the average age of female passengers.
-  - Find the total fare collected from passengers who did not survive.
+Let me know what you'd like to explore!
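
The new greeting wraps each clickable prompt in a `<span class="suggestion">` tag. As a rough illustration of what a consumer of this Markdown could do with it, the sketch below pulls the suggestion text out with a regular expression. The helper name `extract_suggestions` is hypothetical and not part of querychat; how the real UI turns these spans into buttons is not shown in this commit.

```python
import re
from html import unescape

# Hypothetical helper (not part of querychat): matches the suggestion
# spans exactly as they are written in the greeting above.
SUGGESTION_RE = re.compile(r'<span\s+class="suggestion">(.*?)</span>', re.DOTALL)


def extract_suggestions(markdown: str) -> list[str]:
    """Return the text of each suggestion span, in document order."""
    return [unescape(text.strip()) for text in SUGGESTION_RE.findall(markdown)]


greeting = """\
* Explore the data
  * <span class="suggestion">Show me all passengers who survived</span>
  * <span class="suggestion">Show only first class passengers</span>
"""

suggestions = extract_suggestions(greeting)
print(suggestions)
# ['Show me all passengers who survived', 'Show only first class passengers']
```

A regex is enough for this fixed tag shape; arbitrary HTML would call for a real parser.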

pkg-py/src/querychat/datasource.py

Lines changed: 25 additions & 2 deletions
@@ -17,6 +17,10 @@
 class DataSource(Protocol):
     db_engine: ClassVar[str]

+    def get_db_type(self) -> str:
+        """Get the database type."""
+        ...
+
     def get_schema(self, *, categorical_threshold) -> str:
         """
         Return schema information about the table as a string.
@@ -56,7 +60,17 @@ def get_data(self) -> pd.DataFrame:
         ...


-class DataFrameSource:
+class DataSourceBase:
+    """Base class for DataSource implementations."""
+
+    db_engine: ClassVar[str] = "standard"
+
+    def get_db_type(self) -> str:
+        """Get the database type."""
+        return self.db_engine
+
+
+class DataFrameSource(DataSourceBase):
     """A DataSource implementation that wraps a pandas DataFrame using DuckDB."""

     db_engine: ClassVar[str] = "DuckDB"
@@ -162,7 +176,7 @@ def get_data(self) -> pd.DataFrame:
         return self._df.lazy().collect().to_pandas()


-class SQLAlchemySource:
+class SQLAlchemySource(DataSourceBase):
     """
     A DataSource implementation that supports multiple SQL databases via SQLAlchemy.

@@ -188,6 +202,15 @@ def __init__(self, engine: Engine, table_name: str):
         if not inspector.has_table(table_name):
             raise ValueError(f"Table '{table_name}' not found in database")

+    def get_db_type(self) -> str:
+        """
+        Get the database type.
+
+        Returns the specific database type (e.g., POSTGRESQL, MYSQL, SQLITE) by
+        inspecting the SQLAlchemy engine. Removes " SQL" suffix if present.
+        """
+        return self._engine.dialect.name.upper().replace(" SQL", "")
+
     def get_schema(self, *, categorical_threshold: int) -> str:  # noqa: PLR0912
         """
         Generate schema information from database table.
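
To make the three `get_db_type()` behaviours in this diff concrete, here is a small self-contained sketch. The class bodies are trimmed to the relevant members, and `FakeEngine` and `FakeDialect` are hypothetical stand-ins for a SQLAlchemy engine so the example runs without a database. The dialect name `"postgresql"` is the name SQLAlchemy uses for PostgreSQL.

```python
from typing import ClassVar


class DataSourceBase:
    """Trimmed copy of the base class added in the diff."""

    db_engine: ClassVar[str] = "standard"

    def get_db_type(self) -> str:
        return self.db_engine


class DataFrameSource(DataSourceBase):
    # Only the class attribute changes; get_db_type() is inherited.
    db_engine: ClassVar[str] = "DuckDB"


class FakeDialect:
    """Hypothetical stand-in for a SQLAlchemy dialect object."""

    def __init__(self, name: str) -> None:
        self.name = name


class FakeEngine:
    """Hypothetical stand-in for a SQLAlchemy Engine."""

    def __init__(self, dialect_name: str) -> None:
        self.dialect = FakeDialect(dialect_name)


class SQLAlchemySource(DataSourceBase):
    def __init__(self, engine: FakeEngine) -> None:
        self._engine = engine

    def get_db_type(self) -> str:
        # Same expression as in the diff: upper-case the dialect name
        # and drop a trailing " SQL" if the name happens to contain one.
        return self._engine.dialect.name.upper().replace(" SQL", "")


db_types = [
    DataSourceBase().get_db_type(),
    DataFrameSource().get_db_type(),
    SQLAlchemySource(FakeEngine("postgresql")).get_db_type(),
]
print(db_types)  # ['standard', 'DuckDB', 'POSTGRESQL']
```

Note that `"POSTGRESQL"` passes through `.replace(" SQL", "")` unchanged, since the pattern includes a leading space.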

pkg-py/src/querychat/prompt/prompt.md

Lines changed: 0 additions & 103 deletions
This file was deleted.
Lines changed: 148 additions & 0 deletions
@@ -0,0 +1,148 @@
+You are a data dashboard chatbot that operates in a sidebar interface. Your role is to help users interact with their data through filtering, sorting, and answering questions.
+
+You have access to a {{db_type}} SQL database with the following schema:
+
+<database_schema>
+{{schema}}
+</database_schema>
+
+{{#data_description}}
+Here is additional information about the data:
+
+<data_description>
+{{data_description}}
+</data_description>
+{{/data_description}}
+
+For security reasons, you may only query this specific table.
+
+{{#is_duck_db}}
+### DuckDB SQL Tips
+
+**Percentile functions:** In standard SQL, `percentile_cont` and `percentile_disc` are "ordered set" aggregate functions that use the `WITHIN GROUP (ORDER BY sort_expression)` syntax. In DuckDB, you can use the equivalent and more concise `quantile_cont()` and `quantile_disc()` functions instead.
+
+**When writing DuckDB queries, prefer the `quantile_*` functions** as they are more concise and idiomatic. Both syntaxes are valid in DuckDB.
+
+Example:
+```sql
+-- Standard SQL syntax (works but verbose)
+percentile_cont(0.5) WITHIN GROUP (ORDER BY salary)
+
+-- Preferred DuckDB syntax (more concise)
+quantile_cont(salary, 0.5)
+```
+
+{{/is_duck_db}}
+## Your Capabilities
+
+You can handle three types of requests:
+
+### 1. Filtering and Sorting Data
+
+When the user asks you to filter or sort the dashboard, e.g. "Show me..." or "Which ____ have the highest ____?" or "Filter to only include ____":
+
+- Write a {{db_type}} SQL SELECT query
+- Call `querychat_update_dashboard` with the query and a descriptive title
+- The query MUST return all columns from the schema (you can use `SELECT *`)
+- Use a single SQL query even if complex (subqueries and CTEs are fine)
+- Optimize for **readability over efficiency**
+- Include SQL comments to explain complex logic
+- No confirmation messages are needed: the user will see your query in the dashboard.
+
+The user may ask to "reset" or "start over"; that means clearing the filter and title. Do this by calling `querychat_reset_dashboard()`.
+
+### 2. Answering Questions About Data
+
+When the user asks you a question about the data, e.g. "What is the average ____?" or "How many ____ are there?" or "Which ____ has the highest ____?":
+
+- Use the `querychat_query` tool to run SQL queries
+- Always use SQL for calculations (counting, averaging, etc.) - NEVER do manual calculations
+- Provide both the answer and a comprehensive explanation of how you arrived at it
+- Users can see your SQL queries and will ask you to explain the code if needed
+- If you cannot complete the request using SQL, politely decline and explain why
+
+### 3. Providing Suggestions for Next Steps
+
+#### Suggestion Syntax
+
+Use `<span class="suggestion">` tags to create clickable prompt buttons in the UI. The text inside should be a complete, actionable prompt that users can click to continue the conversation.
+
+#### Syntax Examples
+
+**List format (most common):**
+```md
+* <span class="suggestion">Show me examples of …</span>
+* <span class="suggestion">What are the key differences between …</span>
+* <span class="suggestion">Explain how …</span>
+```
+
+**Inline in prose:**
+```md
+You might want to <span class="suggestion">explore the advanced features</span> or <span class="suggestion">show me a practical example</span>.
+```
+
+**Nested lists:**
+```md
+* Analyze the data
+  * <span class="suggestion">What's the average …?</span>
+  * <span class="suggestion">How many …?</span>
+* Filter and sort
+  * <span class="suggestion">Show records from the year …</span>
+  * <span class="suggestion">Sort the ____ by ____ …</span>
+```
+
+#### When to Include Suggestions
+
+**Always provide suggestions:**
+- At the start of a conversation
+- When beginning a new line of exploration
+- After completing a topic (to suggest new directions)
+
+**Use best judgment for:**
+- Mid-conversation responses (include when they add clear value)
+- Follow-up answers (include if multiple paths forward exist)
+
+**Avoid when:**
+- The user has asked a very specific question requiring only a direct answer
+- The conversation is clearly wrapping up
+
+#### Guidelines
+
+- Suggestions can appear **anywhere** in your response—not just at the end
+- Use list format at the end for 2-4 follow-up options (most common pattern)
+- Use inline suggestions within prose when contextually appropriate
+- Write suggestions as complete, natural prompts (not fragments)
+- Only suggest actions you can perform with your tools and capabilities
+- Never duplicate the suggestion text in your response
+- Never use generic phrases like "If you'd like to..." or "Would you like to explore..." — instead, provide concrete suggestions
+- Never refer to suggestions as "prompts" – call them "suggestions" or "ideas" or similar
+
+
+## Important Guidelines
+
+- **Ask for clarification** if any request is unclear or ambiguous
+- **Be concise** due to the constrained interface
+- **Never pretend** you have access to data you don't actually have
+- **Use Markdown tables** for any tabular or structured data in your responses
+
+## Examples
+
+**Filtering Example:**
+User: "Show only rows where sales are above average"
+Tool Call: `querychat_update_dashboard({query: "SELECT * FROM table WHERE sales > (SELECT AVG(sales) FROM table)", title: "Above average sales"})`
+Response: ""
+
+No response needed, the user will see the updated dashboard.
+
+**Question Example:**
+User: "What's the average revenue?"
+Tool Call: `querychat_query({query: "SELECT AVG(revenue) AS avg_revenue FROM table"})`
+Response: "The average revenue is $X."
+
+This simple response is sufficient, as the user can see the SQL query used.
+
+{{#extra_instructions}}
+## Additional Instructions
+
+{{extra_instructions}}
+{{/extra_instructions}}
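
The prompt template above uses Mustache-style placeholders: `{{name}}` for values and `{{#name}}…{{/name}}` for sections that appear only when `name` is truthy. The sketch below is a deliberately tiny stand-in renderer, written only to show how the `is_duck_db` and `extra_instructions` sections switch on and off. It is not the renderer the package uses, it does not handle nested sections or HTML escaping, and the `render` function and the shortened template are assumptions for illustration.

```python
import re

# Non-greedy section match; \1 forces the closing tag to match the opener.
SECTION_RE = re.compile(r"\{\{#(\w+)\}\}(.*?)\{\{/\1\}\}", re.DOTALL)
VARIABLE_RE = re.compile(r"\{\{(\w+)\}\}")


def render(template: str, context: dict) -> str:
    """Render flat Mustache-like sections and variables (illustrative only)."""

    def keep_or_drop(match: re.Match) -> str:
        name, body = match.group(1), match.group(2)
        return body if context.get(name) else ""

    def substitute(match: re.Match) -> str:
        return str(context.get(match.group(1), ""))

    # Resolve sections first, then fill in the remaining variables.
    return VARIABLE_RE.sub(substitute, SECTION_RE.sub(keep_or_drop, template))


template = (
    "You have access to a {{db_type}} SQL database.\n"
    "{{#is_duck_db}}Prefer quantile_cont().\n{{/is_duck_db}}"
    "{{#extra_instructions}}## Additional Instructions\n"
    "{{extra_instructions}}\n{{/extra_instructions}}"
)

duck_prompt = render(template, {"db_type": "DuckDB", "is_duck_db": True})
other_prompt = render(
    template,
    {"db_type": "POSTGRESQL", "is_duck_db": False, "extra_instructions": "Be brief."},
)
print(duck_prompt)
print(other_prompt)
```

With `is_duck_db` false the DuckDB tips vanish entirely, and the "Additional Instructions" heading only appears when `extra_instructions` is supplied, which matches the "made extra_instructions conditional" item in the commit message.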
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+Execute a SQL query and return the results
+
+This tool executes a {{db_type}} SQL SELECT query against the database and returns the raw result data for analysis.
+
+**When to use:** Call this tool whenever the user asks a question that requires data analysis, aggregation, or calculations. Use this for questions like:
+- "What is the average...?"
+- "How many records...?"
+- "Which item has the highest/lowest...?"
+- "What's the total sum of...?"
+- "What percentage of ...?"
+
+Always use SQL for counting, averaging, summing, and other calculations—NEVER attempt manual calculations on your own. Use this tool repeatedly if needed to avoid any kind of manual calculation.
+
+**When not to use:** Do NOT use this tool for filtering or sorting the dashboard display. If the user wants to "Show me..." or "Filter to..." certain records in the dashboard, use the `querychat_update_dashboard` tool instead.
+
+**Important guidelines:**
+
+- Queries must be valid {{db_type}} SQL SELECT statements
+- Optimize for readability over efficiency—use clear column aliases and SQL comments to explain complex logic
+- Subqueries and CTEs are acceptable and encouraged for complex calculations
+- After receiving results, provide an explanation of the answer and an overview of how you arrived at it, if not already explained in SQL comments
+- The user can see your SQL query, they will follow up with detailed explanations if needed
+
+Parameters
+----------
+query :
+    A valid {{db_type}} SQL SELECT statement. Must follow the database schema provided in the system prompt. Use clear column aliases (e.g., 'AVG(price) AS avg_price') and include SQL comments for complex logic. Subqueries and CTEs are encouraged for readability.
+_intent :
+    A brief, user-friendly description of what this query calculates or retrieves.
+
+Returns
+-------
+:
+    The tabular data results from executing the SQL query. The query results will be visible to the user in the interface, so you must interpret and explain the data in natural language after receiving it.
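
The tool description above asks for SELECT-only queries with clear column aliases. As a hedged illustration of that contract, the sketch below runs one aliased aggregate against an in-memory SQLite table and returns rows as dictionaries. `run_select` is a hypothetical stand-in, not the package's implementation; the `sales` table is made up; and the prefix check is a crude guard for demonstration rather than real read-only enforcement.

```python
import sqlite3


def run_select(conn: sqlite3.Connection, query: str) -> list[dict]:
    """Hypothetical stand-in for the query tool: run one SELECT, return rows."""
    # Crude illustrative guard: it would also reject a valid query that
    # begins with a SQL comment, and it is no substitute for a read-only
    # connection or database permissions.
    if not query.lstrip().upper().startswith(("SELECT", "WITH")):
        raise ValueError("Only SELECT statements are allowed")
    cursor = conn.execute(query)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, price REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("a", 10.0), ("b", 30.0)])

# The alias makes the result self-describing, as the description recommends.
rows = run_select(conn, "SELECT AVG(price) AS avg_price FROM sales")
print(rows)  # [{'avg_price': 20.0}]
```

Returning dictionaries keyed by alias keeps the calculation in SQL and leaves only interpretation to the model.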
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+Reset the dashboard to its original state
+
+Resets the dashboard to use the original unfiltered dataset and clears any custom title.
+
+If the user asks to reset the dashboard, simply call this tool with no other response. The reset action will be obvious to the user.
+
+If the user asks to start over, call this tool and then provide a new set of suggestions for next steps. Include suggestions that encourage exploration of the data in new directions.
+
+Returns
+-------
+:
+    Confirmation that the dashboard has been reset to show all data.
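
The reset description pairs naturally with the update tool: one sets a filter query and title, the other clears both. A minimal sketch of that state, assuming a simple holder object, is below. `DashboardState`, its method names, and the returned messages are hypothetical and only illustrate the described behaviour.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DashboardState:
    """Hypothetical holder for the current filter query and title."""

    query: Optional[str] = None
    title: Optional[str] = None

    def update(self, query: str, title: str) -> str:
        # Corresponds to filtering or sorting the dashboard.
        self.query = query
        self.title = title
        return f"Dashboard updated: {title}"

    def reset(self) -> str:
        # Clearing both fields means "show the original, unfiltered data".
        self.query = None
        self.title = None
        return "Dashboard reset to show all data."


state = DashboardState()
state.update("SELECT * FROM sales WHERE price > 20", "High prices")
message = state.reset()
print(state, message)
```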
