@gadenbuie gadenbuie commented Oct 1, 2025

This PR focuses on improving the clarity and effectiveness of the LLM system prompt, along with enhancements to the Python package implementation.

System Prompt Refactoring

The main system prompt has been restructured for better clarity and usability:

  • Reorganized content with clearer hierarchy and section headings
  • Moved database schema and DuckDB tips earlier in the prompt to be grouped with other database-related information
  • Simplified DuckDB percentile guidance with concrete examples and preference for quantile_* functions
  • Enhanced "Suggestions" section with comprehensive syntax examples, usage guidelines, and best practices for when to include clickable prompts
  • Streamlined filtering/sorting instructions by removing redundant explanations
  • Clarified that no response is needed after successful dashboard updates
  • Added explicit guidance to use Markdown table formatting for structured data
  • Improved examples to be more concise and realistic
  • Made extra_instructions conditional with proper formatting
  • Overall: more scannable structure with better separation of concerns
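
As an illustration of the conditional extra_instructions item above, here is a minimal Python sketch. It assumes the extra section should be left out entirely when no instructions are supplied. The function name `render_prompt`, the placeholder `BASE_PROMPT`, and the heading text are hypothetical and not taken from the package, which may well use a template engine instead.

```python
from typing import Optional

# Hypothetical placeholder; the real system prompt lives in prompt.md.
BASE_PROMPT = "You are a data dashboard assistant."


def render_prompt(extra_instructions: Optional[str] = None) -> str:
    """Append an 'Additional Instructions' section only when there is content."""
    sections = [BASE_PROMPT]
    # None, empty, and whitespace-only values all skip the section,
    # so the prompt never ends with a dangling empty heading.
    if extra_instructions and extra_instructions.strip():
        sections.append(
            "## Additional Instructions\n\n" + extra_instructions.strip()
        )
    return "\n\n".join(sections)


print(render_prompt("Answer in a formal tone."))
```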

Tool Description Improvements

We now also store tool descriptions in separate markdown files for easier editing and consistency between the R and Python packages. The descriptions now clarify when to use each tool and when not to, and spell out important constraints and guidelines.
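
One way to keep a tool description in a markdown file and still expose it as the tool's docstring is sketched below. This is an assumption about the general approach, not the package's actual code: the decorator `describe_from_markdown`, the `tool-<name>.md` file naming, and the example description text are all hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable


def load_tool_description(prompts_dir: Path, tool_name: str) -> str:
    """Read the markdown description for one tool (hypothetical file naming)."""
    return (prompts_dir / f"tool-{tool_name}.md").read_text(encoding="utf-8").strip()


def describe_from_markdown(prompts_dir: Path, tool_name: str) -> Callable:
    """Decorator factory that sets a function's docstring from a markdown file."""

    def decorator(func: Callable) -> Callable:
        func.__doc__ = load_tool_description(prompts_dir, tool_name)
        return func

    return decorator


# Demo with a temporary directory standing in for the prompts directory.
with tempfile.TemporaryDirectory() as tmp:
    prompts_dir = Path(tmp)
    (prompts_dir / "tool-query.md").write_text(
        "Run a read-only SQL query.\n", encoding="utf-8"
    )

    @describe_from_markdown(prompts_dir, "query")
    def query(sql: str) -> str:
        return f"would run: {sql}"


print(query.__doc__)
```

Because the description is read when the decorator is applied, the file only needs to exist at definition time, and editing the markdown changes what the model sees without touching the function body.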

Python Package Changes

  • Brought prompt improvements from R package to Python package
  • Refactored how tools are created for better maintainability
  • Tool docstrings now effectively live in markdown files. The tool prompt files can't be copied verbatim from R to Python, but they are similar enough to make maintenance easier
  • Added a .get_db_type() method to the DataSource protocol class, plus a new DataSourceBase class that provides the default implementation. SQLAlchemySource overrides the method to return the exact DB type, which differs from db_engine (always SQLAlchemy).
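
The protocol, base class, and override described in the last item could be arranged roughly as follows. This is a sketch under stated assumptions rather than the package's code: the protocol is reduced to a single method, `SQLAlchemyLikeSource` is a hypothetical stand-in that takes a dialect name directly instead of wrapping a real SQLAlchemy engine, and the `"standard"` default follows the value mentioned under Infrastructure.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class DataSource(Protocol):
    """Structural interface; only the method discussed here is shown."""

    def get_db_type(self) -> str: ...


class DataSourceBase:
    """Base class supplying the default implementation."""

    db_engine = "standard"

    def get_db_type(self) -> str:
        return "standard"


class SQLAlchemyLikeSource(DataSourceBase):
    """Hypothetical stand-in for a SQLAlchemy-backed source."""

    db_engine = "SQLAlchemy"  # the engine label stays the same for every database

    def __init__(self, dialect_name: str) -> None:
        # Assumption: a real source would derive this from its engine.
        self._dialect_name = dialect_name

    def get_db_type(self) -> str:
        # Report the concrete database rather than the generic engine label.
        return self._dialect_name


source = SQLAlchemyLikeSource("postgresql")
print(source.db_engine, source.get_db_type())
```

Keeping `db_engine` and `get_db_type()` separate lets the prompt mention the concrete SQL dialect even though every such source shares the same engine label.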

While here, I also fixed unsafe access to the QUERYCHAT_CLIENT_ARGS environment variable when querychat.init(client=) is given a string.
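
The description does not show the fix itself. As a general illustration only, and assuming the problem was a direct lookup that fails when the variable is unset, a tolerant read in Python might look like this. The helper `read_client_args` is hypothetical, and the value is returned as a raw string because its format is not described here.

```python
import os
from typing import Mapping, Optional


def read_client_args(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the raw QUERYCHAT_CLIENT_ARGS value, or None when it is unset."""
    if env is None:
        env = os.environ
    # Mapping.get returns None for a missing key, whereas env["..."] would
    # raise KeyError.
    return env.get("QUERYCHAT_CLIENT_ARGS")


print(read_client_args({}))
```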

Infrastructure

  • Moved prompt.md to inst/prompts directory to match ellmer conventions
  • Added default get_db_type() value of "standard"

gadenbuie and others added 16 commits October 1, 2025 10:20
Also uses DataSourceBase class to get default `get_db_type()` implementation
@gadenbuie gadenbuie requested a review from jcheng5 October 1, 2025 19:15
@gadenbuie
Contributor Author

I'm going to merge this so that we can start to get community feedback

@gadenbuie gadenbuie merged commit c5e32f4 into main Oct 15, 2025
16 checks passed
@gadenbuie gadenbuie deleted the fix/89-system-prompt branch October 15, 2025 12:48