## Description
The current implementation of the SQL generation feature incurs significant token consumption on every LLM request, primarily because of:

- Sending the entire database schema (via `db_schema`) to the LLM, regardless of its relevance to the user's question.
- Including full sample rows for all tables (via `sample_data`), even when only a subset of tables/columns is needed for the query.
This approach is inefficient for larger databases, leading to:

- High latency in LLM response times.
- Unnecessary computational overhead and API costs (if using paid LLM services).
- Potential performance degradation when handling databases with many tables or wide columns.
## Proposed Solution
We should optimize token usage by:
1. **Dynamic Schema Filtering** (see the first sketch after this list):
   - First, send only the user's question to the LLM to identify which tables/columns are potentially relevant.
   - Then, fetch and send only the relevant portions of `db_schema` and `sample_data` for those tables.
2. **Sample Data Truncation** (second sketch below):
   - For non-numeric columns (e.g., VARCHAR, TEXT), enforce a length limit (e.g., 50 characters) when extracting sample values, to avoid sending excessively long strings.
3. **Verbose Token Reporting** (third sketch below):
   - In verbose mode, log the number of tokens consumed per LLM request to help users monitor usage and costs.
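A minimal sketch of the two-step filtering flow is below. The names (`llm_call`, `schema_for`, `samples_for`) are illustrative placeholders rather than this project's actual API, and the sketch assumes the first request includes a compact list of table names so the LLM has something to choose from.

```python
# Sketch only: placeholder callables stand in for the project's LLM and schema helpers.
import json
from typing import Callable


def select_relevant_tables(
    question: str,
    table_names: list[str],
    llm_call: Callable[[str], str],
) -> list[str]:
    """Step 1: ask the LLM which tables are relevant to the question."""
    prompt = (
        "Return a JSON array containing only the table names needed to answer "
        "the question.\n"
        f"Question: {question}\n"
        f"Tables: {', '.join(table_names)}"
    )
    try:
        relevant = json.loads(llm_call(prompt))
    except (json.JSONDecodeError, TypeError):
        relevant = table_names  # fall back to the full schema on malformed output
    return [t for t in relevant if t in table_names]


def build_sql_prompt(
    question: str,
    relevant: list[str],
    schema_for: Callable[[list[str]], str],   # filtered db_schema lookup
    samples_for: Callable[[list[str]], str],  # filtered sample_data lookup
) -> str:
    """Step 2: build the SQL-generation prompt from the filtered schema only."""
    return (
        f"Schema:\n{schema_for(relevant)}\n\n"
        f"Sample rows:\n{samples_for(relevant)}\n\n"
        f"Write a SQL query that answers: {question}"
    )
```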
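For sample data truncation, something along these lines could be applied while extracting sample rows; the 50-character limit mirrors the example above and would presumably be configurable.

```python
def truncate_sample_value(value, max_len: int = 50):
    """Shorten long string sample values; leave numeric and other types untouched."""
    if isinstance(value, str) and len(value) > max_len:
        return value[:max_len] + "..."
    return value


def truncate_sample_row(row: dict, max_len: int = 50) -> dict:
    """Apply the length limit to every column value in a sample row."""
    return {col: truncate_sample_value(val, max_len) for col, val in row.items()}
```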
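Verbose token reporting could look roughly like this, assuming the LLM client returns usage metadata (for example, the `usage.prompt_tokens` / `usage.completion_tokens` fields exposed by OpenAI-compatible APIs); the attribute names would need to match whichever client the project actually uses.

```python
import logging

logger = logging.getLogger(__name__)


def log_token_usage(usage, verbose: bool = False) -> None:
    """In verbose mode, report how many tokens the last LLM request consumed."""
    if not verbose or usage is None:
        return
    logger.info(
        "LLM request used %s prompt + %s completion = %s total tokens",
        getattr(usage, "prompt_tokens", "?"),
        getattr(usage, "completion_tokens", "?"),
        getattr(usage, "total_tokens", "?"),
    )
```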
## Benefits
- **Reduced Token Consumption:** Minimizes irrelevant data sent to the LLM.
- **Improved Performance:** Faster LLM responses and lower latency.
- **Cost Efficiency:** Lower API costs for users leveraging paid LLM services.
- **Better Scalability:** Supports larger databases without hitting token limits.
## Implementation Notes
- The optimization could be implemented as an optional feature (e.g., via an `--optimize-tokens` flag, sketched below) to maintain backward compatibility.
- The LLM prompt template (`LLM_TEMPLATE_NAME`) may need adjustments to work with partial schema/sample data.
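As a rough illustration of the opt-in flag (the project's CLI may use a different argument-parsing framework, so treat this purely as a sketch of the backward-compatible default):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--optimize-tokens",
    action="store_true",
    default=False,  # off by default, so existing behaviour is unchanged
    help="Send only the schema/sample data relevant to the question to the LLM",
)
```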
We welcome community feedback and contributions to refine this approach!