This project presents a multi-agent, LLM-powered chatbot system that translates natural language questions into efficient SQL (and NoSQL) queries. Designed for non-technical users, it simplifies data access through an intuitive chat interface backed by modern language models and schema-aware reasoning.
Built using LangChain, LangGraph, Streamlit, FastAPI, MongoDB, and Groq-hosted LLaMA 3.1.
- **Natural Language to Query**: Converts user questions into executable MongoDB aggregation pipelines or SQL queries.
- **Auto Visualization**: Generates charts automatically for visual queries.
- **Schema-aware Prompting**: Dynamically prunes schema details to fit within the LLM context window.
- **Error-resilient Execution**: Incorporates query validation, execution tracing, and iterative refinement.
- **Interactive Chatbot**: Built with Streamlit, with real-time backend orchestration via LangGraph.
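To make the first feature concrete, here is the shape of an aggregation pipeline such a system might emit; the question, collection, and field names are illustrative, following the Atlas Sample Analytics dataset mentioned below.

```python
# Illustrative only: a MongoDB aggregation pipeline the system might generate
# for the question "How many accounts does each customer have?".
question = "How many accounts does each customer have?"

generated_pipeline = [
    # Unwind the array of account ids so each account becomes one document.
    {"$unwind": "$accounts"},
    # Group back by customer and count the unwound accounts.
    {"$group": {"_id": "$username", "account_count": {"$sum": 1}}},
    # Most accounts first.
    {"$sort": {"account_count": -1}},
]

# In production this pipeline would be passed to
# db.customers.aggregate(generated_pipeline).
```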
- Clone the repository:

  ```shell
  git clone https://github.com/charangajjala/ai-project.git
  cd Query_GenAI
  ```

- Install dependencies:

  ```shell
  pip install -r requirements.txt
  ```

- Start the backend API server:

  ```shell
  uvicorn backend.main:app --reload
  ```

- Start the Streamlit frontend:

  ```shell
  streamlit run frontend/app.py
  ```
- LangGraph Agents: Orchestrate logic for schema pruning, query generation, and error handling.
- LLM Model: LLaMA 3.1 (70B) via Groq API.
- Database: MongoDB Atlas – Sample Analytics Dataset.
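A minimal sketch of wiring the Groq-hosted model through LangChain; this assumes the `langchain_groq` package is installed, a `GROQ_API_KEY` environment variable is set, and that the deployed model id matches the 70B LLaMA 3.1 named above.

```python
# Configuration sketch only; package and model id are assumptions.
from langchain_groq import ChatGroq

llm = ChatGroq(
    model="llama-3.1-70b-versatile",  # assumed Groq model id for LLaMA 3.1 70B
    temperature=0,                    # deterministic output for query generation
)
```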
The system is built using a modular multi-agent framework, where each component is designed to handle a specific function in the Text-to-SQL pipeline. The core workflow is orchestrated using LangGraph, enabling dynamic decision-making through routing logic.
**Schema Agent**

- Extracts and prunes relevant schema details (tables, columns, data types) based on the user query.
- Reduces token count to stay within the LLM context window.
- Provides schema-aware few-shot prompts for accurate query generation.
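The pruning step can be sketched in a few lines. This is a deliberately simple keyword-overlap heuristic, not the system's actual scoring logic; table and column names are illustrative.

```python
# Minimal sketch of schema pruning: keep only tables whose name or columns
# overlap with words in the question, so the serialized schema stays small.
def prune_schema(schema: dict[str, list[str]], question: str) -> dict[str, list[str]]:
    words = set(question.lower().split())
    pruned = {}
    for table, columns in schema.items():
        hits = {table.lower()} | {c.lower() for c in columns}
        if hits & words:  # any lexical overlap keeps the table
            pruned[table] = columns
    return pruned

schema = {
    "customers": ["username", "name", "accounts"],
    "transactions": ["account_id", "amount", "date"],
}
print(prune_schema(schema, "List the name of every customer"))
# Only the customers table survives the pruning pass.
```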
**Query Generation Agent**

- Converts natural language into SQL/MongoDB queries using LLaMA 3.1.
- Uses iterative prompting and few-shot examples.
- Validates and regenerates queries using error feedback when execution fails.
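The error-feedback loop can be sketched as follows; `generate` and `execute` are stand-ins for the real LLM and database calls, and the fakes below exist only to demonstrate the control flow.

```python
# Sketch of the error-feedback loop: on failure, the error message is passed
# back to the generator and the query is regenerated, up to max_retries times.
def refine_query(generate, execute, question, max_retries=3):
    error = None
    for _ in range(max_retries):
        query = generate(question, error)  # LLM call, with optional error feedback
        try:
            return execute(query)          # run against the database
        except Exception as exc:
            error = str(exc)               # feed the failure back into the prompt
    raise RuntimeError(f"Query failed after {max_retries} attempts: {error}")

# Fakes for illustration: the first attempt fails, the retry succeeds.
def fake_generate(question, error):
    return "SELECT *" if error is None else "SELECT name FROM customers"

def fake_execute(query):
    if query == "SELECT *":
        raise ValueError("missing FROM clause")
    return ["Alice", "Bob"]

print(refine_query(fake_generate, fake_execute, "list customers"))
# → ['Alice', 'Bob'] on the second attempt
```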
**Visualization Agent**

- Handles queries requesting plots or charts.
- Generates MongoDB queries and then Python code for plotting (e.g., bar and line charts).
- Returns visualizations rendered in real time with Streamlit.
**Router**

- Acts as the decision engine.
- Routes queries to either the query execution path or the visualization path.
- Falls back to error handling when inputs are incomplete or ambiguous.
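The routing decision can be sketched with a keyword heuristic; the heuristic and keyword list are illustrative only, since the actual router makes this decision through LangGraph.

```python
# Sketch of intent routing: visual keywords send the request down the
# visualization path, everything else goes to plain query execution.
VISUAL_KEYWORDS = {"plot", "chart", "graph", "visualize", "histogram", "trend"}

def route(question: str) -> str:
    words = set(question.lower().replace("?", "").split())
    if words & VISUAL_KEYWORDS:
        return "visualization"
    return "query_execution"

print(route("Plot the monthly transaction volume"))  # visualization
print(route("How many customers are there?"))        # query_execution
```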
1. Natural Language Input: The user submits a query via the chatbot UI.
2. Intent Detection: The LangGraph router determines whether it is a data-retrieval or visualization request.
3. Schema Optimization: The Schema Agent fetches only the necessary tables/columns.
4. Query Generation: The LLM generates SQL/MongoDB queries with in-context examples.
5. Validation: Syntax and execution errors are caught, the query is refined, and it is re-executed.
6. Output: Query results or plots are returned to the user.
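The steps above can be sketched as one pipeline function; every helper here is a stand-in for the corresponding agent, and all names and return values are illustrative.

```python
# The workflow as a single pipeline: each argument stands in for one agent.
def answer(question, detect_intent, prune_schema, generate_query,
           run_with_validation, render):
    intent = detect_intent(question)           # 2. retrieval vs visualization
    schema = prune_schema(question)            # 3. schema optimization
    query = generate_query(question, schema)   # 4. LLM query generation
    result = run_with_validation(query)        # 5. execute, refine on errors
    return render(result, intent)              # 6. table or plot for the user

# Wiring it up with trivial stand-ins:
out = answer(
    "How many customers are there?",
    detect_intent=lambda q: "query",
    prune_schema=lambda q: {"customers": ["username"]},
    generate_query=lambda q, s: "db.customers.count_documents({})",
    run_with_validation=lambda q: 500,
    render=lambda r, i: f"{r} ({i})",
)
print(out)  # 500 (query)
```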
This design emphasizes:
- Efficient token usage (schema pruning),
- Iterative improvement (error-aware prompting),
- Visualization-first UX (interpretable results for non-technical users),
- Scalability across SQL and NoSQL databases.
| Metric | Description |
|---|---|
| EM (Exact Match) | Whether the generated query exactly matches the reference SQL |
| EX (Execution Accuracy) | Whether the query's result matches the ground truth |
| VES (Valid Efficiency Score) | Combines correctness with execution speed |

**Achieved**: EM = 60%, EX = 80%, VES = 74%
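The difference between EM and EX can be made concrete with a toy example (the queries and rows below are made up): EM compares query strings, while EX compares the result sets the queries return, so a pair can fail EM yet pass EX.

```python
# Toy illustration of EM vs EX.
def exact_match(pred_sql: str, gold_sql: str) -> bool:
    normalize = lambda s: " ".join(s.lower().split())
    return normalize(pred_sql) == normalize(gold_sql)

def execution_accuracy(pred_rows, gold_rows) -> bool:
    # Order-insensitive comparison of the two result sets.
    return sorted(pred_rows) == sorted(gold_rows)

pred = "SELECT name FROM customers ORDER BY name"
gold = "SELECT name FROM customers"
print(exact_match(pred, gold))                                 # False
print(execution_accuracy(["Alice", "Bob"], ["Bob", "Alice"]))  # True
```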
- Charan Gajjala Chenchu
- Divija Kalluri
This project is licensed under the MIT License.