Skip to content

πŸ§ πŸ’¬ LLM-powered chatbot for querying databases using natural language, with auto visualizations, schema pruning, and multi-agent orchestration.

Notifications You must be signed in to change notification settings

charangajjala/Query_GenAI

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

11 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Text-to-SQL Banner

🧠 LLM-Based Text-to-NoSQL Chatbot

Empowering non-technical users to query complex databases using natural language

Streamlit LangChain MongoDB FastAPI LLaMA3


πŸ“Œ Overview

This project presents a multi-agent LLM-powered chatbot system capable of translating natural language queries into efficient SQL (and NoSQL) commands. Designed for non-technical users, it simplifies data access through an intuitive interface, powered by the latest advancements in language models and schema reasoning.

Built using LangChain, LangGraph, Streamlit, FastAPI, MongoDB, and Groq-hosted LLaMA 3.1.


βš™οΈ Features

  • πŸ” Natural Language to Query: Converts user questions into executable MongoDB pipeline or SQL queries.
  • πŸ“Š Auto Visualization: Generates charts automatically for visual queries.
  • 🧠 Schema-aware Prompting: Dynamically optimizes schema tokens to fit within LLM context.
  • πŸ” Error-resilient Execution: Incorporates query validation, execution tracing, and iterative refinement.
  • πŸ’¬ Interactive Chatbot: Built using Streamlit with real-time backend orchestration using LangGraph.

πŸš€ Installation & Run

  1. Clone the repository:

    git clone https://github.com/charangajjala/ai-project.git
    cd Query_GenAI
  2. Install dependencies:

    pip install -r requirements.txt
  3. Start the backend API server:

    uvicorn backend.main:app --reload
  4. Start the Streamlit frontend:

    streamlit run frontend/app.py

πŸ§ͺ Architecture

  • LangGraph Agents: Orchestrate logic for schema pruning, query generation, and error handling.
  • LLM Model: LLaMA 3.1 (70B) via Groq API.
  • Database: MongoDB Atlas β€” Sample Analytics Dataset.

🧠 System Architecture and Methodology

The system is built using a modular multi-agent framework, where each component is designed to handle a specific function in the Text-to-SQL pipeline. The core workflow is orchestrated using LangGraph, enabling dynamic decision-making through routing logic.

πŸ—οΈ Key Components

1. Schema Agent

  • Extracts and prunes relevant schema details (tables, columns, data types) based on user queries.
  • Reduces token count to stay within the LLM context window.
  • Provides schema-aware few-shot prompts for accurate query generation.

2. Query Generator Agent

  • Converts natural language to SQL/MongoDB queries using LLaMA 3.1.
  • Uses iterative prompting and few-shot examples.
  • Validates and re-generates queries using error feedback if execution fails.

3. Visualization Agent

  • Handles queries requesting plots or charts.
  • Generates MongoDB queries and then Python code for plotting (e.g., bar, line charts).
  • Returns visualizations rendered in real-time using Streamlit.

4. LangGraph Router Node

  • Acts as the decision engine.
  • Routes queries to either the query execution path or the visualization path.
  • Falls back to error handling if inputs are incomplete or ambiguous.

πŸ”„ Methodology Overview

  1. Natural Language Input: User submits a query via chatbot UI.
  2. Intent Detection: LangGraph router determines whether it's a data retrieval or visualization request.
  3. Schema Optimization: Schema Agent fetches only necessary tables/columns.
  4. Query Generation: LLM generates SQL/MongoDB queries with in-context examples.
  5. Validation: Syntax and execution errors are caught, refined, and re-executed.
  6. Output: Query results or plots are returned to the user.

This design emphasizes:

  • πŸ’‘ Efficient token usage (schema pruning),
  • πŸ” Iterative improvement (error-aware prompting),
  • πŸ“ˆ Visualization-first UX (interpretable results for non-technical users),
  • 🧩 Scalability across SQL and NoSQL databases.

πŸ“ˆ Evaluation Metrics

Metric Description
EM (Exact Match) Measures if query matches reference SQL
EX (Execution Accuracy) Checks if result matches ground truth
VES (Valid Efficiency Score) Considers correctness + speed

πŸ“Š Achieved: EM = 60%, EX = 80%, VES = 74%


πŸ“Έ Demo Screenshots

πŸ” Summarized answers from user query

πŸ“Š Plot Generation from user query


Contrbutors

Charan Gajjala
Charan Gajjala Chenchu
Prem Jampuram
Divija Kalluri

πŸ“š References

  1. QueryGPT by Uber
  2. Text-to-SQL with LLMs - arXiv 2023
  3. LangGraph Documentation

πŸ“œ License

This project is licensed under the MIT License.

About

πŸ§ πŸ’¬ LLM-powered chatbot for querying databases using natural language, with auto visualizations, schema pruning, and multi-agent orchestration.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •