DermaGPT

An AI-powered health and beauty product recommendation chatbot built with a Retrieval-Augmented Generation (RAG) pipeline. DermaGPT helps users discover skincare, hair care, and vitamin/supplement products from a curated catalog of ~3,000 products, and also answers general wellness questions.

Features

Product Recommendations - Find skincare, hair care, and vitamin/supplement products with natural language queries
Smart Filtering - Filter by price range, brand, rating, and product category using conversational queries
Dual Vector Store - FAISS (local) + Pinecone (cloud) for robust, redundant retrieval
General Wellness Q&A - Answers health and beauty questions using web search (BraveSearch) or LLM knowledge
Conversation Memory - Maintains chat context for follow-up questions within a session
Graceful Fallbacks - Query classification, vector storage, retrieval, web search, and embeddings each have a fallback path (see Fallback Strategy)

Architecture

User Query
    │
    ▼
┌─────────────────────┐
│  Query Classification │  ← LLM-based (GPT-4o) with keyword fallback
│  "product" / "general"│
└─────────┬───────────┘
          │
    ┌─────┴─────┐
    ▼           ▼
┌────────┐  ┌────────────┐
│Product │  │  General    │
│Pipeline│  │  Pipeline   │
└───┬────┘  └─────┬──────┘
    │             │
    ▼             ▼
┌────────────┐  ┌──────────────┐
│Category    │  │BraveSearch   │
│Detection   │  │  or          │
│+ Filter    │  │LLM Knowledge │
│Extraction  │  └──────┬───────┘
└─────┬──────┘         │
      ▼                │
┌──────────────────┐   │
│ Dual Retrieval   │   │
│ FAISS + Pinecone │   │
│ (Ensemble)       │   │
└─────┬────────────┘   │
      ▼                ▼
┌──────────────────────────┐
│   GPT-4o Response Gen    │
│   + Conversation Memory  │
└──────────────────────────┘

Tech Stack

| Layer | Technology | Purpose |
| --- | --- | --- |
| LLM | OpenAI GPT-4o | Query classification, response generation |
| Embeddings (primary) | OpenAI `text-embedding-3-small` (1536 dim) | Document & query vectorization |
| Embeddings (fallback) | `sentence-transformers/all-MiniLM-L6-v2` (384 dim) | Free local alternative |
| Vector Store (local) | FAISS | Local similarity search |
| Vector Store (cloud) | Pinecone (serverless, AWS us-east-1) | Cloud-based vector DB |
| RAG Framework | LangChain 0.3.20 | Orchestration, retrievers, chains |
| Web Search | BraveSearch API | General query augmentation |
| Frontend | Streamlit | Chat-based web UI |
| Observability | LangSmith | LangChain tracing & debugging |
| Containerization | Docker + Docker Compose | Deployment |

Project Structure

DermaGPT/
├── main.py                 # Streamlit app + full RAG pipeline (production)
├── main.ipynb              # Development notebook (data preprocessing + RAG prototyping)
├── requirements.txt        # Python dependencies (145 packages)
├── Dockerfile              # Python 3.13-slim container
├── docker-compose.yml      # Service orchestration
├── .env                    # API keys (not committed)
├── .gitignore
├── assignment_details.pdf  # Project requirements spec
│
├── data.csv                # Original dataset (~3,000 products, 12 MB)
├── df_skin.csv             # Preprocessed skincare products (2,079 rows)
├── df_hair.csv             # Preprocessed hair care products (599 rows)
├── df_vits_supp.csv        # Preprocessed vitamins/supplements (302 rows)
│
└── faiss_index/            # FAISS vector store (auto-generated, gitignored)

Getting Started

Prerequisites

Python 3.13+
An OpenAI API key (required)
A Pinecone API key (optional, falls back to FAISS-only)
A BraveSearch API key (optional, falls back to LLM knowledge)

Local Setup

# Clone the repository
git clone git@github.com:yash2002vardhan/DermaGPT.git
cd DermaGPT

# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Create .env file (see Environment Variables section)

# Run the app
streamlit run main.py

The app will be available at http://localhost:8501.

Docker Setup

# Build and run
docker compose up --build

# Access at http://localhost:8501

Environment Variables

Create a .env file in the project root:

OPENAI_API_KEY=sk-...          # Required - powers GPT-4o and embeddings
PINECONE_API_KEY=pc-...        # Optional - enables cloud vector store
BRAVE_API_KEY=BSA...           # Optional - enables web search for general queries

| Variable | Required | Impact if Missing |
| --- | --- | --- |
| `OPENAI_API_KEY` | Yes | App will not function |
| `PINECONE_API_KEY` | No | Falls back to FAISS-only retrieval |
| `BRAVE_API_KEY` | No | General queries use LLM knowledge instead of web search |
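
The table above can be read as a small startup check. The following is a minimal sketch, not the code in main.py: the function name `detect_features` and the returned keys are hypothetical, and only the three variable names come from this README.

```python
import os


def detect_features(env=None):
    """Report which optional integrations are usable (hypothetical helper)."""
    env = os.environ if env is None else env
    # OPENAI_API_KEY is the only required key according to the table.
    if not env.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY is required")
    return {
        # Without Pinecone, retrieval would rely on FAISS alone.
        "pinecone": bool(env.get("PINECONE_API_KEY")),
        # Without Brave, general queries would use LLM knowledge only.
        "web_search": bool(env.get("BRAVE_API_KEY")),
    }
```

Passing a plain dict instead of `os.environ` makes the check easy to exercise without touching the real environment.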

Usage

Product Queries

Ask about products using natural language. The system extracts category, price, brand, and rating filters automatically.

"Find me a sunscreen under 500 rupees"
"Hair shampoo for hairfall between 300-800 rs"
"Abbott brand skincare products"
"Vitamin B complex rated above 4 stars"
"Best moisturizer for acne-prone skin"

General Wellness Queries

"How to treat acne naturally?"
"Why do I have dandruff?"
"What vitamins help with hair growth?"
"Tips for glowing skin"

The chatbot maintains conversation context, so follow-up questions like "show me cheaper options" or "any alternatives?" work as expected.

Data Pipeline

Source Data

The original dataset (data.csv) contains ~3,000 health and beauty products with fields including product title, vendor, category, price (INR), brand, description, FAQs, ingredients, key benefits, concerns, and review ratings.

Preprocessing (main.ipynb)

1. Load & Clean - Rename columns, drop fields with >50% missing values
2. Extract Metadata - Parse structured fields (FAQs from JSON, benefits lists, ingredient lists, concern tags)
3. Categorize - Split into three category-specific CSVs:
   - df_skin.csv - 2,079 skincare products
   - df_hair.csv - 599 hair care products
   - df_vits_supp.csv - 302 vitamins & supplements
4. Chunk & Embed - Text splitting (500-character chunks, 100-character overlap) followed by vectorization
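
To make the chunk size and overlap concrete, here is a minimal sliding-window sketch in plain Python. It is an illustration only: the notebook most likely uses a LangChain text splitter, which also respects separators such as newlines, and the function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size=500, overlap=100):
    """Split text into fixed-size chunks that share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # each new chunk starts 400 chars later
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With the default settings, a 1,200-character description yields three chunks of 500, 500, and 400 characters, and the last 100 characters of each chunk reappear at the start of the next one.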

RAG Pipeline Details

1. Query Classification

Incoming queries are classified as "product" or "general" using GPT-4o. If the LLM classification fails, a keyword-based fallback takes over and looks for product-related terms such as buy, recommend, price, brand, product, suggest, sunscreen, shampoo, and serum.
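
A minimal sketch of this classify-then-fall-back flow is shown below. It assumes the LLM call is wrapped in a callable (`llm_classify`, a hypothetical name) that returns a label string, and it uses only the keywords listed in this README; the actual list in main.py is longer.

```python
import re

# Only the terms named in this README; the real keyword list is longer.
PRODUCT_TERMS = (
    "buy", "recommend", "price", "brand", "product",
    "suggest", "sunscreen", "shampoo", "serum",
)
# Leading word boundary only, so "products" and "recommended" also match.
_PRODUCT_RE = re.compile(r"\b(?:" + "|".join(PRODUCT_TERMS) + r")", re.IGNORECASE)


def classify_with_fallback(query, llm_classify):
    """Return "product" or "general", preferring the LLM's answer."""
    try:
        label = llm_classify(query).strip().lower()
        if label in ("product", "general"):
            return label
    except Exception:
        pass  # any LLM failure falls through to keyword matching
    return "product" if _PRODUCT_RE.search(query) else "general"
```

An unexpected label from the LLM is treated the same as a failed call in this sketch, which keeps the return value restricted to the two known classes.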

2. Category Detection

For product queries, regex patterns detect the relevant category:

Skin: acne, pimple, moisturizer, sunscreen, wrinkles, dark spots, etc.
Hair: hairfall, dandruff, shampoo, conditioner, split ends, frizz, etc.
Vitamins: biotin, collagen, multivitamin, omega, zinc, iron, etc.
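
As an illustration, the keyword groups above could be compiled into one regex per category. This is a sketch under assumptions: the patterns cover only the terms listed here, and the dictionary keys reuse the Pinecone namespace names mentioned later in this README.

```python
import re

# One pattern per category, built from the example terms in this README.
CATEGORY_PATTERNS = {
    "skin": re.compile(
        r"\b(acne|pimples?|moisturi[sz]er|sunscreen|wrinkles?|dark spots?)\b", re.I
    ),
    "hair": re.compile(
        r"\b(hair ?fall|dandruff|shampoo|conditioner|split ends?|frizz)\b", re.I
    ),
    "vitamins_supplements": re.compile(
        r"\b(biotin|collagen|multivitamins?|omega|zinc|iron)\b", re.I
    ),
}


def detect_categories(query):
    """Return every category whose pattern matches the query."""
    return [name for name, pattern in CATEGORY_PATTERNS.items() if pattern.search(query)]
```

Returning a list rather than a single label lets a query such as "biotin for dandruff" search both the hair and the vitamins data.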

3. Filter Extraction

Natural language filters are parsed via regex:

Price: "under 500 rupees", "between 300-800 rs", "less than 1000"
Brand: Matched against 300+ indexed brands
Rating: "rated above 4", "4.5 stars"
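
The sketch below shows one way such filters could be parsed with regular expressions. The patterns, the filter keys (`min_price`, `max_price`, `brand`, `min_rating`), and the five-brand sample are assumptions for illustration; the real patterns and the 300+ brand index live in main.py.

```python
import re

# A handful of brands named in this README; the real index holds 300+.
KNOWN_BRANDS = ("Abbott", "Cipla", "CeraVe", "Cetaphil", "COSRX")

BETWEEN_RE = re.compile(r"between\s+(\d+)\s*(?:-|to|and)\s*(\d+)")
UPPER_RE = re.compile(r"(?:under|below|less than)\s+(\d+)")
RATING_RE = re.compile(
    r"rated\s+(?:above\s+|over\s+)?(\d(?:\.\d)?)(?!\d)"  # "rated above 4"
    r"|(?<![\d.])(\d(?:\.\d)?)\s*stars?\b"                # "4.5 stars"
)


def extract_filters(query):
    """Pull price, brand, and rating constraints out of a free-text query."""
    q = query.lower()
    filters = {}
    between = BETWEEN_RE.search(q)
    upper = UPPER_RE.search(q)
    if between:
        low, high = sorted(int(g) for g in between.groups())
        filters["min_price"], filters["max_price"] = low, high
    elif upper:
        filters["max_price"] = int(upper.group(1))
    for brand in KNOWN_BRANDS:
        if re.search(r"\b" + re.escape(brand.lower()) + r"\b", q):
            filters["brand"] = brand
            break
    rating = RATING_RE.search(q)
    if rating:
        # Either phrasing is treated as a minimum rating in this sketch.
        filters["min_rating"] = float(rating.group(1) or rating.group(2))
    return filters
```

The rating pattern requires either the word "rated" or a trailing "stars", so a price such as "under 500" is not mistaken for a rating.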

4. Dual Vector Store Retrieval

An Ensemble Retriever combines results from both stores with equal weighting:

| Store | Type | Index | Dimensions |
| --- | --- | --- | --- |
| FAISS | Local | `faiss_index/` directory | 1536 (OpenAI) or 384 (HuggingFace) |
| Pinecone | Cloud (serverless) | `clinikally-rag-2` | 1536 |

Pinecone uses three namespaces: skin, hair, vitamins_supplements. Both stores are populated from the preprocessed CSVs on first initialization.
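
To show what equal-weight ensembling means in practice, here is a standalone sketch of weighted Reciprocal Rank Fusion, the technique LangChain's EnsembleRetriever is documented to use. It is not the library's code: the function name `fuse_rankings`, the document IDs, and the constant `c=60` are assumptions for illustration.

```python
from collections import defaultdict


def fuse_rankings(ranked_lists, weights=None, c=60, top_k=8):
    """Merge several ranked ID lists with weighted Reciprocal Rank Fusion."""
    if not ranked_lists:
        return []
    # Equal weighting by default, e.g. 0.5 each for FAISS and Pinecone.
    weights = weights or [1 / len(ranked_lists)] * len(ranked_lists)
    scores = defaultdict(float)
    for doc_ids, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(doc_ids, start=1):
            # Documents ranked highly by either store collect more score.
            scores[doc_id] += weight / (rank + c)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


faiss_hits = ["a", "b", "c"]      # hypothetical IDs from the local store
pinecone_hits = ["b", "d", "a"]   # hypothetical IDs from the cloud store
merged = fuse_rankings([faiss_hits, pinecone_hits])
```

Documents returned by both stores rise to the top, and passing a single list reproduces that store's own order, which matches the single-retriever fallback.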

5. Response Generation

Product queries: Top 8 documents are formatted with title, price (INR), description, and category, then passed to GPT-4o for a numbered recommendation list
General queries: BraveSearch fetches 5 web results; if web search is unavailable, the answer relies on the LLM's own knowledge instead. Either way, the response includes a health disclaimer
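
For product queries, the formatting step might look like the sketch below. The dictionary keys and the exact line layout are assumptions; only the four fields and the cap of 8 documents come from the description above.

```python
def format_products(docs, limit=8):
    """Render up to `limit` retrieved products as numbered context lines."""
    lines = []
    for i, doc in enumerate(docs[:limit], start=1):
        lines.append(
            f"{i}. {doc['title']} | INR {doc['price']} | {doc['category']}\n"
            f"   {doc['description']}"
        )
    return "\n".join(lines)
```

The resulting string would then be placed in the GPT-4o prompt alongside the user's question and the conversation history.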

6. Conversation Memory

ConversationBufferMemory (LangChain) stores the full chat history per session, enabling contextual follow-ups. Memory is session-scoped via Streamlit's st.session_state.
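
The session scoping can be illustrated without LangChain or Streamlit. In this sketch a plain dict stands in for `st.session_state`, and the helper names and the `chat_history` key are hypothetical; the app itself stores a ConversationBufferMemory object rather than a list of tuples.

```python
def get_history(session_state):
    """Return this session's history, creating it on first use."""
    return session_state.setdefault("chat_history", [])


def remember_turn(session_state, user_msg, assistant_msg):
    """Append one user/assistant exchange to the session's history."""
    history = get_history(session_state)
    history.append(("user", user_msg))
    history.append(("assistant", assistant_msg))


def render_history(session_state):
    """Flatten the history into text that can be added to a prompt."""
    return "\n".join(f"{role}: {text}" for role, text in get_history(session_state))
```

Because each session has its own state object, one user's history never leaks into another's, and it disappears when the session ends.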

Fallback Strategy

| Component | Primary | Fallback |
| --- | --- | --- |
| Query Classification | GPT-4o LLM call | Keyword pattern matching |
| Vector Store | Pinecone (cloud) | FAISS (local) |
| Retrieval | Ensemble (FAISS + Pinecone) | Individual retrievers |
| Web Search | BraveSearch API | LLM general knowledge |
| Embeddings | OpenAI `text-embedding-3-small` | HuggingFace `all-MiniLM-L6-v2` |
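
Every row in the table follows the same shape: try the primary, and switch to the fallback on failure. A generic wrapper for that pattern could look like this sketch; `with_fallback` is a hypothetical helper, not a function from main.py.

```python
import logging


def with_fallback(primary, fallback, label):
    """Wrap two callables so the second runs whenever the first raises."""
    def run(*args, **kwargs):
        try:
            return primary(*args, **kwargs)
        except Exception as exc:
            logging.warning("%s failed (%s); using fallback", label, exc)
            return fallback(*args, **kwargs)
    return run
```

For example, wrapping a web-search callable and an LLM-only callable gives a single function that answers general queries whether or not the search API is reachable.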

Supported Brands

The system indexes 300+ brands for brand-specific filtering, including:

Indian Pharma: Abbott, Cipla, Sun Pharma, Intas, Lupin, Glenmark
International Dermatology: Galderma, La Roche-Posay, Bioderma, ISDIN, Sesderma
Consumer Skincare: Neutrogena, Aveeno, CeraVe, Cetaphil, The Ordinary
K-Beauty: COSRX, Some By Mi, Belif, The Face Shop, KAHI

See the full brand list in the select_retrievers() function in main.py.

Docker Deployment

Dockerfile

Base image: python:3.13-slim
Installs build-essential for compiled dependencies
Exposes port 8501 (Streamlit default)
Runs streamlit run main.py --server.address=0.0.0.0

docker-compose.yml

Mounts project directory as volume (enables hot reload during development)
Loads environment variables from .env
Restart policy: unless-stopped

# Start
docker compose up --build -d

# View logs
docker compose logs -f app

# Stop
docker compose down

Limitations

Product catalog is static - ~3,000 products with potentially outdated pricing; no live inventory integration
Three categories only - Skincare, hair care, and vitamins/supplements
Session-only memory - Conversation history is not persisted across sessions (no database backend)
No authentication - No user accounts or personalized recommendations
Hardcoded model - GPT-4o with non-configurable temperature/parameters
Max 8 products per query - Responses are capped at 8 recommendations
OpenAI dependency - Core functionality requires an OpenAI API key (paid)
