
kumar-ayan/data-retriever


🌐 AI Data Retriever

An AI-powered data collection, semantic search, and analysis tool built with FastAPI, SQLite, and FAISS.
Collect, organize, and explore web content intelligently, locally and privately.

🧠 Overview

AI Data Retriever is an end-to-end Python application that:

  • Scrapes content from the web pages you add by URL (JavaScript-rendered pages need the optional Playwright setup)
  • Stores and tracks data locally in a SQLite database
  • Converts text into vector embeddings with SentenceTransformers and indexes them with FAISS for semantic search
  • Provides a clean, fast web UI (HTML/CSS/JS served by FastAPI)

Think of it as your personal AI-powered web intelligence dashboard: perfect for research, knowledge management, or building datasets for AI/ML projects.


✨ Features

| Feature | Description |
|---|---|
| 🧭 Web Scraper | Extracts article titles, text, and metadata using BeautifulSoup. |
| 💾 Local Database (SQLite) | Stores and tracks all retrieved data in a local database file. |
| 🧠 Semantic Search Engine (FAISS) | Searches by meaning, not just by keywords. |
| 🧩 AI Embeddings | Uses SentenceTransformers (all-MiniLM-L6-v2) to produce vector representations. |
| 🎨 Built-in Web UI | HTML/CSS/JS interface to add URLs, view pages, and run AI searches. |
| ⚙️ Offline-Ready | No external APIs required. Storage, embedding, and search run entirely on your machine (the embedding model is downloaded once on first use, and scraping still needs network access to reach the pages). |
| 🧱 Modular Codebase | Cleanly separated backend, scraper, embedder, and templates. |
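The project's scraper uses BeautifulSoup. To show the kind of extraction involved without adding dependencies, here is a minimal sketch built on the standard library's `html.parser` instead. It pulls three things from an HTML string: the `<title>`, the `<meta name="description">` content, and the text of the `<p>` elements. Several details are assumptions for illustration: treating paragraph text as the "article text", and the names `PageExtractor` and `extract_page`. The real scraper may select content differently.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collects the <title>, the meta description, and <p> text from HTML.

    Illustrative stand-in for a BeautifulSoup-based scraper (assumption).
    """

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True: entities become characters
        self.title = ""
        self.description = ""
        self.paragraphs: list[str] = []
        self._in_title = False
        self._in_p = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "p":
            self._in_p = True
            self._buffer = []
        elif tag == "meta":
            attr_map = dict(attrs)
            if (attr_map.get("name") or "").lower() == "description":
                self.description = (attr_map.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_p:
            self._in_p = False
            # Collapse runs of whitespace inside the paragraph.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_p:
            # Text from nested inline tags (<b>, <a>, ...) is kept too.
            self._buffer.append(data)


def extract_page(html: str) -> dict:
    """Return title, meta description, and paragraph text of an HTML page."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "description": parser.description,
        "text": "\n".join(parser.paragraphs),
    }


if __name__ == "__main__":
    sample = (
        "<html><head><title>Example</title>"
        '<meta name="description" content="A test page."></head>'
        "<body><p>First <b>paragraph</b>.</p><p>Second one.</p></body></html>"
    )
    print(extract_page(sample))
```

BeautifulSoup reaches the same fields through `soup.title`, `soup.find("meta", attrs={"name": "description"})`, and `soup.find_all("p")`, which is usually more forgiving of malformed markup.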

🧩 Tech Stack

| Layer | Technology | Purpose |
|---|---|---|
| Backend | 🐍 FastAPI | API + template rendering |
| Database | 💾 SQLite (SQLModel) | Local structured storage |
| Scraper | 🌐 Requests + BeautifulSoup | Web content extraction |
| Embeddings | 🧠 SentenceTransformers | Text vectorization |
| Vector Search | ⚑ FAISS | Semantic similarity search |
| Frontend | 🎨 HTML + CSS + JS | Interactive dashboard |
| Optional | 🧩 Playwright | Dynamic site scraping (JS pages) |
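Semantic search comes down to comparing vectors. If every embedding is scaled to unit length, the inner product of a query vector and a document vector equals their cosine similarity. An exact search then ranks every stored vector by that score. The pure-Python sketch below illustrates the idea. It uses tiny hand-made 3-dimensional vectors in place of real model output (all-MiniLM-L6-v2 actually produces 384-dimensional vectors). `normalize` and `top_k` are hypothetical helpers, not part of the project.

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 norm so that inner product == cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        return [0.0 for _ in vec]  # avoid dividing by zero for an all-zero vector
    return [x / norm for x in vec]


def top_k(query, docs, k=2):
    """Return (doc_id, score) pairs for the k documents most similar to `query`.

    `docs` maps a document id to its (not necessarily normalized) embedding.
    This is an exact brute-force search: every document is scored.
    """
    q = normalize(query)
    scored = []
    for doc_id, vec in docs.items():
        d = normalize(vec)
        score = sum(a * b for a, b in zip(q, d))  # inner product of unit vectors
        scored.append((doc_id, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy embeddings (assumption): pretend the axes mean cooking, sports, finance.
    docs = {
        "pasta-recipe": [0.9, 0.1, 0.0],
        "football-news": [0.1, 0.95, 0.05],
        "stock-report": [0.0, 0.1, 0.9],
    }
    for doc_id, score in top_k([0.8, 0.2, 0.1], docs, k=2):
        print(f"{doc_id}: {score:.3f}")
```

A FAISS-based setup commonly gets the same ranking in one of two ways. One is an `IndexFlatIP` over float32 vectors normalized with `faiss.normalize_L2`. The other is encoding with `normalize_embeddings=True` in SentenceTransformers. Which index type this project actually uses is not stated in the README.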

🏗️ Architecture

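In outline, the components above form a pipeline:

  • The scraper fetches a URL.
  • The page is saved to SQLite.
  • Its text is embedded with SentenceTransformers and added to the FAISS index.
  • FastAPI serves the dashboard that queries that index.

The storage step is the easiest to picture in isolation. The sketch below models it with the standard library's `sqlite3` rather than SQLModel. The `pages` table, its columns, and the helpers `open_db` and `save_page` are assumptions, not the project's real schema. Re-scraping a URL updates the existing row instead of adding a duplicate, which is one plausible way to "track" pages.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per URL; the UNIQUE constraint drives the upsert.
SCHEMA = """
CREATE TABLE IF NOT EXISTS pages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url TEXT NOT NULL UNIQUE,
    title TEXT,
    content TEXT,
    fetched_at TEXT NOT NULL
)
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_page(conn, url, title, content):
    """Insert a page, or refresh it if the URL is already stored; return its id.

    Uses SQLite's UPSERT syntax (ON CONFLICT ... DO UPDATE), available since 3.24.
    """
    now = datetime.now(timezone.utc).isoformat()
    conn.execute(
        """
        INSERT INTO pages (url, title, content, fetched_at)
        VALUES (?, ?, ?, ?)
        ON CONFLICT(url) DO UPDATE SET
            title = excluded.title,
            content = excluded.content,
            fetched_at = excluded.fetched_at
        """,
        (url, title, content, now),
    )
    conn.commit()
    # lastrowid is unreliable after an update, so look the id up by URL.
    row = conn.execute("SELECT id FROM pages WHERE url = ?", (url,)).fetchone()
    return row[0]


if __name__ == "__main__":
    db = open_db()
    first = save_page(db, "https://example.com", "Example", "First scrape")
    again = save_page(db, "https://example.com", "Example", "Second scrape")
    print(first == again, db.execute("SELECT content FROM pages").fetchone()[0])
```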

🚀 Getting Started

🧰 Requirements

  • Python 3.9+
  • pip (Python package manager)
  • (Optional) Playwright, for scraping JavaScript-rendered pages (after `pip install playwright`, run `playwright install` to download the browser binaries)
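A quick check that the interpreter meets these requirements can save a failed install. The sketch below uses only the standard library to confirm Python 3.9+ and to see whether `pip` and the optional `playwright` package are importable. The name `check_environment` is hypothetical. The check does not verify Playwright's browser binaries, which are installed separately.

```python
import importlib.util
import sys


def check_environment():
    """Return a mapping of requirement -> bool for the current interpreter."""
    return {
        # The README asks for Python 3.9 or newer.
        "python>=3.9": sys.version_info >= (3, 9),
        # pip is needed to install backend/requirements.txt.
        "pip": importlib.util.find_spec("pip") is not None,
        # Optional: only needed for dynamic (JavaScript-rendered) pages.
        "playwright (optional)": importlib.util.find_spec("playwright") is not None,
    }


if __name__ == "__main__":
    for requirement, ok in check_environment().items():
        print(f"{requirement:<24} {'OK' if ok else 'missing'}")
```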

Installation

  1. Clone the repo:
     git clone https://github.com/kumar-ayan/data-retriever.git
     cd data-retriever

  2. Create and activate a virtual environment:
     python -m venv .venv
     source .venv/bin/activate
     On Windows, activate with: .venv\Scripts\activate

  3. Install dependencies:
     pip install -r backend/requirements.txt

  4. Start the backend from the repository root:
     uvicorn backend.app.main:app --reload

  5. Open in your browser:
     http://127.0.0.1:8000
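You can also confirm from a script that the server started in step 4 is answering, for example as part of a setup check. A standard-library request against the same address is enough. `server_is_up` is a hypothetical helper. It only requests the root page and treats any non-5xx response as a sign that the app is running.

```python
import urllib.error
import urllib.request


def server_is_up(url="http://127.0.0.1:8000", timeout=2.0):
    """Return True if `url` answers with a non-5xx HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # No answer at all: connection refused, timeout, DNS failure, etc.
        return False


if __name__ == "__main__":
    print("server up" if server_is_up() else "server not reachable")
```

`HTTPError` is caught before `URLError` because it is a subclass of it. Without that ordering, a 404 from a running server would be reported as "not reachable".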

