📚 Multi-Source Knowledge Intelligence Agent

A Streamlit-based AI system that intelligently analyzes and interprets information from multiple data sources, including URLs and uploaded documents (PDF, DOCX, CSV, PPTX, TXT, JSON). It performs automated summarization, sentiment analysis, intent detection, stance interpretation, and knowledge graph visualization, and provides an interactive chat interface powered by Retrieval-Augmented Generation (RAG) with contextual memory.


🚀 Key Features

  • Multi-Source Input: Supports both URL-based and file-based knowledge ingestion.
  • Automated Insights: Generates summary, sentiment, intent, and stance analysis.
  • Dynamic Knowledge Graphs: Visualizes entity relationships with interactive graph rendering.
  • Conversational Agent: A context-aware chatbot that answers factual and personal queries using RAG and intent classification.
  • Robust Session Handling: Independent workflows for URL and file modes with state management and persistent memory.
  • Seamless User Experience: Real-time feedback, auto-refresh, and intuitive interface design.

🧠 Tech Stack

  • Frontend: Streamlit
  • Backend: LangGraph, LangChain, LangSmith, FAISS Vector Store, Python
  • LLM: OpenAI GPT (via langchain_openai)
  • Visualization: Matplotlib, NetworkX
  • Memory & Retrieval: ConversationalRetrievalChain, ContextualCompressionRetriever

⚙️ Installation

```shell
# Clone the repository
git clone https://github.com/SayamAlt/Multi-Source-Knowledge-Intelligence-Agent.git
cd Multi-Source-Knowledge-Intelligence-Agent

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows use venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

🔑 Environment Setup

Create a .env file in the root directory and add your OpenAI API key:

```
OPENAI_API_KEY=your_openai_api_key_here
```
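At startup the app expects this key in the environment. A minimal, stdlib-only sketch of loading a `.env` file (the project may well use `python-dotenv` instead; `load_env_file` is an illustrative helper, not the app's actual code):

```python
import os

def load_env_file(path: str = ".env") -> None:
    """Parse simple KEY=VALUE lines and export them to the environment.

    Illustrative stand-in for python-dotenv's load_dotenv(): skips blank
    lines and comments, and never overwrites existing variables.
    """
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

# Usage: load_env_file(); then read os.environ["OPENAI_API_KEY"].
```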

▶️ Usage

```shell
streamlit run app.py
```

Then open the displayed local URL in your browser.

Choose between:

  • 🌐 URL Mode: Enter a web link to extract insights.
  • 📁 File Upload Mode: Upload a supported document for intelligent analysis.

After processing, you will receive:

  • Comprehensive report (summary, sentiment, stance, etc.)
  • Interactive knowledge graph
  • RAG-powered chatbot interface
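The knowledge graph output boils down to turning extracted (subject, relation, object) triples into a graph structure. A stdlib stand-in for what the app does with NetworkX (`build_graph` and the sample triples are illustrative, not taken from the codebase):

```python
def build_graph(triples: list[tuple[str, str, str]]) -> dict[str, list[tuple[str, str]]]:
    """Collect (subject, relation, object) triples into an adjacency map.

    Toy equivalent of adding labeled edges to a NetworkX DiGraph before
    rendering it with Matplotlib.
    """
    graph: dict[str, list[tuple[str, str]]] = {}
    for subject, relation, obj in triples:
        graph.setdefault(subject, []).append((relation, obj))
        graph.setdefault(obj, [])  # ensure leaf entities appear as nodes
    return graph

graph = build_graph([
    ("LangChain", "integrates_with", "FAISS"),
    ("LangChain", "calls", "OpenAI GPT"),
])
```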

🧩 Core Functionalities

| Component | Description |
| --- | --- |
| Workflow Engine | Orchestrates knowledge extraction and LLM-driven insights. |
| Intent Classifier | Categorizes user queries as personal, factual, or hybrid. |
| Memory Recall System | Retains contextual user information and relevant past facts. |
| RAG Chatbot | Answers user queries using retrieved document context and memory. |
| Graph Visualizer | Displays structured knowledge as interconnected entities. |
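The app's intent classifier is LLM-based; as a rough illustration of the personal / factual / hybrid routing described above, here is a keyword-heuristic stand-in (the rule set is invented for illustration, only the three category labels come from the project):

```python
import re

def classify_intent(query: str) -> str:
    """Route a user query to 'personal', 'factual', or 'hybrid'.

    Toy heuristic standing in for the LLM classifier: queries about the
    user hit memory, document questions hit RAG, mixed queries use both.
    """
    words = set(re.findall(r"[a-z']+", query.lower()))
    personal = bool(words & {"i", "my", "me", "mine", "remember"})
    factual = bool(words & {"what", "who", "when", "where", "how", "why",
                            "summarize", "explain"})
    if personal and factual:
        return "hybrid"
    if personal:
        return "personal"
    return "factual"
```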

🧩 System Architecture Overview

```mermaid
flowchart TD

    subgraph UI["🖥️ Streamlit Frontend"]
        A1["🌐 URL / 📁 File Input"]
        A2["⚙️ Workflow Execution"]
        A3["💬 Conversational Agent Interface"]
    end

    subgraph CORE["🧠 LangChain Core Pipeline"]
        B1["Retriever (FAISS VectorStore)"]
        B2["LLM (OpenAI GPT via LangChain)"]
        B3["Memory (ConversationBufferMemory)"]
        B4["Contextual Compression (LLMChainExtractor)"]
        B5["Intent Classifier (OpenAI Mini LLM)"]
    end

    subgraph WORKFLOW["🔄 LangGraph Workflow Orchestration"]
        C1["Document Preprocessing"]
        C2["Content Extraction (URL/File)"]
        C3["Embedding Generation"]
        C4["Knowledge Graph Construction (NetworkX)"]
        C5["Insights Generation (Summary, Sentiment, Intent, Stance)"]
    end

    subgraph OBSERVABILITY["📊 LangSmith & Monitoring"]
        D1["Trace & Debug Chains"]
        D2["Performance Metrics & Latency Tracking"]
        D3["Prompt Optimization & Error Reporting"]
    end

    subgraph STORAGE["🗃️ Persistent Storage Layer"]
        E1["FAISS Vector Database"]
        E2["Temporary File Cache"]
        E3["Session & State Management"]
    end

    %% Connections
    UI --> WORKFLOW
    WORKFLOW --> CORE
    CORE --> STORAGE
    CORE --> OBSERVABILITY
    WORKFLOW --> OBSERVABILITY
    UI --> CORE
```

⚙️ Data Flow Summary

  1. Input Phase: User provides a URL or uploads a document (PDF, DOCX, CSV, etc.).
  2. Extraction & Processing: The LangGraph workflow extracts raw text, cleans it, and generates embeddings.
  3. Analysis: LangChain orchestrates summarization, sentiment analysis, and intent detection using connected LLMs.
  4. Knowledge Graph Generation: Extracted entities and relationships are visualized interactively using NetworkX.
  5. Conversational Querying: A RAG-based Conversational Agent allows users to query the processed content contextually.
  6. Monitoring: LangSmith tracks every chain invocation, latency, and model performance for continuous observability.
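Steps 2 and 3 hinge on splitting extracted text into overlapping chunks before embedding. A simplified character-level splitter in the spirit of LangChain's text splitters (the helper name and default sizes are assumptions, not the app's actual settings):

```python
def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks with overlap.

    Mimics the basic behavior of a character-based text splitter: each
    chunk repeats the last `overlap` characters of the previous one so
    that sentences cut at a boundary survive in at least one chunk.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, max(len(text), 1), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```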

🧱 Architectural Highlights

  • 🧩 Modular, Agentic Design — Each component (ingestion, analysis, visualization) runs as an independent node within a LangGraph workflow.
  • 🧠 Dynamic Context Memory — Past user queries and responses are retained via LangChain Memory for personalized interactions.
  • 🔄 Isolated Workflows per Input Mode — URL and File pipelines function independently with clean session and cache resets.
  • 🪶 Observability First — Full tracing, debugging, and metrics through LangSmith integration.
  • 🚀 Scalable Foundation — Designed to easily extend toward multi-user, API-driven, or enterprise knowledge management use cases.
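The modular design described above, independent nodes threading shared state through a workflow, can be sketched with plain functions over a state dict (conceptual only; the actual app wires its nodes through LangGraph, and these node bodies are placeholders):

```python
from typing import Callable

State = dict  # shared workflow state passed between nodes

def extract(state: State) -> State:
    # Placeholder for content extraction from a URL or uploaded file.
    state["text"] = state["raw"].strip()
    return state

def analyze(state: State) -> State:
    # Placeholder for LLM-driven summary / sentiment / stance analysis.
    state["summary"] = state["text"][:60]
    return state

def run_workflow(state: State, nodes: list[Callable[[State], State]]) -> State:
    """Run each node in order, threading the state dict through,
    mirroring the pattern a compiled LangGraph graph orchestrates."""
    for node in nodes:
        state = node(state)
    return state

result = run_workflow({"raw": "  Example document text.  "}, [extract, analyze])
```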

📈 Example Use Cases

  • Academic and research paper summarization
  • Intelligent document-based Q&A
  • Sentiment and author stance profiling
  • Context-aware data interpretation
  • Enterprise knowledge management

🌟 Future Enhancements

  • Citation tracing and evidence linking
  • Multi-user shared memory persistence
  • Integration with dashboards and APIs
  • Support for additional formats (Excel, HTML scraping)

🧾 License

Released under the Apache 2.0 License — free for modification and distribution with attribution.


🤝 Acknowledgments

Built with ❤️ using LangGraph, Streamlit, LangChain, FAISS, LangSmith, and OpenAI GPT to enable intelligent, explainable, and interactive document analysis.
