A Streamlit-based AI system that analyzes and interprets information from multiple data sources, including URLs and uploaded documents (PDF, DOCX, CSV, PPTX, TXT, JSON). It performs automated summarization, sentiment analysis, intent detection, stance interpretation, and knowledge graph visualization, and provides an interactive chat interface powered by Retrieval-Augmented Generation (RAG) with contextual memory.
- Multi-Source Input: Supports both URL-based and file-based knowledge ingestion.
- Automated Insights: Generates summary, sentiment, intent, and stance analysis.
- Dynamic Knowledge Graphs: Visualizes entity relationships with interactive graph rendering.
- Conversational Agent: A context-aware chatbot that answers factual and personal queries using RAG and intent classification.
- Robust Session Handling: Independent workflows for URL and file modes with state management and persistent memory.
- Seamless User Experience: Real-time feedback, auto-refresh, and intuitive interface design.
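As an illustration of the multi-source ingestion above, a minimal sketch of extension-based loader dispatch. The loader names are illustrative stand-ins; the actual app may use LangChain's document loaders with different names.

```python
from pathlib import Path

# Hypothetical mapping from file extension to a loader name; the real
# project may wire these to LangChain document loader classes instead.
LOADERS = {
    ".pdf": "PyPDFLoader",
    ".docx": "Docx2txtLoader",
    ".csv": "CSVLoader",
    ".pptx": "UnstructuredPowerPointLoader",
    ".txt": "TextLoader",
    ".json": "JSONLoader",
}

def pick_loader(path: str) -> str:
    """Return the loader name for a supported file, or raise for others."""
    ext = Path(path).suffix.lower()
    if ext not in LOADERS:
        raise ValueError(f"Unsupported file type: {ext}")
    return LOADERS[ext]
```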
- Frontend: Streamlit
- Backend: LangGraph, LangChain, LangSmith, FAISS Vector Store, Python
- LLM: OpenAI GPT (via langchain_openai)
- Visualization: Matplotlib, NetworkX
- Memory & Retrieval: ConversationalRetrievalChain, ContextualCompressionRetriever
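Under the hood, the FAISS retriever ranks document chunks by vector similarity to the query. A toy stand-in using bag-of-words counts and cosine similarity; the real pipeline uses OpenAI embeddings and FAISS's indexed search.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (the app uses OpenAI embeddings)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]
```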
# Clone the repository
git clone https://github.com/yourusername/multi-source-knowledge-agent.git
cd multi-source-knowledge-agent
# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows use venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt

Create a .env file in the root directory and add your OpenAI API key:

OPENAI_API_KEY=your_openai_api_key_here

Run the app:

streamlit run app.py

Then open the displayed local URL in your browser.
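The app reads the key from the environment. For illustration, a minimal .env parser; the project may rely on python-dotenv instead, which handles quoting and comments more robustly.

```python
import os

def load_env(path: str = ".env") -> None:
    """Load KEY=value pairs from a .env file into the process environment.
    Existing environment variables are not overwritten."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())
```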
Choose between:
- 🌐 URL Mode: Enter a web link to extract insights.
- 📁 File Upload Mode: Upload a supported document for intelligent analysis.
After processing, you will receive:
- Comprehensive report (summary, sentiment, stance, etc.)
- Interactive knowledge graph
- RAG-powered chatbot interface
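The report fields above could be modeled as a simple dataclass; the field names here are illustrative, not the app's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class InsightReport:
    """Hypothetical shape of the generated report (names are illustrative)."""
    summary: str
    sentiment: str  # e.g. "positive" / "negative" / "neutral"
    intent: str
    stance: str
    entities: list = field(default_factory=list)  # feeds the knowledge graph
```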
| Component | Description | 
|---|---|
| Workflow Engine | Orchestrates knowledge extraction and LLM-driven insights. | 
| Intent Classifier | Categorizes user queries into personal, factual, or hybrid. | 
| Memory Recall System | Remembers contextual user information and relevant past facts. | 
| RAG Chatbot | Answers user queries using retrieved document context and memory. | 
| Graph Visualizer | Displays structured knowledge as interconnected entities. | 
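The Intent Classifier in the table delegates to a small OpenAI model. A rule-based stand-in showing the three-way split into personal, factual, and hybrid; the cue word lists are illustrative only.

```python
PERSONAL_CUES = {"my", "me", "i", "mine"}
FACTUAL_CUES = {"document", "summary", "summarize", "what", "who", "when", "explain"}

def classify_intent(query: str) -> str:
    """Toy keyword stand-in for the LLM-based intent classifier."""
    tokens = {t.strip("?.,!") for t in query.lower().split()}
    personal = bool(tokens & PERSONAL_CUES)
    factual = bool(tokens & FACTUAL_CUES)
    if personal and factual:
        return "hybrid"
    return "personal" if personal else "factual"
```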
flowchart TD
    subgraph UI["🖥️ Streamlit Frontend"]
        A1["🌐 URL / 📁 File Input"]
        A2["⚙️ Workflow Execution"]
        A3["💬 Conversational Agent Interface"]
    end
    subgraph CORE["🧠 LangChain Core Pipeline"]
        B1["Retriever (FAISS VectorStore)"]
        B2["LLM (OpenAI GPT via LangChain)"]
        B3["Memory (ConversationBufferMemory)"]
        B4["Contextual Compression (LLMChainExtractor)"]
        B5["Intent Classifier (OpenAI Mini LLM)"]
    end
    subgraph WORKFLOW["🔄 LangGraph Workflow Orchestration"]
        C1["Document Preprocessing"]
        C2["Content Extraction (URL/File)"]
        C3["Embedding Generation"]
        C4["Knowledge Graph Construction (NetworkX)"]
        C5["Insights Generation (Summary, Sentiment, Intent, Stance)"]
    end
    subgraph OBSERVABILITY["📊 LangSmith & Monitoring"]
        D1["Trace & Debug Chains"]
        D2["Performance Metrics & Latency Tracking"]
        D3["Prompt Optimization & Error Reporting"]
    end
    subgraph STORAGE["🗃️ Persistent Storage Layer"]
        E1["FAISS Vector Database"]
        E2["Temporary File Cache"]
        E3["Session & State Management"]
    end
    %% Connections
    UI --> WORKFLOW
    WORKFLOW --> CORE
    CORE --> STORAGE
    CORE --> OBSERVABILITY
    WORKFLOW --> OBSERVABILITY
    UI --> CORE
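The LangGraph orchestration above can be approximated as nodes that each transform a shared state dict. A dependency-free sketch; the node bodies are placeholders for the real extraction, embedding, and LLM calls.

```python
# Each "node" takes the shared state dict and returns it updated,
# mirroring the workflow nodes in the diagram above.
def extract_content(state: dict) -> dict:
    state["text"] = state["raw"].strip()
    return state

def generate_embeddings(state: dict) -> dict:
    # Placeholder chunking; the real app embeds chunks with OpenAI embeddings.
    state["chunks"] = [state["text"][i:i + 50] for i in range(0, len(state["text"]), 50)]
    return state

def generate_insights(state: dict) -> dict:
    # Placeholder "summary"; the real app calls the LLM here.
    state["summary"] = state["text"][:40]
    return state

PIPELINE = [extract_content, generate_embeddings, generate_insights]

def run_workflow(raw: str) -> dict:
    state = {"raw": raw}
    for node in PIPELINE:
        state = node(state)
    return state
```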
- Input Phase: User provides a URL or uploads a document (PDF, DOCX, CSV, etc.).
- Extraction & Processing: The LangGraph workflow extracts raw text, cleans it, and generates embeddings.
- Analysis: LangChain orchestrates summarization, sentiment analysis, and intent detection using connected LLMs.
- Knowledge Graph Generation: Extracted entities and relationships are visualized interactively using NetworkX.
- Conversational Querying: A RAG-based Conversational Agent allows users to query the processed content contextually.
- Monitoring: LangSmith tracks every chain invocation, latency, and model performance for continuous observability.
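Knowledge graph generation, as described above, boils down to collecting (subject, relation, object) triples. A dependency-free sketch of the adjacency structure that NetworkX would hold as labeled edges:

```python
def build_graph(triples: list[tuple[str, str, str]]) -> dict:
    """Collect (subject, relation, object) triples into an adjacency map.
    The real app adds these as labeled edges on a NetworkX graph and
    renders them with Matplotlib."""
    graph: dict[str, list[tuple[str, str]]] = {}
    for subj, rel, obj in triples:
        graph.setdefault(subj, []).append((rel, obj))
    return graph
```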
- 🧩 Modular, Agentic Design — Each component (ingestion, analysis, visualization) runs as an independent node within a LangGraph workflow.
- 🧠 Dynamic Context Memory — Past user queries and responses are retained via LangChain Memory for personalized interactions.
- 🔄 Isolated Workflows per Input Mode — URL and File pipelines function independently with clean session and cache resets.
- 🪶 Observability First — Full tracing, debugging, and metrics through LangSmith integration.
- 🚀 Scalable Foundation — Designed to easily extend toward multi-user, API-driven, or enterprise knowledge management use cases.
- Academic and research paper summarization
- Intelligent document-based Q&A
- Sentiment and author stance profiling
- Context-aware data interpretation
- Enterprise knowledge management
- Citation tracing and evidence linking
- Multi-user shared memory persistence
- Integration with dashboards and APIs
- Support for additional formats (Excel, HTML scraping)
Released under the Apache 2.0 License: free to use, modify, and distribute, provided attribution and the license notice are retained.
Built with ❤️ using LangGraph, Streamlit, LangChain, FAISS, LangSmith, and OpenAI GPT to enable intelligent, explainable, and interactive document analysis.