Development #21
Pull Request Overview
This pull request updates project documentation to align with current architecture and removes the Postgres service from Docker Compose, signaling a transition to Qdrant as the primary database.
- Major documentation overhaul with comprehensive project structure details
- Enhanced quick start guide with actual CLI usage examples and configuration guidance
- Simplified Docker Compose setup by removing Postgres dependency
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| docs/PROJECT_STRUCTURE.md | Comprehensive restructure showing current modules, benchmarking system, and test organization |
| docs/QUICK_START_GUIDE.md | Major update distinguishing tutorial from real project with CLI examples and configuration samples |
| docs/SOSUM_INGESTION.md | Improved dataset download instructions with clearer directory structure |
| docker-compose.yml | Removed Postgres service and related environment variables |
| README.md | Complete overhaul with modern features, benchmarking capabilities, and production-ready documentation |
```bash
# Clone the SOSum repository
git clone https://github.com/BonanKou/SOSum-A-Dataset-of-Extractive-Summaries-of-Stack-Overflow-Posts-and-labeling-tools.git sosum
# Clone the SOSum repository into the datasets directory
```
Copilot
AI
Sep 8, 2025
The instruction to 'cd datasets/' assumes the datasets directory exists. Consider adding a command to create the directory first: 'mkdir -p datasets/' before the 'cd datasets/' command.
    # Clone the SOSum repository into the datasets directory
    mkdir -p datasets/
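The suggestion works because `mkdir -p` succeeds whether or not the directory already exists, so the following `cd` is safe either way. A minimal sketch of that behaviour, run in a temporary directory that stands in for the project root (the temporary directory is an assumption for illustration, not part of the documented steps):

```bash
# Work in a throwaway directory so the sketch does not touch a real checkout.
# (Assumption: this temporary directory stands in for the project root.)
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# First run: datasets/ does not exist yet, so mkdir -p creates it.
mkdir -p datasets/
# Second run: datasets/ already exists; mkdir -p still succeeds.
mkdir -p datasets/

# cd no longer depends on the directory having been created beforehand.
cd datasets/ || exit 1
pwd
```

Plain `mkdir datasets/` would fail on the second call, which is why the `-p` form is the safer choice in copy-paste instructions.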
    mkdir -p datasets/sosum
    cd datasets/sosum
    # Download from https://github.com/BonanKou/SOSum-A-Dataset-of-Extractive-Summaries-of-Stack-Overflow-Posts-and-labeling-tools
    # Test the adapter (dry run)
    python bin/ingest.py ingest stackoverflow datasets/sosum/data --config config.yml --dry-run --max-docs 10 --verbose
Copilot
AI
Sep 8, 2025
The path 'datasets/sosum/data' doesn't match the earlier instruction that clones into 'sosum_source'. This should be 'datasets/sosum_source/data' to be consistent with the SOSUM_INGESTION.md instructions.
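One way to catch this kind of path mismatch early is to check that the dataset directory exists before invoking the ingest command. The sketch below is only an illustration: `require_dir` is a hypothetical helper that is not part of the project, and the value of `DATA_DIR` is an assumption taken from this comment rather than from the repository itself.

```bash
# require_dir is a hypothetical helper, not part of the project.
# It returns 0 if the directory exists; otherwise it prints a hint and returns 1.
require_dir() {
    if [ -d "$1" ]; then
        return 0
    fi
    echo "missing dataset directory: $1" >&2
    return 1
}

# Assumed location, taken from the review comment; adjust it to match the
# clone target actually used in SOSUM_INGESTION.md.
DATA_DIR="datasets/sosum_source/data"

if require_dir "$DATA_DIR"; then
    echo "ok: $DATA_DIR exists, safe to run the ingest dry run"
else
    echo "skip: fix the path before running bin/ingest.py"
fi
```

With a guard like this, a quick start guide and an ingestion guide that disagree on the directory name fail fast with a readable message instead of an error from inside the ingest script.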
```bash
git fork https://github.com/your-org/thesis-rag-system
cd thesis-rag-system
```
Copilot
AI
Sep 8, 2025
The 'git fork' command doesn't exist in standard Git. This should be instructions to fork via GitHub UI or use 'git clone' after forking through the web interface.
- Go to [https://github.com/your-org/thesis-rag-system](https://github.com/your-org/thesis-rag-system) and click **Fork** (top right) to create your own copy.
- Clone your fork (replace `<your-username>` with your GitHub username):
```bash
git clone https://github.com/<your-username>/thesis-rag-system.git
cd thesis-rag-system
```
This pull request focuses on two areas: removing the PostgreSQL service from the Docker Compose setup, and expanding and updating the project documentation. The documentation now gives a more comprehensive overview of the project structure, available utilities, benchmarking system, and test organization. The quick start guide has also been rewritten to clarify the difference between the simplified tutorial and the advanced features of the actual implementation, and to give clear instructions for using the real CLI and configuration files.
Infrastructure changes:
Documentation improvements:
This pull request primarily updates project documentation to better reflect the current architecture, features, and usage patterns of the MLOps pipeline for RAG. It also removes the Postgres service from the development Docker Compose setup, signaling a move to Qdrant as the primary database. The documentation now includes comprehensive project structure, advanced usage notes, and improved quick start guidance.
Key changes include:
1. Documentation and Project Structure Enhancements
- `docs/PROJECT_STRUCTURE.md` updated to provide a detailed, up-to-date overview of all project modules, including new directories for benchmarking, scenarios, datasets, logs, and scripts. The structure now reflects recent additions and clarifies the role of each component.
- Coverage of the `tests/pipeline/` directory, the test runner, and the types of tests available.
2. Quick Start and Usage Documentation Improvements
- `docs/QUICK_START_GUIDE.md` now clearly distinguishes between the simplified tutorial and the advanced real project, providing references to setup instructions, CLI usage, and configuration files.
- Actual CLI usage examples (`bin/ingest.py`).
3. Dataset Ingestion Documentation Update
- Improved instructions in `docs/SOSUM_INGESTION.md` for obtaining and organizing the SOSum dataset, clarifying directory structure and download steps.
4. Docker Compose Simplification
- Postgres service removed from `docker-compose.yml`, along with all related environment variables and volumes, reflecting a shift to Qdrant as the sole vector database.
5. Minor Documentation Corrections and Additions
These changes make the documentation more accurate, user-friendly, and aligned with the project's current state.
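A reviewer who wants to confirm that no Postgres settings were left behind could scan the Compose file for the string. The sketch below is a hedged illustration only: `has_postgres_refs` is a hypothetical helper, and the demo file (service name, image, and port) is an assumed stand-in, not the project's actual `docker-compose.yml`.

```bash
# has_postgres_refs is a hypothetical helper, not part of the project.
# It returns 0 (true) if the file mentions "postgres" in any letter case.
has_postgres_refs() {
    grep -qi 'postgres' "$1"
}

# Demo on a throwaway file that mimics a Qdrant-only Compose setup.
# (Service name, image, and port are assumptions for illustration.)
demo="$(mktemp)"
cat > "$demo" <<'EOF'
services:
  qdrant:
    image: qdrant/qdrant
    ports:
      - "6333:6333"
EOF

if has_postgres_refs "$demo"; then
    echo "leftover Postgres references found"
else
    echo "no Postgres references"
fi
```

Pointing the same helper at the real `docker-compose.yml`, or at a `.env` file, would flag leftover variables such as `POSTGRES_USER` if any remained.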