Merged
64 commits
df599be
Add PostgresController and connection test script for PostgreSQL inte…
spyrchat May 20, 2025
9045ddf
Implement image and table asset insertion methods in PostgresControll…
spyrchat May 21, 2025
31a2a7e
Add table extraction and SQL uploading functionality; refactor import…
spyrchat May 29, 2025
bdb9037
Add PDF processing, table extraction, and text chunking functionality…
spyrchat May 29, 2025
bcc92da
Enhance embedding pipeline with dynamic embedding strategy; add PDF p…
spyrchat May 30, 2025
d400d85
Refactor import statements to use relative paths; update sandbox dire…
spyrchat May 30, 2025
40d43c9
Enhance Qdrant document insertion with error handling and logging; up…
spyrchat Jun 5, 2025
d6c07b5
Add table extraction functionality with logging; implement PDF proces…
spyrchat Jun 5, 2025
f75f74f
Implement modular RAG pipeline with query interpretation, SQL plannin…
spyrchat Jul 8, 2025
53d213b
Add Dockerfile, docker-compose.yml, and main application logic; imple…
spyrchat Jul 8, 2025
985440e
Refactor QdrantVectorDB: remove unused import and add spacing; update…
spyrchat Jul 9, 2025
20a99bd
Updated requirements.txt
spyrchat Jul 9, 2025
8c08f87
full pipeline is functional
spyrchat Jul 9, 2025
b6feff5
Added logging
spyrchat Jul 9, 2025
4187574
Added docstrings for clarity
spyrchat Jul 9, 2025
fa53084
Added Docstrings
spyrchat Jul 9, 2025
346f0d6
added config.yml
spyrchat Jul 9, 2025
268c15b
System Works with config.yml
spyrchat Jul 9, 2025
4e2d2c7
Feat Agent Works as intended
spyrchat Jul 9, 2025
4b71446
Add smoke tests, vector store uploader, and document validator
spyrchat Aug 20, 2025
2bd4a0c
feat: Add minimal SOSum ingestion test and standalone processor
spyrchat Aug 20, 2025
a3fd333
feat: Enhance data handling and validation in ingestion pipeline
spyrchat Aug 21, 2025
a23242b
Add Quick Start Guide for MLOps Pipeline and implement core components
spyrchat Aug 21, 2025
811b2c6
feat: Implement Stack Overflow adapter analysis and testing tools
spyrchat Aug 21, 2025
663dbbd
feat: Add answer metadata tests and enhance answer retrieval output
spyrchat Aug 21, 2025
00586f0
Add experimental and hybrid retrieval configurations, enhance testing…
spyrchat Aug 21, 2025
3add6e6
Add unit tests for retrieval pipeline and related components
spyrchat Aug 21, 2025
28d11ed
feat: Update dependencies in requirements.txt and add new packages
spyrchat Aug 30, 2025
db65791
feat: Enhance embedding strategy configuration and improve smoke test…
spyrchat Aug 30, 2025
c353fe2
Refactor retrieval pipeline to modern architecture
spyrchat Aug 30, 2025
439708a
Refactor configuration loading and retriever initialization
spyrchat Aug 30, 2025
32a3daf
feat: Consolidate configuration system and enhance benchmark function…
spyrchat Aug 30, 2025
8483973
feat: Enhance benchmark evaluation by implementing NaN handling for m…
spyrchat Aug 30, 2025
9acb29c
feat: Improve document ID handling and external ID preservation in Qd…
spyrchat Aug 30, 2025
3aeceee
Refactor benchmark scripts and retrievers for improved functionality …
spyrchat Aug 30, 2025
d18cdc4
Add dataset configurations for Natural Questions and SOSum Stack Over…
spyrchat Aug 30, 2025
22500db
Remove obsolete test files and add a new local end-to-end test setup …
spyrchat Aug 30, 2025
10b6620
chore: Update Python version to 3.13 in pipeline tests
spyrchat Aug 30, 2025
056f007
chore: Update testing dependencies and Python version in CI workflows
spyrchat Aug 30, 2025
7323f4b
refactor: Simplify dependency management by removing requirements-tes…
spyrchat Aug 31, 2025
a81237f
Remove outdated documentation and SQL components; reorganize configur…
spyrchat Aug 31, 2025
b39c51d
chore: Update requirements-minimal.txt to include missing dependencie…
spyrchat Aug 31, 2025
e1beb8b
chore: Add missing dependencies for boto3, botocore, and langchain-qd…
spyrchat Aug 31, 2025
149ab30
refactor: Enhance Qdrant connectivity tests and remove outdated requi…
spyrchat Aug 31, 2025
63d29af
chore: Remove outdated GitHub Actions CI configuration and local test…
spyrchat Aug 31, 2025
18903d8
chore: Remove outdated example scripts and sample data files for retr…
spyrchat Aug 31, 2025
3393ec5
Fix Google dependencies conflict in requirements.txt
spyrchat Aug 31, 2025
9415e07
fix: Remove unnecessary blank line in insert_documents method
spyrchat Sep 7, 2025
2748761
fix: Improve .env loading and add default values for Qdrant configura…
spyrchat Sep 7, 2025
8d6d16f
fix: Update Qdrant service configuration for improved health checks a…
spyrchat Sep 7, 2025
8163734
fix: Improve Qdrant health check commands and update logging messages…
spyrchat Sep 7, 2025
44baa40
fix: Update Qdrant health check commands to use the correct endpoint …
spyrchat Sep 7, 2025
c128036
fix: Update Qdrant health check commands for improved readiness verif…
spyrchat Sep 7, 2025
0068770
fix: Enhance Qdrant readiness check with retry logic and timeout hand…
spyrchat Sep 7, 2025
b2f8882
fix: Update Qdrant readiness check to use the correct endpoint
spyrchat Sep 7, 2025
eebb5d6
fix: Update Qdrant health check endpoints and enhance pipeline test c…
spyrchat Sep 7, 2025
4fe9c6b
refactor: Remove redundant pipeline test configurations and streamlin…
spyrchat Sep 7, 2025
c3abc86
fix: Simplify commands and enhance output messages in pipeline tests
spyrchat Sep 7, 2025
614692e
changed check in git workflows from ok to HTTP 200
spyrchat Sep 7, 2025
fae42db
fix: Clean up whitespace and improve readability in end-to-end pipeli…
spyrchat Sep 7, 2025
2bd6b2e
fix: Update end-to-end test configuration and enhance Qdrant connecti…
spyrchat Sep 8, 2025
5809813
fix: Enhance Qdrant configuration handling and improve error manageme…
spyrchat Sep 8, 2025
3715f8d
fix: Remove end-to-end tests from pipeline configuration to streamlin…
spyrchat Sep 8, 2025
3ef36b0
Merge pull request #18 from spyrchat/benchmark
spyrchat Sep 8, 2025
128 changes: 128 additions & 0 deletions .github/workflows/pipeline-tests.yml
@@ -0,0 +1,128 @@
name: Pipeline Tests

on:
  push:
    branches: [ main, development ]
  pull_request:
    branches: [ main, development ]

jobs:
  test-minimal:
    name: Minimal Pipeline Tests
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          submodules: false # avoid .gitmodules errors

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.13'

      - name: Install dependencies
        run: pip install -r tests/requirements-minimal.txt

      - name: Run minimal pipeline tests (no external services)
        run: python -m pytest tests/pipeline/test_minimal_pipeline.py tests/pipeline/test_components.py -v --tb=short

      - name: Run configuration validation
        run: |
          python -c "
          import yaml, sys
          configs = ['config.yml', 'pipelines/configs/retrieval/ci_google_gemini.yml']
          for config in configs:
              try:
                  with open(config) as f:
                      yaml.safe_load(f)
                  print(f'{config} is valid')
              except Exception as e:
                  print(f'{config} failed: {e}')
                  sys.exit(1)
          "

  test-integration:
    name: Integration Tests with Qdrant
    runs-on: ubuntu-latest

    services:
      qdrant:
        image: qdrant/qdrant:latest
        ports:
          - 6333:6333 # REST
          - 6334:6334 # gRPC
        env:
          QDRANT__SERVICE__HTTP_PORT: 6333
          QDRANT__SERVICE__GRPC_PORT: 6334

    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          submodules: false

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.13'

      - name: Install dependencies
        run: pip install -r tests/requirements-minimal.txt

      - name: Wait for Qdrant to be ready (readiness + API check)
        run: |
          for i in {1..60}; do
            if curl -fsS http://127.0.0.1:6333/readyz > /dev/null && \
               curl -fsS http://127.0.0.1:6333/collections > /dev/null; then
              echo "Qdrant is ready!"
              exit 0
            fi
            echo "Waiting for Qdrant ($i/60)..."
            sleep 2
          done
          echo "Qdrant did not become ready in time"
          exit 1

      - name: Test Qdrant connectivity
        run: python -m pytest tests/pipeline/test_qdrant_connectivity.py -v --tb=short

      - name: Run basic integration tests
        run: python -m pytest tests/pipeline/ -v --tb=short -m "not requires_api"

  test-security:
    name: Security and Config Validation
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          submodules: false

      - name: Check for hardcoded secrets
        run: |
          if grep -r "sk-" . --exclude-dir=.git --exclude="*.md" --exclude="*.yml"; then
            echo "Found potential hardcoded API keys"
            exit 1
          fi
          if grep -r "google_api_key.*=" . --exclude-dir=.git --exclude="*.md" --exclude="*.yml" | grep -v "getenv\|environ"; then
            echo "Found potential hardcoded Google API keys"
            exit 1
          fi
          echo "No hardcoded secrets found"

      - name: Validate configuration structure
        run: |
          python -c "
          import yaml
          with open('pipelines/configs/retrieval/ci_google_gemini.yml') as f:
              config = yaml.safe_load(f)
          assert 'retrieval_pipeline' in config
          assert 'retriever' in config['retrieval_pipeline']
          assert 'embedding' in config['retrieval_pipeline']['retriever']
          assert 'google' == config['retrieval_pipeline']['retriever']['embedding']['dense']['provider']
          assert 'GOOGLE_API_KEY' == config['retrieval_pipeline']['retriever']['embedding']['dense']['api_key_env']
          print('Configuration structure is valid')
          "
12 changes: 11 additions & 1 deletion .gitignore
@@ -19,4 +19,14 @@ climate-fever
*.log
__pycache__
sandbox/*
/__pycache__
/__pycache__
synthetic_dataset\text_dataset_template.json
extraction_output/
.idea/misc.xml
.idea/modules.xml
.idea/Thesis.iml
.idea/vcs.xml
.idea/inspectionProfiles/profiles_settings.xml
*.json
*.csv

18 changes: 18 additions & 0 deletions Dockerfile
@@ -0,0 +1,18 @@
# Use a slim Python base image
FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Install system packages if needed
RUN apt-get update && apt-get install -y \
    build-essential \
    libpq-dev \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the full source code
COPY . .
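
Note on the "Validate configuration structure" step in .github/workflows/pipeline-tests.yml: the inline assert chain stops at the first failing check and raises a bare AssertionError (or a KeyError if 'dense' is absent), so the CI log does not say which key was wrong. A minimal sketch of the same checks as a function that reports each problem by path is shown below. The names find_config_problems, DENSE_PATH and EXPECTED_DENSE are hypothetical and not part of this PR; the expected values are copied from the assertions in the workflow. The function takes an already-loaded mapping, so in CI it would presumably be fed the result of yaml.safe_load as the workflow does today.

```python
from typing import Any

# Hypothetical helper, not part of this PR. The path and expected values
# mirror the assertions in the "Validate configuration structure" step.
DENSE_PATH = ("retrieval_pipeline", "retriever", "embedding", "dense")
EXPECTED_DENSE = {"provider": "google", "api_key_env": "GOOGLE_API_KEY"}


def find_config_problems(config: Any) -> list[str]:
    """Return readable problems; an empty list means the structure is valid."""
    node = config
    walked: list[str] = []
    # Walk down to the dense-embedding section, stopping at the first gap.
    for key in DENSE_PATH:
        if not isinstance(node, dict) or key not in node:
            location = ".".join(walked) or "<root>"
            return [f"missing key '{key}' under {location}"]
        node = node[key]
        walked.append(key)

    path = ".".join(walked)
    if not isinstance(node, dict):
        return [f"{path} is not a mapping"]

    # Compare every expected field so all mismatches are reported at once.
    problems = []
    for field, expected in EXPECTED_DENSE.items():
        actual = node.get(field)
        if actual != expected:
            problems.append(f"{path}.{field} is {actual!r}, expected {expected!r}")
    return problems


if __name__ == "__main__":
    sample = {
        "retrieval_pipeline": {
            "retriever": {
                "embedding": {
                    "dense": {"provider": "google", "api_key_env": "GOOGLE_API_KEY"}
                }
            }
        }
    }
    for problem in find_config_problems(sample):
        print(problem)
```

In a workflow step, a non-empty result could be printed line by line and followed by a non-zero exit code, which keeps the failure behaviour of the current step while making the log more specific.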