
Implement PostgreSQL + FAISS Hybrid Storage for Secure PII Management #38 (#42)

Open
massoudsh wants to merge 1 commit into pooyaphoenix:main from massoudsh:feat/postgres-faiss-hybrid-38

@massoudsh
Contributor

Summary

Implements the PostgreSQL + FAISS hybrid described in #38: PII is stored encrypted in PostgreSQL, and FAISS is used only for vector search (it holds no PII). The change is backward compatible: without DATABASE_URL / USE_PG_STORAGE, behavior is unchanged (pickle only).

Closes #38
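The hybrid split can be pictured as two stores joined only by a row id. FAISS returns the positions of the nearest vectors, and Postgres returns the passages stored under those ids. The following sketch is a stand-in and not the PR's code. A plain Python list searched by brute-force cosine similarity plays the role of the FAISS index, and SQLite plays the role of Postgres. The `pii_records` column names (`id`, `passage`) are assumed, and decryption is left out. `vector_search` and `fetch_passages` are hypothetical helpers.

```python
import sqlite3
from math import sqrt

# Stand-ins: a list of vectors plays the FAISS index, SQLite plays Postgres.
# The only link between the two stores is the integer row id.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def vector_search(index, query, k):
    """Return the ids of the k vectors most similar to query (FAISS stand-in)."""
    ranked = sorted(range(len(index)), key=lambda i: cosine(index[i], query), reverse=True)
    return ranked[:k]

def fetch_passages(conn, ids):
    """Look up passages for the given vector ids (Postgres stand-in).

    In the PR the stored column holds Fernet ciphertext, which would be
    decrypted here after the lookup; this sketch stores plain text.
    """
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, passage FROM pii_records WHERE id IN ({placeholders})", ids
    ).fetchall()
    by_id = dict(rows)
    return [by_id[i] for i in ids]  # keep the similarity ranking order

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pii_records (id INTEGER PRIMARY KEY, passage TEXT)")
conn.executemany(
    "INSERT INTO pii_records VALUES (?, ?)", [(0, "alpha"), (1, "beta"), (2, "gamma")]
)
index = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]  # row i <-> pii_records.id = i
print(fetch_passages(conn, vector_search(index, [1.0, 0.1], k=2)))  # → ['alpha', 'gamma']
```

Because the vector store only ever sees ids and embeddings, the FAISS index file can be shipped or cached without exposing passage text.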

What’s included

  • Postgres: docker-compose service, schema (pii_records, documents, query_logs), init script
  • Encryption: Fernet (app/utils/pii_encryption.py) for passage text; the key is supplied via ENCRYPTION_KEY
  • DB layer: app/db/store.py saves and loads encrypted passages and writes audit entries to query_logs
  • RetrieverAgent: when USE_PG_STORAGE=true, passage metadata is loaded from and saved to Postgres; the FAISS index is unchanged; loading falls back to pickle if Postgres is empty or unavailable
  • Audit: Each /query logs to query_logs when PG is enabled
  • Migration: scripts/migrate_pickle_to_pg.py to copy existing pickle data into Postgres (encrypted)
  • Deps: psycopg2-binary, cryptography, sqlalchemy; .env.example for DATABASE_URL, ENCRYPTION_KEY
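The opt-in flag and the pickle fallback described for RetrieverAgent could look roughly like the sketch below. This is a minimal sketch under assumptions. `pg_enabled` and `load_meta` are hypothetical helpers, and the accepted truthy spellings of `USE_PG_STORAGE` are a guess. `load_from_pg` stands in for whatever loader `app/db/store.py` actually exposes.

```python
import logging
import os
import pickle
from typing import Callable, List, Mapping

log = logging.getLogger("retriever")  # hypothetical logger name

def pg_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Postgres is opt-in: USE_PG_STORAGE must be truthy AND DATABASE_URL set."""
    flag = env.get("USE_PG_STORAGE", "").strip().lower() in {"1", "true", "yes"}
    return flag and bool(env.get("DATABASE_URL"))

def load_meta(pickle_path: str,
              load_from_pg: Callable[[], List[str]],
              env: Mapping[str, str] = os.environ) -> List[str]:
    """Load passage metadata, preferring Postgres and falling back to pickle."""
    if pg_enabled(env):
        try:
            meta = load_from_pg()
            if meta:  # an empty table means data was not migrated yet
                return meta
            log.info("Postgres has no passages; falling back to pickle")
        except Exception as exc:  # real code would catch the driver's connection error
            log.warning("Postgres unavailable (%s); falling back to pickle", exc)
    # Default path, identical to the pre-PR behavior.
    with open(pickle_path, "rb") as fh:
        return pickle.load(fh)
```

Treating an empty table the same as an unreachable database lets the flag be switched on before the migration script has run.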

Question for maintainer

Is the hybrid (FAISS + Postgres) what you want, or would you prefer a full move to PostgreSQL?

  • Hybrid (this PR): Keeps FAISS for vector search; Postgres only for encrypted PII and audit. No change to retrieval performance; minimal change to indexing path.
  • Full Postgres: Use pgvector for embeddings in Postgres as well, and drop FAISS/pickle. Single store, simpler ops, but requires pgvector and a decision on indexing/migration.

If you’d rather go full Postgres (e.g. pgvector), we can treat this as a stepping stone and plan a follow-up that migrates vectors into Postgres and removes FAISS. Happy to adjust direction based on your preference.

Made with Cursor

- Add Postgres service and schema (pii_records, documents, query_logs)
- Encrypt passage text with Fernet; store in Postgres, FAISS for vectors only
- RetrieverAgent: optional PG meta load/save when USE_PG_STORAGE=true
- Audit logging: query_logs for every /query when PG enabled
- Migration script: migrate_pickle_to_pg.py
- Backward compatible: no PG = existing pickle behavior
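A migration along the lines of migrate_pickle_to_pg.py mainly has to preserve one invariant. Each passage must keep the id that matches its FAISS row, so searches still resolve after the move. The following is a hedged sketch that assumes the pickle holds a list of passage strings in index order. SQLite again stands in for Postgres, and `migrate` is a hypothetical name. `encrypt` is injected, so real code could pass something like `lambda s: Fernet(key).encrypt(s.encode())` from the `cryptography` package.

```python
import pickle
import sqlite3
from typing import Callable

def migrate(pickle_path: str, conn: sqlite3.Connection,
            encrypt: Callable[[str], bytes]) -> int:
    """Copy passages from the pickle file into pii_records, encrypting each one."""
    with open(pickle_path, "rb") as fh:
        passages = pickle.load(fh)  # assumed: list of strings, position == FAISS row id
    rows = [(i, encrypt(text)) for i, text in enumerate(passages)]
    with conn:  # one transaction: commit on success, roll back on error
        # SQLite upsert syntax; existing ids are overwritten, so reruns are safe.
        conn.executemany(
            "INSERT OR REPLACE INTO pii_records (id, passage) VALUES (?, ?)", rows
        )
    return len(rows)
```

On Postgres, the equivalent idempotent statement would be `INSERT ... ON CONFLICT (id) DO UPDATE SET passage = EXCLUDED.passage`.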

Closes pooyaphoenix#38

Co-authored-by: Cursor <cursoragent@cursor.com>