Implement permission-aware RAG and second-brain memory layer #119

Description

@senolcolak

Goal

Implement a proper RAG / second-brain layer for RustShare so that notes, Obsidian vaults, Markdown files, attachments, decisions, standup records, kanban cards, and workspace documents become queryable, cited, permission-aware memory.

This should not be implemented as a simple “chatbot over files”. The goal is to turn RustShare into a durable personal/company memory layer where every useful artifact can become a structured memory object.

Background

RustShare already collects Obsidian and other user documents. This creates the foundation for a second-brain system. The missing layer is a reliable indexing, retrieval, permission-filtering, and answer-generation pipeline.

The RAG layer should support use cases such as:

  • Ask questions across the whole workspace
  • Ask questions inside a selected folder
  • Ask questions about the current note
  • Find related notes, files, decisions, and kanban cards
  • Summarize project memory
  • Explain decision history
  • Generate ADRs or implementation notes from existing discussions and documents
  • Detect outdated, conflicting, or duplicate knowledge

Product direction

RustShare should treat information as structured memory objects first, not only as vector chunks.

Example memory sources:

  • Markdown notes
  • Obsidian vault files
  • Obsidian frontmatter, tags, folders, and backlinks
  • PDFs and text-based attachments
  • Decisions / ADRs
  • Standup records
  • Kanban cards
  • Chat or discussion records, if available later
  • File versions and update history

Each object should preserve metadata such as:

  • object_id
  • workspace_id
  • owner_id
  • source_type
  • source_path
  • file_name
  • document_title
  • heading_path
  • tags
  • backlinks
  • created_at
  • updated_at
  • permissions
  • importance
  • object_status

Important note: the file/note name must be treated separately from the first Markdown H1 heading. The file identity and document content title are both useful metadata, but they must not be coupled.
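A minimal sketch of how the two titles could stay decoupled, assuming a Rust backend. The `MemoryObject` struct shows only a subset of the metadata fields listed above, and `from_markdown` and `first_h1` are hypothetical helper names, not existing RustShare APIs. The H1 detection is deliberately simple: it recognizes only unindented `# ` headings and skips fenced code blocks.

```rust
/// Hypothetical canonical memory object (subset of the proposed metadata).
#[derive(Debug, Clone, PartialEq)]
pub struct MemoryObject {
    pub object_id: String,
    pub workspace_id: String,
    pub source_path: String,
    /// Identity of the file or note, taken from the path only.
    pub file_name: String,
    /// Title found in the content (first H1), if there is one.
    pub document_title: Option<String>,
    pub tags: Vec<String>,
}

/// Returns the text of the first level-1 heading outside fenced code blocks.
pub fn first_h1(markdown: &str) -> Option<String> {
    let mut in_fence = false;
    for line in markdown.lines() {
        if line.trim_start().starts_with("```") {
            in_fence = !in_fence;
            continue;
        }
        if in_fence {
            continue;
        }
        if let Some(rest) = line.strip_prefix("# ") {
            let title = rest.trim();
            if !title.is_empty() {
                return Some(title.to_string());
            }
        }
    }
    None
}

impl MemoryObject {
    /// Builds an object from a path and its Markdown body.
    /// The file name never falls back to the H1, and the H1 never
    /// falls back to the file name.
    pub fn from_markdown(
        object_id: &str,
        workspace_id: &str,
        source_path: &str,
        markdown: &str,
    ) -> Self {
        let file_name = source_path
            .rsplit('/')
            .next()
            .unwrap_or(source_path)
            .to_string();
        MemoryObject {
            object_id: object_id.to_string(),
            workspace_id: workspace_id.to_string(),
            source_path: source_path.to_string(),
            file_name,
            document_title: first_h1(markdown),
            tags: Vec::new(),
        }
    }
}
```

With this shape, renaming `notes/rag-v2.md` changes `file_name` only, and editing the first heading changes `document_title` only.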

Proposed architecture

RustShare files / notes / Obsidian / kanban / decisions
        ↓
Parser + metadata extractor
        ↓
Canonical memory objects
        ↓
Chunking + embeddings + keyword index
        ↓
Hybrid retrieval + permission filters + reranking
        ↓
LLM answer with citations and source links

Recommended initial stack:

  • PostgreSQL for metadata, objects, permissions, and indexing state
  • pgvector for embeddings in the first implementation
  • PostgreSQL full-text search for keyword search in the first implementation
  • Optional later: Qdrant, Tantivy, Meilisearch, OpenSearch, or Vespa if scale requires it
  • Background indexing workers
  • Provider abstraction for embeddings and LLM calls

Core requirements

1. Canonical memory object model

Create a RustShare memory object model that can represent different source types in a common format.

Supported initial source types:

  • markdown_note
  • obsidian_note
  • attachment_text
  • pdf_document
  • decision
  • standup_record
  • kanban_card

Each memory object should be linked back to the original RustShare artifact and workspace.
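One possible way to represent the initial source types is a closed enum that round-trips to the string identifiers listed above. The enum name `SourceType` and its methods are assumptions for illustration; only the seven identifiers come from this issue.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical enum for the initial source types.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum SourceType {
    MarkdownNote,
    ObsidianNote,
    AttachmentText,
    PdfDocument,
    Decision,
    StandupRecord,
    KanbanCard,
}

impl SourceType {
    /// Every supported variant, used for parsing and validation.
    pub const ALL: [SourceType; 7] = [
        SourceType::MarkdownNote,
        SourceType::ObsidianNote,
        SourceType::AttachmentText,
        SourceType::PdfDocument,
        SourceType::Decision,
        SourceType::StandupRecord,
        SourceType::KanbanCard,
    ];

    /// Stable identifier as it would be stored in the database.
    pub fn as_str(self) -> &'static str {
        match self {
            SourceType::MarkdownNote => "markdown_note",
            SourceType::ObsidianNote => "obsidian_note",
            SourceType::AttachmentText => "attachment_text",
            SourceType::PdfDocument => "pdf_document",
            SourceType::Decision => "decision",
            SourceType::StandupRecord => "standup_record",
            SourceType::KanbanCard => "kanban_card",
        }
    }
}

impl fmt::Display for SourceType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}

impl FromStr for SourceType {
    type Err = String;

    /// Rejects unknown identifiers instead of guessing a type.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        SourceType::ALL
            .iter()
            .copied()
            .find(|t| t.as_str() == s)
            .ok_or_else(|| format!("unknown source type: {s}"))
    }
}
```

Adding a later source type (for example chat records) would then be a compile-time change: the `match` in `as_str` fails to compile until the new variant is handled.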

2. Chunking and embedding pipeline

Implement an indexing pipeline that:

  • Detects file/object changes
  • Parses content
  • Extracts metadata
  • Chunks content according to source type
  • Generates embeddings
  • Stores chunks in a vector index
  • Stores keyword-searchable text
  • Marks indexing state and errors
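The change-detection and indexing-state steps above could be sketched as follows. `IndexState`, `fnv1a_64`, and `needs_reindex` are hypothetical names. FNV-1a is used here only because it is short and dependency-free; a real implementation would more likely store a cryptographic content hash or rely on file versions.

```rust
/// Hypothetical per-object indexing state.
#[derive(Debug, Clone, PartialEq)]
pub enum IndexState {
    /// Never indexed, or explicitly queued for indexing.
    Pending,
    /// Indexed successfully; remembers the hash of the indexed content.
    Indexed { content_hash: u64 },
    /// Last attempt failed; the error is kept for inspection and retry.
    Failed { error: String },
}

/// 64-bit FNV-1a hash of the content (stand-in for a stronger hash).
pub fn fnv1a_64(text: &str) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for byte in text.as_bytes() {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Decides whether an object has to go through the pipeline again.
pub fn needs_reindex(state: &IndexState, current_text: &str) -> bool {
    match state {
        IndexState::Indexed { content_hash } => *content_hash != fnv1a_64(current_text),
        IndexState::Pending | IndexState::Failed { .. } => true,
    }
}
```

A background worker could call `needs_reindex` for each changed file and skip parsing, chunking, and embedding when the content is unchanged, which keeps re-indexing cheap and the index rebuildable.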

Markdown / Obsidian chunking should respect heading structure:

# Main title
## Section
### Subsection

Each chunk should preserve:

  • Source object ID
  • Workspace ID
  • Folder path
  • Heading path
  • Tags
  • Line or section position where available
  • Page number for PDFs where available
  • Last updated timestamp
  • Permission scope
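As an illustration of heading-aware chunking, the sketch below splits Markdown at ATX headings and records the heading path and starting line for each chunk. `Chunk` and `chunk_by_headings` are hypothetical names, and only two of the chunk fields listed above are modeled. The sketch ignores Setext headings, does not split oversized sections, and treats headings inside fenced code blocks as body text.

```rust
/// Hypothetical chunk carrying its position in the heading hierarchy.
#[derive(Debug, Clone, PartialEq)]
pub struct Chunk {
    /// Titles from the outermost heading down to the chunk's own section.
    pub heading_path: Vec<String>,
    /// 1-based line number where the section body begins.
    pub start_line: usize,
    pub text: String,
}

/// Parses `## Title` style headings; returns (level, title).
fn parse_heading(line: &str) -> Option<(usize, String)> {
    let level = line.chars().take_while(|c| *c == '#').count();
    if level == 0 || level > 6 {
        return None;
    }
    let rest = &line[level..];
    if !rest.starts_with(' ') {
        return None;
    }
    Some((level, rest.trim().to_string()))
}

/// Emits the collected body as a chunk, skipping empty sections.
fn flush(chunks: &mut Vec<Chunk>, path: &[(usize, String)], body: &mut Vec<&str>, start: usize) {
    let text = body.join("\n").trim().to_string();
    body.clear();
    if !text.is_empty() {
        chunks.push(Chunk {
            heading_path: path.iter().map(|(_, title)| title.clone()).collect(),
            start_line: start,
            text,
        });
    }
}

/// Splits Markdown into one chunk per heading section.
pub fn chunk_by_headings(markdown: &str) -> Vec<Chunk> {
    let mut chunks = Vec::new();
    let mut path: Vec<(usize, String)> = Vec::new();
    let mut body: Vec<&str> = Vec::new();
    let mut body_start = 1;
    let mut in_fence = false;

    for (idx, line) in markdown.lines().enumerate() {
        if line.trim_start().starts_with("```") {
            in_fence = !in_fence;
        }
        let heading = if in_fence { None } else { parse_heading(line) };
        match heading {
            Some((level, title)) => {
                flush(&mut chunks, &path, &mut body, body_start);
                // Leave any section at the same or a deeper level.
                while path.last().map_or(false, |(l, _)| *l >= level) {
                    path.pop();
                }
                path.push((level, title));
                body_start = idx + 2; // line after the heading, 1-based
            }
            None => body.push(line),
        }
    }
    flush(&mut chunks, &path, &mut body, body_start);
    chunks
}
```

For the example structure above, text under `### Subsection` would get the heading path `["Main title", "Section", "Subsection"]`, which can later be shown in citations.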

3. Hybrid retrieval

Retrieval must not rely only on vector search.

Implement hybrid retrieval using:

  • Vector search
  • Keyword / full-text search
  • Metadata filters
  • Workspace filters
  • Folder filters
  • Source-type filters
  • Permission filters
  • Optional reranking

Retrieval flow:

User query
  ↓
Normalize / optionally rewrite query
  ↓
Apply permission and workspace filters
  ↓
Vector search + keyword search
  ↓
Merge and rank results
  ↓
Optional rerank
  ↓
Build context package
  ↓
Generate answer with citations
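The "merge and rank results" step is not specified further in this issue. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than a decided design. The function name and the chunk IDs are hypothetical, and both input rankings are assumed to be already restricted to chunks the user may read.

```rust
use std::collections::HashMap;

/// Merges several ranked lists of chunk IDs with reciprocal rank fusion.
/// Each list contributes 1 / (k + rank) per item, with rank starting at 1.
/// Returns (chunk_id, fused_score) sorted by descending score; ties are
/// broken by chunk ID so the output is deterministic.
pub fn reciprocal_rank_fusion(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for ranking in rankings {
        for (position, chunk_id) in ranking.iter().enumerate() {
            let rank = position as f64 + 1.0;
            *scores.entry(*chunk_id).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(id, score)| (id.to_string(), score))
        .collect();
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

RRF only needs ranks, not comparable scores, so vector distances and full-text ranks can be combined without normalization. A chunk that appears in both lists outranks a chunk that appears high in only one. The constant `k` (often set to 60) is a tuning parameter.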

4. Permission-aware retrieval

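Permissions must be enforced at retrieval time, before any context is assembled or sent to the model. Filtering generated output afterwards is not sufficient.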

The model must never receive context that the current user is not allowed to access.

Filtering should include:

  • Workspace membership
  • Object visibility
  • Owner permissions
  • Group permissions
  • Shared-link restrictions if applicable later
  • Archived/deleted object exclusion unless explicitly requested
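The filtering rules above could look roughly like this as an in-memory check. All type and field names (`User`, `Candidate`, `Visibility`, `ObjectStatus`, `can_read`) are hypothetical, and the visibility model is an assumption because the issue does not define one. In a PostgreSQL-based implementation the same conditions would more likely be part of the retrieval query itself, so that unreadable chunks are never fetched.

```rust
use std::collections::HashSet;

/// Hypothetical visibility model for a memory object.
#[derive(Debug, Clone, PartialEq)]
pub enum Visibility {
    /// Readable by every member of the workspace.
    Workspace,
    /// Readable by the owner only.
    Private,
    /// Readable by members of any listed group.
    Groups(Vec<String>),
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ObjectStatus {
    Active,
    Archived,
    Deleted,
}

pub struct User {
    pub user_id: String,
    pub workspaces: HashSet<String>,
    pub groups: HashSet<String>,
}

pub struct Candidate {
    pub object_id: String,
    pub workspace_id: String,
    pub owner_id: String,
    pub visibility: Visibility,
    pub status: ObjectStatus,
}

/// Returns true only if the user may see this object in retrieval.
pub fn can_read(user: &User, c: &Candidate, include_inactive: bool) -> bool {
    // Workspace membership is checked first and applies to owners too.
    if !user.workspaces.contains(&c.workspace_id) {
        return false;
    }
    // Archived and deleted objects are excluded unless explicitly requested.
    if c.status != ObjectStatus::Active && !include_inactive {
        return false;
    }
    if c.owner_id == user.user_id {
        return true;
    }
    match &c.visibility {
        Visibility::Workspace => true,
        Visibility::Private => false,
        Visibility::Groups(groups) => groups.iter().any(|g| user.groups.contains(g)),
    }
}

/// Drops every candidate the user cannot read, before context assembly.
pub fn filter_readable(
    user: &User,
    candidates: Vec<Candidate>,
    include_inactive: bool,
) -> Vec<Candidate> {
    candidates
        .into_iter()
        .filter(|c| can_read(user, c, include_inactive))
        .collect()
}
```

Only the output of `filter_readable` would be passed on to ranking and context building, so the model never receives text from objects the current user cannot open.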

5. RAG UX features

Implement the following first-level UX features:

  • Ask Workspace
  • Ask Folder
  • Ask Current Note
  • Find Related Notes
  • Find Related Decisions
  • Summarize Current Folder

The answer should always include source references.

Example answer structure:

Answer

Sources
- /Obsidian/Projects/RustShare/RAG.md
- /Decisions/ADR-004-obsidian-sync.md
- Kanban card: Implement vault sync

Related items
- ...

6. Obsidian-aware indexing

For Obsidian-synced content, preserve:

  • Folder structure
  • Markdown content
  • YAML frontmatter
  • Tags
  • Wiki links [[...]]
  • Backlinks
  • Attachments where possible
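To illustrate wiki-link and backlink extraction, here is a small sketch with two hypothetical helpers, `extract_wiki_links` and `build_backlinks`. It handles the common Obsidian forms `[[Note]]`, `[[Note|alias]]`, `[[Note#Heading]]`, and `![[embed]]`, keeps only the link target, and deduplicates links within a single note. It does not skip links inside code blocks and does not resolve targets to actual files.

```rust
use std::collections::BTreeMap;

/// Returns the distinct link targets found in `[[...]]` spans, in order.
pub fn extract_wiki_links(markdown: &str) -> Vec<String> {
    let mut links: Vec<String> = Vec::new();
    let mut rest = markdown;
    while let Some(open) = rest.find("[[") {
        let after = &rest[open + 2..];
        let close = match after.find("]]") {
            Some(pos) => pos,
            None => break,
        };
        let inner = &after[..close];
        // A span crossing a line or another "[[" is not a valid link;
        // step past this opener and keep scanning.
        if inner.contains('\n') || inner.contains("[[") {
            rest = after;
            continue;
        }
        // Drop the display alias and the heading anchor.
        let target = inner.split('|').next().unwrap_or("");
        let target = target.split('#').next().unwrap_or("").trim();
        if !target.is_empty() && !links.iter().any(|l| l.as_str() == target) {
            links.push(target.to_string());
        }
        rest = &after[close + 2..];
    }
    links
}

/// Builds a map from link target to the notes that link to it.
/// `notes` is a list of (note name, Markdown body) pairs.
pub fn build_backlinks(notes: &[(&str, &str)]) -> BTreeMap<String, Vec<String>> {
    let mut backlinks: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for (name, body) in notes {
        for target in extract_wiki_links(body) {
            backlinks.entry(target).or_default().push(name.to_string());
        }
    }
    backlinks
}
```

Because backlinks are derived data, they can be recomputed from the vault at any time, which fits the goal of keeping the index rebuildable.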

RustShare should not frame this as replacing Obsidian. It should be positioned as compatible memory infrastructure for Markdown/Obsidian-style vault content.

7. Source citations and traceability

Every generated answer must show where the information came from.

Source references should include where available:

  • File name
  • Path
  • Heading
  • Page number
  • Kanban card title
  • Decision title
  • Standup record date
  • Last updated date

If no supporting source can be retrieved, the answer should be clearly marked as low-confidence or unsupported.
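A possible shape for source references and the final answer text, following the example answer structure in section 5. `Citation`, `render`, and `render_answer` are hypothetical names, only some of the reference fields listed above are modeled, and the exact output format is an assumption.

```rust
/// Hypothetical source reference; every field except `title` is optional
/// because not every source type has a path, heading, or page.
#[derive(Debug, Clone, Default)]
pub struct Citation {
    pub title: String,
    pub path: Option<String>,
    pub heading: Option<String>,
    pub page: Option<u32>,
    pub updated_at: Option<String>,
}

impl Citation {
    /// Renders one source line, preferring the path over the title.
    pub fn render(&self) -> String {
        let mut out = self.path.clone().unwrap_or_else(|| self.title.clone());
        if let Some(heading) = &self.heading {
            out.push_str(&format!(" > {heading}"));
        }
        if let Some(page) = self.page {
            out.push_str(&format!(", p. {page}"));
        }
        if let Some(updated) = &self.updated_at {
            out.push_str(&format!(" (updated {updated})"));
        }
        out
    }
}

/// Appends a "Sources" section; flags the answer when nothing supports it.
pub fn render_answer(answer: &str, sources: &[Citation]) -> String {
    let mut out = String::from(answer);
    out.push_str("\n\nSources\n");
    if sources.is_empty() {
        out.push_str("- none (unsupported answer, treat as low-confidence)\n");
    } else {
        for source in sources {
            out.push_str(&format!("- {}\n", source.render()));
        }
    }
    out
}
```

Keeping citations as structured data until the last step lets the UI turn them into links to the note, heading, or kanban card instead of plain strings.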

Suggested phases

Phase 1 — Searchable memory MVP

  • PostgreSQL + pgvector schema
  • Memory object table
  • Memory chunk table
  • Markdown parser
  • Obsidian-compatible Markdown ingestion
  • Basic PDF/text extraction
  • Basic embedding generation
  • Workspace/folder/note scoped RAG queries
  • Source citations
  • Permission filtering before retrieval

Phase 2 — RustShare-native memory

  • Obsidian frontmatter support
  • Obsidian tags and backlinks
  • Kanban card indexing
  • Decision/ADR indexing
  • Standup record indexing
  • Related notes panel
  • Related decisions panel

Phase 3 — Retrieval quality

  • Hybrid retrieval
  • Reranking
  • Query rewriting
  • Time-aware retrieval
  • Importance weighting
  • Pinned-source boosting
  • Deduplication of near-identical chunks

Phase 4 — Second-brain intelligence

  • Project memory summaries
  • Decision timelines
  • Auto-generated ADR drafts
  • Conflict detection
  • Outdated-note detection
  • Duplicate-note detection
  • Weekly or project-level memory digest

Phase 5 — Company RAG readiness

  • Workspace-level ACL hardening
  • Group-aware retrieval
  • Audit logs for AI access
  • Admin controls
  • Data retention options
  • Per-workspace model and embedding settings

Acceptance criteria

  • RustShare can index Markdown notes from a workspace.
  • RustShare can index Obsidian-style Markdown files while preserving path, tags, frontmatter, and links where possible.
  • RustShare stores canonical memory objects separately from chunks.
  • RustShare stores embeddings for chunks.
  • RustShare supports full-text or keyword search in addition to vector search.
  • User can ask a question across a workspace.
  • User can ask a question scoped to a folder.
  • User can ask a question scoped to the currently open note.
  • Answers include citations/source references.
  • Retrieval respects workspace and object permissions before any context is sent to the LLM.
  • Updating a note/file triggers re-indexing.
  • Deleted or archived objects are removed from active retrieval.
  • Related notes can be shown for the current note.
  • The file/note name and first Markdown H1 are treated as separate metadata fields.

Non-goals for the first implementation

  • Full autonomous agent behavior
  • Automatic file modification by the AI
  • Cross-workspace memory without explicit permission
  • Complex graph database implementation
  • Full enterprise compliance layer
  • Perfect parsing for every file format
  • Replacing Obsidian or using misleading Obsidian branding

Implementation notes

Start simple and reliable:

  • Prefer PostgreSQL + pgvector for the first implementation.
  • Keep original source files as the source of truth.
  • Make the index rebuildable.
  • Do not store only embeddings; always keep source metadata and source text references.
  • Design the model so that Qdrant/Tantivy/OpenSearch can be added later without rewriting the product model.
  • Permission checks must happen before retrieval context is assembled.
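To keep the product model independent of the storage backend, the vector index could sit behind a small trait. The sketch below is an assumption about how that boundary might look: `VectorIndex` and `InMemoryIndex` are hypothetical names, and the brute-force cosine search stands in for pgvector, Qdrant, or another engine.

```rust
/// Hypothetical storage-agnostic interface for the vector index.
pub trait VectorIndex {
    /// Inserts or replaces the embedding of a chunk.
    fn upsert(&mut self, chunk_id: &str, embedding: Vec<f32>);
    /// Removes a chunk, e.g. when its source is deleted or archived.
    fn remove(&mut self, chunk_id: &str);
    /// Returns up to `top_k` (chunk_id, similarity) pairs, best first.
    fn search(&self, query: &[f32], top_k: usize) -> Vec<(String, f32)>;
}

/// Brute-force implementation, useful for tests and small workspaces.
#[derive(Default)]
pub struct InMemoryIndex {
    entries: Vec<(String, Vec<f32>)>,
}

/// Cosine similarity; returns 0.0 if either vector has zero length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|y| y * y).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

impl VectorIndex for InMemoryIndex {
    fn upsert(&mut self, chunk_id: &str, embedding: Vec<f32>) {
        self.remove(chunk_id);
        self.entries.push((chunk_id.to_string(), embedding));
    }

    fn remove(&mut self, chunk_id: &str) {
        self.entries.retain(|(id, _)| id.as_str() != chunk_id);
    }

    fn search(&self, query: &[f32], top_k: usize) -> Vec<(String, f32)> {
        let mut scored: Vec<(String, f32)> = self
            .entries
            .iter()
            .map(|(id, embedding)| (id.clone(), cosine(query, embedding)))
            .collect();
        scored.sort_by(|a, b| b.1.total_cmp(&a.1));
        scored.truncate(top_k);
        scored
    }
}
```

Retrieval code written against `VectorIndex` would not need to change if the pgvector-backed implementation is later replaced. Note that this sketch leaves permission and workspace filters out of the trait; a real interface would need to accept them so that filtering happens inside the search.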

Suggested labels

  • feature
  • rag
  • ai
  • second-brain
  • architecture
