Goal
Implement a proper RAG / second-brain layer for RustShare so that notes, Obsidian vaults, Markdown files, attachments, decisions, standup records, kanban cards, and workspace documents become queryable, cited, permission-aware memory.
This should not be implemented as a simple “chatbot over files”. The goal is to turn RustShare into a durable personal/company memory layer where every useful artifact can become a structured memory object.
Background
RustShare already collects Obsidian and other user documents. This creates the foundation for a second-brain system. The missing layer is a reliable indexing, retrieval, permission-filtering, and answer-generation pipeline.
The RAG layer should support use cases such as:
- Ask questions across the whole workspace
- Ask questions inside a selected folder
- Ask questions about the current note
- Find related notes, files, decisions, and kanban cards
- Summarize project memory
- Explain decision history
- Generate ADRs or implementation notes from existing discussions and documents
- Detect outdated, conflicting, or duplicate knowledge
Product direction
RustShare should treat information as structured memory objects first, not only as vector chunks.
Example memory sources:
- Markdown notes
- Obsidian vault files
- Obsidian frontmatter, tags, folders, and backlinks
- PDFs and text-based attachments
- Decisions / ADRs
- Standup records
- Kanban cards
- Chat or discussion records, if available later
- File versions and update history
Each object should preserve metadata such as:
object_id
workspace_id
owner_id
source_type
source_path
file_name
document_title
heading_path
tags
backlinks
created_at
updated_at
permissions
importance
object_status
Important note: the file/note name must be treated separately from the first Markdown H1 heading. The file identity and document content title are both useful metadata, but they must not be coupled.
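To illustrate the decoupling described above, here is a minimal sketch that derives `file_name` from the source path and `document_title` from the first H1, independently of each other. The `TitleMetadata` struct and both function names are hypothetical, not existing RustShare APIs. The heading detection is deliberately simple: ATX headings only, no Setext headings and no frontmatter handling.

```rust
/// File identity and content title, kept as two independent fields.
/// (Hypothetical type; field names follow the metadata list above.)
#[derive(Debug, Clone, PartialEq)]
pub struct TitleMetadata {
    pub file_name: String,
    pub document_title: Option<String>,
}

/// Returns the text of the first level-1 ATX heading ("# Title"), if any.
/// Lines inside fenced code blocks are skipped.
pub fn first_h1(markdown: &str) -> Option<String> {
    let mut in_fence = false;
    for line in markdown.lines() {
        let trimmed = line.trim_start();
        if trimmed.starts_with("```") {
            in_fence = !in_fence;
            continue;
        }
        if in_fence {
            continue;
        }
        if let Some(rest) = trimmed.strip_prefix("# ") {
            let title = rest.trim();
            if !title.is_empty() {
                return Some(title.to_string());
            }
        }
    }
    None
}

/// Builds both fields; a missing H1 never falls back to the file name.
pub fn title_metadata(source_path: &str, markdown: &str) -> TitleMetadata {
    let file_name = source_path
        .rsplit('/')
        .next()
        .unwrap_or(source_path)
        .to_string();
    TitleMetadata {
        file_name,
        document_title: first_h1(markdown),
    }
}
```

Renaming a file then changes only `file_name`, and editing the H1 changes only `document_title`.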
Proposed architecture
RustShare files / notes / Obsidian / kanban / decisions
↓
Parser + metadata extractor
↓
Canonical memory objects
↓
Chunking + embeddings + keyword index
↓
Hybrid retrieval + permission filters + reranking
↓
LLM answer with citations and source links
Recommended initial stack:
- PostgreSQL for metadata, objects, permissions, and indexing state
- pgvector for embeddings in the first implementation
- PostgreSQL full-text search for keyword search in the first implementation
- Optional later: Qdrant, Tantivy, Meilisearch, OpenSearch, or Vespa if scale requires it
- Background indexing workers
- Provider abstraction for embeddings and LLM calls
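One possible shape for the provider abstraction is a small trait that indexing code depends on, with concrete providers behind it. The trait name, its methods, and the `HashingProvider` stand-in below are all assumptions for illustration. The stand-in is a deterministic bag-of-words hasher intended only for tests, not a real embedding model.

```rust
/// Hypothetical abstraction over embedding backends.
pub trait EmbeddingProvider {
    /// Length of every vector returned by `embed`.
    fn dimensions(&self) -> usize;
    /// Embeds a batch of texts, one vector per input text.
    fn embed(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>, String>;
}

/// Test stand-in: counts words into hashed buckets. Not a semantic model.
pub struct HashingProvider {
    pub dims: usize,
}

impl EmbeddingProvider for HashingProvider {
    fn dimensions(&self) -> usize {
        self.dims
    }

    fn embed(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>, String> {
        if self.dims == 0 {
            return Err("dims must be greater than zero".to_string());
        }
        let vectors = texts
            .iter()
            .map(|text| {
                let mut v = vec![0.0f32; self.dims];
                for word in text.split_whitespace() {
                    // Simple polynomial hash of the word's bytes.
                    let bucket = word
                        .bytes()
                        .fold(0usize, |acc, b| acc.wrapping_mul(31).wrapping_add(b as usize))
                        % self.dims;
                    v[bucket] += 1.0;
                }
                v
            })
            .collect();
        Ok(vectors)
    }
}

/// Cosine similarity; returns 0.0 when either vector has zero length.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}
```

Keeping the pipeline generic over such a trait would let per-workspace model settings (Phase 5) swap providers without touching indexing code.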
Core requirements
1. Canonical memory object model
Create a RustShare memory object model that can represent different source types in a common format.
Supported initial source types:
markdown_note
obsidian_note
attachment_text
pdf_document
decision
standup_record
kanban_card
Each memory object should be linked back to the original RustShare artifact and workspace.
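The source types listed above could be modelled as a closed enum with a stable string form for storage. This is a sketch only: the enum name and the error type are assumptions, while the string values are taken from the list above.

```rust
use std::str::FromStr;

/// Initial source types, mirroring the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum SourceType {
    MarkdownNote,
    ObsidianNote,
    AttachmentText,
    PdfDocument,
    Decision,
    StandupRecord,
    KanbanCard,
}

impl SourceType {
    /// Stable identifier used when persisting the source type.
    pub fn as_str(self) -> &'static str {
        match self {
            SourceType::MarkdownNote => "markdown_note",
            SourceType::ObsidianNote => "obsidian_note",
            SourceType::AttachmentText => "attachment_text",
            SourceType::PdfDocument => "pdf_document",
            SourceType::Decision => "decision",
            SourceType::StandupRecord => "standup_record",
            SourceType::KanbanCard => "kanban_card",
        }
    }
}

impl FromStr for SourceType {
    type Err = String;

    /// Parses the stored identifier; unknown values are rejected, not guessed.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "markdown_note" => Ok(SourceType::MarkdownNote),
            "obsidian_note" => Ok(SourceType::ObsidianNote),
            "attachment_text" => Ok(SourceType::AttachmentText),
            "pdf_document" => Ok(SourceType::PdfDocument),
            "decision" => Ok(SourceType::Decision),
            "standup_record" => Ok(SourceType::StandupRecord),
            "kanban_card" => Ok(SourceType::KanbanCard),
            other => Err(format!("unknown source type: {other}")),
        }
    }
}
```

Adding a later source type (for example chat records) would then be a compile-time-checked change, because every `match` on the enum must handle the new variant.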
2. Chunking and embedding pipeline
Implement an indexing pipeline that:
- Detects file/object changes
- Parses content
- Extracts metadata
- Chunks content according to source type
- Generates embeddings
- Stores chunks in a vector index
- Stores keyword-searchable text
- Marks indexing state and errors
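The change-detection and state-tracking steps above can be sketched with a content hash and a small state enum. The names `IndexState`, `content_hash`, and `needs_reindex` are hypothetical. `DefaultHasher` is used only to keep the example dependency-free: its output is not guaranteed to be stable across Rust releases, so a persisted index would need a stable digest instead.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical per-object indexing state.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum IndexState {
    /// Never indexed.
    Pending,
    /// Indexed successfully for the content with this hash.
    Indexed { content_hash: u64 },
    /// Last attempt failed; the error is kept for diagnostics.
    Failed { content_hash: u64, error: String },
}

/// Hash of the parsed text. Assumption: a real index would use a stable
/// digest, since DefaultHasher may change between Rust versions.
pub fn content_hash(text: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    hasher.finish()
}

/// True when the object must be indexed again.
pub fn needs_reindex(state: &IndexState, current_text: &str) -> bool {
    match state {
        IndexState::Pending => true,
        // Skip work when the content is unchanged.
        IndexState::Indexed { content_hash: h } => *h != content_hash(current_text),
        // Failed objects are always retried in this sketch.
        IndexState::Failed { .. } => true,
    }
}
```

Because the state is derived from the source content, the index stays rebuildable: resetting every object to `Pending` forces a full reindex.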
Markdown / Obsidian chunking should respect heading structure:
# Main title
## Section
### Subsection
Each chunk should preserve:
- Source object ID
- Workspace ID
- Folder path
- Heading path
- Tags
- Line or section position where available
- Page number for PDFs where available
- Last updated timestamp
- Permission scope
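A minimal sketch of heading-aware chunking, assuming one chunk per heading section and ATX headings only. The `Chunk` struct and function names are hypothetical. Only the heading path, a line position, and the text are carried here; the remaining fields from the list above would be copied from the parent memory object.

```rust
/// One chunk per heading section (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Chunk {
    /// Titles from the outermost heading down to this section.
    pub heading_path: Vec<String>,
    /// 1-based line where the section body begins (line after its heading).
    pub start_line: usize,
    pub text: String,
}

/// Parses an ATX heading line, returning (level, title).
fn parse_heading(line: &str) -> Option<(usize, String)> {
    let level = line.chars().take_while(|c| *c == '#').count();
    if level == 0 || level > 6 {
        return None;
    }
    let rest = &line[level..];
    // "#tag" has no space after the marker, so it is not a heading.
    if !rest.starts_with(' ') {
        return None;
    }
    Some((level, rest.trim().to_string()))
}

/// Emits the buffered body as a chunk, skipping empty sections.
fn flush(chunks: &mut Vec<Chunk>, path: &[(usize, String)], body: &mut Vec<&str>, start_line: usize) {
    let text = body.join("\n").trim().to_string();
    if !text.is_empty() {
        chunks.push(Chunk {
            heading_path: path.iter().map(|(_, title)| title.clone()).collect(),
            start_line,
            text,
        });
    }
    body.clear();
}

/// Splits Markdown into chunks along heading boundaries.
pub fn chunk_by_headings(markdown: &str) -> Vec<Chunk> {
    let mut chunks = Vec::new();
    let mut path: Vec<(usize, String)> = Vec::new();
    let mut body: Vec<&str> = Vec::new();
    let mut start_line = 1;
    let mut in_fence = false;

    for (idx, line) in markdown.lines().enumerate() {
        if line.trim_start().starts_with("```") {
            in_fence = !in_fence;
        }
        // Headings inside fenced code blocks are treated as body text.
        let heading = if in_fence { None } else { parse_heading(line) };
        match heading {
            Some((level, title)) => {
                flush(&mut chunks, &path, &mut body, start_line);
                // Leave any section at the same or a deeper level.
                while path.last().map_or(false, |(l, _)| *l >= level) {
                    path.pop();
                }
                path.push((level, title));
                start_line = idx + 2;
            }
            None => body.push(line),
        }
    }
    flush(&mut chunks, &path, &mut body, start_line);
    chunks
}
```

Long sections would still need a secondary size-based split before embedding; that step is omitted here.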
3. Hybrid retrieval
Retrieval must not rely only on vector search.
Implement hybrid retrieval using:
- Vector search
- Keyword / full-text search
- Metadata filters
- Workspace filters
- Folder filters
- Source-type filters
- Permission filters
- Optional reranking
Retrieval flow:
User query
↓
Normalize / optionally rewrite query
↓
Apply permission and workspace filters
↓
Vector search + keyword search
↓
Merge and rank results
↓
Optional rerank
↓
Build context package
↓
Generate answer with citations
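For the "Merge and rank results" step, one common option is Reciprocal Rank Fusion (RRF), which combines ranked lists without needing comparable scores. The issue does not prescribe a merge method, so RRF is an assumption here, as are the function name and the conventional constant `k = 60`. Both input lists are assumed to be permission-filtered already.

```rust
use std::collections::HashMap;

/// Merges ranked id lists with Reciprocal Rank Fusion:
/// score(id) = sum over lists of 1 / (k + rank), with rank starting at 1.
pub fn rrf_merge(rankings: &[Vec<String>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (i, id) in ranking.iter().enumerate() {
            let rank = (i + 1) as f64;
            *scores.entry(id.clone()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut merged: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest score first; ties are broken by id so the order is deterministic.
    merged.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    merged
}
```

A chunk that appears in both the vector list and the keyword list accumulates two contributions, so it outranks chunks found by only one method at a similar position.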
4. Permission-aware retrieval
Permissions must be enforced before generation, not after.
The model must never receive context that the current user is not allowed to access.
Filtering should include:
- Workspace membership
- Object visibility
- Owner permissions
- Group permissions
- Shared-link restrictions, if shared links are added later
- Archived/deleted object exclusion unless explicitly requested
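The filtering rules above can be expressed as a single predicate that runs before any context is assembled. All types and field names below are hypothetical, and the visibility model (private, workspace-wide, or group-restricted) is an assumption. In a PostgreSQL-backed implementation the same conditions would more likely live in the query's `WHERE` clause, so that disallowed rows never leave the database.

```rust
use std::collections::HashSet;

/// Assumed visibility model for a memory object.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Visibility {
    Private,
    Workspace,
    Groups(Vec<String>),
}

/// Access-control data carried by each chunk (hypothetical shape).
#[derive(Debug, Clone)]
pub struct ChunkAcl {
    pub chunk_id: String,
    pub workspace_id: String,
    pub owner_id: String,
    pub visibility: Visibility,
    /// True for archived or deleted objects.
    pub inactive: bool,
}

/// The user on whose behalf retrieval runs.
#[derive(Debug, Clone)]
pub struct Requester {
    pub user_id: String,
    pub workspaces: HashSet<String>,
    pub groups: HashSet<String>,
}

/// Decides whether a chunk may be placed in the model context.
pub fn can_read(user: &Requester, chunk: &ChunkAcl, include_inactive: bool) -> bool {
    // Archived/deleted objects are excluded unless explicitly requested.
    if chunk.inactive && !include_inactive {
        return false;
    }
    // Workspace membership is required even for the owner.
    if !user.workspaces.contains(&chunk.workspace_id) {
        return false;
    }
    if chunk.owner_id == user.user_id {
        return true;
    }
    match &chunk.visibility {
        Visibility::Private => false,
        Visibility::Workspace => true,
        Visibility::Groups(groups) => groups.iter().any(|g| user.groups.contains(g)),
    }
}

/// Keeps only chunks the requester is allowed to read.
pub fn filter_readable<'a>(
    user: &Requester,
    chunks: &'a [ChunkAcl],
    include_inactive: bool,
) -> Vec<&'a ChunkAcl> {
    chunks
        .iter()
        .filter(|c| can_read(user, c, include_inactive))
        .collect()
}
```

The predicate fails closed: a chunk is returned only when a rule explicitly allows it.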
5. RAG UX features
Implement the following first-level UX features:
Ask Workspace
Ask Folder
Ask Current Note
Find Related Notes
Find Related Decisions
Summarize Current Folder
The answer should always include source references.
Example answer structure:
Answer
Sources
- /Obsidian/Projects/RustShare/RAG.md
- /Decisions/ADR-004-obsidian-sync.md
- Kanban card: Implement vault sync
Related items
- ...
6. Obsidian-aware indexing
For Obsidian-synced content, preserve:
- Folder structure
- Markdown content
- YAML frontmatter
- Tags
- Wiki links ([[...]])
- Backlinks
- Attachments where possible
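As an illustration of how wiki links and backlinks could be derived, the sketch below extracts link targets from note text and inverts them into a backlink index. Function names are hypothetical. It assumes the common `[[Target]]`, `[[Target|Alias]]`, and `[[Target#Heading]]` forms, and it does not distinguish embeds such as `![[file]]` from ordinary links.

```rust
use std::collections::BTreeMap;

/// Extracts unique wiki-link targets in order of first appearance.
pub fn wiki_link_targets(markdown: &str) -> Vec<String> {
    let mut targets: Vec<String> = Vec::new();
    let mut rest = markdown;
    while let Some(open) = rest.find("[[") {
        let after = &rest[open + 2..];
        match after.find("]]") {
            Some(close) => {
                let inner = &after[..close];
                // Keep only the note name: drop "|alias" and "#heading".
                let target = inner.split('|').next().unwrap_or("");
                let target = target.split('#').next().unwrap_or("").trim();
                if !target.is_empty() && !targets.iter().any(|t| t.as_str() == target) {
                    targets.push(target.to_string());
                }
                rest = &after[close + 2..];
            }
            // Unclosed "[[": stop scanning.
            None => break,
        }
    }
    targets
}

/// Inverts outgoing links: target note -> notes that link to it.
/// Input is a slice of (note name, note body) pairs.
pub fn backlinks(notes: &[(&str, &str)]) -> BTreeMap<String, Vec<String>> {
    let mut index: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for (name, body) in notes {
        for target in wiki_link_targets(body) {
            index.entry(target).or_default().push(name.to_string());
        }
    }
    index
}
```

Since backlinks are computed from outgoing links, they can be rebuilt from the source files at any time and never need to be treated as primary data.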
RustShare should not frame this as replacing Obsidian. It should be positioned as compatible memory infrastructure for Markdown/Obsidian-style vault content.
7. Source citations and traceability
Every generated answer must show where the information came from.
Source references should include, where available:
- File name
- Path
- Heading
- Page number
- Kanban card title
- Decision title
- Standup record date
- Last updated date
Answers without sources should be clearly marked as low-confidence or unsupported.
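A small sketch of how an answer could be rendered with its source references, and how an answer with no sources could be flagged. The `SourceRef` struct, its fields, and the exact wording of the "unsupported" marker are assumptions; the layout loosely follows the example answer structure given earlier.

```rust
/// One citation attached to an answer (hypothetical shape).
#[derive(Debug, Clone)]
pub struct SourceRef {
    /// Path, or a label such as "Kanban card: <title>".
    pub label: String,
    pub heading: Option<String>,
    pub page: Option<u32>,
    pub updated_at: Option<String>,
}

/// Renders the answer followed by its sources.
/// An answer with no sources is explicitly marked as unsupported.
pub fn render_answer(answer: &str, sources: &[SourceRef]) -> String {
    let mut out = String::new();
    if sources.is_empty() {
        out.push_str("Answer (unsupported: no sources found)\n");
    } else {
        out.push_str("Answer\n");
    }
    out.push_str(answer.trim());
    out.push_str("\nSources\n");
    if sources.is_empty() {
        out.push_str("- none\n");
    }
    for source in sources {
        let mut line = format!("- {}", source.label);
        // Optional details are appended only when they are known.
        if let Some(heading) = &source.heading {
            line.push_str(&format!(" > {}", heading));
        }
        if let Some(page) = source.page {
            line.push_str(&format!(" (page {})", page));
        }
        if let Some(updated) = &source.updated_at {
            line.push_str(&format!(" [updated {}]", updated));
        }
        out.push_str(&line);
        out.push('\n');
    }
    out
}
```

Passing structured `SourceRef` values rather than letting the model write its own citations keeps every reference traceable to a retrieved chunk.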
Suggested phases
Phase 1 — Searchable memory MVP
- PostgreSQL + pgvector schema
- Memory object table
- Memory chunk table
- Markdown parser
- Obsidian-compatible Markdown ingestion
- Basic PDF/text extraction
- Basic embedding generation
- Workspace/folder/note scoped RAG queries
- Source citations
- Permission filtering before retrieval
Phase 2 — RustShare-native memory
- Obsidian frontmatter support
- Obsidian tags and backlinks
- Kanban card indexing
- Decision/ADR indexing
- Standup record indexing
- Related notes panel
- Related decisions panel
Phase 3 — Retrieval quality
- Hybrid retrieval
- Reranking
- Query rewriting
- Time-aware retrieval
- Importance weighting
- Pinned-source boosting
- Deduplication of near-identical chunks
Phase 4 — Second-brain intelligence
- Project memory summaries
- Decision timelines
- Auto-generated ADR drafts
- Conflict detection
- Outdated-note detection
- Duplicate-note detection
- Weekly or project-level memory digest
Phase 5 — Company RAG readiness
- Workspace-level ACL hardening
- Group-aware retrieval
- Audit logs for AI access
- Admin controls
- Data retention options
- Per-workspace model and embedding settings
Acceptance criteria
Non-goals for the first implementation
- Full autonomous agent behavior
- Automatic file modification by the AI
- Cross-workspace memory without explicit permission
- Complex graph database implementation
- Full enterprise compliance layer
- Perfect parsing for every file format
- Replacing Obsidian or using misleading Obsidian branding
Implementation notes
Start simple and reliable:
- Prefer PostgreSQL + pgvector for the first implementation.
- Keep original source files as the source of truth.
- Make the index rebuildable.
- Do not store only embeddings; always keep source metadata and source text references.
- Design the model so that Qdrant/Tantivy/OpenSearch can be added later without rewriting the product model.
- Permission checks must happen before retrieval context is assembled.
Suggested labels
feature
rag
ai
second-brain
architecture