Implement permission-aware RAG and second-brain memory layer #119

## Goal

Implement a proper RAG / second-brain layer for RustShare so that notes, Obsidian vaults, Markdown files, attachments, decisions, standup records, kanban cards, and workspace documents become queryable, cited, permission-aware memory.

This should not be implemented as a simple “chatbot over files”. The goal is to turn RustShare into a durable personal/company memory layer where every useful artifact can become a structured memory object.

## Background

RustShare already collects Obsidian vault content and other user documents, which provides the foundation for a second-brain system. The missing layer is a reliable pipeline for indexing, retrieval, permission filtering, and answer generation.

The RAG layer should support use cases such as:

- Ask questions across the whole workspace
- Ask questions inside a selected folder
- Ask questions about the current note
- Find related notes, files, decisions, and kanban cards
- Summarize project memory
- Explain decision history
- Generate ADRs or implementation notes from existing discussions and documents
- Detect outdated, conflicting, or duplicate knowledge

## Product direction

RustShare should treat information as structured memory objects first, not only as vector chunks.

Example memory sources:

- Markdown notes
- Obsidian vault files
- Obsidian frontmatter, tags, folders, and backlinks
- PDFs and text-based attachments
- Decisions / ADRs
- Standup records
- Kanban cards
- Chat or discussion records, if available later
- File versions and update history

Each object should preserve metadata such as:

- `object_id`
- `workspace_id`
- `owner_id`
- `source_type`
- `source_path`
- `file_name`
- `document_title`
- `heading_path`
- `tags`
- `backlinks`
- `created_at`
- `updated_at`
- `permissions`
- `importance`
- `object_status`

Important note: the file/note name (`file_name`) must be stored separately from the first Markdown H1 heading (`document_title`). Both are useful metadata, but they must not be coupled: renaming a file must not change the document title, and editing the H1 must not change the file identity.
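
As an illustration of the metadata list and the name/H1 separation above, here is a minimal Python sketch of a memory object. The field names come from the list; the field types, the defaults, and the helpers `first_h1` and `memory_object_from_markdown` are assumptions made for this example, not an existing RustShare API. The H1 detection is deliberately naive and would also match a `# ...` line inside a fenced code block.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import PurePosixPath
from typing import Optional

# Matches a level-1 ATX heading only ("# Title"), not "## Section".
H1_PATTERN = re.compile(r"^#[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def first_h1(markdown: str) -> Optional[str]:
    """Return the text of the first H1 heading, or None if there is none."""
    match = H1_PATTERN.search(markdown)
    return match.group(1) if match else None


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class MemoryObject:
    object_id: str
    workspace_id: str
    owner_id: str
    source_type: str
    source_path: str
    file_name: str                  # identity of the file or note
    document_title: Optional[str]   # first H1 in the content, if any
    heading_path: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)
    backlinks: list[str] = field(default_factory=list)
    created_at: datetime = field(default_factory=_now)
    updated_at: datetime = field(default_factory=_now)
    permissions: list[str] = field(default_factory=list)  # assumed shape
    importance: float = 0.0                               # assumed scale
    object_status: str = "active"                         # assumed values


def memory_object_from_markdown(object_id: str, workspace_id: str, owner_id: str,
                                source_path: str, text: str) -> MemoryObject:
    """Build an object where file_name and document_title are derived independently."""
    return MemoryObject(
        object_id=object_id,
        workspace_id=workspace_id,
        owner_id=owner_id,
        source_type="markdown_note",
        source_path=source_path,
        file_name=PurePosixPath(source_path).name,  # from the path only
        document_title=first_h1(text),              # from the content only
    )
```

Because `file_name` is derived from the path and `document_title` from the content, a note with no H1 still has a stable identity, and the title is simply `None`.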

## Proposed architecture

```text
RustShare files / notes / Obsidian / kanban / decisions
        ↓
Parser + metadata extractor
        ↓
Canonical memory objects
        ↓
Chunking + embeddings + keyword index
        ↓
Hybrid retrieval + permission filters + reranking
        ↓
LLM answer with citations and source links
```

Recommended initial stack:

- PostgreSQL for metadata, objects, permissions, and indexing state
- pgvector for embeddings in the first implementation
- PostgreSQL full-text search for keyword search in the first implementation
- Optional later: Qdrant, Tantivy, Meilisearch, OpenSearch, or Vespa if scale requires it
- Background indexing workers
- Provider abstraction for embeddings and LLM calls
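
The provider abstraction in the last bullet could look roughly like the following Python sketch. The `EmbeddingProvider` interface and the `HashingEmbedder` class are hypothetical names introduced here. The hashing embedder is only a deterministic stand-in for tests, not a semantic embedding model.

```python
import hashlib
import math
from typing import Protocol, Sequence


class EmbeddingProvider(Protocol):
    """Hypothetical interface that any embedding backend would satisfy."""

    dimensions: int

    def embed(self, texts: Sequence[str]) -> list[list[float]]:
        ...


class HashingEmbedder:
    """Deterministic test double: hashes tokens into buckets. Not semantic."""

    def __init__(self, dimensions: int = 16) -> None:
        self.dimensions = dimensions

    def embed(self, texts: Sequence[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            vec = [0.0] * self.dimensions
            for token in text.lower().split():
                digest = hashlib.sha256(token.encode("utf-8")).digest()
                bucket = int.from_bytes(digest[:4], "big") % self.dimensions
                vec[bucket] += 1.0
            # L2-normalize so cosine similarity reduces to a dot product.
            norm = math.sqrt(sum(v * v for v in vec)) or 1.0
            vectors.append([v / norm for v in vec])
        return vectors


def embed_chunks(provider: EmbeddingProvider,
                 chunks: Sequence[str]) -> list[tuple[str, list[float]]]:
    """Indexing code depends only on the interface, not on a concrete provider."""
    return list(zip(chunks, provider.embed(chunks)))
```

With this shape, swapping the test double for a hosted or local model would not touch the indexing code, and per-workspace embedding settings become a matter of choosing which provider instance to pass in.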

## Core requirements

### 1. Canonical memory object model

Create a RustShare memory object model that can represent different source types in a common format.

Supported initial source types:

- `markdown_note`
- `obsidian_note`
- `attachment_text`
- `pdf_document`
- `decision`
- `standup_record`
- `kanban_card`

Each memory object should be linked back to the original RustShare artifact and workspace.

### 2. Chunking and embedding pipeline

Implement an indexing pipeline that:

- Detects file/object changes
- Parses content
- Extracts metadata
- Chunks content according to source type
- Generates embeddings
- Stores chunks in a vector index
- Stores keyword-searchable text
- Marks indexing state and errors
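
The first and last steps of this list (change detection and indexing state) can be sketched as below. This assumes a content hash is an acceptable change signal. `IndexState`, its status values, and `index_if_changed` are illustrative names, and the in-memory `states` dictionary stands in for a PostgreSQL table.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class IndexState:
    content_hash: Optional[str] = None
    status: str = "pending"        # assumed values: pending | indexed | failed
    error: Optional[str] = None


def content_hash(text: str) -> str:
    """Stable fingerprint of the source text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def index_if_changed(object_id: str, text: str,
                     states: dict[str, IndexState],
                     index_fn: Callable[[str, str], None]) -> bool:
    """Run index_fn only when the content changed. Returns True if it was indexed."""
    state = states.setdefault(object_id, IndexState())
    new_hash = content_hash(text)
    if state.status == "indexed" and state.content_hash == new_hash:
        return False  # unchanged since the last successful run
    try:
        index_fn(object_id, text)  # parse, chunk, embed, store
    except Exception as exc:
        # Record the failure so it is visible and can be retried later.
        state.status = "failed"
        state.error = str(exc)
        return False
    state.content_hash = new_hash
    state.status = "indexed"
    state.error = None
    return True
```

A failed object keeps its previous hash, so the next run retries it even if the content has not changed again.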

Markdown / Obsidian chunking should respect heading structure:

```text
# Main title
## Section
### Subsection
```

Each chunk should preserve:

- Source object ID
- Workspace ID
- Folder path
- Heading path
- Tags
- Line or section position where available
- Page number for PDFs where available
- Last updated timestamp
- Permission scope
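
A minimal sketch of heading-aware chunking that records the heading path and starting line for each chunk. It assumes ATX-style headings (`#` to `######`) and treats lines inside fenced code blocks as plain text. The `Chunk` type and `chunk_markdown` function are hypothetical, and a real implementation would also attach the other fields listed above and split oversized sections.

```python
import re
from dataclasses import dataclass

HEADING = re.compile(r"^(#{1,6})[ \t]+(.+?)[ \t]*$")


@dataclass
class Chunk:
    heading_path: list[str]   # e.g. ["Main title", "Section"]
    start_line: int           # 1-based line of the chunk's first non-blank line
    text: str


def chunk_markdown(markdown: str) -> list[Chunk]:
    """Split Markdown into one chunk per heading section."""
    chunks: list[Chunk] = []
    path: list[tuple[int, str]] = []   # stack of (level, title)
    buffer: list[str] = []
    buffer_start = 1
    in_fence = False

    def flush() -> None:
        text = "\n".join(buffer).strip()
        if text:
            chunks.append(Chunk([title for _, title in path], buffer_start, text))
        buffer.clear()

    for number, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()  # the buffered text belongs to the previous heading path
            level = len(match.group(1))
            # Drop headings at the same or a deeper level before descending.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2)))
            continue
        if not buffer:
            if not line.strip():
                continue  # skip blank lines between a heading and its body
            buffer_start = number
        buffer.append(line)
    flush()
    return chunks
```

For the three-level example above, text under `### Subsection` would carry the heading path `["Main title", "Section", "Subsection"]`, which can later be shown in citations.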

### 3. Hybrid retrieval

Retrieval must not rely only on vector search.

Implement hybrid retrieval using:

- Vector search
- Keyword / full-text search
- Metadata filters
- Workspace filters
- Folder filters
- Source-type filters
- Permission filters
- Optional reranking

Retrieval flow:

```text
User query
  ↓
Normalize / optionally rewrite query
  ↓
Apply permission and workspace filters
  ↓
Vector search + keyword search
  ↓
Merge and rank results
  ↓
Optional rerank
  ↓
Build context package
  ↓
Generate answer with citations
```
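
The "Merge and rank results" step is not specified further here. One common option is Reciprocal Rank Fusion (RRF), which combines ranked lists without needing comparable scores. The sketch below assumes the vector and keyword searches each return chunk IDs in rank order, already restricted by the permission and workspace filters. `k = 60` is the constant conventionally used with RRF, not a RustShare setting.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]],
                           k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: each list contributes 1 / (k + rank) per item."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    vector_hits = ["c1", "c2", "c3"]    # hypothetical vector-search ranking
    keyword_hits = ["c3", "c1", "c4"]   # hypothetical keyword-search ranking
    for chunk_id, score in reciprocal_rank_fusion([vector_hits, keyword_hits]):
        print(chunk_id, round(score, 5))
```

Chunks found by both searches rise to the top, while a chunk found by only one search is still kept, which is the main reason to fuse rather than intersect.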

### 4. Permission-aware retrieval

Permissions must be enforced at retrieval time, before any context is assembled for generation, not by filtering the generated answer afterwards.

The model must never receive context that the current user is not allowed to access.

Filtering should include:

- Workspace membership
- Object visibility
- Owner permissions
- Group permissions
- Shared-link restrictions if applicable later
- Archived/deleted object exclusion unless explicitly requested
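
A sketch of the filtering rules above as a single predicate. The `User` and `ChunkAcl` shapes, the visibility values (`private`, `groups`, `workspace`), and the `include_inactive` flag are assumptions for illustration, and shared-link restrictions are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: str
    workspace_ids: frozenset
    group_ids: frozenset = frozenset()


@dataclass(frozen=True)
class ChunkAcl:
    chunk_id: str
    workspace_id: str
    owner_id: str
    visibility: str                        # assumed: private | groups | workspace
    allowed_group_ids: frozenset = frozenset()
    object_status: str = "active"          # assumed: active | archived | deleted


def can_read(user: User, acl: ChunkAcl, include_inactive: bool = False) -> bool:
    """Return True only if the user may see this chunk."""
    if acl.workspace_id not in user.workspace_ids:
        return False                       # workspace membership
    if acl.object_status != "active" and not include_inactive:
        return False                       # archived/deleted excluded by default
    if acl.owner_id == user.user_id:
        return True                        # owner permissions
    if acl.visibility == "workspace":
        return True                        # visible to all workspace members
    if acl.visibility == "groups":
        return bool(user.group_ids & acl.allowed_group_ids)
    return False                           # private and not the owner


def filter_candidates(user: User, candidates: list[ChunkAcl],
                      include_inactive: bool = False) -> list[ChunkAcl]:
    """Drop every chunk the user cannot read before context is assembled."""
    return [c for c in candidates if can_read(user, c, include_inactive)]
```

In a PostgreSQL-backed implementation the same conditions would more likely be expressed as `WHERE` clauses on the search query, so that rows the user cannot read never leave the database and do not consume top-k slots.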

### 5. RAG UX features

Implement the following first-level UX features:

- `Ask Workspace`
- `Ask Folder`
- `Ask Current Note`
- `Find Related Notes`
- `Find Related Decisions`
- `Summarize Current Folder`

The answer should always include source references.

Example answer structure:

```text
Answer

Sources
- /Obsidian/Projects/RustShare/RAG.md
- /Decisions/ADR-004-obsidian-sync.md
- Kanban card: Implement vault sync

Related items
- ...
```

### 6. Obsidian-aware indexing

For Obsidian-synced content, preserve:

- Folder structure
- Markdown content
- YAML frontmatter
- Tags
- Wiki links `[[...]]`
- Backlinks
- Attachments where possible
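
To make the wiki-link, tag, and backlink items concrete, here is a simplified extraction sketch. The regular expressions are assumptions that cover the common forms (`[[Note]]`, `[[Note#Heading]]`, `[[Note|alias]]`, `#tag`, `#nested/tag`) and do not reproduce Obsidian's full link-resolution or tag rules. Embeds such as `![[file]]` are counted as links, and the frontmatter is returned as raw text for a YAML library to parse.

```python
import re
from collections import defaultdict

# [[Target]], [[Target#Heading]], [[Target|alias]], [[Target#Heading|alias]]
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")
# Simplified: "#" not preceded by a word character, "/" or "#", then a letter.
TAG = re.compile(r"(?<![\w/#])#([A-Za-z][\w/-]*)")


def split_frontmatter(text: str) -> tuple[str, str]:
    """Return (raw_yaml, body). raw_yaml is empty if there is no frontmatter."""
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    return "", text


def extract_links(markdown: str) -> list[str]:
    """Return wiki-link targets without heading anchors or aliases."""
    return [target.strip() for target in WIKI_LINK.findall(markdown)]


def extract_tags(markdown: str) -> list[str]:
    """Return inline tags without the leading '#'."""
    return TAG.findall(markdown)


def build_backlinks(notes: dict[str, str]) -> dict[str, list[str]]:
    """Map each link target to the sorted names of the notes that link to it."""
    backlinks: dict[str, set] = defaultdict(set)
    for name, text in notes.items():
        for target in extract_links(text):
            if target != name:             # ignore self-links
                backlinks[target].add(name)
    return {target: sorted(sources) for target, sources in backlinks.items()}
```

Backlinks are derived data: they are computed from the forward links of all notes, so they need to be recomputed whenever any linking note changes.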

RustShare should not frame this as replacing Obsidian. It should be positioned as compatible memory infrastructure for Markdown/Obsidian-style vault content.

### 7. Source citations and traceability

Every generated answer must show where the information came from.

Source references should include where available:

- File name
- Path
- Heading
- Page number
- Kanban card title
- Decision title
- Standup record date
- Last updated date

If retrieval returns no usable sources, the answer should be clearly marked as low-confidence or unsupported rather than presented as grounded.
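
A small sketch of how source references could be rendered with whichever fields are available, and how an answer without sources could be flagged. The `SourceRef` shape, the output wording, and the function names are assumptions, not a defined RustShare format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceRef:
    path: Optional[str] = None      # file path, if the source is a file
    title: Optional[str] = None     # e.g. kanban card or decision title
    heading: Optional[str] = None
    page: Optional[int] = None      # PDFs only
    updated: Optional[str] = None   # ISO date string


def format_source(ref: SourceRef) -> str:
    """Render one reference using only the fields that are present."""
    parts = [ref.path or ref.title or "unknown source"]
    if ref.heading:
        parts.append(f"heading: {ref.heading}")
    if ref.page is not None:
        parts.append(f"page {ref.page}")
    if ref.updated:
        parts.append(f"updated {ref.updated}")
    return ", ".join(parts)


def render_answer(answer: str, sources: list[SourceRef]) -> str:
    """Append a Sources section, or flag the answer as unsupported."""
    lines = [answer, "", "Sources"]
    if sources:
        lines.extend(f"- {format_source(ref)}" for ref in sources)
    else:
        lines.append("- none (unsupported: no retrieved source backs this answer)")
    return "\n".join(lines)
```

Keeping the rendering separate from generation means the citation list is built from the retrieved chunks themselves rather than from whatever the model chooses to mention.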

## Suggested phases

### Phase 1 — Searchable memory MVP

- PostgreSQL + pgvector schema
- Memory object table
- Memory chunk table
- Markdown parser
- Obsidian-compatible Markdown ingestion
- Basic PDF/text extraction
- Basic embedding generation
- Workspace/folder/note scoped RAG queries
- Source citations
- Permission filtering before retrieval

### Phase 2 — RustShare-native memory

- Obsidian frontmatter support
- Obsidian tags and backlinks
- Kanban card indexing
- Decision/ADR indexing
- Standup record indexing
- Related notes panel
- Related decisions panel

### Phase 3 — Retrieval quality

- Hybrid retrieval
- Reranking
- Query rewriting
- Time-aware retrieval
- Importance weighting
- Pinned-source boosting
- Deduplication of near-identical chunks

### Phase 4 — Second-brain intelligence

- Project memory summaries
- Decision timelines
- Auto-generated ADR drafts
- Conflict detection
- Outdated-note detection
- Duplicate-note detection
- Weekly or project-level memory digest

### Phase 5 — Company RAG readiness

- Workspace-level ACL hardening
- Group-aware retrieval
- Audit logs for AI access
- Admin controls
- Data retention options
- Per-workspace model and embedding settings

## Acceptance criteria

- [ ] RustShare can index Markdown notes from a workspace.
- [ ] RustShare can index Obsidian-style Markdown files while preserving path, tags, frontmatter, and links where possible.
- [ ] RustShare stores canonical memory objects separately from chunks.
- [ ] RustShare stores embeddings for chunks.
- [ ] RustShare supports full-text or keyword search in addition to vector search.
- [ ] User can ask a question across a workspace.
- [ ] User can ask a question scoped to a folder.
- [ ] User can ask a question scoped to the currently open note.
- [ ] Answers include citations/source references.
- [ ] Retrieval respects workspace and object permissions before any context is sent to the LLM.
- [ ] Updating a note/file triggers re-indexing.
- [ ] Deleted or archived objects are removed from active retrieval.
- [ ] Related notes can be shown for the current note.
- [ ] The file/note name and first Markdown H1 are treated as separate metadata fields.

## Non-goals for the first implementation

- Full autonomous agent behavior
- Automatic file modification by the AI
- Cross-workspace memory without explicit permission
- Complex graph database implementation
- Full enterprise compliance layer
- Perfect parsing for every file format
- Replacing Obsidian or using misleading Obsidian branding

## Implementation notes

Start simple and reliable:

- Prefer PostgreSQL + pgvector for the first implementation.
- Keep original source files as the source of truth.
- Make the index rebuildable.
- Do not store only embeddings; always keep source metadata and source text references.
- Design the model so that Qdrant/Tantivy/OpenSearch can be added later without rewriting the product model.
- Permission checks must happen before retrieval context is assembled.
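
The "rebuildable index" and "swap the backend later" points can be illustrated with a narrow index interface. `VectorIndex`, `InMemoryIndex`, and `rebuild` are hypothetical names. The in-memory class is a stand-in for testing and says nothing about how a pgvector or Qdrant adapter would be written.

```python
import math
from typing import Protocol, Sequence


class VectorIndex(Protocol):
    """Hypothetical backend interface; product code depends only on this."""

    def clear(self) -> None: ...
    def upsert(self, chunk_id: str, vector: Sequence[float]) -> None: ...
    def search(self, query: Sequence[float], limit: int) -> list[tuple[str, float]]: ...


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryIndex:
    """Brute-force cosine search over a dict; for tests and small vaults only."""

    def __init__(self) -> None:
        self._vectors: dict[str, list[float]] = {}

    def clear(self) -> None:
        self._vectors.clear()

    def upsert(self, chunk_id: str, vector: Sequence[float]) -> None:
        self._vectors[chunk_id] = list(vector)

    def search(self, query: Sequence[float], limit: int) -> list[tuple[str, float]]:
        scored = [(cid, cosine(query, vec)) for cid, vec in self._vectors.items()]
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        return scored[:limit]


def rebuild(index: VectorIndex, chunks: dict[str, Sequence[float]]) -> None:
    """Recreate the index from stored chunk data; the index is never the source of truth."""
    index.clear()
    for chunk_id, vector in chunks.items():
        index.upsert(chunk_id, vector)
```

Since `rebuild` starts from stored chunk data, stale entries for deleted objects disappear on the next rebuild without any special cleanup logic.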

## Suggested labels

- `feature`
- `rag`
- `ai`
- `second-brain`
- `architecture`

