Merged
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -7,3 +7,4 @@
*.log
node_modules/
__pycache__/
.desloppify/
7 changes: 7 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,7 @@
repos:
  - repo: https://github.com/igorshubovych/markdownlint-cli
    rev: v0.44.0
    hooks:
      - id: markdownlint
        args: [--disable, MD013, MD033, MD041, --]
        exclude: ^\.claude/
61 changes: 58 additions & 3 deletions CLAUDE.md
@@ -9,7 +9,6 @@ No build step -- this is a pure documentation repo. Browse by topic:
- **Vector database (Weaviate):** `Knowledge/README.md`
- **Prediction models:** `Predictions/README.md`
- **Language models (8B):** `Generation/README.md`
- **Agent infrastructure:** `Infrastructure/README.md`
- **n8n workflow automations:** `Automation/README.md`

## Architecture
@@ -18,7 +17,6 @@ No build step -- this is a pure documentation repo. Browse by topic:
Knowledge/ Weaviate vector-graph database docs (connection, search, RAG, schema)
Predictions/ HuggingFace text regression models (performance + preference prediction)
Generation/ 8B language models (Llama 3.1 base, continual pre-training + instruct)
Infrastructure/ Clean-room agent runtime, orchestration, and tool-safety patterns
Automation/ n8n workflow templates for advocacy automation
.github/ Dependabot config + CI workflows
```
@@ -30,7 +28,6 @@ Automation/ n8n workflow templates for advocacy automation
| `Knowledge/README.md` | Weaviate connection details, search ops, RAG patterns, Content schema |
| `Predictions/README.md` | Prediction model usage, batch processing, score clipping |
| `Generation/README.md` | 8B model usage, generation parameters, known limitations |
| `Infrastructure/README.md` | Clean-room agent runtime roadmap across scanner, platform, and tooling repos |
| `Automation/README.md` | n8n hosting options, workflow import, activation |
| `.gitleaksignore` | Secret scanning exclusions (read-only API keys in docs) |

@@ -58,3 +55,61 @@ Automation/ n8n workflow templates for advocacy automation
- **Adding documentation:** Create a new directory with a `README.md` following the existing pattern
- **Code examples:** Python with `transformers` or `weaviate` client libraries -- keep examples copy-pasteable
- **Style:** Each section should be self-contained with connection details, code samples, and best practices

## Organizational Context

**Layer:** 1 | **Lever:** Strengthen | **Integration:** Reference material for platform and ecosystem

This repo documents the AI infrastructure layer that Open Paws tools are built on. It is a reference for Guild developers, bootcamp students, and coalition partners building on the platform.

**Settled decisions affecting this repo:**
- **2026-04-01: Clean-room agent architecture** — `documentation` owns the shared infrastructure note for the clean-room reuse decision (PR #7 in flight). See `closed-decisions.md` 2026-04-01.

**Relevant strategy documents:**
- `ecosystem/repos.md` — documentation listed as reference material
- `programs/developer-training-pipeline/guild/operations.md` — Guild developers use this reference

**Current status:** Active reference. PR #7 (shared infrastructure note for clean-room architecture) is in draft review as of 2026-04-01.

## Development Standards

### 10-Point Review Checklist (ranked by AI violation frequency)

1. **DRY** — AI clones code at 4x the human rate. Search before writing anything new
2. **Deep modules** — Reject shallow wrappers and pass-through methods
3. **Single responsibility** — Each function does one thing at one level of abstraction
4. **Error handling** — Never catch-all
5. **Information hiding** — Don't expose internal state. Mask API keys (last 4 chars only)
6. **Ubiquitous language** — Use movement terminology consistently
7. **Design for change** — Abstraction layers and loose coupling
8. **Legacy velocity** — Use characterization tests before modifying existing code
9. **Over-patterning** — Simplest structure that works
10. **Test quality** — Every test must fail when the covered behavior breaks
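
Item 5 asks for API keys to be masked so that only the last 4 characters are visible. A minimal sketch of that rule follows; the function name `mask_api_key` and the `visible` parameter are illustrative assumptions, not part of any Open Paws library.

```python
def mask_api_key(key: str, visible: int = 4) -> str:
    """Return the key with all but the last `visible` characters replaced by '*'."""
    if len(key) <= visible:
        # Too short to reveal anything safely: mask the whole key.
        return "*" * len(key)
    hidden = len(key) - visible
    # Slice from an explicit index so that visible=0 reveals nothing.
    return "*" * hidden + key[hidden:]


print(mask_api_key("abcd1234efgh5678"))  # ************5678
```

Keys no longer than the visible window are masked entirely, so a short key is never printed in full.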

### Quality Gates

- **Desloppify:** `desloppify scan --path .` must score at least 85
- **Speciesist language:** `semgrep --config semgrep-no-animal-violence.yaml` on all docs edits
- **Two-failure rule:** After two failed fixes on the same problem, stop and restart

### Seven Concerns — Critical for This Repo

All seven concerns apply. The most critical for this repo:

- **Privacy** (critical) — API keys documented here are read-only Weaviate keys. `.gitleaksignore` covers these. Never commit write-access keys. Check `.gitleaksignore` is current before adding new examples.
- **Security** — Code examples must use environment variables for any keys. Never hardcode credentials.
- **Advocacy domain** — All documentation must use movement terminology. Examples should reference **farmed animals** and **factory farms**, not industry euphemisms.
- **Accessibility** — Documentation must work for developers on low-bandwidth connections. Avoid large embedded images.
- **Emotional safety** — If documentation examples include advocacy content (animal welfare data, investigation statistics), apply content warnings.

### Advocacy Domain Language

Never introduce synonyms for:
- **Farmed animal** — not "livestock" in code examples or documentation
- **Factory farm** — not "farm" or "production facility"
- **Campaign** — organized advocacy effort
- **Investigation** — covert documentation (all data is potential evidence)

### Structured Coding Reference

For tool-specific AI coding instructions (Claude Code rules, Cursor MDC, Copilot, Windsurf, etc.), copy the corresponding directory from `structured-coding-with-ai` into this project root.
72 changes: 72 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,72 @@
# Contributing to Open Paws Documentation

This repo contains developer documentation, examples, and guides for building AI implementations that support animal advocacy using Open Paws models, data, and tools. It is a pure docs repo — no build step, no runnable application code.

## What belongs here

- **Reference documentation** for Open Paws infrastructure (vector database, prediction models, language models, workflow automations)
- **Copy-pasteable code examples** showing how to connect to and use Open Paws services
- **Integration guides** for common patterns (RAG, batch inference, n8n automations)
- **HuggingFace dataset documentation** for fine-tuning workflows

What does not belong here: application code, deployment configuration, runnable services, or content that duplicates the canonical source in another repo. If you are documenting something that lives in another Open Paws repo, link to it rather than copying it.

## Directory structure

```
Knowledge/       Weaviate vector-graph database (connection, search, RAG, schema)
Predictions/     HuggingFace text regression models (performance + preference prediction)
Generation/      8B language models (Llama 3.1 base, continual pre-training + instruct)
Automation/      n8n workflow templates for advocacy automation
Infrastructure/  Clean-room agent runtime, orchestration, and tool-safety patterns
```

Each section has a `README.md` as its entry point. Add new topics as a new directory with a `README.md`, or as additional markdown files within an existing directory if they are closely related to what is already there.

## File naming

- Use lowercase with hyphens: `batch-inference.md`, not `BatchInference.md` or `batch_inference.md`
- Entry point for every directory is `README.md`
- Keep filenames descriptive but short — the directory provides context
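
The naming rules above can be checked mechanically. The sketch below is one possible check, not an existing repo tool: the helper name `is_valid_doc_filename` is hypothetical, and it assumes that documentation files are all `.md` and that digits are allowed in names.

```python
import re

# Lowercase letters and digits in hyphen-separated groups, ending in ".md".
KEBAB_CASE_MD = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*\.md")


def is_valid_doc_filename(name: str) -> bool:
    """True for README.md or a lowercase, hyphen-separated markdown filename."""
    return name == "README.md" or KEBAB_CASE_MD.fullmatch(name) is not None


for candidate in ["batch-inference.md", "BatchInference.md", "batch_inference.md", "README.md"]:
    print(candidate, is_valid_doc_filename(candidate))
```

`README.md` is special-cased because it is the required entry point for every directory and would otherwise fail the lowercase rule.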

## Writing style

Write in plain, direct prose. First-person plural ("we") is appropriate when describing Open Paws practices. Avoid passive constructions that obscure who does what.

**Be self-contained.** Each section should include enough context to be useful without requiring the reader to cross-reference other sections. Include connection details, working code samples, and known limitations in the same document.

**Keep examples copy-pasteable.** Code samples should run with minimal modification — typically just substituting API keys from environment variables.

**Use movement terminology.** This documentation exists to help developers build tools for animal liberation. Use precise advocacy language:

- **Farmed animal** — not "livestock" (industry framing)
- **Factory farm** — not "farm" or "production facility"
- **Campaign** — an organized advocacy effort with defined goals
- **Investigation** — covert documentation (all data is potential evidence)
- **Sanctuary** — permanent care facility, not "shelter"

The full language guide and automated enforcement tools are at [github.com/Open-Paws/no-animal-violence](https://github.com/Open-Paws/no-animal-violence). The pre-commit hook runs on every commit — if it flags something, fix it rather than bypassing it.

**No speciesist idioms.** Avoid idioms that normalize animal violence. Common ones to watch for: "kill two birds with one stone" (use "accomplish two things at once"), "guinea pig" as a test-subject metaphor (use "test subject"), "cattle vs. pets" distinctions (use "ephemeral vs. persistent").
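
To show the kind of check the language tooling performs, here is a small illustrative scanner built only from the examples listed above. It is a sketch under stated assumptions, not the actual `no-animal-violence` ruleset: the `SUGGESTIONS` table and the function `find_flagged_phrases` are hypothetical, and phrases split across a line break are not detected.

```python
import re

# Phrase -> suggested replacement, taken from the examples in this guide.
SUGGESTIONS = {
    "kill two birds with one stone": "accomplish two things at once",
    "guinea pig": "test subject",
    "livestock": "farmed animal",
}


def find_flagged_phrases(text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, phrase, suggestion) for each flagged phrase found."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for phrase, suggestion in SUGGESTIONS.items():
            # Whole-word, case-insensitive match within a single line.
            if re.search(rf"\b{re.escape(phrase)}\b", line, flags=re.IGNORECASE):
                findings.append((line_number, phrase, suggestion))
    return findings


sample = "We used a guinea pig deployment.\nThis will kill two birds with one stone."
for line_number, phrase, suggestion in find_flagged_phrases(sample):
    print(f"line {line_number}: replace '{phrase}' with '{suggestion}'")
```

A scanner like this only catches literal phrases; the pre-commit hook and CI check remain the source of truth.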

## Referencing Open Paws tools and APIs

When documenting an API or service:

1. State the service name and its role in one sentence
2. Show how to authenticate using environment variables (never hardcode credentials, even read-only ones)
3. Provide a minimal working example before moving to advanced usage
4. List known limitations and edge cases explicitly

Read-only Weaviate keys used in examples are covered by `.gitleaksignore`. Do not add write-access credentials to examples under any circumstances.
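
Step 2 above can be sketched as follows. The environment variable name `WEAVIATE_API_KEY` and the helper `load_api_key` are assumptions chosen for illustration; use whatever name the service's own documentation specifies.

```python
import os


def load_api_key(env_var: str = "WEAVIATE_API_KEY") -> str:
    """Read an API key from the environment, failing loudly if it is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with an actionable message instead of sending an empty key.
        raise RuntimeError(f"Set the {env_var} environment variable before running this example.")
    return key
```

Raising a specific error when the variable is missing keeps examples copy-pasteable: a reader who forgets to export the key sees what to fix rather than an authentication failure from the service.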

## PR guidelines

- **One topic per PR.** Documentation PRs should be easy to review. If you are adding a new section and fixing an existing one, open two PRs.
- **Keep PRs under 200 lines** where possible. Longer PRs are harder to review and more likely to introduce terminology drift.
- **Run the language check** before opening a PR: `semgrep --config semgrep-no-animal-violence.yaml` on any `.md` files you changed. The CI action will catch violations, but it is faster to fix them locally.
- **Link the upstream source** if your documentation describes a feature defined in another repo. Docs that duplicate source-of-truth content in another repo will drift — link instead.
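
To check a branch against the 200-line guideline before opening a PR, one option is to count changed lines in a unified diff. This sketch assumes "lines" means added plus removed lines, which the guideline does not spell out; `count_changed_lines` is a hypothetical helper.

```python
def count_changed_lines(unified_diff: str) -> int:
    """Count added and removed lines in a unified diff, skipping file headers."""
    changed = 0
    for line in unified_diff.splitlines():
        if line.startswith(("+++", "---")):
            # File header lines. Limitation: a changed line whose own content
            # starts with "--" or "++" is also skipped by this simple check.
            continue
        if line.startswith(("+", "-")):
            changed += 1
    return changed


example_diff = (
    "--- a/README.md\n"
    "+++ b/README.md\n"
    "@@ -1,2 +1,2 @@\n"
    " context\n"
    "-old line\n"
    "+new line\n"
)
print(count_changed_lines(example_diff))  # 2
```

The input can be the text output of `git diff` for the branch.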

## Quality gate

This repo runs [desloppify](https://github.com/Open-Paws/desloppify) with a minimum score of 85. If your PR drops the score below that threshold, the CI check will fail. Run `desloppify scan --path .` locally before pushing to catch issues early.