marcus · marcus · Mar 25, 2026
diff --git a/PRIVACY.md b/PRIVACY.md
@@ -0,0 +1,113 @@
+# Privacy & Data Handling
+
+This document describes what data nightshift collects, where it is stored,
+and what leaves your machine.
+
+## Local Storage
+
+All persistent data lives under XDG-compliant paths:
+
+| Data | Default path | Format | Retention |
+|------|-------------|--------|-----------|
+| Database | `~/.local/share/nightshift/nightshift.db` | SQLite (WAL mode) | Permanent |
+| Logs | `~/.local/share/nightshift/logs/nightshift-YYYY-MM-DD.log` | JSON or text | 7 days (configurable) |
+| Audit log | `~/.local/share/nightshift/audit/audit-YYYY-MM-DD.jsonl` | JSONL | Permanent (append-only, no automatic cleanup) |
+| Summaries | `~/.local/share/nightshift/summaries/summary-YYYY-MM-DD.md` | Markdown | Permanent |
+| Config | `~/.config/nightshift/config.yaml` | YAML | Permanent |
+
+The database directory is created with `0700` permissions (owner-only access).
+
+### What the database stores
+
+- Project paths and execution history
+- Task execution timestamps and assignments
+- Run history (start/end times, project, tasks, tokens used, status, errors, provider, branch)
+- Provider usage snapshots (token counts, daily/weekly usage, inferred budget)
+- Bus-factor analysis results
+
+### Provider data directories
+
+Nightshift reads these provider CLI data directories to track token usage
+locally. It never modifies provider-owned files:
+
+- `~/.claude`: session history and `stats-cache.json` (read-only)
+- `~/.codex`: session JSONL files and rate-limit info (read-only)
+- `~/.copilot`: the one exception; nightshift writes its own request counter to `~/.copilot/nightshift-usage.json`
+
+These paths are configurable via `providers.<name>.data_path` in config.
+
+## External Transmission
+
+Apart from the provider CLI calls needed to run tasks, nightshift sends data
+externally **only** when you explicitly configure it. No optional integration is enabled by default.
+
+### AI provider CLIs
+
+When nightshift runs a task, it invokes provider CLIs as subprocesses:
+
+| Provider | Command | Data sent |
+|----------|---------|-----------|
+| Claude Code | `claude --print <prompt>` | Task prompt + selected file contents |
+| Codex | `codex exec <prompt>` | Task prompt + selected file contents |
+| Copilot | `gh copilot -- -p <prompt>` | Task prompt + selected file contents |
+
+Each invocation is isolated — no session state persists between calls, and
+no cross-project context is shared. The provider CLIs handle their own
+authentication and network communication; nightshift does not transmit API
+keys over the network itself.
+
+Dangerous permission flags (`--dangerously-skip-permissions`,
+`--dangerously-bypass-approvals-and-sandbox`, `--allow-all-tools`) default
+to **false** and require explicit opt-in.
+
+### Slack notifications (optional)
+
+When `reporting.slack_webhook` is configured, nightshift posts morning
+summaries containing: budget usage, completed task list, project counts,
+and failed/skipped task info.
+
+### Email notifications (optional)
+
+When SMTP environment variables are set (`NIGHTSHIFT_SMTP_HOST`, etc.),
+nightshift sends the same morning summary via email.
+
+### GitHub integration (optional)
+
+When enabled, nightshift uses the `gh` CLI to read issues (filtered by
+label) and post completion comments. It relies on `gh`'s existing
+authentication — nightshift does not handle GitHub tokens directly.
+
+## Credential Handling
+
+- **API keys** (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`) are read from
+  environment variables only and are never written to disk.
+- **Config file credential protection**: nightshift scans config files for
+  credential patterns (`api_key:`, `secret:`, `sk-` prefixes) and rejects
+  any config file that contains them.
+- **Credential masking**: when credentials appear in log output, they are
+  masked to show only the first 3 and last 3 characters.
+- **SMTP credentials** (`NIGHTSHIFT_SMTP_USER`, `NIGHTSHIFT_SMTP_PASS`)
+  are read from environment variables only.
+- **Slack webhook URL** is stored in plaintext in config YAML — consider
+  using an environment variable for sensitive deployments.
+
+## Telemetry
+
+Nightshift includes **zero** telemetry, analytics, crash reporting, or
+phone-home functionality. All usage tracking is local-only, reading data
+from provider CLI directories on disk.
+
+## Deleting Your Data
+
+```bash
+# Remove all nightshift data
+rm -rf ~/.local/share/nightshift
+
+# Remove configuration
+rm -rf ~/.config/nightshift
+
+# Remove nightshift's copilot usage counter
+rm -f ~/.copilot/nightshift-usage.json
+```
+
+Per-project config (`nightshift.yaml`) lives in each project directory and is not removed by the commands above; delete it per project.
diff --git a/internal/tasks/tasks.go b/internal/tasks/tasks.go
@@ -531,10 +531,81 @@ Apply safe updates directly, and leave concise follow-ups for anything uncertain
 		DefaultInterval: 72 * time.Hour,
 	},
 	TaskPrivacyPolicy: {
-		Type:            TaskPrivacyPolicy,
-		Category:        CategoryAnalysis,
-		Name:            "Privacy Policy Consistency Checker",
-		Description:     "Check code against privacy policy claims",
+		Type:     TaskPrivacyPolicy,
+		Category: CategoryAnalysis,
+		Name:     "Privacy Policy Consistency Checker",
+		Description: `Cross-reference a project's privacy policy against its actual code behavior. ` +
+			`This task identifies inconsistencies between what a privacy policy claims and what the code actually does.` +
+			"\n\n" +
+			`STEP 1 — LOCATE THE PRIVACY POLICY` +
+			"\n" +
+			`Search the repository for privacy policy documents: PRIVACY.md, privacy-policy.md, ` +
+			`PRIVACY_POLICY.md, privacy.txt, docs/privacy*, website/*/privacy*. Also check README.md ` +
+			`for a privacy section. If no privacy policy is found, report a single finding of category ` +
+			`missing-policy with severity high and stop.` +
+			"\n\n" +
+			`STEP 2 — PARSE POLICY CLAIMS` +
+			"\n" +
+			`Extract each concrete claim from the privacy policy into a checklist. Claims typically cover: ` +
+			`what data is collected, where it is stored, what is transmitted externally, how credentials ` +
+			`are handled, data retention periods, third-party services, telemetry/analytics presence or ` +
+			`absence, and how to delete data.` +
+			"\n\n" +
+			`STEP 3 — INVENTORY ACTUAL CODE BEHAVIOR` +
+			"\n" +
+			`Scan the codebase for all data-handling code paths:` +
+			"\n" +
+			`- Local storage: database writes, file writes, log output, cache directories` +
+			"\n" +
+			`- External transmission: HTTP clients, webhook calls, SMTP/email sending, ` +
+			`CLI subprocess invocations that send data to external services, API calls` +
+			"\n" +
+			`- Credential handling: env var reads, config file parsing, secret storage, token management` +
+			"\n" +
+			`- Data retention: cleanup routines, TTL logic, log rotation, pruning jobs` +
+			"\n" +
+			`- Telemetry: analytics SDKs, usage tracking, crash reporters, phone-home calls` +
+			"\n" +
+			`- Third-party integrations: external service clients, SDK imports, webhook consumers` +
+			"\n\n" +
+			`STEP 4 — CROSS-REFERENCE AND REPORT` +
+			"\n" +
+			`Compare each policy claim against the code inventory. Flag every inconsistency.` +
+			"\n\n" +
+			`OUTPUT FORMAT — For each finding, report:` +
+			"\n" +
+			`- file: path relative to repo root (or "policy" if the issue is in the policy document)` +
+			"\n" +
+			`- line: line number(s) in code, or section heading in policy` +
+			"\n" +
+			`- category: one of [data-collection-undisclosed, data-transmission-undisclosed, ` +
+			`retention-mismatch, credential-handling-mismatch, third-party-undisclosed, ` +
+			`deletion-incomplete, telemetry-mismatch, missing-policy]` +
+			"\n" +
+			`- severity: critical / high / medium / low` +
+			"\n" +
+			`- claim: what the policy says (quote or paraphrase)` +
+			"\n" +
+			`- actual: what the code actually does` +
+			"\n" +
+			`- recommendation: specific fix (update policy, update code, or both)` +
+			"\n\n" +
+			`SEVERITY GUIDE:` +
+			"\n" +
+			`- critical: code sends data externally that the policy says is never sent, or policy ` +
+			`claims no telemetry but code includes analytics/tracking` +
+			"\n" +
+			`- high: missing policy entirely, undisclosed third-party data sharing, or credential ` +
+			`handling weaker than claimed` +
+			"\n" +
+			`- medium: retention periods differ from documented values, deletion instructions ` +
+			`incomplete, or storage locations not mentioned in policy` +
+			"\n" +
+			`- low: minor wording inaccuracies, optional features not clearly marked as optional, ` +
+			`or documented paths that differ from defaults` +
+			"\n\n" +
+			`Summarize total findings by category and severity at the end. If no inconsistencies ` +
+			`are found, confirm that the policy accurately reflects the code.`,
 		CostTier:        CostMedium,
 		RiskLevel:       RiskLow,
 		DefaultInterval: 72 * time.Hour,

diff --git a/website/docs/task-reference.md b/website/docs/task-reference.md
@@ -51,7 +51,7 @@ Completed analysis with conclusions. These tasks produce reports without modifyi
 | `cost-attribution` | Cost Attribution Estimator | Estimate resource costs by component | Medium | Low | 72h |
 | `security-footgun` | Security Foot-Gun Finder | Find common security anti-patterns | Medium | Low | 72h |
 | `pii-scanner` | PII Exposure Scanner | Scan for potential PII exposure | Medium | Low | 72h |
-| `privacy-policy` | Privacy Policy Consistency Checker | Check code against privacy policy claims | Medium | Low | 72h |
+| `privacy-policy` | Privacy Policy Consistency Checker | Cross-reference privacy policy claims against actual code behavior | Medium | Low | 72h |
 | `schema-evolution` | Schema Evolution Advisor | Analyze database schema changes | Medium | Low | 72h |
 | `event-taxonomy` | Event Taxonomy Normalizer | Normalize event naming and structure | Medium | Low | 72h |
 | `roadmap-entropy` | Roadmap Entropy Detector | Detect roadmap scope creep and drift | Medium | Low | 72h |
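
The Credential Handling section of PRIVACY.md in this change says that credentials appearing in log output are masked to show only the first 3 and last 3 characters. A minimal Go sketch of that rule is below. `maskCredential` is a hypothetical name for illustration, not a function confirmed to exist in nightshift, and fully masking values of 6 characters or fewer is an added assumption (otherwise a short secret would be printed whole).

```go
package main

import (
	"fmt"
	"strings"
)

// maskCredential is a hypothetical helper illustrating the masking rule
// described in PRIVACY.md: keep the first 3 and last 3 characters and
// replace everything in between with '*'.
//
// Assumption (not stated in the policy): values of 6 characters or fewer
// are masked completely, since showing 3+3 characters would reveal them.
func maskCredential(s string) string {
	const keep = 3
	r := []rune(s) // operate on runes so multi-byte characters are not split
	if len(r) <= 2*keep {
		return strings.Repeat("*", len(r))
	}
	masked := strings.Repeat("*", len(r)-2*keep)
	return string(r[:keep]) + masked + string(r[len(r)-keep:])
}

func main() {
	fmt.Println(maskCredential("sk-abcdef123456")) // prints "sk-*********456"
	fmt.Println(maskCredential("short"))           // prints "*****"
}
```

A privacy-policy task run could compare the repository's real masking code against a reference like this when checking the "first 3 and last 3 characters" claim.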