Complete the identity separation from upstream Highlight.io. The codebase still carries legacy namespaces in code-level identifiers that don't affect end users but will cause confusion as the project grows.
Status: DONE
npm org: holdfast-io (https://www.npmjs.com/org/holdfast-io)
All 17 workspace packages renamed from @highlight-run/* to @holdfast-io/*. 29 package.json files, 426+ source files, root scripts, and tsconfig references updated. highlight.run (core browser SDK) also renamed to @holdfast-io/browser. @highlight-run/react-mentions left as-is (external package).
NPM publish workflow is implemented and passing. Publishes in 4 dependency-ordered tiers. Dry-run mode supported. First publish to npm pending when ready to release publicly.
Status: DONE
All Go module paths have been renamed:
- Backend module: `github.com/highlight-run/highlight/backend` → `github.com/BrewingCoder/holdfast/src/backend`
- Go SDK module: `github.com/highlight/highlight/sdk/highlight-go` → `github.com/BrewingCoder/holdfast/sdk/highlight-go`
`src/backend/go.mod`, `go.work`, and all import paths across every Go file updated. `go build ./...` passes.
Status: Not started Priority: Medium
| Ecosystem | Current Name | Registry | Action |
|---|---|---|---|
| Go | `github.com/BrewingCoder/holdfast/sdk/highlight-go` | pkg.go.dev | Module path updated — published at new path |
| Python | `highlight-io` | PyPI | Claim `holdfast` on PyPI, update `setup.py` |
| Ruby | `highlight_io` | RubyGems | Claim `holdfast`, update gemspec |
| Java | `io.highlight` | Maven Central | Update group ID (heavyweight process) |
| .NET | `Highlight.ASPCore` | NuGet | Claim new package name |
| PHP | `highlight/php-sdk` | Packagist | Update `composer.json` |
| Rust | `highlightio` | crates.io | Claim `holdfast` crate name |
| Elixir | `highlight` | Hex.pm | Check availability |
Status: Not started Priority: Medium
Docker images are currently referenced as ghcr.io/highlight/highlight-*. Need to publish under HoldFast.
- Set up GitHub Container Registry under BrewingCoder (or future org)
- Update `infra/docker/compose.hobby.yml` and `infra/docker/compose.enterprise.yml`
- Update CI/CD workflows
- Update Dockerfiles with new labels
Status: Placeholders in place Priority: Low (until project has public presence)
- Register `holdfast.dev` or chosen domain
- Set up `security@` and `support@` addresses
- Update SECURITY.md, CODE_OF_CONDUCT.md, and SDK docs with real addresses
HoldFast handles sensitive telemetry — session replays, error traces, application logs, and performance data. For organizations in government, defense, healthcare, and finance, this data is often subject to strict compliance requirements. Security is not an afterthought; it's a core feature.
Status: Not started Priority: Critical
All stored data must be encrypted at rest:
- PostgreSQL — Enable Transparent Data Encryption (TDE) where the distribution provides it (community PostgreSQL has no built-in TDE), or use encrypted storage volumes. Document configuration for both managed (RDS, Cloud SQL) and self-managed Postgres.
- ClickHouse — Enable encrypted storage for analytics data. ClickHouse supports encrypted disks via the `encrypted` disk type, referenced from `storage_policies`.
- Redis — Enable encryption at rest where supported (Redis Enterprise, or encrypted volumes for OSS Redis).
- S3 / Object Storage — Enforce SSE-S3 or SSE-KMS for session replay recordings and exported data. Support customer-managed keys (CMK).
- Kafka — Enable encryption at rest for message log segments via encrypted volumes or managed Kafka encryption (MSK, Confluent).
Goal: Zero unencrypted data at rest across all storage layers. Configurable via environment variables with secure defaults.
Status: Not started Priority: High
Sensitive fields should be encrypted at the application layer before they reach the database, so that even database administrators cannot read them without the application key:
- PII fields — user emails, IP addresses, user identifiers, custom user attributes
- Session metadata — URLs visited, form inputs, network request/response bodies
- Error payloads — stack traces may contain environment variables, secrets, or internal paths
- API keys and tokens — workspace API keys, OAuth tokens, integration credentials
Approach:
- Implement envelope encryption with a configurable KMS provider (AWS KMS, GCP KMS, HashiCorp Vault, or local key file)
- AES-256-GCM for field encryption
- Key rotation support without re-encrypting all data
- Configurable per-workspace: operators choose which fields to encrypt
- Search over encrypted fields via blind index or deterministic encryption where needed
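A minimal sketch of the envelope scheme in Go, using only the standard library. It assumes a local 32-byte key-encryption key (KEK) standing in for a KMS or Vault key; the `EncryptedField` layout and the `EncryptField`, `DecryptField`, and `RotateKEK` names are hypothetical, not existing HoldFast code. Each value gets its own data key (DEK), and rotation re-wraps only that DEK, which is what allows key rotation without re-encrypting stored data.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"io"
)

// EncryptedField is a hypothetical storage layout for one encrypted value: the
// field ciphertext plus its data key (DEK) wrapped by a key-encryption key (KEK).
type EncryptedField struct {
	KEKID      string // identifies the KEK (in practice, a KMS key ID or Vault key name)
	WrappedDEK []byte // nonce || AES-256-GCM(KEK, DEK)
	Ciphertext []byte // nonce || AES-256-GCM(DEK, plaintext)
}

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key) // a 32-byte key selects AES-256
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts with AES-GCM and prepends the random 12-byte nonce.
func seal(key, plaintext, aad []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, aad), nil
}

// open reverses seal and fails if the ciphertext or AAD was altered.
func open(key, sealed, aad []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	n := gcm.NonceSize()
	if len(sealed) < n {
		return nil, errors.New("ciphertext too short")
	}
	return gcm.Open(nil, sealed[:n], sealed[n:], aad)
}

// EncryptField encrypts one value under a fresh DEK and wraps the DEK with the KEK.
// aad binds the ciphertext to its context (e.g. workspace and column name).
func EncryptField(kekID string, kek, plaintext, aad []byte) (*EncryptedField, error) {
	dek := make([]byte, 32)
	if _, err := io.ReadFull(rand.Reader, dek); err != nil {
		return nil, err
	}
	ct, err := seal(dek, plaintext, aad)
	if err != nil {
		return nil, err
	}
	wrapped, err := seal(kek, dek, []byte(kekID))
	if err != nil {
		return nil, err
	}
	return &EncryptedField{KEKID: kekID, WrappedDEK: wrapped, Ciphertext: ct}, nil
}

// DecryptField unwraps the DEK with the KEK, then decrypts the value.
func DecryptField(kek []byte, f *EncryptedField, aad []byte) ([]byte, error) {
	dek, err := open(kek, f.WrappedDEK, []byte(f.KEKID))
	if err != nil {
		return nil, err
	}
	return open(dek, f.Ciphertext, aad)
}

// RotateKEK re-wraps only the 32-byte DEK under a new KEK; Ciphertext is untouched.
func RotateKEK(f *EncryptedField, oldKEK []byte, newID string, newKEK []byte) error {
	dek, err := open(oldKEK, f.WrappedDEK, []byte(f.KEKID))
	if err != nil {
		return err
	}
	wrapped, err := seal(newKEK, dek, []byte(newID))
	if err != nil {
		return err
	}
	f.KEKID, f.WrappedDEK = newID, wrapped
	return nil
}
```

Blind indexes for search would be a separate keyed HMAC over the normalized plaintext, stored next to the ciphertext under its own key; that part is not shown.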
Status: Not started Priority: Critical
All network communication — external and internal — must use TLS 1.2 or higher. TLS 1.0 and 1.1 must be explicitly rejected.
- Ingress — Frontend, public GraphQL, private GraphQL, OTLP collector endpoints: enforce TLS 1.2+ with strong cipher suites
- Inter-service — Backend to PostgreSQL, ClickHouse, Redis, Kafka: enforce TLS for all internal connections
- SDK to collector — SDKs must default to HTTPS and validate certificates. Provide clear documentation for self-signed cert deployment.
- Certificate management — Document and support cert-manager (Kubernetes), Let's Encrypt (Docker), and manual certificate configuration
- Cipher suite policy — Disable weak ciphers (RC4, 3DES, export ciphers). Default to ECDHE key exchange with AES-GCM.
Goal: No plaintext connections between any HoldFast components or between clients and the platform. Configurable via env vars (TLS_MIN_VERSION, TLS_CERT_PATH, TLS_KEY_PATH, per-service connection strings).
Status: Not started (Firebase auth is legacy dead code) Priority: High
Replace the legacy Firebase authentication with a flexible, standards-based auth system:
- OIDC (OpenID Connect) — First-class support for bringing your own identity provider. Connect HoldFast to your organization's Okta, Azure AD, Google Workspace, Keycloak, or any OIDC-compliant IdP.
- SAML 2.0 — Support for enterprise SSO via SAML where OIDC is not available.
- Local password auth — Retain as fallback for small deployments without an IdP. Enforce bcrypt/argon2 hashing, configurable password policy.
- Single digital identity — Users authenticate once via their organization's IdP. No separate HoldFast credentials to manage, rotate, or leak.
Current state: The backend has an OAUTH_PROVIDER_URL env var and basic OAuth2 flow support. Firebase code is present but non-functional for self-hosted. The hobby deployment uses password auth. This needs to be cleaned up and expanded into a proper multi-provider auth system.
Status: Not started Priority: High
Support modern, phishing-resistant multi-factor authentication:
- WebAuthn / FIDO2 — Hardware security keys (YubiKey, etc.) and platform authenticators (Windows Hello, Touch ID, passkeys). This is the gold standard for phishing resistance.
- TOTP — Time-based one-time passwords (Google Authenticator, Authy) as a fallback for environments where hardware keys aren't feasible.
- MFA enforcement — Workspace administrators can require MFA for all users. Configurable policy: `optional`, `required`, `required-phishing-resistant`.
- IdP-delegated MFA — When using OIDC/SAML, MFA can be enforced at the IdP level rather than in HoldFast. HoldFast should respect and surface the `amr` (authentication methods reference) claim.
Goal: Operators can enforce that all access to their observability data requires phishing-resistant authentication, meeting NIST AAL2/AAL3 requirements.
Status: Partial (project-level access exists but needs hardening) Priority: Medium
- RBAC — Role-based access control: admin, member, viewer roles with configurable permissions
- Project-level isolation — Users see only the projects they're assigned to (flag exists: `EnableProjectLevelAccess`)
- API key scoping — Ingestion keys vs. read keys vs. admin keys with distinct permissions
- Audit logging — Log all authentication events, permission changes, data access, and administrative actions to an immutable audit trail
- Session timeout — Configurable session duration and idle timeout
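A minimal sketch of how roles and API key scopes could be checked in one place. It assumes the three roles named above plus hypothetical permission strings such as `data:read`; `Principal` and `Authorize` are illustrative, not existing HoldFast types. Project membership is checked first, then the permission granted by the user's role or, for an API key, by the key kind.

```go
package main

import "fmt"

type Role string
type Permission string
type KeyKind string

const (
	RoleAdmin  Role = "admin"
	RoleMember Role = "member"
	RoleViewer Role = "viewer"

	// Hypothetical permission names.
	PermReadData    Permission = "data:read"
	PermWriteAlerts Permission = "alerts:write"
	PermManageUsers Permission = "users:manage"
	PermIngest      Permission = "ingest:write"
)

// Default role-to-permission mapping; operators could override it.
var rolePermissions = map[Role]map[Permission]bool{
	RoleAdmin:  {PermReadData: true, PermWriteAlerts: true, PermManageUsers: true},
	RoleMember: {PermReadData: true, PermWriteAlerts: true},
	RoleViewer: {PermReadData: true},
}

// Ingestion, read, and admin keys each get a distinct, non-overlapping purpose.
var keyPermissions = map[KeyKind]map[Permission]bool{
	"ingest": {PermIngest: true},
	"read":   {PermReadData: true},
	"admin":  {PermReadData: true, PermWriteAlerts: true, PermManageUsers: true},
}

// Principal is a user (Role set) or an API key (Key set), scoped to projects.
type Principal struct {
	Role     Role
	Key      KeyKind
	Projects map[int]bool // project-level isolation
}

// Authorize returns nil only if the principal may use perm on projectID.
func Authorize(p Principal, projectID int, perm Permission) error {
	if !p.Projects[projectID] {
		return fmt.Errorf("no access to project %d", projectID)
	}
	granted := rolePermissions[p.Role][perm] // missing entries read as false
	if p.Key != "" {
		granted = keyPermissions[p.Key][perm]
	}
	if !granted {
		return fmt.Errorf("permission %s denied", perm)
	}
	return nil
}
```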
Status: Not started Priority: Medium
- CORS hardening — Configurable allowed origins, default to same-origin only
- Rate limiting — Configurable rate limits on all API endpoints to prevent abuse
- Content Security Policy — Strict CSP headers on the frontend
- HSTS — HTTP Strict Transport Security headers enabled by default
- Container security — Non-root container execution, read-only filesystems where possible, minimal base images
Replace the Bash-only `[[ ]]` tests with POSIX `[ ]` in `infra/docker/configure-collector.sh` so the script runs under a plain `sh`. Five-minute fix.
- `cd src/backend && go mod tidy` — remove clearbit-go, stripe-go, unused AWS SDK modules
- `make private-gen` — regenerate backend GraphQL after schema changes
- `yarn install` — update lockfile after LD SDK removal
- `yarn codegen` — regenerate frontend GraphQL types
- `cd src/frontend && yarn types:check` — verify no TypeScript errors
Evaluate what's in enterprise/ — if it's proprietary-licensed code from Highlight Inc., it should be removed from an AGPL-3.0 project. If it contains useful self-hosted features, consider relicensing or rewriting.
These are Highlight.io's documentation and blog content. Already deleted. Confirm no orphan references remain.
Audit and update outdated dependencies. Prioritize known CVEs.
The Firebase authentication integration is non-functional for self-hosted. Clean out Firebase SDK references, config objects, and auth flows. Replace with the new OIDC/local auth system from Phase 2.
The AI features ("Harold") currently use GPT-3.5-turbo exclusively. Replace with Claude or make provider-configurable.
Key files:
- `/src/backend/private-graph/graph/resolver.go` — session insights
- `/src/backend/private-graph/graph/schema.resolvers.go` — error suggestions
- `/src/backend/openai_client/` — OpenAI client wrapper
- `/src/backend/prompts/` — prompt templates
- `/packages/ai/` — AI insights Lambda
Approach:
- Add `ANTHROPIC_API_KEY` env var
- Create provider abstraction (OpenAI, Anthropic, or user-configurable)
- Upgrade from GPT-3.5 to modern models
- Add user-facing API key configuration in workspace settings (bring your own key)
Currently uses OpenAI text-embedding-3-small + optional HuggingFace gte-large. Consider:
- Voyage AI (Anthropic partner)
- Keep HuggingFace (already self-hostable)
- Make configurable
Create official Helm charts for Kubernetes deployment with security defaults baked in (TLS, network policies, pod security standards).
Evaluate whether the 7+ service architecture can be consolidated for smaller deployments.
Ensure all containers build and run on ARM64 (important for self-hosted on Apple Silicon, Graviton, etc.).
Document and script backup/restore for PostgreSQL, ClickHouse, and Redis state. Include encrypted backup support.
Provide deployment guides mapped to common compliance frameworks:
- FedRAMP — control mapping for federal deployments
- HIPAA — PHI handling guidance for healthcare
- SOC 2 — control evidence for audit readiness
- NIST 800-53 — security control alignment
These are not certifications (HoldFast is a tool, not a service) but guidance for operators deploying HoldFast within their compliance boundary.
Document every module in the codebase, starting from the deepest layer (storage, data models, core libraries) and working upward through the stack. Documentation serves two audiences: human contributors reading a wiki, and agentic AI contributors that need structured context to work effectively.
Work bottom-up through the dependency graph:
- Storage layer — ClickHouse schema, PostgreSQL models (GORM), Redis cache patterns, Kafka topics and message formats, S3/object storage
- Data access — `store/` package, `clickhouse/` query layer, `model/` structs and migrations
- Core libraries — `parser/`, `queryparser/`, `errorgroups/`, `stacktraces/`, `embeddings/`, `otel/` extraction
- GraphQL APIs — `public-graph/` (ingestion) and `private-graph/` (dashboard) schemas, resolvers, middleware
- Workers — Kafka consumer handlers, scheduled tasks, async processing pipeline
- Alert system — `alerts/`, `alerts/v2/`, integration destinations (Slack, Discord, Teams, webhooks, issue trackers)
- Frontend — React component tree, page routing, Apollo Client state, search/filter UI
- SDKs — Per-SDK architecture, public API surface, configuration options, data flow to collector
Each module gets a MODULE.md file in its directory containing:
- Purpose — what this module does, in one paragraph
- Dependencies — what it imports, what imports it
- Key types/interfaces — the public API surface with brief descriptions
- Data flow — how data enters and leaves this module
- Configuration — environment variables and config options
- Gotchas — non-obvious behavior, known issues, historical context
- Testing — how to test this module, what fixtures exist
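If each of these items is written as a markdown heading, a CI check can flag incomplete files. A minimal sketch, assuming that heading convention (it is not established anywhere yet); `MissingSections` is a hypothetical helper that a job could call on every `MODULE.md` found with `filepath.WalkDir`.

```go
package main

import (
	"bufio"
	"strings"
)

// requiredSections mirrors the MODULE.md contents listed above.
var requiredSections = []string{
	"Purpose", "Dependencies", "Key types/interfaces", "Data flow",
	"Configuration", "Gotchas", "Testing",
}

// MissingSections returns the required sections with no heading (any level,
// e.g. "## Purpose") in the given MODULE.md text. Matching ignores case.
func MissingSections(content string) []string {
	found := map[string]bool{}
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, "#") {
			continue
		}
		title := strings.TrimSpace(strings.TrimLeft(line, "#"))
		found[strings.ToLower(title)] = true
	}
	var missing []string
	for _, s := range requiredSections {
		if !found[strings.ToLower(s)] {
			missing = append(missing, s)
		}
	}
	return missing
}
```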
These files serve as context anchors for both human readers and AI agents. An agent dropping into store/MODULE.md should have enough context to make changes without reading every file in the package.
Not started. This is a significant effort but pays compound interest — every module documented makes future contributions faster for both humans and agents.
The inherited test suite is thin — 72 frontend tests and 98 backend tests for a 200K+ line codebase. Most backend tests require a full infrastructure stack (Postgres, ClickHouse, Redis, Kafka) to run. This is tech debt we can't see until something breaks in production.
Same bottom-up approach as module documentation — start at the foundation and work up. Document and test together: you can't write good tests for a module you don't understand, and you can't document a module properly without testing its edge cases.
Follow the same dependency order as Phase 6. For each module:
- Audit — what tests exist, what's covered, what's not
- Unit tests first — pure logic that doesn't need infrastructure (parsers, validators, transformers, serializers)
- Integration tests second — tests that need a database, using Docker Compose test fixtures
- Contract tests for APIs — GraphQL schema compliance, SDK-to-backend contract validation
- Ratchet, don't mandate — set coverage thresholds at current levels, only allow them to go up. No "achieve 80% by Friday" mandates.
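The ratchet can be enforced with a few lines of Go over the output of `go tool cover -func`, whose final `total:` line reports overall statement coverage. This sketch assumes the baseline is stored somewhere the CI job can read and rewrite (a committed file, for example); `TotalCoverage` and `Ratchet` are hypothetical names, and the 0.05-point tolerance is an arbitrary allowance for rounding noise.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// TotalCoverage extracts the percentage from the "total:" line printed by
// `go tool cover -func=coverage.out`, e.g. "total:  (statements)  42.3%".
func TotalCoverage(coverFuncOutput string) (float64, error) {
	for _, line := range strings.Split(coverFuncOutput, "\n") {
		fields := strings.Fields(line)
		if len(fields) >= 3 && fields[0] == "total:" {
			pct := strings.TrimSuffix(fields[len(fields)-1], "%")
			return strconv.ParseFloat(pct, 64)
		}
	}
	return 0, fmt.Errorf("no total: line found")
}

// Ratchet fails if coverage fell below the baseline (beyond a small tolerance)
// and otherwise returns the new baseline, which never decreases.
func Ratchet(baseline, current float64) (float64, error) {
	const tolerance = 0.05
	if current+tolerance < baseline {
		return baseline, fmt.Errorf("coverage fell from %.1f%% to %.1f%%", baseline, current)
	}
	if current > baseline {
		return current, nil
	}
	return baseline, nil
}
```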
| Module | Current Coverage | Risk | Notes |
|---|---|---|---|
| `parser/`, `queryparser/` | 87-100% | Low | Already well-tested. Maintain. |
| `public-graph/` resolvers | ~0% (needs infra) | Critical | Data ingestion path. Bugs here = data loss. |
| `private-graph/` resolvers | ~0% (needs infra) | High | Dashboard API. Bugs here = broken UI. |
| `worker/` handlers | ~0% (needs infra) | High | Async processing. Silent failures. |
| `clickhouse/` queries | ~0% (needs infra) | High | Analytics queries. Wrong results = misleading dashboards. |
| `store/` data access | ~0% (needs infra) | High | Core CRUD. Every feature depends on this. |
| `model/` | ~0% (needs infra) | Medium | GORM models. Migrations are the real risk. |
| `alerts/` | ~3% | Medium | Alert delivery. False negatives = missed incidents. |
| `errorgroups/` | ~14% | Medium | Error grouping logic. Bad groups = noise. |
| `stacktraces/` | ~0% (needs AWS) | Medium | Stack trace enhancement. Needs S3 mock. |
| Frontend components | ~5% | Medium | UI logic. Search, filters, graphing. |
| SDKs | ~10% | Medium | Client-facing. Bugs here = broken customer apps. |
The biggest blocker is that most backend tests need Postgres + ClickHouse + Redis + Kafka. To make testing practical:
- Create a `docker-compose.test.yml` with lightweight test instances
- Add a `make test-with-infra` target that spins up containers, runs tests, and tears them down
- Set up test database seeding scripts
- Add ClickHouse test fixtures
- Mock S3 with MinIO for stack trace tests
- CI workflow that runs integration tests against Docker services
- Backend: `go test -coverprofile` (already in Makefile)
- Frontend: Vitest with v8 coverage provider (configured)
- CI: Coverage artifacts uploaded, thresholds enforced via quality gates
- Add coverage badge to README (via Codecov, SonarQube, or custom shield)
Not started. Pairs naturally with Phase 6 (documentation) — do both per-module as you go.
These are first-come-first-served and should be grabbed regardless of timeline:
- npmjs.com — `holdfast-io` org created
- GitHub — repo created at `BrewingCoder/holdfast`
- GitHub Actions — CI (backend, frontend, SDK, security) + NPM publish workflows
- Self-hosted runner — dedicated Ubuntu 24.04 VM
- PyPI — register `holdfast` package name
- crates.io — register `holdfast` crate name
- Domain — register `holdfast.dev` or preferred domain