diff --git a/docs/proposals/contractor-pr-triage.md b/docs/proposals/contractor-pr-triage.md new file mode 100644 index 0000000..fb24762 --- /dev/null +++ b/docs/proposals/contractor-pr-triage.md @@ -0,0 +1,120 @@ +# Contractor PR Triage: PRs #813 and #817 + +**Author:** mryoho (contractor) +**Date reviewed:** 2026-02-20 +**Status:** Recommend closing both PRs + +## Background + +A contractor (mryoho) submitted two PRs in late January 2026 attempting to centralize schema parsing and dynamic model generation from an A&M proof-of-concept codebase. The work was unfinished when the contractor's engagement ended. Since then, the team has independently built equivalent functionality via `openapi_schema_parser`, `semantic_search_service`, `schema_state_manager`, and `lif_schema_config`. + +--- + +## PR #813: "328 Add `schema` and `dynamic_models` components" + +- **Created:** 2026-01-27 | **Commits:** 4 | **Files:** 13 +- **Branch:** `328-schema-and-dynamic-models-from-a-and-m` + +### What it adds + +| Component | Lines | Purpose | +|-----------|-------|---------| +| `components/lif/datatypes/schema.py` | 11 | `SchemaField` dataclass (json_path, description, attributes, py_field_name) | +| `components/lif/schema/core.py` | 144 | OpenAPI schema parsing: `extract_nodes()`, `resolve_openapi_root()`, `load_schema_nodes()` | +| `components/lif/dynamic_models/core.py` | 406 | Dynamic Pydantic model builder: filter, mutation, full variants from `SchemaField` lists | +| `string_utils/core.py` changes | ~20 | Improved `to_pascal_case` (acronym handling), `to_camel_case`, `safe_identifier` (underscore collapse) | +| Tests | ~1,400 | Comprehensive tests for all new components | +| `test/data/test_openapi_schema.json` | 71 | Test fixture | + +### Overlap with existing code + +| PR adds | Already exists | Location | +|---------|---------------|----------| +| `SchemaField` dataclass | `SchemaLeaf` (identical fields minus `py_field_name`) | `openapi_schema_parser/core.py` | +| `load_schema_nodes()` | `load_schema_leaves()` | `openapi_schema_parser/core.py` | +| `extract_nodes()` | `extract_leaves()` | `openapi_schema_parser/core.py` | +| `resolve_openapi_root()` | Same function, same logic | `openapi_schema_parser/core.py` | +| `build_dynamic_model()` (filter/mutation/full) | `build_dynamic_filter_model()` + `build_dynamic_mutation_model()` | `semantic_search_service/core.py` | +| Improved `to_pascal_case` / `to_camel_case` | Centralized in `lif_schema_config/naming.py` | `lif_schema_config/naming.py` | + +The `schema_state_manager` component already orchestrates the full lifecycle: loads schema via `openapi_schema_parser`, builds models via `semantic_search_service`, and maintains a `SchemaState` with leaves, filter_models, mutation_models, and embeddings. + +### What might be worth cherry-picking + +1. **`safe_identifier` underscore collapse** (1 line): `safe = re.sub(r"_+", "_", safe)` -- prevents `foo__bar` from CamelCase boundaries +2. **`build_full_models()`** -- Existing code only builds filter and mutation variants; this adds a "full model" (all fields, non-optional). Useful if needed elsewhere, but not currently required. +3. **Test patterns** -- The 1,400 lines of tests are thorough, but they test duplicate code. + +### Assessment + +**Recommendation: Close.** All core functionality already exists in the codebase. The one-line `safe_identifier` fix can be cherry-picked if needed. + +--- + +## PR #817: "328 graphql utilizing dynamic models" + +- **Created:** 2026-01-28 | **Commits:** 5 (includes all 4 from #813) | **Files:** 37 +- **Branch:** `328-graphql-utilizing-dynamic-models` +- **Self-described as:** "this one still needs work to get it functioning correctly" + +### What it adds (beyond PR #813) + +| Component | Lines | Purpose | +|-----------|-------|---------| +| `components/lif/graphql/core.py` | 108 | `Backend` protocol + `HttpBackend` (talks to Query Planner) | +| `components/lif/graphql/schema_factory.py` | 341 | Strawberry schema builder: OpenAPI -> SchemaField -> Pydantic -> Strawberry types | +| `components/lif/graphql/type_registry.py` | 125 | Singleton type registry for caching types (**unused** -- not wired in) | +| `components/lif/graphql/utils.py` | 87 | PascalCase helpers, serialization, selection path extraction | +| `components/lif/openapi_schema/core.py` | 71 | MDR-first schema sourcing with silent file fallback | +| `components/lif/utils/core.py` | 84 | Env var validation utilities | +| `components/lif/utils/strings.py` | 123 | Copy of `string_utils` (**unused** -- nothing imports it) | +| `components/lif/utils/validation.py` | 66 | Truthy/falsy/bool parsing | +| `bases/lif/api_graphql/core.py` | 66 | **Rewritten** app initialization (sync, no auth) | +| Tests | ~1,200 | Tests for new components | + +### Architecture comparison + +| Aspect | Current (`openapi_to_graphql/`) | PR's approach (`graphql/` + `dynamic_models/` + `schema/`) | +|--------|-------------------------------|--------------------------------------------------| +| Pipeline | OpenAPI dict -> Strawberry types (direct) | OpenAPI -> SchemaField -> Pydantic models -> Strawberry types (3-step) | +| Components | 1 component, ~3 files | 4+ components, ~10 files | +| Mutations | Fully supported | **Commented out** | +| API key auth | Supported via middleware | **Removed entirely** | +| Schema source | `LIFSchemaConfig` + `mdr_client` (fail-loudly) | Silent MDR fallback (**contradicts project policy**) | +| Strawberry bridge | Direct `strawberry.type()` decoration | `strawberry.experimental.pydantic` (marked experimental) | + +### Issues found + +1. **Mutations disabled** -- commented out in `build_schema()`, only present in dead-code `build_schema_old()` +2. **API key auth removed** -- security regression from current codebase +3. **~200 lines of dead code** -- `build_schema_old()` left in `schema_factory.py` +4. **`type_registry.py` fully implemented but unused** -- not wired into any code path +5. **`utils/strings.py` unused** -- nothing imports it +6. **Stray `print()` statements** in `schema_factory.py` and `openapi_schema/core.py` +7. **Duplicate `get_schema_fields()` functions** -- one in `dynamic_models/core.py` and one in `openapi_schema/core.py` with different env vars +8. **Duplicate utility functions** in `graphql/utils.py` with TODO comments acknowledging the duplication +9. **Silent MDR fallback** contradicts project's "fail loudly" philosophy (CLAUDE.md) +10. **Global mutable state** in `schema_factory.py` (`global RootType, FilterInput, MutationInput`) +11. **`logging.basicConfig()` at module level** in `dynamic_models/core.py` -- resets root logger + +### What might be worth adopting + +1. **`Backend` protocol pattern** (`graphql/core.py`) -- clean separation of the Query Planner HTTP client behind a protocol. The existing code has this inline. Could be useful for testing. +2. **`extract_selected_fields()` from Strawberry `Info`** -- computes dotted JSON paths from the GraphQL selection set. Not sure if current code does this. +3. **`pydantic_inputs_to_dict()`** -- recursive serializer that handles Pydantic/dataclass/enum/date types. Useful utility if not already covered. +4. **The 3-step type pipeline concept** (OpenAPI -> Pydantic -> Strawberry) -- architecturally cleaner for Pydantic reuse, but adds complexity and uses experimental Strawberry APIs. Not worth adopting now. + +### Assessment + +**Recommendation: Close.** The code is self-described as non-functional, removes security features (API key auth), disables mutations, contradicts project architecture decisions (schema loading policy), and introduces significant dead code. The existing `openapi_to_graphql/` component, combined with the recent Strawberry version fix (PR #863), handles all of this already. + +--- + +## Summary of action items + +| Action | Details | +|--------|---------| +| Close PR #813 | Comment explaining the functionality now exists in `openapi_schema_parser`, `semantic_search_service`, and `schema_state_manager` | +| Close PR #817 | Comment explaining the code is non-functional and the existing `openapi_to_graphql` pipeline (fixed in PR #863) covers the same ground | +| Cherry-pick consideration | `safe_identifier` underscore collapse fix -- evaluate if `lif_schema_config/naming.py` already handles this | +| Delete branches | After closing, delete `328-schema-and-dynamic-models-from-a-and-m` and `328-graphql-utilizing-dynamic-models` | diff --git a/docs/proposals/generalize-mdr-schema-types.md b/docs/proposals/generalize-mdr-schema-types.md new file mode 100644 index 0000000..9eb4c48 --- /dev/null +++ b/docs/proposals/generalize-mdr-schema-types.md @@ -0,0 +1,44 @@ +# Option A: Generalize MDR Schema Types (Preserve Inheritance Model) + +**Date:** 2026-02-19 +**Status:** Draft — discussion only + +--- + +## Summary + +Rename and generalize the existing LIF-specific type hierarchy while preserving the base/derived/partner inheritance model. This approach keeps the extension and inclusion mechanics but removes the LIF branding so any standard can serve as a base schema. + +| Current | Proposed | +|---------|----------| +| `BaseLIF` | `BaseSchema` | +| `OrgLIF` | `DerivedSchema` | +| `PartnerLIF` | `PartnerSchema` | +| `SourceSchema` | `SourceSchema` | + +--- + +## Effort Estimate (AI-assisted) + +| Area | Estimated Hours | +|------|----------------| +| Database migration (rename enum, migrate data) | 4–6 | +| API — type system & services | 8–12 | +| API — endpoints | 4–6 | +| Frontend — routing & constants | 6–8 | +| Frontend — tree & model explorer | 8–12 | +| Frontend — services & API calls | 4–6 | +| Testing | 8–12 | +| Documentation | 2–4 | +| **Total** | **44–66 hours** | + +Without AI assistance, expect roughly 2–3x these numbers. + +--- + +## Risks + +- **Backward compatibility**: Existing deployments have `BaseLIF`/`OrgLIF`/`PartnerLIF` in their databases. Migration must be seamless. +- **Seed data**: Any seed data or scripts referencing LIF types by name need updating. +- **Downstream consumers**: Services that read `DataModelType` from the MDR API (query planner, translator, GraphQL) may need updates. +- **Scope creep**: Generalizing types is one thing; actually supporting multiple concurrent base schemas with cross-standard translation is a larger architectural change that may surface additional work beyond this estimate. diff --git a/docs/proposals/issue-prioritization-q1-2026.md b/docs/proposals/issue-prioritization-q1-2026.md new file mode 100644 index 0000000..047089d --- /dev/null +++ b/docs/proposals/issue-prioritization-q1-2026.md @@ -0,0 +1,200 @@ +# Issue Prioritization — Q1 2026 + +**Date:** 2026-02-24 +**Context:** 2 devs at ~50% capacity for 3 months (through end of March for feature work), then docs and cleanup through end of June (contract end). +**Capacity:** ~520 dev-hours through March, then docs/cleanup mode. + +--- + +## Likely Already Done — Verify and Close + +These open issues have merged PRs and should just need verification before closing: + +| # | Title | Evidence | +|---|-------|----------| +| 833 | Convert legacy entityIdPaths to CSV | PRs #834, #831 merged | +| 830 | EntityAttributeAssociation missing Extension field | PR #850 merged | +| 812 | Review GH workflow scripts for consistency | PR addressed | +| 811 | MDR API: Don't allow spaces in name/uniqueName | PR merged (UI side in PR #822) | +| 799 | Remove need for docker in pytest tests | PR merged | +| 771 | Expose export transformation group endpoint | PR merged | +| 715 | SPIKE: Review Advisor/Semantic Search tuning | PR merged | +| 717 | Fetch OpenAPI schema from MDR instead of file | Done via PRs #854, #853 (schema loading refactor) | +| 670 | Add PR Template & Reviewer Checklist | PR #829 merged | +| 674 | Implement Flyway for MDR Schema Changes | Already in place (SAM + Flyway infrastructure exists) | +| 741/743 | Advisor login hangs on bad password | Likely resolved by PR #861 (demo user password to env var) | + +**Action: Verify and close ~11 issues. Estimated effort: 1–2 hours.** + +--- + +## Tier 1: Must Do — Demo Stability & Data Quality + +**Estimated effort: 60–80 hours** + +Fixes broken demo functionality or blocks the evaluator experience. + +| # | Title | Est | Assignee | Notes | +|---|-------|-----|----------|-------| +| 847 | MDR dev issue (duplicate ID sequences) | 4–6h | cbeach47 | Active bug, blocks dev work | +| 851 | Partner Filter/View for OrgLIF always empty | 4–6h | — | Broken demo feature | +| 756 | MDR schema creation by upload not honoring refs | 6–8h | cbeach47 | Core MDR functionality | +| 776 | MDR responds with database error in response | 4–6h | — | Leaks internal details | +| 757 | Org3 proficiency names not displayed in Advisor | 4–6h | — | Broken demo data | +| 810 | Remove spaces in MDR entity/attribute names | 6–8h | thelmick | Data cleanup, blocks #801 | +| 801 | MDR UI: Don't allow spaces in name/uniqueName | 4–6h | cbeach47, ahungerford | PR #822 open — needs regex fix (see PR review comment about dots in UniqueName) | +| 729 | Query Planners for Org2/Org3 not finding data | 6–8h | — | Broken multi-org demo | +| 816 | Moving multi-source transform loses attributes | 4–6h | — | Data loss bug in MDR UI | + +--- + +## Tier 2: High Value — MDR Portability & Self-Serve Foundation + +**Estimated effort: 80–120 hours** + +Enables the tenant isolation work (`mdr-tenant-isolation-cognito.md`) and data portability. These are pre-requisites for the self-serve demo capability. + +| # | Title | Est | Assignee | Notes | +|---|-------|-----|----------|-------| +| 772 | Expose import transformation group endpoint | 6–8h | cbeach47 | Needed for data portability | +| 773 | MDR Transformation Portability: UX Support | 8–12h | ahungerford | Pairs with #772 | +| 774 | Expose UX widget to delete transformation group | 4–6h | — | Completes transformation portability | +| 762 | Portable search for existing attributes on upload | 4–6h | cbeach47 | Part of portable upload set | +| 763 | Portable search for existing valuesets on upload | 4–6h | cbeach47 | Part of portable upload set | +| 764 | Portable search for existing values on upload | 4–6h | cbeach47 | Part of portable upload set | +| 765 | Portable search for existing entities on upload | 4–6h | cbeach47 | Part of portable upload set | +| 768 | Support data model ID mapping on upload | 8–12h | ahungerford | Key for import portability | +| 746 | Enforce unique names for exportable items | 4–6h | cbeach47 | Data integrity guardrail | +| 858 | Don't allow mapping unmapped entities/attributes | 4–6h | — | MDR quality guardrail | +| 856 | Align association create/update logic | 6–8h | — | MDR consistency | +| 857 | Association extension data cleanup | 4–6h | — | MDR data quality | + +--- + +## Tier 3: Documentation & Cleanup (post-March) + +**Estimated effort: 40–60 hours** + +Ideal for the April–June docs/cleanup phase. + +| # | Title | Est | Assignee | Notes | +|---|-------|-----|----------|-------| +| 805 | MDR technical documentation (single source of truth) | 8–12h | thelmick | | +| 785 | LIF MDR Guide (Documentation) | 8–12h | thelmick | | +| 661 | Document: how LIF data model differs from a standard | 4–6h | — | Key conceptual doc | +| 733 | Add builder-focused explanation of demo | 4–6h | — | Community onboarding | +| 740 | Document recommended branching strategy | 4–6h | — | Infrastructure docs | +| 739 | Document GitHub Actions pathing | 2–4h | — | Infrastructure docs | +| 736 | Docs: Review Adapter design docs | 4–6h | — | Orchestrator docs | +| 732 | Finalize README for Dagster OSS release | 4–6h | — | Release prep | +| 708 | Delete spreadsheets and python script from repo | 1–2h | — | Quick cleanup | +| 678 | Clean up leftover code from initial buildout | 8–12h | cbeach47 | | +| 660 | Clean up MDR Frontend in lif-main | 4–6h | — | UI tech debt | +| 664 | MDR Frontend cleanup & structure improvements | 4–6h | — | UI tech debt | +| 751 | Publish LIF test plan | 4–6h | — | | +| 693 | Add component-level captured I/O tests | 6–8h | — | | +| 826 | Prevent referencing non-embedded entities/attributes | 4–6h | — | MDR guardrail | +| 775 | On upload, list which data model IDs are in import file | 4–6h | — | MDR UX improvement | +| 848 | Visible version number for MDR/LIF codebase | 2–4h | — | Quick win | +| 814 | Change refs to start with lowercase in JSON samples | 2–4h | — | Quick cleanup | +| 788 | MDR: Prompt for organization name | 2–4h | — | MDR UX improvement | + +--- + +## Tier 4: Spikes & Nice-to-Have (Defer or Descope) + +These are investigation items or features that likely don't fit the remaining window. Recommend closing with a "deferred" comment or moving to a backlog milestone. + +### Research Spikes + +| # | Title | Notes | +|---|-------|-------| +| 804 | SPIKE: Get data out of LIF in non-LIF format | Defer unless directly needed | +| 723 | SPIKE: Bidirectional translations | Defer | +| 721 | SPIKE: Alternative translation patterns | Defer | +| 720 | SPIKE: Custom code translations | Defer | +| 719 | SPIKE: Enum translation approach | Defer | +| 714 | SPIKE: Review x-queryable usage and configuration | Defer | +| 713 | SPIKE: Advisor prompting for job position preferences | Defer | +| 688 | Spike: GraphQL behavior & library evaluation | Defer | +| 684 | Spike: Revisit data issues from R1.1 | Defer | +| 683 | Spike: Update MDR to allow mapping to Base LIF | Defer | +| 694 | Spike: Query Planner job cache | Defer | + +### Feature Work + +| # | Title | Notes | +|---|-------|-------| +| 808 | Add Organization as queryable root entity | Feature — defer | +| 692 | Add top-level GraphQL queries for Org, Course, Credential | Feature — defer | +| 689 | Build processor to unroll non-person references | Feature — defer | +| 696 | Build processor to make entities lowercase for GraphQL | Feature — defer | +| 690 | Normalize SEDS Title Case vs GraphQL enum ALL_CAPS | Feature — defer | +| 654 | Integrate identity mapper with Orchestrator | Large feature — defer | +| 803 | Spike: Frontends deployed in shared S3 bucket | Infrastructure — defer | + +### Advisor-Specific + +| # | Title | Notes | +|---|-------|-------| +| 731 | Advisor: Refactor API in Polylith | Large refactor, assigned to cbeach47 — evaluate scope | +| 734 | Advisor: Reduce hallucinations and circular loops | Tuning work — defer unless blocking demos | +| 735 | Disable prompts for unanswerable follow-up questions | Demo polish — defer | +| 737 | Lock input + show busy state while thinking | UI polish — defer | +| 726 | Advisor not returning proficiency topics | May overlap with #757 | +| 728 | Advisor not returning Remuneration info | Data issue — evaluate | +| 711 | Advisor: Lots of data in response for employment prefs | Tuning — defer | +| 718 | Advisor: Failed to trim messages before summarization | Error handling — evaluate | +| 700 | SPIKE: Advisor fails to recover after LIF-610 | Assigned to cbeach47, resilience — evaluate | +| 691 | Remove over-pruning of fields in Advisor/MCP | Feature — defer | +| 716 | Semantic Search: Fix descriptions of informationSourceId | Data model fix — evaluate | + +### Infrastructure + +| # | Title | Notes | +|---|-------|-------| +| 675/676 | OpenTelemetry strategy & tracing | Nice-to-have — defer | +| 677 | Draft CI smoke test compose up + health check | CI improvement — defer | +| 738 | Script to validate workflow watch paths | CI improvement — defer | +| 673 | Enable Dependabot | Quick setup — could be a small win if time permits | +| 671 | Canonicalize Docker Compose & clean dev variants | Infrastructure cleanup — defer | +| 682 | Deploying MDR database from Mac fails | Platform-specific — defer | +| 685 | Fix flyway configs for MDR/Dagster OS | May already be resolved | + +### Orchestrator / Data + +| # | Title | Notes | +|---|-------|-------| +| 730 | Orchestrator: Deeper review of design doc | Docs — could fit in Tier 3 | +| 725 | Bug: Bad Dagster name | Quick fix if still relevant | +| 710 | Orchestrator: Double Dagster runs | May be resolved | +| 704 | Renee's user fails to call the Orchestrator | May be resolved | +| 709 | Update lif-sample.json and lif-sample_clean.json | Data cleanup | +| 745 | Audit source data models for unique names | Assigned to thelmick | +| 712 | LIF API: Identification system error | Evaluate | +| 687 | Investigate: School Assigned Number issue | Evaluate | +| 722 | Investigate and improve Translator performance | Optimization — defer | +| 668 | MDR API import service method not found | May be resolved by portability work | +| 662 | Export/import translations into MDR | May overlap with #771/#772 | + +--- + +## Suggested Sprint Plan + +| Timeframe | Focus | Target Issues | +|-----------|-------|---------------| +| **Week 1** (now) | Close done issues, fix critical bugs | Verify/close ~11 issues, start Tier 1 bugs | +| **Weeks 2–4** | Demo stability | Complete Tier 1 (60–80h) | +| **Weeks 5–8** | Portability + tenant isolation POC | Tier 2 portability issues, POC for schema isolation | +| **Weeks 9–12** (March) | Tenant isolation build (if POC validates), stabilize | Integration testing, remaining Tier 2 | +| **April–June** | Docs and cleanup | Tier 3 documentation, close/defer Tier 4 | + +--- + +## Open Questions for Team + +1. **Advisor scope:** Several Advisor issues (#726, #728, #734, #735) are open. Are we actively maintaining the Advisor through March, or is it stable enough to leave as-is? +2. **Orchestrator/Dagster:** Issues #710, #704, #725 may already be resolved. Who can verify? +3. **thelmick capacity:** Issues #805, #785, #810, #745 are assigned to thelmick. Is this person still active on the project? +4. **ahungerford capacity:** Issues #773, #768 are assigned. Is this person still available? +5. **Tenant isolation priority:** Does the POC for schema isolation (4–6 hours, see `mdr-tenant-isolation-cognito.md`) fit into the first two weeks alongside bug fixes? diff --git a/docs/proposals/mdr-tenant-isolation-cognito.md b/docs/proposals/mdr-tenant-isolation-cognito.md new file mode 100644 index 0000000..758db50 --- /dev/null +++ b/docs/proposals/mdr-tenant-isolation-cognito.md @@ -0,0 +1,250 @@ +# Proposal: MDR Tenant Isolation via PostgreSQL Schema Namespacing + Cognito + +**Date:** 2026-02-24 +**Status:** Draft — ready for team review +**Related:** `LIF Self-Serve Demo Website Outline.docx`, `plan-demo-user-password.md` + +--- + +## Problem + +The LIF demo environment needs to support external evaluators exploring the MDR without interfering with each other or the production demo data. Today there is no registration, no user isolation, and multiple evaluators share the same accounts and data — causing collisions and confusion. + +The ambitious design doc (`LIF Self-Serve Demo Website Outline.docx`) estimates 160–260 hours for a full self-serve platform. This proposal scopes a **modest, high-value slice**: Cognito authentication + PostgreSQL schema-based data isolation for the MDR. + +--- + +## Solution Overview + +1. **AWS Cognito** replaces the hardcoded user database for MDR authentication +2. **PostgreSQL schema namespacing** gives each guest user their own isolated copy of the MDR data +3. **Role-based schema routing** determines which schema a user operates in +4. The existing `public` schema remains the production demo data used by the translator, GraphQL, and semantic search services + +--- + +## Role-Based Schema Routing + +### Roles + +| Role | Identity Source | PostgreSQL Schema | Access Level | +|------|----------------|-------------------|--------------| +| **service** | API key (`X-API-Key` header) | `public` | Full — translator, GraphQL, semantic search | +| **admin** | Cognito JWT, `lif-admins` group | `public` (default) | Full — dev team | +| **guest** | Cognito JWT, no admin group | `tenant_{user_id}` | Full — isolated sandbox | + +### Routing Logic (in auth middleware) + +``` +if request has API key → schema = "public" +elif "lif-admins" in JWT cognito:groups → schema = "public" +else → schema = "tenant_{user_id}" +``` + +### Admin Schema Override (debugging) + +Admins can pass an `X-MDR-Schema` header to operate in a guest's schema — useful for debugging issues in a specific user's sandbox. The middleware only honors this header for admin-role users. + +``` +if admin AND "X-MDR-Schema" header present → schema = header value +``` + +This means the dev team logs into Cognito like anyone else, but their accounts are in the `lif-admins` group. One auth system for everyone — no separate login flows. + +--- + +## How It Works + +### The Key Mechanism: `SET search_path` + +PostgreSQL's `search_path` determines which schema is used when queries reference unqualified table names. By setting it per-request, every existing query automatically operates in the right schema with zero changes to service or endpoint code. + +```python +# database_setup.py — the single change that propagates everywhere +async def get_session(request: Request) -> AsyncSession: + async with async_session() as session: + schema = getattr(request.state, "tenant_schema", "public") + await session.execute(text(f"SET search_path TO {schema}")) + yield session +``` + +All 14 service files and 14 endpoint files are unchanged — they already receive `session` via FastAPI's `Depends(get_session)`. The session just silently operates in the correct schema. + +### Schema Provisioning + +When a guest registers via Cognito: + +1. Cognito post-confirmation Lambda trigger calls `POST /tenants/provision` +2. MDR creates a new PostgreSQL schema: `CREATE SCHEMA tenant_{user_id}` +3. Replays the seed data (DDL + base data model) into the new schema +4. Records the mapping in `public.tenant_schemas` table + +### Schema Reset ("Start Over") + +Guest clicks "Reset Demo" in the MDR UI: + +1. Calls `POST /tenants/reset` +2. MDR drops the schema: `DROP SCHEMA tenant_{user_id} CASCADE` +3. Re-provisions from the seed template + +### Schema Cleanup + +For abandoned accounts: + +- Scheduled Lambda or cron job queries Cognito for inactive users +- Drops their schemas and removes the mapping + +--- + +## What Changes + +### Layer 1: Database Session (the lynchpin) + +**File:** `components/lif/mdr_utils/database_setup.py` + +Modify `get_session()` to accept the `Request` and set `search_path` from `request.state.tenant_schema` before yielding the session. + +### Layer 2: Auth Middleware — Tenant Resolution + +**File:** `components/lif/mdr_auth/core.py` + +Extend `AuthMiddleware.dispatch()`: + +1. After authenticating (existing JWT or API key logic), determine the role +2. For API keys: set `request.state.tenant_schema = "public"` +3. For Cognito JWTs: check `cognito:groups` claim + - If `lif-admins` in groups: set schema to `"public"` (or `X-MDR-Schema` header value if present) + - Otherwise: look up schema from `public.tenant_schemas` mapping table +4. Replace current custom JWT validation with Cognito JWT validation (RS256 with Cognito JWKS) + +### Layer 3: Tenant Service (new component) + +**File:** `components/lif/mdr_services/tenant_service.py` (new) + +Functions: +- `provision_tenant_schema(session, user_id)` — CREATE SCHEMA + replay seed +- `reset_tenant_schema(session, user_id)` — DROP + re-provision +- `cleanup_tenant_schema(session, user_id)` — DROP + remove mapping +- `resolve_tenant_schema(user_id)` — look up schema name from mapping table + +### Layer 4: Tenant Endpoints (new) + +**File:** `bases/lif/mdr_restapi/tenant_endpoints.py` (new) + +| Route | Method | Purpose | Auth | +|-------|--------|---------|------| +| `POST /tenants/provision` | POST | Provision schema for new user | Cognito Lambda trigger (service key) | +| `POST /tenants/reset` | POST | Reset current user's schema | Guest (self-service) | +| `DELETE /tenants/{user_id}` | DELETE | Drop schema and mapping | Admin only | +| `GET /tenants/status` | GET | Check schema health | Guest (self-service) | +| `GET /tenants/` | GET | List all tenant schemas | Admin only | + +### Layer 5: Cognito + CloudFormation + +- Cognito User Pool with email verification +- Custom attributes: Organization, Role (optional) +- `lif-admins` group for dev team +- Post-confirmation Lambda trigger → `POST /tenants/provision` +- CloudFormation template for User Pool, App Client, Lambda trigger +- SSM parameters for Cognito Pool ID, App Client ID + +### Layer 6: MDR Frontend + +- Replace current username/password login with Cognito Hosted UI (or Amplify Auth) +- Pass Cognito JWT in `Authorization: Bearer` header (same pattern, different token issuer) +- Add "Reset Demo" button for guest users +- Show role indicator (admin vs. guest) in UI header + +### Layer 7: Seed Script Adaptation + +The V1.1 Flyway migration contains DDL + seed data for the `public` schema. We need a version that can be replayed into an arbitrary schema: + +- Extract DDL from V1.1 into a parameterized template +- Strip sequence ownership and schema-specific references +- Seed only the base data model (BaseLIF) — guests don't need OrgLIF/PartnerLIF initially + +### Layer 8: Sync Connection (`sql_util.py`) + +**File:** `components/lif/mdr_utils/sql_util.py` + +This uses synchronous `psycopg2` directly (separate from the async ORM). It needs `SET search_path` added after connection, or should receive the schema from the caller. + +--- + +## What Doesn't Change + +Because `SET search_path` is transparent to SQLAlchemy: + +- **All 14 service files** — unchanged (queries use table names, PostgreSQL resolves to the right schema) +- **All 14 endpoint files** — unchanged (receive session via `Depends`) +- **SQL model definitions** (`mdr_sql_model.py`) — unchanged +- **Schema generation service** — unchanged (operates within session schema) +- **Jinja helper/translation services** — unchanged +- **DataModelType enum and all type-based logic** — unchanged (each schema has its own BaseLIF, OrgLIF, etc.) +- **Frontend API calls** — same endpoints, same request format (just different JWT issuer) + +--- + +## Edge Cases + +| Concern | Resolution | +|---------|------------| +| Each schema has its own ID sequences | IDs overlap between tenants — fine, schemas are fully isolated | +| DataModelType.BaseLIF delete protection | Works per-schema — each guest has their own protected BaseLIF | +| `sql_util.py` uses sync psycopg2 | Add `SET search_path` after connect | +| Abandoned guest schemas | Scheduled cleanup job drops schemas for inactive/deleted Cognito users | +| PostgreSQL schema count limits | No hard limit; Aurora handles hundreds of schemas comfortably | +| Cross-schema queries | Not needed — tenants are fully isolated by design | +| Service accounts (translator, GraphQL) | Always route to `public` via API key detection | +| Dev team access to guest data | Admin schema override via `X-MDR-Schema` header | +| Cognito token validation | Replace HS256 custom JWT with RS256 Cognito JWKS verification | +| Existing demo user passwords | Eliminated — Cognito handles all authentication (`plan-demo-user-password.md` becomes moot) | + +--- + +## Effort Estimate + +| Area | Hours | Notes | +|------|-------|-------| +| `database_setup.py` — search_path injection | 4–6 | The single change that enables everything | +| `mdr_auth/core.py` — Cognito JWT + role routing | 6–8 | Replace custom JWT with Cognito JWKS validation | +| `tenant_service.py` — provision/reset/cleanup | 8–12 | Schema lifecycle management | +| `tenant_endpoints.py` — new API routes | 4–6 | Thin layer over tenant_service | +| Cognito CloudFormation + Lambda trigger | 8–12 | User Pool, App Client, post-confirmation hook | +| MDR frontend — Cognito auth swap | 6–8 | Replace login form with Cognito Hosted UI | +| Seed script adaptation (V1.1 → parameterized) | 4–6 | Extract DDL + base data into replayable template | +| `sql_util.py` — sync search_path | 2–3 | Small fix for sync connection path | +| Admin schema override (`X-MDR-Schema`) | 2–3 | Header check in middleware, admin-only | +| Testing | 8–10 | Schema isolation, provisioning, role routing | +| **Total** | **52–74 hours** | + +With AI-assisted development, this is roughly **2–3 weeks of focused work**. + +--- + +## Migration Path + +### Phase 1: Schema isolation (this proposal) +- Cognito + tenant schemas for MDR only +- Dev team as `lif-admins`, guests get sandboxes +- `public` schema unchanged for translator/GraphQL/semantic search + +### Phase 2: Advisor integration (future) +- Extend Cognito to Advisor app +- Advisor reads from guest's isolated GraphQL/data stack (separate effort) + +### Phase 3: Elevated access (future) +- Guests can request admin access via support workflow +- Move to `lif-admins` group in Cognito + +This aligns with the phased approach in the Self-Serve Demo Website Outline while delivering the highest-value piece first. + +--- + +## Open Questions + +1. **Seed data scope**: Should guest schemas get only BaseLIF, or also a sample SourceSchema + OrgLIF with mappings so they can experience the full workflow? +2. **Schema naming**: `tenant_{cognito_sub}` works but Cognito subs are UUIDs — long schema names. Alternative: `tenant_{short_hash}`? +3. **Concurrency**: How many simultaneous guest schemas do we expect? 10? 100? This affects Aurora sizing. +4. **Schema TTL**: Auto-expire guest schemas after N days of inactivity? Or keep indefinitely until manual cleanup? +5. **DocumentDB consideration**: If Option C (MongoDB migration from `docs/proposals/migrate-mdr-to-mongodb.md`) is pursued, tenant isolation would use MongoDB databases instead of PostgreSQL schemas. Should we sequence these decisions? diff --git a/docs/proposals/mdr-tenant-isolation-qa.md b/docs/proposals/mdr-tenant-isolation-qa.md new file mode 100644 index 0000000..2014513 --- /dev/null +++ b/docs/proposals/mdr-tenant-isolation-qa.md @@ -0,0 +1,181 @@ +# MDR Tenant Isolation — Q&A from Design Review + +**Date:** 2026-02-24 +**Context:** Questions raised during team design review of `mdr-tenant-isolation-cognito.md` + +--- + +## Team Questions + +### 1. Does Alex (AWS Ops) need to be involved? If so, in what way/effort? + +**Yes, but scoped to two areas:** + +| Area | Alex's Role | Effort | +|------|------------|--------| +| Cognito CloudFormation | Author or review the User Pool, App Client, and Lambda trigger templates. This is IAM/infrastructure work that fits his lane. | 4–8 hours | +| Aurora capacity review | Confirm the dev/demo Aurora instance can handle the expected schema count. May need to adjust instance class or storage. | 1–2 hours | + +Everything else (database_setup.py, auth middleware, tenant service, frontend) is application code that the dev team handles. Alex doesn't need to touch the MDR codebase — just the CloudFormation templates and a quick Aurora sanity check. + +**Timing:** He can work the Cognito stack in parallel with the application changes. No blocking dependency until integration testing. + +--- + +### 2. Eight layers change — what is the risk to the stability of our existing code? + +The 8 layers are misleading — most are **new files**, not modifications to existing code. + +| Layer | Change Type | Risk | Why | +|-------|------------|------|-----| +| 1. `database_setup.py` | **Modify** | **Medium** | The single riskiest change — if `search_path` isn't set correctly, queries hit the wrong schema. But it's ~5 lines of code, and the fallback is `public` (current behavior). | +| 2. `mdr_auth/core.py` | **Modify** | **Medium** | Replacing JWT validation with Cognito JWKS. Risk is breaking existing auth. Mitigated by keeping API key auth untouched (services keep working). | +| 3. `tenant_service.py` | **New file** | **Low** | Additive — no existing code changes. | +| 4. `tenant_endpoints.py` | **New file** | **Low** | Additive — new routes, no existing routes affected. | +| 5. Cognito CloudFormation | **New stack** | **Low** | Separate infrastructure — doesn't touch existing stacks. | +| 6. MDR frontend auth | **Modify** | **Medium** | Swapping login mechanism. Risk is locking out users during transition. | +| 7. Seed script | **New file** | **Low** | Additive — V1.1 migration stays untouched. | +| 8. `sql_util.py` | **Modify** | **Low** | Small change, and this code path is rarely used. | + +**Bottom line:** Only 3 existing files are modified (`database_setup.py`, `mdr_auth/core.py`, `sql_util.py`). The 14 service files and 14 endpoint files are completely untouched. The blast radius is narrow. + +--- + +### 3. Can we do a small-scale POC first to build confidence? + +**Absolutely — strongly recommended.** Here's a POC that proves the core mechanism in a day or two: + +**POC Scope: Schema Isolation Only (No Cognito)** + +1. Manually create a second schema in the dev MDR database: + ```sql + CREATE SCHEMA poc_tenant_1; + SET search_path TO poc_tenant_1; + -- Replay V1.1 DDL + minimal seed data + ``` + +2. Add `search_path` injection to `database_setup.py` with a header-based toggle: + ```python + schema = request.headers.get("X-MDR-Schema", "public") + await session.execute(text(f"SET search_path TO {schema}")) + ``` + +3. Test with curl/Postman: + - `GET /datamodels/` with no header → returns `public` schema data (existing demo) + - `GET /datamodels/` with `X-MDR-Schema: poc_tenant_1` → returns tenant data + - `POST` to create an entity in `poc_tenant_1` → verify it doesn't appear in `public` + - Verify translator/GraphQL still work (they don't send the header → `public`) + +4. Verify zero bleed-through (see question below for full test plan) + +**POC Effort: 4–6 hours.** No Cognito, no frontend changes, no CloudFormation. Just proves the core `search_path` mechanism works end-to-end through the existing MDR API. + +--- + +## Additional Design Considerations + +### Could we scope a schema to a Cognito group instead of per-user? + +**Yes — and there's clear value.** Instead of 1 user = 1 schema, a Cognito group maps to a shared schema. Use cases: + +- A team from the same organization evaluating together +- A workshop cohort sharing a dataset +- The dev team already benefits from this (`lif-admins` group → `public`) + +The mapping table becomes `(cognito_group, schema_name)` instead of `(user_id, schema_name)`. The middleware checks group membership first, falls back to per-user schema. Minimal extra effort — maybe 2–3 hours on top of the base estimate. + +### Layer 7 — should we ensure data portability (import/export) first? + +**Agreed this is the right sequencing.** The MDR already has `/import_export/export/{data_model_id}` and `/import_export/import/` endpoints. If we ensure those work reliably for a full round-trip (export base model → import into fresh schema), then: + +1. Provisioning uses the import endpoint rather than raw SQL replay +2. V1.1 seed data can eventually be stripped to just DDL (no COPY statements) +3. Base data becomes a JSON fixture file rather than embedded SQL + +Recommendation: **POC uses raw SQL replay, production uses import/export** — validates both paths. + +### Auto-expiry — how do we warn users? + +Suggested UX: + +- **On login:** Banner: "Your demo environment expires on {date}. Export your work to keep it." +- **7 days before expiry:** Email via Cognito + SES (Cognito can trigger Lambda on custom events) +- **On expiry:** Schema is archived (renamed to `expired_{user_id}_{date}`) for 30 days, then dropped +- **Post-expiry login:** "Your environment has expired. Reset to start fresh or contact support." + +This is Phase 2 polish — not needed for POC or MVP. + +--- + +## Technical Risk Questions + +### What is the blast radius if `search_path` misbehaves? + +**Contained.** + +- **Worst case (wrong schema):** A user sees another tenant's data. Detectable in POC testing. Mitigated by validating the schema name against the mapping table before setting it. +- **Worst case (SET fails):** SQLAlchemy raises an exception, the request returns 500. No data corruption — it's a session-level setting, not a DDL change. +- **Service accounts are immune** — API key auth always routes to `public` before the search_path logic runs. +- **Rollback:** Remove the one `SET search_path` line → everything uses `public` again. + +### How do we prove zero schema bleed-through? + +**POC test plan:** + +1. Create entity "CANARY_TENANT_1" in `poc_tenant_1` +2. Create entity "CANARY_PUBLIC" in `public` +3. Query `public` — must see CANARY_PUBLIC, must NOT see CANARY_TENANT_1 +4. Query `poc_tenant_1` — must see CANARY_TENANT_1, must NOT see CANARY_PUBLIC +5. Run the full integration test suite against `public` — must pass unchanged +6. Automate as a recurring integration test for CI + +### What is the rollback plan? + +1. **Code rollback:** Revert `database_setup.py` (remove `SET search_path`) and `mdr_auth/core.py` (revert to custom JWT). Two-file revert, everything works as before. +2. **Data rollback:** Tenant schemas are isolated — `DROP SCHEMA tenant_xxx CASCADE` cleans up completely. `public` schema is never modified by tenant operations. +3. **Cognito rollback:** Cognito User Pool is a separate CloudFormation stack. Delete it. Re-enable the hardcoded user database in `core.py`. + +### What happens if provisioning partially fails? + +Schema creation is a two-step process: `CREATE SCHEMA` then seed. If seeding fails: + +- The schema exists but is empty/incomplete +- Tenant status endpoint returns "unhealthy" +- Resolution: `DROP SCHEMA ... CASCADE` and retry +- Can wrap in a transaction: if seed fails, schema creation rolls back + +Design the provisioning to be **idempotent** — calling it twice for the same user either succeeds (already provisioned) or retries cleanly. + +### How many schemas can Aurora handle under expected load? + +PostgreSQL has **no hard limit** on schemas. Aurora PostgreSQL inherits this. Practical considerations: + +| Scenario | Tables | Storage | Impact | +|----------|--------|---------|--------| +| 10 tenants | ~150 tables | ~20 MB | Trivial | +| 100 tenants | ~1,500 tables | ~200 MB | Comfortable | +| 1,000 tenants | ~15,000 tables | ~2 GB | Monitor `pg_catalog` query times | + +Each tenant's seed data is ~2MB (V1.1 is 1.9MB). For the expected load (10–50 evaluators), Aurora won't even notice. + +### Does this conflict with the MongoDB migration proposal? + +**No conflict for sequencing.** If we build this now on PostgreSQL and later pursue the MongoDB migration (`migrate-mdr-to-mongodb.md`): + +- `SET search_path` maps to **MongoDB databases** (each tenant gets their own DB, connection switches per-request) +- Middleware logic (role → schema routing) translates directly +- Cognito integration is database-agnostic — stays the same +- Tenant service interface stays the same, just the implementation changes + +Building on PostgreSQL now is not wasted work — the patterns carry over. + +### Can we spike schema isolation before Cognito? + +**Yes — that's exactly what the POC is.** Use the `X-MDR-Schema` header as the tenant selector. No Cognito needed. Proves the hardest part (database isolation) independently from the auth plumbing. + +**Recommended build sequence:** + +1. **POC** (1–2 days): Schema isolation with header-based switching +2. **Cognito setup** (parallel, Alex): User Pool, groups, Lambda trigger +3. **Integration** (~1 week): Wire Cognito JWT → middleware → search_path +4. **Frontend** (~1 week): Swap login, add reset button diff --git a/docs/proposals/migrate-mdr-to-mongodb.md b/docs/proposals/migrate-mdr-to-mongodb.md new file mode 100644 index 0000000..c06224c --- /dev/null +++ b/docs/proposals/migrate-mdr-to-mongodb.md @@ -0,0 +1,283 @@ +# Option C: Migrate MDR from PostgreSQL to MongoDB + +**Date:** 2026-02-19 +**Status:** Draft — discussion only + +--- + +## Summary + +Replace the MDR's PostgreSQL relational database with MongoDB. The MDR stores metadata about data standards (schemas, entities, attributes, value sets, transformations, mappings) — a domain that is arguably a better fit for a document model than rigid relational tables. The current relational schema has 15 tables, 20+ foreign key constraints, 8 custom PostgreSQL enum types, a stored procedure, and Flyway migrations. + +This option can be combined with Option A (generalize types) or Option B (simplify to flat schemas). It addresses the storage layer independently of the type system question. + +--- + +## Why Consider This Change + +**Arguments for MongoDB:** +- The MDR models hierarchical, schema-like data (entities containing attributes, value sets containing values, transformation groups containing transformations). Documents naturally represent these nested structures without join tables. +- The project already runs MongoDB for learner data storage. Consolidating on one database technology reduces operational complexity and team cognitive load. +- Schema evolution is simpler — adding fields to a document doesn't require migrations, ALTER TABLE, or enum updates. +- The current `ExtInclusionsFromBaseDM` join table, `EntityAssociation`, `EntityAttributeAssociation`, and similar relationship tables could collapse into embedded arrays within parent documents. +- Value set mappings and transformations are inherently document-shaped (nested, variable structure, JSONata expressions). + +**Arguments for keeping PostgreSQL:** +- The MDR has well-defined, stable relationships that relational databases model cleanly. +- Foreign key constraints enforce data integrity at the database level — MongoDB relies on application-level enforcement. +- Existing Flyway migrations, stored procedures, and AWS RDS Aurora infrastructure are already built and working. +- Transactional consistency across multiple tables is simpler in PostgreSQL. +- The team has already invested in the SQLModel/SQLAlchemy async stack. + +--- + +## Current Relational Schema + +### Tables (15) + +| Table | Purpose | Key Relationships | +|-------|---------|-------------------| +| `DataModels` | Schema definitions | Self-ref via `BaseDataModelId` | +| `Entities` | Entities within a schema | FK → DataModels | +| `Attributes` | Entity attributes | FK → DataModels, FK → ValueSets | +| `EntityAssociation` | Parent-child entity relationships | FK → Entities (parent & child) | +| `EntityAttributeAssociation` | Entity-attribute join | FK → Entities, FK → Attributes | +| `Constraints` | Attribute constraints | FK → Attributes (CASCADE) | +| `DataModelConstraints` | Schema-level constraints | FK → DataModels | +| `ValueSet` | Enumerated value sets | FK → DataModels | +| `ValueSetValue` | Individual values | FK → ValueSets, FK → DataModels | +| `ValueSetValueMapping` | Source→target value mappings | FK → ValueSetValues, FK → TransformationsGroup | +| `TransformationGroup` | Grouped transformations | FK → DataModels (source & target) | +| `Transformation` | Transformation rules | FK → TransformationsGroup | +| `TransformationAttribute` | Transformation I/O attributes | FK → Entities, Attributes, Transformations | +| `ExtInclusionsFromBaseDM` | Extension inclusion tracking | FK → DataModels | +| `ExtMappedValueSet` | Extension value set mapping | FK → ValueSets | + +### SQL-Specific Features in Use +- 8 custom PostgreSQL ENUM types (accesstype, datamodeltype, constrainttype, etc.) +- Stored procedure: `deletedatamodelrecords()` for cascade cleanup +- PostgreSQL extensions: `pg_stat_statements`, `pgaudit` +- 20+ foreign key constraints with ON DELETE CASCADE +- GENERATED ALWAYS AS IDENTITY sequences +- Server-side timestamp defaults with timezone +- AWS RDS Aurora PostgreSQL with IAM authentication + +--- + +## Proposed MongoDB Document Model + +### Collection: `schemas` + +Replaces: `DataModels`, and optionally absorbs `Entities`, `Attributes`, `EntityAssociation`, `EntityAttributeAssociation`, `Constraints`, `DataModelConstraints` + +```json +{ + "_id": "ObjectId", + "name": "LIF", + "description": "Learner Information Framework", + "version": "1.0", + "state": "Published", + "organization": "LIF Initiative", + "tags": ["learner", "interoperability"], + "created_at": "ISODate", + "entities": [ + { + "entity_id": "uuid", + "name": "Person", + "description": "A learner or individual", + "attributes": [ + { + "attribute_id": "uuid", + "name": "firstName", + "data_type": "string", + "required": true, + "constraints": [ + { "type": "Length", "max": 255 } + ] + } + ], + "children": [ + { + "child_entity_id": "uuid", + "name": "Name", + "placement": "Embedded", + "cardinality": "one-to-many" + } + ] + } + ] +} +``` + +### Collection: `value_sets` + +Replaces: `ValueSet`, `ValueSetValue` + +```json +{ + "_id": "ObjectId", + "name": "IdentifierType", + "schema_id": "ObjectId (ref to schemas)", + "values": [ + { "value_id": "uuid", "code": "SSN", "description": "Social Security Number" }, + { "value_id": "uuid", "code": "SCHOOL_ASSIGNED_NUMBER", "description": "..." } + ] +} +``` + +### Collection: `transformations` + +Replaces: `TransformationGroup`, `Transformation`, `TransformationAttribute`, `ValueSetValueMapping` + +```json +{ + "_id": "ObjectId", + "name": "SIS to LIF", + "source_schema_id": "ObjectId", + "target_schema_id": "ObjectId", + "state": "Published", + "rules": [ + { + "rule_id": "uuid", + "source_path": "student.first_name", + "target_path": "Person.Name.firstName", + "expression_language": "JSONata", + "expression": "$source.first_name", + "value_set_mappings": [ + { + "source_value": "SSN", + "source_value_set": "id_types", + "target_value": "SSN", + "target_value_set": "IdentifierType" + } + ] + } + ] +} +``` + +### Collection: `extension_mappings` (only if keeping inheritance — Option A) + +Replaces: `ExtInclusionsFromBaseDM`, `ExtMappedValueSet` + +If Option B (flat model) is chosen, this collection is not needed. + +--- + +## Work Breakdown + +### 1. MongoDB Infrastructure +- Add MongoDB instance for MDR to Docker Compose (or reuse existing MongoDB) +- Replace AWS RDS Aurora PostgreSQL with DocumentDB or MongoDB Atlas in CloudFormation +- Remove Flyway migration infrastructure for MDR +- **Estimate: 4–6 hours** + +### 2. Data Model Layer — Replace SQLModel with Motor/Beanie/PyMongo +- Remove `mdr_sql_model.py` (SQLModel definitions, enums) +- Create MongoDB document models (Beanie ODM or raw Motor) +- Replace `database_setup.py` async PostgreSQL session with Motor async client +- Remove `sql_util.py` raw SQL utilities +- **Estimate: 8–10 hours** + +### 3. Service Layer — Rewrite Queries +- Rewrite all services (`datamodel_service.py`, `entity_service.py`, `attribute_service.py`, `valueset_service.py`, `transformation_service.py`, `constraint_service.py`, `inclusions_service.py`) +- Replace SQLModel `select()` / `where()` with MongoDB queries +- Replace join-based queries with document lookups and `$lookup` aggregations where needed +- Replace stored procedure with application-level cascade logic +- Adapt soft-delete pattern (same concept, different query syntax) +- Adapt pagination (offset/limit works similarly) +- **Estimate: 20–28 hours** + +### 4. API Endpoints — Minimal Changes +- Endpoints mostly delegate to services, so changes are minor +- Update any response models that exposed SQLModel-specific fields +- Remove or adapt ID handling (auto-increment integers → ObjectIds or UUIDs) +- **Estimate: 4–6 hours** + +### 5. Jinja Template Service +- Rewrite `jinja_helper_service.py` queries for MongoDB +- Schema export and OpenAPI generation logic needs MongoDB data access +- **Estimate: 4–6 hours** + +### 6. Data Migration +- Write migration script: PostgreSQL → MongoDB document transformation +- Map relational joins to embedded documents +- Preserve IDs for backward compatibility or define new ID scheme +- Test migration with existing dev/demo data +- **Estimate: 6–8 hours** + +### 7. Frontend — ID and Response Format Changes +- Update API response handling if ID format changes (integer → string/ObjectId) +- Minimal other changes — frontend talks to REST API, not directly to DB +- **Estimate: 2–4 hours** + +### 8. Testing +- Rewrite all MDR service unit tests for MongoDB +- Update integration tests +- Add MongoDB test fixtures (replace SQL seed data) +- **Estimate: 10–14 hours** + +### 9. Remove PostgreSQL Infrastructure +- Remove Flyway files, stored procedures, SAM database templates +- Remove `asyncpg`, `psycopg2-binary`, `sqlmodel` dependencies from MDR project +- Remove `sql_util.py`, sync database connection code +- Clean up Docker Compose (remove PostgreSQL container, restore container) +- **Estimate: 3–4 hours** + +### 10. Documentation +- Update CLAUDE.md, deployment guides, SAM README +- Document new MongoDB schema and connection setup +- **Estimate: 2–3 hours** + +--- + +## Effort Summary + +| Area | Estimated Hours | +|------|----------------| +| MongoDB infrastructure | 4–6 | +| Data model layer (ODM) | 8–10 | +| Service layer rewrite | 20–28 | +| API endpoints | 4–6 | +| Jinja template service | 4–6 | +| Data migration script | 6–8 | +| Frontend adjustments | 2–4 | +| Testing | 10–14 | +| Remove PostgreSQL infra | 3–4 | +| Documentation | 2–3 | +| **Total** | **63–89 hours** | + +These estimates assume AI-assisted development. Without AI assistance, expect roughly 2–3x. + +--- + +## Comparison: All Three Options + +| | Option A: Generalize Types | Option B: Simplify (Flat) | Option C: MongoDB Migration | +|---|---|---|---| +| **Approach** | Rename types, keep inheritance | Remove types and inheritance | Replace database engine | +| **Estimate** | 44–66 hrs | 32–46 hrs | 63–89 hrs | +| **Can combine with** | Option C | Option C | Option A or B | +| **Complexity after** | Same structure, generic names | Significantly reduced | Different technology, simpler documents | +| **Operational impact** | Low — same DB, same infra | Low — same DB, less schema | High — new DB engine, new infra | +| **Team skill alignment** | Same stack | Same stack | Aligns with existing MongoDB usage elsewhere | +| **Schema evolution** | Still requires migrations | Still requires migrations | No migrations needed | +| **Data integrity** | FK constraints enforced by DB | FK constraints enforced by DB | Application-level enforcement | + +### Recommended Combinations + +- **Option B + C** (simplify types AND move to MongoDB): **78–113 hours combined** (some overlap reduces total). This is the most transformative option — flat schema model stored as documents. Eliminates both the type hierarchy complexity and the relational overhead. +- **Option A + C** (generalize types AND move to MongoDB): **88–127 hours combined**. Less value — you'd be migrating complexity to a new database rather than eliminating it. +- **Option B alone**: Best ROI if the relational database isn't causing pain beyond the type system. +- **Option C alone**: Best if the primary pain is the relational model and migration overhead, but the type system is acceptable. + +--- + +## Risks & Considerations + +- **Service layer rewrite is the largest cost**: 7 service files with complex query logic all need rewriting. This is where most bugs will be introduced. +- **Loss of referential integrity**: PostgreSQL enforces FK constraints at the DB level. MongoDB requires application-level enforcement or accepts eventual consistency. Bugs in cascade deletes or orphaned documents become possible. +- **AWS DocumentDB vs MongoDB Atlas**: DocumentDB has MongoDB API compatibility gaps (no multi-document transactions in older versions, limited aggregation pipeline support). MongoDB Atlas is fully compatible but adds a vendor dependency. Evaluate which is appropriate for the AWS deployment. +- **Identity Mapper**: The identity mapper service uses a separate MariaDB database. This proposal does not cover migrating that — it could be a follow-on effort. +- **Existing backup/restore**: The current Docker setup uses `pg_dump`/`pg_restore`. MongoDB uses `mongodump`/`mongorestore` — backup scripts need updating. +- **Audit and compliance**: `pgaudit` extension provides query-level audit logging. MongoDB has its own audit log capability but configuration differs. diff --git a/docs/proposals/simplify-mdr-schema-model.md b/docs/proposals/simplify-mdr-schema-model.md new file mode 100644 index 0000000..36a833e --- /dev/null +++ b/docs/proposals/simplify-mdr-schema-model.md @@ -0,0 +1,172 @@ +# Option B: Simplify MDR to Flat Schema Model (Remove Inheritance) + +**Date:** 2026-02-19 +**Status:** Draft — discussion only + +--- + +## Summary + +Rather than generalizing the existing base/derived/partner inheritance hierarchy, remove it entirely. The MDR becomes a registry of schemas (standards) that can be browsed, explored, and mapped between — with no special type distinctions or extension mechanics. + +The core ask: **capture, navigate, and explore any standard, and define mappings between them.** + +--- + +## Proposed Model + +### Before (4 types with inheritance) + +``` +BaseLIF ──┬── OrgLIF (extends BaseLIF, tracks inclusions) + └── PartnerLIF (extends BaseLIF, tracks inclusions) +SourceSchema (standalone, no inheritance) +``` + +### After (1 type, flat) + +``` +Schema (any standard: LIF, CTDL, Ed-Fi, CASE, org-specific, etc.) + └── Mappings between schemas define field-level relationships +``` + +All schemas are peers. LIF is just another schema in the registry, not a privileged base type. + +--- + +## What Gets Removed + +| Current Concept | Disposition | +|----------------|-------------| +| `DataModelType` enum (BaseLIF/OrgLIF/PartnerLIF/SourceSchema) | Replace with a single type or remove entirely | +| `BaseDataModelId` foreign key | Remove — no parent/child schema relationship | +| `ExtInclusionsFromBaseDM` table | Remove — no inclusion tracking needed | +| `ExtendedByDataModelId` on EntityAssociation / EntityAttributeAssociation | Remove — no extension tracking | +| `ContributorOrganization` filtering ("LIF" vs org vs partner) | Remove — schemas are standalone | +| `partner_only` / `org_ext_only` / `public_only` filter modes | Remove from API and UI | +| BaseLIF deletion protection | Remove — all schemas are deletable (or use a generic "locked" flag) | +| Hardcoded `data_model_id=1` | Remove | +| `/is_orglif/`, `/orglif/`, `/base/` endpoints | Remove | +| Dedicated "LIF Model" and "Data Extensions" UI pages | Replace with a unified schema explorer | +| OrgLIF sub-views (All, Base Inclusions, Public, Extensions, Partner) | Remove — single view per schema | +| Extension/partner classification in tree explorer | Remove | + +--- + +## What Gets Added or Changed + +### API +- **Schema CRUD** simplifies — create, read, update, delete any schema with no type-specific validation +- **Mappings** become the primary relationship between schemas (this already partially exists in the Mappings feature) +- Optional metadata fields on schemas: `standard_name`, `version`, `organization` for categorization without hard type distinctions +- Endpoints simplify to standard REST for schemas, entities, attributes, and mappings + +### Frontend +- **Unified schema explorer** — single tree/list of all schemas, no folder grouping by type +- **Schema detail view** — browse entities and attributes for any schema, no inclusion/extension overlays +- **Mappings view** — already exists, becomes the primary way to relate schemas to each other +- Simpler tree component — no type-based branching, color coding, or sub-views + +### Database +- Migration to drop or ignore `DataModelType`, `BaseDataModelId`, `ExtInclusionsFromBaseDM`, and extension tracking columns +- Existing data preserved (BaseLIF becomes just a schema named "LIF") + +--- + +## Work Breakdown + +### 1. Database Migration +- Drop or deprecate `DataModelType` column (or migrate all values to a single type) +- Drop `BaseDataModelId` foreign key +- Drop `ExtInclusionsFromBaseDM` table +- Drop `ExtendedByDataModelId` from association tables +- Add optional metadata columns if needed (`standard_name`, `version`) +- **Estimate: 3–5 hours** + +### 2. API — Remove Inheritance Logic +- Remove `DataModelType` enum or collapse to single value +- Remove all type-specific validation in `datamodel_service.py` +- Remove extension/inclusion filtering in `entity_service.py` and `attribute_service.py` +- Remove base model chain traversal in `jinja_helper_service.py` +- Remove auto-extension marking in entity creation +- **Estimate: 6–8 hours** + +### 3. API — Simplify Endpoints +- Remove `/is_orglif/`, `/orglif/`, `/base/` endpoints +- Remove `partner_only`, `org_ext_only`, `public_only` parameters from `/with_details/` +- Remove hardcoded `data_model_id=1` defaults +- Remove or simplify inclusions endpoints +- **Estimate: 3–4 hours** + +### 4. Frontend — Simplify Schema Explorer +- Remove type constants and type-based routing +- Replace `/explore/lif-model` and data-extensions pages with unified schema list +- Simplify `DataModelSelector.tsx` — no type-based auto-navigation or defaults +- Simplify `SimpleTree.tsx` — flat list of schemas, no type-based folders +- **Estimate: 6–8 hours** + +### 5. Frontend — Simplify Model Explorer +- Remove BaseLIF base-data fetching logic +- Remove inclusion/visibility toggles +- Remove extension/partner classification in `TreeModelExplorer.tsx` +- Simplify to: browse entities and attributes for a single schema +- **Estimate: 4–6 hours** + +### 6. Frontend — Services & API Calls +- Remove `listOrgLifModels()` and type-specific service functions +- Simplify `CreateDataModelParams` — no type restrictions +- Remove inclusion-related API calls +- **Estimate: 2–4 hours** + +### 7. Testing +- Remove/update tests for deleted functionality +- Add tests for simplified schema CRUD +- Verify mappings still work correctly +- Integration test updates +- **Estimate: 6–8 hours** + +### 8. Documentation +- Update CLAUDE.md and API docs +- Migration guide for existing deployments +- **Estimate: 2–3 hours** + +--- + +## Effort Summary + +| Area | Estimated Hours | +|------|----------------| +| Database migration | 3–5 | +| API — remove inheritance logic | 6–8 | +| API — simplify endpoints | 3–4 | +| Frontend — simplify schema explorer | 6–8 | +| Frontend — simplify model explorer | 4–6 | +| Frontend — services & API calls | 2–4 | +| Testing | 6–8 | +| Documentation | 2–3 | +| **Total** | **32–46 hours** | + +These estimates assume AI-assisted development. Without AI assistance, expect roughly 2–3x. + +--- + +## Comparison with Option A (Generalize Types) + +| | Option A: Generalize | Option B: Simplify | +|---|---|---| +| **Approach** | Rename types, keep inheritance | Remove types and inheritance | +| **Estimate** | 44–66 hours | 32–46 hours | +| **Complexity after** | Same as today, different names | Significantly reduced | +| **Extensible schemas** | Yes | No — by design | +| **Multi-standard support** | Requires additional design | Natural — all schemas are peers | +| **Risk** | Medium — rename across full stack | Medium — removing features has fewer edge cases than renaming them | +| **Migration** | Rename in place | Drop columns/tables, existing data becomes flat | + +--- + +## Risks & Considerations + +- **Loss of extension capability**: If org-specific or partner extensions are needed in the future, they would need to be re-implemented. Confirm with stakeholders that flat schemas meet the need. +- **Existing OrgLIF/PartnerLIF data**: Any deployments with OrgLIF or PartnerLIF schemas need a migration path. Extended entities/attributes would become standalone entries in the base schema or be dropped. +- **Mappings feature maturity**: The existing Mappings view becomes the primary way to relate schemas. Verify it is robust enough for the cross-standard use case. +- **Downstream services**: Query planner, translator, and GraphQL services may reference `DataModelType`. Audit for impact.