
Feature/add dbt support #357

Merged
kami619 merged 6 commits into main from feature/add-dbt-support
Mar 26, 2026

@kami619 (Collaborator) commented Mar 26, 2026

Description

Add 4 essential dbt assessors covering core best practices for AI-assisted data engineering workflows.

Rebased and conflict-resolved version of #334 (by @patrickstrick).

Changes

  • SQL language detection via .sql extension mapping
  • DbtProjectConfigAssessor (Tier 1, 10%) - validates dbt_project.yml
  • DbtModelDocumentationAssessor (Tier 1, 10%) - checks schema.yml docs
  • DbtDataTestsAssessor (Tier 2, 3%) - validates PK tests
  • DbtProjectStructureAssessor (Tier 2, 3%) - ensures staging/marts layout
  • 48 unit tests with 6 comprehensive test fixtures
  • Rich remediation with dbt-specific guidance

Follow-up Issues

Closes #334

Co-Authored-By: Patrick Strick <patrickstrick@users.noreply.github.com>
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

🤖 Generated with Claude Code

patrickstrick and others added 6 commits March 5, 2026 10:54
Add 4 essential dbt assessors covering core best practices for AI-assisted
data engineering workflows. Assessors validate project configuration, model
documentation, data tests, and project structure.

Key features:
- SQL language detection via .sql extension mapping
- DbtProjectConfigAssessor (Tier 1, 10%) - validates dbt_project.yml
- DbtModelDocumentationAssessor (Tier 1, 10%) - checks schema.yml docs
- DbtDataTestsAssessor (Tier 2, 3%) - validates PK tests
- DbtProjectStructureAssessor (Tier 2, 3%) - ensures staging/marts layout
- 48 unit tests with 6 comprehensive test fixtures
- Rich remediation with dbt-specific guidance

Total impact: 26% of AgentReady score for dbt projects.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
…sors

Fix DbtModelDocumentationAssessor and DbtDataTestsAssessor to detect YAML
files with any naming convention, not just *schema.yml and *_models.yml.

dbt supports multiple documentation patterns:
- schema.yml (standard)
- _models.yml (alternative)
- one-file-per-model.yml (used by bookingsmaster-dbt)

Changed from pattern matching to glob "*.yml" to support all conventions.

Before: Only detected schema.yml and _models.yml files
After: Detects all .yml/.yaml files in models/ directory

Tested on bookingsmaster-dbt project:
- Documentation coverage: 0% → 54% (102/189 models detected)
- Test coverage: 0% → 4.8% (9/189 models detected)
- 171 YAML files now properly scanned
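A minimal sketch of the name-agnostic coverage count follows. It assumes each YAML file under models/ has already been parsed (for example with yaml.safe_load) and lists models under a top-level models: key, as dbt property files do. The function names are hypothetical, and unlike the real assessor this sketch does not filter placeholder descriptions.

```python
from collections.abc import Iterable
from typing import Any


def documented_models(yaml_docs: Iterable[Any]) -> set:
    """Names of models that have a non-empty description.

    Each item is the parsed content of one .yml/.yaml file under models/,
    whatever the file is called (schema.yml, _models.yml, one file per model).
    """
    names = set()
    for doc in yaml_docs:
        if not isinstance(doc, dict):
            continue  # empty file (None) or unexpected top-level type
        for model in doc.get("models") or []:
            if not isinstance(model, dict):
                continue
            description = str(model.get("description") or "").strip()
            if description and model.get("name"):
                names.add(model["name"])
    return names


def doc_coverage(model_names: set, yaml_docs: Iterable[Any]) -> float:
    """Percentage of SQL models (identified by file stem) that are documented."""
    if not model_names:
        return 0.0
    documented = documented_models(yaml_docs) & model_names
    return 100.0 * len(documented) / len(model_names)
```

With 102 of 189 models documented, doc_coverage returns about 54.0, in line with the figure above.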

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
Apply code style fixes from isort and ruff:
- Fix import ordering (pathlib before pytest)
- Remove unnecessary blank lines
- Remove unnecessary f-string prefixes (ruff F541)

All checks passing:
- ✅ isort (import sorting)
- ✅ ruff (code quality)
- ✅ 48 unit tests passing

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@kami619 mentioned this pull request Mar 26, 2026
19 tasks
@kami619 merged commit 599bea9 into main Mar 26, 2026
9 of 10 checks passed
@coderabbitai bot commented Mar 26, 2026

Caution

Review failed

The pull request is closed.

Warning

.coderabbit.yaml has a parsing error

The CodeRabbit configuration file in this repository has a parsing error and default settings were used instead. Please fix the error(s) in the configuration file. You can initialize chat with CodeRabbit to get help with the configuration file.

💥 Parsing errors (1)
Validation error: String must contain at most 250 character(s) at "tone_instructions"
⚙️ Configuration instructions
  • Please see the configuration documentation for more information.
  • You can also validate your configuration using the online YAML validator.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: ae04fcae-b307-4ba4-89bb-48cbe0152848

📥 Commits

Reviewing files that changed from the base of the PR and between 2028ce9 and d48fef3.

📒 Files selected for processing (77)
  • src/agentready/assessors/__init__.py
  • src/agentready/assessors/dbt.py
  • src/agentready/services/language_detector.py
  • tests/fixtures/dbt_projects/flat_structure/dbt_project.yml
  • tests/fixtures/dbt_projects/flat_structure/models/model_1.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_10.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_11.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_12.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_13.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_14.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_15.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_16.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_17.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_18.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_19.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_2.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_20.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_21.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_22.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_23.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_24.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_25.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_26.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_27.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_28.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_29.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_3.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_30.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_31.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_32.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_33.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_34.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_35.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_36.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_37.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_38.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_39.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_4.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_40.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_41.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_42.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_43.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_44.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_45.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_46.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_47.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_48.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_49.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_5.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_50.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_51.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_52.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_53.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_54.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_55.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_6.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_7.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_8.sql
  • tests/fixtures/dbt_projects/flat_structure/models/model_9.sql
  • tests/fixtures/dbt_projects/minimal_valid/dbt_project.yml
  • tests/fixtures/dbt_projects/minimal_valid/models/sample_model.sql
  • tests/fixtures/dbt_projects/missing_docs/dbt_project.yml
  • tests/fixtures/dbt_projects/missing_docs/models/model1.sql
  • tests/fixtures/dbt_projects/missing_docs/models/model2.sql
  • tests/fixtures/dbt_projects/missing_tests/dbt_project.yml
  • tests/fixtures/dbt_projects/missing_tests/models/customers.sql
  • tests/fixtures/dbt_projects/missing_tests/models/schema.yml
  • tests/fixtures/dbt_projects/non_dbt/README.md
  • tests/fixtures/dbt_projects/non_dbt/sample.sql
  • tests/fixtures/dbt_projects/well_structured/dbt_project.yml
  • tests/fixtures/dbt_projects/well_structured/macros/sample_macro.sql
  • tests/fixtures/dbt_projects/well_structured/models/marts/dim_customers.sql
  • tests/fixtures/dbt_projects/well_structured/models/marts/schema.yml
  • tests/fixtures/dbt_projects/well_structured/models/staging/schema.yml
  • tests/fixtures/dbt_projects/well_structured/models/staging/stg_customers.sql
  • tests/fixtures/dbt_projects/well_structured/tests/assert_positive_ids.sql
  • tests/unit/test_assessors_dbt.py

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting


Walkthrough

Four new dbt assessor classes are introduced to evaluate dbt project validity, model documentation coverage, data test completeness, and project structure. Updates include dbt assessor registration, SQL language detection, test fixtures spanning six dbt project structures, and comprehensive unit test coverage.

Changes

  • Core dbt assessors (src/agentready/assessors/dbt.py): New module with 4 BaseAssessor implementations: DbtProjectConfigAssessor (validates dbt_project.yml structure), DbtModelDocumentationAssessor (measures model doc coverage), DbtDataTestsAssessor (validates PK test coverage), DbtProjectStructureAssessor (evaluates directory organization). Includes shared helpers for dbt project detection, YAML file discovery, and safe parsing. Each assessor generates proportional scores and remediation guidance.
  • Assessor registration (src/agentready/assessors/__init__.py): Imports and registers four new dbt assessor classes into the assessors list. Updates docstring tier counts from 10 to 12 total assessors.
  • Language detection (src/agentready/services/language_detector.py): Adds .sql file extension mapping to "SQL" language type for language detection.
  • Well-structured dbt fixture (tests/fixtures/dbt_projects/well_structured/...): Complete dbt project with proper organization: dbt_project.yml, staging/marts models with schema.yml documentation, tests, and a sample macro. Demonstrates best practices.
  • Minimal and incomplete dbt fixtures (tests/fixtures/dbt_projects/minimal_valid/..., tests/fixtures/dbt_projects/missing_docs/..., tests/fixtures/dbt_projects/missing_tests/...): Three fixture projects testing edge cases: minimal valid config, missing documentation, and missing tests.
  • Flat structure dbt fixture (tests/fixtures/dbt_projects/flat_structure/dbt_project.yml, tests/fixtures/dbt_projects/flat_structure/models/model_*.sql): 55 SQL model files (model_1.sql through model_55.sql) plus config, designed to test flat model directory detection logic.
  • Non-dbt fixture (tests/fixtures/dbt_projects/non_dbt/...): Non-dbt project fixture with README and sample SQL to verify assessor applicability checks.
  • Unit tests (tests/unit/test_assessors_dbt.py): Comprehensive test suite with 727 lines covering all four assessors: fixtures for each dbt project structure, helper function validation, applicability checks, success/failure/not-applicable scenarios, proportional scoring verification, and remediation generation assertions.
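The flat_structure fixture (55 models directly under models/) suggests a structure check along the following lines. This is a sketch only: it assumes the expected layers are plain subdirectories named staging and marts, and that "flat" means no layer directories plus more than a hypothetical 20 top-level models. Neither the threshold nor the function name comes from the PR.

```python
from pathlib import Path

# Hypothetical values; the assessor's real layer names and limits are not in the PR text.
EXPECTED_LAYERS = ("staging", "marts")
FLAT_MODEL_LIMIT = 20


def structure_report(models_dir: Path) -> dict:
    """Summarize whether models/ follows a staging/marts layout."""
    present = [layer for layer in EXPECTED_LAYERS if (models_dir / layer).is_dir()]
    # Only .sql files sitting directly in models/ count as top-level models.
    top_level_models = sum(1 for p in models_dir.glob("*.sql") if p.is_file())
    return {
        "missing_layers": [layer for layer in EXPECTED_LAYERS if layer not in present],
        "top_level_models": top_level_models,
        # "Flat": no layer directories and many models directly under models/.
        "is_flat": not present and top_level_models > FLAT_MODEL_LIMIT,
    }
```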

Sequence Diagram

sequenceDiagram
    participant User
    participant AssessmentEngine as Assessment Engine
    participant DbtAssessor as Dbt Assessor
    participant FileSystem as File System
    participant YAMLParser as YAML Parser
    participant Evidence as Evidence Builder

    User->>AssessmentEngine: assess(repository)
    AssessmentEngine->>DbtAssessor: is_applicable(repository)
    DbtAssessor->>FileSystem: check dbt_project.yml exists
    FileSystem-->>DbtAssessor: exists/not exists
    
    alt Is applicable
        AssessmentEngine->>DbtAssessor: assess(repository)
        DbtAssessor->>FileSystem: locate models/, schema.yml files
        FileSystem-->>DbtAssessor: file paths
        
        par Parallel Analysis
            DbtAssessor->>YAMLParser: parse dbt_project.yml
            YAMLParser-->>DbtAssessor: config dict
            DbtAssessor->>FileSystem: scan model SQL files
            FileSystem-->>DbtAssessor: model list
        end
        
        DbtAssessor->>DbtAssessor: analyze structure/docs/tests
        DbtAssessor->>DbtAssessor: calculate proportional score
        DbtAssessor->>Evidence: build remediation if needed
        Evidence-->>DbtAssessor: Finding with score, evidence, remediation
        DbtAssessor-->>AssessmentEngine: Finding
    else Not applicable
        DbtAssessor-->>AssessmentEngine: not_applicable Finding
    end
    
    AssessmentEngine-->>User: Assessment result
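The is_applicable() gate in the diagram reduces to a file-existence check. A minimal sketch, assuming the project root is the repository root and models live in the default models/ directory (dbt allows other locations via model-paths in dbt_project.yml, which this sketch ignores); both function names are hypothetical.

```python
from pathlib import Path


def is_dbt_project(repo_root: Path) -> bool:
    """Treat a repository as a dbt project if dbt_project.yml sits at its root."""
    return (repo_root / "dbt_project.yml").is_file()


def find_model_sql_files(repo_root: Path) -> list:
    """All .sql files under models/, or an empty list if models/ is missing."""
    models_dir = repo_root / "models"
    if not models_dir.is_dir():
        return []
    return sorted(models_dir.rglob("*.sql"))
```

An empty result from find_model_sql_files on a dbt project corresponds to the not_applicable path the review mentions for a models/ directory with no SQL files.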

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


@kami619 deleted the feature/add-dbt-support branch March 26, 2026 03:11

@github-actions (Contributor)

🎉 This PR is included in version 2.31.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

@github-actions (Contributor)

📈 Test Coverage Report

Branch coverage
  • This PR: 67.1%
  • Main: 67.1%
  • Diff: ✅ +0%

Coverage calculated from unit tests only

@github-actions (Contributor)

AgentReady Code Review — PR #357: Feature/add dbt support

Overall assessment: Well-structured addition with good test coverage (48 tests, 6 fixtures). The assessors follow established patterns and include rich remediation. However, there are 3 blocking issues that should be resolved before merge.


Blocking Issues

1. placeholder_texts includes "description" — false negatives in documentation scoring

File: src/agentready/assessors/dbt.py:307

placeholder_texts = {"todo", "tbd", "fixme", "placeholder", "description"}

Any meaningful description containing the word "description" (e.g. "Description of the customer staging model", "Enriched description from CRM") will be treated as undocumented. This is already tracked in #353 but the bug ships as-is. Since DbtModelDocumentationAssessor is Tier 1 (10% weight), this will cause incorrect scoring on real projects.

Suggestion: remove "description" from placeholder_texts or require it to be the entire description (exact match).
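The difference between the two readings of the suggestion can be shown in a few lines. This sketch assumes the current check is a case-insensitive substring test, as the review implies; the function names are hypothetical, and "description" is deliberately left in the set to show that an exact match makes it harmless.

```python
from typing import Optional

# Placeholder set as quoted in the review.
PLACEHOLDER_TEXTS = {"todo", "tbd", "fixme", "placeholder", "description"}


def is_placeholder_substring(description: Optional[str]) -> bool:
    """Fragile: flags any description that merely contains a placeholder word."""
    text = (description or "").lower()
    return not text.strip() or any(word in text for word in PLACEHOLDER_TEXTS)


def is_placeholder_exact(description: Optional[str]) -> bool:
    """Safer: flags only empty descriptions or ones that ARE a placeholder word."""
    # Trailing punctuation is ignored so "TODO." still counts as a placeholder.
    text = (description or "").strip().lower().rstrip(".:!")
    return not text or text in PLACEHOLDER_TEXTS
```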


2. default-weights.yaml not updated — scoring math breaks for all repos

The 4 new assessors declare default_weight values totalling 26% (0.10 + 0.10 + 0.03 + 0.03). The existing default-weights.yaml sums to exactly 1.0 and does not include these new attributes.

For non-dbt repos, is_applicable() returns False and the assessors are skipped — but the weight file is the source of truth for scoring. For dbt repos, the assessors will run but since their attribute_id keys are absent from default-weights.yaml, scoring behavior depends on how the scorer handles unknown attributes.

Issue #354 tracks weight dilution, but the underlying weights file omission should be a blocker. At minimum, the four new keys need to be added to default-weights.yaml with clear commentary explaining the conditional-applicability design.
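One way to make conditional applicability explicit is to renormalize over the attributes actually assessed and to reject any assessed attribute that has no configured weight. This is a sketch, not AgentReady's scorer: the attribute IDs and the function name are hypothetical, and only the four default weights (totalling 0.26) come from the PR.

```python
# Hypothetical attribute IDs; the real keys are not shown in the PR.
DBT_DEFAULT_WEIGHTS = {
    "dbt_project_config": 0.10,
    "dbt_model_documentation": 0.10,
    "dbt_data_tests": 0.03,
    "dbt_project_structure": 0.03,
}


def weighted_score(scores: dict, weights: dict) -> float:
    """Weighted mean of per-attribute scores (0-100).

    `scores` holds only the attributes that were assessed (not_applicable
    findings are left out), and the weights are renormalized over them.
    An assessed attribute with no configured weight is an error rather
    than being silently ignored.
    """
    unknown = sorted(set(scores) - set(weights))
    if unknown:
        raise ValueError(f"no weight configured for: {', '.join(unknown)}")
    total_weight = sum(weights[attr] for attr in scores)
    if total_weight == 0:
        return 0.0
    return sum(scores[attr] * weights[attr] for attr in scores) / total_weight
```

Whether renormalizing is the right policy is the weight-dilution question tracked in #354; the point of the sketch is that a missing key fails loudly instead of depending on scorer internals.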


3. _find_yaml_files replaces all yml occurrences — fragile pattern mutation

File: src/agentready/assessors/dbt.py:44

yaml_files = list(directory.rglob(pattern.replace("yml", "yaml")))

str.replace replaces every occurrence, not just the extension: a pattern like "yml_backup.yml" becomes "yaml_backup.yaml". This is already tracked in #356 as fragile yml/yaml matching, but the PR ships the pattern as-is. Recommend pattern[:-3] + "yaml" for a suffix-only replacement (safe only when the pattern is guaranteed to end in "yml"), or two explicit rglob calls with hardcoded extensions.
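The second option (two explicit rglob calls) might look like the following. It is a sketch with a hypothetical signature, not the current _find_yaml_files; results are de-duplicated and sorted so the output is deterministic.

```python
from pathlib import Path


def find_yaml_files(directory: Path, stem_pattern: str = "*") -> list:
    """Recursively find <stem_pattern>.yml and <stem_pattern>.yaml files.

    The extensions are hardcoded, so no part of the caller's pattern is rewritten.
    """
    found = set(directory.rglob(f"{stem_pattern}.yml"))
    found.update(directory.rglob(f"{stem_pattern}.yaml"))
    return sorted(found)
```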


Non-blocking Issues

4. New dbt test syntax not supported (data_tests: vs tests:)

File: src/agentready/assessors/dbt.py:496

dbt v1.8 introduced data_tests: as the preferred key for data tests, with tests: still accepted as an alias. The DbtDataTestsAssessor only checks column.get("tests", []), so projects that declare their tests under data_tests: will score 0% on test coverage even when every model is tested. Worth tracking as a follow-up issue if not already.
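A sketch of a check that accepts both keys follows. It assumes a primary-key test means a single column carrying both unique and not_null (the PR does not spell out its exact rule) and handles the two common entry shapes: a bare test name, or a one-key mapping holding arguments or config. The function names are hypothetical.

```python
from typing import Any


def column_test_names(column: dict) -> set:
    """Test names declared on a column under data_tests: or tests:."""
    names = set()
    for key in ("data_tests", "tests"):
        for entry in column.get(key) or []:
            if isinstance(entry, str):
                names.add(entry)  # e.g. "unique"
            elif isinstance(entry, dict):
                names.update(entry.keys())  # e.g. {"not_null": {"config": {...}}}
    return names


def has_primary_key_tests(model: dict) -> bool:
    """True if some column has both unique and not_null tests."""
    columns: Any = model.get("columns") or []
    return any(
        {"unique", "not_null"} <= column_test_names(col)
        for col in columns
        if isinstance(col, dict)
    )
```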


5. tmp_path fixture parameter accepted but unused

File: tests/unit/test_assessors_dbt.py:24,41,59,77,96

These fixtures accept tmp_path as a parameter but load hardcoded fixture directories, so tmp_path is never used. Drop the parameter to avoid confusion:

# Current (misleading)
def minimal_valid_repo(tmp_path):

# Better
def minimal_valid_repo():

6. Attribute property re-instantiates on every call

File: src/agentready/assessors/dbt.py:84-93 (and similarly in all 4 assessors)

@property
def attribute(self) -> Attribute:
    return Attribute(...)  # New object on every access

This matches the pattern used in the existing (non-dbt) assessors, so it is consistent. However, since attribute is accessed several times per assessment (in Finding.create_* calls and evidence building), consider caching it with functools.cached_property. Non-blocking, since existing assessors do the same.
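A sketch of the cached_property variant, using a stand-in Attribute dataclass whose fields are hypothetical (the real Attribute model is not shown here). functools.cached_property stores the value in the instance __dict__, so it does not work on classes that define __slots__ without a __dict__ slot.

```python
from dataclasses import dataclass
from functools import cached_property


@dataclass(frozen=True)
class Attribute:
    """Stand-in for AgentReady's Attribute; these fields are hypothetical."""

    id: str
    name: str
    tier: int
    default_weight: float


class DbtDataTestsAssessorSketch:
    """Illustrative only: shows the caching pattern, not the real assessor."""

    @cached_property
    def attribute(self) -> Attribute:
        # Built on first access, then reused from the instance __dict__.
        return Attribute(
            id="dbt_data_tests",  # hypothetical ID
            name="dbt Data Tests",
            tier=2,
            default_weight=0.03,
        )
```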


Positive Notes

  • yaml.safe_load used throughout (not yaml.load) — correct security posture
  • is_applicable() gating is correctly implemented on all 4 assessors via _is_dbt_project()
  • Proportional scoring via calculate_proportional_score() follows the established pattern
  • Rich remediation with steps, tools, commands, examples, and citations — excellent for AI-assisted remediation
  • Graceful degradation on missing directories and malformed YAML
  • Finding.not_applicable() used correctly when models/ exists but has no SQL files
  • Test coverage for partial scoring scenarios (50% coverage → 62.5 score assertions)
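The 50% → 62.5 assertion is consistent with a linear score against an 80% coverage target (50 / 80 × 100 = 62.5). Assuming that formula (the 80% target is an inference, not stated in the PR), a hypothetical stand-in for calculate_proportional_score() could look like this; its real signature is not shown.

```python
def proportional_score(measured: float, threshold: float) -> float:
    """Linear score out of 100 that reaches full marks at `threshold`."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    # Clamp so over-achievers cap at 100 and negative inputs floor at 0.
    return max(0.0, min(100.0, measured / threshold * 100.0))
```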

Summary

Issue | Severity | Related
placeholder_texts includes "description" | Blocking | #353
default-weights.yaml not updated | Blocking | #354
Fragile yml → yaml pattern replacement | Blocking | #356
data_tests: dbt v1.8+ syntax not supported | Minor | none
Unused tmp_path in test fixtures | Nit | none
attribute property re-instantiates on each access | Nit | none

🤖 Generated with Claude Code
