
Influences & Citations

All software dependencies and tool inspirations are MIT or BSD-2 licensed, except CodeScene CodeHealth, which is cited as prior art only. adit-code is MIT licensed and does not reimplement any proprietary metric.

Academic

  • Markus Borg et al., "Code for Machines, Not Just Humans," FORGE/ICSE 2026. Found that CodeHealth predicts AI refactoring success (30%+ higher defect risk in unhealthy code). Adit builds on this finding by measuring structural properties the paper did not test: cross-file navigation cost, co-location, and grep ambiguity. Adit does NOT reimplement CodeHealth, which is proprietary to CodeScene.

  • Wong et al., "Static and dynamic distance metrics for feature-based code analysis," Journal of Systems and Software, 2005. Defines feature dispersion metrics (distance, disparity, concentration, dedication). This is the closest academic precedent for co-location scoring, but it operates at the feature level rather than the definition-usage level.

  • Wong, Abe, De Benedictis, Halim, Peruma, "Identifier Name Similarities: An Exploratory Study," ESEM 2025. Proposes a 7-category taxonomy of similar identifier names across Java projects. This is the closest academic precedent for grep noise scoring, though it is a classification study rather than a tool.

  • Szalay et al., "Measuring Mangled Name Ambiguity in Large C/C++ Projects," CEUR-WS Vol-1938. Measures symbol name ambiguity at the linker level. This is an adjacent concept, applied to compiled symbols rather than source identifiers.

  • Thomas McCabe, "A Complexity Measure," IEEE TSE 1976. Introduced cyclomatic complexity, the foundational complexity metric. It is referenced here in the context of file size budgets.

  • Maurice Halstead, Elements of Software Science, 1977. Halstead's vocabulary metric (η = η₁ + η₂, the number of distinct operators plus the number of distinct operands) is the closest precedent for adit's node diversity metric. Adit counts distinct AST node types rather than lexical tokens, but the intuition is the same: a more varied vocabulary signals structural complexity independent of size.

  • "Structural Code Understanding with AST Entropy," 2025. Computes entropy over depth-bounded AST subtrees. Related to node diversity but uses subtree frequencies rather than node type counts.

  • Butler et al., "Exploring the Influence of Identifier Names on Code Quality," IEEE CSMR 2010. Found statistically significant associations between flawed identifier names and code defects. This is foundational work on naming quality, though it focuses on the quality of individual names rather than their uniqueness across a codebase.

  • "Rethinking Code Complexity Through the Lens of Large Language Models," Feb 2026. Proposes LM-CC, a complexity metric based on LLM entropy. Found that, after controlling for code length, classical complexity metrics show no consistent correlation with LLM performance. This supports treating file size, rather than complexity, as the primary driver.

  • "Tokenomics: Quantifying Where Tokens Are Used in Agentic Software Engineering," Jan 2026. Presents the first empirical analysis of token distribution in AI coding agents. Code review consumes 59.4% of tokens, and input tokens account for 53.9% of the total.

  • "LocAgent: Graph-Guided LLM Agents for Code Localization," ACL 2025. Dependency-graph-based localization achieved 92.7% file-level accuracy and reduced costs by ~86%. This validates that code structure affects navigation cost.

  • "LoCoBench-Agent," Salesforce, Nov 2025. Found a negative correlation between thoroughness of exploration and efficiency: agents that traverse more files spend more tokens.

  • Bishara & Hittner, "Testing the Significance of a Correlation with Nonnormal Data," 2012. Methodological reference for using Spearman rank correlation alongside Pearson on non-normal data (tool call counts are right-skewed count data).
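
The sketch below makes the node diversity idea from the Halstead entry concrete. It uses Python's standard-library ast module as a stand-in for a tree-sitter parse. That is an assumption: adit's actual parser and scoring may differ. node_diversity is a hypothetical helper name. It counts distinct AST node types, so a run of near-identical assignments scores low and a shorter loop with a conditional scores higher.

```python
import ast


def node_diversity(source: str) -> int:
    """Count the distinct AST node types in `source`.

    Hypothetical helper: Python's stdlib `ast` stands in for the
    tree-sitter parse adit may use; this is not adit's implementation.
    """
    tree = ast.parse(source)
    # ast.walk yields every node, including context nodes such as Load/Store.
    return len({type(node).__name__ for node in ast.walk(tree)})


# Four statements, but only one "shape" of code: low diversity.
repetitive = "a = 1\nb = 2\nc = 3\nd = 4\n"

# Fewer lines, but a loop, calls, a conditional and arithmetic: higher diversity.
varied = "for i in range(3):\n    if i % 2:\n        print(i)\n"

print(node_diversity(repetitive))  # 5: Module, Assign, Name, Store, Constant
print(node_diversity(varied))      # 11
```

The longer snippet scores lower than the shorter one, which is the size-independence the Halstead entry describes. Halstead's η counts lexical operators and operands. This sketch counts syntactic node kinds instead.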

Principles

  • Kent C. Dodds, "Colocation". Articulated co-location as a design principle for React. Adit quantifies it as a computable metric applicable to any language.

  • Carson Gross, "Locality of Behaviour". Formalized LoB: "The behaviour of a unit of code should be as obvious as possible by looking only at that unit of code." Adit measures how well code complies with it.

  • Kent Beck, Tidy First? (O'Reilly, 2023). Frames cohesion in economic terms: elements that change together are kept close together, and the trade-off is analyzed through discounted cash flows and optionality.

  • Robert C. Martin, Clean Code Ch. 2, "Use Searchable Names." The qualitative advice that adit's grep noise metric quantifies.

  • Jamie Wong, "The Grep Test" (2013). Described the principle that identifiers should be greppable.
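
The Grep Test can be turned into a number in several ways. The sketch below assumes one scoring rule: grep noise is the share of plain substring hits that are not whole-word matches, roughly the gap between grep and grep -w. Both the grep_noise helper and this rule are hypothetical illustrations, not adit's actual formula.

```python
import re
from typing import Mapping


def grep_noise(identifier: str, files: Mapping[str, str]) -> float:
    """Fraction of substring hits for `identifier` that are not whole-word hits.

    Hypothetical scoring rule for illustration only; not adit's formula.
    `files` maps a filename to that file's source text.
    """
    word = re.compile(rf"\b{re.escape(identifier)}\b")
    substring_hits = 0  # lines a plain `grep identifier` would return
    exact_hits = 0      # lines that also contain it as a whole word (~ `grep -w`)
    for text in files.values():
        for line in text.splitlines():
            if identifier in line:
                substring_hits += 1
                if word.search(line):
                    exact_hits += 1
    if substring_hits == 0:
        return 0.0  # nothing matched, so nothing was noisy
    return 1 - exact_hits / substring_hits


files = {
    "users.py": "def get_user(user_id):\n    return db.get(user_id)\n",
    "api.py": "from users import get_user\ntarget = get_user(42)\n",
}

print(grep_noise("get", files))       # 0.75
print(grep_noise("get_user", files))  # 0.0
```

The short, generic name get scores 0.75 because three of its four hits sit inside longer identifiers such as get_user and target. The more specific name get_user scores 0.0.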

Tools & People

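All open-source tools below are MIT or BSD-2 licensed; CodeScene CodeHealth, listed as prior art, is proprietary. No code was copied from any of them.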

Complementary tools (not replaced)

  • ruff (MIT): Python linter. Orthogonal: style vs structure.
  • import-linter + grimp (BSD-2): architectural boundary enforcement. Complementary: adit finds what to move, and import-linter enforces where it may live.
  • ast-grep (MIT): structural search/lint/rewrite. Independent.
  • Repomix (MIT): context packing. Sequential: restructure with adit, then pack.
  • pre-commit (MIT): hook framework. Integration target for adit enforce.
  • radon (MIT): cyclomatic complexity and maintainability index for Python.
  • CodeScene CodeHealth (proprietary): cited as prior art only. Adit measures different properties and does not reimplement CodeHealth's scoring methodology.

Prior Art

Surveyed 16+ tree-sitter-based tools, including ast-grep, Semgrep, Probe, difftastic, Aider RepoMap, Sourcegraph SCIP, and rust-code-analysis. None of them measures co-location, grep noise, or AI-context-aware file size budgets.

Validation

Metrics were validated against SWE-bench agent trajectories (1,840 files across 49 repos, from nebius/SWE-rebench-openhands-trajectories). File size is the dominant predictor, and structural metrics add diagnostic value. See README.md for the correlation data.
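
Bishara & Hittner's reason for reporting Spearman alongside Pearson shows up clearly on right-skewed count data. The sketch below computes Pearson's r, and Spearman's ρ as Pearson's r on ranks, with tied values given their average rank, in pure Python. The file sizes and tool-call counts are invented for illustration and are not the README data. In practice, scipy.stats.pearsonr and scipy.stats.spearmanr compute the same statistics.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r. Raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: tool-call counts are skewed by one extreme value.
file_kb = [2, 4, 6, 8, 10]
tool_calls = [2, 3, 4, 5, 100]

print(round(pearson(file_kb, tool_calls), 2))   # 0.73
print(round(spearman(file_kb, tool_calls), 2))  # 1.0
```

The relationship is perfectly monotonic, so Spearman reports 1.0. Pearson drops to about 0.73 because the single outlier dominates the squared deviations. That gap is the reason to report both on skewed counts.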