
Introduce DFA-based Uniqueness Checking#53

Open
paganma wants to merge 83 commits into main from marco/uniqueness-dataflow-analysis

Conversation


@paganma paganma commented Mar 4, 2026

This PR introduces a DFA-based uniqueness checking system that is more precise and flexible than the original approach, which recursively traversed the FIR AST. The new implementation operates on the Control Flow Graph (CFG) provided by the Kotlin compiler's kotlin.fir.resolve.dfa.cfg module.

Architecture

The implementation consists of two main components:

Analyzer

  • DataFlowAnalyzer.kt: A generic, worklist-based dataflow analysis facility for CFG nodes, providing:

    • Ability to handle arbitrary flow objects.
    • Options for forward/backward flow direction.
    • Abstract join operation at control flow merge points.
    • Abstract transfer function for propagating dataflow facts.
    • A FlowFacts result object containing flowBefore and flowAfter information for each CFG node.
  • UniquenessGraphAnalyzer.kt: Performs forward-flow analysis to infer uniqueness types at each CFG node. The analyzer:

    • Tracks path-to-path assignments.
    • Tracks paths passed as function arguments.
    • Updates uniqueness types based on consumption and borrowing.
    • Produces a uniqueness typing environment for each node (as a UniquenessTrie).
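
The two components above can be illustrated together with a minimal, hypothetical sketch: a generic worklist-based forward analysis whose flow facts are sets of moved paths, joined by union at merge points. The names and shapes below are assumptions for illustration and do not match the actual DataFlowAnalyzer.kt / UniquenessGraphAnalyzer.kt APIs.

```kotlin
// Hypothetical sketch of a worklist-based forward dataflow analysis.
// CFG nodes are Int ids; flow facts are any type F with a join and transfer.

class Cfg(val nodes: List<Int>, val successors: Map<Int, List<Int>>) {
    val predecessors: Map<Int, List<Int>> =
        nodes.associateWith { n -> nodes.filter { p -> n in successors[p].orEmpty() } }
}

// Result object: facts before and after each node.
data class FlowFacts<F>(val flowBefore: Map<Int, F>, val flowAfter: Map<Int, F>)

fun <F> analyzeForward(
    cfg: Cfg,
    bottom: F,
    join: (F, F) -> F,        // merge facts where control-flow paths meet
    transfer: (Int, F) -> F   // per-node transfer function
): FlowFacts<F> {
    val before = mutableMapOf<Int, F>()
    val after = mutableMapOf<Int, F>()
    val worklist = ArrayDeque(cfg.nodes)
    while (worklist.isNotEmpty()) {
        val node = worklist.removeFirst()
        val inFact = cfg.predecessors[node].orEmpty()
            .mapNotNull { after[it] }
            .fold(bottom, join)
        before[node] = inFact
        val outFact = transfer(node, inFact)
        if (after[node] != outFact) {       // only re-propagate on change
            after[node] = outFact
            cfg.successors[node].orEmpty().forEach { worklist.addLast(it) }
        }
    }
    return FlowFacts(before, after)
}
```

On a diamond-shaped CFG where only one branch moves a path `x`, the join at the merge node conservatively treats `x` as moved afterwards, which is exactly the behavior the forward uniqueness analysis needs.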

Checker

  • UniquenessGraphChecker.kt: Validates uniqueness invariants using the analysis results, currently checking:
    • Function calls don't leak borrowed, moved, partially-moved, or partially-shared paths.
    • Return and throw expressions don't leak invalid references (moved, or partially-moved).
    • Function argument types match parameter requirements.
    • Interdependent function arguments are handled correctly (e.g., passing the same borrowed value twice).
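
The interdependent-arguments check guards against aliasing bugs like the following runnable illustration (ordinary Kotlin, no checker involved; the function is hypothetical): a function written assuming its two parameters are distinct paths misbehaves when both arguments alias the same object.

```kotlin
// `drain` is written assuming `src` and `dst` are distinct unique paths.
fun drain(src: MutableList<Int>, dst: MutableList<Int>) {
    dst.addAll(src)   // copy everything over...
    src.clear()       // ...then invalidate the source
}
```

Calling `drain(xs, ys)` behaves as intended, but `drain(xs, xs)` empties the list entirely because `src` and `dst` alias, which is exactly the kind of situation a uniqueness checker can rule out at compile time.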

Supporting Constructs

Uniqueness Types

The implementation uses an algebraic datatype UniquenessType instead of passing around separate UniqueLevel and BorrowLevel enums:

  • Moved: Represents the top type in the lattice (a consumed/invalidated value).
  • Active(UniqueLevel, BorrowLevel): Represents active values with:
    • UniqueLevel: Either Unique or Shared.
    • BorrowLevel: Either Consumed or Borrowed.
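
A minimal sketch of this datatype and a lattice join, following the names in this description; the Active-vs-Active join shown (weakening toward Shared/Borrowed) is an assumption about the lattice, not taken from the PR.

```kotlin
enum class UniqueLevel { Unique, Shared }
enum class BorrowLevel { Consumed, Borrowed }

sealed class UniquenessType {
    // Top of the lattice: a consumed/invalidated value.
    object Moved : UniquenessType()
    data class Active(val unique: UniqueLevel, val borrow: BorrowLevel) : UniquenessType()
}

// Join at a control-flow merge point: Moved absorbs everything; for two
// Active values we assume the weaker guarantee on each component wins.
fun join(a: UniquenessType, b: UniquenessType): UniquenessType = when {
    a is UniquenessType.Moved || b is UniquenessType.Moved -> UniquenessType.Moved
    a is UniquenessType.Active && b is UniquenessType.Active ->
        UniquenessType.Active(
            if (a.unique == UniqueLevel.Unique && b.unique == UniqueLevel.Unique)
                UniqueLevel.Unique else UniqueLevel.Shared,
            if (a.borrow == BorrowLevel.Borrowed || b.borrow == BorrowLevel.Borrowed)
                BorrowLevel.Borrowed else BorrowLevel.Consumed
        )
    else -> UniquenessType.Moved
}
```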

Paths

Paths represent logical units holding uniqueness information. Concretely, a path is a list of FIR symbols that can represent:

  • Locals: a list containing a single symbol.
  • Properties: a list of multiple symbols representing a nested property access.

Path.kt also provides the means to extract paths from arbitrary expressions:

  • receiverPath: Extracting paths from atomic receiver expressions.
  • valuePaths: Extracting potentially multiple paths from value expressions.
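
The path representation can be sketched as follows; strings stand in for FIR symbols here, and the prefix helper is a hypothetical illustration of why the list representation is convenient (moving a prefix partially moves everything under it), not a function from Path.kt.

```kotlin
// Paths as lists of symbols; strings stand in for FIR symbols.
typealias Path = List<String>

// A local `x` is a single-symbol path; a nested property access `x.a.b`
// is a multi-symbol path.
val local: Path = listOf("x")
val property: Path = listOf("x", "a", "b")

// Consuming a path affects every path it prefixes: moving `x` partially
// moves `x.a.b`.
fun isPrefixOf(prefix: Path, path: Path): Boolean =
    prefix.size <= path.size && path.subList(0, prefix.size) == prefix
```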

Default Uniqueness Resolver

UniquenessResolver.kt fetches default uniqueness types for symbols (properties, functions, and function parameters) based on their annotations.
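
As a rough sketch of annotation-driven defaults, the resolution could look like the following; the annotation names and the fallback mapping here are assumptions for illustration, not the plugin's actual annotations.

```kotlin
// Hypothetical default resolution: map a symbol's annotation names to a
// (uniqueness, borrowing) default, falling back to Shared/Consumed.
fun defaultUniqueness(annotations: Set<String>): Pair<String, String> {
    val unique = if ("Unique" in annotations) "Unique" else "Shared"
    val borrow = if ("Borrowed" in annotations) "Borrowed" else "Consumed"
    return unique to borrow
}
```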

Uniqueness Typing Environment

The uniqueness typing environment is represented by UniquenessTrie.kt: A trie-based data structure to store uniqueness information about paths. Each trie level corresponds to one path component. If the accessed path doesn't have an associated uniqueness type, the trie lazily computes the default value for that path via UniquenessResolver. The trie also supports a join operation to join an instance in-place with another; we use this operation to merge multiple UniquenessTries coming from separate in-flows using dataflow analysis.
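
The trie's lazy defaults and in-place join can be sketched as follows; this is a simplified stand-in (string values instead of UniquenessType, a lambda instead of UniquenessResolver), not the actual UniquenessTrie.kt.

```kotlin
// Simplified trie keyed by path components, with lazy defaults and
// an in-place join.
class UniquenessTrie(private val default: (List<String>) -> String) {
    private val children = mutableMapOf<String, UniquenessTrie>()
    private var value: String? = null

    // Look up a path; lazily fall back to the resolver-provided default.
    fun get(path: List<String>): String =
        node(path, create = false)?.value ?: default(path)

    fun set(path: List<String>, v: String) { node(path, create = true)!!.value = v }

    // Merge another trie in-place, joining values at matching paths.
    fun joinWith(other: UniquenessTrie, join: (String, String) -> String) {
        other.value?.let { ov -> value = value?.let { join(it, ov) } ?: ov }
        for ((k, child) in other.children)
            children.getOrPut(k) { UniquenessTrie(default) }.joinWith(child, join)
    }

    private fun node(path: List<String>, create: Boolean): UniquenessTrie? {
        var cur = this
        for (seg in path) {
            val next = cur.children[seg]
                ?: if (create) UniquenessTrie(default).also { cur.children[seg] = it }
                   else return null
            cur = next
        }
        return cur
    }
}
```

The in-place join is what the dataflow analysis uses at merge points: each in-flowing environment is folded into one trie, so a path moved on any incoming edge ends up moved in the merged environment.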

Current Limitations

The implementation currently assumes a three-address code form where expressions are atomic (locals or fields). This means that the following features are not supported:

Nested Scopes and Shadowing

A variable that shadows another variable in an inner scope is treated as the same variable. Example:

var x: Any = a

if (false) {
    var x: Any = b  // Treated as the same 'x' as outer scope
}

This may lead to loss of uniqueness information in outer scopes.

Non-Atomic Expressions

Expressions containing paths must be atomic. Complex expressions like conditional returns are not yet supported:

return if (false) { a } else { b } 

Each function argument must be a single path; passing multiple possible paths via complex expressions is currently unsupported.

Testing

The implementation includes a reorganized set of test cases covering:

  • Borrowing, sharing, and consumption semantics.
  • Function argument passing.
  • Local and property assignments.
  • Return and throw expressions.

@jesyspa jesyspa left a comment


This looks like a good start; some feedback on the Kotlin, and some on the expected test results.

@paganma paganma requested a review from jesyspa March 5, 2026 09:42

ligee commented Mar 5, 2026

It would be nice to clean up/squash commits before merging. There are too many of them now.

@ligee ligee left a comment


Please squash some commits; we don't need 70 commits in the history.

@jesyspa jesyspa left a comment


This looks good, I think we should get it merged. Not sure whether fixing the semantics of the tests is best done now or after the merge, since I think the structure is going to stay the same.


paganma commented Mar 5, 2026

I will clean up the commit history once I finish with the requested changes. Should I keep some intermediate commits or is squash-merging preferable?


ligee commented Mar 6, 2026

> I will clean up the commit history once I finish with the requested changes. Should I keep some intermediate commits or is squash-merging preferable?

I would prefer some logical grouping where needed. A PR of this size will probably benefit from some division, but a reasonable amount, not 70. A single commit is fine with me too.

Development

Successfully merging this pull request may close these issues.

Implement dataflow analysis for tracking uniqueness information
