Merged
80 changes: 36 additions & 44 deletions docs/news/posts/2025-09-05-orm.md
@@ -20,23 +20,23 @@ AiiDA's ORM follows a four-layer architecture that provides clean separation bet
│ User interface │ ← Node (Python ORM class)
│ (orm/nodes) │
├─────────────────────┤
│ Abstract interface │ ← BackendNode (Abstract base class)
│ (implementation) │
├─────────────────────┤
│ ORM backend │ ← SqlaNode (SQLAlchemy implementation)
│ (psql_dos/sqlite) │
├─────────────────────┤
│ Database models │ ← DbNode (SQLAlchemy models)
│ (models) │
└─────────────────────┘
```

Each layer serves a distinct purpose:

- **User interface** (`Node`): Provides a clean, Pythonic API that hides database complexity
- **Abstract interface** (`BackendNode`): Defines contracts that all backend implementations must follow
- **ORM backend** (`SqlaNode`): Implements the abstract interface with SQLAlchemy for the supported storage backends (`psql_dos`, SQLite)
- **Database model** (`DbNode`): Defines the actual table schemas and relationships using SQLAlchemy's declarative approach

Importantly, you, the user, will typically not have to interact directly with database-specific code.
Instead, you can work with the high-level `Node` class (and its child classes), while AiiDA automatically delegates database operations to the appropriate backend implementation.
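The delegation chain described above can be illustrated with a toy, standard-library-only sketch. All `Toy*` classes are hypothetical stand-ins written for this illustration and are not AiiDA's actual classes; only the attribute names `backend_entity` and `model` mirror what the post shows.

```python
import abc
from dataclasses import dataclass, field


@dataclass
class ToyDbNode:
    """Hypothetical stand-in for the database model layer (one table row)."""

    id: int
    attributes: dict = field(default_factory=dict)


class ToyBackendNode(abc.ABC):
    """Hypothetical stand-in for the abstract interface layer."""

    @abc.abstractmethod
    def get_attribute(self, key): ...

    @abc.abstractmethod
    def set_attribute(self, key, value): ...


class ToySqlaNode(ToyBackendNode):
    """Hypothetical stand-in for the ORM backend layer: it owns the model."""

    def __init__(self, model):
        self.model = model

    def get_attribute(self, key):
        return self.model.attributes[key]

    def set_attribute(self, key, value):
        self.model.attributes[key] = value


class ToyNode:
    """Hypothetical user-facing layer: every call is delegated downwards."""

    def __init__(self, backend_entity):
        self.backend_entity = backend_entity

    def get_attribute(self, key):
        return self.backend_entity.get_attribute(key)

    def set_attribute(self, key, value):
        self.backend_entity.set_attribute(key, value)


node = ToyNode(ToySqlaNode(ToyDbNode(id=1)))
node.set_attribute("energy", -1.5)
print(node.get_attribute("energy"))
```

The user-facing object never touches the row directly; swapping `ToySqlaNode` for another implementation of `ToyBackendNode` would leave `ToyNode` unchanged.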
@@ -105,15 +105,15 @@ This declarative approach means that table schemas, constraints, relationships,

Some of the key features of AiiDA's SQLAlchemy models, which can already be seen in the `DbNode` definition above, are:

1. **Use of JSON(B) for flexibility**: The use of PostgreSQL's `JSONB` (and SQLite's `JSON`) type for `attributes`, `extras`, and `repository_metadata` provides schema flexibility while maintaining query performance.
This is crucial for scientific computing where one can't predict what data users will want to store.
Traditional relational databases would require creating new columns for every new property, but JSON columns allow storing arbitrary structured data (given that it's JSON-serializable).
2. **UUID-based identity**: Each node has both an integer primary key (`id` in the table, `pk` in the Python API[^2]) for database efficiency and a UUID for global uniqueness and portability.
The integer `id` is user-friendly and fast for database joins and indexing, while the UUID allows nodes to be moved between different AiiDA installations without conflicts.
3. **Automatic timestamps**: Creation and modification times are automatically managed through SQLAlchemy's `default` and `onupdate` parameters.
4. **Indexing**: Important columns like `node_type`, `process_type`, and timestamps are indexed for query performance.
Without indexes, queries like "find all calculation nodes from last month" would be extremely slow.
The indexes make these common queries fast even with millions of nodes.
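As a rough illustration of these four ideas, the sketch below builds a comparable table with Python's built-in `sqlite3` module rather than SQLAlchemy. The table name `toy_node`, the helper `store`, and the sample values are assumptions made for this example; the timestamp is set by hand here, whereas the real models rely on SQLAlchemy's `default` and `onupdate`.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE toy_node (
        id INTEGER PRIMARY KEY,     -- compact integer key, cheap to join on
        uuid TEXT NOT NULL UNIQUE,  -- globally unique, portable identifier
        node_type TEXT NOT NULL,
        ctime TEXT NOT NULL,        -- creation time (ISO 8601, UTC)
        attributes TEXT NOT NULL    -- arbitrary JSON-serializable payload
    )
    """
)
# Index the columns that typical filters use.
conn.execute("CREATE INDEX ix_toy_node_node_type ON toy_node (node_type)")
conn.execute("CREATE INDEX ix_toy_node_ctime ON toy_node (ctime)")


def store(node_type, attributes):
    """Insert one row and return its (pk, uuid) pair (hypothetical helper)."""
    node_uuid = str(uuid.uuid4())
    now = datetime.now(timezone.utc).isoformat()
    cursor = conn.execute(
        "INSERT INTO toy_node (uuid, node_type, ctime, attributes) VALUES (?, ?, ?, ?)",
        (node_uuid, node_type, now, json.dumps(attributes)),
    )
    return cursor.lastrowid, node_uuid


# Two rows with completely different attribute shapes, no schema change needed.
pk, node_uuid = store("data.int", {"value": 5})
store("process.calculation", {"exit_status": 0, "parser": {"name": "toy"}})

row = conn.execute(
    "SELECT attributes FROM toy_node WHERE uuid = ?", (node_uuid,)
).fetchone()
print(pk, json.loads(row[0]))
```

The integer key stays local to this database, while the UUID would remain valid if the row were copied into another one.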

## The abstract interface: `BackendNode`

@@ -148,7 +148,7 @@ class BackendNode(BackendEntity, BackendEntityExtrasMixin, metaclass=abc.ABCMeta
This abstract base class (ABC) ensures that the same interface is available to higher-level code regardless of which ORM backend is used (currently, AiiDA uses only SQLAlchemy, but the Django ORM was also supported in the past[^3]).
It defines essential properties like `uuid`, `node_type`, `process_type`, and methods for managing attributes, links, and storage operations.

The abstract class serves as a **contract**: any backend implementation must provide all these methods and properties.
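How Python enforces such a contract can be seen in a minimal sketch with the standard `abc` module. The classes below are hypothetical and far smaller than the real `BackendNode`; they only show that a subclass missing an abstract method cannot be instantiated.

```python
import abc


class ToyBackendNode(abc.ABC):
    """Hypothetical, heavily reduced contract with one property and one method."""

    @property
    @abc.abstractmethod
    def uuid(self): ...

    @abc.abstractmethod
    def store(self): ...


class IncompleteNode(ToyBackendNode):
    """Implements `uuid` but forgets `store`, so the contract is not met."""

    @property
    def uuid(self):
        return "not-a-real-uuid"


class CompleteNode(IncompleteNode):
    """Adds the missing method, which makes the class instantiable."""

    def store(self):
        return self


try:
    IncompleteNode()
    error_message = None
except TypeError as exc:
    # Python refuses to instantiate a class with unimplemented abstract methods.
    error_message = str(exc)

print(error_message)
stored = CompleteNode().store()
```

The failure happens at instantiation time, before any backend call is made, which is what makes the ABC useful as a guard for new backend implementations.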

## The ORM backend: `SqlaNode`

@@ -259,7 +259,7 @@ In [5]: n.backend_entity.model
Out[5]: <aiida.storage.psql_dos.orm.utils.ModelWrapper at 0x7d8ac6c6fcd0>
```

**Namespacing**

AiiDA further uses a namespace pattern on the `Node` class to organize functionality:

@@ -298,10 +298,10 @@ class NodeBase:
This namespace pattern was introduced because having all properties and methods directly on the `Node` class (as was the case in the past) made the API cluttered and prone to name conflicts.

The namespace approach further groups related functionality together, e.g.:

- `node.base.attributes -> NodeAttributes` - attribute management
- `node.base.repository -> NodeRepository` - file repository operations
- `node.base.links -> NodeLinks` - provenance links

The use of `@cached_property` ensures that these namespace objects are created only once per node instance[^5].
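The namespace idea and the effect of `@cached_property` can be reproduced in a small standalone sketch. The `Toy*` classes and their `set`/`get` methods are assumptions for this illustration and do not reproduce AiiDA's `NodeBase` or `NodeAttributes`.

```python
from functools import cached_property


class ToyNodeAttributes:
    """Hypothetical namespace object grouping attribute operations."""

    def __init__(self, node):
        self._node = node

    def set(self, key, value):
        self._node._attributes[key] = value

    def get(self, key):
        return self._node._attributes[key]


class ToyNodeBase:
    """Hypothetical container that exposes the individual namespaces."""

    def __init__(self, node):
        self._node = node

    @cached_property
    def attributes(self):
        # Evaluated on first access only; the result is stored on the instance.
        return ToyNodeAttributes(self._node)


class ToyNode:
    """Hypothetical user-facing class with a single `base` entry point."""

    def __init__(self):
        self._attributes = {}

    @cached_property
    def base(self):
        return ToyNodeBase(self)


node = ToyNode()
node.base.attributes.set("kind", "toy")
print(node.base.attributes.get("kind"))
print(node.base is node.base)
```

Because the namespace objects are cached per instance, repeated access such as `node.base.attributes` returns the same object each time instead of constructing a new one.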

## Honorable mentions

@@ -379,9 +379,10 @@ class Node(Entity['BackendNode', NodeCollection]):
```

The integration of `pydantic` brings various additional features and advantages:

- **Automatic validation**: Pydantic automatically validates data types and constraints based on type annotations
- **Serialization**: ORM objects can be automatically converted to/from `JSON` for export/import
- **API integration**: Pydantic's automatic serialization and validation capabilities allow for easy creation of standard APIs (e.g., [aiida-restapi](https://github.com/aiidateam/aiida-restapi)) and integration with frameworks like FastAPI

### The QueryBuilder

@@ -438,23 +439,21 @@ Finally, they also provide essential validation logic, such as preventing deleti
AiiDA's ORM architecture provides the foundation for all derived data types, for both data and processes. More specialized types like `orm.Int`, `orm.CalcJobNode`, and `orm.SinglefileData` all build on `Node` and the functionality it provides.
The implementation makes use of several important design principles:

1. **Multi-layer abstraction**: The multi-layer design achieves a clean separation and allows for a uniform API, while leveraging backend-specific optimizations.
1. **Multi-database support**: The architecture supports both PostgreSQL and SQLite backends through SQLAlchemy's dialect system.
1. **Use of `JSON` columns**: JSON(B) columns provide schema flexibility without sacrificing query performance.
1. **Namespace organization**: The namespace pattern on the user-facing `Node` class keeps the API clean by structuring it in distinct, nested categories.
1. **Modern integration**: Pydantic model integration brings modern Python data validation and serialization capabilities to the ORM.
1. **Universal query interface**: The `QueryBuilder` allows for identical query syntax across database backends.
1. **Collection patterns**: The collection system provides a consistent, intuitive interface for data access that complements the `QueryBuilder`.

This architecture allows AiiDA to provide a powerful and flexible ORM that supports different database backends while maintaining a consistent user experience.

**Footnotes**

[^1]: Note that code snippets in this blog post may be simplified or slightly modified from the actual implementation (e.g., omitting type hints) for better readability or to highlight the specific aspects discussed in the text.

[^2]: To avoid confusion with Python's `id()` function.

[^3]:
In practice, the `BackendNode` abstract base class is currently only used as the parent class for `SqlaNode`, making this layer of abstraction appear redundant.
@@ -464,13 +463,6 @@ __Footnotes__
It would also facilitate adding new backends in the future. However, the practical value of maintaining this abstraction versus the added complexity is debatable: it represents a classic case of YAGNI (You Aren't Gonna Need It).
Disregarding the historical context, one could also do without the `BackendNode` class.

[^4]: For the SQLite backend, an `SqliteNode` is actually created, which, however, inherits from `SqlaNode`.

[^5]: Although we do understand that access via `node.base.` rather than direct `node.` can be counter-intuitive for new users.