Commit c2e0d39

committed
should be good until honorable mentions
1 parent d0e7ea2 commit c2e0d39

File tree

1 file changed

+98
-107
lines changed

docs/news/posts/2025-09-05-orm.md

Lines changed: 98 additions & 107 deletions
@@ -8,46 +8,49 @@ date: 2025-09-05
88

99
# Understanding AiiDA's ORM architecture
1010

11-
AiiDA provides an Object-Relational Mapping (ORM) system that abstracts database operations while supporting multiple database backends.
12-
In this post, we'll explore how AiiDA leverages SQLAlchemy to create a flexible, multi-backend ORM to separate concerns between Python objects and database persistence.
11+
In this post, we'll take a deep dive into the implementation of AiiDA's Object-Relational Mapping (ORM) system.
12+
We'll explore how AiiDA leverages [SQLAlchemy](https://github.com/sqlalchemy/sqlalchemy) to create a flexible ORM backend that abstracts database operations and separates concerns between the user-facing Python objects and the underlying database persistence.
1313

1414
## Architecture overview: The multi-layer design
1515

1616
AiiDA's ORM follows a four-layer architecture that provides clean separation between the user interface, business logic, and data persistence:
1717

1818
```
1919
┌─────────────────────┐
20-
│ User Interface │ ← Node (Python ORM class)
20+
│ User interface │ ← Node (Python ORM class)
2121
│ (orm/nodes) │
2222
├─────────────────────┤
23-
│ Backend Interface │ ← BackendNode (Abstract base class)
23+
│ Backend interface │ ← BackendNode (Abstract base class)
2424
│ (implementation) │
2525
├─────────────────────┤
26-
│ Database Backends │ ← SqlaNode (SQLAlchemy implementation)
26+
│ Database backends │ ← SqlaNode (SQLAlchemy implementation)
2727
│ (psql_dos/sqlite) │
2828
├─────────────────────┤
29-
│ Database Layer │ ← DBNode (SQLAlchemy models)
29+
│ Database layer │ ← DBNode (SQLAlchemy models)
3030
│ (models) │
3131
└─────────────────────┘
3232
```
3333

34-
This design allows AiiDA to support multiple database backends (currently PostgreSQL and SQLite) while providing a unified Python interface for users.
35-
Importantly, users never have to interact directly with database-specific code.
36-
Instead, they work with the high-level `Node` class (and its child classes), which automatically delegates operations to the appropriate backend implementation.
34+
Each layer serves a distinct purpose:
3735

38-
The three layers serve distinct purposes:
36+
* __User interface layer__: Provides a clean, Pythonic API that hides database complexity
37+
* __Backend interface layer__: Defines contracts that all database implementations must follow
38+
* __Database implementation layer__: Handles the specifics of different database systems
39+
* __Database model layer__: Defines the actual table schemas and relationships using SQLAlchemy's declarative approach
3940

40-
* __User Interface Layer__: Provides a clean, Pythonic API that hides database complexity
41-
* __Backend Interface Layer__: Defines contracts that all database implementations must follow
42-
* __Database Implementation Layer__: Handles the specifics of different database systems
41+
Importantly, you, the user, never have to interact directly with database-specific code.
42+
Instead, you can work with the high-level `Node` class (and its derived classes), while AiiDA automatically delegates database operations to the appropriate backend implementation.
43+
This design further allows AiiDA to support multiple database backends (currently PostgreSQL and SQLite), while providing a unified Python interface for users.
4344

44-
So let's start from the bottom, shall we?
45+
Let's start from the bottom and work our way up, shall we?
4546

4647
## The database: `DbNode`
4748

4849
AiiDA uses SQLAlchemy's declarative approach to define database tables.
4950
This means the database schema is defined using Python classes rather than raw SQL.
50-
Here's how the core `DbNode` model is structured:
51+
Here's how the core
52+
[`DbNode`](https://github.com/aiidateam/aiida-core/blob/313f342f5d28eeba5967fec8196ed6fce393a77a/src/aiida/storage/psql_dos/models/node.py#L22)
53+
model is constructed:
5154

5255
```python
5356
from sqlalchemy.dialects.postgresql import JSONB, UUID
@@ -88,30 +91,34 @@ class DbNode(Base):
8891
user = relationship('DbUser', backref='dbnodes')
8992
```
9093

91-
<!-- NOTE: -->
92-
<!-- Where base is derived from SQLAlchemy's `declarative_base`. -->
94+
The `Base` class from which `DbNode` inherits is derived from SQLAlchemy's `declarative_base()`, which provides the metaclass and base functionality that allows Python classes to be automatically mapped to database tables.
9395

94-
__The key features of AiiDA's SQLAlchemy models:__
96+
```python
97+
Base = declarative_base(
98+
cls=Model,
99+
name='Model',
100+
metadata=MetaData(naming_convention=dict(naming_convention))
101+
)
102+
```
103+
This declarative approach means that table schemas, constraints, relationships, and indexes are all defined in Python code rather than separate SQL files, making the database structure self-documenting and version-controllable alongside the application logic.
95104

96-
1. **JSON(B) for Flexibility**: The use of PostgreSQL's `JSONB` type (`JSON` for SQLite) for `attributes`, `extras`, and `repository_metadata` provides schema flexibility while maintaining query performance.
97-
This is crucial for scientific computing where you can't predict what data users will want to store.
98-
Traditional relational databases would require creating new columns for every new property, but JSON columns let you store arbitrary structured data.
105+
__Key features of AiiDA's SQLAlchemy models:__
99106

100-
2. **UUID-based Identity**: Each node has both an integer primary key (`id` in the table, `pk` in the Python API) for database efficiency and a UUID for global uniqueness and portability.
107+
1. **Use of JSON(B) for flexibility**: The use of PostgreSQL's `JSONB` (and SQLite's `JSON`) type for `attributes`, `extras`, and `repository_metadata` provides schema flexibility while maintaining query performance.
108+
This is crucial for scientific computing where one can't predict what data users will want to store.
109+
Traditional relational databases would require creating new columns for every new property, but JSON columns let you store arbitrary structured data (as long as it's JSON-serializable).
110+
2. **UUID-based Identity**: Each node has both an integer primary key (`id` in the table, `pk` in the Python API[^1]) for database efficiency and a UUID for global uniqueness and portability.
101111
The integer `id` is user-friendly and fast for database joins and indexing, while the UUID allows nodes to be moved between different AiiDA installations without conflicts.
102-
103112
3. **Automatic Timestamps**: Creation and modification times are automatically managed through SQLAlchemy's `default` and `onupdate` parameters.
104-
105-
<!-- TODO: Need to understand this better myself -->
106113
4. **Strategic Indexing**: Important columns like `node_type`, `process_type`, and timestamps are indexed for query performance.
107114
Without indexes, queries like "find all calculation nodes from last month" would be extremely slow.
108115
The indexes make these common queries fast even with millions of nodes.
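
The four features above can be tried out without AiiDA or SQLAlchemy. The following is a minimal sketch that uses only Python's standard-library `sqlite3` module; the table `node_demo`, the helper `add_node`, the index name, and the `node_type` strings are all hypothetical simplifications, not AiiDA's actual schema.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

conn = sqlite3.connect(':memory:')
conn.execute(
    """
    CREATE TABLE node_demo (
        id INTEGER PRIMARY KEY,      -- compact integer key, cheap to join on
        uuid TEXT NOT NULL UNIQUE,   -- globally unique, portable identity
        node_type TEXT NOT NULL,
        ctime TEXT NOT NULL,         -- creation timestamp (ISO 8601)
        attributes TEXT NOT NULL     -- arbitrary JSON-serializable data
    )
    """
)
# Index the column that common queries filter on.
conn.execute('CREATE INDEX ix_node_demo_node_type ON node_demo (node_type)')


def add_node(node_type, attributes):
    """Insert a row, filling in the UUID and the timestamp automatically."""
    node_uuid = str(uuid.uuid4())
    cursor = conn.execute(
        'INSERT INTO node_demo (uuid, node_type, ctime, attributes) VALUES (?, ?, ?, ?)',
        (node_uuid, node_type, datetime.now(timezone.utc).isoformat(), json.dumps(attributes)),
    )
    return cursor.lastrowid, node_uuid


# Two rows with completely different attribute structures, no schema change needed.
pk, node_uuid = add_node('data.example.Int.', {'value': 1})
add_node('data.example.Dict.', {'cutoff': 30.0, 'kpoints': [4, 4, 4]})

row = conn.execute(
    'SELECT attributes FROM node_demo WHERE node_type = ?', ('data.example.Int.',)
).fetchone()
print(pk, node_uuid, json.loads(row[0]))
```

Unlike this sketch, which stores JSON as plain text, the real models use dedicated `JSONB`/`JSON` column types, as described above.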
109116

110-
<!-- TODO: Check again how SqlaNode and Node interact -->
111-
112117
## The contract: `BackendNode`
113118

114-
Above the database layer sits the abstract `BackendNode` class, which defines the interface contract that all database backend implementations must follow:
119+
Above the database layer sits the abstract
120+
[`BackendNode`](https://github.com/aiidateam/aiida-core/blob/313f342f5d28eeba5967fec8196ed6fce393a77a/src/aiida/orm/implementation/nodes.py#L27)
121+
class, which defines the interface contract that all database backend implementations must follow:
115122

116123
```python
117124
class BackendNode(BackendEntity, BackendEntityExtrasMixin, metaclass=abc.ABCMeta):
@@ -142,12 +149,12 @@ It defines essential properties like `uuid`, `node_type`, `process_type`, and me
142149

143150
The abstract class serves as a __contract__ - any backend implementation must provide all these methods and properties.
144151
This guarantees that switching between PostgreSQL and SQLite backends won't break user code, since both implementations satisfy the same interface.
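
To make the idea of a contract concrete, here is a minimal sketch built on Python's `abc` module. The classes `BackendNodeSketch`, `InMemoryNode`, and `IncompleteNode` are hypothetical and far smaller than the real `BackendNode`; they only illustrate that a backend which omits a required member cannot be instantiated.

```python
import abc


class BackendNodeSketch(abc.ABC):
    """Hypothetical, reduced contract: every backend must provide these members."""

    @property
    @abc.abstractmethod
    def uuid(self) -> str:
        """Return the node's UUID."""

    @abc.abstractmethod
    def store(self) -> 'BackendNodeSketch':
        """Persist the node and return it."""


class InMemoryNode(BackendNodeSketch):
    """A toy backend that satisfies the contract without any database."""

    def __init__(self, uuid: str):
        self._uuid = uuid
        self.is_stored = False

    @property
    def uuid(self) -> str:
        return self._uuid

    def store(self) -> 'InMemoryNode':
        self.is_stored = True
        return self


class IncompleteNode(BackendNodeSketch):
    """Forgets to implement `store`, so Python refuses to instantiate it."""

    @property
    def uuid(self) -> str:
        return 'unused'


node = InMemoryNode('1234').store()

error = None
try:
    IncompleteNode()
except TypeError as exc:
    # The message names the missing abstract method.
    error = str(exc)
print(node.is_stored, error)
```

Code written against `BackendNodeSketch` works with any class that passes this check, which is the property the abstract base class gives AiiDA's backends.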
145-
You can think of it like a universal power adapter: regardless of whether you're plugging into a European, US, or UK outlet, your device gets the same voltage and current.
146-
The ``BackendNode`` abstract base class provides that standardization for database operations.
147152

148153
## The bridge: `SqlaNode`
149154

150-
The `SqlaNode` class bridges the abstract `BackendNode` interface with the concrete SQLAlchemy models:
155+
The
156+
[`SqlaNode`](https://github.com/aiidateam/aiida-core/blob/313f342f5d28eeba5967fec8196ed6fce393a77a/src/aiida/storage/psql_dos/orm/nodes.py#L30)
157+
class bridges the abstract `BackendNode` interface with the concrete SQLAlchemy models:
151158

152159
```python
153160
class SqlaNode(entities.SqlaModelEntity[models.DbNode], ExtrasMixin, BackendNode):
@@ -173,60 +180,10 @@ class SqlaNode(entities.SqlaModelEntity[models.DbNode], ExtrasMixin, BackendNode
173180
arguments['dbcomputer'] = computer.bare_model
174181

175182
self._model = sqla_utils.ModelWrapper(self.MODEL_CLASS(**arguments), backend)
176-
```
177-
178-
This implementation class handles:
179-
180-
- **Model Wrapping**: Encapsulates SQLAlchemy model instances
181-
- **Type Safety**: Ensures proper types for related objects (users, computers)
182-
- **Backend Abstraction**: Provides a consistent interface regardless of database backend
183-
184-
__The `SqlaModelEntity`__
185-
186-
The `SqlaModelEntity` class provides the common foundation for all SQLAlchemy-based backend entities:
187-
188-
```python
189-
class SqlaModelEntity(Generic[ModelType]):
190-
"""A mixin that adds some common SQLA backend entity methods"""
191-
192-
@classmethod
193-
def from_dbmodel(cls, dbmodel, backend):
194-
"""Create an AiiDA Entity from the corresponding SQLA ORM model"""
195-
entity = cls.__new__(cls)
196-
super(SqlaModelEntity, entity).__init__(backend)
197-
entity._model = utils.ModelWrapper(dbmodel, backend)
198-
return entity
199-
200-
@property
201-
def model(self) -> utils.ModelWrapper:
202-
"""Return an ORM model that correctly updates and flushes data"""
203-
return self._model
204-
205-
@property
206-
def bare_model(self):
207-
"""Return the underlying SQLA ORM model for direct access"""
208-
return self.model._model
209-
```
210-
211-
__The `ModelWrapper`__
212-
213-
The `ModelWrapper` is another crucial component that sits between AiiDA's backend entities and raw SQLAlchemy models.
214-
It handles automatic session management, ensuring that changes are properly tracked and committed to the database.
215-
The `bare_model` property provides direct access to the SQLAlchemy model when you need to bypass AiiDA's automatic management.
216-
217-
This design provides:
218-
219-
- __Model Wrapping__: The `ModelWrapper` encapsulates SQLAlchemy model instances with automatic session management
220-
- __Type Safety__: Generic typing ensures proper model relationships
221-
- __Lazy Loading__: Entity creation from database models without immediate session binding
222-
- __Direct Access__: The `bare_model` property allows bypassing AiiDA's update/flush mechanisms when needed
223183

224-
__Storing data in the DB__
184+
...
225185

226-
Actual data storage in the SQL db is achieved through the `store` method of the `SqlaNode`:
227-
228-
```python
229-
def store(self, links=None, clean=True):
186+
def store(self, links=None, clean=True):
230187
session = self.backend.get_session()
231188

232189
if clean:
@@ -250,13 +207,23 @@ def store(self, links=None, clean=True):
250207
return self
251208
```
252209

253-
This approach ensures data consistency while supporting nested transactions for complex operations.
210+
The `SqlaNode` class wraps a raw `DbNode` SQLAlchemy model in a
211+
[`ModelWrapper`](https://github.com/aiidateam/aiida-core/blob/313f342f5d28eeba5967fec8196ed6fce393a77a/src/aiida/storage/psql_dos/orm/utils.py#L27)
212+
that handles session management automatically. This design provides several key capabilities:
213+
214+
* __Model wrapping__: Encapsulates SQLAlchemy model instances with automatic session tracking
215+
* __Type safety__: Generic typing ensures proper relationships between related objects (users, computers)
216+
* __Backend abstraction__: Provides a consistent interface regardless of database backend
217+
* __Direct access__: The `bare_model` property allows bypassing AiiDA's automatic management when needed
254218

255-
!!! info ""
219+
The key insight is that `SqlaNode` provides two levels of access:
220+
221+
* `node.model`: The wrapped model with automatic session tracking
222+
* `node.bare_model`: Direct access to the raw SQLAlchemy model
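
The difference between the two access levels can be sketched with a toy wrapper. Everything here (`RawModel`, `TrackingWrapper`, `EntitySketch`) is hypothetical: the real `ModelWrapper` deals with database sessions, whereas this stand-in merely records which attributes were written through it.

```python
class RawModel:
    """Stand-in for a raw ORM model such as `DbNode` (hypothetical)."""

    def __init__(self, label=''):
        self.label = label


class TrackingWrapper:
    """Toy wrapper that records every attribute written through it."""

    def __init__(self, model):
        # Bypass our own __setattr__ while setting up internal state.
        object.__setattr__(self, '_model', model)
        object.__setattr__(self, 'changed', [])

    def __getattr__(self, name):
        # Only called for names that are not found on the wrapper itself.
        return getattr(self._model, name)

    def __setattr__(self, name, value):
        # Forward the write to the wrapped model and remember that it happened.
        setattr(self._model, name, value)
        self.changed.append(name)


class EntitySketch:
    """Exposes both the wrapped and the bare model, mirroring the two properties above."""

    def __init__(self, model):
        self._model = TrackingWrapper(model)

    @property
    def model(self):
        return self._model

    @property
    def bare_model(self):
        return self.model._model


entity = EntitySketch(RawModel())
entity.model.label = 'tracked'         # goes through the wrapper and is recorded
entity.bare_model.label = 'untracked'  # bypasses the wrapper entirely
print(entity.model.changed, entity.model.label)
```

Both writes reach the same underlying object, but only the first one is visible to the wrapper. That is why bypassing the wrapper is best reserved for cases where the automatic management is deliberately not wanted.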
256223

257224
## The user interface: `Node`
258225

259-
At the top level, users interact with the `Node` class, which provides a Pythonic interface:
226+
At the top level, users interact with the `Node` class, which uses composition to contain an `SqlaNode` instance and provides a Pythonic interface:
260227

261228
```python
262229
class Node(Entity['BackendNode', NodeCollection], metaclass=AbstractNodeMeta):
@@ -275,22 +242,38 @@ class Node(Entity['BackendNode', NodeCollection], metaclass=AbstractNodeMeta):
275242
super().__init__(backend_entity)
276243
```
277244

278-
Here, AiiDA uses _composition_ instead of direct _inheritance_ - the `Node` class _contains_ a `BackendNode` instance rather than being one.
279-
The creation magic happens in `backend.nodes.create()` which returns an SqlaNode instance as shown above (automatically selecting the correct backend).
245+
The creation magic happens in `backend.nodes.create()` which returns an `SqlaNode` instance[^2] as shown above (automatically selecting the correct backend).
246+
However, note that from the user's perspective, just a `Node` is created; the backend selection is transparent.
247+
248+
Here's what all of this looks like in action in a `verdi shell`:
280249

281-
<!-- TODO: add note that for psql sqlanode is created, for sqlite an sqlitenode -->
282-
However, note that from the user's perspective, just a `Node` is created.
283-
The backend selection is transparent.
250+
```python
251+
In [1]: n = Int(1).store()
252+
253+
In [2]: n.backend_entity
254+
Out[2]: <aiida.storage.psql_dos.orm.nodes.SqlaNode at 0x7d8ac6d53880>
255+
256+
In [3]: n.backend_entity.model
257+
Out[3]: <aiida.storage.psql_dos.orm.utils.ModelWrapper at 0x7d8ac6c6fcd0>
258+
259+
In [4]: n.backend_entity.bare_model
260+
Out[4]: <DbNode id=103522, uuid=UUID('70cd..., node_type='data.core..., process_type=None, label='', description='', ctime=datetime.d..., mtime=datetime.d..., attributes={'value': ..., extras={'_aiida_h..., repository_metadata={}, dbcomputer_id=None, user_id=1,>
261+
```
284262

285263
__Namespacing__
286264

287265
AiiDA uses a namespace pattern on the `Node` class to organize functionality:
288266

289267
```python
290-
@cached_property
291-
def base(self) -> NodeBase:
292-
"""Return the node base namespace."""
293-
return NodeBase(self)
268+
class Node(Entity['BackendNode', NodeCollection], metaclass=AbstractNodeMeta):
269+
"""Base class for all nodes in AiiDA."""
270+
271+
...
272+
273+
@cached_property
274+
def base(self) -> NodeBase:
275+
"""Return the node base namespace."""
276+
return NodeBase(self)
294277

295278
class NodeBase:
296279
"""A namespace for node related functionality."""
@@ -311,19 +294,14 @@ class NodeBase:
311294
return NodeLinks(self._node)
312295
```
313296

314-
This namespace pattern was introduced because having all properties and methods directly on the `Node` class became overwhelming and to prevent name conflicts.
315-
With dozens of methods for different functionalities (attributes, repository, links, caching, comments), the API became cluttered.
316-
317-
The namespace approach groups related functionality together:
318-
- node.base.attributes - attribute management
319-
- node.base.repository - file repository operations
320-
- node.base.links - provenance links
297+
This namespace pattern was introduced because having all properties and methods directly on the `Node` class became overwhelming, making the API cluttered and prone to name conflicts.
321298

322-
This makes the API more discoverable and prevents method name conflicts.
323-
It also uses @cached_property to ensure these namespace objects are created only once per node instance.
299+
The namespace approach further groups related functionality together:
300+
- `node.base.attributes -> NodeAttributes` - attribute management
301+
- `node.base.repository -> NodeRepository` - file repository operations
302+
- `node.base.links -> NodeLinks` - provenance links
324303

325-
<!-- TODO: add collection pattern??? -->
326-
<!-- NOTE: how is collection pattern different from /complements QB? -->
304+
The use of `@cached_property` ensures that these namespace objects are created only once per node instance.[^3]
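
A stripped-down, runnable version of the namespace pattern might look as follows. The class names `NodeSketch`, `BaseNamespace`, and `AttributesNamespace` are hypothetical and only mimic the shape of the real classes; the point is that `functools.cached_property` builds each namespace object on first access and then reuses it.

```python
from functools import cached_property


class AttributesNamespace:
    """Groups attribute-related methods; operates on the owning node's data."""

    def __init__(self, node):
        self._node = node

    def set(self, key, value):
        self._node._attributes[key] = value

    def get(self, key):
        return self._node._attributes[key]


class BaseNamespace:
    """Entry point that exposes the individual sub-namespaces."""

    def __init__(self, node):
        self._node = node

    @cached_property
    def attributes(self):
        return AttributesNamespace(self._node)


class NodeSketch:
    """Keeps its own API small by delegating to `base`."""

    def __init__(self):
        self._attributes = {}

    @cached_property
    def base(self):
        return BaseNamespace(self)


node = NodeSketch()
node.base.attributes.set('value', 1)

# Repeated access returns the very same namespace objects.
print(node.base is node.base, node.base.attributes.get('value'))
```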
327305

328306
## Honorable mentions
329307

@@ -355,9 +333,10 @@ The `ondelete='CASCADE'` on the output relationship ensures that when a node is
355333

356334
### The QueryBuilder
357335

358-
The `QueryBuilder` is AiiDA's main API provided to retrieve data from the database.
336+
The `QueryBuilder` is AiiDA's main Python API to retrieve data from the database.
359337
It provides a uniform, backend-agnostic interface:
360338

339+
<!-- TODO: change example -->
361340
```python
362341
# Simple node query
363342
qb = QueryBuilder()
@@ -477,8 +456,20 @@ AiiDA's ORM architecture makes use of several important design principles:
477456

478457
This architecture allows AiiDA to provide a powerful, flexible ORM that adapts to different database backends while maintaining a consistent user experience.
479458

459+
460+
__Footnotes__
461+
[^1]:
462+
To avoid confusion with Python's `id()` function.
463+
[^2]:
464+
For the SQLite backend, an `SqliteNode` is actually created, which, however, inherits from `SqlaNode`.
465+
[^3]:
466+
We do understand, though, that the `node.base` approach can be non-intuitive for first-time users.
467+
468+
480469
<!-- TODO: -->
481470
<!-- Add statement that the whole infrastructure also had to be implemented for the other, specialized data types of AiiDA -->
482471
<!-- Add some admonitions? -->
483472
<!-- add gh permalinks -->
484473

474+
<!-- NOTE: how is collection pattern different from /complements QB? -->
475+
