From c975eb7b4db47ae426143c8d3f02be392ec4960a Mon Sep 17 00:00:00 2001
From: Julian Geiger
Date: Tue, 16 Sep 2025 09:24:16 +0200
Subject: [PATCH] Fix overview order; and autoformatting

---
 docs/news/posts/2025-09-05-orm.md | 80 ++++++++++++++-----------------
 1 file changed, 36 insertions(+), 44 deletions(-)

diff --git a/docs/news/posts/2025-09-05-orm.md b/docs/news/posts/2025-09-05-orm.md
index 1bf2f5b..4d65f27 100644
--- a/docs/news/posts/2025-09-05-orm.md
+++ b/docs/news/posts/2025-09-05-orm.md
@@ -20,12 +20,12 @@ AiiDA's ORM follows a four-layer architecture that provides clean separation bet
 │ User interface      │ ← Node (Python ORM class)
 │ (orm/nodes)         │
 ├─────────────────────┤
-│ Abstract interface  │ ← BackendNode (Abstract base class)
-│ (implementation)    │
-├─────────────────────┤
 │ ORM backend         │ ← SqlaNode (SQLAlchemy implementation)
 │ (psql_dos/sqlite)   │
 ├─────────────────────┤
+│ Abstract interface  │ ← BackendNode (Abstract base class)
+│ (implementation)    │
+├─────────────────────┤
 │ Database models     │ ← DBNode (SQLAlchemy models)
 │ (models)            │
 └─────────────────────┘
@@ -33,10 +33,10 @@ AiiDA's ORM follows a four-layer architecture that provides clean separation bet
 
 Each layer serves a distinct purpose:
 
-* __User interface__ (`Node`): Provides a clean, Pythonic API that hides database complexity
-* __Abstract interface__ (`BackendNode`): Defines contracts that all backend implementations must follow
-* __ORM backend__ (`SqlaNode`): Backend connection to the different database systems
-* __Database model__ (`DbNode`): Defines the actual table schemas and relationships using SQLAlchemy's declarative approach
+- **User interface** (`Node`): Provides a clean, Pythonic API that hides database complexity
+- **Abstract interface** (`BackendNode`): Defines contracts that all backend implementations must follow
+- **ORM backend** (`SqlaNode`): Backend connection to the different database systems
+- **Database model** (`DbNode`): Defines the actual table schemas and relationships using SQLAlchemy's declarative approach
 
 Importantly, you, the user, will typically not have to interact directly with database-specific code.
 Instead, you can work with the high-level `Node` class (and its child classes), while AiiDA automatically delegates database operations to the appropriate backend implementation.
@@ -105,15 +105,15 @@ This declarative approach means that table schemas, constraints, relationships,
 
 Some of the key features of AiiDA's SQLAlchemy models, that can already be seen from the `DbNode` definition above, are:
 
-1. __Use of JSON(B) for flexibility__: The use of PostgreSQL's `JSONB` (and SQLite's `JSON`) type for `attributes`, `extras`, and `repository_metadata` provides schema flexibility while maintaining query performance.
-    This is crucial for scientific computing where one can't predict what data users will want to store.
-    Traditional relational databases would require creating new columns for every new property, but JSON columns allow storing arbitrary structured data (given that it's JSON-serializable).
-2. __UUID-based identity__: Each node has both an integer primary key (`id` in the table, `pk` in the Python API[^2]) for database efficiency and a UUID for global uniqueness and portability.
-    The integer `id` is user-friendly, fast for database joins and indexing, while the UUID allows nodes to be moved between different AiiDA installations without conflicts.
-3. __Automatic timestamps__: Creation and modification times are automatically managed through SQLAlchemy's `default` and `onupdate` parameters.
-4. __Indexing__: Important columns like `node_type`, `process_type`, and timestamps are indexed for query performance.
-    Without indexes, queries like "find all calculation nodes from last month" would be extremely slow.
-    The indexes make these common queries fast even with millions of nodes.
+1. **Use of JSON(B) for flexibility**: The use of PostgreSQL's `JSONB` (and SQLite's `JSON`) type for `attributes`, `extras`, and `repository_metadata` provides schema flexibility while maintaining query performance.
+   This is crucial for scientific computing where one can't predict what data users will want to store.
+   Traditional relational databases would require creating new columns for every new property, but JSON columns allow storing arbitrary structured data (given that it's JSON-serializable).
+2. **UUID-based identity**: Each node has both an integer primary key (`id` in the table, `pk` in the Python API[^2]) for database efficiency and a UUID for global uniqueness and portability.
+   The integer `id` is user-friendly, fast for database joins and indexing, while the UUID allows nodes to be moved between different AiiDA installations without conflicts.
+3. **Automatic timestamps**: Creation and modification times are automatically managed through SQLAlchemy's `default` and `onupdate` parameters.
+4. **Indexing**: Important columns like `node_type`, `process_type`, and timestamps are indexed for query performance.
+   Without indexes, queries like "find all calculation nodes from last month" would be extremely slow.
+   The indexes make these common queries fast even with millions of nodes.
 
 ## The abstract interface: `BackendNode`
 
@@ -148,7 +148,7 @@ class BackendNode(BackendEntity, BackendEntityExtrasMixin, metaclass=abc.ABCMeta
 
 This abstract base class (ABC) ensures that regardless of which ORM backend is used, the same interface is available to higher-level code (currently, AiiDA uses only SQLAlchemy, but also Django ORM was supported in the past[^3]).
 It defines essential properties like `uuid`, `node_type`, `process_type`, and methods for managing attributes, links, and storage operations.
-The abstract class serves as a __contract__ —any backend implementation must provide all these methods and properties.
+The abstract class serves as a **contract** —any backend implementation must provide all these methods and properties.
 
 ## The ORM backend: `SqlaNode`
 
@@ -259,7 +259,7 @@ In [5]: n.backend_entity.model
 Out[5]:
 ```
 
-__Namespacing__
+**Namespacing**
 
 AiiDA further uses a namespace pattern on the `Node` class to organize functionality:
 
@@ -298,10 +298,10 @@ class NodeBase:
 
 This namespace pattern was introduced because having all properties and methods directly on the `Node` class (as was the case in the past) made the API cluttered and prone to name conflicts.
 The namespace approach further groups related functionality together, e.g.:
+
 - `node.base.attributes -> NodeAttributes` - attribute management
 - `node.base.repository -> NodeRepository` - file repository operations
-- `node.base.links -> NodeLinks` - provenance links
-and the use of `@cached_property` ensures that these namespace objects are created only once per node instance[^5].
+- `node.base.links -> NodeLinks` - provenance links and the use of `@cached_property` ensures that these namespace objects are created only once per node instance[^5].
 
 ## Honorable mentions
 
@@ -379,9 +379,10 @@ class Node(Entity['BackendNode', NodeCollection]):
 ```
 
 The integration of `pydantic` brings various additional features and advantages:
-- __Automatic validation__: Pydantic automatically validates data types and constraints based on type annotations
-- __Serialization__: ORM objects can be automatically converted to/from `JSON` for export/import
-- __API integration__: Pydantic's automatic serialization and validation capability allows for easy creation standard APIs (e.g., [aiida-restapi](https://github.com/aiidateam/aiida-restapi)) and integration with frameworks like FastAPI
+
+- **Automatic validation**: Pydantic automatically validates data types and constraints based on type annotations
+- **Serialization**: ORM objects can be automatically converted to/from `JSON` for export/import
+- **API integration**: Pydantic's automatic serialization and validation capability allows for easy creation standard APIs (e.g., [aiida-restapi](https://github.com/aiidateam/aiida-restapi)) and integration with frameworks like FastAPI
 
 ### The QueryBuilder
 
@@ -438,23 +439,21 @@ Finally, they also provide essential validation logic, such as preventing deleti
 AiiDA's ORM architecture provides the foundation for all derived data types, for both, data and processes.
 More specialized types like `orm.Int`, `orm.CalcJobNode`, and `orm.SinglefileData` all build on `Node` and the functionality it provides.
 The implementation makes use of several important design principles:
-1. __Multi-layer abstraction__: The multi-layer design achieves a clean separation and allows for a uniform API, while leveraging backend-specific optimizations.
-1. __Multi-database support__: The architecture supports both PostgreSQL and SQLite backends through SQLAlchemy's dialect system.
-1. __Use of `JSON` columns__: JSON(B) columns provide schema flexibility without sacrificing query performance.
-1. __Namespace organization__: The namespace pattern on the user-facing `Node` class keeps the API clean by structuring it in distinct, nested categories.
-1. __Modern integration__: Pydantic model integration brings modern Python data validation and serialization capabilities to the ORM.
-1. __Universal query interface__: The `QueryBuilder` allows for identical query syntax across database backends.
-1. __Collection patterns__: The collection system provides a consistent, intuitive interface for data access that complements the `QueryBuilder`.
+1. **Multi-layer abstraction**: The multi-layer design achieves a clean separation and allows for a uniform API, while leveraging backend-specific optimizations.
+1. **Multi-database support**: The architecture supports both PostgreSQL and SQLite backends through SQLAlchemy's dialect system.
+1. **Use of `JSON` columns**: JSON(B) columns provide schema flexibility without sacrificing query performance.
+1. **Namespace organization**: The namespace pattern on the user-facing `Node` class keeps the API clean by structuring it in distinct, nested categories.
+1. **Modern integration**: Pydantic model integration brings modern Python data validation and serialization capabilities to the ORM.
+1. **Universal query interface**: The `QueryBuilder` allows for identical query syntax across database backends.
+1. **Collection patterns**: The collection system provides a consistent, intuitive interface for data access that complements the `QueryBuilder`.
 
 This architecture allows AiiDA to provide a powerful and flexible ORM that supports different database backends while maintaining a consistent user experience.
 
-__Footnotes__
+**Footnotes**
 
-[^1]:
-    Note that code snippets in this blog post can be simplified or slightly modified from the actual implementation (e.g., omitting type hints) for better readability or to highlight the specific aspects discussed in the text.
+[^1]: Note that code snippets in this blog post can be simplified or slightly modified from the actual implementation (e.g., omitting type hints) for better readability or to highlight the specific aspects discussed in the text.
 
-[^2]:
-    To avoid confusion with Python's `id()` function.
+[^2]: To avoid confusion with Python's `id()` function.
 
 [^3]:
     In practice, the `BackendNode` abstract base class is currently only used as the parent class for `SqlaNode`, making this layer of abstraction appear redundant.
@@ -464,13 +463,6 @@ __Footnotes__
     It would also facilitate adding new backends in the future, however, the practical value of maintaining this abstraction versus the added complexity is debatable—it represents a classic case of YAGNI (You Aren't Gonna Need It).
     Disregarding the historical context, one could also go by without the `BackendNode` class.
 
-[^4]:
-    For the SQLite backend, an `SqliteNode` is actually created, which, however, inherits from `SqlaNode`.
-[^5]:
-    Although we do understand that access via `node.base.` rather than direct `node.` can be counter-intuitive for new users.
+[^4]: For the SQLite backend, an `SqliteNode` is actually created, which, however, inherits from `SqlaNode`.
 
-
-
-
-
+[^5]: Although we do understand that access via `node.base.` rather than direct `node.` can be counter-intuitive for new users.
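
The post text touched by this patch says that `@cached_property` ensures the `node.base.*` namespace objects are created only once per node instance. A minimal sketch of that pattern follows. The classes here are simplified stand-ins written for illustration only: they reuse the names `Node`, `NodeBase`, and `NodeAttributes` from the post, but their bodies are assumptions and not AiiDA's actual implementation.

```python
from functools import cached_property


class NodeAttributes:
    """Toy attribute namespace (hypothetical body, not AiiDA's class)."""

    def __init__(self, node):
        self._node = node
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]


class NodeBase:
    """Groups namespaces so they do not clutter the top-level node API."""

    def __init__(self, node):
        self._node = node

    @cached_property
    def attributes(self):
        # Built on first access, then stored on this NodeBase instance.
        return NodeAttributes(self._node)


class Node:
    """Toy user-facing class exposing functionality via `node.base`."""

    @cached_property
    def base(self):
        # Built on first access, then stored on this Node instance.
        return NodeBase(self)


node = Node()
node.base.attributes.set("energy", -1.5)

# Repeated access returns the very same namespace object.
same_namespace = node.base.attributes is node.base.attributes
energy = node.base.attributes.get("energy")
```

Because `cached_property` stores the result in the instance dictionary, state kept in a namespace object (here the `_data` dict) survives across accesses, while a second `Node()` gets its own separate namespaces.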