AiiDA 2.0 plugin migration guide

Sebastiaan Huber edited this page Feb 17, 2022 · 63 revisions

This page summarizes the backwards-incompatible changes going from v1.0 to v2.0 of aiida-core and, where applicable, gives more detailed guidance on how to migrate existing plugin code or scripts.

verdi

Tab-completion

The library click, which is what verdi is built with, was upgraded. It now comes with tab-completion built-in, which means we could drop the additional dependency click-completion. The completion works the same, except that the string that should be put in the activation script to enable it is now shell-dependent. See the documentation to find out what string you should use for your shell. See this PR for more details.

verdi code setup

There is a small change in verdi code setup: the order of the prompts has changed. Scripts that drive this command in interactive mode might start to fail, because values end up being passed to the wrong prompts. In general, however, it is not advisable to use the interactive (prompting) mode in automated scripts. Use the --non-interactive flag to ensure the command doesn't prompt, and specify the values through the various parameter flags, e.g.:

verdi code setup --non-interactive -L label -D "description" .....

Entry points

The entry point system allows external packages to extend the functionality of aiida-core. This concept was formally introduced in v1.0, and since then there have been unwritten guidelines and naming conventions for entry points. In particular, entry points defined by a plugin package are encouraged to be prefixed with the name of the plugin package. For example, the entry points of aiida-quantumespresso all start with the prefix quantumespresso.. This ensures that entry points are properly namespaced and minimizes the risk that the entry points of different plugin packages overlap, in which case they cannot be uniquely resolved and become unusable.

Until now, however, aiida-core itself has not respected this guideline and provides many entry points that are not namespaced with core.. This not only blocks many names from being used by potential plugin packages, it also makes it unclear where certain entry points come from. Therefore, the decision was made to change the entry points of aiida-core in v2.0 and properly prefix them with core.. The change was implemented in PR #5073.

This change has been made largely backward compatible, by updating the various plugin factories (imported from the aiida.plugins module) with a special condition that detects the old entry point names. If detected, it emits a deprecation warning and then proceeds to actually load the new entry point. For example, the following code:

from aiida.plugins import DataFactory
Int = DataFactory('int')

will emit the following warning in v2.0:

In [1]: Int = DataFactory('int')
aiida/plugins/factories.py:40: AiidaDeprecationWarning: The entry point `int` is deprecated. Please replace it with `core.int`.

To get rid of the deprecation warning, simply update the entry point by prefixing it with core.:

from aiida.plugins import DataFactory
Int = DataFactory('core.int')
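For scripts that pass many entry point names around, the renaming can be centralized in a small helper. The sketch below is not part of aiida-core: the names to_core_entry_point and LEGACY_CORE_NAMES are hypothetical, and the set only contains the old names that appear on this page (int, local, direct, bands), so it would need to be extended for real use.

```python
# Hypothetical helper, not part of aiida-core.
# Only the legacy names mentioned on this page are listed; extend as needed.
LEGACY_CORE_NAMES = {'int', 'local', 'direct', 'bands'}


def to_core_entry_point(name):
    """Return ``name`` prefixed with ``core.`` if it is a known legacy aiida-core name."""
    if name in LEGACY_CORE_NAMES:
        return f'core.{name}'
    # Already namespaced (e.g. ``core.int`` or ``quantumespresso.pw``) or unknown: leave untouched.
    return name


print(to_core_entry_point('int'))                 # core.int
print(to_core_entry_point('core.int'))            # core.int
print(to_core_entry_point('quantumespresso.pw'))  # quantumespresso.pw
```

An explicit set is used rather than "prefix anything without a dot", so that entry points of other plugin packages are never rewritten by accident.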

Note that entry point names are also used on the command line. For example, when creating a new computer, let's say the localhost configured with the DirectScheduler, this used to be done with

verdi computer setup -L localhost -T local -S direct

which should now become

verdi computer setup -L localhost -T core.local -S core.direct

The old entry points will continue to work for v2.0, but will also cause the deprecation warning to be printed since the CLI goes through the plugin factories to load the entry points behind the scenes.

Given that entry point names are also stored in the database in certain places (for example the node_type attribute of Data nodes, and the scheduler_type of Computer instances), the data of existing databases will be automatically migrated.

Note that the new entry point names do not only apply when they are used as command arguments. If the entry point is itself a command, the full entry point name needs to be used as well. A good example is the set of subcommands of the verdi data command, which are themselves entry points. For example, what used to be:

verdi data bands list

has now become:

verdi data core.bands list

Remember that you can always use tab-completion to automatically discover the subcommands that are available.

Repository

The file repository arguably underwent the greatest change of all components of AiiDA in v2.0 and as such various backwards incompatible changes had to be introduced.

  • FileType: moved from aiida.orm.utils.repository to aiida.repository.common
  • File: moved from aiida.orm.utils.repository to aiida.repository.common
  • File: changed from namedtuple to class
  • File: can no longer be iterated over
  • File: type attribute was renamed to file_type
  • Node.put_object_from_tree: path argument was renamed to filepath
  • Node.put_object_from_file: path argument was renamed to filepath
  • Node.put_object_from_tree: key argument was renamed to path
  • Node.put_object_from_file: key argument was renamed to path
  • Node.put_object_from_filelike: key argument was renamed to path
  • Node.get_object: key argument was renamed to path
  • Node.get_object_content: key argument was renamed to path
  • Node.open: key argument was renamed to path
  • Node.list_objects: key argument was renamed to path
  • Node.list_object_names: key argument was renamed to path
  • SinglefileData.open: key argument was renamed to path
  • Node.open: can no longer be called without context manager
  • Node.open: only modes r and rb are supported; to write, use the put_object_from_ methods instead
  • Node.get_object_content: only mode r and rb are supported
  • Node.put_object_from_tree: argument contents_only was removed
  • Node.put_object_from_tree: argument force was removed
  • Node.put_object_from_file: argument force was removed
  • Node.put_object_from_filelike: argument force was removed
  • Node.delete_object: argument force was removed

Using open in a context manager

In AiiDA v1.0 it was possible to call Node.open without a context manager, for example:

handle = node.open('filename.txt')
content = handle.read()
handle.close()

In AiiDA v2.0, this will raise an exception. Instead, Node.open should be used in a context manager:

with node.open('filename.txt') as handle:
    content = handle.read()

This is good practice in any case, because the file handle will be properly closed even if the read call raises an exception. In plain Python it is possible, although ill-advised, to call open on a file on the file system without a context manager, but in AiiDA v2.0 this raises. The reason is that requiring a context manager allows the file repository to be implemented more efficiently, which makes reading files faster.
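The closing behaviour can be checked in isolation. The following minimal sketch uses a stand-in handle built on io.BytesIO (FailingHandle is a hypothetical class, not an AiiDA one) whose read always fails, and shows that the with block still closes it.

```python
import io


class FailingHandle(io.BytesIO):
    """Stand-in file handle whose ``read`` always fails (not an AiiDA class)."""

    def read(self, *args, **kwargs):
        raise OSError('simulated read failure')


handle = FailingHandle(b'content')

try:
    with handle:
        handle.read()
except OSError:
    pass

# The ``with`` block closed the handle even though ``read`` raised.
print(handle.closed)  # True
```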

Writing cross-compatible code

Despite the changes listed above, it should be possible to write code that is compatible with both AiiDA 1.x and 2.x. The most important things to consider are:

  1. Always use .open() with a context manager (as detailed above).

  2. Use key or path as positional arguments, not keyword arguments. For example, write

    with node.open('filename.txt') as in_f:
        <...>

    instead of

    with node.open(key='filename.txt') as in_f:
        <...>
  3. Use try / except clauses to handle imports that have moved. For example:

    try:
        from aiida.orm.utils.repository import FileType
    except ImportError:
        from aiida.repository.common import FileType
  4. To access the type / file_type attribute of a File, you can again use try / except clauses:

    some_file = File(<...>)
    try:
        file_type = some_file.file_type
    except AttributeError:
        file_type = some_file.type

    Or alternatively, getattr chaining (the inner call needs its own default, because it is evaluated even when file_type exists):

    some_file = File(<...>)
    file_type = getattr(some_file, 'file_type', getattr(some_file, 'type', None))

Points 3 & 4 are needed only for cross-compatibility between AiiDA versions <=1.3, and >=2.0. The 1.4 release is compatible with both the old and new syntax, but will show DeprecationWarning if the old syntax is used.

When using these workarounds (3 & 4), we recommend placing a comment into your code. For example:

# Workaround for compatibility with AiiDA version < 1.4

This will let you know to remove the workaround once your code no longer needs to be compatible with older AiiDA versions. Make sure the comment is always exactly the same, to simplify searching for it.
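The attribute fallback from point 4 can also be wrapped in a small helper, so that the workaround and its comment live in one place. This is a sketch only: get_file_type is a hypothetical name, and the SimpleNamespace objects are stand-ins for File instances with placeholder string values instead of real FileType members.

```python
from types import SimpleNamespace


def get_file_type(file_obj):
    """Return the file type of a ``File``-like object on both old and new AiiDA versions."""
    # Workaround for compatibility with AiiDA version < 1.4
    if hasattr(file_obj, 'file_type'):
        return file_obj.file_type
    return file_obj.type


# Stand-ins for ``File`` objects; the string values are placeholders for ``FileType`` members.
new_style = SimpleNamespace(file_type='FILE')
old_style = SimpleNamespace(type='DIRECTORY')

print(get_file_type(new_style))  # FILE
print(get_file_type(old_style))  # DIRECTORY
```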

QueryBuilder

For the Computer class, the attribute name was already deprecated in AiiDA v1.0 and was replaced by label. However, the attribute name remained in the database table. This meant that in the QueryBuilder one had to continue using name. In AiiDA v2.0, the database table is now updated to match the ORM. If before you did the following:

QueryBuilder().append(Computer, filters={'name': 'localhost'}, project=['name']).all()

now you have to use

QueryBuilder().append(Computer, filters={'label': 'localhost'}, project=['label']).all()
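If a script has to run against both versions, the key can be chosen once and reused. The helper below is a hypothetical sketch (computer_label_key is not an AiiDA function); in a real script the version string could be taken from aiida.__version__.

```python
def computer_label_key(aiida_version):
    """Return the ``Computer`` key to use in ``QueryBuilder`` for the given version string."""
    major = int(aiida_version.split('.')[0])
    return 'label' if major >= 2 else 'name'


key = computer_label_key('2.0.0')
filters = {key: 'localhost'}
project = [key]
print(filters, project)  # {'label': 'localhost'} ['label']
```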

REST API

The attribute name for the entity Computer was renamed to label.

Transport plugins

In PR #3787 a change to the API of transport plugins has been introduced, to support also transferring bytes (rather than only Unicode strings) in the stdout/stderr of "remote" commands (via the transport).

The required changes in your plugin (if you wrote a transport plugin) are:

  • rename the exec_command_wait function in your plugin implementation to exec_command_wait_bytes
  • ensure that there is a stdin in the parameters (the signature should be exec_command_wait_bytes(self, command, stdin=None, **kwargs)) and that the stdin parameter (also) accepts bytes as input. Ideally, if you get bytes, you shouldn't do any encoding/decoding, to ensure your plugin also works if the stdin contains binary data.
  • return bytes for stdout and stderr (most probably you are already getting bytes internally; just do not decode them to strings)

See e.g. the changes to the local transport plugin for an example of what needs to be changed.

Note that one can still call exec_command_wait, which is now defined in the parent Transport class and takes care of the decoding (it now has an optional encoding parameter that defaults to utf8, matching the previous behavior). More details can be found in the PR and in the corresponding commit message, including how to support both v1.6 and v2.0 of AiiDA (by also keeping a definition of exec_command_wait in your plugin during the transition period).
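The division of labour described above can be sketched with stand-in classes. Nothing below is the real AiiDA code: TransportBase only imitates the parent class, SubprocessTransport is a hypothetical plugin that runs commands locally, and the (retval, stdout, stderr) return order is an assumption of this sketch.

```python
import subprocess


class TransportBase:
    """Stand-in for the parent ``Transport`` class (NOT the real AiiDA class)."""

    def exec_command_wait_bytes(self, command, stdin=None, **kwargs):
        raise NotImplementedError

    def exec_command_wait(self, command, stdin=None, encoding='utf-8', **kwargs):
        # The parent class takes care of decoding, so plugins only deal with bytes.
        retval, stdout, stderr = self.exec_command_wait_bytes(command, stdin=stdin, **kwargs)
        return retval, stdout.decode(encoding), stderr.decode(encoding)


class SubprocessTransport(TransportBase):
    """Hypothetical plugin that runs the command in a local shell."""

    def exec_command_wait_bytes(self, command, stdin=None, **kwargs):
        # Accept both str and bytes; bytes are passed through without any decoding.
        if isinstance(stdin, str):
            stdin = stdin.encode('utf-8')
        proc = subprocess.run(
            command, shell=True, input=stdin,
            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        )
        # stdout and stderr are returned as raw bytes.
        return proc.returncode, proc.stdout, proc.stderr


transport = SubprocessTransport()
retval, stdout, stderr = transport.exec_command_wait_bytes('echo hello')
print(retval, type(stdout).__name__, type(stderr).__name__)
```

Callers that want text keep using exec_command_wait, while callers that handle binary output call exec_command_wait_bytes directly.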

Equality comparison of Dict nodes

Since AiiDA v1.6.0, nodes of all types compare equal when they have the same UUID (see PR #4753). However, most of the Pythonic base data types (Bool, Int, Float, Str and List) already went one step further and also compared equal to other nodes based on the node content. The only base type that was an exception was Dict. After some discussion (see #5187 for a summary), it was decided to make equality comparison consistent among the base types, so Dict nodes now compare equal when they have the same content (see PR #5251).

In case your code relies on Dict nodes only comparing equal when it is strictly the same node, you can use the uuid property of the nodes. For example, when you define two different Dict nodes based on the same dictionary:

In [1]: d1 = Dict({'a': 1})

In [2]: d2 = Dict({'a': 1})

They will now be equal according to the == operator:

In [3]: d1 == d2
Out[3]: True

However, you can still see if they are the same node using the uuid property:

In [4]: d1.uuid == d2.uuid
Out[4]: False

Schedulers

Scheduler plugins implementing the Scheduler class had to implement the _get_submit_script_header method, which was also responsible for writing the environment variable declarations if the job_environment variable was set on the job template. This functionality has now been factored out into the method _get_submit_script_environment_variables (see PR #5283). Instead of formatting the environment variables themselves, plugins are advised to simply call this method from _get_submit_script_header and include the generated string in the returned string.
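A minimal sketch of this delegation, using stand-in classes so that it runs without AiiDA. SchedulerBase imitates the base class, and the body of its _get_submit_script_environment_variables is an assumed format for illustration only; MyScheduler and the #MYSCHED directive are hypothetical.

```python
from types import SimpleNamespace


class SchedulerBase:
    """Stand-in for the ``Scheduler`` base class (NOT the real AiiDA implementation)."""

    def _get_submit_script_environment_variables(self, template):
        # Assumed formatting, for illustration only; the real method may format differently.
        lines = [f"export {key}='{value}'" for key, value in template.job_environment.items()]
        return '\n'.join(lines)


class MyScheduler(SchedulerBase):
    """Hypothetical scheduler plugin."""

    def _get_submit_script_header(self, job_tmpl):
        lines = ['#MYSCHED --queue=default']  # hypothetical scheduler directive
        if job_tmpl.job_environment:
            # Delegate to the base class instead of formatting the variables here.
            lines.append(self._get_submit_script_environment_variables(job_tmpl))
        return '\n'.join(lines)


# Stand-in for a job template with one environment variable set.
job_tmpl = SimpleNamespace(job_environment={'OMP_NUM_THREADS': '4'})
header = MyScheduler()._get_submit_script_header(job_tmpl)
print(header)
```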

Miscellaneous

  • The Transport.get_valid_transports() method has been removed, use get_entry_point_names('aiida.transports') instead, with aiida.plugins.entry_point.get_entry_point_names.
  • The Scheduler.get_valid_schedulers() method has been removed, use get_entry_point_names('aiida.schedulers') instead, with aiida.plugins.entry_point.get_entry_point_names.
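To illustrate what listing the names in an entry point group amounts to, here is a sketch that uses only the standard library. It is not how aiida-core implements get_entry_point_names, and list_entry_point_names is a hypothetical name; in plugin code, use the AiiDA function mentioned above.

```python
from importlib import metadata


def list_entry_point_names(group):
    """Return the sorted names of all installed entry points in ``group``."""
    entry_points = metadata.entry_points()
    if hasattr(entry_points, 'select'):
        # Newer Python versions: selectable entry points object.
        selected = entry_points.select(group=group)
    else:
        # Older Python versions: plain dict keyed by group name.
        selected = entry_points.get(group, [])
    return sorted({entry_point.name for entry_point in selected})


# The result depends on which packages are installed; it is empty if no package provides the group.
print(list_entry_point_names('aiida.transports'))
```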

Unit tests

This affects only plugins still using the PluginTestCase class.

Background

Since 2017 (v0.11.0), AiiDA has offered a PluginTestCase class that made it easy for plugin developers to set up a fully functioning test environment. The test class was originally designed to work with the unittest package, but testing in aiida-core (as well as in most plugins) has moved to pytest.

The PluginTestCase class could still be run through pytest (and the aiida-plugin-cutter included an example of this), but as testing through unittest is being deprecated, the PluginTestCase only adds extra code to maintain and will be removed.

Migrating to pytest

The canonical way of writing tests in pytest is through simple test functions and pytest fixtures. See the pytest documentation for details.

However, pytest also offers support for test classes with unittest-style setup methods. For a minimalist approach to removing the dependency on the PluginTestCase, see this migration diff from the aiida-plugin-cutter.
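As a rough illustration of the second style, the sketch below shows a plain test class using the setup_method and teardown_method hooks that pytest calls around each test method. It does not inherit from unittest.TestCase and does not set up an AiiDA profile; the class name and test data are hypothetical, and the actual migration diff may look different.

```python
class TestWithSetup:
    """Test class using the setup/teardown hooks that pytest calls around each test method."""

    def setup_method(self, method):
        # Hypothetical test state; a real plugin test would prepare its AiiDA data here.
        self.values = [1, 2, 3]

    def teardown_method(self, method):
        # Clean up whatever ``setup_method`` created.
        self.values = None

    def test_sum(self):
        assert sum(self.values) == 6
```

Because pytest collects classes whose names start with Test, no import of pytest is needed in the module itself.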

Testing the migration

This is a temporary section with instructions for developers to have them test the database migrations that will be released with v2.0. The instructions below hopefully make it as easy as possible to test this.

Preparing environment

  • Checkout the latest develop branch: git checkout develop && git pull
  • Install latest dependencies: pip install -U -e .[tests,pre-commit]
  • Run verdi status: this will update your configuration to the latest schema version

Setting up profile

  • Create a clone of the PostgreSQL database you want to migrate
    • Login as the postgres user: sudo su - postgres
    • Load the postgres program: psql
    • If the database is already loaded in postgres, you can clone it in psql directly: CREATE DATABASE aiida_clone WITH TEMPLATE aiida_original_db OWNER aiida; Make sure to change the names of the databases and the owner, of course.
    • If the database is on another machine and you want to test the migration on your workstation:
      • Go to the remote machine and dump the database: pg_dump -h localhost -d aiida_original_db -U aiida -W > aiida_original_db.psql
      • Copy over the aiida_original_db.psql file to your workstation
      • Create a new database in psql: CREATE DATABASE aiida_clone OWNER aiida;
      • Load the database dump into the new database: psql -h localhost -d aiida_clone -U aiida -W < aiida_original_db.psql
  • Check statistics of the database (this information should be kept for reporting):
    • Note whether it is Django or SqlAlchemy. If you don't know, run SELECT * FROM alembic_version; in psql. If it returns a value, it is SqlAlchemy; if it errors with ERROR: relation "alembic_version" does not exist, it is Django
    • Get database node count: SELECT count(*) FROM db_dbnode;
    • Get database size: SELECT pg_size_pretty(pg_database_size('aiida_clone'));
    • Get database revision:
      • For SqlAlchemy: SELECT * FROM alembic_version;
      • For Django: SELECT name FROM django_migrations WHERE app = 'db' ORDER BY id DESC LIMIT 1;
  • Create a clone of the repository (Note: this is only necessary if your database is below a certain revision. Migrations above that revision do not affect the repository. This includes the repository migration itself, which leaves the original repository intact and simply writes the new disk object store in parallel.)
    • Django: if you have revision 0027 or above, there is no need to clone the repo
    • SqlAlchemy: if your revision is in the following list, there is no need to clone the repo: ['1de112340b16', '1de112340b17', '1de112340b18', '34a831f4286d', '535039300e4a', '1feaea71bd5a', '7536a82b2cc4', '0edcdd5a30f0', 'bf591f31dd12', '118349c10896', '91b573400be5', '7b38a9e783e7', 'e734dd5e50d7', 'e797afa09270', '26d561acd560', '07fac78e6209', 'de2eaf6978b4', '1830c8430131', '1b8ed3425af9', '3d6190594e19', '5a49629f0d45', '5ddd24e52864', 'd254fdfed416', '61fc0913fae9', 'ce56d84bcc35']
  • Create a profile with the correct database and repository configured
    • Easiest is to open config.json and clone an entry and simply update the name of the database and the location of the repository
    • IMPORTANT: if the database has an old schema version (see the point above) you should have made a clone of the repository and you should make sure that the storage.config.repository_uri key points to the correct path
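The rule above for deciding whether the repository needs to be cloned can be sketched as a small function. The function and constant names are hypothetical; the SqlAlchemy revisions are copied from the list above, and for Django the sketch assumes the revision string starts with its four-digit migration number.

```python
# Revisions copied from the list above: no repository clone is needed for these.
SQLALCHEMY_SAFE_REVISIONS = {
    '1de112340b16', '1de112340b17', '1de112340b18', '34a831f4286d', '535039300e4a',
    '1feaea71bd5a', '7536a82b2cc4', '0edcdd5a30f0', 'bf591f31dd12', '118349c10896',
    '91b573400be5', '7b38a9e783e7', 'e734dd5e50d7', 'e797afa09270', '26d561acd560',
    '07fac78e6209', 'de2eaf6978b4', '1830c8430131', '1b8ed3425af9', '3d6190594e19',
    '5a49629f0d45', '5ddd24e52864', 'd254fdfed416', '61fc0913fae9', 'ce56d84bcc35',
}
# Django revisions at or above this number do not need a repository clone.
DJANGO_SAFE_FROM = 27


def needs_repository_clone(backend, revision):
    """Return True if the repository should be cloned before testing the migration."""
    backend = backend.lower()
    if backend == 'django':
        # Assumes the revision starts with its four-digit number, e.g. ``0027``.
        return int(revision[:4]) < DJANGO_SAFE_FROM
    if backend == 'sqlalchemy':
        return revision not in SQLALCHEMY_SAFE_REVISIONS
    raise ValueError(f'unknown backend: {backend}')


print(needs_repository_clone('django', '0026'))              # True
print(needs_repository_clone('sqlalchemy', '1de112340b16'))  # False
```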

Running the migration

  • Make sure the daemon is not running
  • Run time verdi -p aiida-profile storage migrate -f. IMPORTANT: do not forget the time in front. We would like to gather this information to get an idea of how long the migrations typically take.
  • Copy the log messages from the migrations printed to stdout.

Checks after the migration

  • Rerun the database size and node count statistics in psql:
    • SELECT count(*) FROM db_dbnode;
    • SELECT pg_size_pretty(pg_database_size('aiida_clone'));
  • Run verdi status and check that storage connection is green
  • Open verdi shell and do some tests: queries, opening repository files of some nodes etc.

Reporting

For each database for which you test the migration, please report the following:

  • Database backend (Django or SqlAlchemy)
  • Starting revision
  • Node count before migration
  • Node count after migration
  • Database size before migration
  • Database size after migration
  • Time taken for the actual migration
  • Messages printed to stdout by the migrations
  • Any errors you encountered or problems you noticed afterwards when manually inspecting the data
  • Output of the following command: verdi devel run-sql "SELECT pt.tablename AS TableName, t.indexname AS IndexName, pc.reltuples, pg_size_pretty(pg_relation_size(quote_ident(pt.tablename)::text)), pg_size_pretty(pg_relation_size(quote_ident(t.indexrelname)::text)), t.idx_scan FROM pg_tables AS pt LEFT OUTER JOIN pg_class AS pc ON pt.tablename=pc.relname LEFT OUTER JOIN (SELECT pc.relname AS TableName, pc2.relname AS IndexName, psai.idx_scan, psai.indexrelname FROM pg_index AS pi JOIN pg_class AS pc ON pc.oid = pi.indrelid JOIN pg_class AS pc2 ON pc2.oid = pi.indexrelid JOIN pg_stat_all_indexes AS psai ON pi.indexrelid = psai.indexrelid )AS T ON pt.tablename = T.TableName WHERE pt.schemaname='public';"