
AiiDA 1.0 plugin migration guide

Leopold Talirz edited this page Apr 7, 2019 · 45 revisions


Migrating imports

Before you start modifying imports of AiiDA objects, please run the AiiDA plugin migrator (see its instructions) to take care of a number of tedious search/replace operations.

Note: The plugin migrator will bring you only some of the way. If you discover some replacements that are missing, it's easy to add them!

Currently not covered by the plugin migrator

  • DataFactory('parameter') => DataFactory('dict')
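Until the migrator covers this replacement, a small script along the following lines can apply it. This is a hypothetical helper (migrate_data_factory is not part of aiida-core or of the plugin migrator). It only rewrites the literal call with a quoted 'parameter' string, so code that builds the entry point name dynamically still needs manual attention.

```python
import re
import sys

# Matches DataFactory('parameter') and DataFactory("parameter"), tolerating
# whitespace inside the parentheses; \1 requires the closing quote to match.
_PARAMETER_FACTORY = re.compile(r"""DataFactory\(\s*(['"])parameter\1\s*\)""")


def migrate_data_factory(source):
    """Return `source` with DataFactory('parameter') replaced by DataFactory('dict')."""
    # Keep whichever quote character the original call used
    return _PARAMETER_FACTORY.sub(
        lambda match: 'DataFactory({0}dict{0})'.format(match.group(1)), source)


def main(paths):
    """Rewrite the given Python files in place, reporting those that changed."""
    for path in paths:
        with open(path) as handle:
            original = handle.read()
        migrated = migrate_data_factory(original)
        if migrated != original:
            with open(path, 'w') as handle:
                handle.write(migrated)
            print('updated {}'.format(path))


if __name__ == '__main__':
    main(sys.argv[1:])
```

Pass the source files of your plugin as command line arguments and review the resulting diff before committing. Similar names such as DataFactory('parameters') are deliberately left untouched.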

Migrating Data subclasses

In aiida-core<=0.12.*, the Data Node class implemented some magic to automatically call certain methods if the corresponding keywords were passed in the constructor (also making sure that those specific keywords were not passed on to the constructor of the parent class).

Take Dict (formerly ParameterData) as an example: if constructed with the keyword dict, the constructor would call set_dict and remove the keyword from the kwargs before calling the constructor of the Data base class.

This magic has been dropped in favor of the standard Python approach: you now have the freedom (but also the duty) to implement the constructor as needed by your class. Take as an example the new Dict data subclass:

class Dict(Data):
    """`Data` sub class to represent a dictionary."""

    def __init__(self, **kwargs):
        """Store a dictionary as a `Node` instance.

        :param dict: the dictionary to set
        """
        dictionary = kwargs.pop('dict', None)
        super(Dict, self).__init__(**kwargs)
        if dictionary:
            self.set_dict(dictionary)

Note that we allow the user to pass a dictionary using the dict keyword. We first pop this value from the kwargs, using the pop(key, None) form so that it returns None instead of raising a KeyError if the key is not present. Then we pass the remaining kwargs to the parent constructor.

Note: If you override the constructor, don't forget to call the parent constructor.

Finally, after having called super, we can process the dictionary (if it was actually passed). Dict delegates this to its set_dict method, but how your own subclass stores the data passed to its constructor is up to you.
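The same pattern applies to your own Data subclasses. The sketch below runs without AiiDA: Data here is a stand-in that, as a simplifying assumption, rejects any keyword it does not know, which makes visible why your own keywords must be popped before calling super. The Energy class, its set_value method and the value/unit keywords are hypothetical.

```python
class Data(object):
    """Stand-in for aiida.orm.Data (hypothetical simplification, not the real class)."""

    def __init__(self, **kwargs):
        # Like a parent class that knows nothing about subclass keywords
        if kwargs:
            raise TypeError('unexpected keyword arguments: {}'.format(sorted(kwargs)))
        self.attributes = {}

    def set_attribute(self, key, value):
        self.attributes[key] = value


class Energy(Data):
    """Hypothetical Data subclass storing a value and its unit."""

    def __init__(self, **kwargs):
        # Pop our own keywords first so they are not forwarded to the parent
        value = kwargs.pop('value', None)
        unit = kwargs.pop('unit', 'eV')
        super(Energy, self).__init__(**kwargs)
        # Only after the parent constructor has run do we set our data
        if value is not None:
            self.set_value(value, unit)

    def set_value(self, value, unit='eV'):
        self.set_attribute('value', float(value))
        self.set_attribute('unit', unit)
```

Checking value is not None (rather than if dictionary:, as Dict does) means a legitimate value of 0 is still stored, while Energy() without arguments creates an empty node as before.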

Migrating JobCalculation to CalcJob

1. Subclass from aiida.engine.CalcJob instead of JobCalculation

In case the plugin migrator hasn't already taken care of this, replace:

from aiida.orm.calculation.job import JobCalculation

with:

from aiida.engine import CalcJob

You can keep the name of your subclass.

2. Replace the _init_internal_params method with class variables and the define class method

Instead of defining default variables in _init_internal_params:

def _init_internal_params(self):
    super(SomeCalculation, self)._init_internal_params()
    self._INPUT_FILE_NAME = 'aiida.inp'
    self._OUTPUT_FILE_NAME = 'aiida.out'
    self._default_parser = 'quantumespresso.pw'

include them via class variables and the metadata of the input spec:

import six

from aiida import engine


class SomeCalcJob(engine.CalcJob):

    # Default input and output files
    _DEFAULT_INPUT_FILE = 'aiida.inp'
    _DEFAULT_OUTPUT_FILE = 'aiida.out'

    @classmethod
    def define(cls, spec):
        super(SomeCalcJob, cls).define(spec)
        spec.input('metadata.options.input_filename', valid_type=six.string_types, default=cls._DEFAULT_INPUT_FILE)
        spec.input('metadata.options.output_filename', valid_type=six.string_types, default=cls._DEFAULT_OUTPUT_FILE)
        spec.input('metadata.options.parser_name', valid_type=six.string_types, default='quantumespresso.pw')
        # Override default withmpi=False
        spec.input('metadata.options.withmpi', valid_type=bool, default=True)

The parser_name key will be used by the engine to load the correct parser class after the calculation has completed. In this example, the engine will call ParserFactory('quantumespresso.pw') which will load the PwParser class of the aiida-quantumespresso package.

To access the values of the other metadata options:

  • inside a method of the CalcJob class (e.g. in prepare_for_submission), use self.inputs.metadata.options.output_filename;
  • from a stored CalcJobNode, use node.get_option('output_filename').

Note: The code above is just an example: you don't need to expose filenames via metadata.options if you don't want users to be able to modify them.

3. Replace _use_methods with the define class method

The define method works exactly as it does for WorkChains; see the WorkChain documentation for details.

Consider the following _use_methods definition:

@classproperty
def _use_methods(cls):
    return {
        'structure': {
            'valid_types': StructureData,
            'additional_parameter': None,
            'linkname': 'structure',
            'docstring': 'the input structure',
        }
    }

This translates to the following define method:

from aiida import orm

@classmethod
def define(cls, spec):
    super(SomeCalcJob, cls).define(spec)
    spec.input('structure', valid_type=orm.StructureData,
        help='the input structure')

All input ports that are defined via spec.input are required by default. Use required=False in order to make an input port optional.

For _use_methods entries that used the additional_parameter keyword, spec provides input namespaces. Consider the following _use_methods definition:

@classproperty
def _use_methods(cls):
    return {
        'pseudo': {
            'valid_types': UpfData,
            'additional_parameter': 'kind',
            'linkname': 'pseudos',
            'docstring': 'the input pseudo potentials',
        }
    }

This can be translated to the new process spec as follows:

@classmethod
def define(cls, spec):
    super(SomeCalcJob, cls).define(spec)
    spec.input_namespace('pseudos', valid_type=UpfData,
        help='the input pseudo potentials', dynamic=True)

Declaring pseudos with spec.input_namespace and dynamic=True lets the engine know that the namespace can receive inputs that are not explicitly defined, because at the time of definition we do not know how many UpfData nodes will be passed, or under which keys. Example usage when setting up the calculation:

inputs = {
     ...
     'pseudos': { 'Si': si_upf, 'C': c_upf },
     ...
}
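Because the namespace is dynamic, the engine can check the type of each value passed under pseudos, but it cannot know which keys your calculation actually needs. If every kind of the input structure must have a pseudopotential, you may want to check this yourself in prepare_for_submission. A minimal sketch of such a check follows; the function name is hypothetical, and it works on plain names and mappings so that it runs without AiiDA.

```python
def check_pseudos_cover_kinds(kind_names, pseudos):
    """Raise ValueError unless the keys of `pseudos` are exactly the kind names.

    kind_names: iterable of kind names, e.g. ['Si', 'C']
    pseudos: mapping of kind name -> pseudopotential (any object)
    """
    kinds = set(kind_names)
    keys = set(pseudos)
    missing = sorted(kinds - keys)  # kinds without a pseudopotential
    unused = sorted(keys - kinds)   # pseudopotentials matching no kind
    if missing or unused:
        raise ValueError(
            'missing pseudos for kinds {}; pseudos for unknown kinds {}'.format(missing, unused))
```

Inside prepare_for_submission this could be called as check_pseudos_cover_kinds([kind.name for kind in self.inputs.structure.kinds], self.inputs.pseudos), possibly raising an InputValidationError instead of ValueError. Whether unused pseudopotentials should be an error is a design choice for your plugin.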

Note: some inputs, such as code and the metadata namespace, are predefined by the CalcJob base class; see the CalcJob documentation for the full list of default inputs.

4. Change the name, signature and implementation of the method _prepare_for_submission

Please remove the leading underscore and adjust to the new signature:

def prepare_for_submission(self, folder):
    """Create the input files from the input nodes passed to this instance of the `CalcJob`.

    :param folder: an `aiida.common.folders.Folder` to temporarily write files on disk
    :return: `aiida.common.datastructures.CalcInfo` instance
    """

Inputs are no longer passed in as a dictionary but retrieved through self.inputs (same as with WorkChains).

Importantly, the inputs and their types have already been validated by the time prepare_for_submission is called: if the spec defines an input as required, it is guaranteed to be present in self.inputs. All boilerplate code that checks for the presence and type of inputs can therefore be removed from prepare_for_submission.

For example, if the spec defines an input structure of type StructureData that is required, instead of:

try:
    structure = inputdict.pop('structure')
except KeyError:
    raise InputValidationError('No structure was passed in the inputs')

if not isinstance(structure, StructureData):
    raise InputValidationError('the input structure should be a StructureData')

Simply do:

structure = self.inputs.structure

You still need to check whether an input is present, but only for ports that are not required and do not specify a default:

@classmethod
def define(cls, spec):
    super(SomeCalcJob, cls).define(spec)
    spec.input('optional', valid_type=Int, required=False, help='an optional input')

def prepare_for_submission(self, folder):
    if 'optional' in self.inputs:
        optional = self.inputs.optional
    else:
        optional = None

5. Changes to the local_copy_list

This is an example of adding a SinglefileData to the local_copy_list of the CalcInfo in 0.12:

import os

single_file = SinglefileData()
local_copy_list = [(single_file.get_file_abs_path(),
    os.path.join('some/relative/folder', single_file.filename))]

The get_file_abs_path method has been removed, and the structure of the local_copy_list has changed to accommodate this. You can now do:

single_file = SinglefileData()
local_copy_list = [(single_file.uuid, single_file.filename, single_file.filename)]

Each tuple in the local_copy_list should have length 3 and contain:

  1. the UUID of the node (SinglefileData or FolderData)
  2. the relative file path within the node repository (for the SinglefileData this is given by its filename attribute)
  3. the relative path where the file should be copied in the remote folder used for the execution of the CalcJob

Naturally, this procedure also works for subclasses of SinglefileData such as UpfData, CifData etc.
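If your 0.12 plugin copied files into a subfolder, that subfolder now goes into the third element. The hypothetical helpers below build one entry and check the three-element shape, which can catch leftover two-element entries from 0.12 code. They use posixpath rather than os.path because the target path refers to the remote machine, assumed here to be POSIX.

```python
import posixpath


def local_copy_entry(node_uuid, filename, target_folder=''):
    """Return a (node UUID, path inside the node repository, remote target path) tuple.

    target_folder is relative to the remote working directory of the CalcJob;
    the default copies the file to the top level of that directory.
    """
    return (node_uuid, filename, posixpath.join(target_folder, filename))


def check_local_copy_list(local_copy_list):
    """Raise ValueError for entries that do not have exactly three elements."""
    for entry in local_copy_list:
        if not isinstance(entry, (tuple, list)) or len(entry) != 3:
            raise ValueError('expected (uuid, source, target), got {!r}'.format(entry))
```

For example, local_copy_list = [local_copy_entry(single_file.uuid, single_file.filename, 'some/relative/folder')] reproduces the layout of the 0.12 snippet.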

Note: If you are creating an input file inside prepare_for_submission and don't want to go through a node, you can simply write it to the folder:

    import io

    # `input_content` is a hypothetical placeholder for the string to write
    with io.StringIO(input_content) as handle:
        folder.create_file_from_filelike(handle, filename='input.txt', mode='w')

6. Restarting from a previous calculation

There are two ways of restarting from a previous calculation:

  1. Symlink the folder of the previous calculation.
  2. Copy the folder of the previous calculation.

The advantage of approach 1 is that symlinking is fast. The disadvantage is that it won't work if the parent calculation was run on a different machine, and you shouldn't use it if your new calculation may modify data in the symlinked folder, since this would also alter the data of the parent calculation.

The old way of symlinking was:

if parent_calc_folder is not None:
    comp_uuid = parent_calc_folder.get_computer().uuid
    remote_path = parent_calc_folder.get_remote_path()
    calcinfo.remote_symlink_list.append((comp_uuid, remote_path, self._DEFAULT_PARENT_CALC_FLDR_NAME))

Replace this by:

if 'parent_calc_folder' in self.inputs:
    comp_uuid = self.inputs.parent_calc_folder.computer.uuid
    remote_path = self.inputs.parent_calc_folder.get_remote_path()
    calcinfo.remote_symlink_list.append((comp_uuid, remote_path, self._DEFAULT_PARENT_CALC_FLDR_NAME))

If you want to run the calculation on a different machine, or you are concerned that a new run could modify the old data, you should choose approach 2, bearing in mind that copying the data takes time and disk space. To implement this in your plugin you should:

...to write

Migrating the Parser

1. Change the name and signature of the method parse_from_retrieved

The method has changed name from parse_from_retrieved to parse and the signature is now parse(self, retrieved_temporary_folder, **kwargs).

The retrieved_temporary_folder is the absolute path of a temporary folder on disk that is deleted automatically once parsing is done. By default it is empty; it only contains data if you specified files to retrieve temporarily in the retrieve_temporary_list attribute of the CalcInfo instance returned by CalcJob.prepare_for_submission.

2. Accessing the raw output files retrieved by the engine

To get the FolderData node with the raw data retrieved by the engine, use the following:

from aiida.common import exceptions
from aiida.orm import SinglefileData

try:
    output_folder = self.retrieved
except exceptions.NotExistent:
    return self.exit_codes.ERROR_NO_RETRIEVED_FOLDER

with output_folder.open('output_file_name') as handle:
    self.out('output_link_label', SinglefileData(file=handle))

3. Signalling errors during parsing

As shown in the above example, the return value of parse has changed as well. In AiiDA 0.12, we used to return a boolean (signalling whether parsing was successful) plus the list of output nodes:

def parse(self, **kwargs):
    success = False
    node_list = []

    if some_problem:
        self.logger.error("No retrieved folder found")
        return success, node_list

This has been replaced by returning an aiida.engine.ExitCode - or nothing, if parsing is successful. Adding output nodes is handled by self.out (see next section).

def parse(self, **kwargs):
    if some_error:
        return self.exit_codes.ERROR_NO_RETRIEVED_FOLDER

Here, we are using an exit code defined in the spec of a CalcJob like so:

class SomeCalcJob(engine.CalcJob):

    @classmethod
    def define(cls, spec):
        super(SomeCalcJob, cls).define(spec)
        spec.exit_code(100, 'ERROR_NO_RETRIEVED_FOLDER', message='The retrieved folder data node could not be accessed.')

Note: We recommend defining exit codes in the spec. It is, however, also possible to create them directly in the parser:

def parse(self, **kwargs):
    if some_error:
        return aiida.engine.ExitCode(418)
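If a parser's 0.12 logic is too large to rewrite in one go, one option is a thin adapter that keeps the old (success, node_list) result, where node_list holds (link label, node) pairs, and translates it into the new contract: attach each node through self.out and return an exit code only on failure. The adapter is a hypothetical migration aid, not part of AiiDA, and ExitCode below is a namedtuple stand-in so that the sketch runs without AiiDA.

```python
from collections import namedtuple

# Stand-in for aiida.engine.ExitCode so the sketch runs without AiiDA
ExitCode = namedtuple('ExitCode', ['status', 'message'])


def adapt_legacy_result(success, node_list, out, failure_exit_code):
    """Translate an old-style (success, node_list) parse result.

    Every (link_label, node) pair is attached via `out`; `failure_exit_code`
    is returned if `success` is False, otherwise None (parsing succeeded).
    """
    for link_label, node in node_list:
        out(link_label, node)
    return None if success else failure_exit_code
```

In a parser, the body of parse could then be success, node_list = self._legacy_parse() followed by return adapt_legacy_result(success, node_list, self.out, self.exit_codes.ERROR_PARSING_FAILED), where _legacy_parse and ERROR_PARSING_FAILED are hypothetical names for your old parsing routine and an exit code defined in the spec. Once the old routine is ported, the adapter can be dropped.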

4. Adding output nodes to the calculation

Instead of returning a tuple of outputs after parsing is completed, use the function self.out at any point in the parse function in order to attach an output node to the CalcJobNode representing the execution of the CalcJob class.

For example, this adds a Dict node as an output with the link label results.

output_results = {'some_key': 1}
self.out('results', Dict(dict=output_results))

By default, you will need to declare your outputs in the spec of the Process:

@classmethod
def define(cls, spec):
    super(SomeCalcJob, cls).define(spec)
    spec.output('results', valid_type=Dict, required=True, help='the results of the calculation')
    spec.output('structure', valid_type=StructureData, required=False, help='optional relaxed structure')
    spec.default_output_node = 'results'

Like inputs, outputs can be required or not. If a required output is missing after parsing, the calculation is marked as failed.

You can choose to forego this check to gain flexibility by making the outputs dynamic in the spec:

    spec.outputs.dynamic = True
    spec.outputs.valid_type = Data
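When unit-testing a parser outside the engine, it can help to mirror this bookkeeping yourself. The hypothetical helper below takes the output declarations as a plain mapping of link label to required flag (an assumption for illustration, not AiiDA's internal representation) and reports required outputs that were not attached, as well as undeclared ones, which are only acceptable when the outputs namespace is dynamic.

```python
def check_outputs(declared, attached, dynamic=False):
    """Compare attached output labels against declared outputs.

    declared: mapping of link label -> True if required, False if optional
    attached: iterable of link labels the parser passed to self.out
    dynamic: True if the spec allows undeclared outputs (spec.outputs.dynamic)
    Returns (missing_required, undeclared), both sorted lists.
    """
    attached = set(attached)
    missing = sorted(label for label, required in declared.items()
                     if required and label not in attached)
    undeclared = [] if dynamic else sorted(attached - set(declared))
    return missing, undeclared
```

With the spec above, declared would be {'results': True, 'structure': False}: a parser that only attaches structure would be reported as missing results.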

5. Accessing the CalcJobNode and its inputs from the parser

In order to access inputs of the original calculation, you can use the property self.node to get the CalcJobNode. Example: self.node.inputs.structure.