dREL is a machine-actionable language describing data relationships and designed to be embedded in DDLm dictionaries. The language is defined both explicitly in the dREL publication [1] and implicitly by the dREL code appearing in the DDLm core CIF dictionary. Note that the code in the core CIF dictionary significantly expands the language presented in the paper, for example, by adding category methods.
The present proposals concern the built-in functions that are supported by dREL. No syntax changes or enhancements are proposed.
It is proposed that a _method.purpose
of Validation
will imply
that all DDLm attribute categories may appear as variables within the
associated dREL method, and that the values of these attributes are
those for the data name being validated. Additional predefined
variables value
and category
are bound to the particular value
and loop being validated.
DDLm requires that each method have an associated _method.purpose
.
The current DDLm attributes dictionary defines Evaluation
, Validation
and Definition
. The Validation
purpose is given as
method compares an evaluation with existing item value
This type of method appears only once in the DDLm core dictionary, to
show how the crystal system can be checked against the cell parameters.
This could equally well be performed by creating a data name whose
Evaluation
dREL method returns True
if the conditions are met.
The present proposal therefore suggests repurposing Validation methods for
more general validation by providing them with access to all attributes
of the definition of any data name that they are used to check. This allows
the methods to confirm, for example, that a value for a data name matches
the list of allowed enumerated values.
Note that currently, all categories from the dictionary in which a dREL method appears can appear as pre-defined variables in that dREL method, with values obtained from an associated data block. This proposal enhances that list with the attributes for the definition of the data name being checked.
The category
and value
pre-bound variables are required in order
to represent the generic value(s) being checked.
The following code finds enumerated values that are not allowed. It
would appear in the definition for a data name
valid.bad_enumerated_values
. The enumeration_set
variable
contains the contents of the enumerated_set
category in the
definition for a given data name, and value
is bound by the
execution environment to the particular value being checked. The
execution environment is responsible for collating values of this
data name for each data name in the data block being checked.:
# Check that a value is listed in an enumeration found = 'False' # Loop over enumerated states in the definition for this # data name loop e as enumeration_set { if (value == e.state) found = 'True'; } valid.bad_enumerated_value = found
It is proposed to add the following functions to the list of those allowed in dREL methods:
- Reference(name,attribute)
- The value of
attribute
in the dictionary definition ofname
is returned. Bothattribute
andname
are either string literals or string-valued variables. Where the result would be a loop, an appropriate dREL category object would be returned. - Instance(category)
- Returns an instance of category
category
in the data block provided to the dREL method. - PacketData(container,object)
- Returns specific data corresponding to
object
incontainer
. The functional equivalent of the syntaxcat.obj
, wherecat
is the value ofcontainer
andobj
is the value ofobject
. Ifcontainer
is a category, the row must be unambiguous from context, if necessary using the resolution rules of the proposals in Part II. - Lookup(category,keys)
- The functional equivalent of
cat[k1=val1,k2=val2,...]
wherecat
is the value ofcategory
andkeys
is the dictionary{'k1':val1,'k2':val2,...}
. - Known(name)
- evaluates to true if a value for the object referenced by
name
can be found, false otherwise. Ifname
does not resolve to acategory.object
reference, or the particular row of a multi-row category is unknown, will return false.
A dREL method for checking conformance to requirements arising out of
DDLm attributes (for example, that a value is drawn from a list of values
of a different data name) cannot have 'hard-coded' <category>.<object>
names, as the method would no longer be applicable to all data names.
The above functions are therefore required to provide access into categories
and data in a generalised way.
Reference('atom_type.symbol','enumerated_set')
- Return the contents
of the
enumerated_set
loop in the definition ofatom_type.symbol
. Loop i as Instance( Reference( name.linked_item_id,'_name.category_id'))'
- Loop over all rows of the category
containing the data name contained in variable
name.linked_item_id
. Note thatname.linked_item_id
is not contained in quotes and therefore will be assigned the value given in the definition of the data name being validated. TheReference
function returns a string naming the category of the linked data name, and theInstance
function takes that string and returns a category object that is populated with the values in the data file.
Finding values that are not child values.
# Find values that are not those of the linked data name. result = 'False' linked_object = Reference(name.linked_item_id,'_name.object_id') loop i as Instance(Reference(name.linked_item_id,'_name.category_id')) { if (PacketData(i,linked_object) == value) result = 'True' } valid.is_child_key = result
Finding and returning repeated values of a key data name as the
value of data name valid.not_unique
. Note the use of variable
category
to refer to the loop being checked.
# find key values that are not unique. not_unique = [] # Accumulate keys keylist = [] # get the object id for each key data name Loop k as category_key { keylist ++= Reference(k.name,'name.object_id') #Append } Loop c as category { new_val = [] for ko in keylist { new_val ++= PacketData(c,ko) #Append } if (new_val in keylist) { not_unique ++= new_val } else { keylist ++= new_val } valid.not_unique = not_unique
It is proposed that the construction <string1> in <string2>
be interpreted
as a boolean statement that returns true if <string1>
is a substring of
<string2>
.
in
in dREL is currently only applied to testing membership in a
List or Array. dREL as published proposes using the Substr
function to test for membership of a string in another string. This
could be more economically performed using the in
keyword without
compromising the use for Lists or Arrays. This also accords with the
use of in
in Python.
The following functions are proposed for removal from the list of provided functions:
- TopLo, TopHi (sorting low->high, high->low)
- functionality duplicated by combinations of sort() and reverse()
- Substr
- functionality replaced by Proposal 6.
[1] Spadaccini et. al, (2012) J. Chem. Inf. Model. **52**(8) pp 1917-1925