Before beginning any development in Pyccel, it is important to ensure Pyccel is correctly installed from source in development mode as described here. If this step is skipped, any changes made to the source will not be used when pyccel or epyccel are run.
Pyccel's development is split into 4 main stages: the syntactic stage, the semantic stage, code generation, and compilation.
The first stage is the syntactic stage. Pyccel uses Python's ast module to read the input file(s). Python's ast does not store information in the same way as the rest of Pyccel, so this stage exists to convert Python's ast to Pyccel's ast. The related code can be found in parser/syntactic.py.
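To illustrate the kind of conversion involved, the sketch below parses a one-line assignment with Python's ast module and reduces the resulting nodes to a simpler nested structure. Only standard-library calls are used. The describe helper and the tuple format it returns are hypothetical and are not part of Pyccel; the real conversion lives in parser/syntactic.py.

```python
import ast

source = "x = a + b"
tree = ast.parse(source)

# The module body holds a single statement: an ast.Assign node
assign = tree.body[0]

def describe(node):
    """Hypothetical helper: reduce a Python ast node to a nested tuple."""
    if isinstance(node, ast.Assign):
        return ('Assign', describe(node.targets[0]), describe(node.value))
    if isinstance(node, ast.BinOp):
        # The operator is itself a node (e.g. ast.Add), so use its class name
        return (type(node.op).__name__, describe(node.left), describe(node.right))
    if isinstance(node, ast.Name):
        return node.id
    raise NotImplementedError(type(node).__name__)

summary = describe(assign)
print(summary)
```

The helper only handles the three node types that appear in this input; anything else raises NotImplementedError.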
The syntactic stage also handles parsing header comments. This is managed using textx. The files describing the textx grammar are found in the folder parser/grammar. From these files textx generates instances of the classes found in the folder parser/syntax.
The role of this stage has decreased significantly since we moved from redbaron to Python's ast module. At some point in the future it may therefore be worth asking whether this stage is still pertinent.
The semantic stage is the most important stage in Pyccel. It is here that all the information about types is calculated. This stage strives to be language-agnostic; this means, for example, that additional variables required to handle problems appearing in one specific language should not be created here.
When adding functions to this stage the aim is often to create a PyccelAstNode (see ast/basic.py) and correctly define all of its parameters. This information is sometimes readily available (e.g. the type of a PyccelAdd can be derived from the type of the variables passed to it), but sometimes the information must be collected from elsewhere (e.g. when creating a Variable from a PyccelSymbol, which is roughly equivalent to a string). In this case information is needed from a Scope instance which is stored in the scope.
In computer science, the scope is the area of a program where an item (e.g. variable, function, etc.) is recognised. For example a variable defined in a function will not be recognised outside of that function, therefore the function defines its scope.
In Pyccel, a Scope is an object defined in parser/base.py which represents this concept. It includes all the functions, imports, variables, and classes which are available at a given point in the code. It also contains pointers to nested and parent scopes. The scope in the SemanticParser (SemanticParser._scope) stores the Scope relevant to the line of code being treated. It must be updated whenever the scope changes (e.g. through the create_new_function_scope function when entering into a function body). The Scope is also used to avoid naming collisions. See Scope documentation for details.
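The idea of nested scopes with pointers to their parents can be sketched as follows. ToyScope and its methods insert_variable and find are hypothetical names chosen for illustration; this is not Pyccel's Scope class, which stores far more than variables.

```python
class ToyScope:
    """Hypothetical stand-in for a scope; not Pyccel's Scope class."""

    def __init__(self, parent=None):
        self.parent = parent      # enclosing scope, or None at the top level
        self.variables = {}       # names declared directly in this scope

    def insert_variable(self, name, description):
        self.variables[name] = description

    def find(self, name):
        """Search this scope, then each enclosing scope in turn."""
        scope = self
        while scope is not None:
            if name in scope.variables:
                return scope.variables[name]
            scope = scope.parent
        return None

module_scope = ToyScope()
module_scope.insert_variable('n', 'int')

# Entering a function body creates a new scope nested in the module scope
function_scope = ToyScope(parent=module_scope)
function_scope.insert_variable('x', 'float')

found_from_function = function_scope.find('n')   # visible via the parent
found_from_module = module_scope.find('x')       # not visible outside the function
```

The lookup walks outwards only, which is why a name declared inside the function is not found from the module scope.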
In the code generation stage the Pyccel nodes are converted into a string which contains the translation into the requested language. Each language has its own printer. The printers are found in the folder codegen/printing.
As in the semantic stage, the code generation stage also stores the current Scope in a scope variable which must be updated whenever the scope changes.
Finally the generated code is compiled. This is handled in the pipeline. The compiler commands are found in codegen/compiling/compilers.py. Different compilers have different flags and need different libraries. Once Pyccel has been run once on your machine, the flags and libraries can be found in JSON files in the compilers folder.
In the syntactic, semantic, and code generation stages a similar strategy is used for traversing the Python objects. This strategy is based on function names. The majority of functions have names of the form _prefix_ClassName (in the syntactic and semantic stages the prefix is visit; in the code generation stage it is print). These functions are never called directly; instead they are called via a high-level function _prefix (e.g. _visit for the semantic stage). This strategy avoids large if/elif blocks to handle all possible types.
Suppose we want to generate the code for an object of the class NumpyTanh. First we collect the inheritance tree of NumpyTanh. This gives us:
('NumpyTanh', 'NumpyUfuncUnary', 'NumpyUfuncBase', 'PyccelInternalFunction', 'PyccelAstNode', 'Basic')
Therefore the print functions which are acceptable for visiting this object are:
_print_NumpyTanh
_print_NumpyUfuncUnary
_print_NumpyUfuncBase
_print_PyccelInternalFunction
_print_PyccelAstNode
_print_Basic
We run through these possible functions choosing the one which is the most specialised. If none of these methods exist, then an error is raised.
In the case of NumpyTanh the function which will be selected is _print_NumpyUfuncBase when translating to C or Fortran, and _print_PyccelInternalFunction when translating to Python.
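The name-based dispatch described above can be sketched in a few lines. The classes below are empty stand-ins which only reproduce the inheritance tree of NumpyTanh, and ToyPrinter with its returned strings is hypothetical; Pyccel's printers are more involved, so treat this as an illustration of the lookup rather than the actual implementation.

```python
# Empty stand-ins reproducing the inheritance tree of NumpyTanh
class Basic: pass
class PyccelAstNode(Basic): pass
class PyccelInternalFunction(PyccelAstNode): pass
class NumpyUfuncBase(PyccelInternalFunction): pass
class NumpyUfuncUnary(NumpyUfuncBase): pass
class NumpyTanh(NumpyUfuncUnary): pass

class ToyPrinter:
    """Hypothetical printer showing dispatch on _print_ClassName."""

    def _print(self, expr):
        # __mro__ lists the class and its parents, most specialised first
        for cls in type(expr).__mro__:
            method = getattr(self, '_print_' + cls.__name__, None)
            if method is not None:
                return method(expr)
        raise NotImplementedError(
            'No _print_ method found for ' + type(expr).__name__)

    def _print_NumpyUfuncBase(self, expr):
        return 'ufunc:' + type(expr).__name__

    def _print_PyccelInternalFunction(self, expr):
        return 'internal:' + type(expr).__name__

printer = ToyPrinter()
tanh_result = printer._print(NumpyTanh())
internal_result = printer._print(PyccelInternalFunction())
```

Because the search starts from the most specialised class, NumpyTanh is handled by _print_NumpyUfuncBase even though _print_PyccelInternalFunction would also be acceptable. A printer which only defined the latter would fall through to it instead.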
The objects understood by Pyccel are each described by classes which inherit from pyccel.ast.basic.Basic. These classes are found in the ast folder. The ast is split into several files: one file for each supported extension module, and files which group concepts (e.g. literals, operators, built-in functions).
Pyccel tries to fail cleanly and raise readable errors for users. This is managed using the error handling module found in the errors folder. In order to raise an error 2 things must be done:
- An instance of the singleton class Errors must be created
- The report method of the class must be called (see docstring for details)
If the error prevents further translation (e.g. the type of an object is now unknown) then the severity should be indicated as 'fatal'.
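The two steps above follow a common singleton pattern, sketched below. ToyErrors is a hypothetical class and is not Pyccel's Errors class: the signature of report, the stored message format, and the choice to raise RuntimeError on a 'fatal' severity are all assumptions made for illustration. Consult the docstring of the real report method for its actual arguments.

```python
class ToyErrors:
    """Hypothetical singleton error collector; not Pyccel's Errors class."""
    _instance = None

    def __new__(cls):
        # Every call returns the same object, so messages are shared
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.messages = []
        return cls._instance

    def report(self, message, severity='error'):
        self.messages.append((severity, message))
        # Assumption: a fatal error stops the translation immediately
        if severity == 'fatal':
            raise RuntimeError(message)

errors = ToyErrors()
errors.report("unused variable 'tmp'", severity='warning')

# Creating another instance elsewhere gives back the same collector
same_errors = ToyErrors()

try:
    same_errors.report("type of 'x' is unknown", severity='fatal')
except RuntimeError as err:
    fatal_message = str(err)
```

Non-fatal messages accumulate so that several problems can be shown to the user at once, while a fatal one interrupts the run.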
While discussions within the associated GitHub issue are often sufficient, should you require more help do not hesitate to ask one of the other developers to add you to our Slack: pyccel.slack.com