Docs: Analysis (#1444)
* Docs: Analysis

Add a data analysis & visualization section.
This is meant to show entry points and workflows to work with
openPMD data in larger frameworks and compatible ecosystems.

* [Draft] DASK, Pandas, ...

* Doc: DASK

* Pandas

* RAPIDS

* Typos
ax3l authored Jun 25, 2023
1 parent 9ab2ecd commit 32aa2cb
Showing 7 changed files with 422 additions and 0 deletions.
docs/source/analysis/contrib.rst (new file, +34 lines)
.. _analysis-contrib:

Contributed
===========

This page contains contributed projects and third party integrations to analyze openPMD data.
See the `openPMD-projects <https://github.com/openPMD/openPMD-projects#data-processing-and-visualization>`__ catalog for more community integrations.


.. _analysis-contrib-visualpic:

3D Visualization: VisualPIC
---------------------------

openPMD data can be visualized with the domain-specific VisualPIC renderer.
Please see `the WarpX page for details <https://warpx.readthedocs.io/en/latest/dataanalysis/visualpic.html>`__.


.. _analysis-contrib-visit:

3D Visualization: VisIt
-----------------------

openPMD **HDF5** data can be visualized with VisIt 3.1.0+.
VisIt supports openPMD HDF5 files; for the files to be detected automatically, rename them from ``.h5`` to ``.opmd``.
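The renaming step above can be scripted; a minimal sketch, assuming HDF5 outputs matching a ``data_*.h5`` pattern in the current directory (the file name pattern is an assumption, adjust it to your series):

```shell
# rename all openPMD HDF5 outputs in the current directory from .h5 to
# .opmd so that VisIt auto-detects them ("data_*.h5" is an example pattern)
for f in data_*.h5; do
    [ -e "$f" ] || continue   # skip when no files match the glob
    mv -- "$f" "${f%.h5}.opmd"
done
```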


.. _analysis-contrib-yt:

yt-project
----------

openPMD **HDF5** data can be visualized with `yt-project <https://yt-project.org>`__.
Please see the `yt documentation <https://yt-project.org/doc/examining/loading_data.html?highlight=openpmd#openpmd-data>`__ for details.
docs/source/analysis/dask.rst (new file, +50 lines)
.. _analysis-dask:

DASK
====

The Python bindings of openPMD-api provide direct methods to load data into the parallel `DASK data analysis ecosystem <https://www.dask.org>`__.


How to Install
--------------

Among many package managers, `PyPI <https://pypi.org/project/dask/>`__ ships the latest packages of DASK:

.. code-block:: bash

   python3 -m pip install -U dask
   python3 -m pip install -U pyarrow


How to Use
----------

The central Python API calls to convert to DASK datatypes are the ``ParticleSpecies.to_dask`` and ``Record_Component.to_dask_array`` methods.

.. code-block:: python

   import openpmd_api as io
   import dask

   s = io.Series("samples/git-sample/data%T.h5", io.Access.read_only)
   electrons = s.iterations[400].particles["electrons"]

   # the default scheduler is local/threaded; we can also use local
   # "processes" or, for multi-node runs, "distributed", among others
   dask.config.set(scheduler='processes')

   df = electrons.to_dask()
   type(df)  # ...

   E = s.iterations[400].meshes["E"]
   E_x = E["x"]
   darr_x = E_x.to_dask_array()
   type(darr_x)  # ...

   # note: no series.flush() needed


Example
-------

A detailed example script for particle and field analysis is provided as ``11_particle_dataframe.py`` in our :ref:`examples <usage-examples>`.

See a video of openPMD on DASK in action in `pull request #963 <https://github.com/openPMD/openPMD-api/pull/963#issuecomment-873350174>`__ (part of openPMD-api v0.14.0 and later).
docs/source/analysis/pandas.rst (new file, +101 lines)
.. _analysis-pandas:

Pandas
======

The Python bindings of openPMD-api provide direct methods to load data into the `Pandas data analysis ecosystem <https://pandas.pydata.org>`__.

Pandas computes on the CPU; for GPU-accelerated data analysis, see :ref:`RAPIDS <analysis-rapids>`.


.. _analysis-pandas-install:

How to Install
--------------

Among many package managers, `PyPI <https://pypi.org/project/pandas/>`__ ships the latest packages of pandas:

.. code-block:: bash

   python3 -m pip install -U pandas


.. _analysis-pandas-df:

Dataframes
----------

The central Python API call to convert openPMD particles to a Pandas dataframe is the ``ParticleSpecies.to_df`` method.

.. code-block:: python

   import openpmd_api as io

   s = io.Series("samples/git-sample/data%T.h5", io.Access.read_only)
   electrons = s.iterations[400].particles["electrons"]

   df = electrons.to_df()
   type(df)  # pd.DataFrame
   print(df)

   # note: no series.flush() needed

One can also combine all iterations in a single dataframe like this:

.. code-block:: python

   import pandas as pd

   df = pd.concat(
       (
           s.iterations[i].particles["electrons"].to_df().assign(iteration=i)
           for i in s.iterations
       ),
       axis=0,
       ignore_index=True,
   )

   # like before, but with a new column "iteration" and all particles
   print(df)
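With the extra ``iteration`` column in place, per-iteration statistics are one ``groupby`` away. A minimal sketch with synthetic stand-in data (the column name ``momentum_z`` mirrors the particle records above; the values are made up):

```python
# Sketch: per-iteration statistics on a combined dataframe with an
# "iteration" column, as built above; the data here is synthetic.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "iteration": np.repeat([100, 200, 300], 4),
    "momentum_z": np.arange(12, dtype=float),
})

stats = df.groupby("iteration")["momentum_z"].agg(["mean", "std", "count"])
print(stats)
```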
.. _analysis-pandas-ascii:

openPMD to ASCII
----------------

Once converted to a Pandas dataframe, export of openPMD data to text is very simple.
We generally do not recommend this because ASCII processing is slower, uses significantly more disk space, and has less precision than the binary data usually stored in openPMD data series.
Nonetheless, in some cases and especially for small, human-readable data sets this can be helpful.

The central Pandas call for this is `DataFrame.to_csv <https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html>`__.

.. code-block:: python

   # creates an electrons.csv file
   df.to_csv("electrons.csv", sep=",", header=True)
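A CSV written this way can be read back with ``pandas.read_csv``. A small round-trip sketch on synthetic data (note that passing ``index=False`` keeps the dataframe index out of the file, which makes the round trip exact):

```python
# Sketch: CSV round trip with a small synthetic dataframe
from io import StringIO

import pandas as pd

df = pd.DataFrame({"momentum_z": [1.0, 2.0], "weighting": [1.0, 1.0]})

buf = StringIO()  # stands in for a file on disk
df.to_csv(buf, sep=",", header=True, index=False)
buf.seek(0)

df2 = pd.read_csv(buf)
print(df.equals(df2))  # True
```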
.. _analysis-pandas-sql:

openPMD as SQL Database
-----------------------

Once converted to a Pandas dataframe, one can also query and process openPMD data with `SQL syntax <https://en.wikipedia.org/wiki/SQL>`__ as provided by many databases.

A project that provides such syntax is for instance `pandasql <https://github.com/yhat/pandasql/>`__.

.. code-block:: bash

   python3 -m pip install -U pandasql

Alternatively, one can `export into an SQL database <https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_sql.html>`__.
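As a sketch of the ``DataFrame.to_sql`` route, here is a synthetic particle dataframe exported into an in-memory SQLite database (standard library ``sqlite3`` as the backend) and queried back with SQL; the table name, column names, and values are illustrative:

```python
# Sketch: export a (synthetic) particle dataframe to SQLite and query it
import sqlite3

import pandas as pd

df = pd.DataFrame({
    "momentum_z": [1.0e11, 2.0e11, 4.0e11],
    "weighting": [1.0, 1.0, 1.0],
})

con = sqlite3.connect(":memory:")
df.to_sql("electrons", con, index=False)

# all properties for particles with momentum_z > 3e11
result = pd.read_sql_query(
    "SELECT * FROM electrons WHERE momentum_z > 3e11", con)
print(result)
con.close()
```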


.. _analysis-pandas-example:

Example
-------

A detailed example script for particle and field analysis is provided as ``11_particle_dataframe.py`` in our :ref:`examples <usage-examples>`.
docs/source/analysis/paraview.rst (new file, +55 lines)
.. _analysis-paraview:

3D Visualization: ParaView
==========================

openPMD data can be visualized with ParaView, an open source visualization and analysis software.
ParaView can be downloaded and installed from https://www.paraview.org.
Use the latest version for best results.

Tutorials
---------

ParaView is a powerful, general parallel rendering program.
If this is your first time using ParaView, consider starting with a tutorial.

* https://www.paraview.org/Wiki/The_ParaView_Tutorial
* https://www.youtube.com/results?search_query=paraview+introduction
* https://www.youtube.com/results?search_query=paraview+tutorial


openPMD
-------

openPMD files can be visualized with ParaView 5.9+; using 5.11+ is recommended.
ParaView supports ADIOS1, ADIOS2 and HDF5 files, as it is implemented against the Python bindings of openPMD-api.

For openPMD output to be recognized, create a small text file with the ``.pmd`` extension per data series, which can then be opened with ParaView:

.. code-block:: console

   $ cat paraview.pmd
   openpmd_%06T.bp

The file contains the same string as one would put in an openPMD ``Series("....")`` object.
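Creating such a detection file is a one-liner; a sketch assuming an ADIOS2 file series named ``openpmd_%06T.bp`` (the series name and the ``paraview.pmd`` file name are examples, only the ``.pmd`` extension matters):

```shell
# write the series pattern into a ParaView detection file
echo "openpmd_%06T.bp" > paraview.pmd
cat paraview.pmd
```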

.. tip::

   When you first open ParaView, adjust its global ``Settings`` (Linux: under menu item ``Edit``).
   ``General`` -> ``Advanced`` -> Search for ``data`` -> ``Data Processing Options``.
   Check the box ``Auto Convert Properties``.

   This will simplify the application of filters, e.g., contouring of components of vector fields, without first adding a calculator that extracts a single component or magnitude.

.. warning::

   As of ParaView 5.11 and older, the ``axisLabels`` attribute is not yet read for fields.
   See, e.g., `WarpX issue 1803 <https://github.com/ECP-WarpX/WarpX/issues/1803>`__.
   Please apply a rotation of, e.g., ``0 -90 0`` to mesh data where needed.

.. warning::

   `ParaView issue 21837 <https://gitlab.kitware.com/paraview/paraview/-/issues/21837>`__:
   In order to visualize particle traces with the ``Temporal Particles To Pathlines`` filter, you need to apply the ``Merge Blocks`` filter first.

   If you have multiple species, you may have to extract the species you want with ``Extract Block`` before applying ``Merge Blocks``.
docs/source/analysis/rapids.rst (new file, +99 lines)
.. _analysis-rapids:

RAPIDS
======

The Python bindings of openPMD-api enable easy loading into the GPU-accelerated `RAPIDS.ai datascience & AI/ML ecosystem <https://rapids.ai/>`__.


.. _analysis-rapids-install:

How to Install
--------------

Follow the `official documentation <https://docs.rapids.ai/install>`__ to install RAPIDS.

.. code-block:: bash

   # preparation
   conda update -n base conda
   conda install -n base conda-libmamba-solver
   conda config --set solver libmamba

   # install
   conda create -n rapids -c rapidsai -c conda-forge -c nvidia rapids python cudatoolkit openpmd-api pandas
   conda activate rapids


.. _analysis-rapids-cudf:

Dataframes
----------

The central Python API call to convert openPMD particles to a cuDF dataframe is the ``ParticleSpecies.to_df`` method.

.. code-block:: python

   import openpmd_api as io
   import cudf

   s = io.Series("samples/git-sample/data%T.h5", io.Access.read_only)
   electrons = s.iterations[400].particles["electrons"]

   cdf = cudf.from_pandas(electrons.to_df())
   type(cdf)  # cudf.DataFrame
   print(cdf)

   # note: no series.flush() needed

One can also combine all iterations in a single dataframe like this:

.. code-block:: python

   cdf = cudf.concat(
       (
           cudf.from_pandas(s.iterations[i].particles["electrons"].to_df().assign(iteration=i))
           for i in s.iterations
       ),
       axis=0,
       ignore_index=True,
   )

   # like before, but with a new column "iteration" and all particles
   print(cdf)
.. _analysis-rapids-sql:

openPMD as SQL Database
-----------------------

Once converted to a dataframe, one can also query and process openPMD data with `SQL syntax <https://en.wikipedia.org/wiki/SQL>`__ as provided by many databases.

A project that provides such syntax is for instance `BlazingSQL <https://github.com/BlazingDB/blazingsql>`__ (see the `BlazingSQL install documentation <https://github.com/BlazingDB/blazingsql#prerequisites>`__).

.. code-block:: python

   import openpmd_api as io
   from blazingsql import BlazingContext

   s = io.Series("samples/git-sample/data%T.h5", io.Access.read_only)
   electrons = s.iterations[400].particles["electrons"]

   bc = BlazingContext(enable_progress_bar=True)
   bc.create_table('electrons', electrons.to_df())

   # all properties for particles with momentum_z > 3e11 kg*m/s
   bc.sql('SELECT * FROM electrons WHERE momentum_z > 3e11')

   # selected properties
   bc.sql('SELECT momentum_x, momentum_y, momentum_z, weighting FROM electrons WHERE momentum_z > 3e11')
.. _analysis-rapids-example:

Example
-------

A detailed example script for particle and field analysis is provided as ``11_particle_dataframe.py`` in our :ref:`examples <usage-examples>`.