This example demonstrates how one can setup an hybrid python package, e.g. using python code and C++ code via pybind11. Further it shows how one can integrate pyarrow.
The following pybind11 type casters are added:
std::shared_ptr<arrow::DoubleArray>std::shared_ptr<arrow::StringArray>std::shared_ptr<arrow::Table>std::shared_ptr<arrow::RecordBatch>
see pyarrow_casters.hpp.
It also shows how once can forward the columns of an arrow::RecordBatch to your own non arrow function, e.g. there
is a wrapper which applies the following transformations:
| pyarrow Type | Forwarded Type |
|---|---|
| pyarrow.lib.DoubleArray | std::span<const double> |
| pyarrow.lib.StringArray | std::vector<std::string_view> |
| pyarrow.lib.ListArray[list<item: double>] | std::vector<std::span<const double>> |
see bindings.cpp.
The example shows how from a pandas.DataFrame an pyarrow.RecordBatch according to a specific
schema can be extracted and the passed to the internal library via pybind11.
The python/package contains a configuration to build (PyPackageBuild) and install (PyPackageInstall) your package
using the target MyLib. This is convenient during development as you do not have to recompile everything.
The actual setup.py file for distribution is located in the python folder, and you can create/install a wheel file
by running the following command from the python folder:
python setup.py bdist_wheel
pip install dist/pymylib-0.0.1-cp39-cp39-linux_x86_64.whlIn case you need to build the arrow library (and pyarrow) from source have a look at build_arrow.sh.