RADAR Pipeline

An open-source Python feature generation and visualization package for use with RADAR project data.


RADAR-pipeline is a Python package that provides a feature-based architecture for building data pipelines. It lets you ingest, process, and export data while reusing existing features and adding custom functionality.

Installation

Installation using PIP

To install RADAR-pipeline, you can use the following command:

pip install radarpipeline

Installation in a Conda environment

To install RADAR-pipeline using Conda, you can use the following command:

conda create -n radarpipeline python=3.10
conda activate radarpipeline
pip install radarpipeline

Installation from source

To install RADAR-pipeline from source, follow the steps below. This is the recommended way to install RADAR-pipeline if you want to contribute to the project or if you want to use the latest features that are not yet released on PyPI.

Note

If you are using Windows, install Spark and set the environment variables as mentioned here before going through the installation below.

Clone the repository (with all the submodules):

git clone --recurse-submodules https://github.com/RADAR-base/radarpipeline.git

Change the directory to radarpipeline:

$ cd radarpipeline

Check out the development branch:

$ git checkout dev

Create a Conda environment and activate it.

```bash
conda create -n radarpipeline python=3.10
conda activate radarpipeline
```

Install the dependencies:

$ python -m pip install -r requirements.txt

Install the module as a Python package by running the following command:

$ python -m pip install -e .

To verify the installation, run the following command in the project root directory to run the pipeline:

$ python .

The pipeline performs a mock run and ingests the data in the mock-data directory. You will see some output in the CLI. If the project is installed correctly, the mock pipeline runs without errors and saves the data to the output directory.
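
If you want to check from Python that the package can be found before running the mock pipeline, a minimal sketch using only the standard library might look like the following. The helper name `is_installed` is hypothetical and not part of RADAR-pipeline. It only checks that a module is on the import path, not that the pipeline itself runs correctly.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module `module_name` can be found."""
    # find_spec returns None when a top-level module cannot be located.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "radarpipeline" is the module name used in the import examples of this README.
    print("radarpipeline importable:", is_installed("radarpipeline"))
```

A result of `False` usually means the active environment is not the one in which the package was installed.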

Docs

Radar pipeline as a library

RADAR-pipeline can be used as a library in a Python script or a Jupyter notebook. With the radarpipeline module you can run the pipeline, validate the configuration file, read RADAR data locally, download data from the RADAR-base SFTP server, convert the data to another format such as parquet, compute features from a feature pipeline and get the output back in pandas, and list all the available feature pipelines.

To run a feature pipeline using the config.yaml file, you can use the following command:

import radarpipeline
# `variables` must be defined by the caller before this call
radarpipeline.run("config.yaml", variables)

To validate the configuration file, you can use the following command:

import radarpipeline
radarpipeline.validate(config_file="config.yaml")

To read RADAR data locally, you can use the following command:

import radarpipeline
radarpipeline.read(source_path)  # source_path: path to the local RADAR data

To download the data from the SFTP server, you can use the following command:

import radarpipeline
# Fill in the empty strings with your SFTP connection details
input_config = {
    "input": {
        "source_type": "sftp",
        "config": {
            "sftp_host": "",
            "sftp_source_path": "",
            "sftp_username": "",
            "sftp_private_key": "",
            "sftp_target_path": "/path/to/data",
        },
        "data_format": "csv",
    }
}
radarpipeline.fetch(input_config)
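
Because the example above leaves the connection fields empty, it can help to check the configuration before attempting a download. The sketch below is not part of RADAR-pipeline: `missing_sftp_fields` and `REQUIRED_SFTP_KEYS` are hypothetical names, and treating all five keys as required is an assumption based only on the example above.

```python
# Keys taken from the example above; treating all of them as required is an assumption.
REQUIRED_SFTP_KEYS = (
    "sftp_host",
    "sftp_source_path",
    "sftp_username",
    "sftp_private_key",
    "sftp_target_path",
)


def missing_sftp_fields(input_config: dict) -> list:
    """Return the SFTP keys that are absent or empty in `input_config`."""
    sftp_config = input_config.get("input", {}).get("config", {})
    return [key for key in REQUIRED_SFTP_KEYS if not sftp_config.get(key)]


example_config = {
    "input": {
        "source_type": "sftp",
        "config": {
            "sftp_host": "",
            "sftp_source_path": "",
            "sftp_username": "",
            "sftp_private_key": "",
            "sftp_target_path": "/path/to/data",
        },
        "data_format": "csv",
    }
}

# Lists every field that still needs a value before fetching.
print(missing_sftp_fields(example_config))
```

An empty list means every field in this hypothetical check has a value. It says nothing about whether the credentials are valid.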

To convert the data to another format such as parquet, you can use the following command:

import radarpipeline
data_format='parquet'
radarpipeline.convert(source_path, destination_path, variables, data_format)

To compute features from a feature pipeline and get the output back in pandas, you can use the following command:

import radarpipeline
input_config = {
    "source_type": "local",
    "config": {
        "source_path": "mockdata/mockdata"
    },
    "data_format": "csv"
}
feature_config = {
    "location": "custom",
    "feature_groups": ["Tabularize"],
    "feature_names": [["android_phone_battery_level"]]
}
data = radarpipeline.compute_features(input_config, feature_config)
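
In the example above, `feature_names` is a list of lists that sits alongside `feature_groups`. Assuming each inner list belongs to the group at the same position (an assumption drawn from the example, not confirmed here), a small helper can keep the two in sync. `build_feature_config` is a hypothetical name and is not provided by RADAR-pipeline.

```python
def build_feature_config(features_by_group: dict, location: str = "custom") -> dict:
    """Build a feature_config dict from a {group name: [feature names]} mapping."""
    groups = list(features_by_group)
    return {
        "location": location,
        "feature_groups": groups,
        # One inner list per group, in the same order as feature_groups (assumption).
        "feature_names": [list(features_by_group[group]) for group in groups],
    }


feature_config = build_feature_config(
    {"Tabularize": ["android_phone_battery_level"]}
)
print(feature_config)
```

The resulting dictionary has the same shape as the `feature_config` in the example above and could be passed to `radarpipeline.compute_features` in its place.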

To list all the available feature pipelines, you can use the following command:

import radarpipeline
print(radarpipeline.show_available_pipelines())

Radar pipeline as a command line tool

RADAR-pipeline can also be used as a command line tool. You can use the radarpipeline command to run the pipeline, validate the configuration file, generate a mock configuration file, fetch data, convert the data to another format such as parquet, and list all the available feature pipelines.

To list all the available commands, you can use the following command:

radarpipeline -h

Output:

A CLI interface for radarpipeline

positional arguments:
  {run,validate,generate,fetch,convert,list}
                        Sub-command help
    run                 Runs radarpipeline
    validate            Validate config file to run radarpipeline
    generate            Generates a mock config file to run radarpipeline
    fetch               Fetch data using config file
    convert             Convert radar data to custom format
    list                List available Pipelines

options:
  -h, --help            show this help message and exit

To run a feature pipeline using the config.yaml file, you can use the following command:

radarpipeline run --config config.yaml
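
If you need to launch the same command from a Python script, for example in a scheduled job, one possible sketch using the standard library is shown below. The function names `build_run_command` and `run_pipeline` are hypothetical. Only the `run` sub-command and the `--config` option shown above are used.

```python
import shutil
import subprocess


def build_run_command(config_path: str) -> list:
    """Return the argument list for `radarpipeline run --config <config_path>`."""
    return ["radarpipeline", "run", "--config", config_path]


def run_pipeline(config_path: str) -> int:
    """Run the CLI and return its exit code."""
    # Fail early with a clear message if the executable is not on PATH.
    if shutil.which("radarpipeline") is None:
        raise FileNotFoundError("radarpipeline executable not found on PATH")
    # Passing a list (no shell) avoids quoting problems with paths that contain spaces.
    return subprocess.run(build_run_command(config_path), check=False).returncode


print(build_run_command("config.yaml"))
```

Call `run_pipeline("config.yaml")` from an environment where RADAR-pipeline is installed. A non-zero return value indicates that the command reported an error.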

License

This project is licensed under the Apache License, Version 2.0.

Citation & Acknowledgment

Please cite this project using the citation DOI, or see CITATION.cff.

Pushkar Patel did a great amount of work on this project during Google Summer of Code 2022. His work report can be found here. We would like to thank Pushkar for all his contributions and GSoC for giving us this opportunity.

Wiki

Please visit the RADAR Pipeline Wiki to learn more about RADAR Pipeline. Also see the RADAR-base Analytics Catalogue for available pipelines for processing RADAR-base data.
