RADAR Pipeline

An open-source Python feature generation and visualization package for use with RADAR project data.

RADAR-pipeline is a Python package that provides a feature-based architecture for building data pipelines. It lets you ingest, process, and export data while reusing existing features and adding custom functionality.

Installation

Installation using PIP

To install RADAR-pipeline, you can use the following command:

pip install radarpipeline

Installation in a Conda environment

To install RADAR-pipeline in a new Conda environment, you can use the following commands:

conda create -n radarpipeline python=3.10
conda activate radarpipeline
pip install radarpipeline

Installation from source

To install RADAR-pipeline from source, follow the steps below. This is the recommended way to install RADAR-pipeline if you want to contribute to the project or if you want to use the latest features that are not yet released on PyPI.

Note

If you are using Windows, please install Spark and set the environment variables as mentioned here before going through the installation below. You will also need to set the environment variables given here.

Clone the repository (with all the submodules):

git clone --recurse-submodules https://github.com/RADAR-base/radarpipeline.git

Change the directory to radarpipeline:

cd radarpipeline

Check out the development branch:

git checkout dev

Create a Conda environment and activate it:

```bash
conda create -n radarpipeline python=3.10
conda activate radarpipeline
```

Install the dependencies:

python -m pip install -r requirements.txt

Install the module as a Python package in editable mode:

python -m pip install -e .

To verify the installation, run the pipeline from the project root directory:

python .

The pipeline will do a mock run and ingest the data in the mockdata directory. You will see some output in the CLI; if the project is installed correctly, the mock pipeline runs without errors and saves the data to the output directory.
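
As a complementary check, a short script can confirm that the package is importable in the active environment before the pipeline is run. This is a minimal sketch using only the standard library. The helper name `describe_install` is hypothetical and not part of RADAR-pipeline, and the sketch assumes the installed distribution name matches the `radarpipeline` name used with pip above.

```python
from importlib import metadata, util


def describe_install(package: str = "radarpipeline") -> str:
    """Return a one-line install status for `package` (hypothetical helper)."""
    # find_spec returns None when a top-level module cannot be located.
    if util.find_spec(package) is None:
        return f"{package}: not importable in this environment"
    try:
        # Assumption: the distribution name equals the import name.
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # Importable (for example from a source checkout) but without metadata.
        version = "unknown version"
    return f"{package}: importable ({version})"


if __name__ == "__main__":
    print(describe_install())
```

If the script reports that the package is not importable, the most likely cause is that the Conda environment created above is not the active one.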

Docs

Radar pipeline as a library

RADAR-pipeline can be used as a library in a Python script or a Jupyter notebook. With the radarpipeline module you can run the pipeline, validate the configuration file, read RADAR data locally, download data from the RADAR-base SFTP server, convert the data to another format such as Parquet, compute features from a feature pipeline and get the output back in pandas, and list all the available feature pipelines.

To run a feature pipeline using the config.yaml file, you can use the following command:

import radarpipeline
# Run the feature pipeline described in config.yaml.
radarpipeline.run(config_file="config.yaml")

To validate the configuration file, you can use the following command:

import radarpipeline
radarpipeline.validate(config_file="config.yaml")

To read the radar data locally, you can use the following command:

import radarpipeline
source_path = "mockdata/mockdata"  # path to the local RADAR data, e.g. the mock data
radarpipeline.read(source_path)

To download the data from the sftp server, you can use the following command:

import radarpipeline
input_config = {
    "input": {
        "source_type": "sftp",
        "config": {
            "sftp_host": "",
            "sftp_source_path": "",
            "sftp_username": "",
            "sftp_private_key": "",
            "sftp_target_path": "/path/to/data",
        },
        "data_format": "csv",
    }
}
radarpipeline.fetch(input_config)
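
Because this dictionary carries credentials, one option is to fill it from environment variables rather than hard-coding the values. The sketch below only builds the dictionary shown above and does not call RADAR-pipeline. The helper `build_sftp_input_config` and the `RADAR_SFTP_*` variable names are hypothetical and chosen purely for illustration.

```python
import os


def build_sftp_input_config(target_path: str, data_format: str = "csv") -> dict:
    """Build the SFTP input dictionary from environment variables.

    The RADAR_SFTP_* names are hypothetical; use whatever suits your setup.
    """
    return {
        "input": {
            "source_type": "sftp",
            "config": {
                "sftp_host": os.environ.get("RADAR_SFTP_HOST", ""),
                "sftp_source_path": os.environ.get("RADAR_SFTP_SOURCE_PATH", ""),
                "sftp_username": os.environ.get("RADAR_SFTP_USERNAME", ""),
                "sftp_private_key": os.environ.get("RADAR_SFTP_PRIVATE_KEY", ""),
                "sftp_target_path": target_path,
            },
            "data_format": data_format,
        }
    }


if __name__ == "__main__":
    input_config = build_sftp_input_config("/path/to/data")
    # Print the keys only, so credentials never reach the terminal.
    print(sorted(input_config["input"]["config"]))
```

The resulting dictionary has the same shape as `input_config` above, so it could be passed to `radarpipeline.fetch` in the same way.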

To convert the data to another format such as parquet, you can use the following command:

import radarpipeline
# source_path, destination_path and variables must be defined beforehand.
data_format = "parquet"
radarpipeline.convert(source_path, destination_path, variables, data_format)

To compute features from a feature pipeline and get the output back in pandas, you can use the following code:

import radarpipeline
input_config = {
    "source_type": "local",
    "config": {
        "source_path": "mockdata/mockdata"
    },
    "data_format": "csv"
}
feature_config = {
    "location": "custom",
    "feature_groups": ["Tabularize"],
    "feature_names": [["android_phone_battery_level"]]
}
data = radarpipeline.compute_features(input_config, feature_config)
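
In the example above, `feature_names` is a list of lists alongside `feature_groups`. Assuming that the two are paired by position (an inference from the example, not something this README confirms), a small helper can build the dictionary from a mapping and catch malformed input before the pipeline runs. `make_feature_config` is a hypothetical helper and not part of RADAR-pipeline.

```python
def make_feature_config(groups: dict, location: str = "custom") -> dict:
    """Build a feature_config from {group_name: [feature_name, ...]}.

    Hypothetical helper: assumes feature_names[i] lists the features
    requested from feature_groups[i].
    """
    if not groups:
        raise ValueError("at least one feature group is required")
    for name, features in groups.items():
        # A bare string would silently be split into characters by list().
        if isinstance(features, str) or not features:
            raise ValueError(f"group {name!r} needs a non-empty list of feature names")
    return {
        "location": location,
        # Dicts keep insertion order, so groups and names stay aligned.
        "feature_groups": list(groups),
        "feature_names": [list(features) for features in groups.values()],
    }


feature_config = make_feature_config({"Tabularize": ["android_phone_battery_level"]})
print(feature_config)
```

The printed dictionary has the same content as the `feature_config` written by hand above.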

To list all the available feature pipelines, you can use the following command:

import radarpipeline
print(radarpipeline.show_available_pipelines())

Radar pipeline as a command line tool

RADAR-pipeline can be used as a command line tool. You can use the radarpipeline command to run the pipeline, validate the configuration file, generate a mock configuration file, fetch data using a configuration file, convert RADAR data to another format, and list the available pipelines.

To list all the available commands, you can use the following command:

radarpipeline -h

Output:

A CLI interface for radarpipeline

positional arguments:
  {run,validate,generate,fetch,convert,list}
                        Sub-command help
    run                 Runs radarpipeline
    validate            Validate config file to run radarpipeline
    generate            Generates a mock config file to run radarpipeline
    fetch               Fetch data using config file
    convert             Convert radar data to custom format
    list                List available Pipelines

options:
  -h, --help            show this help message and exit

To run a feature pipeline using the config.yaml file, you can use the following command:

radarpipeline run --config config.yaml
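
The same command can also be driven from a Python script, for example in a scheduled job. This is a minimal sketch using the standard `subprocess` module. It assumes the `radarpipeline` executable is on `PATH`, uses only the `run` sub-command and `--config` flag shown above, and the helper names `build_run_command` and `run_pipeline_cli` are hypothetical.

```python
import shutil
import subprocess


def build_run_command(config_path: str) -> list:
    """Return the argv list for `radarpipeline run --config <config_path>`."""
    return ["radarpipeline", "run", "--config", config_path]


def run_pipeline_cli(config_path: str = "config.yaml") -> int:
    """Invoke the CLI and return its exit code (hypothetical wrapper)."""
    if shutil.which("radarpipeline") is None:
        raise FileNotFoundError("radarpipeline executable not found on PATH")
    # Passing a list avoids shell quoting issues with paths containing spaces.
    completed = subprocess.run(build_run_command(config_path))
    return completed.returncode
```

Calling `run_pipeline_cli("config.yaml")` returns the exit code of the process, which the calling script can inspect to decide whether to continue.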

License

This project is licensed under the Apache License, Version 2.0.

Citation & Acknowledgment

Please use citation or see CITATION.cff.

Pushkar Patel did a great amount of work on this project during Google Summer of Code 2022. His work report can be found here. We would like to thank Pushkar for all his contributions and GSoC for giving us this opportunity.

Wiki

Please visit the RADAR Pipeline Wiki to learn more about RADAR Pipeline. Also see the RADAR-base Analytics Catalogue for available pipelines for processing RADAR-base data.
