Kubernetes-native platform to run massively parallel data/streaming jobs
Updated Jun 30, 2024 - Go
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
A light-weight, flexible, and expressive statistical data testing library
R script for cleaning and preparing a canine health dataset used in a workshop on creating data pre-analysis plans using Canva.
The main scope of this app is to collect raw data, process it into usable data, measure various indicators, and export final results to support useful conclusions about the needs of the research.
dataDisk is a Python package designed to simplify the creation and execution of data processing pipelines. It provides a flexible framework for defining sequential tasks, applying transformations, and validating data. Additionally, it includes a ParallelProcessor for efficient parallel execution.
Data policy IN, dynamic view OUT: PACE is the Policy As Code Engine. It helps you programmatically create and apply a data policy to a processing platform like Databricks, Snowflake or BigQuery (or plain ol' Postgres, even!) with definitions imported from Collibra, Datahub, ODD and the like.
Data sources used by the Big Data Innovation Team
Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python
♿ Suite of open and standards-based tools for performing reliable accessibility conformance testing at scale
Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
SQL-like interface to tabular structured data
Framework for processing and filtering datasets
CBRAIN is a flexible Ruby on Rails framework for accessing and processing large data on high-performance computing infrastructures.
Remote Sensing and GIS Software Library; Python module tools for processing spatial data.
GlassFlow Python SDK to publish and consume data to your pipelines at Glassflow.dev
DataDigger is a powerful and intuitive web application designed to extract and analyze data from web pages.