big-data

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

python aws data-science machine-learning caffe theano big-data spark deep-learning hadoop tensorflow numpy scikit-learn keras pandas kaggle scipy matplotlib mapreduce

Updated Mar 20, 2024
Python

apache / flink

Apache Flink

python java scala sql big-data flink

Updated Oct 17, 2025
Java

amark / gun

An open source cybersecurity protocol for syncing decentralized graph data.

Updated Jul 25, 2025
JavaScript

heibaiying / BigData-Notes

A beginner's guide to big data ⭐

phoenix scala kafka big-data spark yarn hive hadoop storm bigdata hbase zookeeper hdfs mapreduce flume azkaban sqoop

Updated Jan 5, 2024
Java

prestodb / presto

The official home of the Presto distributed SQL query engine for big data

java data query sql big-data presto hive hadoop lakehouse

Updated Oct 19, 2025
Java

andkret / Cookbook

The Data Engineering Cookbook

big-data best-practices cookbook data-engineering data-engineer

Updated Oct 6, 2025
Python

apache / predictionio

PredictionIO, a machine learning server for developers and ML engineers.

scala big-data predictionio

Updated Jan 9, 2021
Scala

trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

java distributed-systems data-science sql database big-data presto hive hadoop analytics jdbc databases distributed-database query-engine iceberg datalake prestodb trino delta-lake

Updated Oct 18, 2025
Java

yahoo / CMAK

CMAK is a tool for managing Apache Kafka clusters

scala kafka big-data cluster-management

Updated Aug 2, 2023
Scala

vesoft-inc / nebula

A distributed, fast open-source graph database featuring horizontal scalability and high availability

distributed-systems database big-data cpp graph raft scalability distributed graph-database graphdb hacktoberfest nebula nebula-graph nebulagraph

Updated Oct 9, 2025
C++

provectus / kafka-ui

Open-Source Web UI for Apache Kafka Management

opensource kafka big-data web-ui streams kafka-connect apache-kafka kafka-producer kafka-client kafka-streams hacktoberfest streaming-data kafka-manager kafka-cluster event-streaming cluster-management kafka-ui kafka-brokers

Updated Jul 26, 2024
Java

StarRocks / starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.

Updated Oct 19, 2025
Java

quickwit-oss / quickwit

Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.

rust open-source search-engine big-data logs cloud-storage cloud-native log-management distributed-tracing tantivy

Updated Oct 17, 2025
Rust

cython / cython

The most widely used Python to C compiler

python c performance big-data cpp cython cpython cpython-extensions

Updated Oct 19, 2025
Python

catboost / catboost

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

python data-science machine-learning data-mining tutorial r big-data gpu cuda kaggle gbdt gbm gpu-computing decision-trees gradient-boosting coreml catboost categorical-features

Updated Oct 18, 2025
C++

delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

big-data spark analytics acid delta-lake

Updated Oct 18, 2025
Scala

apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.

python java golang streaming sql big-data beam batch

Updated Oct 19, 2025
Java

Here are 4,952 public repositories matching this topic...

binhnguyennus / awesome-scalability

ClickHouse / ClickHouse

apache / spark

donnemartin / data-science-ipython-notebooks

apache / flink

amark / gun

heibaiying / BigData-Notes

prestodb / presto

andkret / Cookbook

apache / predictionio

trinodb / trino

yahoo / CMAK

vesoft-inc / nebula

provectus / kafka-ui

StarRocks / starrocks

quickwit-oss / quickwit

cython / cython

catboost / catboost

delta-io / delta

apache / beam
