Showing 1 changed file with 47 additions and 45 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -17,32 +17,44 @@ | |
<img src="https://dcbadge.vercel.app/api/server/s3tBEn7Ptg?compact=true&style=flat" alt ="Oxen.ai Discord"> | ||
</a> | ||
<a href="https://twitter.com/oxen_ai" style="padding: 2px;"> | ||
<img src="https://img.shields.io/twitter/url/https/twitter.com/oxen_ai | ||
.svg?style=social&label=Follow%20%40Oxen.ai" alt ="Oxen.ai Twitter"> | ||
<img src="https://img.shields.io/twitter/url/https/twitter.com/oxenai.svg?style=social&label=Follow%20%40Oxen.ai" alt ="Oxen.ai Twitter"> | ||
</a> | ||
<br/> | ||
</div> | ||
|
||
# | ||
|
||
# 🐂 Oxen.ai | ||
![Oxen.ai Logo](/images/oxen-no-margin-white.svg#gh-dark-mode-only) | ||
![Oxen.ai Logo](/images/oxen-no-margin-black.svg#gh-light-mode-only) | ||
|
||
Oxen is a lightning fast unstructured data version control system for machine learning datasets. | ||
## 🐂 What is Oxen? | ||
|
||
<p align="center"> | ||
<img src="https://github.com/Oxen-AI/oxen-release/blob/main/images/space-ox.png?raw=true"> | ||
</p> | ||
Oxen is a lightning fast data version control system for structured and unstructured machine learning datasets. The interface mirrors git, so it is easy to learn if you are a software engineer, but it is optimized from the ground up to work with large datasets. | ||
|
||
## 🌾 Why Build Oxen? | ||
```bash | ||
oxen init | ||
oxen add images/ | ||
oxen add annotations/*.parquet | ||
oxen commit -m "Adding 200k images and their corresponding annotations" | ||
oxen push origin main | ||
``` | ||
|
||
Oxen was built by a team of machine learning engineers who have spent countless hours in their careers managing datasets. We have used many different tools, but none of them were as easy to use or as ergonomic as we would like. | ||
As well as a [command line interface](https://docs.oxen.ai/getting-started/cli), there are bindings for [Rust](https://github.com/Oxen-AI/Oxen) 🦀, [Python](https://docs.oxen.ai/getting-started/python) 🐍, and [HTTP interfaces](https://docs.oxen.ai/http-api) 🌎. | ||
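
The command line workflow shown above can also be scripted. The following is a minimal sketch that shells out to the `oxen` CLI from Python; it is not the official Python bindings. The helper names `build_workflow` and `run_workflow` are hypothetical, and the `-m` flag for the commit message is an assumption based on the git-style interface. Only the subcommands that appear in this README (`init`, `add`, `commit`, `push`) are used.

```python
import shutil
import subprocess
from typing import List, Sequence


def build_workflow(
    paths: Sequence[str],
    message: str,
    remote: str = "origin",
    branch: str = "main",
) -> List[List[str]]:
    """Build the argv lists for one init/add/commit/push cycle.

    Hypothetical helper: it only assembles commands and runs nothing.
    """
    commands: List[List[str]] = [["oxen", "init"]]
    # One `oxen add` per path. Shell globs such as *.parquet are not
    # expanded here, so pass concrete files or directories.
    commands += [["oxen", "add", path] for path in paths]
    # Assumption: the commit message is passed with -m, as in git.
    commands.append(["oxen", "commit", "-m", message])
    commands.append(["oxen", "push", remote, branch])
    return commands


def run_workflow(commands: Sequence[Sequence[str]], cwd: str = ".") -> None:
    """Run each command in order and stop at the first failure."""
    if shutil.which("oxen") is None:
        raise RuntimeError("the oxen CLI was not found on PATH")
    for argv in commands:
        subprocess.run(list(argv), cwd=cwd, check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    plan = build_workflow(["images/", "annotations/"], "Adding images and annotations")
    for argv in plan:
        print(" ".join(argv))
```

Keeping command construction separate from execution makes the plan easy to inspect or log before anything touches the repository. Call `run_workflow(plan)` only on a machine where the `oxen` CLI is installed and a remote is configured.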
|
||
If you have ever tried [git lfs](https://git-lfs.com/) to version large datasets and became frustrated, we feel your pain. Solutions like git-lfs are too slow when it comes to the scale of data we need for machine learning. | ||
## ✅ Features | ||
|
||
If you have ever uploaded a large dataset of images, audio, video, or text to a cloud storage bucket with the name: | ||
Oxen was optimized to be fast on structured and unstructured data types. Unlike traditional version control systems that are optimized for text files and code, Oxen was built from the [ground up to be fast](https://github.com/Oxen-AI/oxen-release/blob/main/Performance.md) on images, video, audio, text, and more. | ||
|
||
`s3://data/images_july_2022_final_2_no_really_final.tar.gz` | ||
* 🔥 Fast (efficient indexing and syncing of data) | ||
* 🧠 Easy to learn (same commands as git) | ||
* 🗄️ Index lots of files (millions of images? no problem) | ||
* 🎥 Handles large files (images, videos, audio, text, parquet, arrow, json, models, etc) | ||
* 📊 Native DataFrame processing (index, compare and serve up DataFrames) | ||
* 📈 Tracks changes over time (never worry about losing the state of your data) | ||
* 🤝 Collaborate with your team (sync to an oxen-server) | ||
* 🌎 [Remote Workspaces](https://docs.oxen.ai/concepts/remote-workspace) to interact with the data without downloading it | ||
* 👀 Better data visualization on [OxenHub](https://oxen.ai) | ||
|
||
We built Oxen to be the tool we wish we had. | ||
|
||
## 📚 Familiar Workflow | ||
|
||
|
@@ -52,37 +64,15 @@ The Oxen Command Line Interface (CLI) mirrors [git](https://git-scm.com/) in man | |
|
||
The difference is that Oxen is built for data. It is optimized to handle large files and large datasets, and it is built to be fast and easy to use. | ||
|
||
<p align="center"> | ||
<a href="https://docs.oxen.ai/getting-started/intro#getting-started">🐮 Learn The Basics</a> | ||
</p> | ||
|
||
<p align="center"> | ||
<img src="https://github.com/Oxen-AI/oxen-release/raw/main/images/cli-celeba.gif?raw=true" alt="oxen cli demo" /> | ||
</p> | ||
|
||
## 🤖 Built for AI | ||
|
||
If you are building an AI application, data is the lifeblood. Data is constantly changing over time, and data differentiates your model from the competition. | ||
|
||
Whether you are building your own model from scratch, fine-tuning a pre-trained model, or using a model as a service, you will need to manage and compare the inputs and outputs over time to ensure your model is improving. | ||
|
||
[We version our code, why not our data?](https://blog.oxen.ai/we-version-our-code-why-not-our-data/) | ||
|
||
Versioning your data means you can experiment on models in parallel with different data. The more experiments you run, the smarter your model becomes, and more robust models lead to better products. | ||
|
||
## ✅ Features | ||
## 🐮 Learn The Basics | ||
|
||
Oxen was optimized to be fast on structured and unstructured data types. Unlike traditional version control systems that are optimized for text files and code, Oxen was built from the [ground up to be fast](https://github.com/Oxen-AI/oxen-release/blob/main/Performance.md) on images, video, audio, text, and more. | ||
To learn everything Oxen can do, see the full documentation at [https://docs.oxen.ai](https://docs.oxen.ai). | ||
|
||
* 🔥 Fast (10-100x faster than existing tools) | ||
* 🧠 Easy to learn (same commands as git) | ||
* 🗄️ Index lots of files (millions of images? no problem) | ||
* 🎥 Handles large files (images, videos, audio, text, parquet, arrow, json, models, etc) | ||
* 📊 Native DataFrame processing ([oxen df](https://github.com/Oxen-AI/oxen-release/blob/main/DataFrames.md) command for data exploration) | ||
* 📈 Tracks changes over time (never worry about losing the state of your data) | ||
* 🤝 Collaborate with your team (sync to an oxen-server) | ||
* 🌎 [Remote Workspaces](https://docs.oxen.ai/concepts/remote-workspace) to interact with the data without downloading it | ||
* 👀 Better data visualization on [OxenHub](https://oxen.ai) | ||
|
||
## 🧑‍💻 Getting Started | ||
|
||
|
@@ -111,11 +101,7 @@ Clone your first Oxen repository from the [OxenHub](https://oxen.ai/explore). | |
oxen clone https://hub.oxen.ai/ox/CatDogBBox | ||
``` | ||
|
||
## 🐮 Learn The Basics | ||
|
||
To learn everything else, the full documentation can be found at [https://docs.oxen.ai](https://docs.oxen.ai). | ||
|
||
## ⭐️ Every GitHub Star Gives an Ox its Wings | ||
## ⭐️ Every GitHub star gives an ox its wings | ||
|
||
No really. | ||
|
||
|
@@ -125,11 +111,11 @@ We hooked up the GitHub webhook for stars to an [OxenHub Repository](https://www | |
<img src="https://github.com/Oxen-AI/oxen-release/blob/main/images/ox-with-wings.png?raw=true" alt="oxen repo with wings" /> | ||
</p> | ||
|
||
## Support | ||
## 🤝 Support | ||
|
||
If you have any questions, comments, suggestions, or just want to get in contact with the team, feel free to email us at `[email protected]` | ||
|
||
## Contributing | ||
## 👥 Contributing | ||
|
||
This repository contains the Python library that wraps the core Rust codebase. We would love help extending the Python interfaces, the documentation, or the core Rust library. | ||
|
||
|
@@ -139,8 +125,24 @@ Code bases to contribute to: | |
* 🐍 [Python Interface](https://github.com/Oxen-AI/oxen-release/tree/main/oxen) | ||
* 📚 [Documentation](https://github.com/Oxen-AI/docs) | ||
|
||
If you are building anything with Oxen.ai or have any questions, we would love to hear from you in our [Discord](https://discord.gg/8PKjB9Dz). | ||
If you are building anything with Oxen.ai or have any questions, we would love to hear from you in our [Discord](https://discord.gg/s3tBEn7Ptg). | ||
|
||
## Why build Oxen? | ||
|
||
Oxen was built by a team of machine learning engineers who have spent countless hours in their careers managing datasets. We have used many different tools, but none of them were as easy to use or as ergonomic as we would like. | ||
|
||
If you have ever tried [git lfs](https://git-lfs.com/) to version large datasets and became frustrated, we feel your pain. Solutions like git-lfs are too slow when it comes to the scale of data we need for machine learning. | ||
|
||
If you have ever uploaded a large dataset of images, audio, video, or text to a cloud storage bucket with the name: | ||
|
||
`s3://data/images_july_2022_final_2_no_really_final.tar.gz` | ||
|
||
We built Oxen to be the tool we wish we had. | ||
|
||
## Why the name Oxen? | ||
|
||
"Oxen" 🐂 comes from the fact that the tooling will plow, maintain, and version your data like a good farmer tends to their fields 🌾. Let Oxen take care of the grunt work of your infrastructure so you can focus on the higher-level ML problems that matter to your product. | ||
|
||
<!----------------------------------------------------------------------------> | ||
|
||
[Learn The Basics]: https://img.shields.io/badge/Learn_The_Basics-37a779?style=for-the-badge |