organize README
gschoeni committed Nov 5, 2023
1 parent 676d288 commit 56e7d2b
Showing 1 changed file with 47 additions and 45 deletions.
92 changes: 47 additions & 45 deletions README.md
@@ -17,32 +17,44 @@
<img src="https://dcbadge.vercel.app/api/server/s3tBEn7Ptg?compact=true&style=flat" alt ="Oxen.ai Discord">
</a>
<a href="https://twitter.com/oxen_ai" style="padding: 2px;">
<img src="https://img.shields.io/twitter/url/https/twitter.com/oxen_ai
.svg?style=social&label=Follow%20%40Oxen.ai" alt ="Oxen.ai Twitter">
<img src="https://img.shields.io/twitter/url/https/twitter.com/oxenai.svg?style=social&label=Follow%20%40Oxen.ai" alt ="Oxen.ai Twitter">
</a>
<br/>
</div>

#

# 🐂 Oxen.ai
![Oxen.ai Logo](/images/oxen-no-margin-white.svg#gh-dark-mode-only)
![Oxen.ai Logo](/images/oxen-no-margin-black.svg#gh-light-mode-only)

Oxen is a lightning fast unstructured data version control system for machine learning datasets.
## 🐂 What is Oxen?

<p align="center">
<img src="https://github.com/Oxen-AI/oxen-release/blob/main/images/space-ox.png?raw=true">
</p>
Oxen is a lightning fast data version control system for structured and unstructured machine learning datasets. The interface mirrors git, so it is easy to learn if you are a software engineer, but it is optimized from the ground up to work with large datasets.

## 🌾 Why Build Oxen?
```bash
oxen init
oxen add images/
oxen add annotations/*.parquet
oxen commit -m "Adding 200k images and their corresponding annotations"
oxen push origin main
```

Oxen was built by a team of machine learning engineers who have spent countless hours in their careers managing datasets. We have used many different tools, but none of them were as easy to use and as ergonomic as we would like.
As well as a [command line interface](https://docs.oxen.ai/getting-started/cli), there are bindings for [Rust](https://github.com/Oxen-AI/Oxen) 🦀, [Python](https://docs.oxen.ai/getting-started/python) 🐍, and [HTTP interfaces](https://docs.oxen.ai/http-api) 🌎.

If you have ever tried [git lfs](https://git-lfs.com/) to version large datasets and became frustrated, we feel your pain. Solutions like git-lfs are too slow when it comes to the scale of data we need for machine learning.
## ✅ Features

If you have ever uploaded a large dataset of images, audio, video, or text to a cloud storage bucket with the name:
Oxen was optimized to be fast on structured and unstructured data types. Unlike traditional version control systems that are optimized for text files and code, Oxen was built from the [ground up to be fast](https://github.com/Oxen-AI/oxen-release/blob/main/Performance.md) on images, video, audio, text, and more.

`s3://data/images_july_2022_final_2_no_really_final.tar.gz`
* 🔥 Fast (efficient indexing and syncing of data)
* 🧠 Easy to learn (same commands as git)
* 🗄️ Index lots of files (millions of images? no problem)
* 🎥 Handles large files (images, videos, audio, text, parquet, arrow, json, models, etc)
* 📊 Native DataFrame processing (index, compare and serve up DataFrames)
* 📈 Tracks changes over time (never worry about losing the state of your data)
* 🤝 Collaborate with your team (sync to an oxen-server)
* 🌎 [Remote Workspaces](https://docs.oxen.ai/concepts/remote-workspace) to interact with the data without downloading it
* 👀 Better data visualization on [OxenHub](https://oxen.ai)

We built Oxen to be the tool we wish we had.

## 📚 Familiar Workflow

@@ -52,37 +64,15 @@ The Oxen Command Line Interface (CLI) mirrors [git](https://git-scm.com/) in man

The difference is that Oxen is built for data. It is optimized to handle large files and large datasets, and it is built to be fast and easy to use.

<p align="center">
<a href="https://docs.oxen.ai/getting-started/intro#getting-started">🐮 Learn The Basics</a>
</p>

<p align="center">
<img src="https://github.com/Oxen-AI/oxen-release/raw/main/images/cli-celeba.gif?raw=true" alt="oxen cli demo" />
</p>

## 🤖 Built for AI

If you are building an AI application, data is the lifeblood. Data is constantly changing over time, and data differentiates your model from the competition.

Whether you are building your own model from scratch, fine-tuning a pre-trained model, or using a model as a service, you will need to manage and compare the inputs and outputs over time to ensure your model is improving.

[We version our code, why not our data?](https://blog.oxen.ai/we-version-our-code-why-not-our-data/)

Versioning your data means you can experiment on models in parallel with different data. The more experiments you run, the smarter your model becomes, and more robust models lead to better products.

## ✅ Features
## 🐮 Learn The Basics

Oxen was optimized to be fast on structured and unstructured data types. Unlike traditional version control systems that are optimized for text files and code, Oxen was built from the [ground up to be fast](https://github.com/Oxen-AI/oxen-release/blob/main/Performance.md) on images, video, audio, text, and more.
To learn everything Oxen can do, see the full documentation at [https://docs.oxen.ai](https://docs.oxen.ai).

* 🔥 Fast (10-100x faster than existing tools)
* 🧠 Easy to learn (same commands as git)
* 🗄️ Index lots of files (millions of images? no problem)
* 🎥 Handles large files (images, videos, audio, text, parquet, arrow, json, models, etc)
* 📊 Native DataFrame processing ([oxen df](https://github.com/Oxen-AI/oxen-release/blob/main/DataFrames.md) command for data exploration)
* 📈 Tracks changes over time (never worry about losing the state of your data)
* 🤝 Collaborate with your team (sync to an oxen-server)
* 🌎 [Remote Workspaces](https://docs.oxen.ai/concepts/remote-workspace) to interact with the data without downloading it
* 👀 Better data visualization on [OxenHub](https://oxen.ai)

## 🧑‍💻 Getting Started

@@ -111,11 +101,7 @@ Clone your first Oxen repository from the [OxenHub](https://oxen.ai/explore).
oxen clone https://hub.oxen.ai/ox/CatDogBBox
```

## 🐮 Learn The Basics

To learn everything else, the full documentation can be found at [https://docs.oxen.ai](https://docs.oxen.ai).

## ⭐️ Every GitHub Star Gives an Ox its Wings
## ⭐️ Every GitHub star gives an ox its wings

No really.

@@ -125,11 +111,11 @@ We hooked up the GitHub webhook for stars to an [OxenHub Repository](https://www
<img src="https://github.com/Oxen-AI/oxen-release/blob/main/images/ox-with-wings.png?raw=true" alt="oxen repo with wings" />
</p>

## Support
## 🤝 Support

If you have any questions, comments, suggestions, or just want to get in contact with the team, feel free to email us at `[email protected]`

## Contributing
## 👥 Contributing

This repository contains the Python library that wraps the core Rust codebase. We would love help extending the Python interfaces, the documentation, or the core Rust library.

@@ -139,8 +125,24 @@ Code bases to contribute to:
* 🐍 [Python Interface](https://github.com/Oxen-AI/oxen-release/tree/main/oxen)
* 📚 [Documentation](https://github.com/Oxen-AI/docs)

If you are building anything with Oxen.ai or have any questions we would love to hear from you in our [discord](https://discord.gg/8PKjB9Dz).
If you are building anything with Oxen.ai or have any questions we would love to hear from you in our [discord](https://discord.gg/s3tBEn7Ptg).

## Why build Oxen?

Oxen was built by a team of machine learning engineers who have spent countless hours in their careers managing datasets. We have used many different tools, but none of them were as easy to use and as ergonomic as we would like.

If you have ever tried [git lfs](https://git-lfs.com/) to version large datasets and became frustrated, we feel your pain. Solutions like git-lfs are too slow when it comes to the scale of data we need for machine learning.

If you have ever uploaded a large dataset of images, audio, video, or text to a cloud storage bucket with a name like:

`s3://data/images_july_2022_final_2_no_really_final.tar.gz`

then you know why we built Oxen to be the tool we wish we had.

## Why the name Oxen?

"Oxen" 🐂 comes from the fact that the tooling will plow, maintain, and version your data like a good farmer tends to their fields 🌾. Let Oxen take care of the grunt work of your infrastructure so you can focus on the higher-level ML problems that matter to your product.

<!---------------------------------------------------------------------------->

[Learn The Basics]: https://img.shields.io/badge/Learn_The_Basics-37a779?style=for-the-badge
