Commit

bump v0.4.1
gschoeni committed Jan 6, 2023
1 parent f2206b4 commit f207447
Showing 3 changed files with 19 additions and 6 deletions.
6 changes: 4 additions & 2 deletions Performance.md
@@ -2,14 +2,16 @@

Using the CelebA dataset as an example using Oxen

`oxen add images` takes ~8 sec
`oxen add images` takes ~10 sec
`oxen commit -m "adding images"` takes ~41 sec

Compare this to a system like [git lfs](https://git-lfs.github.com/) on the same dataset

`git lfs track images` takes ~17 sec
`git add images` takes ~136 sec
`git commit -m "addimg images"` takes ~44 sec
`git commit -m "adding images"` takes ~44 sec
`git remote add origin https://huggingface.co/datasets/gschoeni/CelebA`
`git push origin master`

If you add this up oxen takes ~49 sec to git's ~197 sec which is about a 4x speed improvement for adding and committing.
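
The totals in the sentence above can be checked against the timings listed in this file. The following is a minimal sketch of that arithmetic, not part of the commit. It assumes the updated ~10 sec figure for `oxen add`; with the earlier ~8 sec figure the Oxen total is the ~49 sec quoted. The `git remote add` and `git push` steps have no timings listed, so they are left out, and the variable names are illustrative only.

```python
# Timings in seconds, copied from the Performance.md lines in this diff.
# Untimed steps (git remote add, git push) are intentionally excluded.
oxen_timings = {"add": 10, "commit": 41}
git_lfs_timings = {"lfs track": 17, "add": 136, "commit": 44}

oxen_total = sum(oxen_timings.values())
git_lfs_total = sum(git_lfs_timings.values())

# Ratio of total add-and-commit time: git lfs relative to oxen.
speedup = git_lfs_total / oxen_total

print(f"oxen: ~{oxen_total} sec, git lfs: ~{git_lfs_total} sec, speedup: ~{speedup:.1f}x")
```

With these inputs the Oxen total comes to ~51 sec against ~197 sec for git lfs, which still rounds to the "about 4x" stated above.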

10 changes: 8 additions & 2 deletions README.md
@@ -1,8 +1,10 @@
# 🐂 oxen-release

Oxen is command line tooling for working with large machine learning datasets.
Oxen helps you version your machine learning datasets like you version your code.

The name Oxen comes from the fact that the tooling will plow, maintain, and version your data like a good farmer tends to their fields 🐂 🌾. Let Oxen take care of grunt work of your infrastructure so you can focus on the higher level ML problems that matter to your product.
In a world of [Software 2.0](https://karpathy.medium.com/software-2-0-a64152b37c35) where we are replacing lines with neural networks and large datasets, we need better tooling to keep track of changes to the data and models over time.

Versioning datasets with `git` or `git lfs` is slow and painful 😩. Git was built for code repositories, not data. Oxen is built from the ground up for speed and large datasets 🐂 💨 and is 10-100x faster than using git.

It is built from the ground up to be fast 🔥 and easy to learn 🧠

@@ -17,6 +19,10 @@ It is built from the ground up to be fast 🔥 and easy to learn 🧠

Sign up [here](https://airtable.com/shril5UTTVvKVZAFE) for more information and to stay updated on the progress.

# Why the name Oxen?

"Oxen" comes from the fact that the tooling will plow, maintain, and version your data like a good farmer tends to their fields 🐂 🌾. Let Oxen take care of grunt work of your infrastructure so you can focus on the higher level ML problems that matter to your product.

# Overview

The Oxen Command Line Interface (CLI) mirrors [git](https://git-scm.com/) in many ways, so if you are comfortable versioning code with git, it will be straightforward to version your datasets with Oxen.
9 changes: 7 additions & 2 deletions ReleaseNotes.md
Original file line number Diff line number Diff line change
@@ -1,14 +1,19 @@
# v0.4.1

* Features
* Faster data download 🔥
* Chunked download data APIs

# v0.4.0

* Features
* Faster data upload 🔥
* Chunked data APIs
* Chunked upload data APIs

* Breaking Changes
* Removed CADF
* Remove `oxen index` commands


# v0.3.0

* Features