diff --git a/Performance.md b/Performance.md
index eb75589..a63bfda 100644
--- a/Performance.md
+++ b/Performance.md
@@ -2,14 +2,16 @@
 Using the CelebA dataset as an example using Oxen
-`oxen add images` takes ~8 sec
+`oxen add images` takes ~10 sec
 `oxen commit -m "adding images"` takes ~41 sec
 Compare this to a system like [git lfs](https://git-lfs.github.com/) on the same dataset
 `git lfs track images` takes ~17 sec
 `git add images` takes ~136 sec
-`git commit -m "addimg images"` takes ~44 sec
+`git commit -m "adding images"` takes ~44 sec
+`git remote add origin https://huggingface.co/datasets/gschoeni/CelebA`
+`git push origin master`
 If you add this up oxen takes ~49 sec to git's ~197 sec which is about a 4x speed improvement for adding and committing.
diff --git a/README.md b/README.md
index 0da863b..42310f5 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,10 @@
 # 🐂 oxen-release
-Oxen is command line tooling for working with large machine learning datasets.
+Oxen helps you version your machine learning datasets like you version your code.
-The name Oxen comes from the fact that the tooling will plow, maintain, and version your data like a good farmer tends to their fields 🐂 🌾. Let Oxen take care of grunt work of your infrastructure so you can focus on the higher level ML problems that matter to your product.
+In a world of [Software 2.0](https://karpathy.medium.com/software-2-0-a64152b37c35) where we are replacing lines of code with neural networks and large datasets, we need better tooling to keep track of changes to the data and models over time.
+
+Versioning datasets with `git` or `git lfs` is slow and painful 😩. Git was built for code repositories, not data. Oxen is built from the ground up for speed and large datasets 🐂 💨 and is 10-100x faster than using git.
 It is built from the ground up to be fast 🔥 and easy to learn 🧠
@@ -17,6 +19,10 @@ It is built from the ground up to be fast 🔥 and easy to learn 🧠
 Sign up [here](https://airtable.com/shril5UTTVvKVZAFE) for more information and to stay updated on the progress.
+# Why the name Oxen?
+
+"Oxen" comes from the fact that the tooling will plow, maintain, and version your data like a good farmer tends to their fields 🐂 🌾. Let Oxen take care of grunt work of your infrastructure so you can focus on the higher level ML problems that matter to your product.
+
 # Overview
 The Oxen Command Line Interface (CLI) mirrors [git](https://git-scm.com/) in many ways, so if you are comfortable versioning code with git, it will be straight forward to version your datasets with Oxen.
diff --git a/ReleaseNotes.md b/ReleaseNotes.md
index bd14fd5..da95129 100644
--- a/ReleaseNotes.md
+++ b/ReleaseNotes.md
@@ -1,14 +1,19 @@
+# v0.4.1
+
+* Features
+  * Faster data download 🔥
+  * Chunked download data APIs
+
 # v0.4.0
 * Features
   * Faster data upload 🔥
-  * Chunked data APIs
+  * Chunked upload data APIs
 * Breaking Changes
   * Removed CADF
   * Remove `oxen index` commands
-
 # v0.3.0
 * Features