DRAFT: Add bash based build on Ubuntu for faiss library and prerequisites #64

Draft · wants to merge 4 commits into main
Conversation

rstarmer (Member) commented:

add_dev.sh: ensure disk space for build and python prerequisites
build-prereqs.sh: install prerequisites (intel, nvidia)
build-faiss.sh: build faiss libs and python (and python swig) libs

Produces a python .whl
Produces a .tar.gz of the generated libraries

@rstarmer rstarmer requested a review from sdake as a code owner July 12, 2023 18:51
@cla-bot added the `CLA signed?` label Jul 12, 2023
@rstarmer rstarmer marked this pull request as draft July 12, 2023 18:52
@sdake (Member) commented Jul 15, 2023

This is not specifically related to this draft; it's the next step in the build pipeline.

https://github.com/artificialwisdomai/apt.oifv.ai/blob/main/build.sh

build.sh - build as "root" configured for docker/podman
README.md - instructions for running podman build
add_dev/build-prereqs.sh - shuffle python install
@sdake (Member) left a comment:

There is a lot of repetition and redundancy in this set of scripts. If this is for your personal use, I don't mind merging it with a README.md stating as much. If it's for general developer use, it should be cleaned up significantly.

The following activities should be isolated:

  • base os installation
  • upstream base os dependency installation
  • upstream package build (per package)
  • custom package build installation
  • custom package build test

There may be other things. The gist is that these actions should not be strongly coupled, as they are in this PR set.

All that said, I am happy to merge as-is once the top-level dir makes sense, such as platform/packaging/dev or platform/dev/. I am not sure what you are after, as this PR has many directions, so I am not sure what to suggest as a top-level dir. platform/build-faiss doesn't make sense, as follow-on work to build a new package (e.g. autofaiss) may set an expectation that all of these shell scripts are needed, which I want to avoid.
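
For instance, a hedged sketch of what that decoupling could look like as separate entry points (the script names are hypothetical, not from this PR):

```sh
# Each concern is its own script; none invokes the others implicitly.
./00-base-os.sh          # base os installation
./10-os-deps.sh          # upstream base os dependency installation
./20-build-faiss.sh      # upstream package build (per package)
./30-install-faiss.sh    # custom package build installation
./40-test-faiss.sh       # custom package build test
```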

@@ -1 +1,4 @@
build*/
build/
faiss*
Member:

I don't follow the need for these .gitignore changes. With this, `git status` will not show the dirty state of the repository in these directories.

I don't think that is right.

sudo apt install podman -y
```

Run build in podman:
Member:

I am familiar with podman. Dan Walsh is a great fella, although with Red Hat's recent changes, it is not clear to me that we should accept a dependency on any single-vendor project, especially those sponsored directly by Red Hat.

Contributor:

Good point. I personally would recommend ociTools if we want to efficiently declare containers, which would let us rely on a much better-supported project with a more diverse user base. For now, I view Dockerfiles as a happy intermediate way to use either Docker or Podman, and we can always just remember to type "docker" on Docker-based machines.

Member:

IMO, this is where it's at: https://github.com/containerd/nerdctl

We could have continuous battles about what tools we use. These battles, i.e. religious wars, are why (successful) tech firms build their own toolchain, top to bottom.

Are we, as a team of founding engineers, willing to adopt a pragmatic perspective?

Contributor:

Sure. Pragmatically, containers run on k8s, Oracle, and Podman. That's it. Docker is a fading bad dream. For me, what runs the containers is uninteresting, thanks to OCI. And what builds the containers is also uninteresting, as long as it's reproducible; I can learn whatever builder you want to use.

Member:

You state it well. We should, as a firm, attempt to eliminate ambiguity aversion. We do that by working on high-value areas. Packaging is not high value unless it directly relates to distribution.

Member:

I think we understand our differing points of view. I also believe we all agree that containerd is the runtime we will use, if we need to run container workloads. There are daemon-based tools (docker and a million clones, including cog), and there are rootless variants (podman, nerdctl, and many more).

containerd is one of many OCI runtimes. This runtime is widely used and has wide investment across the technology ecosystem. This component provides:

  • Lifecycle Management (LCM): start, stop, delete
  • Runtime: exec, logs
FWIW, I agree: docker is and was a massive swing and a miss.
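
As a hedged illustration of those operations driven through nerdctl (assuming containerd and nerdctl are installed; the image and container name are arbitrary):

```sh
# Lifecycle management (LCM): start, then later stop and delete.
sudo nerdctl run -d --name demo ubuntu:22.04 sleep infinity
sudo nerdctl exec demo uname -a   # Runtime: exec into the container
sudo nerdctl logs demo            # Runtime: fetch container logs
sudo nerdctl stop demo
sudo nerdctl rm demo
```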


# mount nvme disk on /models
sudo mkdir /models
sudo mkfs.xfs /dev/nvme0n1
Member:

Not a fan of destructive operations in shell scripts, especially those that could eat the entire filesystem if run in error.

Member:

Your assumption on line 30 sort of covers this. Sort of, as in: no destructive operation should be done anywhere without extreme clarity. I would prefer you pull this destructive operation out, as I could never run add_dev.sh on beast; it would wipe my main disk.
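
A minimal sketch of the kind of guard that would make this safer, assuming the device is passed in explicitly rather than hardcoded (the argument handling here is my own, not the PR's):

```sh
# Hypothetical guard: require an explicit device argument and refuse to
# format anything that already carries a filesystem signature.
DISK="${1:?usage: $0 /dev/nvmeXnY}"
if sudo blkid "$DISK" >/dev/null 2>&1; then
    echo "refusing to format $DISK: existing filesystem detected" >&2
    exit 1
fi
sudo mkfs.xfs "$DISK"
```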

mkdir /models/dev
ln -s /models/dev
mkdir /models/local
ln -s /models/local ~/.local
Member:

The goal of this script is a little mysterious. It appears to be meant to create local storage for development? Is that faiss-specific? If not, could you pull this file out and put it in a different PR? Also, a comment at the top of the file describing its purpose would help.
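
As an illustration, a hedged sketch of the kind of header being asked for (the stated purpose is my reading of the script, not the author's wording):

```sh
#!/bin/bash
# add_dev.sh -- assumed purpose: prepare local development storage by
# mounting an NVMe disk on /models and symlinking development and
# ~/.local directories onto it, so large builds do not fill the root disk.
```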

git clone https://github.com/facebookresearch/faiss
cd faiss

# Configure paths and set environment variables
Member:

nit: pointless comment. The problem with these types of comments is that, over time, they introduce code smells.


# Add a couple Python prerequisites
pip install -U pip setuptools wheel
pip install numpy swig torch
Member:

👍 !



sudo -E apt update
sudo -E apt install dkms intel-basekit -y
Member:

Why is dkms explicitly needed? I know intel-basekit installs drivers, although I don't think we require all of oneAPI. The part we do require does not include drivers. Is the problem that dkms is an unspecified dependency of the Intel packaging?
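
For comparison, a hedged sketch of a narrower install, assuming oneMKL is the only oneAPI component faiss actually needs (the package selection is an assumption, and the Intel oneAPI apt repository must already be configured):

```sh
# Install only the MKL headers and libraries rather than the full basekit;
# this pulls in no kernel drivers, so dkms should not be needed.
sudo -E apt update
sudo -E apt install -y intel-oneapi-mkl-devel
```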


## Get CUDA and install it

curl -sLO https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda_12.2.0_535.54.03_linux.run
Member:

Why not use distro packaging built by NVIDIA?

platform/build-faiss/build-prereqs.sh (resolved thread)

#Verify python and pytorch work

python3 -c 'import torch; print(f"Is CUDA Available: {torch.cuda.is_available()}")'
Member:

How about the message (f"PyTorch and CUDA are operating correctly")? The problem is that pytorch is not installed...
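
A hedged sketch of a check that distinguishes "pytorch missing" from "CUDA unavailable" (the message wording is mine):

```sh
python3 - <<'EOF'
# Fail with a clear message if torch is absent, instead of a raw traceback.
try:
    import torch
except ImportError:
    raise SystemExit("PyTorch is not installed; run build-prereqs.sh first")
print(f"PyTorch OK; CUDA available: {torch.cuda.is_available()}")
EOF
```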

@sdake (Member) commented Jul 22, 2023

#77

@sdake (Member) commented Jul 22, 2023

Here is my go:

#78

These two PRs represent two forward passes.

Thanks
-steve

@sdake (Member) left a comment:

Please re-home as follows:

/origin/platform
/origin/platform/build/faiss
/origin/workloads/retrofromer

Separately, we should move packaging to build, as packaging is a type of build process.

Besides the destructive operation on the disk, this looks great @rstarmer !
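
If the current scripts live under platform/build-faiss, the re-home could be a simple history-preserving move; a hedged sketch using the paths from the proposal above:

```sh
# Move the faiss build scripts under platform/build/, preserving history.
mkdir -p platform/build
git mv platform/build-faiss platform/build/faiss
git commit -m "Re-home faiss build scripts under platform/build/faiss"
```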


Run build in podman:

```sh
Member:

`console` is the only markup in .md that GitHub understands.


a python wheel for faiss deployment

* faiss-libs.tgz
Member:

nice! I am pulling this PR now to see it in action!



platform/build-faiss/build-faiss.sh (resolved thread)
export PATH=$HOME/.local/bin:$PATH
export DEBIAN_FRONTEND=noninteractive

cat <<EOF | sudo tee /etc/apt/sources.list.d/debian-contrib.list
Member:

I like this approach; I am borrowing it!
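
For readers skimming the thread, a hedged sketch of the full pattern being borrowed (the repository line is a placeholder, not the PR's actual sources entry):

```sh
# Write an apt sources entry via sudo tee: only the file write is
# privileged, and the heredoc keeps the content readable inline.
cat <<EOF | sudo tee /etc/apt/sources.list.d/example.list
deb http://deb.debian.org/debian bookworm contrib
EOF
sudo apt update
```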

platform/build-faiss/build-prereqs.sh (resolved thread)
platform/build-faiss/build-prereqs.sh (resolved thread)

## Get CUDA and install it

curl -sLO https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda_12.2.0_535.54.03_linux.run
Member:

Why not use the apt packaging? Sure, the .run packaging works; was this just testing?

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
dpkg -i cuda-keyring_1.1-1_all.deb
apt-get update
apt-get install cmake cuda-toolkit -y
Member:

This strongly conflicts with line 45.

Labels: CLA signed?
3 participants