DRAFT: Add bash based build on Ubuntu for faiss library and prerequisites #64
base: main
Conversation
add_dev.sh: ensure disk space for the build and Python prerequisites
build-prereqs.sh: install prerequisites (Intel, NVIDIA)
build-faiss.sh: build the faiss libraries and the Python (and Python SWIG) libraries
Produces a Python .whl
Produces a .tar.gz of the generated libraries
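For context, a minimal sketch of the invocation order the description implies; the `./` paths are an assumption, not from the PR:

```sh
# Assumed invocation order of the scripts described above (paths illustrative).
./add_dev.sh          # ensure disk space for the build
./build-prereqs.sh    # install Intel and NVIDIA prerequisites
./build-faiss.sh      # build faiss libs and produce the .whl and .tar.gz
```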
This is not specifically related to this draft; it's the next step in the build pipeline: https://github.com/artificialwisdomai/apt.oifv.ai/blob/main/build.sh
build.sh - build as "root", configured for docker/podman
README.md - instructions for running podman build
add_dev/build-prereqs.sh - shuffle python install
There is a lot of repetition and redundancy in this set of scripts. If this is for your personal use, I don't mind merging it with a README.md stating as much. If it's for general developer use, it should be cleaned up significantly.
The following activities should be isolated:
- base os installation
- upstream base os dependency installation
- upstream package build (per package)
- custom package build installation
- custom package build test
There may be other things. The gist is that these actions should not be strongly coupled, as they are in this PR set; see the sketch below.
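A rough sketch of what that isolation could look like; the directory and script names are illustrative, not a proposal for this PR:

```sh
# Hypothetical layout with the activities decoupled (names are illustrative):
#   00-base-os.sh         base os installation
#   10-os-deps.sh         upstream base os dependency installation
#   20-build-faiss.sh     upstream package build (per package)
#   30-install-faiss.sh   custom package build installation
#   40-test-faiss.sh      custom package build test
set -euo pipefail
for step in 00-base-os 10-os-deps 20-build-faiss 30-install-faiss 40-test-faiss; do
    bash "./${step}.sh"   # each step runs standalone; the loop is just a driver
done
```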
All that said, I am happy to merge as-is once the top-level dir makes sense, such as platform/packaging/dev or platform/dev/. I am not sure what you are after, as this PR has many directions, so I am not sure what to suggest as a top-level dir. platform/build-faiss doesn't make sense, as follow-on work to build a new package (e.g. autofaiss) may set an expectation that all of these shell scripts are needed, which I want to avoid.
@@ -1 +1,4 @@
build*/
build/
faiss*
I don't follow the need for these .gitignore changes. With this, a git status will not show the dirty state of the repository in these directories. I don't think that is right.
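To illustrate the concern, a small sketch; the scratch directory names are hypothetical:

```sh
# With build*/ and faiss* ignored, stray trees like these never show as untracked.
mkdir -p build-scratch faiss-scratch
git status --short                      # prints nothing for the ignored paths
git status --short --ignored=matching   # required just to see them at all
```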
sudo apt install podman -y

Run build in podman:
I am familiar with podman. Dan Walsh is a great fella, although, with Red Hat's recent changes, it is not clear to me that we should accept a dependency on any single-vendor project, especially those sponsored directly by Red Hat.
Good point. I personally would recommend ociTools if we want to efficiently declare containers, which would let us rely on a much better-supported project with a more diverse user base. For now, I view Dockerfiles as a happy intermediate way to use either Docker or Podman, and we can always just remember to type "docker" on Docker-based machines.
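As a sketch of why Dockerfiles are a workable intermediate, the same build file drives either CLI; the image tag is illustrative:

```sh
# Same Dockerfile, either engine; only the CLI name changes.
podman build -t faiss-build -f Dockerfile .
docker build -t faiss-build -f Dockerfile .
```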
IMO, this is where it's at: https://github.com/containerd/nerdctl
We could have continuous battles about what tools we use. These battles, i.e. religious wars, are why (successful) tech firms build their own toolchain, top to bottom.
Are we, as a team of founding engineers, willing to adopt a pragmatic perspective?
Sure. Pragmatically, containers run on k8s, Oracle, and Podman. That's it. Docker is a fading bad dream. For me, what runs the containers is uninteresting, thanks to OCI. And what builds the containers is also uninteresting, as long as it's reproducible; I can learn whatever builder you want to use.
You state it well. We should, as a firm, attempt to eliminate ambiguity aversion. We do that by working on high-value areas. Packaging is not high value unless it directly relates to distribution.
I think we understand our differing points of view. I also believe we all agree that containerd is the runtime we will use, if we need to run container workloads. There are daemon-based options (docker and a million clones, including cog). In addition, there are rootless variants (podman, nerdctl, and many more).
containerd is one of many OCI runtimes. It is widely used and has wide investment across the technology ecosystem. This component provides:
- Lifecycle Management (LCM):
  - start
  - stop
  - delete
- Runtime:
  - exec
  - logs
FWIW, I agree: docker is and was a massive swing and a miss.
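As a sketch of that LCM/runtime split using nerdctl (the docker-compatible CLI on top of containerd); the image and container name are illustrative:

```sh
# LCM: start a container
nerdctl run -d --name demo docker.io/library/alpine:latest sleep 600
# Runtime: exec and logs
nerdctl exec demo uname -a
nerdctl logs demo
# LCM: stop and delete
nerdctl stop demo
nerdctl rm demo
```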
# mount nvme disk on /models
sudo mkdir /models
sudo mkfs.xfs /dev/nvme0n1
Not a fan of destructive operations in shell scripts, especially those that could eat the entire filesystem if in error.
Your assumption on line 30 sort of covers this. Sort of, as in: no destructive operation should be done anywhere without extreme clarity. I would prefer you pull this destructive operation out, as I could never run add_dev.sh on beast; it would wipe my main disk.
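A possible guard, as a sketch only; it assumes the device is passed explicitly instead of being hard-coded, and refuses anything mounted or already formatted:

```sh
DEV="${1:?usage: $0 /dev/nvmeXnY}"
# Refuse if the device is mounted anywhere.
if findmnt --source "$DEV" >/dev/null; then
    echo "refusing: $DEV is mounted" >&2; exit 1
fi
# Refuse if it (or a partition on it) already carries a filesystem.
if [ -n "$(lsblk -no FSTYPE "$DEV" | tr -d '[:space:]')" ]; then
    echo "refusing: $DEV already has a filesystem" >&2; exit 1
fi
read -rp "Format $DEV as xfs? [y/N] " answer
[ "$answer" = "y" ] && sudo mkfs.xfs "$DEV"
```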
mkdir /models/dev
ln -s /models/dev
mkdir /models/local
ln -s /models/local ~/.local
The goal of this script is a little mysterious. It appears it is meant to create local storage for development? Is that faiss specific? If not, could you pull this file out and put it in a different PR? Also, a comment at the top of the file describing the purpose would help.
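Something like this header would do, assuming the local-dev-storage reading above is correct; the wording is mine, not from the PR:

```sh
#!/usr/bin/env bash
# add_dev.sh: provision scratch storage for development on a spare disk and
# symlink it into the home directory so large build artifacts stay off the
# root filesystem. (Purpose as I understand it; please correct if wrong.)
```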
git clone https://github.com/facebookresearch/faiss
cd faiss

# Configure paths and set environment variables
nit: pointless comment. The problem with these types of comments is that, over time, they introduce code smells.
# Add a couple Python prerequisites
pip install -U pip setuptools wheel
pip install numpy swig torch
👍 !
sudo -E apt update
sudo -E apt install dkms intel-basekit -y
Why is dkms explicitly needed? I know intel-basekit installs drivers, although I don't think we require all of OneAPI. The part we do require does not possess drivers. Is the problem that dkms is an unspecified dependency by Intel?
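If only the math libraries are needed for the faiss build, a narrower install might look like this; the package name comes from Intel's oneAPI apt repo, and "MKL is all we need" is an assumption, not something the PR states:

```sh
# Pull in MKL headers/libs only, instead of the full basekit (no drivers, no dkms).
sudo -E apt install -y intel-oneapi-mkl-devel
```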
## Get CUDA and install it

curl -sLO https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda_12.2.0_535.54.03_linux.run
Why not use distro packaging built by NVIDIA?
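For reference, the distro-packaged route, mirroring the cuda-keyring commands that already appear later in this PR:

```sh
# NVIDIA's apt repository for Ubuntu 22.04; installs the toolkit via packages.
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get install -y cuda-toolkit
```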
# Verify python and pytorch work

python3 -c 'import torch; print(f"Is CUDA Available: {torch.cuda.is_available()}")'
How about the message f"PyTorch and CUDA are operating correctly"? The problem is that pytorch is not installed...
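A sketch of a check that only claims success when the import actually works:

```sh
python3 - <<'EOF'
# Fail loudly if torch is missing instead of printing a success message.
try:
    import torch
except ImportError:
    raise SystemExit("PyTorch is not installed")
print(f"PyTorch OK; CUDA available: {torch.cuda.is_available()}")
EOF
```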
Here is my go at it. These two PRs represent two forward passes. Thanks.
Please re-home as follows:
/origin/platform
/origin/platform/build/faiss
/origin/workloads/retrofromer
Separately, we should move packaging to build, as packaging is a type of build process.
Besides the destructive operation on the disk, this looks great @rstarmer !
Run build in podman:

```sh
console is the only markup in .md that github understands.
a python wheel for faiss deployment

* faiss-libs.tgz
nice! I am pulling this PR now to see it in action!
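For anyone else trying the artifacts, a sketch of consuming them; the exact file names and the /usr/local layout are assumptions, not from the PR:

```sh
# Install the wheel into the active Python environment.
pip install ./faiss-*.whl
# Unpack the shared libraries somewhere on the loader path and refresh the cache.
sudo tar -C /usr/local -xzf faiss-libs.tgz
sudo ldconfig
```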
export PATH=$HOME/.local/bin:$PATH
export DEBIAN_FRONTEND=noninteractive

cat <<EOF | sudo tee /etc/apt/sources.list.d/debian-contrib.list
I like this approach; I am borrowing it!
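The pattern in general form, with a placeholder repository line (the actual contents belong to the script above):

```sh
# Write an apt source non-interactively; the deb line here is a placeholder.
cat <<EOF | sudo tee /etc/apt/sources.list.d/example.list
deb http://example.org/apt stable main
EOF
sudo apt update
```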
## Get CUDA and install it

curl -sLO https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda_12.2.0_535.54.03_linux.run
Why not use the apt packaging? Sure, the .run packaging works; was this just for testing?
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
dpkg -i cuda-keyring_1.1-1_all.deb
apt-get update
apt-get install cmake cuda-toolkit -y
Strongly conflicts with line 45.
add_dev.sh: ensure disk space for build and python prerequisites
build-prereqs.sh: install prerequisites (intel, nvidia)
build-faiss.sh: build faiss libs and python (and python swig) libs
Produces a python .whl
Produces a .tar.gz of the generated libraries