This is a sample implementation of MLflow on a TensorFlow training workflow, where the tracking logs are stored in a SQL database. As I intend to build this into a full-fledged MLOps workflow, this README will change and the purpose of this repo will ultimately change as well.
- Make sure the MLflow server is running with `mlflow server`
- The entry point is `Training/u2net_trainig_TF.py`
- You may use `scripts/init_mlflow.sh` to run both (a sketch of the equivalent manual commands follows this list)
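For reference, a minimal sketch of running the two steps by hand, assuming a local SQLite backend store (the actual SQL database URI is not specified above, so adjust it to your setup):

```sh
# Start the tracking server with a SQL backend store
# (the SQLite URI is an assumption; point it at your own database)
mlflow server --backend-store-uri sqlite:///mlflow.db --host 0.0.0.0 --port 5000 &

# Point the training run at that server and launch the entry point
export MLFLOW_TRACKING_URI=http://localhost:5000
python Training/u2net_trainig_TF.py
```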
- TensorFlow == 2.10.1
- CUDA version == 11.2
- cuDNN version == 8.6.*
- TensorFlow Datasets
- MLflow == 2.1.1
- TensorBoard == 2.10.1
- Matplotlib
pip install -r requirements.txt
pip install -e .
. scripts/init_mlflow.sh
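Since the versions above pin a specific CUDA/cuDNN combination, a quick optional sanity check that the installed TensorFlow build can see the GPU:

```sh
# Print the TensorFlow version and any GPUs it detects
python -c "import tensorflow as tf; print(tf.__version__, tf.config.list_physical_devices('GPU'))"
```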
- Adding CVAT and FiftyOne for dataset management and annotation capabilities
- Adding a REST API to MLflow for external deployments
- Adding model comparisons and auto-stage changes and deployments
- Adding model serving via REST API
- Adding CI and test cases
- Dockerized MLflow server and training sequence. To run them, now run `docker compose -f deployment/docker-compose.yaml up -d`
- The MLflow server will use 51.0.0.4:5000 as its internal IP and port (a quick status-check sketch follows this list)
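A small sketch for verifying that the Compose stack is up. The service name and the assumption that port 5000 is also published to the host are mine, not taken from the compose file:

```sh
# List the containers started by the compose file
docker compose -f deployment/docker-compose.yaml ps

# Tail the tracking server logs (service name "mlflow" is a placeholder;
# use the service name defined in deployment/docker-compose.yaml)
docker compose -f deployment/docker-compose.yaml logs -f mlflow

# If port 5000 is published to the host (an assumption), the UI responds here
curl http://localhost:5000
```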
- Terraform is now supported for Kubernetes deployment. Make sure you have Terraform installed (see the installation instructions here)
- Support for Kubernetes is complete. Tested with minikube.
- Make sure you build the Docker images into minikube's Docker daemon (see the sketch after this list)
- Make sure the Docker image names are the same as those listed in `training-development.yaml` and `mlflow-development.yaml`
- Apply `mlflow-development.yaml` for the MLflow pod
- Apply the mlflow-service manifest to fix the internal IP
- Port-forward with kubectl, then open localhost:5000 to ensure MLflow is running
- Apply `training-development.yaml`
- Give it a while to download the necessary dataset. After ~10 minutes, check the logs to verify training is progressing; a new experiment should appear at localhost:5000
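A consolidated sketch of the minikube workflow above. The image names, Dockerfile locations, the service manifest file name, and the resource names passed to kubectl are assumptions for illustration; substitute the names actually used in the manifests:

```sh
# Build the images inside minikube's Docker daemon
eval $(minikube docker-env)
docker build -t mlflow-server:latest .       # image name/Dockerfile location are placeholders
docker build -t u2net-training:latest .      # image name/Dockerfile location are placeholders

# Apply the MLflow deployment and service, then verify the UI
kubectl apply -f mlflow-development.yaml
kubectl apply -f mlflow-service.yaml                    # file name is an assumption
kubectl port-forward service/mlflow-service 5000:5000   # service name is an assumption; runs in the foreground
# ...then open http://localhost:5000 in a browser

# Apply the training deployment and follow its logs
kubectl apply -f training-development.yaml
kubectl logs -f deployment/training-development         # deployment name is an assumption
```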