diff --git a/docs/src/_config.yml b/docs/src/_config.yml index e55f144..508aff9 100644 --- a/docs/src/_config.yml +++ b/docs/src/_config.yml @@ -31,3 +31,7 @@ html: use_issues_button: false use_repository_button: true favicon : "favicon.ico" + +sphinx: + config: + html_show_copyright: false \ No newline at end of file diff --git a/docs/src/part1/getting_started.md b/docs/src/part1/getting_started.md index 1d02a71..099b977 100644 --- a/docs/src/part1/getting_started.md +++ b/docs/src/part1/getting_started.md @@ -15,8 +15,9 @@ The project is structured as follows: ├── detrex # fork of detrex ├── docs # documentation ├── logs -├── notebooks # jupyter notebooks +├── notebooks # experimental jupyter notebooks ├── output # [Training] `scripts.train_net` outputs (tensorboard logs, weights, etc) +├── outputs # [Compilation] `scripts.export_tensorrt` outputs (exported model, logs, etc) ├── projects # configurations and model definitions ├── scripts # utility scripts ├── src # python source code @@ -96,4 +97,16 @@ To point the `detectron2` library to the dataset directory, we need to set the ` ```bash conda env config vars set DETECTRON2_DATASETS=~/datasets conda activate cu124 -``` \ No newline at end of file +``` + + +(part1:downloadmodel)= +## Downloading the trained model (for compilation and evaluation) + +To download the final trained model, fetch the weights from HuggingFace and place them at `artifacts/model_final.pth` with the following command: + +```bash +wget https://huggingface.co/dgcnz/dinov2_vitdet_DINO_12ep/resolve/main/model_final.pth -O artifacts/model_final.pth +``` + +This step is necessary for compilation and for running the benchmarks later on. \ No newline at end of file
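To sanity-check the download before moving on, the short sketch below can be used; it is not part of the repository's scripts, and the key layout it mentions is an assumption (detectron2/detrex checkpoints usually wrap the weights under a "model" key). It only confirms that `artifacts/model_final.pth` is a readable PyTorch checkpoint:

```python
# Minimal sanity check for the downloaded checkpoint (optional).
# Assumes only that the file is a standard PyTorch checkpoint; the exact key
# layout depends on how detectron2/detrex saved it, so we inspect the top level.
import torch

ckpt = torch.load("artifacts/model_final.pth", map_location="cpu")
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])  # e.g. ["model", ...] if it wraps a state dict
```

If this fails or the file is unexpectedly small, the download was likely interrupted; re-run the `wget` command above.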
diff --git a/docs/src/part1/knowledgetransfer.png b/docs/src/part1/knowledgetransfer.png new file mode 100644 index 0000000..ce66fcb Binary files /dev/null and b/docs/src/part1/knowledgetransfer.png differ diff --git a/docs/src/part1/mim.png b/docs/src/part1/mim.png new file mode 100644 index 0000000..e94872f Binary files /dev/null and b/docs/src/part1/mim.png differ diff --git a/docs/src/part1/problem.md b/docs/src/part1/problem.md index 995a5b9..006eaae 100644 --- a/docs/src/part1/problem.md +++ b/docs/src/part1/problem.md @@ -1,25 +1,37 @@ # Problem Definition -TODO -- [ ] Rewrite slides in text format +```{contents} +``` --- +## Motivation -Training a model from scratch requires a labeled dataset that covers the target domain (🌧️, 🌓, 👁️, …). +Currently, the most popular approach for deploying an object detection model in a production environment is to use YOLO because it's fast and easy to use. However, to make it usable for your specific task, you need to collect a domain-specific dataset and fine-tune the model or train it from scratch. This is a time-consuming and expensive process because such a dataset needs to be comprehensive enough to cover most of the possible scenarios that the model will encounter in the real world (weather conditions, different object scales and textures, etc.). -Curating such a dataset is hard and expensive. +Recently, a paradigm shift has emerged in the field, as described by {cite}`bommasani2022`: instead of training a model from scratch for a specific task, you can use a model that was pre-trained on a generic task with massive data and compute as a backbone for your model and only fine-tune a decoder/head for your specific task (see {numref}`Figure {number} <knowledgetransfer>`). These pre-trained models are called Foundation Models and are great at capturing features that are useful for a wide range of tasks. -However, what if we could rely on a model pre-trained on 141 million images? +For example, we can train a vision model to predict missing patches in an image (see {numref}`Figure {number} <mim>`) and then fine-tune it for pedestrian detection. --- +:::::{grid} 2 +::::{grid-item-card} +:::{figure-md} knowledgetransfer +![knowledgetransfer](knowledgetransfer.png) -Recent paradigm shift, from deep learning to foundation models (Bommasani, 2022): -DL: Train a model for a specific task and dataset (e.g. object detection of blood cells) +Transfer Learning is the process of reusing the knowledge acquired by a network trained on a source task for a target task. {cite}`hatakeyama2023` +::: +:::: +::::{grid-item-card} +:::{figure-md} mim +![mim](mim.png) -FOMO: Pre-train a large model with a huge unlabeled dataset on a generic task (e.g. predicting missing patches on an image) and then adapt it for downstream tasks -Adaptations include: fine-tuning, decoder training, distillation, quantization, sparsification, etc. +Masked Image Modelling is a self-supervised objective that consists of predicting missing patches from an image. {cite}`mae2021` +::: +:::: +::::: --- +The drawback of using these foundation models is that they are large and computationally expensive, which makes them unsuitable for deployment in production environments, especially on edge devices. To address this issue, we need to optimize these models. + + +## Objectives According to {cite}`mcip`, practitioners at Apple do the following when asked to deploy a model to some edge device. Find a feasibility model A. Compress model B to reach production-ready model C. :::{figure-md} apple_practice -Hi +Model deployment workflow followed by practitioners at Apple. {cite}`mcip` ::: \ No newline at end of file diff --git a/docs/src/part2/choosing.md b/docs/src/part2/choosing.md index 2ee7ea5..a30e4c6 100644 --- a/docs/src/part2/choosing.md +++ b/docs/src/part2/choosing.md @@ -1,5 +1,8 @@ # Choosing a candidate model +```{contents} +``` + Our task in this chapter is to choose a candidate architecture that allows us to use pre-trained vision foundation models as their backbone's feature extractor. diff --git a/docs/src/part3/compilation.ipynb b/docs/src/part3/compilation.ipynb index 4599030..60486a8 100644 --- a/docs/src/part3/compilation.ipynb +++ b/docs/src/part3/compilation.ipynb @@ -725,11 +725,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Before compilation, download the trained model weights from HuggingFace and place them on `artifacts/model_final.pth` or configure the path in the config file. 
To download the weights, run the following command:\n", - "\n", - "```sh\n", - "!wget https://huggingface.co/dgcnz/dinov2_vitdet_DINO_12ep/resolve/main/model_final.pth -O artifacts/model_final.pth ⁠\n", - "```\n", + "Before compilation, make sure you have followed the instructions at {ref}`part1:downloadmodel`.\n", "\n", "The main script to compile our model with the TensorRT backend is `scripts.export_tensorrt`.\n", "\n", diff --git a/docs/src/part3/results.md b/docs/src/part3/results.md index 642ca12..65dda10 100644 --- a/docs/src/part3/results.md +++ b/docs/src/part3/results.md @@ -1,5 +1,8 @@ # Benchmarks and Results +```{contents} +``` + ## Running the benchmarks Download the model: diff --git a/docs/src/references.bib b/docs/src/references.bib index 6fe0c6a..8bc3687 100644 --- a/docs/src/references.bib +++ b/docs/src/references.bib @@ -123,4 +123,32 @@ @misc{pytorchTorchexportSpecification howpublished = {\url{https://pytorch.org/docs/main/export.ir_spec.html}}, year = {}, note = {[Accessed 25-10-2024]}, +} + +@misc{bommasani2022, + title={On the Opportunities and Risks of Foundation Models}, + author={Rishi Bommasani and Drew A. Hudson and Ehsan Adeli and Russ Altman and Simran Arora and Sydney von Arx and Michael S. Bernstein and Jeannette Bohg and Antoine Bosselut and Emma Brunskill and Erik Brynjolfsson and Shyamal Buch and Dallas Card and Rodrigo Castellon and Niladri Chatterji and Annie Chen and Kathleen Creel and Jared Quincy Davis and Dora Demszky and Chris Donahue and Moussa Doumbouya and Esin Durmus and Stefano Ermon and John Etchemendy and Kawin Ethayarajh and Li Fei-Fei and Chelsea Finn and Trevor Gale and Lauren Gillespie and Karan Goel and Noah Goodman and Shelby Grossman and Neel Guha and Tatsunori Hashimoto and Peter Henderson and John Hewitt and Daniel E. Ho and Jenny Hong and Kyle Hsu and Jing Huang and Thomas Icard and Saahil Jain and Dan Jurafsky and Pratyusha Kalluri and Siddharth Karamcheti and Geoff Keeling and Fereshte Khani and Omar Khattab and Pang Wei Koh and Mark Krass and Ranjay Krishna and Rohith Kuditipudi and Ananya Kumar and Faisal Ladhak and Mina Lee and Tony Lee and Jure Leskovec and Isabelle Levent and Xiang Lisa Li and Xuechen Li and Tengyu Ma and Ali Malik and Christopher D. Manning and Suvir Mirchandani and Eric Mitchell and Zanele Munyikwa and Suraj Nair and Avanika Narayan and Deepak Narayanan and Ben Newman and Allen Nie and Juan Carlos Niebles and Hamed Nilforoshan and Julian Nyarko and Giray Ogut and Laurel Orr and Isabel Papadimitriou and Joon Sung Park and Chris Piech and Eva Portelance and Christopher Potts and Aditi Raghunathan and Rob Reich and Hongyu Ren and Frieda Rong and Yusuf Roohani and Camilo Ruiz and Jack Ryan and Christopher Ré and Dorsa Sadigh and Shiori Sagawa and Keshav Santhanam and Andy Shih and Krishnan Srinivasan and Alex Tamkin and Rohan Taori and Armin W. Thomas and Florian Tramèr and Rose E. 
Wang and William Wang and Bohan Wu and Jiajun Wu and Yuhuai Wu and Sang Michael Xie and Michihiro Yasunaga and Jiaxuan You and Matei Zaharia and Michael Zhang and Tianyi Zhang and Xikun Zhang and Yuhui Zhang and Lucia Zheng and Kaitlyn Zhou and Percy Liang}, + year={2022}, + eprint={2108.07258}, + archivePrefix={arXiv}, + primaryClass={cs.LG}, + url={https://arxiv.org/abs/2108.07258}, +} + +@article{hatakeyama2023, +author = {Hatakeyama, Tomoyuki and Wang, Xueting and Yamasaki, Toshihiko}, +year = {2023}, +month = {08}, +pages = {1-15}, +title = {Transferability prediction among classification and regression tasks using optimal transport}, +volume = {83}, +journal = {Multimedia Tools and Applications}, +doi = {10.1007/s11042-023-15852-6} +} + +@Article{mae2021, + author = {Kaiming He and Xinlei Chen and Saining Xie and Yanghao Li and Piotr Doll{\'a}r and Ross Girshick}, + journal = {arXiv:2111.06377}, + title = {Masked Autoencoders Are Scalable Vision Learners}, + year = {2021}, } \ No newline at end of file