
Commit e0ae47a: readme file update
1 parent 26b961d
19 files changed: +710 −70 lines changed

Diff for: CONTRIBUTING.md

+10-7
@@ -22,40 +22,43 @@ Please use the following style guidelines when making contributions.
 
 ### Jupyter Notebooks & Markdown
 * When they appear inline with the text, directive names, clauses, function or subroutine names, variable names, file names, commands, and command-line arguments should appear between two backticks.
-* Code blocks should begin with three backticks and either 'cpp' or 'fortran' to enable appropriate source formatting and end with three backticks.
+* Code blocks should begin with three backticks to enable appropriate source formatting and end with three backticks.
 * Leave an empty line before and after the code block.
 * Emphasis, including quotes made for emphasis and the introduction of new terms, should be highlighted between a single pair of asterisks.
 * A level 1 heading should appear at the top of the notebook as the title of the notebook.
 * A horizontal rule should appear between sections that begin with a level 2 heading.
 
-Please refer to the following template for jupyter notebook styling in the github repository: misc/jupyter_lab_template
+
 

3333
## Contributing Labs/Modules
3434

35+
* Fundermantals of NeMo Megatron
36+
* P-tuning and Prompt tuning within NeMo-Megatron
37+
38+
3539
### Directory stucture for Github
3640

3741
Before starting to work on new lab it is important to follow the recommended git structure as shown below to avoid reformatting.
3842

3943
Each lab will have following files/directories consisting of training material for the lab.
4044
* jupyter_notebook folder: Consists of jupyter notebooks and its corresponding images.
41-
* source_code folder: Source codes are stored in a separate directory because sometime not all clusters may support jupyter notebooks. During such bootcamps, we should be able to use the source codes directly from this directory. Source code folder may optionally contain Makefile especially for HPC labs.
45+
* source_code folder: Source codes are stored in a separate directory because sometime not all clusters may support jupyter notebooks. During such bootcamps, we should be able to use the source codes directly from this directory.
4246
* presentations: Consists of presentations for the labs ( pdf format is preferred )
4347
* Dockerfile and Singularity: Each lab should have both Docker and Singularity recipes.
4448

45-
The lab optionally may also add custom license in case of any deviation from the top level directory license ( Apache 2.0 ). The base of the module contains individual subdirectory containing versions of the module for languages respectively(C/C++/Fortran…). Each of these directories should contain a directory for individual language translation provided (English, for instance). Each lab translation and programming language combination should have a solutions directory containing correct solutions
49+
The lab optionally may also add custom license in case of any deviation from the top level directory license ( Apache 2.0 ).
4650

47-
Additionally there are two folders "experimental" and "archived" for labs covering features which are in early access phase ( not stable ) or deprecated features repectively.
4851

4952
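The directory layout described above can be sketched by creating a skeleton for a hypothetical lab (the name `my_lab` is illustrative, not part of this commit):

```shell
# Skeleton for a hypothetical lab named "my_lab": notebooks with their
# images, standalone source code, presentations, and both container
# recipes at the lab root.
set -e
LAB=my_lab
mkdir -p "$LAB/jupyter_notebook/images" "$LAB/source_code" "$LAB/presentations"
touch "$LAB/Dockerfile" "$LAB/Singularity"
ls "$LAB"
```

An "experimental" or "archived" parent folder would wrap such a lab when the feature it covers is early-access or deprecated.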
 ### Git Branching
 
 Adding a new feature/lab will follow a forking workflow, which means feature-branch development happens on a forked repo that later gets merged into our original project (GPUHackathons.org) repository.
 
-![Git Branching Workflow](workspace/jupyter_notebook/images/git_branching.jpg)
+![Git Branching Workflow](misc/images/git_branching.jpg)
 
 The 5 main steps depicted in the image above are as follows:
 1. Fork: To create a new lab/feature, the GPUHackathons.org repository must be forked. The fork creates a snapshot of the GPUHackathons.org repository at the time it was forked. Any new feature/lab should be developed on top of the develop branch of the repository.
 2. Clone: The developer can then clone this new repository to a local machine.
-Create Feature Branch: Create a new branch in which your changes will be made. The recommended naming convention for the feature branch is: hpc-<feature_name>, hpc-ai-<feature_name>, or ai-<feature_name>. The new changes that the developer makes can be added, committed, and pushed.
+Create Feature Branch: Create a new branch in which your changes will be made. The recommended naming convention for the feature branch is: end2end-nlp-<feature_name>. The new changes that the developer makes can be added, committed, and pushed.
 3. Push: After the changes are committed, the developer pushes the changes to the remote branch. The push command sends the local changes to the GitHub repository.
 4. Pull: Submit a pull request. Upon receiving the pull request, a Hackathon team reviewer/owner will review the changes and, upon accepting them, merge them into the develop branch of GPUHackathons.org.
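The five steps above can be sketched with plain git commands; the repository name `fork-standin` below is a hypothetical local stand-in for a GitHub fork, and the branch name follows the convention from this commit:

```shell
set -e
# 1. Fork: happens on GitHub; a local repository stands in for the fork here.
git init -q fork-standin
git -C fork-standin checkout -q -b develop
git -C fork-standin -c user.email=dev@example.com -c user.name=dev \
    commit -q --allow-empty -m "initial commit on develop"
# 2. Clone: the developer clones the fork to a local machine.
git clone -q fork-standin work
cd work
# 3. Create Feature Branch: follow the recommended naming convention.
git checkout -q -b end2end-nlp-my_feature
echo "new lab" > new_lab.md
git add new_lab.md
git -c user.email=dev@example.com -c user.name=dev commit -qm "add new lab"
# 4. Push the feature branch; the pull request is then opened on GitHub.
git push -q origin end2end-nlp-my_feature
git branch --show-current
```

The pull-request review and merge into develop (step 4 onward) happen on GitHub itself, not on the command line.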

Diff for: README.md

+3-31
@@ -17,36 +17,8 @@ The total bootcamp material would take approximately 8 hours. It is recommended
 
 ## Running using Singularity
 
-To run the material using Singularity containers, follow the steps below.
+Update coming soon
 
-To build the TAO Toolkit Singularity container, run: `singularity build --fakeroot --sandbox tao_e2enlp.simg Singularity_tao`
-
-To build the RIVA client Singularity container for the Client, run:
-
-To download the Riva Speech Server Singularity container for the Server, run: `singularity pull riva-speech:2.6.0.sif docker://nvidia/nvcr.io/nvidia/riva/riva-speech:2.6.0`
-
-### Run data preprocessing and TAO notebooks
-
-Run the first container with: `singularity run --fakeroot --nv -B workspace:/workspace tao_e2enlp.simg jupyter-lab --no-browser --allow-root --ip=0.0.0.0 --port=8888 --NotebookApp.token="" --notebook-dir=/workspace`
-
-The `-B` flag mounts local directories in the container filesystem and ensures changes are stored locally in the project folder. Open JupyterLab in a browser: http://localhost:8888
-
-You may now start working on the lab by clicking on the `Start_here.ipynb` notebook.
-
-When you are done with `Data preprocessing Lab` and `2.Transfer learning with TAO Lab`, shut down JupyterLab by selecting `File > Shut Down` in the top left corner, then shut down the Singularity container by typing `exit` or pressing `ctrl + d` in the terminal window.
-
-### Run Riva Speech Server
-
-To activate the Riva Server container, run:
-```
-singularity run \
-    --nv \
-```
-
-### Run Riva
 
 
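For reference, the `-B` bind flag explained in the removed README text binds a host directory into the container filesystem so edits persist on the host. This hedged sketch only assembles and prints the launch command taken from that text; `singularity` itself is not invoked here:

```shell
# Compose (but do not run) the JupyterLab launch command from the removed
# instructions. -B binds the host "workspace" directory to /workspace
# inside the container, so notebook changes are stored on the host.
BIND="workspace:/workspace"
CMD="singularity run --fakeroot --nv -B $BIND tao_e2enlp.simg jupyter-lab --no-browser --allow-root --ip=0.0.0.0 --port=8888 --NotebookApp.token=\"\" --notebook-dir=/workspace"
echo "$CMD"
```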

@@ -165,9 +137,9 @@ tasks:
 
 ### Run All Notebooks
 
-Activate the virtualenvwrapper launcher `workon launcher` (you may be required to export the path as executed in 4. above)
+Activate the virtualenvwrapper launcher `workon launcher` (you may be required to export the path as executed in 4. above)
 
-You are to run the first ALL notebooks in the `launcher` environment.
+You are to run ALL the notebooks in the `launcher` environment.
 
 Launch JupyterLab with:
Diff for: misc/images/git_branching.jpg

92 KB

Diff for: workspace/Start_Here.ipynb

+2-2
@@ -17,7 +17,7 @@
 "source": [
 "## Overview \n",
 "\n",
-"End-to-End NLP material is designed from a real-world perspective that follows the data processing, development, and deployment pipeline paradigm. The material consists of three labs, and the goal is to walk you through a single flow of raw text `data preprocessing`, how to build a SQuAD dataset format for Question Answering, train the dataset via the `NVIDIA TAO` transfer-learning BERT model, and deploy using `RIVA`. Furthermore, a challenge notebook is introduced to test your understanding of the material and solidify your experience in the Question Answering (QA) domain.\n",
+"End-to-End NLP material is designed from a real-world perspective that follows the data processing, development, and deployment pipeline paradigm. The material consists of three labs, and the goal is to walk you through a single flow of raw text `data preprocessing`, how to build a SQuAD dataset format for Question Answering, train the dataset via the `NVIDIA® TAO` transfer-learning BERT model, and deploy using `RIVA`. Furthermore, a challenge notebook is introduced to test your understanding of the material and solidify your experience in the Question Answering (QA) domain.\n",
 "\n",
 "### Why End-to-End NLP?\n",
 "\n",
@@ -119,7 +119,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"!nvidia-smi"
+"#!nvidia-smi"
 ]
 },
 {

Diff for: workspace/jupyter_notebook/.ipynb_checkpoints/QandA_data_processing-checkpoint.ipynb

+1-1
@@ -145,7 +145,7 @@
 "-\tAlready mined data from the web\n",
 "-\tRaw data scraping from webpages\n",
 "\n",
-"can be drawn from. Depending on the choice of target training model input data format, the training dataset is generated. \n",
+"A training dataset is generated based on the input data format of the training model. \n",
 "\n",
 "**Text document**: This could be a file containing either a description or definitions, or an essay about a domain/topic. The content is usually in sentences arranged in paragraphs. Each paragraph may form a context where questions and answers can be drawn from. Depending on the choice of target training model input data format, the training dataset is generated. The document text does not require further processing for questions and answers to be extracted.\n",
 "\n",

Diff for: workspace/jupyter_notebook/.ipynb_checkpoints/qa-riva-deployment-checkpoint.ipynb

+22-3
@@ -102,7 +102,7 @@
 "#SIGULARITY_CONTAINER = \"riva-speech:2.6.0-servicemaker.sif\"\n",
 "\n",
 "# Directory where the .riva model is stored $MODEL_LOC/*.riva\n",
-"MODEL_LOC = \"~/Documents/End-to-End-NLP/workspace/results/questions_answering/export_riva\"\n",
+"MODEL_LOC = \"~/End-to-End-NLP/workspace/results/questions_answering/export_riva\"\n",
 "\n",
 "# Name of the .riva file\n",
 "MODEL_NAME = \"qa-model.riva\"\n",
@@ -293,7 +293,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"RIVA_DIR = \"/home/tosin/Documents/End-to-End-NLP/workspace/source_code/riva_quickstart_v2.6.0\""
+"RIVA_DIR = \"~/End-to-End-NLP/workspace/source_code/riva_quickstart_v2.6.0\""
 ]
 },
 {
@@ -928,6 +928,20 @@
 "!docker stop $(docker ps -a -q)"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"---"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"**It is advisable not to run the section below in a Bootcamp session, as it takes a lot of time to execute.**"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -1145,7 +1159,12 @@
 "\n",
 "https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tts/tts-overview.html\n",
 "\n",
-"https://docs.nvidia.com/deeplearning/riva/user-guide/docs/asr/asr-overview.html"
+"https://docs.nvidia.com/deeplearning/riva/user-guide/docs/asr/asr-overview.html\n",
+"\n",
+"---\n",
+"## Licensing\n",
+"\n",
+"Copyright © 2022 OpenACC-Standard.org. This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). These materials include references to hardware and software developed by other entities; all applicable licensing and copyrights apply."
 ]
 },
 {

Diff for: workspace/jupyter_notebook/.ipynb_checkpoints/question-answering-training-checkpoint.ipynb

+18-8
@@ -315,7 +315,7 @@
 "source": [
 "# IMPORTANT NOTE: Set the path to a folder where you want your data to be saved\n",
 "#DATA_DOWNLOAD_DIR = \"/workspace/data\"\n",
-"DATA_DOWNLOAD_DIR = \"/home/tosin/Documents/End-to-End-NLP/workspace/data/\""
+"DATA_DOWNLOAD_DIR = \"~/End-to-End-NLP/workspace/data/\""
 ]
 },
 {
@@ -460,19 +460,19 @@
 "{\n",
 " \"Mounts\":[\n",
 " {\n",
-" \"source\": \"/home/tosin/Documents/End-to-End-NLP/workspace/data\",\n",
+" \"source\": \"~/End-to-End-NLP/workspace/data\",\n",
 " \"destination\": \"/data\"\n",
 " },\n",
 " {\n",
-" \"source\": \"/home/tosin/Documents/End-to-End-NLP/workspace/specs\",\n",
+" \"source\": \"~/End-to-End-NLP/workspace/specs\",\n",
 " \"destination\": \"/specs\"\n",
 " },\n",
 " {\n",
-" \"source\": \"/home/tosin/Documents/End-to-End-NLP/workspace/results\",\n",
+" \"source\": \"~/End-to-End-NLP/workspace/results\",\n",
 " \"destination\": \"/results\"\n",
 " },\n",
 " {\n",
-" \"source\": \"/home/tosin/Documents/End-to-End-NLP/workspace/.cache\",\n",
+" \"source\": \"~/End-to-End-NLP/workspace/.cache\",\n",
 " \"destination\": \"/root/.cache\"\n",
 " }\n",
 " ]\n",
@@ -494,9 +494,9 @@
 "outputs": [],
 "source": [
 "# Make sure the source directories exist; if not, create them\n",
-"! mkdir /home/tosin/Documents/End-to-End-NLP/workspace/specs\n",
-"! mkdir /home/tosin/Documents/End-to-End-NLP/workspace/results\n",
-"! mkdir /home/tosin/Documents/End-to-End-NLP/workspace/.cache\n"
+"! mkdir ~/End-to-End-NLP/workspace/specs\n",
+"! mkdir ~/End-to-End-NLP/workspace/results\n",
+"! mkdir ~/End-to-End-NLP/workspace/.cache\n"
 ]
 },
 {
@@ -1101,6 +1101,16 @@
 "You could use TAO to build custom models for your own applications, or you could deploy the custom model to Nvidia Riva!"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"---\n",
+"## Licensing\n",
+"\n",
+"Copyright © 2022 OpenACC-Standard.org. This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). These materials include references to hardware and software developed by other entities; all applicable licensing and copyrights apply."
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
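One hedged aside on the path changes in this file: a leading `~` is expanded by the shell only when it appears unquoted, so tools that read `~/End-to-End-NLP/...` from a config file (such as the Mounts JSON above) must expand it themselves. A quick shell illustration:

```shell
# "~" expands only when unquoted at the start of a shell word; inside
# quotes (or inside a JSON config) it stays a literal two-character string.
unquoted=~/End-to-End-NLP
quoted="~/End-to-End-NLP"
echo "$unquoted"   # expands to $HOME/End-to-End-NLP
echo "$quoted"     # stays literal: ~/End-to-End-NLP
```

This is why the `! mkdir ~/...` notebook cells work (the shell expands the tilde), while a consumer of the JSON paths may need its own expansion step.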

Diff for: workspace/jupyter_notebook/QandA_data_processing.ipynb

+1-1
@@ -145,7 +145,7 @@
 "-\tAlready mined data from the web\n",
 "-\tRaw data scraping from webpages\n",
 "\n",
-"can be drawn from. Depending on the choice of target training model input data format, the training dataset is generated. \n",
+"A training dataset is generated based on the input data format of the training model. \n",
 "\n",
 "**Text document**: This could be a file containing either a description or definitions, or an essay about a domain/topic. The content is usually in sentences arranged in paragraphs. Each paragraph may form a context where questions and answers can be drawn from. Depending on the choice of target training model input data format, the training dataset is generated. The document text does not require further processing for questions and answers to be extracted.\n",
 "\n",

Diff for: workspace/jupyter_notebook/challenge.ipynb

+10
@@ -590,6 +590,16 @@
 "You could train your own custom models in TAO and deploy them in Riva! You could scale up your deployment using Kubernetes with the Riva AI Services Helm chart, which will pull the relevant images and download model artifacts from NGC, generate the model repository, and start and expose the Riva speech services."
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"---\n",
+"## Licensing\n",
+"\n",
+"Copyright © 2022 OpenACC-Standard.org. This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). These materials include references to hardware and software developed by other entities; all applicable licensing and copyrights apply."
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},

Diff for: workspace/jupyter_notebook/qa-riva-deployment.ipynb

+22-3
@@ -102,7 +102,7 @@
 "#SIGULARITY_CONTAINER = \"riva-speech:2.6.0-servicemaker.sif\"\n",
 "\n",
 "# Directory where the .riva model is stored $MODEL_LOC/*.riva\n",
-"MODEL_LOC = \"~/Documents/End-to-End-NLP/workspace/results/questions_answering/export_riva\"\n",
+"MODEL_LOC = \"~/End-to-End-NLP/workspace/results/questions_answering/export_riva\"\n",
 "\n",
 "# Name of the .riva file\n",
 "MODEL_NAME = \"qa-model.riva\"\n",
@@ -293,7 +293,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"RIVA_DIR = \"/home/tosin/Documents/End-to-End-NLP/workspace/source_code/riva_quickstart_v2.6.0\""
+"RIVA_DIR = \"~/End-to-End-NLP/workspace/source_code/riva_quickstart_v2.6.0\""
 ]
 },
 {
@@ -928,6 +928,20 @@
 "!docker stop $(docker ps -a -q)"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"---"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"**It is advisable not to run the section below in a Bootcamp session, as it takes a lot of time to execute.**"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -1145,7 +1159,12 @@
 "\n",
 "https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tts/tts-overview.html\n",
 "\n",
-"https://docs.nvidia.com/deeplearning/riva/user-guide/docs/asr/asr-overview.html"
+"https://docs.nvidia.com/deeplearning/riva/user-guide/docs/asr/asr-overview.html\n",
+"\n",
+"---\n",
+"## Licensing\n",
+"\n",
+"Copyright © 2022 OpenACC-Standard.org. This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). These materials include references to hardware and software developed by other entities; all applicable licensing and copyrights apply."
 ]
 },
 {

Diff for: workspace/jupyter_notebook/question-answering-training.ipynb

+18-8
@@ -315,7 +315,7 @@
 "source": [
 "# IMPORTANT NOTE: Set the path to a folder where you want your data to be saved\n",
 "#DATA_DOWNLOAD_DIR = \"/workspace/data\"\n",
-"DATA_DOWNLOAD_DIR = \"/home/tosin/Documents/End-to-End-NLP/workspace/data/\""
+"DATA_DOWNLOAD_DIR = \"~/End-to-End-NLP/workspace/data/\""
 ]
 },
 {
@@ -460,19 +460,19 @@
 "{\n",
 " \"Mounts\":[\n",
 " {\n",
-" \"source\": \"/home/tosin/Documents/End-to-End-NLP/workspace/data\",\n",
+" \"source\": \"~/End-to-End-NLP/workspace/data\",\n",
 " \"destination\": \"/data\"\n",
 " },\n",
 " {\n",
-" \"source\": \"/home/tosin/Documents/End-to-End-NLP/workspace/specs\",\n",
+" \"source\": \"~/End-to-End-NLP/workspace/specs\",\n",
 " \"destination\": \"/specs\"\n",
 " },\n",
 " {\n",
-" \"source\": \"/home/tosin/Documents/End-to-End-NLP/workspace/results\",\n",
+" \"source\": \"~/End-to-End-NLP/workspace/results\",\n",
 " \"destination\": \"/results\"\n",
 " },\n",
 " {\n",
-" \"source\": \"/home/tosin/Documents/End-to-End-NLP/workspace/.cache\",\n",
+" \"source\": \"~/End-to-End-NLP/workspace/.cache\",\n",
 " \"destination\": \"/root/.cache\"\n",
 " }\n",
 " ]\n",
@@ -494,9 +494,9 @@
 "outputs": [],
 "source": [
 "# Make sure the source directories exist; if not, create them\n",
-"! mkdir /home/tosin/Documents/End-to-End-NLP/workspace/specs\n",
-"! mkdir /home/tosin/Documents/End-to-End-NLP/workspace/results\n",
-"! mkdir /home/tosin/Documents/End-to-End-NLP/workspace/.cache\n"
+"! mkdir ~/End-to-End-NLP/workspace/specs\n",
+"! mkdir ~/End-to-End-NLP/workspace/results\n",
+"! mkdir ~/End-to-End-NLP/workspace/.cache\n"
 ]
 },
 {
@@ -1101,6 +1101,16 @@
 "You could use TAO to build custom models for your own applications, or you could deploy the custom model to Nvidia Riva!"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"---\n",
+"## Licensing\n",
+"\n",
+"Copyright © 2022 OpenACC-Standard.org. This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). These materials include references to hardware and software developed by other entities; all applicable licensing and copyrights apply."
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
