
Commit e0ae47a: readme file update
1 parent 26b961d
19 files changed: +710 −70 lines changed

Diff for: CONTRIBUTING.md

+10-7
@@ -22,40 +22,43 @@ Please use the following style guidelines when making contributions.
 
 ### Jupyter Notebooks & Markdown
 * When they appear inline with the text, directive names, clauses, function or subroutine names, variable names, file names, commands, and command-line arguments should appear between two backticks.
-* Code blocks should begin with three backticks and either 'cpp' or 'fortran' to enable appropriate source formatting and end with three backticks.
+* Code blocks should begin with three backticks to enable appropriate source formatting and end with three backticks.
 * Leave an empty line before and after the code block.
 * Emphasis, including quotes made for emphasis and the introduction of new terms, should be highlighted between a single pair of asterisks.
 * A level 1 heading should appear at the top of the notebook as the title of the notebook.
 * A horizontal rule should appear between sections that begin with a level 2 heading.
 
-Please refer to the following template for jupyter notebook styling in the github repository: misc/jupyter_lab_template
+
 

3333
## Contributing Labs/Modules
3434

35+
* Fundermantals of NeMo Megatron
36+
* P-tuning and Prompt tuning within NeMo-Megatron
37+
38+
3539
### Directory stucture for Github
3640

3741
Before starting to work on new lab it is important to follow the recommended git structure as shown below to avoid reformatting.
3842

3943
Each lab will have following files/directories consisting of training material for the lab.
4044
* jupyter_notebook folder: Consists of jupyter notebooks and its corresponding images.
41-
* source_code folder: Source codes are stored in a separate directory because sometime not all clusters may support jupyter notebooks. During such bootcamps, we should be able to use the source codes directly from this directory. Source code folder may optionally contain Makefile especially for HPC labs.
45+
* source_code folder: Source codes are stored in a separate directory because sometime not all clusters may support jupyter notebooks. During such bootcamps, we should be able to use the source codes directly from this directory.
4246
* presentations: Consists of presentations for the labs ( pdf format is preferred )
4347
* Dockerfile and Singularity: Each lab should have both Docker and Singularity recipes.
4448

45-
The lab optionally may also add custom license in case of any deviation from the top level directory license ( Apache 2.0 ). The base of the module contains individual subdirectory containing versions of the module for languages respectively(C/C++/Fortran…). Each of these directories should contain a directory for individual language translation provided (English, for instance). Each lab translation and programming language combination should have a solutions directory containing correct solutions
49+
The lab optionally may also add custom license in case of any deviation from the top level directory license ( Apache 2.0 ).
4650

47-
Additionally there are two folders "experimental" and "archived" for labs covering features which are in early access phase ( not stable ) or deprecated features repectively.
4851

4952
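The directory layout described above can be sketched by creating a skeleton for a hypothetical lab (the name `my_lab` is illustrative, not part of this commit):

```shell
# Skeleton for a hypothetical lab named "my_lab": notebooks with their
# images, standalone source code, presentations, and both container
# recipes at the lab root.
set -e
LAB=my_lab
mkdir -p "$LAB/jupyter_notebook/images" "$LAB/source_code" "$LAB/presentations"
touch "$LAB/Dockerfile" "$LAB/Singularity"
ls "$LAB"
```

An "experimental" or "archived" parent folder would wrap such a lab when the feature it covers is early-access or deprecated.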
 ### Git Branching
 
 Adding a new feature/lab will follow a forking workflow, which means feature-branch development happens on a forked repo that later gets merged into our original project (GPUHackathons.org) repository.
 
-![Git Branching Workflow](workspace/jupyter_notebook/images/git_branching.jpg)
+![Git Branching Workflow](misc/images/git_branching.jpg)
 
 The 5 main steps depicted in the image above are as follows:
 1. Fork: To create a new lab/feature, the GPUHackathons.org repository must be forked. The fork creates a snapshot of the GPUHackathons.org repository at the time it was forked. Any new feature/lab should be developed on top of the develop branch of the repository.
 2. Clone: The developer can then clone this new repository to a local machine.
-Create Feature Branch: Create a new branch in which your changes will be made. The recommended naming convention for the feature branch is: hpc-<feature_name>, hpc-ai-<feature_name>, or ai-<feature_name>. The new changes that the developer makes can be added, committed, and pushed.
+Create Feature Branch: Create a new branch in which your changes will be made. The recommended naming convention for the feature branch is: end2end-nlp-<feature_name>. The new changes that the developer makes can be added, committed, and pushed.
 3. Push: After the changes are committed, the developer pushes the changes to the remote branch. The push command sends the local changes to the GitHub repository.
 4. Pull: Submit a pull request. Upon receiving the pull request, a Hackathon team reviewer/owner will review the changes and, upon accepting them, merge them into the develop branch of GPUHackathons.org.
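The five steps above can be sketched with plain git commands; the repository name `fork-standin` below is a hypothetical local stand-in for a GitHub fork, and the branch name follows the convention from this commit:

```shell
set -e
# 1. Fork: happens on GitHub; a local repository stands in for the fork here.
git init -q fork-standin
git -C fork-standin checkout -q -b develop
git -C fork-standin -c user.email=dev@example.com -c user.name=dev \
    commit -q --allow-empty -m "initial commit on develop"
# 2. Clone: the developer clones the fork to a local machine.
git clone -q fork-standin work
cd work
# 3. Create Feature Branch: follow the recommended naming convention.
git checkout -q -b end2end-nlp-my_feature
echo "new lab" > new_lab.md
git add new_lab.md
git -c user.email=dev@example.com -c user.name=dev commit -qm "add new lab"
# 4. Push the feature branch; the pull request is then opened on GitHub.
git push -q origin end2end-nlp-my_feature
git branch --show-current
```

The pull-request review and merge into develop (step 4 onward) happen on GitHub itself, not on the command line.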

Diff for: README.md

+3-31
@@ -17,36 +17,8 @@ The total bootcamp material would take approximately 8 hours. It is recommended
 
 ## Running using Singularity
 
-To run the material using Singularity containers, follow the steps below.
+Update coming soon
 
-To build the TAO Toolkit Singularity container, run: `singularity build --fakeroot --sandbox tao_e2enlp.simg Singularity_tao`
-
-To build the RIVA client Singularity container for the Client, run:
-
-To download the Riva Speech Server Singularity container for the Server, run: `singularity pull riva-speech:2.6.0.sif docker://nvidia/nvcr.io/nvidia/riva/riva-speech:2.6.0`
-
-### Run data preprocessing and TAO notebooks
-
-Run the first container with: `singularity run --fakeroot --nv -B workspace:/workspace tao_e2enlp.simg jupyter-lab --no-browser --allow-root --ip=0.0.0.0 --port=8888 --NotebookApp.token="" --notebook-dir=/workspace`
-
-The `-B` flag mounts local directories in the container filesystem and ensures changes are stored locally in the project folder. Open JupyterLab in a browser: http://localhost:8888
-
-You may now start working on the lab by clicking on the `Start_here.ipynb` notebook.
-
-When you are done with `Data preprocessing Lab` and `2.Transfer learning with TAO Lab`, shut down JupyterLab by selecting `File > Shut Down` in the top left corner, then shut down the Singularity container by typing `exit` or pressing `ctrl + d` in the terminal window.
-
-### Run Riva Speech Server
-
-To activate the Riva Server container, run:
-```
-singularity run \
-    --nv \
-```
-
-### Run Riva
 
 
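For reference, the `-B` bind flag explained in the removed README text binds a host directory into the container filesystem so edits persist on the host. This hedged sketch only assembles and prints the launch command taken from that text; `singularity` itself is not invoked here:

```shell
# Compose (but do not run) the JupyterLab launch command from the removed
# instructions. -B binds the host "workspace" directory to /workspace
# inside the container, so notebook changes are stored on the host.
BIND="workspace:/workspace"
CMD="singularity run --fakeroot --nv -B $BIND tao_e2enlp.simg jupyter-lab --no-browser --allow-root --ip=0.0.0.0 --port=8888 --NotebookApp.token=\"\" --notebook-dir=/workspace"
echo "$CMD"
```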

@@ -165,9 +137,9 @@ tasks:
 
 ### Run All Notebooks
 
-Activate the virtualenvwrapper launcher `workon launcher` (you may be required to export the path as executed in 4. above)
+Activate the virtualenvwrapper launcher `workon launcher` (you may be required to export the path as executed in 4. above)
 
-You are to run the first ALL notebooks in the `launcher` environment.
+You are to run ALL the notebooks in the `launcher` environment.
 
 Launch JupyterLab with:
Diff for: misc/images/git_branching.jpg

92 KB

Diff for: workspace/Start_Here.ipynb

+2-2
@@ -17,7 +17,7 @@
 "source": [
 "## Overview \n",
 "\n",
-"End-to-End NLP material is designed from a real-world perspective that follows the data processing, development, and deployment pipeline paradigm. The material consists of three labs, and the goal is to walk you through a single flow of raw text `data preprocessing`, how to build a SQuAD dataset format for Question Answering, train the dataset via the `NVIDIA TAO` transfer-learning BERT model, and deploy using `RIVA`. Furthermore, a challenge notebook is introduced to test your understanding of the material and solidify your experience in the Question Answering (QA) domain.\n",
+"End-to-End NLP material is designed from a real-world perspective that follows the data processing, development, and deployment pipeline paradigm. The material consists of three labs, and the goal is to walk you through a single flow of raw text `data preprocessing`, how to build a SQuAD dataset format for Question Answering, train the dataset via the `NVIDIA® TAO` transfer-learning BERT model, and deploy using `RIVA`. Furthermore, a challenge notebook is introduced to test your understanding of the material and solidify your experience in the Question Answering (QA) domain.\n",
 "\n",
 "### Why End-to-End NLP?\n",
 "\n",
@@ -119,7 +119,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"!nvidia-smi"
+"#!nvidia-smi"
 ]
 },
 {

Diff for: workspace/jupyter_notebook/.ipynb_checkpoints/QandA_data_processing-checkpoint.ipynb

+1-1
@@ -145,7 +145,7 @@
 "-\tAlready mined data from the web\n",
 "-\tRaw data scraping from webpages\n",
 "\n",
-"can be drawn from. Depending on the choice of target training model input data format, the training dataset is generated. \n",
+"A training dataset is generated based on the input data format of the training model. \n",
 "\n",
 "**Text document**: This could be a file containing either a description or definitions, or an essay about a domain/topic. The content is usually in sentences arranged in paragraphs. Each paragraph may form a context where questions and answers can be drawn from. Depending on the choice of target training model input data format, the training dataset is generated. The document text does not require further processing for questions and answers to be extracted.\n",
 "\n",

Diff for: workspace/jupyter_notebook/.ipynb_checkpoints/qa-riva-deployment-checkpoint.ipynb

+22-3
@@ -102,7 +102,7 @@
 "#SIGULARITY_CONTAINER = \"riva-speech:2.6.0-servicemaker.sif\"\n",
 "\n",
 "# Directory where the .riva model is stored $MODEL_LOC/*.riva\n",
-"MODEL_LOC = \"~/Documents/End-to-End-NLP/workspace/results/questions_answering/export_riva\"\n",
+"MODEL_LOC = \"~/End-to-End-NLP/workspace/results/questions_answering/export_riva\"\n",
 "\n",
 "# Name of the .riva file\n",
 "MODEL_NAME = \"qa-model.riva\"\n",
@@ -293,7 +293,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"RIVA_DIR = \"/home/tosin/Documents/End-to-End-NLP/workspace/source_code/riva_quickstart_v2.6.0\""
+"RIVA_DIR = \"~/End-to-End-NLP/workspace/source_code/riva_quickstart_v2.6.0\""
 ]
 },
 {
@@ -928,6 +928,20 @@
 "!docker stop $(docker ps -a -q)"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"---"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"**It is advisable not to run the section below in a Bootcamp session, as it takes a lot of time to execute.**"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -1145,7 +1159,12 @@
 "\n",
 "https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tts/tts-overview.html\n",
 "\n",
-"https://docs.nvidia.com/deeplearning/riva/user-guide/docs/asr/asr-overview.html"
+"https://docs.nvidia.com/deeplearning/riva/user-guide/docs/asr/asr-overview.html\n",
+"\n",
+"---\n",
+"## Licensing\n",
+"\n",
+"Copyright © 2022 OpenACC-Standard.org. This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). These materials include references to hardware and software developed by other entities; all applicable licensing and copyrights apply."
 ]
 },
 {

Diff for: workspace/jupyter_notebook/.ipynb_checkpoints/question-answering-training-checkpoint.ipynb

+18-8
@@ -315,7 +315,7 @@
 "source": [
 "# IMPORTANT NOTE: Set the path to a folder where you want your data to be saved\n",
 "#DATA_DOWNLOAD_DIR = \"/workspace/data\"\n",
-"DATA_DOWNLOAD_DIR = \"/home/tosin/Documents/End-to-End-NLP/workspace/data/\""
+"DATA_DOWNLOAD_DIR = \"~/End-to-End-NLP/workspace/data/\""
 ]
 },
 {
@@ -460,19 +460,19 @@
 "{\n",
 " \"Mounts\":[\n",
 " {\n",
-" \"source\": \"/home/tosin/Documents/End-to-End-NLP/workspace/data\",\n",
+" \"source\": \"~/End-to-End-NLP/workspace/data\",\n",
 " \"destination\": \"/data\"\n",
 " },\n",
 " {\n",
-" \"source\": \"/home/tosin/Documents/End-to-End-NLP/workspace/specs\",\n",
+" \"source\": \"~/End-to-End-NLP/workspace/specs\",\n",
 " \"destination\": \"/specs\"\n",
 " },\n",
 " {\n",
-" \"source\": \"/home/tosin/Documents/End-to-End-NLP/workspace/results\",\n",
+" \"source\": \"~/End-to-End-NLP/workspace/results\",\n",
 " \"destination\": \"/results\"\n",
 " },\n",
 " {\n",
-" \"source\": \"/home/tosin/Documents/End-to-End-NLP/workspace/.cache\",\n",
+" \"source\": \"~/End-to-End-NLP/workspace/.cache\",\n",
 " \"destination\": \"/root/.cache\"\n",
 " }\n",
 " ]\n",
@@ -494,9 +494,9 @@
 "outputs": [],
 "source": [
 "# Make sure the source directories exist; if not, create them\n",
-"! mkdir /home/tosin/Documents/End-to-End-NLP/workspace/specs\n",
-"! mkdir /home/tosin/Documents/End-to-End-NLP/workspace/results\n",
-"! mkdir /home/tosin/Documents/End-to-End-NLP/workspace/.cache\n"
+"! mkdir ~/End-to-End-NLP/workspace/specs\n",
+"! mkdir ~/End-to-End-NLP/workspace/results\n",
+"! mkdir ~/End-to-End-NLP/workspace/.cache\n"
 ]
 },
 {
@@ -1101,6 +1101,16 @@
 "You could use TAO to build custom models for your own applications, or you could deploy the custom model to Nvidia Riva!"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"---\n",
+"## Licensing\n",
+"\n",
+"Copyright © 2022 OpenACC-Standard.org. This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). These materials include references to hardware and software developed by other entities; all applicable licensing and copyrights apply."
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
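One hedged aside on the path changes in this file: a leading `~` is expanded by the shell only when it appears unquoted, so tools that read `~/End-to-End-NLP/...` from a config file (such as the Mounts JSON above) must expand it themselves. A quick shell illustration:

```shell
# "~" expands only when unquoted at the start of a shell word; inside
# quotes (or inside a JSON config) it stays a literal two-character string.
unquoted=~/End-to-End-NLP
quoted="~/End-to-End-NLP"
echo "$unquoted"   # expands to $HOME/End-to-End-NLP
echo "$quoted"     # stays literal: ~/End-to-End-NLP
```

This is why the `! mkdir ~/...` notebook cells work (the shell expands the tilde), while a consumer of the JSON paths may need its own expansion step.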

Diff for: workspace/jupyter_notebook/QandA_data_processing.ipynb

+1-1
@@ -145,7 +145,7 @@
 "-\tAlready mined data from the web\n",
 "-\tRaw data scraping from webpages\n",
 "\n",
-"can be drawn from. Depending on the choice of target training model input data format, the training dataset is generated. \n",
+"A training dataset is generated based on the input data format of the training model. \n",
 "\n",
 "**Text document**: This could be a file containing either a description or definitions, or an essay about a domain/topic. The content is usually in sentences arranged in paragraphs. Each paragraph may form a context where questions and answers can be drawn from. Depending on the choice of target training model input data format, the training dataset is generated. The document text does not require further processing for questions and answers to be extracted.\n",
 "\n",

Diff for: workspace/jupyter_notebook/challenge.ipynb

+10
@@ -590,6 +590,16 @@
 "You could train your own custom models in TAO and deploy them in Riva! You could scale up your deployment using Kubernetes with the Riva AI Services Helm chart, which will pull the relevant images and download model artifacts from NGC, generate the model repository, and start and expose the Riva speech services."
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"---\n",
+"## Licensing\n",
+"\n",
+"Copyright © 2022 OpenACC-Standard.org. This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). These materials include references to hardware and software developed by other entities; all applicable licensing and copyrights apply."
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},

Diff for: workspace/jupyter_notebook/qa-riva-deployment.ipynb

+22-3
@@ -102,7 +102,7 @@
 "#SIGULARITY_CONTAINER = \"riva-speech:2.6.0-servicemaker.sif\"\n",
 "\n",
 "# Directory where the .riva model is stored $MODEL_LOC/*.riva\n",
-"MODEL_LOC = \"~/Documents/End-to-End-NLP/workspace/results/questions_answering/export_riva\"\n",
+"MODEL_LOC = \"~/End-to-End-NLP/workspace/results/questions_answering/export_riva\"\n",
 "\n",
 "# Name of the .riva file\n",
 "MODEL_NAME = \"qa-model.riva\"\n",
@@ -293,7 +293,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"RIVA_DIR = \"/home/tosin/Documents/End-to-End-NLP/workspace/source_code/riva_quickstart_v2.6.0\""
+"RIVA_DIR = \"~/End-to-End-NLP/workspace/source_code/riva_quickstart_v2.6.0\""
 ]
 },
 {
@@ -928,6 +928,20 @@
 "!docker stop $(docker ps -a -q)"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"---"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"**It is advisable not to run the section below in a Bootcamp session, as it takes a lot of time to execute.**"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -1145,7 +1159,12 @@
 "\n",
 "https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tts/tts-overview.html\n",
 "\n",
-"https://docs.nvidia.com/deeplearning/riva/user-guide/docs/asr/asr-overview.html"
+"https://docs.nvidia.com/deeplearning/riva/user-guide/docs/asr/asr-overview.html\n",
+"\n",
+"---\n",
+"## Licensing\n",
+"\n",
+"Copyright © 2022 OpenACC-Standard.org. This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). These materials include references to hardware and software developed by other entities; all applicable licensing and copyrights apply."
 ]
 },
 {

Diff for: workspace/jupyter_notebook/question-answering-training.ipynb

+18-8
@@ -315,7 +315,7 @@
 "source": [
 "# IMPORTANT NOTE: Set the path to a folder where you want your data to be saved\n",
 "#DATA_DOWNLOAD_DIR = \"/workspace/data\"\n",
-"DATA_DOWNLOAD_DIR = \"/home/tosin/Documents/End-to-End-NLP/workspace/data/\""
+"DATA_DOWNLOAD_DIR = \"~/End-to-End-NLP/workspace/data/\""
 ]
 },
 {
@@ -460,19 +460,19 @@
 "{\n",
 " \"Mounts\":[\n",
 " {\n",
-" \"source\": \"/home/tosin/Documents/End-to-End-NLP/workspace/data\",\n",
+" \"source\": \"~/End-to-End-NLP/workspace/data\",\n",
 " \"destination\": \"/data\"\n",
 " },\n",
 " {\n",
-" \"source\": \"/home/tosin/Documents/End-to-End-NLP/workspace/specs\",\n",
+" \"source\": \"~/End-to-End-NLP/workspace/specs\",\n",
 " \"destination\": \"/specs\"\n",
 " },\n",
 " {\n",
-" \"source\": \"/home/tosin/Documents/End-to-End-NLP/workspace/results\",\n",
+" \"source\": \"~/End-to-End-NLP/workspace/results\",\n",
 " \"destination\": \"/results\"\n",
 " },\n",
 " {\n",
-" \"source\": \"/home/tosin/Documents/End-to-End-NLP/workspace/.cache\",\n",
+" \"source\": \"~/End-to-End-NLP/workspace/.cache\",\n",
 " \"destination\": \"/root/.cache\"\n",
 " }\n",
 " ]\n",
@@ -494,9 +494,9 @@
 "outputs": [],
 "source": [
 "# Make sure the source directories exist; if not, create them\n",
-"! mkdir /home/tosin/Documents/End-to-End-NLP/workspace/specs\n",
-"! mkdir /home/tosin/Documents/End-to-End-NLP/workspace/results\n",
-"! mkdir /home/tosin/Documents/End-to-End-NLP/workspace/.cache\n"
+"! mkdir ~/End-to-End-NLP/workspace/specs\n",
+"! mkdir ~/End-to-End-NLP/workspace/results\n",
+"! mkdir ~/End-to-End-NLP/workspace/.cache\n"
 ]
 },
 {
@@ -1101,6 +1101,16 @@
 "You could use TAO to build custom models for your own applications, or you could deploy the custom model to Nvidia Riva!"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"---\n",
+"## Licensing\n",
+"\n",
+"Copyright © 2022 OpenACC-Standard.org. This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). These materials include references to hardware and software developed by other entities; all applicable licensing and copyrights apply."
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
