Merge pull request #87 from MonashDeepNeuron/dev
Dev
linton2000 committed Apr 13, 2024
2 parents 3b7cb16 + 0a61247 commit 568997f
Showing 133 changed files with 1,809 additions and 735 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -34,6 +34,8 @@ $ mdbook serve --open
- Jaspar Martin
- Yuki Kume
- Osman Haji
- Duc Thanh Vinh Nguyen
- Linton Charles

## Code of Conduct, License & Contributing

1 change: 1 addition & 0 deletions src/.chapter7/challenges.md
@@ -0,0 +1 @@
# Challenges
73 changes: 48 additions & 25 deletions src/SUMMARY.md
@@ -2,15 +2,17 @@

[Welcome](home.md)

- [Getting Started](./chapter1/getting-started.md)
- [Installation & Set-up](./chapter1/getting-started.md)
- [GitHub](./chapter1/github.md)
- [Windows](./chapter1/windows.md)
- [Mac](./chapter1/mac.md)
- [Linux](./chapter1/linux.md)
- [WSL](./chapter1/wsl.md)
- [M3 MASSIVE](./chapter1/m3.md)
- [Nectar Cloud](./chapter1/nectar.md)
- [Challenges](./chapter1/challenges.md)

- [Brief Introduction to C](./chapter2/intro-to-c.md)
- [Intro to C](./chapter2/intro-to-c.md)
- [Hello World](./chapter2/helloworld.md)
- [Compilation](./chapter2/compilation.md)
- [Types & Variables](./chapter2/vars.md)
@@ -20,34 +22,55 @@
- [Control Flow](./chapter2/ctrl-flow.md)
- [Loops](./chapter2/loops.md)
- [Functions](./chapter2/functions.md)
- [Pointers](./chapter2/pointers.md)
- [Dynamic Memory](./chapter2/memory.md)
- [Structures](./chapter2/structs.md)
- [Macros & The Preprocessor](./chapter2/macros.md)
- [Challenges](./chapter2/challenges.md)

- [M3](./chapter3/chapter3.md)
- [Getting Started](./chapter3/start.md)
- [Logging In](./chapter3/login.md)
- [Linux Commands](./chapter3/linux-cmds.md)
- [M3's Shared Filesystem](./chapter3/shared-fs.md)
- [Software and Tooling](./chapter3/software-tooling.md)
- [Bash Scripts](./chapter3/bash.md)
- [Job batching & SLURM](./chapter3/slurm.md)
- [Strudel](./chapter3/strudel.md)
- [Operating Systems](./chapter3/chapter3.md)
- [Computer Architecture](./chapter3/computer-architecture.md)
- [Pointers & Memory](./chapter3/memory-pointers.md)
- [Intro to Linux](./chapter3/linux-intro.md)
- [Threading & Concurrency](./chapter3/threads-concurrency.md)
- [Processes](./chapter3/processes.md)
- [Scheduling Algorithms](./chapter3/scheduling.md)
- [Challenges](./chapter3/challenges.md)

- [Parallel Computing](./chapter4/chapter4.md)
- [What is Parallel Computing?](./chapter4/parallel-computing.md)
- [Multithreading](./chapter4/multithreading.md)
- [OpenMP](./chapter4/openmp.md)
- [More C](./chapter4/chapter4.md)
- [Dynamic Memory](./chapter4/memory.md)
- [Structures](./chapter4/structs.md)
- [Macros & The Preprocessor](./chapter4/macros.md)
- [System Calls](./chapter4/syscalls.md)
- [Spawning Processes & Threads](./chapter4/spawn-procs.md)
- [Challenges](./chapter4/challenges.md)

- [Distributed Computing](./chapter5/chapter5.md)
- [Refresher on Parallelism](./chapter5/parallel-refresher.md)
- [What is Distributed Computing](./chapter5/distributed-computing.md)
- [Message Passing](./chapter5/message-passing.md)
- [OpenMPI](./chapter5/openmpi.md)
- [M3 & SLURM](./chapter5/chapter5.md)

- [Batch Processing vs. Cloud Computing](./chapter5/batch-cloud.md)
- [Parallel & Distributed Computing](./chapter5/parallel-distributed.md)
- [M3 Login - SSH & Strudel](./chapter5/login.md)
- [Intro to SLURM](./chapter5/slurm_intro.md)
- [M3 Interface & Usage](./chapter5/m3-interface.md)
- [Software & Tooling](./chapter5/software-tooling.md)
- [Challenges](./chapter5/challenges.md)

[Acknowledgements](./acknowledgements.md)
- [Introduction to Parallel Computing](./chapter6/chapter6.md)
- [Multithreading](./chapter6/multithreading.md)
- [Synchronisation](./chapter6/synchronisation.md)
- [Locks](./chapter6/locks.md)
- [Message Passing](./chapter6/message-passing.md)
- [Challenges](./chapter6/challenges.md)

- [Parallelisation of Algorithms](./chapter7/chapter7.md)
- [Parallel Search](./chapter7/parallel-search.md)
- [Parallel Sort](./chapter7/parallel-sort.md)
- [Other Parallel Algorithms](./chapter7/other-parallel-algos.md)
- [Machine Learning & HPC](./chapter7/machine-learning-and-hpc.md)
- [Optimisation Algorithms](./chapter7/optim-algos.md)
- [Challenges](./chapter7/challenges.md)

- [Apache Spark](./chapter8/chapter8.md)
- [Installation & Cluster Set-up](./chapter8/set-up.md)
- [Internal Architecture](./chapter8/internals.md)
- [Data Processing](./chapter8/data-processing.md)
- [Job Batching](./chapter8/job-batching.md)
- [Challenges](./chapter8/challenges.md)

[Acknowledgements](./acknowledgements.md)
2 changes: 2 additions & 0 deletions src/acknowledgements.md
@@ -8,6 +8,8 @@ This book is part of Monash DeepNeuron's collection of technical information and
- [Osman Haji](https://github.com/Ozzywap)
- [Yuki Kume](https://github.com/UnciaBit)
- [Jaspar Martin](https://github.com/jasparm)
- [Duc Thanh Vinh Nguyen](https://github.com/VincentNguyenDuc)
- [Linton Charles](https://github.com/linton2000)

## Contributors

Binary file added src/chapter1/aaf.png
Binary file added src/chapter1/hpcid.png
Binary file added src/chapter1/join_project.png
19 changes: 13 additions & 6 deletions src/chapter3/start.md → src/chapter1/m3.md
@@ -1,22 +1,29 @@
# Getting Started
# M3 MASSIVE

MASSIVE (Multi-modal Australian ScienceS Imaging and Visualisation Environment) is an HPC supercomputing cluster that you will have access to as an MDN member. On this page we will set you up with access before you learn how to use it in Chapter 5. Feel free to go through the docs to learn about the [hardware config](https://docs.massive.org.au/M3/m3users.html) of M3 (the 3rd version of MASSIVE) and its [institutional governance](https://massive.org.au/about/about.html#governance).

## Request an account

In order to access M3, you will need to request an account. To do this, follow this link: [HPC ID](https://hpc.erc.monash.edu.au/karaage/aafbootstrap). This should take you to a page like this:


![HPC ID](./imgs/aaf.png)
![HPC ID](./aaf.png)

Type in Monash, as you can see here. Select Monash University, and tick the "Remember my organisation" box at the bottom. Once you continue to your organisation, it will take you to the Monash Uni SSO login page. You will need to log in with your Monash credentials.

You should now see something like this: ![HPC ID System](./imgs/hpcid.png)
You should now see something like this:

![HPC ID System](./hpcid.png)

Once you are here, there are a couple of things you will need to do. The first, and most important, is to set your HPC password. This is the password you will use to log in to M3. To do this, go to Home, then click on Change Linux Password. This will take you through the steps of setting your password.

Once you have done this, you can move on to requesting access to the MDN project and getting access to Gurobi.

## Add to project

To request to join the MDN project, again from the Home page click on Join Existing Project. You should see a screen like this: ![Join Project](./imgs/join_project.png)
To request to join the MDN project, again from the Home page click on Join Existing Project. You should see a screen like this:

![Join Project](./join_project.png)

In the text box type `vf38` and click search. This is the project code for MDN. Then select the project and click submit. You will now have to wait for the project admins to approve your request. Once they have done this, you will be able to access the project. This should not take longer than a few days, and you will get an email telling you when you have access.

@@ -47,4 +54,4 @@ cat ~/.ssh/id_ed25519.pub

Then, go to your github account, go to settings, and click on the SSH and GPG keys tab. Click on New SSH key, and paste the key into the box. Give it a name, and click Add SSH key.

You should now be able to clone repos using SSH. To do this, go to the repo you want to clone, but instead of copying the HTTP link, copy the SSH link, and then it's just regular git cloning.
Binary file added src/chapter1/nectar-login.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
16 changes: 16 additions & 0 deletions src/chapter1/nectar.md
@@ -0,0 +1,16 @@
# Nectar Cloud

The ARDC Nectar Research Cloud (Nectar) is Australia’s national research cloud, specifically designed for research computing. As with M3, we will set you up with access now, before you learn about it in later chapters. [This webpage](https://ardc.edu.au/services/ardc-nectar-research-cloud/) explains what it is if you're curious.

## Connect Monash Account to Nectar Cloud
To create an [identity](https://medium.com/@ciente/identity-and-access-management-iam-in-cloud-computing-2777481525a4) (account) in Nectar Cloud, all you have to do is log in using your Monash student account. Click [this link](https://dashboard.rc.nectar.org.au) to access Nectar's landing page.

You will see the following. Make sure to click "Login via AAF (Australia)".

![nectar](./nectar-login.png)

You will be redirected to enter your Monash credentials, after which you will see the Nectar Cloud dashboard for your trial project (your project name will be pt-xxxxx).

## Cloud Starter Series

ARDC has provided [this cloud starter tutorial series](https://tutorials.rc.nectar.org.au/cloud-starter/01-overview) for people new to Nectar Cloud. You should be able to follow these tutorials using your trial project. If you need more SUs (service units, aka cloud credits) in order to provision more cloud resources for MDN-related work, you should message your HPC Lead with that request.
12 changes: 0 additions & 12 deletions src/chapter2/challenges.md
@@ -16,8 +16,6 @@ The challenges for this chapter can be found in the [HPC Training Challenges](https
- [Challenge 4 - GCD \& LCM](#challenge-4---gcd--lcm)
- [Challenge 5 - Bitwise Add](#challenge-5---bitwise-add)
- [Challenge 6 - Bitwise Multiply](#challenge-6---bitwise-multiply)
- [Challenge 7 - Sum and Product Algorithms](#challenge-7---sum-and-product-algorithms)
- [Challenge 8 - Array Concatenation](#challenge-8---array-concatenation)

## Challenge 1 - Hello World

@@ -44,13 +42,3 @@ For this challenge you have to implement a function called `bitwise_add()` which
This challenge is similar to the last but instead of implementing `+` you must implement `*` (product). Your implementation should be contained in a function called `bitwise_multiply()`. You can use any bitwise or conditional operators.

> Note: If you need `+`, you can reimplement it internally in `bitwise_multiply` based on your solution from the previous challenge, import it via a header in this challenge's folder, or copy it into this folder. Ask a trainer if you get stuck with this.
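To make the carry-propagation idea behind these two challenges concrete, here is a minimal sketch of one way `bitwise_add()` could be written — the `unsigned int` signature and the `main()` driver are illustrative assumptions, not the required interface:

```c
#include <stdio.h>

// Sketch only: XOR adds each bit position without carrying, AND finds the
// positions that generate a carry, and the left shift moves those carries
// up one position for the next iteration.
unsigned int bitwise_add(unsigned int a, unsigned int b)
{
    while (b != 0) {
        unsigned int carry = a & b; // bits where both operands are 1
        a = a ^ b;                  // per-bit sum, ignoring carries
        b = carry << 1;             // propagate carries to the next bit
    }
    return a;
}

int main(void)
{
    printf("%u\n", bitwise_add(13, 29)); // prints 42
    return 0;
}
```

A shift-and-add `bitwise_multiply()` can then be layered on top: whenever bit `i` of one operand is set, add the other operand shifted left by `i` to the running total.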
## Challenge 7 - Sum and Product Algorithms

This challenge involves implementing the sum and product reductions on an array or memory block of integers. As a bonus challenge, try to make the algorithms more generic so they work with any binary operator.
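As a rough sketch of what those reductions might look like (the function names and the `int` element type here are assumptions for illustration):

```c
#include <stdio.h>
#include <stddef.h>

// Sum reduction: fold the array into one value, starting from the additive identity 0.
int sum(const int *xs, size_t n)
{
    int acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += xs[i];
    return acc;
}

// Product reduction: same pattern, starting from the multiplicative identity 1.
int product(const int *xs, size_t n)
{
    int acc = 1;
    for (size_t i = 0; i < n; i++)
        acc *= xs[i];
    return acc;
}

int main(void)
{
    int xs[] = {1, 2, 3, 4};
    printf("sum = %d, product = %d\n", sum(xs, 4), product(xs, 4)); // sum = 10, product = 24
    return 0;
}
```

The generic bonus version usually boils down to a single `reduce()` that takes a function pointer (the binary operator) and an identity value.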

## Challenge 8 - Array Concatenation

In this challenge you have to implement an array concatenation function. This should join two arrays of the same type into a single array, similar to `strcat()`. You will need to allocate a new block of memory in order to store the concatenated arrays, which requires the sizes of the two input arrays to be known by the function. This function should return a pointer to the resulting array.

> Note: The type of the array this function concatenates can be any type except `char`.
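As a sketch of that approach with `int` elements (the function name, parameter order and use of `size_t` lengths are assumptions, not part of the challenge spec):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Allocate a new block big enough for both inputs, copy the first array in,
// then append the second. The caller owns (and must free()) the result.
int *concat_arrays(const int *a, size_t len_a, const int *b, size_t len_b)
{
    int *out = malloc((len_a + len_b) * sizeof *out);
    if (out == NULL)
        return NULL;
    memcpy(out, a, len_a * sizeof *a);
    memcpy(out + len_a, b, len_b * sizeof *b);
    return out;
}

int main(void)
{
    int a[] = {1, 2, 3}, b[] = {4, 5};
    int *joined = concat_arrays(a, 3, b, 2);
    if (joined != NULL) {
        for (size_t i = 0; i < 5; i++)
            printf("%d ", joined[i]); // 1 2 3 4 5
        printf("\n");
        free(joined);
    }
    return 0;
}
```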
15 changes: 14 additions & 1 deletion src/chapter2/printing.md
@@ -42,8 +42,21 @@ int main()

> Question: Notice how we used `double` for the type of `sum`. What would happen if `sum` type was `int`?

If you want to have a play with `printf()`, copy the following code snippet and run it on your own device. The command will be identical to 'Hello World!'.
If you want to have a play with `printf()`, copy the following code snippet and run it on your own device. The command line will output different varieties of 'Hello World!'.

```c
#include <stdio.h>

int main() {
    printf("%30s\n", "Hello World!"); // Right-aligned in a 30-character field
    printf("%40s%10s%20s%15s\n", "Hell", "o", "World", "!");
    printf("%10.7s\n", "Hello World!"); // Print only the first 7 characters, with padding
    printf("%100c%c%c%c%c%c%c%c%c%c%c%c\n",
           72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 33); // Decimal ASCII codes for "Hello World!"
    return 0;
}
```
### Formatting Specification

You'll notice we used a different character after the `%` for each argument. This is because `printf()` needs to know the type of the incoming arguments so that it can format the string appropriately. For example, floating point types have to use a decimal point when converted to text, while integers do not.
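A small illustration of that point (the variable names are just for this example):

```c
#include <stdio.h>

int main(void)
{
    int whole = 42;
    double fractional = 42.0;

    printf("%d\n", whole);        // 42        - integers print with no decimal point
    printf("%f\n", fractional);   // 42.000000 - floating point defaults to six decimal places
    printf("%.2f\n", fractional); // 42.00     - the precision can be limited explicitly
    return 0;
}
```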
1 change: 0 additions & 1 deletion src/chapter2/stdlib.md

This file was deleted.

42 changes: 0 additions & 42 deletions src/chapter3/bash.md

This file was deleted.

46 changes: 2 additions & 44 deletions src/chapter3/challenges.md
@@ -1,45 +1,3 @@
# M3 Challenges
# Challenges

## Challenge 1

Navigate to your scratch directory and, using vim (or your chosen in-terminal editor) create a file called `hello.txt` that contains the text "Hello World". Once you have created the file, use the `cat` command to print the contents of the file to the screen.

## Challenge 2

Write a bash script that prints the contents of the above hello.txt file to the screen and run it locally (on your login node).

## Challenge 3

Submit the above script to the queue by writing another SLURM bash script. Check the status of the job using `squeue`. Once the job has finished, check the output using `cat`. You can find the output file in the directory you submitted the job from.

## Challenge 4

Request an interactive node and attach to it. Once you have done this, install python 3.7 using conda.

## Challenge 5

Clone and run [this](./dl_on_m3/alexnet_stl10.py) script. You will need to first install the dependencies for it. You don't need to wait for it to finish, just make sure it is working. You will know it's working if it starts listing out the loss and accuracy for each epoch. You can stop it by pressing `ctrl + c`.

Once you have confirmed that it is working, deactivate and delete the conda environment, and then end the interactive session.

> Hint: I have included the dependencies and their versions (make sure you install the right version) in the `requirements.txt` file. You will need python 3.7 to run this script.
## Challenge 6

Go back to the login node. Now you are going to put it all together. Write a bash script that does the following:

- (1) requests a compute node
- (2) installs python using conda
- (3) clones and runs the above script

Let this run fully. Check the output of the script to make sure it ran correctly. Does it match the output of the script you ran in challenge 5?
> Hint: You can check the output of the script at any time by `cat`ing the output file. The script does not need to have finished running for you to do this.
## Challenge 7

Edit your submission script so that you get a gpu node, and run the script using the gpu.
> Hint: Use the m3h partition
## Challenge 8

Now you want to clean up your working directory. First, push your solutions to your challenges repo. Then, delete the challenges directory, as well as the conda environment you created in challenge 6.
![under-const](../imgs/under-const.gif)
10 changes: 6 additions & 4 deletions src/chapter3/chapter3.md
@@ -1,7 +1,9 @@
# M3
# Operating Systems

[M3](https://docs.massive.org.au/M3/index.html) is part of [MASSIVE](https://www.massive.org.au/), which is a High Performance Computing facility for Australian scientists and researchers. Monash University is a partner of MASSIVE, and provides the majority of the funding for it. M3 is made up of multiple different types of servers, with a total of 5673 cores, 63.2TB of RAM, 5.6PB of storage, and 1.7 million CUDA cores.
A decent chunk of HPC involves using low-level tools and techniques to find optimisations and make software run faster. The main reason we use C is that it gives us access to deeper parts of the computer that are normally hidden away and managed on your behalf by your Python or Java interpreter.

M3 utilises the [Slurm](https://slurm.schedmd.com/) workload manager, which is a job scheduler that allows users to submit jobs to the cluster. We will learn a bit more about this later on.
![comp-levels](./imgs/programming-levels.jpg)

This book will take you through the basics of connecting to M3, submitting jobs, transferring data to and from the system and some other things. If you want to learn more about M3, you can read the [M3 documentation](https://docs.massive.org.au/M3/index.html). This will give you a more in-depth look at the system, and how to use it.
> **Note:** Not all low-level machine (assembly) code is faster than high-level code. The primary reason that lower-level code tends to be faster is that it avoids a lot of the overhead (e.g. garbage collection) involved in executing higher-level code.

If you have done FIT2100 Operating Systems, this chapter will mostly be a refresher for you. It's intended to provide a crash-course intro to operating systems theory so that you are capable of using low-level tools and implementing things like cache optimisations.
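As a tiny illustration of the kind of low-level access being described here (this snippet is not from the book), a pointer cast lets C inspect the raw bytes behind an `int` — something an interpreter would normally keep hidden:

```c
#include <stdio.h>

int main(void)
{
    int value = 42;
    unsigned char *bytes = (unsigned char *)&value; // reinterpret the int as raw bytes

    // Print each byte of the int's in-memory representation
    // (the order you see depends on the machine's endianness).
    for (unsigned i = 0; i < sizeof value; i++)
        printf("byte %u: 0x%02x\n", i, bytes[i]);

    return 0;
}
```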