diff --git a/src/SUMMARY.md b/src/SUMMARY.md index 7e95777..cedb798 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -3,6 +3,7 @@ [Welcome](home.md) - [Installation & Set-up](./chapter1/getting-started.md) + - [GitHub](./chapter1/github.md) - [Windows](./chapter1/windows.md) - [Mac](./chapter1/mac.md) @@ -13,6 +14,7 @@ - [Challenges](./chapter1/challenges.md) - [Intro to C](./chapter2/intro-to-c.md) + - [Hello World](./chapter2/helloworld.md) - [Compilation](./chapter2/compilation.md) - [Types & Variables](./chapter2/vars.md) @@ -25,52 +27,47 @@ - [Challenges](./chapter2/challenges.md) - [Operating Systems](./chapter3/chapter3.md) + - [Computer Architecture](./chapter3/computer-architecture.md) - - [Pointers & Memory](./chapter3/memory-pointers.md) + - [Pointers](./chapter3/memory-pointers.md) + - [Dynamic Memory](./chapter3/dynamic-memory.md) + - [Structures & Macros](./chapter3/structs-macros.md) - [Intro to Linux](./chapter3/linux-intro.md) - - [Threading & Concurrency](./chapter3/threads-concurrency.md) - - [Processes](./chapter3/processes.md) - - [Scheduling Algorithms](./chapter3/scheduling.md) + - [VMs & Containers](./chapter3/vms-containers.md) - [Challenges](./chapter3/challenges.md) -- [More C](./chapter4/chapter4.md) - - [Dynamic Memory](./chapter4/memory.md) - - [Structures](./chapter4/structs.md) - - [Macros & The Preprocessor](./chapter4/macros.md) - - [System Calls](./chapter4/syscalls.md) - - [Spawning Processes & Threads](./chapter4/spawn-procs.md) +- [M3 & SLURM](./chapter4/chapter4.md) + + - [Batch Processing vs. Cloud Computing](./chapter4/batch-cloud.md) + - [Parallel & Distributed Computing](./chapter4/parallel-distributed.md) + - [M3 Login - SSH & Strudel](./chapter4/login.md) + - [Intro to SLURM](./chapter4/slurm_intro.md) + - [M3 Interface & Usage](./chapter4/m3-interface.md) + - [Software & Tooling](./chapter4/software-tooling.md) - [Challenges](./chapter4/challenges.md) -- [M3 & SLURM](./chapter5/chapter5.md) +- [Introduction to Parallel Computing](./chapter5/chapter5.md) - - [Batch Processing vs. 
Cloud Computing](./chapter5/batch-cloud.md) - - [Parallel & Distributed Computing](./chapter5/parallel-distributed.md) - - [M3 Login - SSH & Strudel](./chapter5/login.md) - - [Intro to SLURM](./chapter5/slurm_intro.md) - - [M3 Interface & Usage](./chapter5/m3-interface.md) - - [Software & Tooling](./chapter5/software-tooling.md) + - [Multithreading](./chapter5/multithreading.md) + - [Synchronisation](./chapter5/synchronisation.md) + - [Locks](./chapter5/locks.md) + - [Message Passing](./chapter5/message-passing.md) - [Challenges](./chapter5/challenges.md) -- [Introduction to Parallel Computing](./chapter6/chapter6.md) - - [Multithreading](./chapter6/multithreading.md) - - [Synchronisation](./chapter6/synchronisation.md) - - [Locks](./chapter6/locks.md) - - [Message Passing](./chapter6/message-passing.md) +- [Parallellisation of Algorithms](./chapter6/chapter6.md) + + - [Parallel Search](./chapter6/parallel-search.md) + - [Parallel Sort](./chapter6/parallel-sort.md) + - [Other Parallel Algorithms](./chapter6/other-parallel-algos.md) + - [Machine Learning & HPC](./chapter6/machine-learning-and-hpc.md) + - [Optimisation Algorithms](./chapter6/optim-algos.md) - [Challenges](./chapter6/challenges.md) -- [Parallellisation of Algorithms](./chapter7/chapter7.md) - - [Parallel Search](./chapter7/parallel-search.md) - - [Parallel Sort](./chapter7/parallel-sort.md) - - [Other Parallel Algorithms](./chapter7/other-parallel-algos.md) - - [Machine Learning & HPC](./chapter7/machine-learning-and-hpc.md) - - [Optimisation Algorithms](./chapter7/optim-algos.md) +- [Apache Spark](./chapter7/chapter7.md) + - [Installation & Cluster Set-up](./chapter7/set-up.md) + - [Internal Architecture](./chapter7/internals.md) + - [Data Processing](./chapter7/data-processing.md) + - [Job Batching](./chapter7/job-batching.md) - [Challenges](./chapter7/challenges.md) -- [Apache Spark](./chapter8/chapter8.md) - - [Installation & Cluster Set-up](./chapter8/set-up.md) - - [Internal Architecture](./chapter8/internals.md) - - [Data Processing](./chapter8/data-processing.md) - - [Job Batching](./chapter8/job-batching.md) - - [Challenges](./chapter8/challenges.md) - [Acknowledgements](./acknowledgements.md) \ No newline at end of file diff --git a/src/chapter3/challenges.md b/src/chapter3/challenges.md index a31dd95..10575e6 100644 --- a/src/chapter3/challenges.md +++ b/src/chapter3/challenges.md @@ -1,3 +1,19 @@ # Challenges -![under-const](../imgs/under-const.gif) \ No newline at end of file +## Challenge 1 - Sum and Product Algorithms + +This challenge involves implementing the sum and product reductions on an array or memory block of integers. As a bonus challenge, try and make the algorithms more generic and work with any binary operator. + +## Challenge 2 - Array Concatenation + +In this challenge you have to implement an array concatenation function. This should join two arrays of the same type into a single array, similar to `strcat()`. You will need to allocate a new block of memory in order to store the concatenated arrays, which requires the sizes of the two input arrays to be known by the function. This function should return a pointer to the resulting array. + +> Note: The type of the array this function concatenates can be any type except `char`. + +## Challenge 3 - Doing it in Docker + +Pull an Ubuntu image from the Docker registry, install any required dependencies and execute the same C code that you wrote for challenge 2 within that running container instance.
You will have to consult the Docker documentation and be resourceful in learning and completing this task. + +## Challenge 4 - Launch a VM instance on Nectar Cloud + +This is an extension challenge in which you will have to go through the [ARDC's cloud starter tutorial series](https://tutorials.rc.nectar.org.au/cloud-starter/01-overview) in order to launch a VM instance and connect to it. \ No newline at end of file diff --git a/src/chapter4/memory.md b/src/chapter3/dynamic-memory.md similarity index 98% rename from src/chapter4/memory.md rename to src/chapter3/dynamic-memory.md index 6cb891b..755437c 100644 --- a/src/chapter4/memory.md +++ b/src/chapter3/dynamic-memory.md @@ -48,6 +48,8 @@ int main() } ``` +> If you recall the previous subchapter, the BSS segment holds zero-initialised data; `calloc()` similarly hands back memory that is zeroed out, although the allocation itself still comes from the heap rather than the stack. + ## Reallocated Memory We can also reallocate data to fit a larger or smaller amount. The elements from the old block will be copied to the new location until the new array is full or there are no more elements to copy. `realloc()` may not actually allocate memory in a new location if there is free space next to the existing array. `realloc()` also works like `malloc()` where the new memory is left uninitialised. `realloc()` takes two parameters, the old pointer address and the new size. diff --git a/src/chapter3/imgs/bit-byte-word.jpg b/src/chapter3/imgs/bit-byte-word.jpg new file mode 100644 index 0000000..2fc05f2 Binary files /dev/null and b/src/chapter3/imgs/bit-byte-word.jpg differ diff --git a/src/chapter3/imgs/cache-hit-miss.jpg b/src/chapter3/imgs/cache-hit-miss.jpg new file mode 100644 index 0000000..e1a20f1 Binary files /dev/null and b/src/chapter3/imgs/cache-hit-miss.jpg differ diff --git a/src/chapter3/imgs/containers.png b/src/chapter3/imgs/containers.png new file mode 100644 index 0000000..988f715 Binary files /dev/null and b/src/chapter3/imgs/containers.png differ diff --git a/src/chapter3/imgs/cpu-cache.jpg b/src/chapter3/imgs/cpu-cache.jpg new file mode 100644 index 0000000..2612375 Binary files /dev/null and b/src/chapter3/imgs/cpu-cache.jpg differ diff --git a/src/chapter3/imgs/docker-architecture.jpg b/src/chapter3/imgs/docker-architecture.jpg new file mode 100644 index 0000000..695c1be Binary files /dev/null and b/src/chapter3/imgs/docker-architecture.jpg differ diff --git a/src/chapter3/imgs/file-system-arch.png b/src/chapter3/imgs/file-system-arch.png new file mode 100644 index 0000000..628ca21 Binary files /dev/null and b/src/chapter3/imgs/file-system-arch.png differ diff --git a/src/chapter3/imgs/linux-distros.png b/src/chapter3/imgs/linux-distros.png new file mode 100644 index 0000000..bf1c3b6 Binary files /dev/null and b/src/chapter3/imgs/linux-distros.png differ diff --git a/src/chapter3/imgs/memory-cells.jpg b/src/chapter3/imgs/memory-cells.jpg new file mode 100644 index 0000000..c886f3b Binary files /dev/null and b/src/chapter3/imgs/memory-cells.jpg differ diff --git a/src/chapter3/imgs/memory-segments.png b/src/chapter3/imgs/memory-segments.png new file mode 100644 index 0000000..d135101 Binary files /dev/null and b/src/chapter3/imgs/memory-segments.png differ diff --git a/src/chapter3/imgs/paging-basic-scheme.jpg b/src/chapter3/imgs/paging-basic-scheme.jpg new file mode 100644 index 0000000..5c4d688 Binary files /dev/null and b/src/chapter3/imgs/paging-basic-scheme.jpg differ diff --git a/src/chapter3/imgs/pointers-in-c.jpg b/src/chapter3/imgs/pointers-in-c.jpg new file mode 100644 index 0000000..b5b1291 Binary files /dev/null and
b/src/chapter3/imgs/pointers-in-c.jpg differ diff --git a/src/chapter3/imgs/process-states.png b/src/chapter3/imgs/process-states.png new file mode 100644 index 0000000..c69d2b6 Binary files /dev/null and b/src/chapter3/imgs/process-states.png differ diff --git a/src/chapter3/imgs/program-process.png b/src/chapter3/imgs/program-process.png new file mode 100644 index 0000000..4764191 Binary files /dev/null and b/src/chapter3/imgs/program-process.png differ diff --git a/src/chapter3/imgs/spatial-vs-temporal.gif b/src/chapter3/imgs/spatial-vs-temporal.gif new file mode 100644 index 0000000..c65d8c3 Binary files /dev/null and b/src/chapter3/imgs/spatial-vs-temporal.gif differ diff --git a/src/chapter3/imgs/vms.png b/src/chapter3/imgs/vms.png new file mode 100644 index 0000000..e121e2a Binary files /dev/null and b/src/chapter3/imgs/vms.png differ diff --git a/src/chapter3/linux-intro.md b/src/chapter3/linux-intro.md index a143ee7..acceea1 100644 --- a/src/chapter3/linux-intro.md +++ b/src/chapter3/linux-intro.md @@ -1,10 +1,100 @@ # Introduction to Linux -Linux is one of the most popular versions of the UNIX operating System. It is open source as its source code is freely available. It is free to use. Linux was designed considering UNIX compatibility. Its functionality list is quite similar to that of UNIX. +Linux is a freely available open-source operating system built by Linus Torvalds back in 1991. It's based on Bell Labs' development of Unix in the 1970s, which also forms the basis of Android and MacOS. It's an extremely popular operating system, especially for servers (nearly all nodes in M3 use Linux). We will be learning about it and using it to gain both low-level programming skills and an understanding of Operating Systems theory. -Linux Operating System has primarily three components: -- **Kernel:** The kernel is the core part of Linux. It is responsible for all major activities of this operating system. It consists of various modules and it interacts directly with the underlying hardware. Kernel provides the required abstraction to hide low level hardware details to system or application programs. -- **System Library:** System libraries are special functions or programs using which application programs or system utilities access Kernel’s features. These libraries implement most of the functionalities of the operating system and do not require kernel module’s code access rights. -- **System Utility:** System Utility programs are responsible to do specialised, individual level tasks. +There are various implementations (distributions) of Linux. We won't go into detail on them but here's a comparison of some of the popular ones. -![linux-struct](./imgs/Structure-Of-Linux-Operating-System.png) \ No newline at end of file +![linux-distros](./imgs/linux-distros.png) + +You can think of Linux as having 3 layers or components. Here they are from the highest to the lowest level (how removed they are from the hardware): +- **System Libraries:** System libraries are special functions or programs through which application programs or system utilities access the Kernel’s features. This is the topmost layer of the operating system. It allows access to the deeper parts of the machine without exposing them to direct user access. One such example for Linux and other Unix-like operating systems is `unistd.h` which provides C functions to access the POSIX (Portable Operating System Interface) API. +- **Kernel:** The kernel is the core part of Linux.
It is responsible for all major activities of this operating system. It consists of various modules which are all hidden and protected from the user. The only way that a user can access the kernel is through the system library. I encourage you all to check out the [Linux kernel code on GitHub](https://github.com/torvalds/linux) (you can see Linus still actively approving all the PRs into master). +- **Device Drivers (Kernel Modules)**: If you've ever wondered how an operating system actually controls the hardware, this is how it's done. They use device drivers/kernel modules which are software programs written to act as an interface between the OS kernel and the device's firmware. + +![linux-struct](./imgs/Structure-Of-Linux-Operating-System.png) + + +## What does an operating system actually do? +Pretty much all interactions you have with your machine are facilitated by an operating system. I find it useful to break it down into 2 functional areas - compute (algorithms/processes/dynamic stuff) and storage (data structures/memory/static stuff). + +### Compute +The operating system is what ultimately controls the CPU. Time on the CPU is a scarce resource for user applications that need to get code/instructions executed as quickly as possible. This is why compute is considered a "resource" and the OS is responsible for the fair and efficient allocation of this and all resources. + +Some more terminology - a program is a file with some non-executing (static) code while a process can be thought of as a live program that's being executed on the CPU. + +![program-vs-process](./imgs/program-process.png) + +Operating systems have to enable the creation of new processes, schedule them time on a CPU, manage and keep track of processes while also handling their completion. To do this there are a lot of attributes, structures and behaviours implemented by the Linux kernel. + +#### Process Characteristics & Attributes +A process has the following attributes: + +- **Process Id:** A unique identifier assigned by the operating system. +- **Process State:** Can be ready, running, etc. +- **CPU registers:** Like the Program Counter (CPU registers must be saved and restored when a process is swapped in and out of the CPU) +- **Accounting information:** Amount of CPU used for process execution, time limits, execution ID, etc +- **I/O (input/output) status information:** For example, devices allocated to the process, open files, etc +- **CPU scheduling information:** For example, Priority (Different processes may have different priorities, for example, a shorter process assigned high priority in the shortest job first scheduling) + +A process is in one of the following states at any given time: + +- **New:** Newly Created Process (or) being-created process. +- **Ready:** After the creation process moves to the Ready state, i.e. the process is ready for execution. +- **Run:** Currently running process in CPU (only one process at a time can be under execution in a single processor) +- **Wait (or Block):** When a process requests I/O access. +- **Complete (or Terminated):** The process completed its execution. +- **Suspended Ready:** When the ready queue becomes full, some processes are moved to a suspended ready state +- **Suspended Block:** When the waiting queue becomes full. + +![proc-states](./imgs/process-states.png) + +#### Context Switching +The process of saving the context of one process and loading the context of another process is known as Context Switching.
In simple terms, it is unloading a running process into the ready state in order to load another ready process into the running state. + +*When Does Context Switching Happen?* + +1. When a high-priority process comes to a ready state (i.e. with higher priority than the running process). +2. An Interrupt occurs (some I/O device tells the kernel that it needs CPU time). +3. User and kernel-mode switch. +4. Preemptive CPU scheduling is used (context switches at regular time intervals). + +There is a lot more involved in how compute is managed by the OS (eg. process scheduling, threading, etc...) which will be covered in a later chapter. + +### Storage +If you recall chapter 3.1, this area can be further subdivided into two - temporary storage (main memory i.e. RAM) and permanent storage (hard drives and SSDs). + +#### Linux File Systems +As you all know, computers manage the permanent storage of information using a system of files and directories. The Linux file system is a multifaceted structure comprised of three essential layers. At its foundation, the **Logical File System** serves as the interface between user applications and the file system, managing operations like opening, reading, and closing files. Below this layer, the **Virtual File System** facilitates the concurrent operation of multiple physical file systems, providing a standardized interface for compatibility. Finally, the **Physical File System** is responsible for the tangible management and storage of physical memory blocks on the disk, ensuring efficient data allocation and retrieval. Together, these layers form a cohesive architecture, orchestrating the organized and efficient handling of data in the Linux operating system. + +![linux-file-sys](./imgs/file-system-arch.png) + +#### Paging & Memory Allocation + +Paging is a memory management technique in operating systems that enables processes to access more memory than is physically available. The system improves performance and resource utilization using virtual memory. A process has access to the pages it needs without waiting for them to be loaded into physical memory. The technique stores and retrieves data from a computer's secondary or virtual storage (hard drive, SSD, etc.) to the primary storage (RAM). + +When a process tries to access a page that is not in RAM, the OS raises a page fault and brings in the page from virtual memory. + +![paging](./imgs/paging-basic-scheme.jpg) + +Paging improves the efficiency of memory management. By dividing memory into pages, the operating system moves pages in and out of memory as needed. Keeping only the frequently used pages reduces the number of page faults, which improves system performance and responsiveness. This is a key HPC optimisation concept known as **locality of reference**. + +#### Cache Optimisation +A lot of you must be familiar with the concept of caching. It basically means storing data temporarily in an easily accessible place in order to be more efficient when accessing it. Nearly all modern PCs use caches for efficiency. If you recall the memory hierarchy in chapter 3.1, caches sit between CPU registers and main memory (RAM) in terms of speed and cost. There are usually 3 levels of caches (depending on computer architecture) - L1, L2 and L3 with L1 being the smallest, most expensive, fastest and closest to the CPU. + +![cpu-caches](./imgs/cpu-cache.jpg) + +In the above figure, you can see that the CPU wants to read or fetch the data or instruction.
First, it will access the cache memory as it is near to it and provides very fast access. If the required data or instruction is found, it will be fetched. This situation is known as a cache hit. But if the required data or instruction is not found in the cache memory then this situation is known as a cache miss. + +![cpu-caches2](./imgs/cache-hit-miss.jpg) + +The aim is to store data that any given process is likely to access in the future, in the cache. Cache optimisation involves minimising the number of cache misses while maximizing cache hits. The benefits are obvious - reduced memory access times resulting in a faster program. Cache optimisation is done by implementing locality of reference and there are two localities: + +1. **Temporal locality** assumes that data or instructions being fetched now are likely to be needed again soon. In other words, if a program is accessing the same location (using pointers) again and again then it's likely to access it in the immediate future as well. + +2. **Spatial locality**, on the other hand, assumes that memory addresses that are closer to currently accessed addresses are more likely to be accessed again. + +![localities](./imgs/spatial-vs-temporal.gif) + +### Accessing the Kernel's API + +As mentioned earlier, user space programs (code that a programmer writes for an application/script) will need to use a system library to access the kernel and its lower-level functionality. For Linux, the main library is `unistd.h` which only runs on POSIX-compatible (Unix-like) operating systems and unfortunately Windows is not one of them. To get around this, we will be using a Docker container with an Ubuntu image. But first let's finish this chapter by learning about Virtual Machines and Containers. \ No newline at end of file diff --git a/src/chapter3/memory-pointers.md b/src/chapter3/memory-pointers.md index 3bb3b71..5c841b1 100644 --- a/src/chapter3/memory-pointers.md +++ b/src/chapter3/memory-pointers.md @@ -1,15 +1,25 @@ -# Pointers +# Pointers & Memory Memory is one of the most important concepts in all of computing. Memory is the primary resource utilised in all programs and when it comes to large scale applications and programs it can easily be depleted. Being able to fine tune and control memory usage is one of the best ways to optimize programs to ensure they are efficient and fast. However, this has the downside that the programmer must control exactly how memory is used at all times increasing the cognitive complexity of a program which increases the likelihood that memory is misused, with programs leaking the resource. Many languages hide the details of memory usage and control to help reduce this cognitive complexity and reduce the risks of manual memory management. This can be done a variety of ways, from interpreters and virtual machines (Python, Java and C#) to using abstractions and semantics to hide the details while still allowing control when needed (C++, Rust) to straight up using completely unique memory and data models (Haskell) however, C's memory model is the closest to how memory is truly laid out in hardware, largely because C and computer architecture have evolved together for so many decades. This is also because C is compiled end-to-end meaning source code is compiled directly into the machine language of the target machine not an intermediate bytecode or otherwise. This means that it is far simpler for C to model a machine's memory architecture than create its own.
This also simplifies C's concept of memory greatly giving programmers the greatest level of control of memory (and other compute resources). ## Brief Introduction into Memory -So what is memory? Memory; in its most abstract notion, is an 'infinite' sequence of fixed size cells. The size of these cells is (generally) 8-bits or a byte. On almost every computer, bytes are the smallest addressable unit of memory ie. they are the atoms of data. Any data you can build with a computer ultimately becomes some combination of bytes. But wait, what is a bit? A bit is a _binary digit_, thing of a regular (decimal) digit. It has 10 possible states (0..9) until it overflows and you need another digit (9 -> 10). A bit has only two possible states, 0 and 1. Bits are used as the extremely effective at modelling logical circuits where a wire is either on or off. Bits form the foundation for all of computing. However, inspecting and manipulating individual bits is tedious and only useful for small scale interactions. The goal of computing is to increase the computational power and thus reduce the time it takes to perform certain operations. This is why memory uses bytes. They are far easier to manipulate and are able to represent far larger data sets than a single bit (\\(2^{8}=256\\) combinations to be exact). However, while we can address individual bytes in memory this can be quite limiting in the number possible _memory locations_ a CPU can address if we used a byte to represent the numerical address location of memory (a byte). Instead many machines use a machine _word_ which represents the size of data a CPU is able to understand/read. The size of a word will correspond to the size of a CPU's registers, memory and IO buses and arithmetic manipulation hardware. Most machines have a word size of 64-bits or 8 bytes which dramatically increases the size of the instruction set used by a CPU, the amount of data it can transfer on buses and the amount of memory a CPU is able to address (\\(2^{8}=256\\) vs. \\(2^{64}=1.844674407371 × 10^{19}\\)). This is the largest integral value a machine is able to handle for most operations (ignoring specialised hardware). +So what is memory? Memory; in its most abstract notion, is an 'infinite' sequence of fixed size cells. The size of these cells is (generally) 8-bits or a byte. On almost every computer, bytes are the smallest addressable unit of memory ie. they are the atoms of data. Any data you can build with a computer ultimately becomes some combination of bytes. But wait, what is a bit? A bit is a _binary digit_, think of a regular (decimal) digit. It has 10 possible states (0..9) until it overflows and you need another digit (9 -> 10). A bit has only two possible states, 0 and 1. + +![memory-cells](./imgs/memory-cells.jpg) + +Bits are extremely effective at modelling logical circuits where a wire is either on or off. Bits form the foundation for all of computing. However, inspecting and manipulating individual bits is tedious and only useful for small scale interactions. The goal of computing is to increase the computational power and thus reduce the time it takes to perform certain operations. This is why memory uses bytes. They are far easier to manipulate and are able to represent far larger data sets than a single bit (\\(2^{8}=256\\) combinations to be exact).
However, while we can address individual bytes in memory this can be quite limiting in the number of possible _memory locations_ a CPU can address if we used a byte to represent the numerical address location of memory (a byte). Instead many machines use a machine _word_ which represents the size of data a CPU is able to understand/read. The size of a word will correspond to the size of a CPU's registers, memory and IO buses and arithmetic manipulation hardware. Most machines have a word size of 64-bits or 8 bytes which dramatically increases the size of the instruction set used by a CPU, the amount of data it can transfer on buses and the amount of memory a CPU is able to address (\\(2^{8}=256\\) vs. \\(2^{64}=1.844674407371 × 10^{19}\\)). This is the largest integral value a machine is able to handle for most operations (ignoring specialised hardware). + +![bit-bytes](./imgs/bit-byte-word.jpg) ### The Stack & Heap Now, most computers do not give away all of their memory to a single application nor will memory used by an application allocate memory all from the same place. When a program executes the OS allocates a small amount of memory for the instructions, constant data, meta data about the program and a small amount of free memory. This small amount of free memory is called the stack. Any local variables, function call stack and data created in a program are allocated to this part of the program automatically. However, the stack is quite small so when you need access to a large amount of memory you have to request it from the OS explicitly. The location where this OS owned memory is kept is called the heap (or free store). The heap is theoretically infinite in size allowing you to store large amounts of data however, you must remember to return it to the OS when you are done otherwise the memory will leak and the OS will lose track of it when your program finishes (or crashes). +![memory-segment](./imgs/memory-segments.png) + +Besides the stack and the heap, you have the text and the data segment. The text segment would contain your executable instructions (your compiled C code) while the data segment has all initialised data. You also have a few other segments like BSS (Block Started by Symbol) which contains all uninitialised data and others not mentioned in the diagram above such as OS-reserved sections for Kernel code and data. If you ever wondered what RAM (main memory) actually looks like, well now you know. + ## What are Pointers? So how do we refer to memory? Fundamentally we need to be able to store the address of some piece of data. This address is just some unsigned integer; with a bit size equivalent to a machine word. Using this address we then need to be able to redirect access to the data held at this memory address. We could just use a special integer type that corresponds to a machine word type and use this to store an address however, we often want to be able to access other pieces of data surrounding the data at the address we are storing thus we need to also be able to encode the type or size of the data whose address we are holding. This is because, while addresses all have the same size/width, it may own some data that is larger or smaller. Remember the smallest addressable machine location is a byte not a machine word. This construction we have described is called a pointer, simply because it holds the location of some data ie. it points to some data. The type of a pointer is the type of the data being **_pointed to_** followed by an asterisk.
@@ -25,6 +35,9 @@ void* pd; //< Pointer to a void > **Note:** > > - `void*` represents a **polymorphic** pointer type meaning it can point to data of any type and must be cast to the correct type on usage. + +![pointers](./imgs/pointers-in-c.jpg) + + ### Obtaining Pointers diff --git a/src/chapter4/macros.md b/src/chapter3/structs-macros.md similarity index 74% rename from src/chapter4/macros.md rename to src/chapter3/structs-macros.md index dbb0641..3e06130 100644 --- a/src/chapter4/macros.md +++ b/src/chapter3/structs-macros.md @@ -1,8 +1,42 @@ -# Macros & The Preprocessor +# Structures & Macros +Now let's cover some more advanced C language features before delving into operating system concepts. + +## Structures + +So far we have only been able to manipulate primitive data types and collections of a single type but what if we want to manipulate and store data that is of different types? This is where structures come in. Structures are used to hold data of different types in a compact format. Structures are created using the `struct` keyword paired with a unique name followed by a brace scope of variable declarations. To then create a variable of the structure type you again use the `struct` keyword and the structure's type name followed by a variable name. You can then initialise the fields using a comma separated list, enclosed in braces where each element is the desired value for initialising the field of the structure. The fields are then accessed using the variable and the member access operator (`.`) paired with the field's name. + +```c +#include <stdio.h> + +struct A +{ + int i; + double d; + char* c; +}; + +int main() +{ + struct A a = { 5, 576.658, "Hello" }; + printf("%d\n", a.i); + printf("%f\n", a.d); + printf("%s\n", a.c); + + return 0; +} +``` + +> **Note:** +> +> - Structures do not support methods. +> - Elements in a structure are laid out contiguously ie. each element is right next to each other. +> - The size of a structure can be obtained normally using `sizeof`. + +## Macros & The Preprocessor Sometimes we need to control how source code is compiled, enable certain parts of the source code while disabling other parts. How do you do this? This is done using macros in C. Macros are compile time expressions that are executed by a part of the compiler called the preprocessor. -## What is a macro? +### What is a macro? Macros are expressions that are evaluated and removed from the final source code. They are created using a `#` followed by a macro identifier. One macro we have used consistently throughout this book is the `#include` macro which is used to copy the source code from header files into other source and header files. Macros in C mostly perform in-source text replacement. @@ -34,7 +68,7 @@ int main() > Note: Even though you can define function-like entities using macros I would highly recommend against it in 99% of cases as it is nearly impossible to debug as the macros are expanded and removed very early on in the compilation of a program. -#### Include Guards +### Include Guards One common use of macros is for include guards. These are valueless macros that are defined once for a header file and only expose the contents of the header file if the macro is not defined. This prevents headers from being included twice and causing a redefinition or duplication error. How does this work? Essentially, when a header is first encountered, a macro is checked. If it is **_not_** defined we then define it and define the rest of the contents of the header.
If it was already defined then the header is 'empty'. This stops the contents of headers being imported multiple times or as a transitive dependency. @@ -47,7 +81,7 @@ One common use of macros is for include guards. These are valueless macros that #endif /// HEADER ``` -#### Defining Macros from the Command Line +### Defining Macros from the Command Line Macros are able to be defined from the command line through the compiler. Many compilers support a `-D` and `-U` flag which can define and undefine macros in source code respectively. These are typically used to control macros similar to header guards which control which parts of a codebase are defined eg. for different OS or graphical backends. diff --git a/src/chapter3/vms-containers.md b/src/chapter3/vms-containers.md new file mode 100644 index 0000000..95d5fb4 --- /dev/null +++ b/src/chapter3/vms-containers.md @@ -0,0 +1,42 @@ +# VMs & Containers + +## What is a virtual machine (VM)? + +A virtual machine is not a physical machine. It’s a file that replicates the computing environment of a physical machine. It’s similar to how virtual reality (VR) environments replicate the real world. VR isn’t a physical space; it’s a virtual imitation. Still, we can perform real-world functions in VR, such as exploring and interacting with objects. Instead of imitating video game functions, virtual machine software emulates computer system functions i.e. the operating system. To achieve this VMs use a technology called **virtualisation** as shown below. + +![vms](./imgs/vms.png) + +At the base, you have the host hardware and OS. This is the physical machine that is used to create the virtual machines. On top of this, you have the hypervisor. This allows multiple virtual machines, each with their own operating systems (OS), to run on a single physical server. + +VMs have a few downsides, though, which containers address. Two downsides particularly stand out: + +- **VMs consume more resources:** VMs have a higher resource overhead due to the need to run a full OS instance for each VM. This can lead to larger memory and storage consumption. This in turn can have a negative effect on performance and startup times of the virtual machine. +- **Portability:** VMs are typically less portable due to differences in underlying OS environments. Moving VMs between different hypervisors or cloud providers can be more complex. + +## What is a container? + +A container is a lightweight, standalone, and executable software package that includes everything needed to run a piece of software, including the code, runtime, system tools, and libraries. + +Containers are designed to isolate applications and their dependencies, ensuring that they can run consistently across different environments. Whether the application is running from your computer or in the cloud, the application behaviour remains the same. + +Unlike VMs which virtualise the hardware, containers *virtualise the operating system*. This simply means that a container uses a single OS to create a virtual application and its libraries. Containers run on top of a shared OS provided by the host system. + +![containers](./imgs/containers.png) + +The container engine allows you to spin up containers. It provides the tools and services necessary for building, running, and deploying containerised applications. + +## Docker +As you might know, Docker is an open-source containerization platform by which you can pack your application and all its dependencies into a standardized unit called a container.
Let's clarify some Docker terminology: + +- **Docker Image**: An image is an inert, immutable file that's essentially a snapshot of a container. Images are created with the [build](https://docs.docker.com/reference/cli/docker/image/build/) command, and they'll produce a container when started with [run](https://docs.docker.com/reference/cli/docker/container/run/). Images are stored in a Docker registry such as [registry.hub.docker.com](https://registry.hub.docker.com). +- **Docker Container**: To use a programming metaphor, if an image is a class, then a container is an instance of a class—a runtime object. Multiple containers can run from the same image simultaneously. +- **Docker Daemon**: The Docker daemon (`dockerd`) listens for Docker API requests and manages Docker objects such as images, containers, networks, and volumes. +- **Docker Client**: The Docker client (`docker`) is the primary way that many Docker users interact with Docker. When you use commands such as `docker run`, the client sends these commands to `dockerd`, which carries them out. +- **Docker Desktop**: This is an easy-to-install application for your Mac, Windows or Linux environment that enables you to build and share containerized applications and microservices. + +### Docker Architecture +Docker uses a client-server architecture. The Docker client talks to the Docker daemon, which does the heavy lifting of building, running, and distributing your Docker containers. + +![docker-arch](./imgs/docker-architecture.jpg) + +Go through [this guide](https://www.docker.com/get-started/) to download and install Docker. If you'd like to learn more about its functionality and how to work with it, [check out their documentation website](https://docs.docker.com/get-started/overview/). \ No newline at end of file diff --git a/src/chapter5/batch-cloud.md b/src/chapter4/batch-cloud.md similarity index 100% rename from src/chapter5/batch-cloud.md rename to src/chapter4/batch-cloud.md diff --git a/src/chapter4/challenges.md b/src/chapter4/challenges.md index 2986a5d..39f5e4f 100644 --- a/src/chapter4/challenges.md +++ b/src/chapter4/challenges.md @@ -1,11 +1,45 @@ -# Challenges +# M3 Challenges -## Challenge 1 - Sum and Product Algorithms +## Challenge 1 -This challenge involves implementing the sum and product reductions on an array or memory block of integers. As a bonus challenge, try and make the algorithms more generic and work with any binary operator. +Navigate to your scratch directory and, using vim (or your chosen in-terminal editor), create a file called `hello.txt` that contains the text "Hello World". Once you have created the file, use the `cat` command to print the contents of the file to the screen. -## Challenge 2 - Array Concatenation +## Challenge 2 -In this challenge you have to implement an array concatenation function. This should join two arrays of the same type into a single array, similar to `strcat()`. You will need to allocate a new block of memory and in order to store the concatenated arrays which will requires the sizes of the two input arrays to be known by the function. This function should return a pointer to the resulting array. +Write a bash script that prints the contents of the above hello.txt file to the screen and run it locally (on your login node). -> Note: The type of the array this function concatenates can be any type except `char`. \ No newline at end of file +## Challenge 3 + +Submit the above script to the queue by writing another SLURM bash script.
Check the status of the job using `squeue`. Once the job has finished, check the output using `cat`. You can find the output file in the directory you submitted the job from. +## Challenge 4 + +Request an interactive node and attach to it. Once you have done this, install python 3.7 using conda. + +## Challenge 5 + +Clone and run [this](./dl_on_m3/alexnet_stl10.py) script. You will need to first install the dependencies for it. You don't need to wait for it to finish, just make sure it is working. You will know it's working if it starts listing out the loss and accuracy for each epoch. You can stop it by pressing `ctrl + c`. + +Once you have confirmed that it is working, deactivate and delete the conda environment, and then end the interactive session. + +> Hint: I have included the dependencies and their versions (make sure you install the right version) in the `requirements.txt` file. You will need python 3.7 to run this script. + +## Challenge 6 + +Go back to the login node. Now you are going to put it all together. Write a bash script that does the following: + +- (1) requests a compute node +- (2) installs python using conda +- (3) clones and runs the above script + +Let this run fully. Check the output of the script to make sure it ran correctly. Does it match the output of the script you ran in challenge 5? +> Hint: You can check the output of the script at any time by `cat`ing the output file. The script does not need to have finished running for you to do this. + +## Challenge 7 + +Edit your submission script so that you get a GPU node, and run the script using the GPU. +> Hint: Use the m3h partition + +## Challenge 8 + +Now you want to clean up your working directory. First, push your solutions to your challenges repo. Then, delete the challenges directory, as well as the conda environment you created in challenge 6. diff --git a/src/chapter4/chapter4.md b/src/chapter4/chapter4.md index 27cf36f..82d33ef 100644 --- a/src/chapter4/chapter4.md +++ b/src/chapter4/chapter4.md @@ -1,3 +1,7 @@ -# More C +# M3 & SLURM -This chapter will walk you through the more intermediate features of the C language. It aims to build on the theoretical knowledge of operating systems you gained in the last chapter with the practical skills to actually use it. You will learn about other C language constructs, memory allocation, system calls (the Kernel's API) and actually spawning processes & threads. \ No newline at end of file +[M3](https://docs.massive.org.au/M3/index.html) is part of [MASSIVE](https://www.massive.org.au/), which is a High Performance Computing facility for Australian scientists and researchers. Monash University is a partner of MASSIVE, and provides a majority of the funding for it. M3 is made up of multiple different types of servers, with a total of 5673 cores, 63.2TB of RAM, 5.6PB of storage, and 1.7 million CUDA cores. + +M3 utilises the [Slurm](https://slurm.schedmd.com/) workload manager, which is a job scheduler that allows users to submit jobs to the cluster. We will learn a bit more about this later on. + +This chapter will introduce the theory behind HPC clusters and how parallel & distributed computing works on these systems. After this, you will learn how to connect to and use M3 along with how SLURM works and how to submit jobs and take advantage of the massive computational capability that M3 provides.
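+
+To give a concrete feel for what submitting work to SLURM looks like, below is a minimal sketch of a batch script. The job name, resource requests and the `hello.c` program are placeholders for illustration, and the partition/account details are assumptions you would replace with the values your own project actually uses.
+
+```bash
+#!/bin/bash
+#SBATCH --job-name=hello       # Name shown in the queue
+#SBATCH --ntasks=1             # Run a single task
+#SBATCH --cpus-per-task=1      # One CPU core for that task
+#SBATCH --mem=1G               # Memory for the whole job
+#SBATCH --time=00:05:00        # Wall-time limit (hh:mm:ss)
+# Add '#SBATCH --partition=...' and '#SBATCH --account=...' lines here once
+# you know the partition and project account your group actually uses.
+
+# Load a compiler, then build and run a small C program.
+module load gcc
+gcc hello.c -o hello
+./hello
+```
+
+You would save this as something like `job.slurm`, submit it with `sbatch job.slurm` and monitor it with `squeue`; both commands are covered in the SLURM subchapters that follow.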
diff --git a/src/chapter5/imgs/aaf.png b/src/chapter4/imgs/aaf.png similarity index 100% rename from src/chapter5/imgs/aaf.png rename to src/chapter4/imgs/aaf.png diff --git a/src/chapter5/imgs/aaf_strudel.png b/src/chapter4/imgs/aaf_strudel.png similarity index 100% rename from src/chapter5/imgs/aaf_strudel.png rename to src/chapter4/imgs/aaf_strudel.png diff --git a/src/chapter5/imgs/auth_strudel.png b/src/chapter4/imgs/auth_strudel.png similarity index 100% rename from src/chapter5/imgs/auth_strudel.png rename to src/chapter4/imgs/auth_strudel.png diff --git a/src/chapter5/imgs/batch-processing.jpeg b/src/chapter4/imgs/batch-processing.jpeg similarity index 100% rename from src/chapter5/imgs/batch-processing.jpeg rename to src/chapter4/imgs/batch-processing.jpeg diff --git a/src/chapter5/imgs/data_parallelism.jpg b/src/chapter4/imgs/data_parallelism.jpg similarity index 100% rename from src/chapter5/imgs/data_parallelism.jpg rename to src/chapter4/imgs/data_parallelism.jpg diff --git a/src/chapter5/imgs/distributed_memory_architecture.png b/src/chapter4/imgs/distributed_memory_architecture.png similarity index 100% rename from src/chapter5/imgs/distributed_memory_architecture.png rename to src/chapter4/imgs/distributed_memory_architecture.png diff --git a/src/chapter5/imgs/distributed_memory_architecture_2.png b/src/chapter4/imgs/distributed_memory_architecture_2.png similarity index 100% rename from src/chapter5/imgs/distributed_memory_architecture_2.png rename to src/chapter4/imgs/distributed_memory_architecture_2.png diff --git a/src/chapter5/imgs/distributed_vs_shared.png b/src/chapter4/imgs/distributed_vs_shared.png similarity index 100% rename from src/chapter5/imgs/distributed_vs_shared.png rename to src/chapter4/imgs/distributed_vs_shared.png diff --git a/src/chapter5/imgs/filezilla_connect_m3.png b/src/chapter4/imgs/filezilla_connect_m3.png similarity index 100% rename from src/chapter5/imgs/filezilla_connect_m3.png rename to src/chapter4/imgs/filezilla_connect_m3.png diff --git a/src/chapter5/imgs/filezilla_sitemanager.png b/src/chapter4/imgs/filezilla_sitemanager.png similarity index 100% rename from src/chapter5/imgs/filezilla_sitemanager.png rename to src/chapter4/imgs/filezilla_sitemanager.png diff --git a/src/chapter5/imgs/gurobi.png b/src/chapter4/imgs/gurobi.png similarity index 100% rename from src/chapter5/imgs/gurobi.png rename to src/chapter4/imgs/gurobi.png diff --git a/src/chapter5/imgs/gurobi2.png b/src/chapter4/imgs/gurobi2.png similarity index 100% rename from src/chapter5/imgs/gurobi2.png rename to src/chapter4/imgs/gurobi2.png diff --git a/src/chapter4/imgs/htop.png b/src/chapter4/imgs/htop.png new file mode 100644 index 0000000..2efbc06 Binary files /dev/null and b/src/chapter4/imgs/htop.png differ diff --git a/src/chapter5/imgs/interactive-processing.png b/src/chapter4/imgs/interactive-processing.png similarity index 100% rename from src/chapter5/imgs/interactive-processing.png rename to src/chapter4/imgs/interactive-processing.png diff --git a/src/chapter5/imgs/login-compute-nodes.jpeg b/src/chapter4/imgs/login-compute-nodes.jpeg similarity index 100% rename from src/chapter5/imgs/login-compute-nodes.jpeg rename to src/chapter4/imgs/login-compute-nodes.jpeg diff --git a/src/chapter5/imgs/memory_architectures.jpg b/src/chapter4/imgs/memory_architectures.jpg similarity index 100% rename from src/chapter5/imgs/memory_architectures.jpg rename to src/chapter4/imgs/memory_architectures.jpg diff --git a/src/chapter5/imgs/mpi_datatypes.png 
b/src/chapter4/imgs/mpi_datatypes.png similarity index 100% rename from src/chapter5/imgs/mpi_datatypes.png rename to src/chapter4/imgs/mpi_datatypes.png diff --git a/src/chapter5/imgs/mpi_routines.png b/src/chapter4/imgs/mpi_routines.png similarity index 100% rename from src/chapter5/imgs/mpi_routines.png rename to src/chapter4/imgs/mpi_routines.png diff --git a/src/chapter5/imgs/parallel-distributed.png b/src/chapter4/imgs/parallel-distributed.png similarity index 100% rename from src/chapter5/imgs/parallel-distributed.png rename to src/chapter4/imgs/parallel-distributed.png diff --git a/src/chapter5/imgs/parallel_computing_arrays_eg.png b/src/chapter4/imgs/parallel_computing_arrays_eg.png similarity index 100% rename from src/chapter5/imgs/parallel_computing_arrays_eg.png rename to src/chapter4/imgs/parallel_computing_arrays_eg.png diff --git a/src/chapter5/imgs/parallel_scalability.jpg b/src/chapter4/imgs/parallel_scalability.jpg similarity index 100% rename from src/chapter5/imgs/parallel_scalability.jpg rename to src/chapter4/imgs/parallel_scalability.jpg diff --git a/src/chapter5/imgs/ping_pong.png b/src/chapter4/imgs/ping_pong.png similarity index 100% rename from src/chapter5/imgs/ping_pong.png rename to src/chapter4/imgs/ping_pong.png diff --git a/src/chapter5/imgs/putty_key_not_cached.png b/src/chapter4/imgs/putty_key_not_cached.png similarity index 100% rename from src/chapter5/imgs/putty_key_not_cached.png rename to src/chapter4/imgs/putty_key_not_cached.png diff --git a/src/chapter5/imgs/putty_start.png b/src/chapter4/imgs/putty_start.png similarity index 100% rename from src/chapter5/imgs/putty_start.png rename to src/chapter4/imgs/putty_start.png diff --git a/src/chapter5/imgs/scale-out-up.png b/src/chapter4/imgs/scale-out-up.png similarity index 100% rename from src/chapter5/imgs/scale-out-up.png rename to src/chapter4/imgs/scale-out-up.png diff --git a/src/chapter5/imgs/slurm-arch.gif b/src/chapter4/imgs/slurm-arch.gif similarity index 100% rename from src/chapter5/imgs/slurm-arch.gif rename to src/chapter4/imgs/slurm-arch.gif diff --git a/src/chapter5/imgs/strudel1.png b/src/chapter4/imgs/strudel1.png similarity index 100% rename from src/chapter5/imgs/strudel1.png rename to src/chapter4/imgs/strudel1.png diff --git a/src/chapter5/imgs/strudel2.png b/src/chapter4/imgs/strudel2.png similarity index 100% rename from src/chapter5/imgs/strudel2.png rename to src/chapter4/imgs/strudel2.png diff --git a/src/chapter5/imgs/strudel_home.png b/src/chapter4/imgs/strudel_home.png similarity index 100% rename from src/chapter5/imgs/strudel_home.png rename to src/chapter4/imgs/strudel_home.png diff --git a/src/chapter5/imgs/task_parallelism.jpg b/src/chapter4/imgs/task_parallelism.jpg similarity index 100% rename from src/chapter5/imgs/task_parallelism.jpg rename to src/chapter4/imgs/task_parallelism.jpg diff --git a/src/chapter4/imgs/time.png b/src/chapter4/imgs/time.png new file mode 100644 index 0000000..da640d6 Binary files /dev/null and b/src/chapter4/imgs/time.png differ diff --git a/src/chapter5/job-scripting.md b/src/chapter4/job-scripting.md similarity index 100% rename from src/chapter5/job-scripting.md rename to src/chapter4/job-scripting.md diff --git a/src/chapter5/login.md b/src/chapter4/login.md similarity index 100% rename from src/chapter5/login.md rename to src/chapter4/login.md diff --git a/src/chapter5/m3-interface.md b/src/chapter4/m3-interface.md similarity index 100% rename from src/chapter5/m3-interface.md rename to src/chapter4/m3-interface.md diff --git 
a/src/chapter5/parallel-distributed.md b/src/chapter4/parallel-distributed.md similarity index 100% rename from src/chapter5/parallel-distributed.md rename to src/chapter4/parallel-distributed.md diff --git a/src/chapter5/slurm_intro.md b/src/chapter4/slurm_intro.md similarity index 100% rename from src/chapter5/slurm_intro.md rename to src/chapter4/slurm_intro.md diff --git a/src/chapter5/software-tooling.md b/src/chapter4/software-tooling.md similarity index 100% rename from src/chapter5/software-tooling.md rename to src/chapter4/software-tooling.md diff --git a/src/chapter4/spawn-procs.md b/src/chapter4/spawn-procs.md deleted file mode 100644 index 13aa1d0..0000000 --- a/src/chapter4/spawn-procs.md +++ /dev/null @@ -1,3 +0,0 @@ -# Spawning Processes & Threads - -![under-const](../imgs/under-const.gif) \ No newline at end of file diff --git a/src/chapter4/structs.md b/src/chapter4/structs.md deleted file mode 100644 index ff6d028..0000000 --- a/src/chapter4/structs.md +++ /dev/null @@ -1,30 +0,0 @@ -# Structures - -So far we have only been able to to manipulate primitive data types and collections of a single type but what if we want to manipulate and store data that is of different types. This is where structures come in. Structures are used to hold data of different types in a compact format. Structures are created using the `struct` keyword paired with a unique name followed by a brace scope of variable declarations. To then create a variable of the structure type you again use the `struct` keyword and the structures type name followed by a variable name. You can then initialise the fields using a comma separated list, enclosed in braces where each element is the desired value for initialising the field of the structure. The fields are then accessed using the variable and the member access operator (`.`) paired with the field's name. - -```c -#include - -struct A -{ - int i; - double d; - char* c; -}; - -int main() -{ - struct A a = { 5, 576.658, "Hello" }; - printf("%d\n", a.i); - printf("%f\n", a.d); - printf("%s\n", a.c); - - return 0; -} -``` - -> **Note:** -> -> - Structures do not support methods. -> - Elements in a structure a layed out contiguously ie. each element is right next to each other. -> - The size of a structure can be obtained normally using `sizeof`. diff --git a/src/chapter4/syscalls.md b/src/chapter4/syscalls.md deleted file mode 100644 index 2e229eb..0000000 --- a/src/chapter4/syscalls.md +++ /dev/null @@ -1,3 +0,0 @@ -# System Calls - -![under-const](../imgs/under-const.gif) \ No newline at end of file diff --git a/src/chapter5/challenges.md b/src/chapter5/challenges.md index 39f5e4f..0d9c4ce 100644 --- a/src/chapter5/challenges.md +++ b/src/chapter5/challenges.md @@ -1,45 +1,40 @@ -# M3 Challenges +# Parallel Computing Challenges -## Challenge 1 +## Pre-Tasks -Navigate to your scratch directory and, using vim (or your chosen in-terminal editor) create a file called `hello.txt` that contains the text "Hello World". Once you have created the file, use the `cat` command to print the contents of the file to the screen. +Make sure to clone a copy of **your** challenges repo onto M3, ideally in a personal folder on vf38_scratch. -## Challenge 2 +> Note: For every challenge you will be running the programs as SLURM jobs. This is so we don't overload the login nodes. 
A template [SLURM job script](./job.slurm) is provided at the root of this directory which you can use to submit your own jobs to SLURM by copying it to each challenge's sub-directory and filling in the missing details. You may need more than one for some challenges. This template will put the would-be-printed output in a file named `slurm-<job-id>.out`. -Write a bash script that prints the contents of the above hello.txt file to the screen and run it locally (on your login node). +## Task 1 - Single Cluster Job using OpenMP -## Challenge 3 +Create a program in `hello.c` that prints 'Hello, world from thread: <thread-id>' to the output. Launch the job to a node using SLURM. Next, extend the program to run on multiple nodes using OpenMPI. -Submit the above script to the queue by writing another SLURM bash script. Check the status of the job using `squeue`. Once the job has finished, check the output using `cat`. You can find the output file in the directory you submitted the job from. +> Note: +> +> - The output of a job is put in a slurm-<job-id>.out file by default. +> - The template slurm job scripts will output the results to a `slurm-<job-id>.out` file. -## Challenge 4 +## Task 2 - Parallel `for` Loop -Request an interactive node and attach to it. Once you have done this, install python 3.7 using conda. +In `array-gen.c` implement a program that generates an array containing the numbers 0..10'000 (inclusive) using a `for` loop. Measure the execution time using the `time` Linux command. Now reimplement the program to utilise OpenMP's parallel `for` loop macros, measuring the execution time again. Is there any performance improvement? Are the elements still in the correct order, and if not, how can you fix this? Try experimenting with different sized arrays and element types. Again, extend the program to use multiple nodes. -## Challenge 5 +> Hint: You will likely need to allocate memory from the heap. -Clone and run [this](./dl_on_m3/alexnet_stl10.py) script. You will need to first install the dependencies for it. You don't need to wait for it to finish, just make sure it is working. You will know its working if it starts listing out the loss and accuracy for each epoch. You can stop it by pressing `ctrl + c`. +## Task 3 - Parallel Reductions -Once you have confirmed that it is working, deactivate and delete the conda environment, and then end the interactive session. +In the C chapter we created a sum program that summed the elements of an array together. Using this as a base, create a new program that again computes the sum of the elements of an array but using OpenMP, comparing the execution time between the sequential and parallel versions. Is there any performance improvement? How would using a different binary operator change our ability to parallelize the reduction? -> Hint: I have included the dependencies and their versions (make sure you install the right version) in the `requirements.txt` file. You will need python 3.7 to run this script. +If you have time, implement the sum but at each iteration, raise the current value to the power of the current accumulation divided by 100, adding this to the accumulation. Test a serial and parallel version. Is the parallel version any faster? -## Challenge 6 +> Note: Run `module load gcc` to use a newer version of gcc if you get an error with something like `-std=c99`. -Go back to the login node. Now you are going to put it all together.
Write a bash script that does the following: ## Task 4 - Laplace Equation for Calculating the Temperature of a Square Plane -- (1) requests a compute node -- (2) installs python using conda -- (3) clones and runs the above script +For this challenge you will attempt to parallelize an existing implementation of the Laplace Equation. Throughout the source files of this project there are various loops you can try to make faster by utilizing OpenMP macros. See if you can make a faster version in `laplace2d-parallel.c`. To build these files, make sure you're in that directory and use the command `make`. The executables will be in the same directory. -Let this run fully. Check the output of the script to make sure it ran correctly. Does it match the output of the script you ran in challenge 5? -> Hint: You can check the output of the script at any time by `cat`ing the output file. The script does not need to have finished running for you to do this. +## Task 5 - Calculate Pi using "Monte Carlo Algorithm" -## Challenge 7 +For this challenge you will have to try to implement the Monte Carlo algorithm with no framework or template, using everything you've learnt so far. Good luck. -Edit your submission script so that you get a gpu node, and run the script using the gpu. -> Hint: Use the m3h partition - -## Challenge 8 - -Now you want to clean up your working directory. First, push your solutions to your challenges repo. Then, delete the challenges directory, as well as the conda environment you created in challenge 6. +[Short explanation of Monte Carlo algorithm](https://www.youtube.com/watch?v=7ESK5SaP-bc&ab_channel=MarbleScience) diff --git a/src/chapter5/chapter5.md b/src/chapter5/chapter5.md index 82d33ef..95c1d02 100644 --- a/src/chapter5/chapter5.md +++ b/src/chapter5/chapter5.md @@ -1,7 +1,7 @@ -# M3 & SLURM +# Parallel Computing -[M3](https://docs.massive.org.au/M3/index.html) is part of [MASSIVE](https://https://www.massive.org.au/), which is a High Performance Computing facility for Australian scientists and researchers. Monash University is a partner of MASSIVE, and provides a majority of the funding for it. M3 is made up of multiple different types of servers, with a total of 5673 cores, 63.2TB of RAM, 5.6PB of storage, and 1.7 million CUDA cores. +In this chapter, we will discuss the abstraction of parallel computing. To facilitate our exploration, we will employ an API within the C programming language: OpenMP. This tool will serve as a means to concretely illustrate the underlying language-independent theory. -M3 utilises the [Slurm](https://slurm.schedmd.com/) workload manager, which is a job scheduler that allows users to submit jobs to the cluster. We will learn a bit more about this later on. **Parallel computing is about executing the instructions of the program simultaneously.** -This book will introduce the theory behind HPC clusters and how parallel & distributed computing works on these systems. After this, you will learn how to connect to and use M3 along with how SLURM works and how to submit jobs and take advantage of the massive computational capability that M3 provides. +One of the core values of computing is the breaking down of a big problem into smaller, easier-to-solve problems, or at least smaller problems. In some cases, the steps required to solve the problem can be executed simultaneously (in parallel) rather than sequentially (in order).
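As a small illustration of steps executing simultaneously, the sketch below uses OpenMP to run a block on several threads at once; it also happens to be one way to approach Task 1 of the challenges. The exact output wording and the compile command are assumptions for illustration.

```c
// hello.c - illustrative OpenMP sketch: one print per thread.
// Compile (assumed): gcc -fopenmp hello.c -o hello
#include <stdio.h>
#include <omp.h>

int main(void)
{
    // The block below is executed by a team of threads; execution
    // rejoins a single thread once the parallel region ends.
    #pragma omp parallel
    {
        int id = omp_get_thread_num();       // this thread's index in the team
        int total = omp_get_num_threads();   // number of threads in the team
        printf("Hello, world from thread: %d of %d\n", id, total);
    }
    return 0;
}
```

When run under SLURM, the thread count typically follows the `OMP_NUM_THREADS` environment variable (often set from the job's allocated CPUs); spreading the work across several nodes is where a message-passing library such as OpenMPI comes in.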
diff --git a/src/chapter6/imgs/barrier-end.png b/src/chapter5/imgs/barrier-end.png similarity index 100% rename from src/chapter6/imgs/barrier-end.png rename to src/chapter5/imgs/barrier-end.png diff --git a/src/chapter6/imgs/barrier-wait.png b/src/chapter5/imgs/barrier-wait.png similarity index 100% rename from src/chapter6/imgs/barrier-wait.png rename to src/chapter5/imgs/barrier-wait.png diff --git a/src/chapter6/imgs/barrier.png b/src/chapter5/imgs/barrier.png similarity index 100% rename from src/chapter6/imgs/barrier.png rename to src/chapter5/imgs/barrier.png diff --git a/src/chapter6/imgs/deadlock.png b/src/chapter5/imgs/deadlock.png similarity index 100% rename from src/chapter6/imgs/deadlock.png rename to src/chapter5/imgs/deadlock.png diff --git a/src/chapter6/imgs/explicit-barrier.png b/src/chapter5/imgs/explicit-barrier.png similarity index 100% rename from src/chapter6/imgs/explicit-barrier.png rename to src/chapter5/imgs/explicit-barrier.png diff --git a/src/chapter6/imgs/fork-join.png b/src/chapter5/imgs/fork-join.png similarity index 100% rename from src/chapter6/imgs/fork-join.png rename to src/chapter5/imgs/fork-join.png diff --git a/src/chapter5/imgs/htop.png b/src/chapter5/imgs/htop.png index 2efbc06..cbc1fd3 100644 Binary files a/src/chapter5/imgs/htop.png and b/src/chapter5/imgs/htop.png differ diff --git a/src/chapter6/imgs/mpi-routines.png b/src/chapter5/imgs/mpi-routines.png similarity index 100% rename from src/chapter6/imgs/mpi-routines.png rename to src/chapter5/imgs/mpi-routines.png diff --git a/src/chapter6/imgs/one-thread-counter.png b/src/chapter5/imgs/one-thread-counter.png similarity index 100% rename from src/chapter6/imgs/one-thread-counter.png rename to src/chapter5/imgs/one-thread-counter.png diff --git a/src/chapter6/imgs/program-structure.png b/src/chapter5/imgs/program-structure.png similarity index 100% rename from src/chapter6/imgs/program-structure.png rename to src/chapter5/imgs/program-structure.png diff --git a/src/chapter5/imgs/time.png b/src/chapter5/imgs/time.png index da640d6..b9f5185 100644 Binary files a/src/chapter5/imgs/time.png and b/src/chapter5/imgs/time.png differ diff --git a/src/chapter6/imgs/two-threads-counter.png b/src/chapter5/imgs/two-threads-counter.png similarity index 100% rename from src/chapter6/imgs/two-threads-counter.png rename to src/chapter5/imgs/two-threads-counter.png diff --git a/src/chapter6/locks.md b/src/chapter5/locks.md similarity index 100% rename from src/chapter6/locks.md rename to src/chapter5/locks.md diff --git a/src/chapter6/message-passing.md b/src/chapter5/message-passing.md similarity index 100% rename from src/chapter6/message-passing.md rename to src/chapter5/message-passing.md diff --git a/src/chapter6/multithreading.md b/src/chapter5/multithreading.md similarity index 100% rename from src/chapter6/multithreading.md rename to src/chapter5/multithreading.md diff --git a/src/chapter6/synchronisation.md b/src/chapter5/synchronisation.md similarity index 100% rename from src/chapter6/synchronisation.md rename to src/chapter5/synchronisation.md diff --git a/src/chapter6/challenges.md b/src/chapter6/challenges.md index 0d9c4ce..a31dd95 100644 --- a/src/chapter6/challenges.md +++ b/src/chapter6/challenges.md @@ -1,40 +1,3 @@ -# Parallel Computing Challenges +# Challenges -## Pre-Tasks - -Make sure to clone a copy of **your** challenges repo onto M3, ideally in a personal folder on vf38_scratch. - -> Note: For every challenge you will be running the programs as SLURM jobs. 
This is so we don't overload the login nodes. A template [SLURM job script](./job.slurm) is provided at the root of this directory which you can use to submit your own jobs to SLURM by copying it to each challenges sub-directory and filling in the missing details. You may need more than one for some challenges. This template will put the would-be-printed output in a file named `slurm-.out`. - -## Task 1 - Single Cluster Job using OpenMP - -Create a program in `hello.c` that prints 'Hello, world from thread: ' to the output. Launch the job to a node SLURM. Next, extend the program to run on multi-nodes using OpenMPI. - -> Note: -> -> - The output of a job is put in a slurm-.out file by default. -> - The template slurm job scripts will output the results to a `slurm-.out` file. - -## Task 2 - Parallel `for` Loop - -In `array-gen.c` implement a program that generates an array containing the numbers 0..10'000 elements (inclusive) using a `for` loop. Measure the execution time using the `time` Linux command. Now reimplement the program to utilise OpenMP's parallel `for` loop macros, measuring the execution time again. Is there any performance improvement? Are the elements still in the correct order and if not how can you fix this. Try experimenting with different sized arrays and element types. Again, extend the program to use multi-nodes. - -> Hint: You will likely need to allocate memory from the heap. - -## Task 3 - Parallel Reductions - -In the C chapter we created a sum program that summed the elements of an array together. Using this as a base, create a new program that again computes the sum of the elements of an array but using OpenMP, comparing the execution time between the sequential and parallel versions. Is there any performance improvement? How would using a different binary operator change our ability to parallelize the the reduction? - -If you have time, implement the sum but at each iteration, raise the current value to the power of the current accumulation divide by 100, adding this to the accumulation. Test a serial and parallel version. Is the parallel any faster? - -> Note: `module load gcc` to use newer version of gcc if you have error with something like `-std=c99`. - -## Task 4 - Laplace Equation for Calculating the Temperature of a Square Plane - -For this challenge you will attempt to parallelize an existing implementation of the Laplace Equation. Throughout the source files of this project there are various loops you can try and make faster by utilizing OpenMP macros. See if you can make a faster version in the `laplace2d-parallel.c`. To build these files make sure you're in that directory and use the command `make`. The executables will be in the same directory. - -## Task 5 - Calculate Pi using "Monte Carlo Algorithm" - -For this challenge you will have to try and implement the Monte Carlo algorithm with no framework or template and using everything you've learnt so far. Good luck. - -[Short explanation of Monte Carlo algorithm](https://www.youtube.com/watch?v=7ESK5SaP-bc&ab_channel=MarbleScience) +![under-const](../imgs/under-const.gif) \ No newline at end of file diff --git a/src/chapter6/chapter6.md b/src/chapter6/chapter6.md index 95c1d02..25d3de6 100644 --- a/src/chapter6/chapter6.md +++ b/src/chapter6/chapter6.md @@ -1,7 +1,3 @@ -# Parallel Computing +# Parallellisation of Algorithms -In this chapter, we will discuss the abstraction of parallel computing. To facilitate our exploration, we will employ a API within the C Programming Language: OpenMP. 
This tool will serve as a means to concretely illustrate the underlying language-independent theory. - -**Parallel computing is about executing the instructions of the program simultaneously.** - -One of the core values of computing is the breaking down of a big problem into smaller easier to solve problems, or at least smaller problems. In some cases, the steps required to solve the problem can be executed simultaneously (in parallel) rather than sequentially (in order). +![under-const](../imgs/under-const.gif) \ No newline at end of file diff --git a/src/chapter7/imgs/Beale_contour.svg.png b/src/chapter6/imgs/Beale_contour.svg.png similarity index 100% rename from src/chapter7/imgs/Beale_contour.svg.png rename to src/chapter6/imgs/Beale_contour.svg.png diff --git a/src/chapter7/imgs/Rastrigin_contour_plot.svg.png b/src/chapter6/imgs/Rastrigin_contour_plot.svg.png similarity index 100% rename from src/chapter7/imgs/Rastrigin_contour_plot.svg.png rename to src/chapter6/imgs/Rastrigin_contour_plot.svg.png diff --git a/src/chapter7/imgs/Rosenbrock_contour.svg.png b/src/chapter6/imgs/Rosenbrock_contour.svg.png similarity index 100% rename from src/chapter7/imgs/Rosenbrock_contour.svg.png rename to src/chapter6/imgs/Rosenbrock_contour.svg.png diff --git a/src/chapter6/imgs/htop.png b/src/chapter6/imgs/htop.png deleted file mode 100644 index cbc1fd3..0000000 Binary files a/src/chapter6/imgs/htop.png and /dev/null differ diff --git a/src/chapter6/imgs/time.png b/src/chapter6/imgs/time.png deleted file mode 100644 index b9f5185..0000000 Binary files a/src/chapter6/imgs/time.png and /dev/null differ diff --git a/src/chapter7/machine-learning-and-hpc.md b/src/chapter6/machine-learning-and-hpc.md similarity index 100% rename from src/chapter7/machine-learning-and-hpc.md rename to src/chapter6/machine-learning-and-hpc.md diff --git a/src/chapter7/optim-algos.md b/src/chapter6/optim-algos.md similarity index 100% rename from src/chapter7/optim-algos.md rename to src/chapter6/optim-algos.md diff --git a/src/chapter7/optimisation-algorithms.md b/src/chapter6/optimisation-algorithms.md similarity index 100% rename from src/chapter7/optimisation-algorithms.md rename to src/chapter6/optimisation-algorithms.md diff --git a/src/chapter7/other-parallel-algos.md b/src/chapter6/other-parallel-algos.md similarity index 100% rename from src/chapter7/other-parallel-algos.md rename to src/chapter6/other-parallel-algos.md diff --git a/src/chapter7/parallel-search.md b/src/chapter6/parallel-search.md similarity index 100% rename from src/chapter7/parallel-search.md rename to src/chapter6/parallel-search.md diff --git a/src/chapter7/parallel-sort.md b/src/chapter6/parallel-sort.md similarity index 100% rename from src/chapter7/parallel-sort.md rename to src/chapter6/parallel-sort.md diff --git a/src/chapter7/challenges.md b/src/chapter7/challenges.md index a31dd95..169d112 100644 --- a/src/chapter7/challenges.md +++ b/src/chapter7/challenges.md @@ -1,3 +1,47 @@ -# Challenges +# Apache Spark Challenges -![under-const](../imgs/under-const.gif) \ No newline at end of file +## Overview + +- [Apache Spark Challenges](#apache-spark-challenges) + - [Overview](#overview) + - [Task 1 - Classic Distributed Problem: Token Counting](#task-1---classic-distributed-problem-token-counting) + - [Task 2 - Cluster Set-up Bash Scripts](#task-2---cluster-set-up-bash-scripts) + - [Task 3 - Spark and Slurm](#task-3---spark-and-slurm) + - [Task 4 - Data Processing](#task-4---data-processing) + - [Task 5 - Spark Machine 
Learning](#task-5---spark-machine-learning) + +> Note: Tasks 1, 2, and 3 closely resemble a **typical workflow** when working with Apache Spark: +> - **Step 1**: Interactively work with a small sample of the problem +> - **Step 2**: Solve and optimize the sample problem +> - **Step 3**: Submit the entire larger problem as a batch job +> - **Step 4**: Analyze the result and, if necessary, repeat steps 1 to 4 +> +> You should employ this workflow in task 4 and task 5 + +## Task 1 - Classic Distributed Problem: Token Counting + +Given a string of tokens, count the number of times each token appears. You should do this task in an interactive JupyterLab notebook connected to a Spark cluster. This is a canonical problem in distributed data processing, and it often serves as an example of the [MapReduce Programming Model](https://en.wikipedia.org/wiki/MapReduce). + +> Hint: Have a look at [map()](https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.RDD.map.html) and [reduceByKey()](https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.RDD.reduceByKey.html) + +## Task 2 - Cluster Set-up Bash Scripts + +Write Bash scripts to streamline the process of installing Spark and running the cluster. +> Hint: Try to combine the steps from the [subchapter: set up](./set-up.md) + +## Task 3 - Spark and Slurm + +Submit [task 1](#task-1---classic-distributed-problem-token-counting) as a Spark job using Slurm. This should be similar to [subchapter: job batching](./job-batching.md). +> Hint: +> - You will need to convert the notebook into a Python file. +> - Compare the content of `$SPARK_HOME/examples/src/main/python/pi.py` and [our Monte Carlo Pi Estimation](./internals.md#monte-carlo-pi-estimation). They both solve the same problem; however, there are things we don't need to add when directly using `spark-submit`. Why? + +## Task 4 - Data Processing + +In this task, we will start working with a dataframe and try to process a given real-world dataset. + +> The dataset, at around 100MB, is considered small and not well-suited for Spark utilization (opting for Pandas might be more efficient). Nevertheless, working with this dataset serves as an exercise to understand more about Spark concepts and its capabilities. + +## Task 5 - Spark Machine Learning + +We will use the data from task 4 to build an intro-to-Machine-Learning model, [Linear Regression](https://en.wikipedia.org/wiki/Linear_regression), with [MLlib](https://spark.apache.org/mllib/). diff --git a/src/chapter7/chapter7.md b/src/chapter7/chapter7.md index 25d3de6..f83c84d 100644 --- a/src/chapter7/chapter7.md +++ b/src/chapter7/chapter7.md @@ -1,3 +1,16 @@ -# Parallellisation of Algorithms +# Apache Spark -![under-const](../imgs/under-const.gif) \ No newline at end of file +Apache Spark is an open-source, distributed computing system that has gained immense popularity for its speed, ease of use, and versatility in handling large-scale data processing tasks. Developed to overcome the limitations of the MapReduce paradigm, Spark offers a unified platform for various data processing workloads, including batch processing, real-time data streaming, machine learning, and graph processing. + +Spark provides high-level APIs in languages like Scala, Java, Python, and R, making it accessible to a wide range of developers with different programming backgrounds. + +In this chapter, we will: +- Set up a mini Spark cluster in M3. +- Take a closer look at the internal data structure, specifically Resilient Distributed Datasets (RDDs).
+- Explore data processing in Spark and JupyterLab. +- Submit batch jobs utilizing both Slurm and Spark. +- Engage in some challenges. + +> Notes: +> - The material covered in this chapter draws heavily from the [official documentation of Spark 3.5.0](https://spark.apache.org/docs/latest/index.html). +> - Contents of [Setting up a Spark Cluster within M3 via Slurm](./set-up.md#setting-up-a-spark-cluster-within-m3-cluster) and [Submit Spark Job inside Slurm Job](./job-batching.md#job-batching) are both derived from a trial-and-error approach and don't adhere to any official documentation. Consequently, they may not reflect best practice. Thus, if you've discovered alternative methods, more effective approaches, or maybe even security vulnerabilities, please don't hesitate to submit a pull request. diff --git a/src/chapter8/data-processing.md b/src/chapter7/data-processing.md similarity index 100% rename from src/chapter8/data-processing.md rename to src/chapter7/data-processing.md diff --git a/src/chapter8/imgs/jupyterlab.png b/src/chapter7/imgs/jupyterlab.png similarity index 100% rename from src/chapter8/imgs/jupyterlab.png rename to src/chapter7/imgs/jupyterlab.png diff --git a/src/chapter8/imgs/spark-architecture.png b/src/chapter7/imgs/spark-architecture.png similarity index 100% rename from src/chapter8/imgs/spark-architecture.png rename to src/chapter7/imgs/spark-architecture.png diff --git a/src/chapter8/imgs/spark-cluster-overview.png b/src/chapter7/imgs/spark-cluster-overview.png similarity index 100% rename from src/chapter8/imgs/spark-cluster-overview.png rename to src/chapter7/imgs/spark-cluster-overview.png diff --git a/src/chapter8/imgs/spark-sql.png b/src/chapter7/imgs/spark-sql.png similarity index 100% rename from src/chapter8/imgs/spark-sql.png rename to src/chapter7/imgs/spark-sql.png diff --git a/src/chapter8/internals.md b/src/chapter7/internals.md similarity index 100% rename from src/chapter8/internals.md rename to src/chapter7/internals.md diff --git a/src/chapter8/job-batching.md b/src/chapter7/job-batching.md similarity index 100% rename from src/chapter8/job-batching.md rename to src/chapter7/job-batching.md diff --git a/src/chapter8/set-up.md b/src/chapter7/set-up.md similarity index 100% rename from src/chapter8/set-up.md rename to src/chapter7/set-up.md diff --git a/src/chapter8/challenges.md b/src/chapter8/challenges.md deleted file mode 100644 index 169d112..0000000 --- a/src/chapter8/challenges.md +++ /dev/null @@ -1,47 +0,0 @@ -# Apache Spark Challenges - -## Overview - -- [Apache Spark Challenges](#apache-spark-challenges) - - [Overview](#overview) - - [Task 1 - Classic Distributed Problem: Token Counting](#task-1---classic-distributed-problem-token-counting) - - [Task 2 - Cluster Set-up Bash Scripts](#task-2---cluster-set-up-bash-scripts) - - [Task 3 - Spark and Slurm](#task-3---spark-and-slurm) - - [Task 4 - Data Processing](#task-4---data-processing) - - [Task 5 - Spark Machine Learning](#task-5---spark-machine-learning) - -> Note: Tasks 1, 2, and 3 closely resemble a **typical workflow** when working with Apache Spark: -> - **Step 1**: Interactively work with a small sample of the problem -> - **Step 2**: Solve and optimize the sample problem -> - **Step 3**: Submit the entire larger problem as a batch job -> - **Step 4**: Analyze the result and, if necessary, repeat steps 1 to 4 -> -> You should employ this workflow into task 4 and task 5 - -## Task 1 - Classic Distributed Problem: Token Counting - -Given a string of
tokens, count the number of times each token apprears. You should do this task in an interactive JupyterLab notebook connecting to a Spark cluster. This is a cananical problem of distributed data processing, and often served as an example for [MapReduce Programming Model](https://en.wikipedia.org/wiki/MapReduce). - -> Hint: Have a look at [map()](https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.RDD.map.html) and [reduceByKey()](https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.RDD.reduceByKey.html) - -## Task 2 - Cluster Set-up Bash Scripts - -Write Bash Scripts to streamline the process of installing Spark and running the cluster. -> Hint: Try to combine the [subchapter: set up](./set-up.md) - -## Task 3 - Spark and Slurm - -Submit [task 1](#task-1---calculate-pi-using-monte-carlo-algorithm-again) as a Spark job using Slurm. This should be similar to [subchapter: job batching](./job-batching.md) -> Hint: -> - You will need to convert the notebook into a Python file. -> - Compare the content of `$SPARK_HOME/examples/src/main/python/pi.py` and [our Monte Carlo Pi Estimation](./internals.md#monte-carlo-pi-estimation). They both solve the same problem, however, there are stuffs that we don't need to add when directly using `spark-submit`. Why? - -## Task 4 - Data Processing - -In this task, we will start working witha dataframe and try to process a given real-world dataset. - -> The dataset, at around ~100MB, is considered small and not well-suited for Spark utilization (opting for Pandas might be more efficient). Nevertheless, working with this dataset serves as an exercise to understand more about Spark concepts and its capabilities. - -## Task 5 - Spark Machine Learning - -We will use the data from task 4 to build an intro-to-Machine-Learning model, [Linear Regression](https://en.wikipedia.org/wiki/Linear_regression), with [MLlib](https://spark.apache.org/mllib/) diff --git a/src/chapter8/chapter8.md b/src/chapter8/chapter8.md deleted file mode 100644 index f83c84d..0000000 --- a/src/chapter8/chapter8.md +++ /dev/null @@ -1,16 +0,0 @@ -# Apache Spark - -Apache Spark is an open-source, distributed computing system that has gained immense popularity for its speed, ease of use, and versatility in handling large-scale data processing tasks. Developed to overcome the limitations of the MapReduce paradigm, Spark offers a unified platform for various data processing workloads, including batch processing, real-time data streaming, machine learning, and graph processing. - -Spark provides high-level APIs in languages like Scala, Java, Python, and R, making it accessible to a wide range of developers with different programming backgrounds. - -In this chapter, we will: -- Set up a mini Spark cluster in M3. -- Take a closer look at the internal data structure, specifically Resilient Distributed Datasets (RDDs). -- Explore data processing in Spark and JupyterLab. -- Submit batch jobs utilizing both Slurm and Spark. -- Engage in some challenges. - -> Notes: -> - The material covered in this chapter draws heavily from the [official documentation of Spark 3.5.0](https://spark.apache.org/docs/latest/index.html). -> - Contents of [Setting up a Spark Cluster within M3 via Slurm](./set-up.md#setting-up-a-spark-cluster-within-m3-cluster) and [Submit Spark Job inside Slurm Job](./job-batching.md#job-batching) are both derived from a trial-and-error approach, and doesn't adhere to any official documentation. 
Consequently, there is a likelihood that it may not be the best practice. Thus, if you've discovered alternative methods or more effective approaches or may be even security vulnerabilities, please don't hesitate to submit a pull request.