From e31ad0974e9c7b9ba5acf8f4df2bb6d075114c76 Mon Sep 17 00:00:00 2001
From: Alex Ocanoaia
Date: Mon, 27 Oct 2025 19:33:11 +0200
Subject: [PATCH] compute: Modify lab 7

- Fix linter errors
- Fix grammar errors
- Add archive link to lab 7

Signed-off-by: Alex Ocanoaia
---
 .../questions/child-faults-after-write.md | 4 ++--
 .../drills/tasks/page-faults/README.md | 20 ++++++++++----------
 .../drills/tasks/shared-memory/README.md | 2 +-
 .../copy-on-write/reading/copy-on-write.md | 16 ++++++++--------
 chapters/compute/overview/reading/lab7.md | 1 +
 .../drills/tasks/apache2/README.md | 3 ++-
 .../reading/processes-threads-apache2.md | 14 +++++++-------
 .../drills/tasks/mini-shell/README.md | 3 ++-
 config.yaml | 1 +
 9 files changed, 34 insertions(+), 30 deletions(-)
 create mode 100644 chapters/compute/overview/reading/lab7.md

diff --git a/chapters/compute/copy-on-write/drills/questions/child-faults-after-write.md b/chapters/compute/copy-on-write/drills/questions/child-faults-after-write.md
index a9ad81ad7c..c0c8b11887 100644
--- a/chapters/compute/copy-on-write/drills/questions/child-faults-after-write.md
+++ b/chapters/compute/copy-on-write/drills/questions/child-faults-after-write.md
@@ -8,10 +8,10 @@ What causes the page faults registered by the child after the fifth step?

 + The child writes data to the frames it previously shared with its parent and the copy-on-write mechanism copies and remaps them before writing said data

-- Demand paging propagates the lazy allocation of pages from the parent to the child
+- Demand paging lazily allocates pages but does not cause these page faults in copy-on-write

 - Creating the child process inherently duplicates some frames

-- They are caused by the loader forking itself when creating the child process
+- They are caused by the `loader` forking itself when creating the child process

 - They are caused by the `bash` process forking itself when creating the child process
diff --git a/chapters/compute/copy-on-write/drills/tasks/page-faults/README.md b/chapters/compute/copy-on-write/drills/tasks/page-faults/README.md
index 01341edf78..a543b0208e 100644
--- a/chapters/compute/copy-on-write/drills/tasks/page-faults/README.md
+++ b/chapters/compute/copy-on-write/drills/tasks/page-faults/README.md
@@ -1,5 +1,6 @@
 # Minor and Major Page Faults

+Enter the `page-faults/support` directory in the lab archive (or `chapters/compute/copy-on-write/drills/tasks/page-faults/support/` if you are working directly in the repository).
 The code in `chapters/compute/copy-on-write/drills/tasks/page-faults/support/page_faults.c` generates some minor and major page faults.
 Open 2 terminals: one in which you will run the program, and one which will monitor the page faults of the program.
 In the monitoring terminal, run the following command:
@@ -9,20 +10,19 @@ watch -n 1 'ps -eo min_flt,maj_flt,cmd | grep ./page_faults | head -n 1'
 ```

 Compile the program and run it in the other terminal.
-You must press `enter` one time, before the program will prompt you to press `enter` more times.
-Watch the first number on the monitoring terminal;
-it increases.
-Those are the minor page faults.
+You must press `Enter` once before the program prompts you to press it more times.
+Watch the first number on the monitoring terminal (**it will increase**).
+These are the minor page faults.

 ## Minor Page Faults

-A minor page fault is generated whenever a requested page is present in the physical memory, as a frame, but that frame isn't allocated to the process generating the request.
+A minor page fault occurs whenever a requested page is present in physical memory as a frame but has not yet been allocated to the process requesting it.

 These types of page faults are the most common, and they happen when calling functions from dynamic libraries, allocating heap memory, loading programs, reading files that have been cached, and many more situations.

 Now back to the program.
 The monitoring command already starts with some minor page faults, generated when loading the program.
-After pressing `enter`, the number increases, because a function from a dynamic library (libc) is fetched when the first `printf()` is executed.
+After pressing `Enter`, the number increases, because a function from a dynamic library (libc) is fetched when the first `printf()` is executed.
 Subsequent calls to functions that are in the same memory page as `printf()` won't generate other page faults.

 After allocating the 100 Bytes, you might not see the number of page faults increase.
@@ -32,7 +32,7 @@ Notice that not all the pages for the 1GB are allocated.
 They are allocated - and generate page faults - when modified.
 By now you should know that this mechanism is called [copy-on-write](../../copy-on-write/reading/copy-on-write.md).

-Continue with pressing `enter` and observing the effects util you reach opening `file.txt`.
+Continue pressing `Enter` and observing the effects until you reach the step that opens `file.txt`.

 Note that neither opening a file, getting information about it, nor mapping it in memory using `mmap()`, generate page faults.
 Also note the `posix_fadvise()` call after the one to `fstat()`.
@@ -46,11 +46,11 @@ These types of page faults happen in 2 situations:
 - a page that was swapped out (to the disk), due to lack of memory, is now accessed - this case is harder to show
 - the OS needs to read a file from the disk, because the file contents aren't present in the cache - the case we are showing now

-Press `enter` to print the file contents.
-Note the second number go up in the monitoring terminal.
+Press `Enter` to print the file contents.
+Note the second number goes up in the monitoring terminal.

 Comment the `posix_fadvise()` call, recompile the program, and run it again.
 You won't get any major page fault, because the file contents are cached by the OS, to avoid those page faults.
-As a rule, the OS will avoid major page faults whenever possible, because they are very costly in terms of running time.
+As a rule, the OS avoids major page faults whenever possible because they are very costly in terms of running time.

 If you're having difficulties solving this exercise, go through [this](../../../guides/fork-faults.md) reading material.
diff --git a/chapters/compute/copy-on-write/drills/tasks/shared-memory/README.md b/chapters/compute/copy-on-write/drills/tasks/shared-memory/README.md
index 698dda23ae..aa4b034d2f 100644
--- a/chapters/compute/copy-on-write/drills/tasks/shared-memory/README.md
+++ b/chapters/compute/copy-on-write/drills/tasks/shared-memory/README.md
@@ -1,6 +1,6 @@
 # Shared Memory

-Navigate to the `chapters/compute/copy-on-write/drills/tasks/shared-memory/` directory, run `make skels` to generate the `support/` folder, enter the `support/src/` folder, open `shared_memory.c` and go through the practice items below.
+Enter the `shared-memory/` directory in the lab archive (or `chapters/compute/copy-on-write/drills/tasks/shared-memory/` if you are working directly in the repository), run `make skels`, then enter `support/src`, open `shared_memory.c` and go through the practice items below.

 Use the `support/tests/checker.sh` script to check your solution.

diff --git a/chapters/compute/copy-on-write/reading/copy-on-write.md b/chapters/compute/copy-on-write/reading/copy-on-write.md
index a3088a773b..ffda25d6da 100644
--- a/chapters/compute/copy-on-write/reading/copy-on-write.md
+++ b/chapters/compute/copy-on-write/reading/copy-on-write.md
@@ -1,20 +1,20 @@
 # Copy-on-Write

-So far, you know that the parent and child process have separate virtual address spaces.
-But how are they created, namely how are they "separated"?
-And what about the **PAS (physical address space)**?
+So far, you know that the parent and child processes have separate virtual address spaces.
+But how are they created, and more specifically, how are they "separated"?
+What about the **PAS (physical address space)**?
 Of course, we would like the stack of the parent, for example, to be physically distinct from that of the child, so they can execute different functions and use different local variables.
-But should **all** memory sections from the PAS of the parent be distinct from that of the child?
-What about some read-only memory sections, such as `.text` and `.rodata`?
-And what about the heap, where the child _may_ use some data previously written by the parent and then override it with its own data.
+But should **all** memory sections from the **PAS** of the parent be distinct from those of the child?
+What about read-only memory sections, such as `.text` and `.rodata`?
+What about the heap, where the child may use data previously written by the parent and then overwrite it with its own data?

 The answer to all of these questions is a core mechanism of multiprocess operating systems called **Copy-on-Write**.
 It works according to one very simple principle:

 > The VAS of the child process initially points to the same PAS as that of the parent.
 > A (physical) frame is only duplicated by the child when it attempts to **write** data to it.

-This ensures that read-only sections remain shared, while writable sections are shared as long as their contents remain unchanged.
+This ensures that read-only sections remain shared, while writable sections remain shared until their contents are modified.
 When changes happen, the process making the change receives a unique frame as a modified copy of the original frame _on demand_.

 In the image below, we have the state of the child and parent processes right after `fork()` returns in both of them.
@@ -32,5 +32,5 @@ For a real-world example of **Copy-on-Write** in action, take a look at [this br

 **Be careful!** Do not confuse **copy-on-write** with **demand paging**.
 Remember from the [Data chapter](reading/working-with-memory.md) that **demand paging** means that when you allocate memory, the OS allocates virtual memory that remains unmapped to physical memory until it's used.
-On the other hand, **copy-on-write** posits that the virtual memory is already mapped to some frames.
+In contrast, **copy-on-write** assumes that virtual memory is already mapped to physical frames.
 These frames are only duplicated when one of the processes attempts to write data to them.
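The copy-on-write behaviour described in `copy-on-write.md` above can be observed with a short, self-contained C program along the lines of the sketch below. It is illustrative only, not part of this patch or of the lab skeletons; the 64 MiB buffer size and the `getrusage()`-based fault counting are assumptions of the example. After `fork()`, the child's minor-fault counter barely moves while it only reads the inherited buffer, and it jumps by roughly one fault per page once the child starts writing.

```c
/*
 * Illustrative sketch (not part of the patch or the lab skeletons):
 * observe copy-on-write after fork() by counting minor page faults.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SIZE (64 * 1024 * 1024) /* 64 MiB, an arbitrary choice */
#define PAGE 4096

static long minor_faults(void)
{
    struct rusage usage;

    getrusage(RUSAGE_SELF, &usage);
    return usage.ru_minflt;
}

int main(void)
{
    char *buf = malloc(SIZE);

    if (!buf) {
        perror("malloc");
        return EXIT_FAILURE;
    }

    /* Touch every page so the parent owns mapped frames before fork(). */
    memset(buf, 1, SIZE);

    pid_t pid = fork();

    if (pid < 0) {
        perror("fork");
        return EXIT_FAILURE;
    }

    if (pid == 0) {
        long start = minor_faults();
        volatile char sum = 0;

        /* Reads share the parent's frames: almost no new faults. */
        for (size_t i = 0; i < SIZE; i += PAGE)
            sum += buf[i];
        long after_reads = minor_faults();

        /* Writes trigger copy-on-write: roughly one fault per page. */
        memset(buf, 2, SIZE);
        long after_writes = minor_faults();

        printf("child: +%ld faults after reads, +%ld after writes\n",
               after_reads - start, after_writes - after_reads);
        exit(EXIT_SUCCESS);
    }

    wait(NULL);
    free(buf);
    return 0;
}
```

With the usual 4 KiB pages, the second delta should be on the order of `SIZE / 4096`, while the first stays close to zero.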
diff --git a/chapters/compute/overview/reading/lab7.md b/chapters/compute/overview/reading/lab7.md
new file mode 100644
index 0000000000..4aea0e834c
--- /dev/null
+++ b/chapters/compute/overview/reading/lab7.md
@@ -0,0 +1 @@
+The contents of the lab are located in the [lab archive](https://github.com/cs-pub-ro/operating-systems/raw/refs/heads/lab-archives/Lab_7_Copy_on_Write.zip) and in the [GitHub repository](https://github.com/cs-pub-ro/operating-systems).
diff --git a/chapters/compute/processes-threads-apache2/drills/tasks/apache2/README.md b/chapters/compute/processes-threads-apache2/drills/tasks/apache2/README.md
index 75d244dea8..8232c3c639 100644
--- a/chapters/compute/processes-threads-apache2/drills/tasks/apache2/README.md
+++ b/chapters/compute/processes-threads-apache2/drills/tasks/apache2/README.md
@@ -1,6 +1,7 @@
 # Investigate `apache2` Using `strace`

-Enter the `chapters/compute/processes-threads-apache2/drills/tasks/apache2/support/` folder and go through the practice items below.
+Enter the `apache2/` directory in the lab archive (or `chapters/compute/processes-threads-apache2/drills/tasks/apache2/` if you are working directly in the repository), run `make skels`, then enter `support/`.
+Go through the practice items below.

 1. Use `make run` to start the container.
    Use `strace` inside the container to discover the server document root.
diff --git a/chapters/compute/processes-threads-apache2/reading/processes-threads-apache2.md b/chapters/compute/processes-threads-apache2/reading/processes-threads-apache2.md
index acb5321c56..da4f73b15b 100644
--- a/chapters/compute/processes-threads-apache2/reading/processes-threads-apache2.md
+++ b/chapters/compute/processes-threads-apache2/reading/processes-threads-apache2.md
@@ -1,26 +1,26 @@
 # Usage of Processes and Threads in `apache2`

 We'll take a look at how a real-world application - the `apache2` HTTP server - makes use of processes and threads.

-Since the server must be able to handle multiple clients at the same time, it must therefore use some form of concurrency.
+Since the server must handle multiple clients simultaneously, it needs to use some form of concurrency.
 When a new client arrives, the server offloads the work of interacting with that client to another process or thread.

-The choice of whether to use multiple processes or threads is not baked into the code.
+The choice of using multiple processes or threads is not baked into the code.
 Instead, `apache2` provides a couple of modules called MPMs (Multi-Processing Modules).
 Each module implements a different concurrency model, and the users can pick whatever module best fits their needs by editing the server configuration files.
 The most common MPMs are

-- `prefork`: there are multiple worker processes, each process is single-threaded and handles one client request at a time
-- `worker`: there are multiple worker processes, each process is multi-threaded, and each thread handles one client request at a time
-- `event`: same as `worker` but designed to better handle some particular use cases
+- `prefork`: multiple worker processes, each single-threaded and handling one client request at a time.
+- `worker`: multiple worker processes, each multi-threaded, with each thread handling one client request at a time.
+- `event`: similar to `worker`, but designed to handle certain use cases more efficiently.

-In principle, `prefork` provides more stability and backwards compatibility, but it has a bigger overhead.
+In principle, `prefork` provides more stability and backward compatibility, but it has a higher overhead.
 On the other hand, `worker` and `event` are more scalable, and thus able to handle more simultaneous connections, due to the usage of threads.
 On modern systems, `event` is almost always the default.

 ## Conclusion

-So far, you've probably seen that spawning a process can "use" a different program (hence the path in the args of `system` or `Popen`), but some languages such as Python allow you to spawn a process that executes a function from the same script.
+So far, you've probably seen that spawning a process can "use" a different program (hence the path in the args of `system` or `Popen`), but some languages, such as Python, allow you to spawn a process that executes a function from the same script.
 A thread, however, can only start from a certain entry point **within the current address space**, as it is bound to the same process.
 Concretely, a process is but a group of threads.
 For this reason, when we talk about scheduling or synchronization, we talk about threads.
diff --git a/chapters/compute/processes/drills/tasks/mini-shell/README.md b/chapters/compute/processes/drills/tasks/mini-shell/README.md
index 10355b2664..a502275c85 100644
--- a/chapters/compute/processes/drills/tasks/mini-shell/README.md
+++ b/chapters/compute/processes/drills/tasks/mini-shell/README.md
@@ -3,7 +3,8 @@
 As you might remember, to create a new process you need to use `fork` (or `clone`) and `exec` system calls.
 If you don't, take a look at [what happens under the hood when you use `system`](../../../guides/system-dissected.md).

-Enter the `chapters/compute/processes/drills/tasks/mini-shell` directory, run `make skels`, open the `support/src` folder and go through the practice items below.
+Enter the `mini-shell/` directory in the lab archive (or `chapters/compute/processes/drills/tasks/mini-shell/` if you are working directly in the repository), run `make skels`, then enter `support/src`.
+Go through the practice items below.

 Use the `tests/checker.sh` script to check your solutions.

diff --git a/config.yaml b/config.yaml
index 5fc7d36f30..2730e3705a 100644
--- a/config.yaml
+++ b/config.yaml
@@ -101,6 +101,7 @@ lab_structure:
   - title: Lab 7 - Copy-on-Write
     filename: lab7.md
     content:
+      - reading/lab7.md
       - tasks/page-faults.md
       - tasks/mini-shell.md
       - tasks/apache2.md
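Since the mini-shell README above revolves around the `fork` + `exec` + `wait` sequence, here is a minimal C sketch of that sequence. It is illustrative only, not part of this patch or of the mini-shell skeleton; the hard-coded `ls -l` command and the use of `execvp()`/`waitpid()` are assumptions of the example.

```c
/*
 * Illustrative sketch (not part of the patch or the mini-shell skeleton):
 * the fork() + exec() + wait() sequence that the mini-shell task builds on.
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();

    if (pid < 0) {
        perror("fork");
        return EXIT_FAILURE;
    }

    if (pid == 0) {
        /* Child: replace the current image with the requested program. */
        char *argv[] = { "ls", "-l", NULL };

        execvp(argv[0], argv);
        perror("execvp"); /* reached only if exec fails */
        exit(EXIT_FAILURE);
    }

    /* Parent: wait for the child and report its status, as a shell would. */
    int status;

    waitpid(pid, &status, 0);
    if (WIFEXITED(status))
        printf("child exited with status %d\n", WEXITSTATUS(status));

    return 0;
}
```

Roughly speaking, the mini-shell task wraps this sequence in a loop that reads and parses a command line before each `fork()`.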