@@ -8,10 +8,10 @@ What causes the page faults registered by the child after the fifth step?

+ The child writes data to the frames it previously shared with its parent and the copy-on-write mechanism copies and remaps them before writing said data

- Demand paging propagates the lazy allocation of pages from the parent to the child
- Demand paging lazily allocates pages but does not cause these page faults in copy-on-write.

- Creating the child process inherently duplicates some frames

- They are caused by the loader forking itself when creating the child process
- They are caused by the `loader` forking itself when creating the child process

- They are caused by the `bash` process forking itself when creating the child process
20 changes: 10 additions & 10 deletions chapters/compute/copy-on-write/drills/tasks/page-faults/README.md
@@ -1,5 +1,6 @@
# Minor and Major Page Faults

Enter the `page-faults/support` directory in the lab archive (or `chapters/compute/copy-on-write/drills/tasks/page-faults/support/page_faults.c` if you are working directly in the repository).
The code in `chapters/compute/copy-on-write/drills/tasks/page-faults/support/page_faults.c` generates some minor and major page faults.
Open 2 terminals: one in which you will run the program, and one which will monitor the page faults of the program.
In the monitoring terminal, run the following command:
@@ -9,20 +10,19 @@ watch -n 1 'ps -eo min_flt,maj_flt,cmd | grep ./page_faults | head -n 1'
```

Compile the program and run it in the other terminal.
You must press `enter` one time, before the program will prompt you to press `enter` more times.
Watch the first number on the monitoring terminal;
it increases.
Those are the minor page faults.
You must press `Enter` once before the program prompts you to press it more times.
Watch the first number on the monitoring terminal (**it will increase**).
These are the minor page faults.

## Minor Page Faults

A minor page fault is generated whenever a requested page is present in the physical memory, as a frame, but that frame isn't allocated to the process generating the request.
A minor page fault occurs whenever a requested page is present in physical memory as a frame but has not yet been allocated to the process requesting it.
These types of page faults are the most common, and they happen when calling functions from dynamic libraries, allocating heap memory, loading programs, reading files that have been cached, and many more situations.
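If you want a self-contained way to observe minor faults outside the lab program, the sketch below (an illustrative example, not the lab's `page_faults.c`; the buffer size and names are made up) counts them with `getrusage()` while touching freshly allocated memory:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

static long minor_faults(void)
{
    struct rusage usage;

    getrusage(RUSAGE_SELF, &usage);
    return usage.ru_minflt;
}

int main(void)
{
    size_t size = 8 * 1024 * 1024;  /* 8 MiB, i.e. a couple of thousand pages */
    char *buf = malloc(size);
    long before, after;

    if (!buf)
        return 1;

    before = minor_faults();
    memset(buf, 1, size);           /* the first write to each page triggers a minor fault */
    after = minor_faults();

    printf("minor faults caused by touching the buffer: %ld\n", after - before);

    free(buf);
    return 0;
}
```

Allocating the buffer alone barely moves the counter; it is the first write to each page that generates the faults.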
Now back to the program.

The monitoring command already starts with some minor page faults, generated when loading the program.

After pressing `enter`, the number increases, because a function from a dynamic library (libc) is fetched when the first `printf()` is executed.
After pressing `Enter`, the number increases, because a function from a dynamic library (libc) is fetched when the first `printf()` is executed.
Subsequent calls to functions that are in the same memory page as `printf()` won't generate other page faults.

After allocating the 100 Bytes, you might not see the number of page faults increase.
@@ -32,7 +32,7 @@ Notice that not all the pages for the 1GB are allocated.
They are allocated - and generate page faults - when modified.
By now you should know that this mechanism is called [copy-on-write](../../copy-on-write/reading/copy-on-write.md).

Continue with pressing `enter` and observing the effects util you reach opening `file.txt`.
Continue pressing `Enter` and observing the effects until you reach the step that opens `file.txt`.

Note that neither opening a file, getting information about it, nor mapping it in memory using `mmap()` generates page faults.
Also note the `posix_fadvise()` call after the one to `fstat()`.
@@ -46,11 +46,11 @@ These types of page faults happen in 2 situations:
- a page that was swapped out (to the disk), due to lack of memory, is now accessed - this case is harder to show
- the OS needs to read a file from the disk, because the file contents aren't present in the cache - the case we are showing now

Press `enter` to print the file contents.
Note the second number go up in the monitoring terminal.
Press `Enter` to print the file contents.
Note that the second number goes up in the monitoring terminal.

Comment the `posix_fadvise()` call, recompile the program, and run it again.
You won't get any major page faults, because the file contents are cached by the OS to avoid them.
As a rule, the OS will avoid major page faults whenever possible, because they are very costly in terms of running time.
As a rule, the OS avoids major page faults whenever possible because they are very costly in terms of running time.
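If you want to reproduce a major fault in isolation, a minimal sketch (illustrative only, not the lab's code; the `file.txt` name simply mirrors the one used above) is to drop the file from the page cache with `posix_fadvise()` and then access it through a mapping, counting major faults with `getrusage()`:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("file.txt", O_RDONLY);    /* any existing, non-empty file */
    struct stat st;
    struct rusage before, after;

    if (fd < 0)
        return 1;
    fstat(fd, &st);

    /* Ask the kernel to drop this file's cached pages, as the lab program does. */
    posix_fadvise(fd, 0, st.st_size, POSIX_FADV_DONTNEED);

    char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED)
        return 1;

    getrusage(RUSAGE_SELF, &before);
    volatile char first = data[0];          /* first access reads from disk -> major fault
                                               (assuming the page is no longer cached) */
    (void)first;
    getrusage(RUSAGE_SELF, &after);

    printf("major faults: %ld\n", after.ru_majflt - before.ru_majflt);

    munmap(data, st.st_size);
    close(fd);
    return 0;
}
```

If the `posix_fadvise()` line is commented out, the access is normally served from the page cache and the major fault count should stay at 0, matching what you observe with the lab program.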

If you're having difficulties solving this exercise, go through [this](../../../guides/fork-faults.md) reading material.
@@ -1,6 +1,6 @@
# Shared Memory

Navigate to the `chapters/compute/copy-on-write/drills/tasks/shared-memory/` directory, run `make skels` to generate the `support/` folder, enter the `support/src/` folder, open `shared_memory.c` and go through the practice items below.
Enter the `shared-memory/` directory in the lab archive (or `chapters/compute/copy-on-write/drills/tasks/shared-memory/` if you are working directly in the repository), run `make skels`, then enter `support/src`, open `shared_memory.c` and go through the practice items below.

Use the `support/tests/checker.sh` script to check your solution.

16 changes: 8 additions & 8 deletions chapters/compute/copy-on-write/reading/copy-on-write.md
@@ -1,20 +1,20 @@
# Copy-on-Write

So far, you know that the parent and child process have separate virtual address spaces.
But how are they created, namely how are they "separated"?
And what about the **PAS (physical address space)**?
So far, you know that the parent and child processes have separate virtual address spaces.
But how are they created, and more specifically, how are they "separated"?
What about the **PAS (physical address space)**?
Of course, we would like the stack of the parent, for example, to be physically distinct from that of the child, so they can execute different functions and use different local variables.

But should **all** memory sections from the PAS of the parent be distinct from that of the child?
What about some read-only memory sections, such as `.text` and `.rodata`?
And what about the heap, where the child _may_ use some data previously written by the parent and then override it with its own data.
But should **all** memory sections from the **PAS** of the parent be distinct from that of the child?
What about read-only memory sections, such as `.text` and `.rodata`?
What about the heap, where the child may use data previously written by the parent and then override it with its own data?

The answer to all of these questions is a core mechanism of multiprocess operating systems called **Copy-on-Write**.
It works according to one very simple principle:
> The VAS of the child process initially points to the same PAS as that of the parent.
> A (physical) frame is only duplicated by the child when it attempts to **write** data to it.

This ensures that read-only sections remain shared, while writable sections are shared as long as their contents remain unchanged.
This ensures that read-only sections remain shared, while writable sections remain shared until their contents are modified.
When changes happen, the process making the change receives a unique frame as a modified copy of the original frame _on demand_.
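The sketch below (a made-up example, not part of the lab; exact fault counts vary between systems) shows the principle: after `fork()`, the child's first write to a buffer inherited from the parent forces the kernel to duplicate the shared frames, which shows up as minor page faults in the child:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    size_t size = 4 * 1024 * 1024;
    char *buf = malloc(size);
    struct rusage before, after;

    if (!buf)
        return 1;

    memset(buf, 'p', size);        /* the parent touches every page, so frames are mapped */

    if (fork() == 0) {
        /* The child starts out sharing the parent's frames (copy-on-write). */
        getrusage(RUSAGE_SELF, &before);
        memset(buf, 'c', size);    /* writing makes the kernel copy and remap the frames */
        getrusage(RUSAGE_SELF, &after);
        printf("child minor faults caused by the write: %ld\n",
               after.ru_minflt - before.ru_minflt);
        exit(0);
    }

    wait(NULL);
    free(buf);
    return 0;
}
```

Reading the buffer in the child would not trigger these faults; only the write does.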

In the image below, we have the state of the child and parent processes right after `fork()` returns in both of them.
@@ -32,5 +32,5 @@ For a real-world example of **Copy-on-Write** in action, take a look at [this br
**Be careful!**
Do not confuse **copy-on-write** with **demand paging**.
Remember from the [Data chapter](reading/working-with-memory.md) that **demand paging** means that when you allocate memory, the OS allocates virtual memory that remains unmapped to physical memory until it's used.
On the other hand, **copy-on-write** posits that the virtual memory is already mapped to some frames.
In contrast, **copy-on-write** assumes that virtual memory is already mapped to physical frames.
These frames are only duplicated when one of the processes attempts to write data to them.
1 change: 1 addition & 0 deletions chapters/compute/overview/reading/lab7.md
@@ -0,0 +1 @@
The contents of the lab are located in the [lab archive](https://github.com/cs-pub-ro/operating-systems/raw/refs/heads/lab-archives/Lab_7_Copy_on_Write.zip) and in the [GitHub repository](https://github.com/cs-pub-ro/operating-systems).
@@ -1,6 +1,7 @@
# Investigate `apache2` Using `strace`

Enter the `chapters/compute/processes-threads-apache2/drills/tasks/apache2/support/` folder and go through the practice items below.
Enter the `apache2/` directory in the lab archive (or `chapters/compute/processes-threads-apache2/drills/tasks/apache2/` if you are working directly in the repository), run `make skels`, then enter `support/`.
Go through the practice items below.

1. Use `make run` to start the container.
Use `strace` inside the container to discover the server document root.
@@ -1,26 +1,26 @@
# Usage of Processes and Threads in `apache2`

We'll take a look at how a real-world application - the `apache2` HTTP server - makes use of processes and threads.
Since the server must be able to handle multiple clients at the same time, it must therefore use some form of concurrency.
Since the server must handle multiple clients simultaneously, it therefore needs to use some form of concurrency.
When a new client arrives, the server offloads the work of interacting with that client to another process or thread.
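As a rough sketch of this offloading in C (a fork-per-connection toy server, not `apache2` code; the real `prefork` module creates its worker processes ahead of time, and the port below is made up):

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

static void handle_client(int client_fd)
{
    /* A real server would parse the HTTP request here; we just answer and leave. */
    const char *reply = "HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nhi";

    write(client_fd, reply, strlen(reply));
}

int main(void)
{
    int server_fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { 0 };

    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);            /* made-up port */

    bind(server_fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(server_fd, 16);

    while (1) {
        int client_fd = accept(server_fd, NULL, NULL);

        if (fork() == 0) {                  /* each connection is served by its own process */
            close(server_fd);
            handle_client(client_fd);
            close(client_fd);
            _exit(0);
        }
        close(client_fd);                   /* the parent only accepts connections */
        while (waitpid(-1, NULL, WNOHANG) > 0)
            ;                               /* reap any finished children */
    }
}
```

The `worker` and `event` models keep the same accept-and-offload structure, but hand each connection to a thread inside a pool of processes instead of a dedicated process.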

The choice of whether to use multiple processes or threads is not baked into the code.
The choice of using multiple processes or threads is not baked into the code.
Instead, `apache2` provides a couple of modules called MPMs (Multi-Processing Modules).
Each module implements a different concurrency model, and the users can pick whatever module best fits their needs by editing the server configuration files.

The most common MPMs are

- `prefork`: there are multiple worker processes, each process is single-threaded and handles one client request at a time
- `worker`: there are multiple worker processes, each process is multi-threaded, and each thread handles one client request at a time
- `event`: same as `worker` but designed to better handle some particular use cases
- `prefork`: multiple worker processes, each single-threaded and handling one client request at a time.
- `worker`: multiple worker processes, each multi-threaded, with each thread handling one client request at a time.
- `event`: similar to `worker`, but designed to handle certain use cases more efficiently.

In principle, `prefork` provides more stability and backwards compatibility, but it has a bigger overhead.
In principle, `prefork` provides more stability and backward compatibility, but it has a bigger overhead.
On the other hand, `worker` and `event` are more scalable, and thus able to handle more simultaneous connections, due to the usage of threads.
On modern systems, `event` is almost always the default.

## Conclusion

So far, you've probably seen that spawning a process can "use" a different program (hence the path in the args of `system` or `Popen`), but some languages such as Python allow you to spawn a process that executes a function from the same script.
So far, you've probably seen that spawning a process can "use" a different program (hence the path in the args of `system` or `Popen`), but some languages, such as Python, allow you to spawn a process that executes a function from the same script.
A thread, however, can only start from a certain entry point **within the current address space**, as it is bound to the same process.
Concretely, a process is but a group of threads.
For this reason, when we talk about scheduling or synchronization, we talk about threads.
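A small illustrative comparison (simplified, made-up code, not from the lab): a thread's entry point is a function inside the current program, while a child process can `exec()` an entirely different one.

```c
#include <pthread.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static void *worker(void *arg)
{
    /* Runs in the same address space as main(); it sees the same globals and heap. */
    printf("thread running inside the same process\n");
    return NULL;
}

int main(void)
{
    pthread_t tid;

    /* A thread's entry point must be a function in this very program. */
    pthread_create(&tid, NULL, worker, NULL);
    pthread_join(tid, NULL);

    /* A process, by contrast, can be made to run a completely different program. */
    if (fork() == 0) {
        execlp("echo", "echo", "child running a different program", (char *)NULL);
        _exit(1);                  /* only reached if exec fails */
    }
    wait(NULL);
    return 0;
}
```

(Compile with `-pthread`.)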
3 changes: 2 additions & 1 deletion chapters/compute/processes/drills/tasks/mini-shell/README.md
@@ -3,7 +3,8 @@
As you might remember, to create a new process you need to use `fork` (or `clone`) and `exec` system calls.
If you don't, take a look at [what happens under the hood when you use `system`](../../../guides/system-dissected.md).
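As a reminder of the pattern before you start, here is a bare-bones `fork` + `exec` + `wait` sketch (a hypothetical example with a hard-coded command, not the solution the checker expects):

```c
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    char *argv[] = { "ls", "-l", NULL };   /* made-up command, just as an example */

    pid_t pid = fork();
    if (pid == 0) {
        execvp(argv[0], argv);             /* replace the child's image with the new program */
        perror("execvp");                  /* only reached if exec fails */
        _exit(1);
    }

    int status;
    waitpid(pid, &status, 0);              /* the parent waits for the command to finish */
    printf("child exited with status %d\n", WEXITSTATUS(status));
    return 0;
}
```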

Enter the `chapters/compute/processes/drills/tasks/mini-shell` directory, run `make skels`, open the `support/src` folder and go through the practice items below.
Enter the `mini-shell/` directory in the lab archive (or `chapters/compute/processes/drills/tasks/mini-shell` if you are working directly in the repository), run `make skels`, then enter `support/src`.
Go through the practice items below.

Use the `tests/checker.sh` script to check your solutions.

1 change: 1 addition & 0 deletions config.yaml
@@ -101,6 +101,7 @@ lab_structure:
- title: Lab 7 - Copy-on-Write
filename: lab7.md
content:
- reading/lab7.md
- tasks/page-faults.md
- tasks/mini-shell.md
- tasks/apache2.md