TryHackMe Walkthrough: CVE-2026-31431 (Copy-Fail)

Hibullahi AbdulAzeez

~6 min read · May 9, 2026

Introduction

Copy-Fail (CVE-2026-31431) is a Linux kernel Local Privilege Escalation vulnerability that lets any unprivileged local user get a root shell in seconds, using nothing but a 732-byte Python script with zero external dependencies.

What makes it scary? No race condition. No kernel offsets. No special tools. It just works.

What Is the Page Cache? (The Core Concept)

Before diving into the exploit, you need to understand one key Linux concept: the page cache.

When Linux reads a file from disk, it caches that file in memory (RAM). Every process on the system reads from that same cached copy — not the disk directly.

Here's the key insight:

If you corrupt the in-memory page cache copy of a file, every process that runs that file executes your corrupted version — but the on-disk file stays completely clean.

This means:

sha256sum /usr/bin/su → ✅ hash looks fine
AIDE, Tripwire, IMA → ✅ all report clean
What actually runs → ❌ your shellcode

File integrity tools read from disk. The exploit writes to memory. They never see each other.
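
To make the shared-cache idea concrete, here is a minimal sketch that opens two independent views of one file and shows that a change made through the first is immediately visible through the second. It uses a temporary file you own and an ordinary, permitted write through mmap, so it only illustrates the sharing behaviour, not the vulnerability. The function name shared_cache_demo is made up for this walkthrough, and the behaviour assumes Linux.

```python
import mmap
import os
import tempfile


def shared_cache_demo() -> bytes:
    """Modify a file through a shared mapping, then read it via a second descriptor."""
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"original")
        tmp.flush()

        # View 1: a shared, writable mapping backed by the file's cached pages.
        with mmap.mmap(tmp.fileno(), 0, access=mmap.ACCESS_WRITE) as view:
            view[:8] = b"modified"  # same length, so the file size is unchanged

            # View 2: a completely separate descriptor opened on the same path.
            fd = os.open(tmp.name, os.O_RDONLY)
            try:
                return os.pread(fd, 8, 0)
            finally:
                os.close(fd)


print(shared_cache_demo())
```

No fsync or msync is called between the write and the read. The second descriptor sees the new bytes because both views are served from the same in-memory pages.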

The Four Components That Create the Vulnerability

Copy-Fail isn't a bug in one place — it's what happens when four normal kernel components interact in an unexpected way:

1. The Page Cache

Shared memory holding cached file contents. Corrupting it affects every process without touching disk.

2. AF_ALG — The Kernel Crypto Socket

A Linux socket interface (socket family 38) that exposes the kernel's cryptographic operations to userspace. The critical detail: it requires zero special privileges. Any normal user can open one.
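
For a feel of how ordinary this interface is, the sketch below asks the kernel for a SHA-256 digest over AF_ALG and compares it with hashlib. This is the benign, documented use of the socket family, run as a normal user. The helper name kernel_sha256 is hypothetical, and the function returns None when AF_ALG is not available (non-Linux hosts, restricted sandboxes, or a kernel without the hash interface).

```python
import hashlib
import socket
from typing import Optional


def kernel_sha256(data: bytes) -> Optional[bytes]:
    """Hash `data` with the kernel crypto API, or return None if AF_ALG is unavailable."""
    af_alg = getattr(socket, "AF_ALG", None)  # only defined on Linux builds of Python
    if af_alg is None:
        return None
    try:
        with socket.socket(af_alg, socket.SOCK_SEQPACKET, 0) as alg:
            alg.bind(("hash", "sha256"))   # (type, algorithm name)
            op, _ = alg.accept()           # per-operation socket
            with op:
                op.sendall(data)
                return op.recv(32)         # SHA-256 digests are 32 bytes
    except OSError:
        return None


digest = kernel_sha256(b"copy-fail")
if digest is None:
    print("AF_ALG not available here")
else:
    print(digest == hashlib.sha256(b"copy-fail").digest())
```

Nothing in this snippet needs root or a capability, which is the point the section makes.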

3. authencesn — The Scratch Write

authencesn is a kernel AEAD template used for IPsec. During decryption, it writes 4 bytes into the output buffer before verifying the HMAC tag. If the HMAC check fails (and the exploit deliberately makes it fail), the write already happened — there's no rollback. The attacker controls both the value written and where it lands.
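
The "write first, verify second" ordering is easier to see in a toy model. The sketch below is plain userspace Python, not kernel code, and every name in it (decrypt_like, the buffer, the key) is invented for illustration. It only shows why a failed authentication check does not help if the side effect has already happened.

```python
import hashlib
import hmac


def decrypt_like(out: bytearray, scratch: bytes, data: bytes,
                 tag: bytes, key: bytes) -> bool:
    """Toy model: write 4 scratch bytes into `out`, then verify the tag."""
    out[:4] = scratch  # side effect happens before any check
    expected = hmac.new(key, data, hashlib.sha256).digest()
    # Returning False reports the failure but does not restore `out`.
    return hmac.compare_digest(expected, tag)


buf = bytearray(b"AAAAAAAA")
ok = decrypt_like(buf, b"BBBB", b"payload", b"\x00" * 32, b"key")
print(ok, bytes(buf))  # False b'BBBBAAAA'
```

A safer ordering would verify the tag first, or write to a private scratch buffer and copy out only on success.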

4. splice() — The Bridge

splice() moves data between file descriptors by transferring page references, not copies. When you splice /usr/bin/su into an AF_ALG socket, the crypto pipeline now holds a direct reference to the live page cache pages of that binary.
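
Here is a small, harmless sketch of splice() itself, moving bytes from a temporary file you own into a pipe without a userspace read() in between. It assumes Python 3.10+ on Linux, where os.splice exists. The helper name splice_file_to_pipe is hypothetical, and the function returns None where splice is missing or the filesystem refuses it.

```python
import os
import tempfile
from typing import Optional


def splice_file_to_pipe(payload: bytes) -> Optional[bytes]:
    """Splice `payload` from a temp file into a pipe and read it back out."""
    if not hasattr(os, "splice"):
        return None
    with tempfile.TemporaryFile() as src:
        src.write(payload)
        src.flush()
        read_end, write_end = os.pipe()
        try:
            # One side of splice() must be a pipe; offset_src=0 reads from the file start.
            moved = os.splice(src.fileno(), write_end, len(payload), offset_src=0)
            return os.read(read_end, moved)
        except OSError:
            return None
        finally:
            os.close(read_end)
            os.close(write_end)


print(splice_file_to_pipe(b"hello from the page cache"))
```

The Python side never touches the file contents between the file and the pipe. The kernel hands over references to the pages that already hold the data.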

The 2017 Optimisation That Broke Everything

In 2017 (kernel 4.14, commit 72548b093ee3), the AF_ALG AEAD code was optimised to use a single scatterlist instead of separate source and destination (req->src = req->dst).

For normal writes, fine. But for data arriving via splice() from a file? Those pages are the kernel's own live page cache. Setting output = input meant the output of the crypto operation now pointed directly at the file's page cache pages.

Result: the authencesn scratch write now writes shellcode directly into the kernel's page cache. Nine years passed before anyone noticed.
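
The aliasing problem can be modelled in a few lines. This is a userspace toy, not the kernel's scatterlist code, and the names scratch_step and run are made up. It shows the same operation being harmless with a separate destination and destructive once the destination is the source.

```python
MARKER = b"\xde\xad\xbe\xef"


def scratch_step(src: bytearray, dst: bytearray) -> None:
    """Toy stand-in for a crypto step that writes 4 bytes into its output buffer."""
    dst[:4] = MARKER


def run(in_place: bool) -> bytes:
    """Return the source buffer's contents after one step."""
    source = bytearray(b"cached file bytes")
    # Out of place: dst is a private copy. In place: dst IS the source object.
    dest = source if in_place else bytearray(source)
    scratch_step(source, dest)
    return bytes(source)


print(run(in_place=False))  # source untouched
print(run(in_place=True))   # source now starts with the marker bytes
```

When the source buffer is private memory, in-place operation is a fine optimisation. The trouble starts when the source is something shared that the caller was never meant to write.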

Comparison: Dirty Pipe vs Copy-Fail

| Property | Dirty Pipe (CVE-2022-0847) | Copy-Fail (CVE-2026-31431) |
| --- | --- | --- |
| Kernel subsystem | pipe / splice | AF_ALG crypto / splice |
| Write primitive | Arbitrary via pipe flag bug | Controlled 4-byte via authencesn |
| Race condition required | No | No |
| Kernel offsets needed | No | No |
| On-disk file changed | No | No |
| File integrity bypass | Yes | Yes |

The "Race condition required" row deserves attention. Like Dirty Pipe, which was also race-free, Copy-Fail is deterministic: it either lands cleanly or returns an error with no side effects.

Task 2: How the Exploit Works

The exploit repeats a short loop about 40 times, then executes the target binary:

1. Opens an AF_ALG socket bound to authencesn(hmac(sha256),cbc(aes))
2. Calculates the target offset in /usr/bin/su
3. Constructs the message so seqno_lo carries the shellcode bytes to write
4. Calls splice() to feed page cache pages from /usr/bin/su into the socket
5. Calls recvmsg() → triggers the authencesn scratch write → 4 bytes of shellcode land in the page cache
6. Repeats steps 1 to 5 until ~40 chunks of shellcode are written
7. Calls execve("/usr/bin/su") → kernel loads from corrupted cache → shellcode runs → root shell
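
The "~40 times" figure is just ceiling division: each pass writes 4 bytes, so the loop count is the payload length divided by 4, rounded up. The sketch below shows that arithmetic only. The 160-byte figure is an assumption chosen to match the ~40 iterations mentioned in the room, not a number taken from the script.

```python
WRITE_SIZE = 4  # bytes landed per scratch write, per the description in this task


def iterations_needed(payload_len: int, write_size: int = WRITE_SIZE) -> int:
    """Number of fixed-size writes needed to cover `payload_len` bytes."""
    if payload_len < 0 or write_size <= 0:
        raise ValueError("payload_len must be >= 0 and write_size must be > 0")
    return -(-payload_len // write_size)  # ceiling division with integers


print(iterations_needed(160))  # 40
print(iterations_needed(158))  # 40 (the last write is only partly needed)
```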

Answers for Task 2:

What year was the optimisation introduced? → 2017
Which AEAD algorithm template performs the scratch write? → authencesn
What system call transfers page cache pages without copying? → splice
Does the HMAC failure undo the page cache corruption? → Nay

Task 3: Running the Exploit

Step 1 — Confirm You're Unprivileged

id

Expected output:

uid=1001(karen) gid=1001(karen) groups=1001(karen)

Step 2 — Inspect the Script

head -30 /home/karen/exploit.py

Pure Python standard library — os, socket, zlib. No pip. No compiled code. Runs on any Python 3.10+.

Step 3 — Run the Exploit

python3 /home/karen/exploit.py

Watch it write 4 bytes at a time. After ~40 iterations, you get dropped into a root shell.

# id
uid=0(root) gid=0(root) groups=0(root)

Step 4 — Read the Flag

cat /root/flag.txt

Flag: THM{copy_fail_kernel_lpe}

Step 5 — The sha256sum Moment

sha256sum /usr/bin/su

The hash matches a clean, unmodified binary. AIDE and Tripwire would report nothing wrong. Yet you just got root through that exact binary. This is the point of the room — file integrity tools are blind to page cache attacks.

Step 6 — Exit and Cleanup

exit

The exploit calls posix_fadvise(POSIX_FADV_DONTNEED) on cleanup, which evicts the corrupted pages. The original clean binary reloads from disk. By the time you exit, there's nothing to find.
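
If you have not met posix_fadvise before, the sketch below shows the same call on a temporary file you own. It is only a hint to the kernel that the cached pages for a file are no longer needed, and it leaves the on-disk contents alone. The helper name drop_cached_pages is hypothetical, and the call is assumed to be available as os.posix_fadvise (Unix builds of Python).

```python
import os
import tempfile


def drop_cached_pages(path: str) -> bool:
    """Ask the kernel to drop cached pages for `path`. Returns False if unsupported."""
    if not hasattr(os, "posix_fadvise"):
        return False
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0, length=0 means "the whole file"; the call is advisory only.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)


with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(b"still on disk")
    tmp.flush()
    print(drop_cached_pages(tmp.name))
    with open(tmp.name, "rb") as check:
        print(check.read())  # contents are unchanged by the hint
```

From a defender's point of view, this is why the call is listed as a signal in Task 4: it is rarely interesting on its own, but it stands out when it targets a setuid binary.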

Task 4: Detection and Mitigation

How Do You Detect It?

Since the disk is never touched and the page cache is clean after exploitation, filesystem monitoring is useless. Detection must happen during the attack, by watching process behaviour.

Key signals to monitor:

| Syscall | What to Watch For |
| --- | --- |
| socket(AF_ALG, ...) | Any unexpected process; fires before corruption |
| splice() | Splice from a setuid binary FD into a socket FD |
| recvmsg() on AF_ALG fd | ~40 calls in seconds from same PID |
| posix_fadvise(DONTNEED) | Called on a setuid binary after AF_ALG activity |

Legitimate processes that use AF_ALG AEAD sockets (your allowlist): cryptsetup, systemd-cryptsetup, kcapi-enc, kcapi-dgst, kcapi-mac, kcapi-speed, charon, charon-systemd

Anything outside that list creating 40+ AF_ALG sockets in seconds is almost certainly running this exploit.
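
The allowlist-plus-burst idea can be sketched as a small detector. This is a rough illustration, not a production rule: the SocketEvent shape, the suspicious_pids name, and the 10-second window are all assumptions, and the event stream is presumed to come from whatever telemetry you already collect (auditd, eBPF, Falco).

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Iterable

# Allowlist taken from the room's list of legitimate AF_ALG AEAD users.
ALLOWLIST = {
    "cryptsetup", "systemd-cryptsetup", "kcapi-enc", "kcapi-dgst",
    "kcapi-mac", "kcapi-speed", "charon", "charon-systemd",
}


@dataclass(frozen=True)
class SocketEvent:
    """One observed socket(AF_ALG, ...) call (hypothetical event shape)."""
    timestamp: float  # seconds
    pid: int
    comm: str         # process name


def suspicious_pids(events: Iterable[SocketEvent],
                    threshold: int = 40, window: float = 10.0) -> set:
    """Flag non-allowlisted PIDs with >= threshold AF_ALG sockets inside `window` seconds."""
    recent = defaultdict(deque)
    flagged = set()
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.comm in ALLOWLIST:
            continue
        times = recent[ev.pid]
        times.append(ev.timestamp)
        while ev.timestamp - times[0] > window:  # drop events older than the window
            times.popleft()
        if len(times) >= threshold:
            flagged.add(ev.pid)
    return flagged


burst = [SocketEvent(100.0 + i * 0.1, 1234, "python3") for i in range(40)]
print(suspicious_pids(burst))  # {1234}
```

In practice you would probably alert on the first unexpected AF_ALG socket as well, since that fires before any corruption, and treat the burst count as a higher-confidence signal.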

auditd rules:

-a always,exit -F arch=b64 -S socket -F a0=38 -k copy_fail_af_alg
-a always,exit -F arch=b64 -S splice -k copy_fail_splice
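
Once those rules are loaded, matching events show up in the audit log tagged with the rule key. The sketch below parses one such record and checks that it really is an AF_ALG socket call. The sample line is hand-written for illustration (not captured from a real host), and the helper names are hypothetical. One detail worth knowing: auditd logs syscall arguments in hex, so family 38 appears as a0=26.

```python
import re

# key=value pairs; values are either "quoted" or a run of non-space characters.
FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')

SAMPLE = (
    'type=SYSCALL msg=audit(1778312345.678:901): arch=c000003e syscall=41 '
    'success=yes exit=3 a0=26 a1=5 a2=0 a3=0 items=0 ppid=2210 pid=2245 '
    'auid=1001 uid=1001 gid=1001 comm="python3" exe="/usr/bin/python3" '
    'key="copy_fail_af_alg"'
)


def parse_audit_line(line: str) -> dict:
    """Split an audit record into a dict of fields, stripping surrounding quotes."""
    return {key: value.strip('"') for key, value in FIELD.findall(line)}


def is_af_alg_socket(record: dict) -> bool:
    """True for SYSCALL records from the AF_ALG rule whose first argument is 38."""
    if record.get("type") != "SYSCALL" or record.get("key") != "copy_fail_af_alg":
        return False
    try:
        return int(record.get("a0", ""), 16) == 38  # a0 is logged in hex
    except ValueError:
        return False


rec = parse_audit_line(SAMPLE)
print(rec["pid"], rec["comm"], is_af_alg_socket(rec))  # 2245 python3 True
```

Feeding the comm field into an allowlist check gives you the "unexpected process" signal from the table in this task.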

Mitigation (Ubuntu/Debian)

The permanent fix is updating to kernel 6.18.22, 6.19.12, or 7.0. Until your distro ships the patch, blacklist the vulnerable module:

Step 1 — Check if algif_aead is a loadable module:

modinfo algif_aead

Answer: modinfo algif_aead

Step 2 — Blacklist it:

echo "install algif_aead /bin/false" | sudo tee /etc/modprobe.d/disable-algif-aead.conf
sudo rmmod algif_aead 2>/dev/null || true

Step 3 — Verify the block:

sudo modprobe algif_aead
# Should return an error
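
If you want to check the same thing from a script (for example across a fleet), a sketch like the following could work. It takes the text of /proc/modules and of the modprobe config file as strings, so it is easy to test. The function names are hypothetical, and it only recognises the exact "install algif_aead /bin/false" form used in Step 2.

```python
def is_module_loaded(proc_modules_text: str, name: str = "algif_aead") -> bool:
    """True if `name` appears as a module name (first column) in /proc/modules text."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def has_install_block(conf_text: str, name: str = "algif_aead") -> bool:
    """True if the config text contains 'install <name> /bin/false'."""
    for raw in conf_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # ignore trailing comments
        if fields[:3] == ["install", name, "/bin/false"]:
            return True
    return False


# Example inputs; on a real host, read /proc/modules and
# /etc/modprobe.d/disable-algif-aead.conf instead.
modules = "af_alg 32768 0 - Live 0x0000000000000000\n"
conf = "install algif_aead /bin/false\n"
print(is_module_loaded(modules), has_install_block(conf))  # False True
```

The state you want is "not loaded" and "install rule present". A module that is still loaded after you add the rule means the rmmod step did not take effect.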

Step 4 — Confirm the PoC fails:

python3 /home/karen/exploit.py
# Should fail while setting up the AF_ALG socket (likely at bind(), since the aead type can no longer load), so the exploit chain never executes

⚠️ Important: This modprobe approach only works on Ubuntu/Debian where algif_aead is a loadable module. On RHEL/CentOS/AlmaLinux it's compiled into the kernel — use grubby instead:

sudo grubby --update-kernel=ALL --args="initcall_blacklist=algif_aead_init"
sudo reboot
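
After the reboot, you can confirm the argument took effect by looking at the kernel command line. The sketch below parses a command-line string for the initcall_blacklist entry. The function name is hypothetical, and on a real host the input would be the contents of /proc/cmdline.

```python
def initcall_blacklisted(cmdline: str, initcall: str = "algif_aead_init") -> bool:
    """True if `initcall` is listed in an initcall_blacklist= argument."""
    for token in cmdline.split():
        key, _, value = token.partition("=")
        # The kernel accepts a comma-separated list of initcall names.
        if key == "initcall_blacklist" and initcall in value.split(","):
            return True
    return False


# Example command line; on a real host use open("/proc/cmdline").read().
example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro initcall_blacklist=algif_aead_init quiet"
print(initcall_blacklisted(example))  # True
```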

Key Takeaways

The page cache is shared infrastructure. One process corrupting it affects the whole system — including containers sharing the same kernel.
No race condition = rapid weaponisation. Public reimplementations in C, Rust, Go, and arm64 appeared on GitHub within 24 hours of disclosure.
File integrity tools (AIDE, Tripwire, IMA) are blind to this class of attack. They hash from disk. This exploit writes to memory.
Detection requires syscall-level monitoring — auditd watching socket(AF_ALG), eBPF/Falco tracking process behaviour, or kernel telemetry correlating recvmsg() volume with execve() of a setuid binary.
Patch as soon as your distro ships it. Use the modprobe blacklist as an interim control, not a permanent fix.
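
For a quick triage of the "am I patched?" question, the sketch below compares a kernel release string against the fixed versions named in the mitigation section (6.18.22, 6.19.12, 7.0). Treat it as a first pass only: the function name is hypothetical, and it deliberately returns None for any other branch, because distro kernels often backport fixes without changing the upstream version number.

```python
import re
from typing import Optional

# Minimum patch level per stable branch, from the versions listed in this walkthrough.
FIXED_PATCH_LEVEL = {(6, 18): 22, (6, 19): 12}


def is_patched(release: str) -> Optional[bool]:
    """True/False for branches with a known fix, None when the version alone cannot tell."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        return None
    major, minor = int(match.group(1)), int(match.group(2))
    patch = int(match.group(3) or 0)
    if (major, minor) >= (7, 0):
        return True
    if (major, minor) in FIXED_PATCH_LEVEL:
        return patch >= FIXED_PATCH_LEVEL[(major, minor)]
    return None  # older or vendor branch: check your distro's advisory instead


for rel in ("6.18.21", "6.18.22-generic", "7.0.0", "6.8.0-45-generic"):
    print(rel, is_patched(rel))
```

On a live system you could pass platform.release() as the argument. A None result means "look at the vendor advisory", not "safe".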

MITRE ATT&CK Mapping

| Technique | ID | Signal |
| --- | --- | --- |
| Local Privilege Escalation via kernel flaw | T1068 | AF_ALG socket by unexpected process |
| Escape to Host from Container | T1611 | Container ID in Falco alert |
| Setuid binary abuse | T1548.001 | execve of setuid binary after AF_ALG activity |
| Indicator Removal via page eviction | T1070 | posix_fadvise(DONTNEED) on setuid binary |

Room completed on TryHackMe. CVE-2026-31431 was reported by Taeyang Lee at Theori, disclosed publicly on 29 April 2026. The mainline kernel fix landed as commit a664bf3d603d on 1 April 2026.

#cve #tryhackme-walkthrough #tryhackme #copyfail
