The Bug Class - What Page Cache Writes Are in Linux (Beginner to Advanced) PART 01

What is a page cache in the first place?

In Linux, almost everything is a file. When Linux reads a file from a disk (hard disk), it doesn't go back to the disk every time the file is read, because that would be painfully slow. Instead, it loads the file contents into RAM (random access memory) and keeps them in a place called the page cache. Its older ancestor was called the buffer cache. Every file you open and every library your program loads sits in this page cache in RAM.

There are three main activities related to the page cache.

Adding a page when a part of a file that is not already in the cache is accessed.
Removing a page when the cache gets too big.
Finding the page that contains a given file offset.

I know some of these words sound scary, but stick with me, because once you understand this, it's fascinating.

Don't be confused by the image above; I'll explain everything. For this article you only need a basic idea of what a byte and a KB are, and of decimal, binary, and hexadecimal numbers. Now let me explain.

File on Disk

As you can see, there is a section called "File on Disk". As the name suggests, it is a file stored on the hard disk. What are the hexadecimal numbers below it (0x00000000, …)? Each one is a file offset: the position of a byte within the file. Say this text file contains the word "HelloWorld", so it is 10 bytes long, and each byte has its own position:

Byte: 0 1 2 3 4 5 6 7 8 9
Data: H e l l o W o r l d

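Byte 0 holds the character "H", byte 1 holds "e", and so on; file offsets always start from 0. The same offsets can be written in decimal or in hexadecimal. For example, the offsets where each 4096-byte block of a file begins (this matters later, when we get to pages) look like this: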

decimal    hexadecimal
0          0x0000
4096       0x1000
8192       0x2000
12288      0x3000

This text file is 10 bytes, and the offset of each character can be written like this:

H -->  0x0000000
e -->  0x0000001
l -->  0x0000002
l -->  0x0000003
o -->  0x0000004
W -->  0x0000005
o -->  0x0000006
r -->  0x0000007
l -->  0x0000008
d -->  0x0000009

You may wonder why hexadecimal is used in the first place. Computers work in binary (1s and 0s), but binary is messy for humans to work with. Imagine a file offset that looks like this:

00000000000000000000000000010010

That is hard for a human to read and to debug, so hexadecimal is used instead. Every group of 4 binary digits maps to exactly one hex digit:

Binary   Hexadecimal
0000     0
0001     1
0010     2
1111     F
1010     A

Still not clear? Here is a bigger one:

binary address:               1101001110110100
split into groups of 4 bits:  1101 0011 1011 0100
convert to hexadecimal:       D    3    B    4
so the address is:            0xD3B4

Much easier to read and understand. But that is not the only reason hexadecimal is used; the OS uses it widely too. In the main image above, under the page cache section, you can see page 0, page 1, and so on. This means that in RAM, file data is divided into fixed-size blocks called pages. A single page is usually 4 KB (4096 bytes). Pages are laid out sequentially, so the first starts at offset 0, the next at 4096, then 8192, and so on. In hexadecimal those boundaries are the round numbers 0x0000, 0x1000, 0x2000, which is one reason hex is so convenient here. You may wonder why 4 KB is used. The size is chosen by CPU and OS design as a trade-off between efficiency and overhead: a page has to be big enough to keep the bookkeeping cost low, and small enough to avoid wasting memory.

In files:

offset address = the address of one byte in the file (for example 0x1000)
page address   = a block of 4096 offset addresses (for example 0x0000 to 0x0FFF)
        page 0 = 0x0000 to 0x0FFF
        page 1 = 0x1000 to 0x1FFF
        page 2 = 0x2000 to 0x2FFF

How does the kernel know whether a file is in the page cache?

When Linux reads a file from the disk, it needs to know where in the file to read, which means the offset. Linux also keeps a unique identifier for each file, called an inode. You can check any file's inode number with ls -i, as we'll do in the practical section below.

In my example, this file's inode number is 427437. An inode is a data structure, and it contains metadata such as:

file size
file permissions (rwx)
owner (user/group)
timestamps (created, modified)
where the file is stored on the disk

You can't open inode metadata directly like a file, but you can view what's in it with the stat command, which we'll also use in the practical section.

Back to the topic: how does the kernel know whether a file is in the page cache? In the older design, the kernel takes the inode and the offset (inode + offset) and runs them through a function called page_hash(). The result is not a table but a position in a hash table, page_hash_table, which is stored in RAM.

Let me back up a little and explain this. Imagine you have 1 million books in your library, and someone comes to you and asks, "Do you have a book called To Kill a Mockingbird?" You have two options to find out whether it's there:

Go through each book until you find "To Kill a Mockingbird". You will find it eventually, but it will take forever.
Or, when you first receive a book, run its title through a formula that tells you exactly which shelf it goes on. When someone asks whether that book exists, run the title through the same formula and TA DA, you have the shelf. The formula doesn't need to make sense to anyone, but it has to be consistent: the same title must always give the same result.

page_hash("To Kill a Mockingbird") = (sum of the character codes in the title) % 1000

This gives an answer between 0 and 999. There is a catch, though: say the formula gives shelf 230, and a different book also gives 230. This is called a collision. It is solved by going to that shelf and keeping a list of all the books on it, then checking that list.

So how does this map to the kernel?

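First, the kernel takes the inode (the book title) plus the offset and runs them through page_hash(). The result points to one bucket (the shelf) in a large table called page_hash_table. Several pages can land in the same bucket, so the pages in a bucket are chained together in a linked list. Each page is described by a page descriptor, and the chain links are two fields in it: next_hash (forward, to the next entry) and pprev_hash (backward, toward the previous entry). The kernel walks this chain until it finds the page with the matching inode and offset.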

Is the page in the page cache? If YES, serve it from RAM. If not, go back to the disk, read that part of the file, store it in the page cache, and then serve it.

XARRAY

Also note that this page_hash_table was used in the Linux kernels of around 1999 to 2000, up to version 2.4. After 2.4, kernels used a radix tree, and modern kernels (version 4.20+) use the XArray. Let me explain what it is and how it works.

Think of the XArray like a normal array, except that it is a sparse array. There is one XArray per file, just like there is one inode per file. That's the interesting part: instead of hashing inode + offset into one big shared table like the old design, each file has its own XArray, indexed by the page offsets within that file.

inode → gets you to the right file's address_space
address_space → has its own XArray
XArray → indexed by offset → gives you the page

What is the address_space? We'll get to it in the ADDRESS_SPACE section; first, the XArray itself.

XARRAY: starting from a normal array

This is the normal array we all know:
index: [0] [1] [2] [3] [4] [5] [6]
value:  H   e   l   l   o   w   o

Let's say the array above is file.txt, containing "H e l l o w o". How many byte offsets is that? Yes, 7 bytes. Now say the file is 2 GB. The array would have to be extremely large, and we can't pre-allocate that much memory. The second problem is that the kernel doesn't index by byte offset but by page index (page 0, page 1, page 2 …), and on modern 64-bit systems page indexes can get very large. We can't have an array that big. What we need is something that behaves like an array (index in, value out) but doesn't allocate space for empty entries. The XArray is exactly that, and it is built on the same idea as its predecessor, the radix tree. Don't overthink it; think of it as an array whose indexes are stored in chunks.

Alright, I'll explain it as simply as possible.

slots:  [0]  [1]  [2]  [3]  [4]  [5]
pages:   p0   p1   p2   p3   p4   p5

Say you want the value for page 3: you simply do array[3]. But there is a problem, and it is memory. A 1 GB file has 262,144 pages (1 GB / 4 KB), so an array for it needs 262,144 slots allocated up front, before you cache a single page. Most of those slots will be NULL (empty), so it's a waste. And on a modern 64-bit OS the page indexes can be huge, so you simply can't have an array that big. So what do we do?

The solution

Instead of one big array, we build a small tree. I know the word tree is confusing; hang on. Say you are in a hotel and your room number is 85. Floor numbers start from 0, and each floor has rooms 0-63. To find your room, you calculate:

85/64 → 1
85%64 → 21

Go to floor 1, room 21. That's your room. Notice the levels you went through to find it:

building
├── floor 0  (rooms 0-63)    → pages 0 to 63
├── floor 1  (rooms 0-63)    → pages 64 to 127
├── floor 2  (rooms 0-63)    → pages 128 to 191
.
.
└── floor 63 (rooms 0-63)    → pages 4032 to 4095

The building is called the root level, or in our case the ROOT NODE. The floors are called CHILD NODES, and the rooms are called SLOTS, which hold the actual pages.

The thing is, the floors don't exist yet. Assume a file that has only 3 of its pages cached (I show how to see cached pages in the practical section below; I recommend jumping there and coming back to this part): page 0, page 1, and page 70. In the building story:

Floor 0 exists ---> rooms 0 and 1 are occupied (page 0 and page 1 are here)
Floor 1 exists ---> room 6 is occupied (page 70 is here, because 70 = 1 × 64 + 6)
Floors 2 to 63 don't exist at all ---> they haven't been built yet.

And what happens when you ask for a page that hasn't been cached yet, like page 500?

500/64 --> floor 7
500%64 --> room 52

You go to that entry and find nothing there (NULL), because floor 7 doesn't exist. Page 500 isn't in the page cache, so the kernel reads it from the disk.

The whole building is the XArray. It's a complicated subject, and I'll write a whole separate article about it. For now, I believe this is enough.

TIP: If you know how TCP keeps packets in sequence in networking, this is a little similar.

ADDRESS_SPACE

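So how does this solve the 1 GB file problem? Floors (nodes) are only built when a page is actually cached on them, so large empty regions of a file cost no memory at all. Before going further, I need to point out where this XArray lives: in the address_space.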

You already know:

inode → the kernel's unique ID for each file
page cache → stores file data in RAM
XArray → holds a file's cached pages, one XArray per file

Linux is written mostly in the C programming language. In C there is something called a struct, which groups multiple related variables together. address_space is also a struct. It represents the relationship between a file and its pages in RAM, and it is the container that holds the file's XArray (every file has one, just as it has an inode). Here is how you write a struct:

int id;
char name[20];
int age;

// these are three separate variables

struct Person {
    int id;
    char name[20];
    int age;
};

// and this struct groups them so they live together as one object

A simplified version of the struct for our address_space looks like this:

struct address_space {
    struct inode        *host;        // which inode owns this
    struct xarray        i_pages;     // the XArray all cached pages live here
    const struct address_space_operations *a_ops;  // function pointers
    unsigned long        nrpages;     // how many pages are currently cached
};

These are the main fields you need to know. The real struct, defined in the kernel's include/linux/fs.h, has many more.

Here is the full path (simplified) the kernel follows when a program reads a file with the read(fd, buf, count) system call:

1. kernel resolves fd → file struct → inode
2. inode->i_mapping gives the address_space
3. compute page index = file_offset / 4096
4. xa_load(&mapping->i_pages, index) — look in XArray
5a. page found → copy data to userspace buffer → done
5b. page not found → call mapping->a_ops->read_folio()
    → that function goes to disk, reads 4KB,
    → stores result in a new page
    → inserts that page into the XArray at the right index
    → copy data to userspace buffer → done

PRO tip: Userspace is where applications run with limited privileges, while kernel space is where the operating system core runs with full hardware access and control.

Practical

It's getting boring, so let's do something practical and see how this works. Open a terminal and follow along.

ls means list, and the -i flag shows the inode. First get the inode number, which in this case is 2100250, then look at the details stored in the inode.

Mm, interesting. The Inode: field is the inode number the kernel uses to identify this file. The /etc/passwd file is 2883 bytes, and we can also see its user and group, last modify and access dates, and other details. But our final goal is to see the page cache itself at work. Now you are going to install this:

Why? Because we need a tool called fincore to see whether this /etc/passwd file is actually in the page cache. The kernel doesn't show what's in the page cache with a single built-in command, which is why we use this tool.

It is indeed in the page cache, and PAGES says 1: that's how many of the file's pages are currently cached. Now let's tell the kernel to drop (remove) this cache, then bring the file back into the cache and see how it works.

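Let me explain what this command does. sync writes any modified (dirty) cached data back to disk, so nothing is lost when caches are dropped. && runs the next command only if sync succeeded, and echo 3 means free all three: the page cache, dentries, and inodes.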

Also:

writing 1 to /proc/sys/vm/drop_caches frees the page cache
writing 2 to /proc/sys/vm/drop_caches frees dentries and inodes
  dentries are cached pathname lookups, like /home/Desktop/file.txt
  inodes are the in-memory inode objects we already discussed (this cache is separate from the page cache)
writing 3 to /proc/sys/vm/drop_caches frees the page cache, dentries, and inodes

tee takes the output of echo and writes it to /proc/sys/vm/drop_caches. Only root (the superuser) can write to that file. With a plain ">" redirect, the redirection is done by your normal, unprivileged shell rather than by sudo, so it fails. That's why we use sudo tee instead. If you are logged in as root, it becomes as simple as redirecting with ">" directly.

Let's check whether the cache was actually dropped, using our tool fincore:

As you can see, PAGES is now 0, which means /etc/passwd is no longer in the page cache. How do we get it back into the page cache?

As soon as /etc/passwd is used again, the kernel reads the file data from disk in the background and puts its page back into the page cache, so PAGES becomes 1 again. NOTE: fincore only shows the page cache.

If you want to see the exact page size, you can use the command below:

And to see how big the page cache is right now:

Cached is the current size of your page cache in kB. It can easily be hundreds of MB or several GB. We can also watch this number jump in real time when a file is loaded into the page cache.

Open terminal 01 and first run this command:

In the command above, watch -n 1 repeats a command every 1 second and keeps updating its output. /proc/meminfo is a live file: every time you read it, you get real-time data on how RAM is being used. Finally, grep Cached keeps only the lines containing the word "Cached" (this also matches the SwapCached line). So watch runs cat /proc/meminfo every second and shows only the Cached lines.

The image above shows the result: the page cache size was 986352 kB when I ran the watch command. Now open terminal 02 and type:

dd is a low-level Linux tool that copies data block by block. sudo runs dd as the superuser. if=/dev/sda is the input file, here the raw disk. of=/dev/null is the output file, so dd reads /dev/sda and writes it to /dev/null. /dev/null is a void, a black hole: anything written there is discarded. bs=4096 means read the disk 4 KB at a time. We saw earlier that > redirects output to whatever is on its right; 2>/dev/null redirects any error messages (file descriptor 2, standard error) into that black hole. Because dd reads the disk through the kernel's normal buffered path, the data it reads lands in the page cache.

In the image above you can see that the page cache jumped from 986352 to 989156 kB. It increased, which means the disk data is now in the cache, even though I only ran the command for a couple of seconds. Try it on your machine and see how it works.

The page cache is shared, which means any process can use the contents that are in it. Say two programs are running and both use the same library. When that library is loaded into the page cache, it isn't duplicated for each program; it is loaded once, and both programs use that one copy. The kernel keeps one copy that multiple users can use. (Linux has users, and you can list them with cat /etc/passwd

in the terminal. Each line describes one user, and the username is the first field, such as "root" or "daemon".)

For most files this system is fine. But some files are read-only, some are marked immutable, and some simply don't give you write permission (you can check a file's permissions with ls -la; I'll talk about this in another article). For those files, the kernel is supposed to guarantee that nobody can modify their pages once they are loaded into the page cache. For now, I'll end part 01 of this article here; part 02 is coming soon.

Okay, I think that's enough for this part.

The main reason I started this article is that I recently came across a vulnerability, CVE 2026–31431 (Copy Fail), the way most people find things worth learning about: someone mentioned it, I searched for it, and three hours later I was more confused than when I started.

Not because the information wasn't out there. It was. But everything I found assumed I already knew the thing I was trying to learn. Kernel mailing list threads written for kernel developers. Exploit code with no explanation. Academic papers that somehow managed to make a privilege escalation bug feel boring.

Nobody had written the version for someone who just wanted to actually understand it.

So I started to research it from scratch. And I kept going backwards until I hit ground, which turned out to be further back than I expected. Copy Fail doesn't appear out of nowhere. It comes from a family of bugs going back to 2017, each one a variation on the same idea, each one found after people thought the previous one was fixed.

To understand the latest one you have to understand why all of them exist. And to understand that you have to understand something about how Linux handles files in memory.

Here is a PDF resource I found about the Linux kernel: https://repo.zenk-security.com/Linux%20et%20systemes%20d.exploitations/EN-Understanding%20The%20Linux%20Kernel%201.pdf

This book mostly covers older Linux kernels, but I found it gives a good foundation for modern kernels as well.

This is part one. We're not touching the CVE yet. Before any of that makes sense you need to understand the page cache, what it is, how the kernel uses it, and why it becomes dangerous when something goes wrong. That's what this covers.

See you in part 02.

socials — https://x.com/Dulanga_Ruksh4n

github.com — https://github.com/Dulanga-Rukshan

tryhackme.com — https://tryhackme.com/p/DulangaRuksh4n
