A deep technical walkthrough of how I used Semgrep — a static analysis engine — to systematically hunt for vulnerabilities across the WordPress plugin ecosystem, resulting in assigned CVEs.
WordPress powers over 43% of all websites. Its plugin ecosystem, with over 60,000 plugins in the official repository alone, is written almost entirely in PHP, often by developers who have never read OWASP guidance. That combination makes it one of the most target-rich environments for vulnerability research.
This post is a full technical walkthrough of how I used Semgrep, an open-source static analysis engine, to systematically audit WordPress plugins at scale.
Why Semgrep for WordPress Research?
Before diving in, it's worth explaining why Semgrep — and not just grep, or a full SAST tool like SonarQube.
grep is fast but dumb. It pattern-matches strings. It can't understand that $_GET['id'] flowing into $wpdb->query() via three intermediate variables is dangerous, and it flags every textual hit regardless of context. You'll drown in false positives.
Full SAST tools are intelligent but heavy. They require project setup, language servers, often paid licenses. Running them across 300 plugins in an afternoon isn't practical.
Semgrep sits in the middle. It understands syntax trees, not just strings. It can match patterns like "any function call where this argument comes from user input" without requiring full data-flow analysis. It's fast, scriptable, and the rule language is human-readable YAML.
Most importantly: Semgrep parses PHP natively, so WordPress idioms can be expressed directly as patterns. And you can write custom rules in minutes.
The Setup
Installing Semgrep
pip install semgrep
semgrep --version
# semgrep 1.62.0
Building a Plugin Corpus
The WordPress SVN repository hosts every public plugin. I built a simple script to bulk-download plugins, targeting ones with:
- Active installs > 1,000 (enough real-world exposure to matter)
- Last updated within the past 2 years (reduces dead-code noise)
- PHP as primary language
# Fetch plugin list from WordPress.org API
# (-g keeps curl from treating [ ] in the URL as a glob range)
curl -sg 'https://api.wordpress.org/plugins/info/1.2/?action=query_plugins&request[per_page]=100&request[page]=1' \
  | jq -r '.plugins[].slug' > plugins.txt
# Check out each plugin's trunk from SVN
while read -r slug; do
  svn checkout "https://plugins.svn.wordpress.org/$slug/trunk" "plugins/$slug" --quiet
done < plugins.txt
I ended up with a corpus of ~400 plugins totalling roughly 2.1 GB of PHP.
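The snippet above fetches a single page and applies none of the selection criteria. Here is a minimal sketch of how the install-count and recency filters could be applied while paging through the API. It assumes jq is installed and that each plugin object in the response carries active_installs and last_updated fields, with last_updated beginning with a YYYY-MM-DD date. The function names, the cutoff date, and the page count are illustrative, not the original script.

```shell
MIN_INSTALLS=1000          # threshold from the criteria above
CUTOFF_DATE="2023-01-01"   # hypothetical "two years ago"; adjust as needed

# Build the query URL for one result page (100 plugins per page).
plugin_api_url() {
  printf '%s\n' "https://api.wordpress.org/plugins/info/1.2/?action=query_plugins&request[per_page]=100&request[page]=$1"
}

# Read one API response on stdin; print the slugs that pass both filters.
# Assumption: plugin objects expose active_installs and last_updated.
filter_slugs() {
  jq -r --argjson min "$MIN_INSTALLS" --arg cutoff "$CUTOFF_DATE" '
    .plugins[]
    | select(.active_installs > $min and .last_updated[0:10] >= $cutoff)
    | .slug'
}

# Fetch pages 1..N and write the matching slugs to plugins.txt.
fetch_slugs() {
  local last_page=$1 page
  for page in $(seq 1 "$last_page"); do
    # -g stops curl from treating [ ] in the URL as a glob range
    curl -sg "$(plugin_api_url "$page")" | filter_slugs
    sleep 1   # stay polite to the API
  done > plugins.txt
}

# Usage (not run here): fetch_slugs 20
```

Comparing the first ten characters of last_updated as a string works only because ISO dates sort lexicographically; if the API returns a different date format, that comparison needs to change.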
Rule Category 1: SQL Injection
WordPress provides $wpdb->prepare() as the canonical way to safely construct queries. The vulnerability pattern is simple: user input flows into a query without going through prepare().
Rule: Direct $wpdb->query() with Interpolated Input
rules:
  - id: wpdb-query-user-input
    message: |
      Potential SQL injection: user-controlled input passed directly to $wpdb->query().
      Use $wpdb->prepare() to parameterize the query.
    severity: ERROR
    languages: [php]
    patterns:
      # The tainted assignment must appear earlier in the same block.
      - pattern-either:
          - pattern-inside: |
              $USER_INPUT = $_GET[$KEY];
              ...
          - pattern-inside: |
              $USER_INPUT = $_POST[$KEY];
              ...
          - pattern-inside: |
              $USER_INPUT = $_REQUEST[$KEY];
              ...
          - pattern-inside: |
              $USER_INPUT = sanitize_text_field($_GET[$KEY]);
              ...
          - pattern-inside: |
              $USER_INPUT = sanitize_text_field($_POST[$KEY]);
              ...
      - pattern: $wpdb->query("..." . $USER_INPUT . "...")
Note the last two patterns: sanitize_text_field() does not protect against SQL injection. It strips HTML tags and extra whitespace. Developers routinely mistake it for a SQL sanitization function. This false sense of safety is responsible for dozens of CVEs.
Rule: $wpdb->get_results() with Unparameterized Input
rules:
  - id: wpdb-get-results-sqli
    message: "SQL Injection via $wpdb->get_results() without prepare()"
    severity: ERROR
    languages: [php]
    pattern: |
      $wpdb->get_results("..." . $_GET[$KEY])
I extended this with get_var, get_row, get_col, and query variants.
The ORDER BY Blind Spot
Prepared statements cannot parameterize ORDER BY column names or direction. This forces developers to handle it manually — and most don't:
// Common vulnerable pattern
$orderby = $_GET['orderby'];
$order = $_GET['order'];
$results = $wpdb->get_results(
    "SELECT * FROM {$wpdb->prefix}my_table ORDER BY $orderby $order"
);
rules:
  - id: wpdb-orderby-injection
    message: "SQL Injection via ORDER BY clause with user input"
    severity: ERROR
    languages: [php]
    patterns:
      - pattern: $wpdb->get_results($QUERY, ...)
      # Match on the query text so interpolated and concatenated forms are both caught.
      - metavariable-regex:
          metavariable: $QUERY
          regex: '(?is).*ORDER\s+BY\b.*\$\w+'
Rule Category 2: Cross-Site Scripting (XSS)
WordPress's output escaping functions are well-documented: esc_html(), esc_attr(), esc_url(), wp_kses(). The rule is simple — everything printed to the page must pass through one of these.
Rule: Reflected XSS via echo of Unescaped Input
rules:
  - id: reflected-xss-echo
    message: "Potential Reflected XSS: unescaped user input echoed to page"
    severity: ERROR
    languages: [php]
    # pattern-either: any one of these forms is a finding
    pattern-either:
      - pattern: echo $_GET[$KEY]
      - pattern: echo $_POST[$KEY]
      - pattern: echo $_REQUEST[$KEY]
      - pattern: print($_GET[$KEY])
      - pattern: print($_POST[$KEY])
Rule: XSS via sanitize_text_field Bypass
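This is critical: sanitize_text_field() is often used where esc_html() or esc_attr() should be used. They solve different problems. sanitize_text_field() cleans input before it is stored: it strips tags, line breaks, and extra whitespace, but it does not encode quotes. esc_html() and esc_attr() encode a value for the context it is printed in. Relying on the former and skipping the latter can still leave XSS possible, most obviously when the value lands inside an HTML attribute: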
rules:
  - id: xss-sanitize-not-escape
    message: |
      sanitize_text_field() used before output. This does NOT prevent XSS.
      Use esc_html() or esc_attr() when outputting to HTML.
    severity: WARNING
    languages: [php]
    pattern: echo sanitize_text_field(...)
Stored XSS via update_option / get_option
A common pattern: user input is stored via update_option() and later retrieved with get_option() and echoed without escaping:
rules:
  - id: stored-xss-get-option
    message: "Potential Stored XSS: get_option() result echoed without escaping"
    severity: WARNING
    languages: [php]
    pattern: echo get_option($KEY)
This has a higher false positive rate (some option values are developer-controlled), but it's a useful triage signal.
Rule Category 3: Broken Access Control
This is a vulnerability class I found frequently, and it's where Semgrep's syntax-aware matching really shines.
In WordPress, AJAX handlers are registered like this:
add_action('wp_ajax_my_action', 'my_action_callback');
add_action('wp_ajax_nopriv_my_action', 'my_action_callback'); // No auth required
The dangerous pattern is: a sensitive action (deleting users, exporting data, modifying settings) is registered on wp_ajax_nopriv_, meaning unauthenticated users can trigger it, without any additional capability check inside the callback.
Rule: nopriv AJAX Action Without Capability Check
rules:
  - id: nopriv-ajax-no-cap-check
    message: |
      wp_ajax_nopriv_ action registered. The callback may be accessible to
      unauthenticated users. Verify that sensitive operations require capability checks.
    severity: WARNING
    languages: [php]
    patterns:
      # Bind the hook name as string content, then filter it by prefix.
      - pattern: add_action("$HOOK", $CALLBACK, ...)
      - metavariable-regex:
          metavariable: $HOOK
          regex: '^wp_ajax_nopriv_.+'
This is a triage rule. I then manually reviewed the $CALLBACK functions flagged.
Rule: Sensitive Operations Without current_user_can()
rules:
  - id: delete-without-capability-check
    message: "wp_delete_post() called without current_user_can() check nearby"
    severity: WARNING
    languages: [php]
    patterns:
      - pattern: wp_delete_post($POST_ID, ...)
      - pattern-not-inside: |
          if (current_user_can(...)) { ... }
The pattern-not-inside operator is one of Semgrep's most powerful features. It matches code that is NOT wrapped in a given context, in this case a capability check. This directly encodes the "dangerous function called without guard" pattern.
I used this same structure for:
- wp_delete_user()
- delete_option()
- update_option()
- wp_insert_user()
- wp_update_user()
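Because these rules differ only in the function name, they can be stamped out from one template instead of being copied by hand. This is a sketch of that idea and not the original tooling. The helper names, the rule id scheme, and the output file name are all made up for illustration.

```shell
# Emit one "sensitive call without capability check" rule for a function name.
make_guard_rule() {
  local func=$1 rule_id
  rule_id=$(printf '%s' "$func" | tr '_' '-')
  cat <<EOF
  - id: ${rule_id}-without-capability-check
    message: "${func}() called without current_user_can() check nearby"
    severity: WARNING
    languages: [php]
    patterns:
      - pattern: ${func}(...)
      - pattern-not-inside: |
          if (current_user_can(...)) { ... }
EOF
}

# Emit a complete rule file covering every function name given as an argument.
generate_guard_rules() {
  local func
  echo "rules:"
  for func in "$@"; do
    make_guard_rule "$func"
  done
}

# Usage (hypothetical output path):
#   generate_guard_rules wp_delete_user delete_option update_option \
#     wp_insert_user wp_update_user > rules/missing-capability-check.yml
```

Keeping the template in one place means a later refinement, such as also excluding early-return guards, only has to be made once.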
Rule Category 4: CSRF (Missing Nonce Verification)
WordPress nonces (wp_nonce_field(), check_ajax_referer(), wp_verify_nonce()) are the CSRF protection mechanism. Forms and AJAX handlers that mutate state must verify a nonce.
Rule: AJAX Handler Without Nonce Check
rules:
  - id: ajax-missing-nonce
    message: "AJAX callback registered without nonce verification: potential CSRF"
    severity: WARNING
    languages: [php]
    patterns:
      - pattern: |
          function $FUNC() {
            ...
            $RESPONSE = ...;
            wp_send_json($RESPONSE);
          }
      - pattern-not-inside: |
          function $FUNC() {
            ...
            check_ajax_referer(...);
            ...
          }
      - pattern-not-inside: |
          function $FUNC() {
            ...
            wp_verify_nonce(...);
            ...
          }
Rule: Settings Form Without Nonce
rules:
  - id: settings-save-no-nonce
    message: "Settings saved via $_POST without nonce verification: CSRF risk"
    severity: WARNING
    languages: [php]
    patterns:
      - pattern: update_option($KEY, $_POST[$VAL])
      - pattern-not-inside: |
          if (wp_verify_nonce(...)) { ... }
      - pattern-not-inside: |
          check_admin_referer(...);
          ...
          update_option($KEY, $_POST[$VAL]);
Rule Category 5: PHP Object Injection
unserialize() on user-controlled data is a classic PHP vulnerability. If a suitable POP (Property Oriented Programming) gadget chain exists — either in the plugin itself or in a dependency — it can lead to remote code execution.
rules:
  - id: unserialize-user-input
    message: |
      unserialize() called with user-controlled input: potential PHP Object Injection.
      If a POP chain exists, this can lead to RCE.
    severity: ERROR
    languages: [php]
    pattern-either:
      - pattern: unserialize($_GET[$KEY])
      - pattern: unserialize($_POST[$KEY])
      - pattern: unserialize($_COOKIE[$KEY])
      - pattern: unserialize(base64_decode($_GET[$KEY]))
      - pattern: unserialize(base64_decode($_POST[$KEY]))
The base64_decode variants are important. Developers often think base64-encoding the input adds security. It doesn't. Semgrep's nested function call matching catches this cleanly.
Rule Category 6: Arbitrary File Inclusion
rules:
  - id: dynamic-include-user-input
    message: "Dynamic file inclusion with user-controlled path: potential LFI/RFI"
    severity: ERROR
    languages: [php]
    pattern-either:
      - pattern: include($_GET[$KEY])
      - pattern: include($_POST[$KEY])
      - pattern: require($_GET[$KEY])
      - pattern: include(plugin_dir_path(__FILE__) . $_GET[$KEY])
      - pattern: require_once(plugin_dir_path(__FILE__) . $_GET[$KEY])
The last two patterns are particularly interesting. Developers often believe that prefixing with plugin_dir_path() makes it safe because it anchors to the plugin directory. It doesn't. A ../ traversal sequence bypasses it completely.
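To see why the prefix gives no protection, it helps to resolve the concatenated path the way the filesystem would. The sketch below does this with plain string handling, so it ignores symlinks. The install path and the function names are hypothetical.

```shell
# Collapse "." and ".." segments in an absolute path without touching the filesystem.
normalize_path() {
  local out part old_ifs
  out=""
  old_ifs=$IFS
  set -f; IFS=/
  for part in $1; do
    case $part in
      ''|.) ;;                       # skip empty and "." segments
      ..)   out=${out%/*} ;;         # drop the last segment
      *)    out="$out/$part" ;;
    esac
  done
  set +f; IFS=$old_ifs
  printf '%s\n' "${out:-/}"
}

# Succeed only if the normalized path stays under the base directory.
is_inside_dir() {
  local base resolved
  base=$(normalize_path "$1")
  resolved=$(normalize_path "$2")
  case $resolved in
    "$base"/*) return 0 ;;
    *)         return 1 ;;
  esac
}

PLUGIN_DIR=/var/www/html/wp-content/plugins/demo   # hypothetical install path
REQUESTED="$PLUGIN_DIR/../../../wp-config.php"     # prefix + attacker-supplied "../" input

normalize_path "$REQUESTED"
is_inside_dir "$PLUGIN_DIR" "$REQUESTED" || echo "rejected: path escapes the plugin directory"
```

The same idea carries over to PHP: resolve the final path first (for example with realpath()), then verify it still starts with the intended directory, or better, map the request parameter onto a fixed allowlist of file names.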
Rule Category 7: Arbitrary File Upload
rules:
  - id: arbitrary-file-upload
    message: "File upload without extension/MIME validation: potential webshell upload"
    severity: ERROR
    languages: [php]
    patterns:
      - pattern: move_uploaded_file($_FILES[$KEY]['tmp_name'], $DEST)
      - pattern-not-inside: |
          $EXT = pathinfo(...);
          ...
          move_uploaded_file(...);
      - pattern-not-inside: |
          wp_check_filetype(...);
          ...
          move_uploaded_file(...);
Running the Rules at Scale
With the corpus downloaded and rules written, I ran full scans using a simple shell loop:
#!/bin/bash
mkdir -p results
for plugin_dir in plugins/*/; do
  plugin_name=$(basename "$plugin_dir")
  semgrep \
    --config ./rules/ \
    --json \
    --output "results/${plugin_name}.json" \
    "$plugin_dir" 2>/dev/null
done
echo "Scan complete."
Parsing Results
# Find all plugins with at least one ERROR severity finding
jq -r 'select(.results | length > 0) | .results[] | select(.extra.severity == "ERROR") | .path' \
  results/*.json | cut -d'/' -f1-2 | sort -u
Manual Triage: Turning Findings Into Confirmed Vulnerabilities
Semgrep finds potential vulnerabilities. Manual review confirms them. My triage process:
Step 1: Trace the Data Flow
For each flagged finding, I traced the full data flow manually:
- What is the source? ($_GET, $_POST, $_REQUEST, $_COOKIE)
- Does the input pass through any sanitization? If so, is it appropriate sanitization?
- What is the sink? ($wpdb->query(), echo, include, etc.)
- Is there any gate (auth check, nonce check, capability check) between source and sink?
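Working through these questions is easier with one flat worklist than with hundreds of JSON files. A minimal sketch, assuming jq is installed and that the reports follow Semgrep's JSON output layout (results[].check_id, path, start.line, extra.severity). The function name and the output file name are illustrative.

```shell
# Flatten Semgrep JSON reports into tab-separated rows:
#   severity, rule id, file path, start line
# Plain sorting puts ERROR rows ahead of WARNING rows.
findings_worklist() {
  jq -r '.results[]
         | [.extra.severity, .check_id, .path, .start.line]
         | @tsv' "$@" | sort
}

# Usage (assumes the results/ directory produced by the scan loop):
#   findings_worklist results/*.json > worklist.tsv
```

Each row then points at one source-to-sink trace to perform, and the file can be annotated as findings are confirmed or dismissed.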
Step 2: Build a Proof of Concept
For every confirmed vulnerability, I built a minimal PoC — typically a curl command or a small HTML page.
For SQL injection:
# Example: Blind SQLi via orderby parameter in a plugin AJAX handler
curl -s -X POST 'https://target.local/wp-admin/admin-ajax.php' \
  --data 'action=plugin_get_data&orderby=name,SLEEP(5)--&order=ASC&nonce=XXXX'
For CSRF:
<!-- csrf-poc.html: loads in attacker's page, triggers settings change on victim's WP -->
<form id="f" method="POST" action="https://victim.local/wp-admin/admin-ajax.php">
  <input name="action" value="plugin_save_settings">
  <input name="option_value" value="attacker-controlled">
</form>
<script>document.getElementById('f').submit();</script>
Step 3: Determine Exploitability and Impact
Not all confirmed vulnerabilities are equal. I scored each on:
- Authentication required? Unauthenticated = higher severity
- What's the impact? RCE > Data exfiltration > Account takeover > XSS > Info disclosure
- How many installs? 100k installs > 1k installs in terms of real-world impact
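These three criteria can be turned into a sort order for the worklist. The sketch below is one possible ordering, not the scoring actually used: it assumes unauthenticated findings come first, then impact class in the order listed above, then install count. The input format and the impact labels are invented for the example.

```shell
TAB=$(printf '\t')

# Map an impact class to a rank, following the ordering in the list above.
impact_rank() {
  case "$1" in
    rce)               echo 5 ;;
    data-exfiltration) echo 4 ;;
    account-takeover)  echo 3 ;;
    xss)               echo 2 ;;
    info-disclosure)   echo 1 ;;
    *)                 echo 0 ;;
  esac
}

# stdin:  slug<TAB>auth_required(yes|no)<TAB>impact<TAB>installs
# stdout: the same rows, most urgent first
rank_findings() {
  local slug auth impact installs unauth
  while IFS=$TAB read -r slug auth impact installs; do
    unauth=0
    if [ "$auth" = "no" ]; then unauth=1; fi
    # Prefix two sort keys, sort on them plus installs, then strip the keys.
    printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
      "$unauth" "$(impact_rank "$impact")" "$slug" "$auth" "$impact" "$installs"
  done | sort -t "$TAB" -k1,1nr -k2,2nr -k6,6nr | cut -f3-
}

# Usage (hypothetical file): rank_findings < confirmed.tsv
```

A strict lexicographic order like this is easy to reason about. A weighted score would let a very widely installed XSS outrank a rarely installed RCE, if that matches your priorities better.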
Real Finding: CSRF to Settings Takeover
One of the more impactful findings: a plugin had an AJAX handler for saving plugin configuration that checked the user was an admin — but never verified a nonce.
// Vulnerable code (paraphrased)
add_action('wp_ajax_save_plugin_config', 'save_plugin_config_cb');
function save_plugin_config_cb() {
    if (!current_user_can('manage_options')) {
        wp_send_json_error('Unauthorized');
    }
    // No nonce check here.
    $api_key = sanitize_text_field($_POST['api_key']);
    update_option('plugin_api_key', $api_key);
    wp_send_json_success();
}
The capability check is correct. But without nonce verification, any website can forge a cross-site request that an authenticated admin's browser will execute. An attacker could:
- Host a malicious page that fires the AJAX request
- Trick an admin into visiting it (e.g., via email, LinkedIn message)
- The admin's browser submits the request with their valid session cookie
- The plugin's API key gets replaced with the attacker's
- Attacker receives all data the plugin was sending to the API
Real Finding: Unauthenticated SQL Injection
A nopriv AJAX action for fetching records. The orderby parameter was interpolated directly:
add_action('wp_ajax_nopriv_plugin_fetch_records', 'plugin_fetch_records_cb');
function plugin_fetch_records_cb() {
    global $wpdb;
    $orderby = sanitize_text_field($_POST['orderby']); // False safety
    $order = sanitize_text_field($_POST['order']);
    $results = $wpdb->get_results(
        "SELECT * FROM {$wpdb->prefix}plugin_records "
        . "ORDER BY $orderby $order"
    );
    wp_send_json_success($results);
}
sanitize_text_field() doesn't strip SQL syntax. The ORDER BY clause can't be parameterized with $wpdb->prepare(). This is a textbook unauthenticated blind SQL injection.
Exploitation: Time-based blind via SLEEP(), confirmed in under 60 seconds:
curl -X POST 'https://target/wp-admin/admin-ajax.php' \
  --data 'action=plugin_fetch_records&orderby=id,SLEEP(5)--&order=ASC'
# Response delayed 5 seconds: confirmed.
From here, sqlmap with --technique=T would extract the entire database.
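A single slow response can also be network jitter, so it is worth comparing against a baseline request before calling it confirmed. A minimal sketch, assuming a test instance you control. The function names and the 4-second margin are illustrative choices.

```shell
# Time one POST request and print the total time in seconds (decimal).
time_request() {
  curl -s -o /dev/null -w '%{time_total}' -X POST "$1" --data "$2"
}

# Succeed when the payload response took at least $3 seconds longer than the baseline.
is_delayed() {
  awk -v base="$1" -v slow="$2" -v min="$3" 'BEGIN { exit !((slow - base) >= min) }'
}

# Usage (not run here; point it at your own lab instance):
#   url='https://target/wp-admin/admin-ajax.php'
#   base=$(time_request "$url" 'action=plugin_fetch_records&orderby=id&order=ASC')
#   slow=$(time_request "$url" 'action=plugin_fetch_records&orderby=id,SLEEP(5)--&order=ASC')
#   is_delayed "$base" "$slow" 4 && echo "delay reproduced: likely time-based injection"
```

Repeating the pair of requests a few times and requiring the delay every time reduces the chance of a false confirmation further.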
Optimizing Your Ruleset Over Time
After the first batch, I refined my approach:
Reduce false positives: Add pattern-not clauses for known-safe patterns. If a function always escapes its output internally (like esc_html__(), which returns an already-escaped string), exclude it.
Add metavariable filtering: Semgrep supports metavariable-regex to narrow matches:
- metavariable-regex:
    metavariable: $KEY
    regex: '(id|user_id|post_id|order_id)'
This focuses SQL injection rules on fields that are likely to be used in queries.
Chain rules with pattern-inside: If you want to only flag echo inside a specific function or callback context, use pattern-inside to scope matches.
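Refinements like these are safer to make when each rule has a small fixture that pins down what it must and must not flag. The sketch below writes such a fixture for the xss-sanitize-not-escape rule. It assumes Semgrep's test mode, which pairs a rule file with a same-named source file and reads "ruleid:" and "ok:" comment annotations. The directory layout and file names are hypothetical.

```shell
# Write a fixture next to the rule file. The annotation on each comment line
# applies to the statement on the following line.
write_fixture() {
  local rules_dir=$1
  mkdir -p "$rules_dir"
  cat > "$rules_dir/xss-sanitize-not-escape.php" <<'EOF'
<?php
// ruleid: xss-sanitize-not-escape
echo sanitize_text_field($_GET['q']);

// ok: xss-sanitize-not-escape
echo esc_html($_GET['q']);
EOF
}

# Usage (assumes rules/xss-sanitize-not-escape.yaml holds the rule):
#   write_fixture rules && semgrep --test rules/
```

With fixtures in place, adding a pattern-not or a metavariable-regex filter becomes a change you can verify in seconds instead of re-scanning the whole corpus.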