A deep technical walkthrough of how I used Semgrep — a static analysis engine — to systematically hunt for vulnerabilities across the WordPress plugin ecosystem, resulting in assigned CVEs.

WordPress powers roughly 43% of all websites. Its plugin ecosystem, with around 60,000 plugins in the official directory alone, is written almost entirely in PHP, often by developers who have never read the OWASP guidance. That combination makes it one of the most target-rich environments for vulnerability research.

This post is a full technical walkthrough of how I used Semgrep, an open-source static analysis engine, to systematically audit WordPress plugins at scale.

Why Semgrep for WordPress Research?

Before diving in, it's worth explaining why Semgrep — and not just grep, or a full SAST tool like SonarQube.

grep is fast but dumb. It pattern-matches strings. It can't understand that $_GET['id'] flowing into $wpdb->query() via three intermediate variables is dangerous. You'll drown in false positives.

Full SAST tools are intelligent but heavy. They require project setup, language servers, often paid licenses. Running them across 300 plugins in an afternoon isn't practical.

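Semgrep sits in the middle. It matches on syntax trees, not strings, so a rule like "any call to $wpdb->query() whose argument is built from $_GET" is a few lines of pattern rather than a fragile regex. Plain pattern rules need no data-flow analysis at all, and when you do need to follow a value through intermediate variables, Semgrep's taint mode tracks it within a function. It's fast, scriptable, and the rule language is human-readable YAML.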

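Most importantly: Semgrep parses PHP natively, so WordPress idioms such as $wpdb->prepare(), add_action() hooks, and the $_GET/$_POST superglobals translate directly into patterns. You can write a custom rule in minutes.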

The Setup

Installing Semgrep

pip install semgrep
semgrep --version
# 1.62.0

Building a Plugin Corpus

The WordPress SVN repository hosts every public plugin. I built a simple script to bulk-download plugins, targeting ones with:

  • Active installs > 1,000 (enough real-world exposure to matter)
  • Last updated within the past 2 years (reduces dead-code noise)
  • PHP as primary language
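# Fetch several pages of popular plugins from the WordPress.org API, keeping those with > 1,000 installs
# (-g stops curl from treating the [ ] in the query string as a URL glob)
for page in 1 2 3 4; do
  curl -s -g "https://api.wordpress.org/plugins/info/1.2/?action=query_plugins&request[browse]=popular&request[per_page]=100&request[page]=$page"
done | jq -r '.plugins[] | select(.active_installs > 1000) | .slug' > plugins.txt
# Check out each plugin's trunk from SVN
mkdir -p plugins
while read -r slug; do
  svn checkout --quiet "https://plugins.svn.wordpress.org/$slug/trunk" "plugins/$slug"
done < plugins.txt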

I ended up with a corpus of ~400 plugins totalling roughly 2.1GB of PHP.
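The "last updated within 2 years" criterion is awkward to express in jq, because the API returns last_updated as a human-readable string. Below is a minimal Python sketch that applies both numeric criteria to saved API responses. It assumes each plugin object carries an integer active_installs and a last_updated value that starts with an ISO date (e.g. "2024-03-05 4:12pm GMT"); the script name select_plugins.py is hypothetical.

```python
import json
import sys
from datetime import date, timedelta

# Selection thresholds from the criteria listed above.
MIN_INSTALLS = 1_000
MAX_AGE = timedelta(days=2 * 365)


def select_slugs(api_pages, today):
    """Return slugs with more than MIN_INSTALLS installs, updated within MAX_AGE of `today`.

    `api_pages` is a list of decoded query_plugins responses. Assumes each plugin
    has an int `active_installs` and a `last_updated` string beginning with YYYY-MM-DD.
    """
    slugs = []
    for page in api_pages:
        for plugin in page.get("plugins", []):
            stamp = plugin.get("last_updated") or ""
            if not stamp:
                continue  # no date: can't judge staleness, so skip it
            updated = date.fromisoformat(stamp.split()[0])  # keep only the date part
            recent = today - updated <= MAX_AGE
            if plugin.get("active_installs", 0) > MIN_INSTALLS and recent:
                slugs.append(plugin["slug"])
    return slugs


if __name__ == "__main__":
    # Usage (hypothetical): python select_plugins.py page1.json page2.json > plugins.txt
    pages = []
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as fh:
            pages.append(json.load(fh))
    for slug in select_slugs(pages, date.today()):
        print(slug)
```

Passing `today` in explicitly keeps the filter deterministic, which makes it easy to re-run against an older snapshot of API responses.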

Rule Category 1: SQL Injection

WordPress provides $wpdb->prepare() as the canonical way to safely construct queries. The vulnerability pattern is simple: user input flows into a query without going through prepare().

Rule: Direct $wpdb->query() with Interpolated Input

rules:
  - id: wpdb-query-user-input
    message: |
      Potential SQL injection: user-controlled input passed directly to $wpdb->query().
      Use $wpdb->prepare() to parameterize the query.
    severity: ERROR
    languages: [php]
    patterns:
      # The user-controlled assignment must appear earlier in the same scope...
      - pattern-either:
          - pattern-inside: |
              $USER_INPUT = $_GET[$KEY];
              ...
          - pattern-inside: |
              $USER_INPUT = $_POST[$KEY];
              ...
          - pattern-inside: |
              $USER_INPUT = $_REQUEST[$KEY];
              ...
          - pattern-inside: |
              $USER_INPUT = sanitize_text_field($_GET[$KEY]);
              ...
          - pattern-inside: |
              $USER_INPUT = sanitize_text_field($_POST[$KEY]);
              ...
      # ...and the query concatenates that variable into the SQL string.
      - pattern: $wpdb->query("..." . $USER_INPUT . "...")

Note the last two patterns — sanitize_text_field() does not protect against SQL injection. It strips HTML tags and extra whitespace. Developers routinely mistake it for a SQL sanitization function. This false sense of safety is responsible for dozens of CVEs.

Rule: $wpdb->get_results() with Unparameterized Input

rules:
  - id: wpdb-get-results-sqli
    message: "SQL Injection via $wpdb->get_results() without prepare()"
    severity: ERROR
    languages: [php]
    pattern: |
      $wpdb->get_results("..." . $_GET[...])

I extended this with get_var, get_row, get_col, and query variants.

The ORDER BY Blind Spot

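Placeholders cannot stand in for the ORDER BY sort direction, and until WordPress 6.2 added the %i identifier placeholder, $wpdb->prepare() had no way to handle column names either. This forces developers to validate these values manually, and most don't: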

// Common vulnerable pattern
$orderby = $_GET['orderby'];
$order   = $_GET['order'];
$results = $wpdb->get_results(
    "SELECT * FROM {$wpdb->prefix}my_table ORDER BY $orderby $order"
);
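
rules:
  - id: wpdb-orderby-injection
    message: "SQL Injection via ORDER BY clause with user input"
    severity: ERROR
    languages: [php]
    patterns:
      - pattern: $wpdb->get_results($QUERY, ...)
      # Flag queries whose text has a variable after ORDER BY (interpolated or concatenated).
      - metavariable-regex:
          metavariable: $QUERY
          regex: '(?is).*ORDER\s+BY\s+.*\$'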
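Handling ORDER BY manually means mapping the user-supplied values onto a fixed allowlist before they ever reach the query string. The logic is language-agnostic, so here it is as a small Python sketch; the column names are hypothetical. In a plugin, the equivalent would be an in_array($orderby, $allowed, true) check in PHP, or WordPress's sanitize_sql_orderby() helper.

```python
# Hypothetical columns this table exposes for sorting.
ALLOWED_COLUMNS = {"id", "name", "created_at"}
ALLOWED_DIRECTIONS = {"ASC", "DESC"}


def build_order_clause(orderby, order, default_col="id"):
    """Map untrusted sort parameters onto a fixed allowlist.

    Anything not on the list falls back to a safe default instead of being
    interpolated, so payloads such as "name,SLEEP(5)--" never reach the SQL.
    """
    column = orderby if orderby in ALLOWED_COLUMNS else default_col
    direction = order.upper() if order and order.upper() in ALLOWED_DIRECTIONS else "ASC"
    return f"ORDER BY {column} {direction}"


print(build_order_clause("name", "desc"))             # ORDER BY name DESC
print(build_order_clause("name,SLEEP(5)--", "ASC"))   # ORDER BY id ASC
```

Falling back to a default rather than raising keeps the endpoint's behavior unchanged for legitimate callers while discarding anything unexpected.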

Rule Category 2: Cross-Site Scripting (XSS)

WordPress's output escaping functions are well-documented: esc_html(), esc_attr(), esc_url(), wp_kses(). The rule is simple — everything printed to the page must pass through one of these.

Rule: Reflected XSS via echo of Unescaped Input

rules:
  - id: reflected-xss-echo
    message: "Potential Reflected XSS: unescaped user input echoed to page"
    severity: ERROR
    languages: [php]
    pattern-either:
      - pattern: echo $_GET[$KEY]
      - pattern: echo $_POST[$KEY]
      - pattern: echo $_REQUEST[$KEY]
      - pattern: print($_GET[$KEY])
      - pattern: print($_POST[$KEY])

Rule: XSS via sanitize_text_field Bypass

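This is critical: sanitize_text_field() is often used where esc_html() or esc_attr() should be used. They solve different problems. sanitize_text_field() cleans input before it is stored; esc_html() and esc_attr() make a value safe at the point of output. Because sanitize_text_field() strips tags but leaves quotes intact, relying on it alone and printing the value into an HTML attribute is a stored XSS: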

rules:
  - id: xss-sanitize-not-escape
    message: |
      sanitize_text_field() used before output — this does NOT prevent XSS.
      Use esc_html() or esc_attr() when outputting to HTML.
    severity: WARNING
    languages: [php]
    pattern: echo sanitize_text_field(...)
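The attribute case is easiest to see concretely. This Python sketch uses a crude stand-in for sanitize_text_field() (it only strips tags and collapses whitespace, which is not WordPress's full implementation) and Python's html.escape() as a stand-in for esc_attr(). The "sanitized" value still contains double quotes, so it breaks out of the value attribute; the escaped one does not.

```python
import html
import re


def rough_sanitize_text_field(value):
    """Crude approximation of sanitize_text_field(): strip tags, collapse whitespace.

    Not WordPress's implementation; just enough to show that quotes survive.
    """
    no_tags = re.sub(r"<[^>]*>", "", value)
    return re.sub(r"\s+", " ", no_tags).strip()


def render_input(value):
    # Mimics a template that prints a stored value inside a double-quoted attribute.
    return f'<input name="q" value="{value}">'


payload = '" autofocus onfocus="alert(1)'
sanitized_only = render_input(rough_sanitize_text_field(payload))
escaped = render_input(html.escape(rough_sanitize_text_field(payload), quote=True))

print(sanitized_only)  # <input name="q" value="" autofocus onfocus="alert(1)">
print(escaped)         # <input name="q" value="&quot; autofocus onfocus=&quot;alert(1)">
```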

Stored XSS via update_option / get_option

A common pattern: user input is stored via update_option() and later retrieved with get_option() and echoed without escaping:

rules:
  - id: stored-xss-get-option
    message: "Potential Stored XSS: get_option() result echoed without escaping"
    severity: WARNING
    languages: [php]
    pattern: echo get_option($KEY)

This has a higher false positive rate (some option values are developer-controlled), but it's a useful triage signal.

Rule Category 3: Broken Access Control

This is the vulnerability class I found frequently, and it's where Semgrep's syntax-aware matching really shines.

In WordPress, AJAX handlers are registered like this:

add_action('wp_ajax_my_action', 'my_action_callback');
add_action('wp_ajax_nopriv_my_action', 'my_action_callback'); // No auth required

The dangerous pattern is a sensitive action (deleting users, exporting data, modifying settings) registered on wp_ajax_nopriv_, which lets unauthenticated users trigger it, with no capability check inside the callback. Callbacks registered only on wp_ajax_ deserve the same scrutiny: that hook fires for any logged-in user, including a Subscriber.

Rule: nopriv AJAX Action Without Capability Check

rules:
  - id: nopriv-ajax-no-cap-check
    message: |
      wp_ajax_nopriv_ action registered — callback may be accessible to
      unauthenticated users. Verify that sensitive operations require capability checks.
    severity: WARNING
    languages: [php]
    patterns:
      - pattern: add_action($HOOK, $CALLBACK, ...)
      - metavariable-regex:
          metavariable: $HOOK
          regex: .*wp_ajax_nopriv_

This is a triage rule. I then manually reviewed the $CALLBACK functions flagged.
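Collecting the flagged callbacks into a review list can be scripted against the JSON output. A sketch, assuming Semgrep's JSON exposes metavariable bindings under extra.metavars["$CALLBACK"].abstract_content, and that rule ids loaded from a local ./rules/ directory may come back with a path prefix (hence the suffix match); the script name is hypothetical.

```python
import json
import sys
from collections import defaultdict

RULE_SUFFIX = "nopriv-ajax-no-cap-check"  # local rule ids may carry a path prefix


def nopriv_callbacks(report):
    """Map each file to the callback expressions bound to $CALLBACK by the nopriv rule.

    Assumes bindings live at result["extra"]["metavars"]["$CALLBACK"]["abstract_content"].
    """
    found = defaultdict(set)
    for result in report.get("results", []):
        if not result.get("check_id", "").endswith(RULE_SUFFIX):
            continue
        binding = result.get("extra", {}).get("metavars", {}).get("$CALLBACK", {})
        callback = binding.get("abstract_content")
        if callback:
            # Drop surrounding quotes from string callbacks; array callbacks stay as text.
            found[result["path"]].add(callback.strip("'\""))
    return {path: sorted(names) for path, names in found.items()}


if __name__ == "__main__":
    # Usage (hypothetical): python nopriv_callbacks.py results/*.json
    for report_path in sys.argv[1:]:
        with open(report_path, encoding="utf-8") as fh:
            for path, names in nopriv_callbacks(json.load(fh)).items():
                print(f"{path}: {', '.join(names)}")
```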

Rule: Sensitive Operations Without current_user_can()

rules:
  - id: delete-without-capability-check
    message: "wp_delete_post() called without current_user_can() check nearby"
    severity: WARNING
    languages: [php]
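    patterns:
      - pattern: wp_delete_post($POST_ID, ...)
      - pattern-not-inside: |
          if (current_user_can(...)) { ... }
      # Also accept the early-exit guard style: if (!current_user_can(...)) { wp_die(); }
      - pattern-not-inside: |
          if (!current_user_can(...)) { ... }
          ...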

The pattern-not-inside operator is one of Semgrep's most powerful features. It matches code that is NOT wrapped in a given context — in this case, a capability check. This directly encodes the "dangerous function called without guard" pattern.

I used this same structure for:

  • wp_delete_user()
  • delete_option()
  • update_option()
  • wp_insert_user()
  • wp_update_user()

Rule Category 4: CSRF (Missing Nonce Verification)

WordPress nonces (wp_nonce_field(), check_ajax_referer(), wp_verify_nonce()) are the CSRF protection mechanism. Forms and AJAX handlers that mutate state must verify a nonce.
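Why a nonce stops a forged request: it is a keyed hash over the action, the user, their session, and a time window, so a page on another origin cannot compute it. Below is a simplified Python model of that idea. To my understanding, wp_create_nonce() takes 10 hex characters of an HMAC-MD5 over tick|action|user ID|session token, where a tick is half of the one-day default lifetime, and wp_verify_nonce() accepts the current or previous tick. This is a sketch, not WordPress's code; the secret and token values are hypothetical.

```python
import hashlib
import hmac
import math

NONCE_LIFE = 24 * 60 * 60  # default nonce lifetime in seconds


def nonce_tick(now):
    # The lifetime is split in two halves, so a nonce stays valid for 12 to 24 hours.
    return math.ceil(now / (NONCE_LIFE / 2))


def _nonce_for_tick(secret, tick, action, user_id, session_token):
    message = f"{tick}|{action}|{user_id}|{session_token}".encode()
    return hmac.new(secret, message, hashlib.md5).hexdigest()[-12:-2]  # 10 hex chars


def create_nonce(secret, action, user_id, session_token, now):
    return _nonce_for_tick(secret, nonce_tick(now), action, user_id, session_token)


def verify_nonce(nonce, secret, action, user_id, session_token, now):
    """Accept a nonce from the current or previous tick, compared in constant time."""
    tick = nonce_tick(now)
    return any(
        hmac.compare_digest(nonce, _nonce_for_tick(secret, t, action, user_id, session_token))
        for t in (tick, tick - 1)
    )
```

Because the secret never leaves the server, an attacker's page can submit the form but cannot produce a valid value for the nonce field.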

Rule: AJAX Handler Without Nonce Check

rules:
  - id: ajax-missing-nonce
    message: "AJAX callback registered without nonce verification — potential CSRF"
    severity: WARNING
    languages: [php]
    patterns:
      - pattern: |
          function $FUNC() {
            ...
            $RESPONSE = ...;
            wp_send_json($RESPONSE);
          }
      - pattern-not-inside: |
          function $FUNC() {
            ...
            check_ajax_referer(...);
            ...
          }
      - pattern-not-inside: |
          function $FUNC() {
            ...
            wp_verify_nonce(...);
            ...
          }

Rule: Settings Form Without Nonce

rules:
  - id: settings-save-no-nonce
    message: "Settings saved via $_POST without nonce verification — CSRF risk"
    severity: WARNING
    languages: [php]
    patterns:
      - pattern: update_option($KEY, $_POST[$VAL])
      - pattern-not-inside: |
          if (wp_verify_nonce(...)) { ... }
      - pattern-not-inside: |
          check_admin_referer(...);
          ...
          update_option($KEY, $_POST[$VAL]);

Rule Category 5: PHP Object Injection

unserialize() on user-controlled data is a classic PHP vulnerability. If a suitable POP (Property Oriented Programming) gadget chain exists — either in the plugin itself or in a dependency — it can lead to remote code execution.

rules:
  - id: unserialize-user-input
    message: |
      unserialize() called with user-controlled input — potential PHP Object Injection.
      If a POP chain exists, this can lead to RCE.
    severity: ERROR
    languages: [php]
    pattern-either:
      - pattern: unserialize($_GET[$KEY])
      - pattern: unserialize($_POST[$KEY])
      - pattern: unserialize($_COOKIE[$KEY])
      - pattern: unserialize(base64_decode($_GET[$KEY]))
      - pattern: unserialize(base64_decode($_POST[$KEY]))

The base64_decode variants are important — developers often think base64-encoding the input adds security. It doesn't. Semgrep's nested function call matching catches this cleanly.

Rule Category 6: Arbitrary File Inclusion

rules:
  - id: dynamic-include-user-input
    message: "Dynamic file inclusion with user-controlled path — potential LFI/RFI"
    severity: ERROR
    languages: [php]
    pattern-either:
      - pattern: include($_GET[$KEY])
      - pattern: include($_POST[$KEY])
      - pattern: require($_GET[$KEY])
      - pattern: include(plugin_dir_path(__FILE__) . $_GET[$KEY])
      - pattern: require_once(plugin_dir_path(__FILE__) . $_GET[$KEY])

The last two patterns are particularly interesting. Developers believe that prefixing with plugin_dir_path() makes it safe because it anchors to the plugin directory. It doesn't — a ../ traversal sequence bypasses it completely.
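The traversal is easy to demonstrate with path normalization. This Python sketch uses posixpath to stand in for how the filesystem resolves the path PHP would include; the plugin directory is hypothetical. It also shows the kind of containment check a fix needs: normalize first, then compare path components (not string prefixes, which would accept a sibling like example-plugin-evil). In PHP, that typically means comparing realpath() output against the plugin directory, which also resolves symlinks that normpath() ignores, or mapping the user value to a fixed list of templates.

```python
import posixpath

# Hypothetical plugin directory, standing in for plugin_dir_path(__FILE__).
PLUGIN_DIR = "/var/www/html/wp-content/plugins/example-plugin/"


def resolve_include(user_value):
    """Return the normalized path that PLUGIN_DIR . user_value would point to."""
    return posixpath.normpath(PLUGIN_DIR + user_value)


def is_inside_plugin_dir(user_value):
    """Containment check: normalize first, then compare whole path components."""
    base = posixpath.normpath(PLUGIN_DIR)
    return posixpath.commonpath([base, resolve_include(user_value)]) == base


print(resolve_include("../../../wp-config.php"))  # /var/www/html/wp-config.php
```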

Rule Category 7: Arbitrary File Upload

rules:
  - id: arbitrary-file-upload
    message: "File upload without extension/MIME validation — potential webshell upload"
    severity: ERROR
    languages: [php]
    patterns:
      - pattern: move_uploaded_file($_FILES[$KEY]['tmp_name'], $DEST)
      - pattern-not-inside: |
          $EXT = pathinfo(...);
          ...
          move_uploaded_file(...);
      - pattern-not-inside: |
          wp_check_filetype(...);
          ...
          move_uploaded_file(...);

Running the Rules at Scale

With the corpus downloaded and rules written, I ran full scans using a simple shell loop:

#!/bin/bash
mkdir -p results
for plugin_dir in plugins/*/; do
  plugin_name=$(basename "$plugin_dir")
  semgrep \
    --config ./rules/ \
    --json \
    --output "results/${plugin_name}.json" \
    "$plugin_dir" 2>/dev/null
done
echo "Scan complete."

Parsing Results

# Find all plugins with at least one ERROR severity finding
jq -r 'select(.results | length > 0) | .results[] | select(.extra.severity == "ERROR") | .path' \
  results/*.json | cut -d'/' -f1-2 | sort -u
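With one JSON file per plugin, a short script can rank plugins for triage. A Python sketch, assuming the results/<plugin>.json layout produced by the scan loop and Semgrep's JSON fields results[].extra.severity and the top-level errors list. Counting errors matters here because the loop discards stderr: a plugin with zero findings may simply be one Semgrep failed to parse. The script name is hypothetical.

```python
import json
import sys
from collections import Counter
from pathlib import Path


def rank_plugins(reports):
    """Rank plugins by ERROR findings, then WARNING findings.

    `reports` maps a plugin name to its decoded Semgrep JSON report. Returns
    (plugin, errors, warnings, scan_errors) tuples, most ERROR findings first.
    """
    rows = []
    for plugin, report in reports.items():
        severities = Counter(r.get("extra", {}).get("severity") for r in report.get("results", []))
        scan_errors = len(report.get("errors", []))
        if severities["ERROR"] or severities["WARNING"] or scan_errors:
            rows.append((plugin, severities["ERROR"], severities["WARNING"], scan_errors))
    return sorted(rows, key=lambda row: (-row[1], -row[2], row[0]))


if __name__ == "__main__":
    # Usage (hypothetical): python rank_results.py results/
    results_dir = Path(sys.argv[1] if len(sys.argv) > 1 else "results")
    reports = {p.stem: json.loads(p.read_text(encoding="utf-8")) for p in results_dir.glob("*.json")}
    for plugin, errors, warnings, scan_errors in rank_plugins(reports):
        print(f"{plugin}\tERROR={errors}\tWARNING={warnings}\tscan_errors={scan_errors}")
```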

Manual Triage: Turning Findings Into Confirmed Vulnerabilities

Semgrep finds potential vulnerabilities. Manual review confirms them. My triage process:

Step 1: Trace the Data Flow

For each flagged finding, I traced the full data flow manually:

  1. What is the source? ($_GET, $_POST, $_REQUEST, $_COOKIE)
  2. Does the input pass through any sanitization? If so, is it appropriate sanitization?
  3. What is the sink? ($wpdb->query(), echo, include, etc.)
  4. Is there any gate (auth check, nonce check, capability check) between source and sink?

Step 2: Build a Proof of Concept

For every confirmed vulnerability, I built a minimal PoC — typically a curl command or a small HTML page.

For SQL injection:

# Example: Blind SQLi via orderby parameter in a plugin AJAX handler
curl -s -X POST 'https://target.local/wp-admin/admin-ajax.php' \
  --data 'action=plugin_get_data&orderby=name,SLEEP(5)--&order=ASC&nonce=XXXX'

For CSRF:

<!-- csrf-poc.html: loads in attacker's page, triggers settings change on victim's WP -->
<form id="f" method="POST" action="https://victim.local/wp-admin/admin-ajax.php">
  <input name="action" value="plugin_save_settings">
  <input name="option_value" value="attacker-controlled">
</form>
<script>document.getElementById('f').submit();</script>

Step 3: Determine Exploitability and Impact

Not all confirmed vulnerabilities are equal. I scored each on:

  • Authentication required? Unauthenticated = higher severity
  • What's the impact? RCE > Data exfiltration > Account takeover > XSS > Info disclosure
  • How many installs? 100k installs > 1k installs in terms of real-world impact
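One way to turn those three questions into a sortable number is sketched below in Python. The weights, the doubling for unauthenticated access, and the log10 scaling of installs are my own illustrative assumptions, not a standard like CVSS; the log keeps a 100k-install plugin ahead of a 1k-install one without letting install count swamp impact.

```python
import math
from dataclasses import dataclass

# Hypothetical impact weights following the ordering above (RCE highest).
IMPACT = {"rce": 5, "data_exfiltration": 4, "account_takeover": 3, "xss": 2, "info_disclosure": 1}


@dataclass
class Finding:
    plugin: str
    impact: str            # one of the IMPACT keys
    unauthenticated: bool
    active_installs: int


def priority(f: Finding) -> float:
    """Toy triage score: impact, doubled when no login is needed, scaled by log10(installs)."""
    auth_factor = 2 if f.unauthenticated else 1
    reach = math.log10(max(f.active_installs, 10))
    return IMPACT[f.impact] * auth_factor * reach


findings = [
    Finding("a", "xss", False, 1_000),
    Finding("b", "data_exfiltration", True, 1_000),
    Finding("c", "xss", False, 100_000),
]
for f in sorted(findings, key=priority, reverse=True):
    print(f.plugin, round(priority(f), 1))
```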

Real Finding: CSRF to Settings Takeover

One of the more impactful findings: a plugin had an AJAX handler for saving plugin configuration that checked that the user was an admin (the manage_options capability) but never verified a nonce.

// Vulnerable code (paraphrased)
add_action('wp_ajax_save_plugin_config', 'save_plugin_config_cb');
function save_plugin_config_cb() {
    if (!current_user_can('manage_options')) {
        wp_send_json_error('Unauthorized');
    }
    // No nonce check here.
    $api_key = sanitize_text_field($_POST['api_key']);
    update_option('plugin_api_key', $api_key);
    wp_send_json_success();
}

The capability check is correct. But without nonce verification, any website can forge a cross-site request that an authenticated admin's browser will execute. An attacker could:

  1. Host a malicious page that fires the AJAX request
  2. Trick an admin into visiting it (e.g., via email, LinkedIn message)
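  3. The admin's browser submits the request with their valid session cookie (WordPress sets its auth cookies without a SameSite attribute, so this depends on the browser: Chrome treats such cookies as Lax and withholds them from most cross-site POSTs)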
  4. The plugin's API key gets replaced with the attacker's
  5. The plugin now authenticates to the third-party service with the attacker's key, so all data it was sending to the API lands in the attacker's account

Real Finding: Unauthenticated SQL Injection

A nopriv AJAX action for fetching records. The orderby parameter was interpolated directly:

add_action('wp_ajax_nopriv_plugin_fetch_records', 'plugin_fetch_records_cb');
function plugin_fetch_records_cb() {
    global $wpdb;
    $orderby = sanitize_text_field($_POST['orderby']); // False safety
    $order   = sanitize_text_field($_POST['order']);
    $results = $wpdb->get_results(
        "SELECT * FROM {$wpdb->prefix}plugin_records "
        . "ORDER BY $orderby $order"
    );
    wp_send_json_success($results);
}

sanitize_text_field() doesn't strip SQL syntax, and the values go straight into the ORDER BY clause with no allowlist check. This is a textbook unauthenticated blind SQL injection.

Exploitation: Time-based blind via SLEEP(), confirmed in under 60 seconds:

curl -X POST 'https://target/wp-admin/admin-ajax.php' \
  --data 'action=plugin_fetch_records&orderby=id,SLEEP(5)--&order=ASC'
# Response delayed 5 seconds — confirmed.

From here, sqlmap with --technique=T would extract the entire database.
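A single slow response can be a slow server, so it is worth comparing against a baseline before calling a time-based finding confirmed. Note also that SLEEP() inside an ORDER BY is evaluated once per row being sorted, so the delay scales with the row count, and an empty table shows no delay at all. A Python sketch of the comparison; `send` is a hypothetical callable you supply that issues one request with the given payload (for example, a wrapper around your HTTP client).

```python
import statistics
import time


def median_seconds(send, payload, runs=3):
    """Median wall-clock duration of `runs` calls to send(payload)."""
    samples = []
    for _ in range(runs):
        start = time.monotonic()
        send(payload)
        samples.append(time.monotonic() - start)
    return statistics.median(samples)


def looks_time_based(send, baseline_payload, sleep_payload, sleep_seconds=5, runs=3):
    """True if the SLEEP payload is slower than the baseline by most of sleep_seconds."""
    baseline = median_seconds(send, baseline_payload, runs)
    delayed = median_seconds(send, sleep_payload, runs)
    return delayed - baseline >= 0.8 * sleep_seconds
```

Using the median of a few runs, rather than a single request, keeps one network hiccup from producing a false positive or hiding a real delay.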

Optimizing Your Ruleset Over Time

After the first batch, I refined my approach:

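Reduce false positives: Add pattern-not clauses for known-safe patterns. If a function always escapes its output internally (esc_html_e() or the_title_attribute(), for example), exclude it. Don't make that assumption for the_title(), which prints the title without escaping.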

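Add metavariable filtering: Semgrep supports metavariable-regex to narrow matches. The regex is matched from the start of the bound text, so the leading .* lets the key name follow an opening quote, and the \b boundaries stop it from matching keys such as identity:

- metavariable-regex:
    metavariable: $KEY
    regex: '.*\b(id|user_id|post_id|order_id)\b'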

This focuses SQL injection rules on fields that are likely to be used in queries.

Chain rules with pattern-inside: If you want to only flag echo inside a specific function or callback context, use pattern-inside to scope matches.

Resources