One tool for recon. Another for misconfigurations. Another for access control testing. And then manually stitching everything together into something that actually resembles a report.

So I built Arachne: a black-box web security scanner that automates the entire discovery → probing → analysis → reporting pipeline.

Not a noisy scanner. Not an exploit framework.

A safe, structured, and extensible security analysis tool.

The Idea

I wanted something that:

  • Works purely from the outside (black-box)
  • Performs safe, non-destructive probing
  • Chains multiple techniques together
  • Produces clean, client-ready reports
  • Can evolve into something smarter (👀 RL-based probing)

That's how Arachne was born.

What Arachne Actually Does

At a high level, Arachne runs a pipeline like this:

Recon → Endpoint Discovery → Probing → Analysis → Reporting

1. Reconnaissance

It starts by fingerprinting the target:

  • Server headers
  • Security headers (CSP, HSTS, etc.)
  • CORS policies
  • Exposed paths

This gives a quick understanding of the attack surface.
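A minimal sketch of this fingerprinting step, using only the standard library. The function names and the exact header list are my illustration, not Arachne's actual API:

```python
from urllib.request import urlopen

# Headers whose absence is itself a recon finding.
SECURITY_HEADERS = [
    "Content-Security-Policy",
    "Strict-Transport-Security",
    "X-Content-Type-Options",
    "Referrer-Policy",
]

def check_security_headers(headers: dict) -> list[str]:
    """Return the expected security headers that are absent (case-insensitive)."""
    present = {h.lower() for h in headers}
    return [h for h in SECURITY_HEADERS if h.lower() not in present]

def fingerprint(url: str) -> dict:
    """Fetch one page and summarize server identity, missing headers, and CORS."""
    with urlopen(url, timeout=10) as resp:
        headers = dict(resp.headers)
    return {
        "server": headers.get("Server", "unknown"),
        "missing_headers": check_security_headers(headers),
        "cors": headers.get("Access-Control-Allow-Origin"),
    }
```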

2. Endpoint Discovery (SPA-aware)

Modern apps hide APIs inside JavaScript bundles.

So instead of relying only on crawling, Arachne:

  • Parses SPA JS files
  • Extracts likely API endpoints
  • Builds an inventory of attack surface

This alone uncovers way more than traditional crawling.
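The extraction itself can be as simple as a regex over the bundle source. This is a heuristic sketch (the pattern is mine, not Arachne's), but it captures the idea:

```python
import re

# Match quoted path-like strings that start with a common API prefix.
API_PATH_RE = re.compile(r"""["'](/(?:api|rest|v\d+)/[A-Za-z0-9/_\-{}]*)["']""")

def extract_endpoints(js_source: str) -> list[str]:
    """Return unique, sorted API-looking paths found in a JS bundle."""
    return sorted(set(API_PATH_RE.findall(js_source)))
```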

3. Safe Probing

Then it probes endpoints without breaking anything:

  • OPTIONS requests
  • Query fuzzing
  • Status behavior checks
  • Error triggering (controlled)

Everything is:

  • Rate-limited
  • Non-destructive
  • Designed for real-world environments
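A sketch of what one safe probe looks like, assuming an OPTIONS request plus a fixed delay as the rate limiter. The flagging of "unsafe" methods in the Allow header is my illustration:

```python
import time
from urllib.request import Request, urlopen

SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}

def parse_allow(allow_header: str) -> dict:
    """Split an Allow header and flag state-changing methods."""
    methods = [m.strip().upper() for m in allow_header.split(",") if m.strip()]
    return {"methods": methods,
            "unsafe": [m for m in methods if m not in SAFE_METHODS]}

def probe_endpoint(base_url: str, path: str, delay: float = 0.5) -> dict:
    """One non-destructive probe: OPTIONS request, rate-limited by `delay`."""
    time.sleep(delay)  # crude rate limiting between probes
    req = Request(base_url.rstrip("/") + path, method="OPTIONS")
    with urlopen(req, timeout=10) as resp:
        return {"path": path,
                "status": resp.status,
                **parse_allow(resp.headers.get("Allow", ""))}
```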

4. Vulnerability Signals

Arachne doesn't scream "YOU'RE HACKED."

It looks for signals:

🔹 Misconfigurations

  • Missing CSP
  • Missing HSTS
  • Wildcard CORS
  • Referrer policy issues

🔹 Access Control Signals

  • ID-based response differences
  • Potential IDOR patterns

🔹 Error Handling Issues

  • Stack trace disclosures
  • Verbose error responses

Example finding:

Stack trace disclosure in error response
/rest/country-mapping
Confidence: 0.80
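A signal like that one can be detected with simple pattern matching over error bodies. The patterns and the confidence value here are illustrative stand-ins, not Arachne's real detector:

```python
import re

# Fingerprints of common stack trace formats in error responses.
STACK_TRACE_PATTERNS = [
    r"Traceback \(most recent call last\)",    # Python
    r"\bat [\w.$]+\([\w.]+\.java:\d+\)",       # Java
    r"\bat .+ \(.+:\d+:\d+\)",                 # Node.js
]

def stack_trace_signal(body: str, path: str):
    """Return a finding dict if the response body looks like a stack trace."""
    for pattern in STACK_TRACE_PATTERNS:
        if re.search(pattern, body):
            return {
                "title": "Stack trace disclosure in error response",
                "path": path,
                "confidence": 0.80,
            }
    return None
```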

5. Triage System

Instead of dumping noise, Arachne:

  • Filters useful signals
  • Scores findings
  • Produces structured outputs

So you're not drowning in useless logs.
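The triage logic boils down to a confidence floor plus a severity-weighted sort. The threshold and weights below are made up for the example:

```python
SEVERITY_WEIGHT = {"high": 3.0, "medium": 2.0, "low": 1.0}

def triage(findings: list[dict], min_confidence: float = 0.5) -> list[dict]:
    """Drop findings below the confidence floor; return the rest, highest score first."""
    kept = [f for f in findings if f.get("confidence", 0) >= min_confidence]
    for f in kept:
        f["score"] = f["confidence"] * SEVERITY_WEIGHT.get(f.get("severity", "low"), 1.0)
    return sorted(kept, key=lambda f: f["score"], reverse=True)
```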

6. RL-Guided Scanning (Experimental)

This is where things get interesting.

Arachne includes a reinforcement learning module that:

  • Learns which inputs trigger interesting responses
  • Guides mutation-based probing
  • Identifies anomalies

Still experimental, but promising.
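One way to frame this is as a multi-armed bandit: each mutation strategy is an arm, rewarded when a probe triggers an interesting response. The epsilon-greedy toy below is my stand-in for the idea, not Arachne's actual module:

```python
import random

class EpsilonGreedy:
    """Pick mutation strategies: mostly exploit the best one, sometimes explore."""

    def __init__(self, arms: list[str], epsilon: float = 0.1, seed=None):
        self.arms = arms
        self.epsilon = epsilon
        self.counts = {a: 0 for a in arms}   # probes sent per strategy
        self.values = {a: 0.0 for a in arms} # running mean reward per strategy
        self.rng = random.Random(seed)

    def select(self) -> str:
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)                    # explore
        return max(self.arms, key=lambda a: self.values[a])      # exploit

    def update(self, arm: str, reward: float) -> None:
        """Reward 1.0 when a probe produced an interesting response, else 0.0."""
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```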

7. Reporting (The Real Goal)

At the end of everything, Arachne generates:

  • JSON reports
  • Markdown reports
  • HTML reports (client-ready)

Example summary:

  • Endpoints discovered: 49
  • Findings: 89
  • Misconfigurations: 23
  • Top issue: Stack trace disclosure
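A sketch of the Markdown renderer, assuming a merged report shaped like the summary above (the field names are assumptions):

```python
def render_markdown(report: dict) -> str:
    """Render a merged report dict as a client-readable Markdown document."""
    summary = report["summary"]
    lines = [
        f"# Arachne Report: {report['target']}",
        "",
        f"- Endpoints discovered: {summary['endpoints']}",
        f"- Findings: {summary['findings']}",
        f"- Misconfigurations: {summary['misconfigs']}",
        "",
        "## Findings",
    ]
    for f in report["findings"]:
        lines.append(
            f"- **{f['title']}** ({f['path']}, confidence {f['confidence']:.2f})"
        )
    return "\n".join(lines)
```

The JSON report is just `json.dumps` of the same dict, which is why normalizing everything to dicts first (more on that below) matters so much.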

Why I Built This

Honestly?

I was busy.

Between work and everything else, I didn't have time to:

  • Run 5 different tools
  • Manually correlate outputs
  • Clean up reports

So instead of slowing down, I built something that works with me.

Arachne is basically:

"What if recon, scanning, and reporting actually felt like one tool?"

Challenges I Faced

1. Data Normalization

Every module outputs different structures.

Fix:

  • Standardized .to_dict() across modules
  • Built a merge pipeline
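The pattern looks roughly like this (the `Finding` class is a hypothetical example of one module's output type):

```python
from dataclasses import dataclass, asdict

@dataclass
class Finding:
    title: str
    path: str
    confidence: float

    def to_dict(self) -> dict:
        return asdict(self)

def normalize(results: list) -> list[dict]:
    """Accept mixed module outputs; coerce everything to plain dicts."""
    return [r.to_dict() if hasattr(r, "to_dict") else dict(r) for r in results]
```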

2. Report Merging

Combining outputs from:

  • recon
  • triage
  • misconfig
  • RL
  • access control

…without breaking structure was painful.
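The approach that ended up working: keep each module's output under its own key, then build one combined, deduplicated findings list on top. A simplified sketch:

```python
def merge_reports(target: str, module_outputs: dict[str, list[dict]]) -> dict:
    """Merge per-module finding lists into one report without losing structure."""
    merged = {"target": target, "modules": module_outputs, "findings": []}
    seen = set()
    for module, findings in module_outputs.items():
        for f in findings:
            key = (f.get("title"), f.get("path"))
            if key in seen:
                continue  # same finding reported by two modules
            seen.add(key)
            merged["findings"].append({**f, "source": module})
    return merged
```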

3. Serialization Issues

At one point:

TypeError: Object of type MergeStats is not JSON serializable

Fix:

  • Converted custom objects → dicts before dumping
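One clean way to do that conversion, via a `json.JSONEncoder` subclass that falls back to `to_dict()` for any object that has one (`MergeStats` here is a stand-in for the class that originally blew up):

```python
import json

class MergeStats:
    def __init__(self, merged: int, skipped: int):
        self.merged, self.skipped = merged, skipped

    def to_dict(self) -> dict:
        return {"merged": self.merged, "skipped": self.skipped}

class ToDictEncoder(json.JSONEncoder):
    """JSON encoder that converts to_dict()-capable objects before dumping."""
    def default(self, o):
        if hasattr(o, "to_dict"):
            return o.to_dict()
        return super().default(o)  # still raise TypeError for unknown types
```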

4. Signal vs Noise

The hardest problem in security tooling:

What actually matters?

Solution:

  • Confidence scoring
  • Triage filtering
  • Summary views

What Makes Arachne Different

Most tools are:

  • Either too noisy
  • Or too shallow

Arachne tries to balance:

✔ Safe probing ✔ Structured analysis ✔ Real-world usability ✔ Clean reporting

How It's Structured

arachne/
├── crawler/
├── scanner/
├── reporting/
├── rl/
├── pipeline/
└── cli.py

Each module is isolated but contributes to the pipeline.

Example Workflow

# 1. Discover endpoints
arachne seed-spa http://target --out spa_inventory.json
# 2. Probe endpoints
arachne probe-inventory http://target --inventory spa_inventory.json
# 3. Scan for misconfigs
arachne scan-misconfig http://target --inventory spa_inventory.json
# 4. Access control analysis
arachne scan-ac http://target
arachne verify-ac http://target
# 5. Merge everything
arachne merge-report --out merged.json ...
# 6. Generate HTML report
arachne report-html merged.json

Lessons Learned

  • Building tools > just using tools
  • Reporting matters as much as detection
  • "Safe" scanning is harder than aggressive scanning
  • Clean architecture saves you later

What's Next

  • Smarter RL probing
  • Better correlation between findings
  • Plugin system
  • More real-world test cases

Final Thoughts

Arachne started as:

"I need something better than this workflow."

Now it's a full pipeline.

Still evolving. Still improving.

But already something I'd actually use in a real engagement.

Built with curiosity, necessity, and a bit of controlled chaos.