One tool for recon. Another for misconfigurations. Another for access control testing. And then manually stitching everything together into something that actually resembles a report.

So I built Arachne: a black-box web security scanner that automates the entire discovery → probing → analysis → reporting pipeline.

Not a noisy scanner. Not an exploit framework.

A safe, structured, and extensible security analysis tool.

The Idea

I wanted something that:

  • Works purely from the outside (black-box)
  • Performs safe, non-destructive probing
  • Chains multiple techniques together
  • Produces clean, client-ready reports
  • Can evolve into something smarter (👀 RL-based probing)

That's how Arachne was born.

What Arachne Actually Does

At a high level, Arachne runs a pipeline like this:

Recon → Endpoint Discovery → Probing → Analysis → Reporting

1. Reconnaissance

It starts by fingerprinting the target:

  • Server headers
  • Security headers (CSP, HSTS, etc.)
  • CORS policies
  • Exposed paths

This gives a quick understanding of the attack surface.
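A minimal sketch of this fingerprinting step, using only the standard library. The function names and the exact header list are my illustration, not Arachne's actual API:

```python
from urllib.request import urlopen

# Headers whose absence is itself a recon finding.
SECURITY_HEADERS = [
    "Content-Security-Policy",
    "Strict-Transport-Security",
    "X-Content-Type-Options",
    "Referrer-Policy",
]

def check_security_headers(headers: dict) -> list[str]:
    """Return the expected security headers that are absent (case-insensitive)."""
    present = {h.lower() for h in headers}
    return [h for h in SECURITY_HEADERS if h.lower() not in present]

def fingerprint(url: str) -> dict:
    """Fetch one page and summarize server identity, missing headers, and CORS."""
    with urlopen(url, timeout=10) as resp:
        headers = dict(resp.headers)
    return {
        "server": headers.get("Server", "unknown"),
        "missing_headers": check_security_headers(headers),
        "cors": headers.get("Access-Control-Allow-Origin"),
    }
```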

2. Endpoint Discovery (SPA-aware)

Modern apps hide APIs inside JavaScript bundles.

So instead of relying only on crawling, Arachne:

  • Parses SPA JS files
  • Extracts likely API endpoints
  • Builds an inventory of attack surface

This alone uncovers way more than traditional crawling.
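The extraction itself can be as simple as a regex over the bundle source. This is a heuristic sketch (the pattern is mine, not Arachne's), but it captures the idea:

```python
import re

# Match quoted path-like strings that start with a common API prefix.
API_PATH_RE = re.compile(r"""["'](/(?:api|rest|v\d+)/[A-Za-z0-9/_\-{}]*)["']""")

def extract_endpoints(js_source: str) -> list[str]:
    """Return unique, sorted API-looking paths found in a JS bundle."""
    return sorted(set(API_PATH_RE.findall(js_source)))
```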

3. Safe Probing

Then it probes endpoints without breaking anything:

  • OPTIONS requests
  • Query fuzzing
  • Status behavior checks
  • Error triggering (controlled)

Everything is:

  • Rate-limited
  • Non-destructive
  • Designed for real-world environments
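A sketch of what one safe probe looks like, assuming an OPTIONS request plus a fixed delay as the rate limiter. The flagging of "unsafe" methods in the Allow header is my illustration:

```python
import time
from urllib.request import Request, urlopen

SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}

def parse_allow(allow_header: str) -> dict:
    """Split an Allow header and flag state-changing methods."""
    methods = [m.strip().upper() for m in allow_header.split(",") if m.strip()]
    return {"methods": methods,
            "unsafe": [m for m in methods if m not in SAFE_METHODS]}

def probe_endpoint(base_url: str, path: str, delay: float = 0.5) -> dict:
    """One non-destructive probe: OPTIONS request, rate-limited by `delay`."""
    time.sleep(delay)  # crude rate limiting between probes
    req = Request(base_url.rstrip("/") + path, method="OPTIONS")
    with urlopen(req, timeout=10) as resp:
        return {"path": path,
                "status": resp.status,
                **parse_allow(resp.headers.get("Allow", ""))}
```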

4. Vulnerability Signals

Arachne doesn't scream "YOU'RE HACKED."

It looks for signals:

🔹 Misconfigurations

  • Missing CSP
  • Missing HSTS
  • Wildcard CORS
  • Referrer policy issues

🔹 Access Control Signals

  • ID-based response differences
  • Potential IDOR patterns

🔹 Error Handling Issues

  • Stack trace disclosures
  • Verbose error responses

Example finding:

Stack trace disclosure in error response
/rest/country-mapping
Confidence: 0.80
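A signal like that one can be detected with simple pattern matching over error bodies. The patterns and the confidence value here are illustrative stand-ins, not Arachne's real detector:

```python
import re

# Fingerprints of common stack trace formats in error responses.
STACK_TRACE_PATTERNS = [
    r"Traceback \(most recent call last\)",    # Python
    r"\bat [\w.$]+\([\w.]+\.java:\d+\)",       # Java
    r"\bat .+ \(.+:\d+:\d+\)",                 # Node.js
]

def stack_trace_signal(body: str, path: str):
    """Return a finding dict if the response body looks like a stack trace."""
    for pattern in STACK_TRACE_PATTERNS:
        if re.search(pattern, body):
            return {
                "title": "Stack trace disclosure in error response",
                "path": path,
                "confidence": 0.80,
            }
    return None
```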

5. Triage System

Instead of dumping noise, Arachne:

  • Filters useful signals
  • Scores findings
  • Produces structured outputs

So you're not drowning in useless logs.
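The triage logic boils down to a confidence floor plus a severity-weighted sort. The threshold and weights below are made up for the example:

```python
SEVERITY_WEIGHT = {"high": 3.0, "medium": 2.0, "low": 1.0}

def triage(findings: list[dict], min_confidence: float = 0.5) -> list[dict]:
    """Drop findings below the confidence floor; return the rest, highest score first."""
    kept = [f for f in findings if f.get("confidence", 0) >= min_confidence]
    for f in kept:
        f["score"] = f["confidence"] * SEVERITY_WEIGHT.get(f.get("severity", "low"), 1.0)
    return sorted(kept, key=lambda f: f["score"], reverse=True)
```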

6. RL-Guided Scanning (Experimental)

This is where things get interesting.

Arachne includes a reinforcement learning module that:

  • Learns which inputs trigger interesting responses
  • Guides mutation-based probing
  • Identifies anomalies

Still experimental, but promising.
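One way to frame this is as a multi-armed bandit: each mutation strategy is an arm, rewarded when a probe triggers an interesting response. The epsilon-greedy toy below is my stand-in for the idea, not Arachne's actual module:

```python
import random

class EpsilonGreedy:
    """Pick mutation strategies: mostly exploit the best one, sometimes explore."""

    def __init__(self, arms: list[str], epsilon: float = 0.1, seed=None):
        self.arms = arms
        self.epsilon = epsilon
        self.counts = {a: 0 for a in arms}   # probes sent per strategy
        self.values = {a: 0.0 for a in arms} # running mean reward per strategy
        self.rng = random.Random(seed)

    def select(self) -> str:
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)                    # explore
        return max(self.arms, key=lambda a: self.values[a])      # exploit

    def update(self, arm: str, reward: float) -> None:
        """Reward 1.0 when a probe produced an interesting response, else 0.0."""
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```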

7. Reporting (The Real Goal)

At the end of everything, Arachne generates:

  • JSON reports
  • Markdown reports
  • HTML reports (client-ready)

Example summary:

  • Endpoints discovered: 49
  • Findings: 89
  • Misconfigurations: 23
  • Top issue: Stack trace disclosure
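A sketch of the Markdown renderer, assuming a merged report shaped like the summary above (the field names are assumptions):

```python
def render_markdown(report: dict) -> str:
    """Render a merged report dict as a client-readable Markdown document."""
    summary = report["summary"]
    lines = [
        f"# Arachne Report: {report['target']}",
        "",
        f"- Endpoints discovered: {summary['endpoints']}",
        f"- Findings: {summary['findings']}",
        f"- Misconfigurations: {summary['misconfigs']}",
        "",
        "## Findings",
    ]
    for f in report["findings"]:
        lines.append(
            f"- **{f['title']}** ({f['path']}, confidence {f['confidence']:.2f})"
        )
    return "\n".join(lines)
```

The JSON report is just `json.dumps` of the same dict, which is why normalizing everything to dicts first (more on that below) matters so much.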

Why I Built This

Honestly?

I was busy.

Between work and everything else, I didn't have time to:

  • Run 5 different tools
  • Manually correlate outputs
  • Clean up reports

So instead of slowing down, I built something that works with me.

Arachne is basically:

"What if recon, scanning, and reporting actually felt like one tool?"

Challenges I Faced

1. Data Normalization

Every module outputs different structures.

Fix:

  • Standardized .to_dict() across modules
  • Built a merge pipeline
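The pattern looks roughly like this (the `Finding` class is a hypothetical example of one module's output type):

```python
from dataclasses import dataclass, asdict

@dataclass
class Finding:
    title: str
    path: str
    confidence: float

    def to_dict(self) -> dict:
        return asdict(self)

def normalize(results: list) -> list[dict]:
    """Accept mixed module outputs; coerce everything to plain dicts."""
    return [r.to_dict() if hasattr(r, "to_dict") else dict(r) for r in results]
```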

2. Report Merging

Combining outputs from:

  • recon
  • triage
  • misconfig
  • RL
  • access control

…without breaking structure was painful.
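The approach that ended up working: keep each module's output under its own key, then build one combined, deduplicated findings list on top. A simplified sketch:

```python
def merge_reports(target: str, module_outputs: dict[str, list[dict]]) -> dict:
    """Merge per-module finding lists into one report without losing structure."""
    merged = {"target": target, "modules": module_outputs, "findings": []}
    seen = set()
    for module, findings in module_outputs.items():
        for f in findings:
            key = (f.get("title"), f.get("path"))
            if key in seen:
                continue  # same finding reported by two modules
            seen.add(key)
            merged["findings"].append({**f, "source": module})
    return merged
```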

3. Serialization Issues

At one point:

TypeError: Object of type MergeStats is not JSON serializable

Fix:

  • Converted custom objects → dicts before dumping
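One clean way to do that conversion, via a `json.JSONEncoder` subclass that falls back to `to_dict()` for any object that has one (`MergeStats` here is a stand-in for the class that originally blew up):

```python
import json

class MergeStats:
    def __init__(self, merged: int, skipped: int):
        self.merged, self.skipped = merged, skipped

    def to_dict(self) -> dict:
        return {"merged": self.merged, "skipped": self.skipped}

class ToDictEncoder(json.JSONEncoder):
    """JSON encoder that converts to_dict()-capable objects before dumping."""
    def default(self, o):
        if hasattr(o, "to_dict"):
            return o.to_dict()
        return super().default(o)  # still raise TypeError for unknown types
```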

4. Signal vs Noise

The hardest problem in security tooling:

What actually matters?

Solution:

  • Confidence scoring
  • Triage filtering
  • Summary views

What Makes Arachne Different

Most tools are:

  • Either too noisy
  • Or too shallow

Arachne tries to balance:

✔ Safe probing ✔ Structured analysis ✔ Real-world usability ✔ Clean reporting

How It's Structured

arachne/
├── crawler/
├── scanner/
├── reporting/
├── rl/
├── pipeline/
└── cli.py

Each module is isolated but contributes to the pipeline.

Example Workflow

# 1. Discover endpoints
arachne seed-spa http://target --out spa_inventory.json
# 2. Probe endpoints
arachne probe-inventory http://target --inventory spa_inventory.json
# 3. Scan for misconfigs
arachne scan-misconfig http://target --inventory spa_inventory.json
# 4. Access control analysis
arachne scan-ac http://target
arachne verify-ac http://target
# 5. Merge everything
arachne merge-report --out merged.json ...
# 6. Generate HTML report
arachne report-html merged.json

Lessons Learned

  • Building tools > just using tools
  • Reporting matters as much as detection
  • "Safe" scanning is harder than aggressive scanning
  • Clean architecture saves you later

What's Next

  • Smarter RL probing
  • Better correlation between findings
  • Plugin system
  • More real-world test cases

Final Thoughts

Arachne started as:

"I need something better than this workflow."

Now it's a full pipeline.

Still evolving. Still improving.

But already something I'd actually use in a real engagement.

Built with curiosity, necessity, and a bit of controlled chaos.