June 13, 2026
Building a Web Reconnaissance & Exposure Scanner in Python: From a Simple Crawler to a Security…
Author : Nilanjan Chowdhury
A story about how a simple Python web crawler turned into my most ambitious cybersecurity project yet.
A few weeks ago, I was feeling pretty proud of myself.
I had just built a web crawler in Python.
It fetched a webpage, extracted links, and even flagged interesting-looking endpoints such as /admin and /login. To me, it felt like a real security tool.
I uploaded it to GitHub, admired my work for a few minutes, and thought:
"Nice. I've built a cybersecurity project."
Then I pointed it at a real website.
And reality hit me.
The crawler wasn't discovering much. It wasn't analyzing anything meaningful. It wasn't producing reports. It wasn't helping me understand the attack surface of a website.
In fact, after the excitement wore off, I realized something uncomfortable:
I hadn't built a security tool. I had built a link collector.
That realization became the starting point for one of the most rewarding projects I've worked on so far.
The Problem With Beginner Security Projects
When you're learning cybersecurity, it's easy to fall into a trap.
You build something.
It works.
You stop.
I had done exactly that.
The original version of my crawler was only a few dozen lines of Python. It could:
- Request a webpage
- Extract links
- Search for a few keywords
- Print results to the terminal
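For a sense of scale, a crawler at that stage might look roughly like the sketch below. It is not the original code: the keyword list and the function names (`extract_links`, `flag_interesting`, `crawl_once`) are assumptions made for illustration, and it relies only on the standard library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

# Hypothetical keyword list; the post only names /admin and /login.
INTERESTING = ("admin", "login")


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html, base_url):
    """Return absolute URLs for every link found in the HTML."""
    parser = LinkParser()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]


def flag_interesting(links):
    """Return the links whose URL contains one of the keywords."""
    return [link for link in links if any(k in link.lower() for k in INTERESTING)]


def crawl_once(url):
    """Fetch one page and print its links, marking the interesting ones."""
    with urlopen(url, timeout=10) as response:
        html = response.read().decode("utf-8", errors="replace")
    links = extract_links(html, url)
    flagged = set(flag_interesting(links))
    for link in links:
        print(("[!] " if link in flagged else "    ") + link)
```

Calling `crawl_once` on a URL you are allowed to scan prints every link on that one page. That is all it does, which is exactly the limitation.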
Technically, it worked.
But if I were a security analyst trying to understand a target website, the tool wouldn't actually help me.
It couldn't answer questions like:
- Are there hidden paths?
- What does robots.txt reveal?
- Are security headers configured?
- Are there upload forms?
- Are email addresses exposed?
- Which pages look risky?
I wasn't doing reconnaissance.
I was doing bookkeeping.
The Rabbit Hole Begins
What started as a simple improvement quickly became a complete redesign.
Every time I added a feature, another weakness appeared.
I added endpoint detection.
Then I realized I needed URL normalization.
I added crawling.
Then I realized I needed depth limits.
I added depth limits.
Then I realized I needed reporting.
I added reporting.
Then I realized the entire codebase was becoming a mess.
One file became two.
Two became three.
Before I knew it, I had accidentally started building an actual software project.
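Two of those additions, URL normalization and depth limits, fit together naturally, so here is a minimal sketch of how they might be combined. It assumes a breadth-first crawl that stays on the starting host. The names `normalize_url`, `crawl`, and the `get_links` callable are hypothetical, not taken from the project.

```python
from collections import deque
from urllib.parse import urljoin, urlsplit, urlunsplit


def normalize_url(url):
    """Lowercase the scheme and host, drop the fragment, default the path to "/"."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def crawl(start_url, get_links, max_depth=2):
    """Breadth-first crawl that stays on the start host and stops at max_depth.

    get_links is a hypothetical callable: it takes a URL and returns the raw
    hrefs found on that page (for example, a fetch plus a link parser).
    """
    start = normalize_url(start_url)
    host = urlsplit(start).netloc
    seen = {start}
    queue = deque([(start, 0)])
    visited = []

    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # do not expand links beyond the depth limit
        for href in get_links(url):
            link = normalize_url(urljoin(url, href))
            # Skip other hosts (and schemes like mailto:) and anything already queued.
            if urlsplit(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

Normalizing before the `seen` check is what keeps `/page` and `/page#section` from being crawled twice. Passing `get_links` in as a parameter also means the crawl logic can be tested without touching the network.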
The Moment It Started Feeling Real
The turning point came when I stopped thinking:
"How do I make the crawler better?"
and started asking:
"What information would I want if I were performing reconnaissance on a website?"
That question changed everything.
Instead of collecting links, I started collecting intelligence.
The scanner now looked for:
- Sensitive endpoints
- Login portals
- Upload functionality
- Exposed emails
- HTML comments
- Security headers
- robots.txt disclosures
Suddenly, the output wasn't just a list of URLs.
It was telling a story about the website itself.
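Several of these checks can be run on a single page's HTML. The sketch below is one possible approach, not the scanner's actual analyzer: it assumes that a file input signals upload functionality and a password input signals a login portal, and the email pattern is deliberately simple. `ExposureParser` and `analyze_page` are illustrative names.

```python
import re
from html.parser import HTMLParser

# Deliberately simple pattern; it will miss unusual addresses and may
# match some strings that are not real mailboxes.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class ExposureParser(HTMLParser):
    """Records HTML comments plus file-upload and password inputs."""

    def __init__(self):
        super().__init__()
        self.comments = []
        self.has_upload = False
        self.has_login = False

    def handle_comment(self, data):
        text = data.strip()
        if text:
            self.comments.append(text)

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        input_type = (dict(attrs).get("type") or "").lower()
        if input_type == "file":
            self.has_upload = True
        elif input_type == "password":
            self.has_login = True


def analyze_page(html):
    """Return a small dictionary of findings for one page of HTML."""
    parser = ExposureParser()
    parser.feed(html)
    return {
        "emails": sorted(set(EMAIL_RE.findall(html))),
        "comments": parser.comments,
        "upload_form": parser.has_upload,
        "login_form": parser.has_login,
    }
```

Returning a plain dictionary per page keeps the findings easy to merge into a report later.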
Discovering Things Websites Accidentally Reveal
One of my favorite additions was robots.txt analysis.
Most people never look at robots.txt.
Search engines do.
Security researchers do.
Attackers definitely do.
When I began parsing robots.txt files automatically, I started finding paths that website owners had specifically asked search engines not to index.
Things like:
/admin/
/internal/
/backup/
/private/
These aren't vulnerabilities.
But they are clues.
And reconnaissance is all about collecting clues.
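Pulling those clues out of a robots.txt body takes only a few lines. The standard library's `urllib.robotparser` answers "may I fetch this URL?" rather than listing the rules, so this sketch reads the text directly. The function name `disallowed_paths` is an assumption, and fetching the file is left out.

```python
def disallowed_paths(robots_txt):
    """Return the unique Disallow paths from a robots.txt body, in order."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        value = value.strip()
        # An empty Disallow means "nothing is disallowed", so it is skipped.
        if field.strip().lower() == "disallow" and value and value not in paths:
            paths.append(value)
    return paths
```

Each returned path can then be joined to the site's base URL and treated as a lead to look at, not as a finding in itself.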
From Script to Project
At some point I opened my Python file and realized I hated looking at it.
Hundreds of lines.
Functions everywhere.
No structure.
No separation of responsibilities.
No scalability.
So I did something I had been avoiding:
I refactored the entire thing.
The project became:
Web-Recon-Exposure-Scanner/
│
├── scanner.py
├── crawler.py
├── analyzer.py
├── reporter.py
└── utils.py
For the first time, it felt less like a coding exercise and more like actual software.
That refactor probably taught me more than any individual feature I added.
The First Scan
The first time Version 1.1 completed a full scan and generated reports, I sat there staring at the terminal for a few seconds.
It wasn't Burp Suite.
It wasn't Nmap.
It wasn't OWASP ZAP.
But it was mine.
The scanner could:
✓ Crawl pages
✓ Analyze exposures
✓ Assess security headers
✓ Parse robots.txt
✓ Score findings
✓ Generate reports
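Two of those items, header assessment and scoring, can be illustrated together. The sketch below is an assumption about how such a check might work, not the scanner's real scoring model. The header list is a common baseline, and the weights are hypothetical values chosen only for the example.

```python
# Hypothetical weights chosen for illustration only.
EXPECTED_HEADERS = {
    "Content-Security-Policy": 3,
    "Strict-Transport-Security": 3,
    "X-Frame-Options": 2,
    "X-Content-Type-Options": 1,
    "Referrer-Policy": 1,
}


def missing_security_headers(response_headers):
    """Return the expected headers absent from the response (case-insensitive)."""
    present = {name.lower() for name in response_headers}
    return [h for h in EXPECTED_HEADERS if h.lower() not in present]


def score_headers(response_headers):
    """Sum the weights of missing headers; higher means more exposed."""
    return sum(EXPECTED_HEADERS[h] for h in missing_security_headers(response_headers))
```

`response_headers` can be any mapping of header names to values, such as the headers object returned by an HTTP client.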
And every single line had been built from scratch while learning along the way.
That feeling is hard to describe.
If you've ever built something yourself, you'll understand it immediately.
What This Project Actually Taught Me
The biggest lesson wasn't Python.
It wasn't web security.
It wasn't even reconnaissance.
The biggest lesson was this:
Most good projects aren't built in one brilliant moment.
They're built through dozens of small improvements, frustrating bugs, redesigns, and moments where you realize your original solution wasn't good enough.
Version 1.1 is not the final version.
Not even close.
But it's the first version I'm genuinely proud of.
What's Next?
Version 1.2 is already planned.
The next goal is to move beyond raw reports and build:
- HTML dashboards
- Visual risk summaries
- Better reporting
- More advanced exposure analysis
Eventually I want this project to include:
- Technology fingerprinting
- JavaScript analysis
- Secret detection
- Screenshot capture
- Interactive reporting
In other words:
The same journey continues.
Final Thoughts
When I started this project, I thought I was building a web crawler.
What I actually built was a lesson in software engineering, cybersecurity, and continuous improvement.
And honestly?
That's worth far more than the code itself.
Connect with me:
- LinkedIn: https://www.linkedin.com/in/nilanjan-chowdhury-a36787359/
- GitHub: https://github.com/CalculusGuy
- Medium: https://medium.com/@nilanjan.calculus
- My Website: https://calculusguy.github.io/nilanjanchowdhury.github.io/