Day 5: Footprinting & Reconnaissance - How Attackers Know Everything Before They Touch Anything

Day 4 was building the lab — getting machines to talk to each other, watching packets move in Wireshark.

Day 5 is where actual hacking begins. Not with an exploit. Not with a tool. With information.

Before any attacker sends a single malicious packet, they already know what you're running, where you're exposed, who works for you, and sometimes — what your passwords might be. That process is called footprinting and reconnaissance. It's the most underrated phase of penetration testing, and today I went deep on it.

What Is Footprinting & Reconnaissance?

Footprinting is the systematic process of collecting every piece of information about a target before attempting any attack. Think of it as building a complete profile of your target — without them knowing you exist.

What you're collecting:

Personal details — names, email addresses, phone numbers, job titles of employees
Company details — org structure, locations, business relationships, partners
System information — what servers they run, what software versions, what OS, open ports
Entities belonging to the target — subsidiaries, domains, IP ranges, third-party vendors
Technology stack — web servers, databases, frameworks, CMS platforms

The goal: by the time you start active exploitation, you're not guessing. You already know where the doors are, which ones have weak locks, and who has the keys.

Two Types of Reconnaissance

Reconnaissance
     |
     ├── Passive
     └── Active

Passive Reconnaissance

Gathering information without directly interacting with the target.

You never touch their systems. You use publicly available sources — search engines, social media, public databases, DNS records. The target has zero visibility into what you're doing.

Examples:

Searching their company on Google
Looking up their employees on LinkedIn
Checking public DNS records
Using Shodan to find their internet-facing devices

Why it matters: Zero risk of detection. No logs on their systems. You can spend weeks here and they'll never know.

Active Reconnaissance

Gathering information by directly interacting with the target's systems.

You're sending packets to their servers. You're scanning their network. There's now a chance — however small — that they log it or detect it.

Examples:

Nmap port scanning
Banner grabbing
DNS zone transfer attempts
Traceroute to map their network path

The key difference: Passive = no contact. Active = you're touching them. In a real engagement, you always exhaust passive recon before going active.

Why Reconnaissance Matters — The Real Use Cases

This isn't just academic. Here's why this phase is the foundation of everything:

Information gathering — A complete target profile means fewer surprises during exploitation. You know what's there before you try to break it.
Time saving — Targeted attacks are faster than blind ones. You skip the noise and go straight to what's vulnerable.
Easy processing — Structured information upfront makes the entire engagement organized. You're not chasing random leads mid-attack.
Accurate attacking — You attack real vulnerabilities on real systems with real version numbers, not guesses. The difference between "port 80 is open" and "Apache 2.2.14 is running on port 80 with this specific known CVE" is the difference between hours of fumbling and a clean, targeted exploit.
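To make the "real version numbers" point concrete, here is a minimal Python sketch that pulls a product and version out of a banner string, such as an HTTP Server header, and compares it against a version threshold. The regex, the function names, and any threshold you pass in are illustrative assumptions, not tied to a specific CVE. Banners can be changed or hidden by admins, and Linux distributions often backport fixes without bumping the version string, so treat a match as a lead to verify rather than a confirmed vulnerability.

```python
import re
from typing import Optional, Tuple

# Assumed banner shape: "Product/1.2.3 ..." at the start (e.g. an HTTP Server header).
BANNER_RE = re.compile(r"^(?P<product>[A-Za-z][\w-]*)/(?P<version>\d+(?:\.\d+)*)")


def parse_banner(banner: str) -> Optional[Tuple[str, Tuple[int, ...]]]:
    """Return (product, version tuple), or None if the banner has no product/version."""
    m = BANNER_RE.match(banner.strip())
    if m is None:
        return None
    version = tuple(int(part) for part in m.group("version").split("."))
    return m.group("product"), version


def older_than(banner: str, product: str, fixed_in: Tuple[int, ...]) -> bool:
    """True if the banner names `product` at a version below `fixed_in`.

    `fixed_in` is whatever fixed version you looked up for the issue you care about;
    distro backports can mean an "old" version string is already patched.
    """
    parsed = parse_banner(banner)
    if parsed is None:
        return False
    name, version = parsed
    return name.lower() == product.lower() and version < fixed_in


if __name__ == "__main__":
    for b in ["Apache/2.2.14 (Ubuntu)", "Microsoft-IIS/6.0", "nginx"]:
        print(b, "->", parse_banner(b))
```

For example, `older_than("Apache/2.2.14 (Ubuntu)", "apache", (2, 4, 0))` returns `True`; the `(2, 4, 0)` threshold there is a hypothetical value, not a real advisory.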

How to Do Footprinting — The Methods

1. Footprinting Through Search Engines

Search engines index far more about a target than most targets realize. The tools:

Google — the obvious one, but most people use it wrong (more on dorking below)

Shodan — this is not a normal search engine. Shodan crawls internet-connected devices and indexes their banners, open ports, and service versions. Today I ran a search for "microsoft" on Shodan and got a report showing:

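11,292,944 total results: hosts whose indexed banners mention "microsoft" (mostly Windows and IIS machines run by organizations of every kind, not just servers Microsoft itself owns)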
Top exposed ports: 80 (1.8M devices), 443 (1.4M), 5985 (846K), 135, 5357
Known vulnerabilities across those devices: HTTP.sys Remote Code Execution (44K devices), HTTP.sys DoS (42K), EternalBlue (917 devices — yes, the NSA exploit leaked in 2017 is still exposed on nearly 1,000 of these internet-facing systems)
Operating systems breakdown: Windows Server 2012 R2 (337K), Windows Server 2022 (270K), Windows Server 2019 (174K)

This is passive recon. I didn't touch a single one of those hosts. Shodan did the scanning; I just read the report.

What this means for a real attacker: Before touching a target, they already know what ports are open, what software versions are exposed, and which specific CVEs apply. Shodan doesn't just find devices — it maps the attack surface.
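If you want the same numbers outside the web UI, the official `shodan` Python library has a `count()` method that returns the total plus facet breakdowns without pulling individual host records. This sketch assumes an API key in a `SHODAN_API_KEY` environment variable (a name I picked), and which facets you can request may depend on your account plan. The port labels are only the ones discussed in this post. The formatting helper works on any dict shaped like Shodan's response, so it can be tried without a key.

```python
import os
from typing import Dict, List

# Labels for the ports discussed in this post; extend as needed.
KNOWN_PORTS = {
    80: "HTTP",
    135: "MS-RPC endpoint mapper",
    443: "HTTPS",
    5357: "WSDAPI (Web Services on Devices)",
    5985: "WinRM over HTTP",
}


def summarize_facets(result: Dict, facet: str) -> List[str]:
    """Format one facet of a Shodan count()/search() response as aligned text lines."""
    lines = []
    for bucket in result.get("facets", {}).get(facet, []):
        value, count = bucket["value"], bucket["count"]
        label = KNOWN_PORTS.get(value, "") if facet == "port" else ""
        lines.append(f"{value!s:<12} {count:>10,}  {label}".rstrip())
    return lines


def query_shodan(query: str) -> Dict:
    """Fetch totals plus top-5 facets for a query (no individual host records)."""
    import shodan  # pip install shodan; imported lazily so the formatter runs without it

    api = shodan.Shodan(os.environ["SHODAN_API_KEY"])  # env var name is my assumption
    return api.count(query, facets=[("port", 5), ("vuln", 5), ("os", 5)])


# Only calls the API when a key is configured.
if __name__ == "__main__" and os.environ.get("SHODAN_API_KEY"):
    res = query_shodan("microsoft")
    print("total:", res["total"])
    for name in ("port", "vuln", "os"):
        print(f"\n[{name}]")
        print("\n".join(summarize_facets(res, name)))
```

Asking for facets instead of full results keeps the output close to the web report: totals and top-N breakdowns rather than thousands of host records.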

Censys — similar to Shodan, indexes internet-connected infrastructure with a focus on certificates and protocol-level data. Good for finding subdomains and SSL certificate information.

DuckDuckGo — useful because it doesn't keep a search history or personalize results, so your research isn't tied to a search profile. It doesn't hide your traffic from your ISP or network, though. If you want to research a target without your searches being tied back to you: DDG.

Not Evil — a Tor-based search engine, accessed through the Tor Browser. Your ISP can see that you're using Tor but not what you're searching for, which is why it's used when you don't want even your ISP knowing what you're researching.

Yahoo — still indexes content Google sometimes misses. Worth cross-referencing.

2. Footprinting Using Advanced Google Dorking

Google dorking (also called Google hacking) is using advanced search operators to find information that's technically public but not meant to be easily found.

High-value dork combinations:

site:target.com filetype:xls — finds exposed spreadsheets (often contain internal data)
intitle:"index of" site:target.com — finds open directory listings
site:target.com inurl:admin — finds admin panels
filetype:sql site:target.com — finds exposed database dumps
intext:"password" filetype:txt site:target.com — finds text files with credentials

The Google Hacking Database (GHDB) on Exploit-DB is a searchable collection of thousands of proven dork queries categorized by what they expose.
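Typing these by hand gets repetitive, so here is a small Python sketch that fills the dork templates from the list above with a target domain and turns each one into a Google search URL to open in a browser. The template list and function names are my own. It deliberately doesn't scrape the results, since Google discourages automated querying and tends to answer it with CAPTCHAs.

```python
from typing import List
from urllib.parse import quote_plus

# The dork list from this post; {d} is replaced with the target domain.
DORK_TEMPLATES = [
    "site:{d} filetype:xls",
    'intitle:"index of" site:{d}',
    "site:{d} inurl:admin",
    "filetype:sql site:{d}",
    'intext:"password" filetype:txt site:{d}',
]


def build_dorks(domain: str) -> List[str]:
    """Fill every template with the target domain."""
    return [template.format(d=domain) for template in DORK_TEMPLATES]


def to_search_url(dork: str) -> str:
    """URL-encode a dork into a Google search link to open manually."""
    return "https://www.google.com/search?q=" + quote_plus(dork)


if __name__ == "__main__":
    for dork in build_dorks("target.com"):
        print(dork)
        print("   ", to_search_url(dork))
```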

Today I used the intitle:"index of" technique on wscubetech.com and found their XML sitemap, which maps out the entire site structure including categories, courses, tutorials, programs, quizzes, blog index, and more. That's their full content architecture, publicly accessible, no authentication required. For a real attacker, this is the blueprint of the site before writing a single line of exploit code.
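Once you have a sitemap, a few lines of Python can summarize its structure. This sketch assumes the standard sitemaps.org format (a `<urlset>` or a sitemap index, both listing `<loc>` elements) and groups URLs by their first path segment; the function name is my own. One caveat: downloading sitemap.xml yourself sends a request to the target's web server, so unlike reading a Google result it isn't strictly zero-contact.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlparse

# Namespace used by standard sitemaps (sitemaps.org protocol).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_sections(xml_text: str) -> Counter:
    """Count sitemap <loc> URLs by first path segment ("/" for the homepage).

    Works for a <urlset> (page URLs) or a <sitemapindex> (child sitemap URLs).
    """
    root = ET.fromstring(xml_text)
    counts: Counter = Counter()
    for loc in root.iterfind(".//sm:loc", NS):
        path = urlparse((loc.text or "").strip()).path
        first = path.strip("/").split("/")[0]
        counts["/" + first if first else "/"] += 1
    return counts
```

Feed it the file's text, for example `sitemap_sections(open("sitemap.xml").read()).most_common(10)`, to see which sections of the site are largest.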

3. Footprinting Through Social Media

Social media is one of the richest passive recon sources that most people underestimate. What you can extract:

LinkedIn:

Employee names, job titles, departments
Technology stack (engineers often list the tools they use)
Organizational hierarchy — who reports to whom
Recent hires = clues to recently deployed technologies
Job postings = what systems they're using and expanding ("looking for AWS engineer with experience in Kubernetes" tells you their infrastructure)

Today's example: WsCube Tech's LinkedIn showed 607 employees, company structure (E-Learning, Jodhpur/Rajasthan HQ), affiliated pages (WsCube Tech Jodhpur branch). For a social engineering attack, this is your org chart.
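Reading job posts for tech stack clues is easy to semi-automate. A minimal sketch, assuming you've already copied the posting text: it checks a hand-picked keyword list (hypothetical; swap in whatever matters for your target) using whole-word, case-insensitive matching.

```python
import re
from typing import Set

# Hypothetical keyword list; tailor it to the stack you expect.
TECH_KEYWORDS = [
    "AWS", "Azure", "GCP", "Kubernetes", "Docker", "Terraform",
    "Jenkins", "PostgreSQL", "MySQL", "MongoDB", "Django", "React",
]


def tech_mentions(text: str) -> Set[str]:
    """Return keywords that appear as whole words in the text, ignoring case."""
    return {
        kw for kw in TECH_KEYWORDS
        if re.search(r"\b" + re.escape(kw) + r"\b", text, re.IGNORECASE)
    }
```

With the job post from above, `tech_mentions("Looking for AWS engineer with experience in Kubernetes")` returns `{"AWS", "Kubernetes"}`. Whole-word matching keeps "Reactive" from counting as "React".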

Facebook:

Company contact info — phone numbers, email addresses often publicly listed
Physical addresses
Working hours, events, announcements
Employee personal pages linked to company page

Today's example: WsCube Tech's Facebook page had their phone number (092696 98122), email (info@wscubetech.com), and website in plain view — no login required. That's their support/contact infrastructure handed to you for free.
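When contact details are scattered across page text you've copied (an About page, a Facebook intro section), a regex pass collects them quickly. This sketch uses a pragmatic email pattern rather than a full RFC 5322 validator, so it can miss unusual addresses; the optional domain filter and the function name are my own choices.

```python
import re
from typing import List, Optional

# Pragmatic pattern for addresses in page text; not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text: str, domain: Optional[str] = None) -> List[str]:
    """Unique, lowercased emails in order of appearance, optionally for one domain only."""
    found: List[str] = []
    for match in EMAIL_RE.findall(text):
        email = match.lower()
        if domain and not email.endswith("@" + domain.lower()):
            continue
        if email not in found:
            found.append(email)
    return found
```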

Instagram, Twitter — employee activity, company culture, sometimes infrastructure details leak through screenshots in posts (server names, internal tool names, etc.)

What to look for across all platforms:

Email format (firstname.lastname@company.com? first.l@company.com?) — once you know the format, you can guess any employee's email
Tech stack mentions in job posts and employee bios
Office locations and physical security details
Key personnel — who has admin access? Who's the IT manager?
Recent news — acquisitions, mergers, new product launches = new attack surface
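To show why the email format matters, here is a sketch that generates candidate addresses for one employee name. The pattern names and templates are common conventions I'm assuming, not a list from any tool, and which one a company actually uses is exactly what email footprinting has to confirm.

```python
from typing import List, Optional

# Common corporate formats; these names and templates are assumptions, not a tool's list.
PATTERNS = {
    "first.last": "{f}.{l}",
    "first.l": "{f}.{li}",
    "f.last": "{fi}.{l}",
    "flast": "{fi}{l}",
    "first": "{f}",
}


def candidate_emails(full_name: str, domain: str, pattern: Optional[str] = None) -> List[str]:
    """Build candidate addresses for one person; pass `pattern` once the format is known."""
    parts = full_name.lower().split()  # expects at least one name part
    first, last = parts[0], parts[-1]
    fields = {"f": first, "l": last, "fi": first[0], "li": last[0]}
    templates = [PATTERNS[pattern]] if pattern else list(PATTERNS.values())
    return [t.format(**fields) + "@" + domain for t in templates]
```

For a hypothetical employee, `candidate_emails("Priya Sharma", "example.com", "first.last")` returns `["priya.sharma@example.com"]`; without a pattern you get all five guesses to validate later.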

The Shodan Deep Dive — What That Report Actually Tells You

Looking back at the Microsoft Shodan report from today:

Ports section:

Port 5985 (846K devices) = WinRM (Windows Remote Management). This is remote administration. Open on nearly a million devices.
Port 5357 (435K devices) = Web Services for Devices. Should rarely be internet-facing.

Vulnerabilities section:

EternalBlue (MS17-010) still flagged on 917 devices. EternalBlue is the leaked NSA exploit that powered WannaCry in 2017. Years later, these hosts are still exposed to it.
CVE-2017-7269 (2,370 devices) = IIS 6.0 WebDAV buffer overflow. IIS 6.0. That shipped with Windows Server 2003.

What this tells you: Large organizations have legacy systems that never get patched. The gap between "software reached end of life" and "organization actually decommissioned it" is measured in years, sometimes decades.

SSL/TLS section (from the HTTP Insights):

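TLS 1.0 still running on 719K devices
TLS 1.1 on 722K devices

Both versions are deprecated (formally retired by RFC 8996 in 2021) and have known weaknesses. Added together that's about 1.44 million entries on protocols that shouldn't be used in production; a host that supports both versions is counted in both, so the number of distinct devices may be smaller.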

What I Actually Did Today — The Practical Sequence

Searched a university (NFSU Bhopal) on Google and DuckDuckGo — compared results, noted what each surface
Ran Shodan search on "microsoft" — studied the full report: ports, vulnerabilities, OS breakdown, SSL data
Checked Censys as a secondary search engine for internet infrastructure
Used intitle:"index of" Google dork on wscubetech.com — found the sitemap.xml showing full site architecture
Pulled WsCube Tech's LinkedIn — mapped company structure, employee count, location data
Pulled WsCube Tech's Facebook — collected phone number, email, website, follower count

All passive. Zero contact with target systems. This is the standard passive recon workflow.

What I Don't Have Clean Yet

DNS footprinting — WHOIS lookups, DNS enumeration, zone transfer attempts. This bridges passive and active recon and I haven't touched it properly.
Email footprinting — finding all email addresses associated with a target, validating which are active, inferring email format patterns. That's tomorrow.
Website footprinting — deep analysis of a target website: technologies used, server headers, hidden directories, metadata in files. Also tomorrow.
Shodan CLI — running Shodan from the Kali terminal instead of the web UI. More powerful, scriptable, automatable.

Honest Reflection

Recon feels less exciting than exploitation. There's no "shell popped" moment. But this is where real attackers spend the most time — and where defenders have the most blind spots.

Most organizations don't know what Shodan shows about them. They don't monitor what their employees post on LinkedIn. They don't track what Google has indexed from their servers. They're focused on defending the perimeter while their attack surface is being mapped from the outside, passively, with zero alerts triggered.

The other thing that hit me today: the information is already out there. I didn't hack anything. I used public tools on public data. Everything I found about WsCube Tech — their phone number, email, employee count, site structure, technology stack — was available without logging into anything or sending a single packet to their servers.

That's the unsettling part of passive recon. The target doesn't even know you exist yet.

Tomorrow's Target

Website Footprinting — extracting maximum information from a target website: tech stack detection, server headers, metadata in files, robots.txt, hidden pages
Email Footprinting — finding email addresses, tracking email headers, inferring email formats, validating which addresses are active

Today's Quick Reference

Reconnaissance types:

Passive = no target contact, zero detection risk
Active = direct interaction, detection possible

Search engine tools:

Shodan — internet-connected device scanner, shows open ports, versions, vulnerabilities
Censys — infrastructure indexing, SSL/certificate focus
DuckDuckGo — no tracking, no search history
Not Evil — Tor-based, hides your searches from your ISP

Top Google dork operators:

site: — restrict to domain
filetype: — find specific file types
intitle: — page title search
inurl: — URL string search
cache: — Google's cached version (retired by Google in 2024)
intitle:"index of" — find open directory listings

Social media intel:

LinkedIn → org structure, employee names, tech stack from job posts
Facebook → contact details, phone numbers, emails
Instagram/Twitter → employee activity, accidental infrastructure leaks

Key insight from Shodan:

EternalBlue (2017 NSA exploit) still exposed on nearly 1,000 internet-facing devices matching a simple "microsoft" query
1.4M+ combined TLS 1.0/1.1 counts (a host supporting both is counted twice)
Legacy never dies in enterprise environments

Day 6 tomorrow. Website footprinting. Email footprinting. The target's web presence as an attack surface.

Recon isn't the glamorous part. It's the part that makes everything else work.

Tags: #EthicalHacking #Footprinting #Reconnaissance #GoogleDorking #Shodan #OSINT #PenetrationTesting #CyberSecurity #LearningInPublic #30DaysOfHacking #InfoSec #Beginner #PassiveRecon #ActiveRecon #GoogleHacking
