VAPT Series Part 2: Reconnaissance and Information Gathering — The Foundation of Every Pentest

If you've followed along since the introduction and lab setup post, you already know what VAPT (Vulnerability Assessment and Penetration Testing) is and you have a lab ready to practice in. Now it's time to actually start the engagement — and every real-world penetration test, regardless of scope or target, begins the same way: with reconnaissance.

This post breaks down what reconnaissance actually means, why it's arguably the most underrated phase of the entire VAPT process, and how to approach it methodically rather than randomly throwing tools at a target.

Why Reconnaissance Matters More Than People Think

New pentesters often want to skip straight to "hacking" — scanning ports, firing exploits, popping shells. But experienced penetration testers will tell you that the quality of a pentest is decided long before any exploit is attempted. The information you gather in this phase determines:

What attack surface actually exists
Which systems are in scope and reachable
What technologies, versions, and configurations are in play
What human and organizational weaknesses might exist (for engagements that include social engineering)
How to prioritize your limited testing time

A rushed or shallow recon phase leads to missed vulnerabilities, wasted effort on dead ends, and reports that look thin compared to what a thorough tester would produce. In professional VAPT work, recon often consumes 30–40% of total project time — and for good reason.

Passive vs Active Reconnaissance

Reconnaissance splits into two broad categories, and understanding the difference is essential both technically and legally.

Passive Reconnaissance

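Passive recon means gathering information without directly interacting with the target's systems. You're not sending packets to their servers; you're pulling information from public, third-party sources. This carries very little detection risk because the target never sees your traffic. That only holds while you stay on third-party sources: querying the target's own name servers or browsing its website generates traffic the target can log.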

Sources for passive recon include:

WHOIS records — domain registration details, sometimes including registrant contact info
DNS records — historical and current DNS data, often revealing subdomains, mail servers, and infrastructure hints
Search engines — using advanced search operators to find exposed documents, login portals, or misconfigurations indexed by Google or Bing
Public code repositories — GitHub, GitLab, and similar platforms sometimes contain leaked credentials, API keys, or internal infrastructure details committed by mistake
Social media and job postings — surprisingly useful for understanding what technologies an organization uses (job listings asking for "AWS, Kubernetes, Jenkins experience" tell you a lot)
Certificate transparency logs — public logs of SSL/TLS certificates issued, which often reveal subdomains that aren't otherwise public
Breach databases and paste sites — checking if organizational email addresses or credentials have appeared in known breaches (for awareness, not for unauthorized use)
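
To make the certificate transparency item concrete, here is a minimal sketch that pulls unique subdomains out of CT log search results you have already saved locally. The `name_value` field name and the sample records are assumptions modeled on a common JSON export format, so adjust them to whatever your source actually returns. Nothing here touches the network.

```python
def subdomains_from_ct(records, domain):
    """Collect unique hostnames under `domain` from CT log records."""
    domain = domain.lower().rstrip(".")
    found = set()
    for record in records:
        # One certificate can list several names, separated by newlines.
        for name in record.get("name_value", "").splitlines():
            name = name.strip().lower().rstrip(".")
            # A wildcard entry such as *.dev.example.com maps to its base name.
            if name.startswith("*."):
                name = name[2:]
            # Require an exact match or a real subdomain, not just a suffix.
            if name == domain or name.endswith("." + domain):
                found.add(name)
    return sorted(found)


# Hypothetical sample data standing in for a saved JSON export.
sample = [
    {"name_value": "www.example.com\nexample.com"},
    {"name_value": "*.dev.example.com"},
    {"name_value": "mail.example.com"},
    {"name_value": "example.com.evil.test"},
]

print(subdomains_from_ct(sample, "example.com"))
# ['dev.example.com', 'example.com', 'mail.example.com', 'www.example.com']
```

The suffix check includes the leading dot on purpose, so a lookalike such as `example.com.evil.test` is not mistaken for a subdomain.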

Active Reconnaissance

Active recon involves direct interaction with target systems — sending requests, probing ports, querying services. This carries detection risk because firewalls, intrusion detection systems (IDS), and security teams may notice the traffic.

Active recon includes:

Port scanning — identifying which ports are open and what services are listening
Service enumeration — determining software versions and configurations running on open ports
DNS zone transfers — attempting to pull full DNS records directly from a misconfigured server
Network mapping — understanding how systems are connected and segmented
Banner grabbing — collecting service banners that often reveal exact software versions

Active recon should only ever happen within the scope explicitly authorized in the rules of engagement document. This is non-negotiable in professional and ethical hacking contexts.
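
One way to keep that authorization requirement from being an afterthought is to build it into your tooling. The sketch below is a basic TCP connect check (the simplest form of port scanning) that refuses to touch any address outside an allowlist. `AUTHORIZED_NETWORKS`, `is_authorized`, and `check_port` are hypothetical names, and the allowlist is set to loopback only as a lab-safe assumption.

```python
import ipaddress
import socket

# Assumption: replace with the networks named in your rules of engagement.
# Loopback only here, so the sketch is safe to run in a lab.
AUTHORIZED_NETWORKS = [ipaddress.ip_network("127.0.0.0/8")]


def is_authorized(ip):
    """Return True if `ip` falls inside one of the authorized networks."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in AUTHORIZED_NETWORKS)


def check_port(ip, port, timeout=1.0):
    """Attempt a full TCP connection; refuse targets outside the scope."""
    if not is_authorized(ip):
        raise PermissionError(f"{ip} is outside the authorized scope")
    try:
        # A completed handshake means something is listening on the port.
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat as not open.
        return False
```

Calling `check_port("127.0.0.1", 8080)` returns `True` or `False` depending on whether something is listening locally, while any address outside the allowlist raises `PermissionError` before a single packet is sent.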

Building an OSINT Methodology

OSINT (Open Source Intelligence) gathering benefits enormously from having a structured methodology rather than an ad hoc approach. Here's a simplified framework you can apply:

1. Define Your Targets

Before gathering anything, clarify exactly what's in scope. Is it a single domain? A range of IP addresses? An entire organization including subsidiaries? Scope definition prevents both legal issues and wasted effort.
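
Scope is easier to respect when it is written down as data rather than kept in your head. As a rough sketch, assuming a scope made of domains plus a few explicitly excluded hosts (the `Scope` class, `partition` helper, and all hostnames below are hypothetical), you could sort candidate hostnames before doing anything else with them:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    """In-scope domains and explicitly excluded hosts (all lowercase)."""
    domains: tuple = ()
    excluded_hosts: tuple = ()

    def covers(self, hostname):
        host = hostname.lower().rstrip(".")
        # Exclusions win over any matching domain.
        if host in self.excluded_hosts:
            return False
        return any(host == d or host.endswith("." + d) for d in self.domains)


def partition(hostnames, scope):
    """Split hostnames into (in_scope, out_of_scope), preserving order."""
    in_scope = [h for h in hostnames if scope.covers(h)]
    out_of_scope = [h for h in hostnames if not scope.covers(h)]
    return in_scope, out_of_scope


scope = Scope(domains=("example.com",), excluded_hosts=("payments.example.com",))
candidates = ["www.example.com", "payments.example.com", "example.org", "API.example.com"]
in_scope, out_of_scope = partition(candidates, scope)

print(in_scope)      # ['www.example.com', 'API.example.com']
print(out_of_scope)  # ['payments.example.com', 'example.org']
```

Keeping the out-of-scope list is deliberate: it is worth recording what you found but chose not to touch, and why.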

2. Map the Organization's Digital Footprint

Start broad. Identify:

Primary domain and all known subdomains
IP ranges owned or leased by the organization
Cloud infrastructure providers in use (AWS, Azure, GCP)
Third-party services and vendors that may be in scope or relevant

3. Identify the Technology Stack

Understanding what an organization runs on helps you anticipate vulnerability classes later. Look for:

Web server software and versions
Content management systems (WordPress, Drupal, etc.)
Programming languages and frameworks in use
Known third-party plugins or components
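
Much of this information is volunteered by the server itself. The sketch below parses the `Server` and `X-Powered-By` response headers into product and version pairs. It works on a dictionary of headers you have already collected, the sample values are made up for illustration, and `parse_products` and `fingerprint_headers` are hypothetical helper names. Treat the output as a hint, since these headers can be removed or spoofed.

```python
import re

# Headers that commonly disclose software, with values such as "nginx/1.24.0".
DISCLOSING_HEADERS = ("server", "x-powered-by")


def parse_products(value):
    """Split a header value into (product, version) pairs."""
    # Drop parenthesised comments such as "(Debian)".
    value = re.sub(r"\([^)]*\)", " ", value)
    products = []
    for token in value.split():
        product, _, version = token.partition("/")
        products.append((product, version or None))
    return products


def fingerprint_headers(headers):
    """Map each disclosing header (lowercased) to its product list."""
    result = {}
    for name, value in headers.items():
        if name.lower() in DISCLOSING_HEADERS:
            result[name.lower()] = parse_products(value)
    return result


# Hypothetical headers, as if saved from an authorized request.
sample_headers = {
    "Server": "Apache/2.4.57 (Debian) OpenSSL/3.0.11",
    "X-Powered-By": "PHP/8.2.1",
    "Content-Type": "text/html",
}

print(fingerprint_headers(sample_headers))
# {'server': [('Apache', '2.4.57'), ('OpenSSL', '3.0.11')], 'x-powered-by': [('PHP', '8.2.1')]}
```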

4. Gather Human Intelligence (Where In Scope)

For engagements that include social engineering testing, understanding organizational structure, employee names, email formats, and roles can be relevant. This is sensitive territory and must be explicitly authorized in the scope document.

5. Document Everything

This is the step most beginners skip — and it's the one that separates amateur recon from professional recon. Every finding should be documented with:

Source of the information
Date and time discovered
Relevance to the assessment
Screenshot or evidence where applicable

Good documentation here saves enormous time when you write your final report, and it gives you a clear paper trail justifying your findings.
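
If you prefer structured notes over free-form text, the four fields above map naturally onto a small record type. This is only one possible layout: the `Finding` class, its field names, and the JSON Lines output are assumptions for illustration, not a standard format.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class Finding:
    source: str              # where the information came from
    relevance: str           # why it matters to the assessment
    detail: str              # the finding itself
    evidence_path: str = ""  # screenshot or saved output, if any
    # Timestamp in UTC so entries from different machines stay comparable.
    discovered_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_json_line(finding):
    """Serialize one finding as a single JSON line for an append-only log."""
    return json.dumps(asdict(finding), sort_keys=True)


finding = Finding(
    source="certificate transparency log",
    relevance="previously unknown subdomain inside the agreed scope",
    detail="dev.example.com",
    evidence_path="evidence/ct-search.png",
)
print(to_json_line(finding))
```

Appending one line per finding to a file gives you a chronological record that is easy to search and to convert into report tables later.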

Common Tool Categories (Conceptual Overview)

Rather than walking through exact commands (which is better suited for hands-on lab practice under proper authorization), it's worth understanding the categories of tools used in this phase and what role each plays:

DNS enumeration tools help map subdomains and DNS infrastructure
OSINT aggregation frameworks pull data from multiple public sources into a single workflow, saving manual searching time
Network scanners identify live hosts and open services within an authorized IP range
Search engine reconnaissance tools automate the process of finding exposed information indexed by search engines
Certificate transparency search tools help discover subdomains by querying public SSL/TLS certificate logs
Metadata extraction tools pull hidden information out of publicly available documents (PDFs, Word docs) that organizations have published
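
To illustrate what one of these categories is doing under the hood, here is a small sketch of the search engine case. At their core, such tools assemble advanced search operators like `site:`, `filetype:`, and `intitle:` into queries. The `build_query` helper is a hypothetical name, and the code only builds strings; it sends nothing anywhere.

```python
def build_query(domain, filetype=None, intitle=None, exclude_subdomains=()):
    """Combine common search operators into one query string."""
    parts = [f"site:{domain}"]
    # A leading minus excludes results from the given hosts.
    for sub in exclude_subdomains:
        parts.append(f"-site:{sub}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    if intitle:
        parts.append(f'intitle:"{intitle}"')
    return " ".join(parts)


queries = [
    build_query("example.com", filetype="pdf"),
    build_query("example.com", intitle="login"),
    build_query("example.com", exclude_subdomains=["www.example.com"]),
]

for query in queries:
    print(query)
# site:example.com filetype:pdf
# site:example.com intitle:"login"
# site:example.com -site:www.example.com
```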

If you're in your lab environment from the previous post, this is a great stage to start getting comfortable navigating these tool categories — not memorizing flags, but understanding what each one is actually trying to accomplish and why.

From Recon to the Next Phase

Everything gathered during reconnaissance feeds directly into the next phase: scanning and enumeration, where you take this broad map of information and start narrowing in on specific systems, services, and potential weaknesses. Without solid recon, scanning becomes guesswork. With it, scanning becomes targeted and efficient.

A useful mental model: recon answers "what exists and what does the attack surface look like," while scanning answers "which of these things might actually be vulnerable."

Ethical and Legal Reminders

It bears repeating, especially in a public-facing post: reconnaissance — even the passive kind — should only be performed against systems and organizations you have explicit written authorization to test. Passive recon against random domains "just to practice" can still raise legal and ethical questions depending on jurisdiction and intent. Practicing in your own lab, on intentionally vulnerable platforms designed for this purpose, or against systems you've been contractually authorized to test, is the only responsible path forward.

What's Next

In the next post in this series, we'll move into the scanning and enumeration phase — taking the information gathered here and using it to identify live hosts, open ports, and running services in a structured, methodical way.

If you found this useful, let me know in the comments what part of the VAPT process you'd like covered next, or if there's a specific recon technique you'd like a deeper dive on.
