When performing web reconnaissance, some of the most valuable insights don't come from scanning tools — they come from standardized files that developers often overlook.

Two of the most powerful yet underestimated sources are:

  • robots.txt
  • .well-known/ URIs

These are not vulnerabilities by themselves — but they often point directly to them.

robots.txt: The "Do Not Enter" Sign That Everyone Can Read

Think of robots.txt as a guideline for bots. It tells crawlers what they should and shouldn't access.

However, here's the reality:

It doesn't enforce anything — it only reveals intentions.

What is robots.txt?

A simple text file located at:

https://target.com/robots.txt

It follows the Robots Exclusion Protocol (standardized in RFC 9309) and contains rules for web crawlers.

Example robots.txt

User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/

User-agent: Googlebot
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml
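
A compliant crawler would read the example above roughly as follows. This is a minimal sketch using Python's standard urllib.robotparser module; the agent name recon-bot is hypothetical and simply falls under the "*" group.

```python
from urllib.robotparser import RobotFileParser

# The example rules, with a blank line between groups.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/

User-agent: Googlebot
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "recon-bot" is a made-up agent name, so the "*" group applies to it.
admin_allowed = parser.can_fetch("recon-bot", "https://example.com/admin/")
public_allowed = parser.can_fetch("recon-bot", "https://example.com/public/")
sitemaps = parser.site_maps()  # available in Python 3.8 and later

print(admin_allowed, public_allowed, sitemaps)
```

The check happens entirely on the client side. A client that never calls can_fetch is not stopped by anything in the file, which is why robots.txt only reveals intentions.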

What This Tells You

  • /admin/ → likely admin panel
  • /private/ → sensitive content
  • /public/ → intentionally exposed
  • Sitemap → full site structure

This is direct reconnaissance intelligence.
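
Pulling those paths out by hand gets tedious on larger files. Below is a rough sketch of a small parser that collects Allow/Disallow paths per user-agent, plus any sitemap URLs. The function name parse_robots is my own, and the parser is deliberately simplified (it ignores wildcards and directives such as Crawl-delay).

```python
from collections import defaultdict

EXAMPLE = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/
User-agent: Googlebot
Crawl-delay: 10
Sitemap: https://example.com/sitemap.xml
"""

def parse_robots(text):
    """Return ({agent: [(directive, path), ...]}, [sitemap_url, ...])."""
    rules = defaultdict(list)
    sitemaps = []
    agents = []        # user-agents named by the group being read
    in_rules = False   # True once the current group has a non-agent line
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "sitemap":                # sitemaps belong to no group
            sitemaps.append(value)
        elif field == "user-agent":
            if in_rules:                      # previous group is finished
                agents, in_rules = [], False
            agents.append(value)
        else:
            in_rules = True
            if field in ("allow", "disallow") and value:
                for agent in agents:
                    rules[agent].append((field, value))
    return dict(rules), sitemaps

rules, sitemaps = parse_robots(EXAMPLE)
for directive, path in rules["*"]:
    print(directive, path)
print(sitemaps)
```

The Googlebot group has no Allow or Disallow lines in this example, so it produces no entry in rules.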

Why robots.txt is Gold in Recon

1. Hidden Directories

Developers often list sensitive paths:

/admin/
/backup/
/internal/
/tmp/

These are high-value targets.

2. Website Structure Mapping

You can quickly understand:

  • Important directories
  • Restricted areas
  • Application layout
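
One simple way to get that overview is to group the collected paths by their first segment. The sketch below assumes you already have a list of paths; the sample paths and the name map_structure are hypothetical.

```python
from collections import defaultdict

def map_structure(paths):
    """Group paths by their first segment, e.g. {"admin": ["/admin/", ...]}."""
    tree = defaultdict(list)
    for path in paths:
        segments = [s for s in path.split("/") if s]
        top = segments[0] if segments else "/"   # the site root has no segment
        tree[top].append(path)
    return {top: sorted(group) for top, group in sorted(tree.items())}

# Hypothetical paths, as they might be collected from Allow/Disallow lines.
paths = ["/admin/", "/admin/users/", "/private/reports/", "/public/", "/"]
layout = map_structure(paths)

for top, group in layout.items():
    print(top, group)
```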

3. Crawler Traps (Honeypots)

Sometimes fake paths are added to detect attackers.

Example:

/do-not-enter/
/trap/

If accessed → you may get flagged.
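
A cautious workflow sets aside suspicious-looking paths for manual review before anything is requested. The sketch below is only a name-based heuristic: the keyword list is my own assumption, and a real honeypot may well use an ordinary-looking name.

```python
# Hypothetical keywords; tune them to what you see in practice.
TRAP_HINTS = ("trap", "honeypot", "do-not-enter", "bot-check")

def split_trap_candidates(paths, hints=TRAP_HINTS):
    """Return (probe, review); paths with trap-like names go to review."""
    probe, review = [], []
    for path in paths:
        lowered = path.lower()
        bucket = review if any(hint in lowered for hint in hints) else probe
        bucket.append(path)
    return probe, review

probe, review = split_trap_candidates(
    ["/admin/", "/do-not-enter/", "/trap/", "/backup/"]
)
print("probe:", probe)
print("review first:", review)
```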

.well-known: The Standardized Intelligence Hub

The /.well-known/ path prefix is defined in RFC 8615, and the names registered under it are tracked in an IANA registry.

It provides a central place for metadata, configs, and security-related info.

Location

https://target.com/.well-known/

The server rarely returns a listing for this path, so request each file by name.

Why It Matters

Unlike robots.txt, this directory often contains:

  • Security policies
  • Authentication configs
  • API endpoints
  • Cryptographic data

Important .well-known Files

URI                     Purpose
security.txt            Vulnerability reporting info
change-password         Redirect to the password change page
openid-configuration    Auth provider config
assetlinks.json         App-domain verification
mta-sts.txt             Email security policy (MTA-STS, normally served from the mta-sts. subdomain)
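
To turn the file names listed above into URLs you can request, a short helper is enough. This is a sketch using urllib.parse.urljoin from the standard library; the helper name well_known_urls and the starting URL are illustrative.

```python
from urllib.parse import urljoin

WELL_KNOWN_NAMES = [
    "security.txt",
    "change-password",
    "openid-configuration",
    "assetlinks.json",
    "mta-sts.txt",  # per RFC 8461 this normally lives on the mta-sts. subdomain
]

def well_known_urls(base_url, names=WELL_KNOWN_NAMES):
    """Build /.well-known/ URLs on the same host as base_url."""
    root = urljoin(base_url, "/.well-known/")
    return [urljoin(root, name) for name in names]

urls = well_known_urls("https://target.com/some/page")
for url in urls:
    print(url)
```

A 404 on any of these is normal. Most sites publish only a few of them.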

Deep Dive: openid-configuration

One of the most powerful recon endpoints:

https://target.com/.well-known/openid-configuration

Example Output

{
  "issuer": "https://example.com",
  "authorization_endpoint": "https://example.com/oauth2/authorize",
  "token_endpoint": "https://example.com/oauth2/token",
  "userinfo_endpoint": "https://example.com/oauth2/userinfo",
  "jwks_uri": "https://example.com/oauth2/jwks"
}

What You Gain

Endpoint Discovery

  • Login endpoints
  • Token endpoints
  • User data endpoints

Cryptographic Info

  • Public keys used to verify JWT signatures (JWKS)

Supported Features

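  • Auth flows (grant_types_supported, response_types_supported)
  • Scopes (scopes_supported)
  • Algorithms (id_token_signing_alg_values_supported)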
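
The points above can be pulled out of the discovery document with a few lines of code. A minimal sketch, using the example output from earlier as input; the function name summarize_discovery is my own, and the optional fields fall back to empty lists because not every provider publishes them.

```python
import json

DISCOVERY_JSON = """
{
  "issuer": "https://example.com",
  "authorization_endpoint": "https://example.com/oauth2/authorize",
  "token_endpoint": "https://example.com/oauth2/token",
  "userinfo_endpoint": "https://example.com/oauth2/userinfo",
  "jwks_uri": "https://example.com/oauth2/jwks"
}
"""

def summarize_discovery(doc):
    """Pick the recon-relevant fields out of an openid-configuration dict."""
    return {
        "issuer": doc.get("issuer"),
        # Every key ending in "_endpoint" is a URL worth noting.
        "endpoints": {k: v for k, v in doc.items() if k.endswith("_endpoint")},
        "jwks_uri": doc.get("jwks_uri"),
        "grant_types": doc.get("grant_types_supported", []),
        "scopes": doc.get("scopes_supported", []),
        "signing_algs": doc.get("id_token_signing_alg_values_supported", []),
    }

summary = summarize_discovery(json.loads(DISCOVERY_JSON))
for name, url in summary["endpoints"].items():
    print(name, url)
print("jwks:", summary["jwks_uri"])
```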

Real Recon Strategy

Combine both:

Step 1: Check robots.txt

curl https://target.com/robots.txt

Step 2: Check .well-known

curl https://target.com/.well-known/security.txt
curl https://target.com/.well-known/openid-configuration

Step 3: Analyze

  • Look for hidden paths
  • Extract endpoints
  • Map authentication flow
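
The three steps can be chained in one small script. What follows is a sketch, not a finished tool: the names http_get and quick_recon are my own, the robots.txt handling is simplified, and the demonstration at the bottom uses canned responses (an assumption for illustration) so it runs without touching the network.

```python
import json
import urllib.request
from urllib.parse import urljoin

def http_get(url, timeout=10):
    """Return the response body as text, or None on any error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):   # URLError and HTTPError are OSErrors
        return None

def quick_recon(base_url, fetch=http_get):
    """Steps 1-3: read robots.txt, read openid-configuration, summarize."""
    report = {"disallowed": [], "sitemaps": [], "auth_endpoints": {}}

    robots = fetch(urljoin(base_url, "/robots.txt")) or ""
    for raw in robots.splitlines():
        field, _, value = raw.split("#", 1)[0].partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "disallow" and value:
            report["disallowed"].append(value)
        elif field == "sitemap":
            report["sitemaps"].append(value)

    discovery = fetch(urljoin(base_url, "/.well-known/openid-configuration"))
    if discovery:
        try:
            doc = json.loads(discovery)
        except ValueError:          # body was not valid JSON
            doc = {}
        if isinstance(doc, dict):
            report["auth_endpoints"] = {
                k: v for k, v in doc.items() if k.endswith("_endpoint")
            }
    return report

# Offline demonstration: canned responses stand in for real requests.
canned = {
    "https://target.com/robots.txt":
        "User-agent: *\nDisallow: /admin/\nSitemap: https://target.com/sitemap.xml\n",
    "https://target.com/.well-known/openid-configuration":
        '{"token_endpoint": "https://target.com/oauth2/token"}',
}
report = quick_recon("https://target.com", fetch=canned.get)
print(report)
```

Passing the fetch function in as a parameter keeps the parsing logic testable offline. Leaving it at the default sends real requests through http_get.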

Why This Technique Is Powerful

  • Near-passive reconnaissance (a few ordinary GET requests, low noise)
  • No brute force needed
  • High signal-to-noise ratio
  • Often overlooked by beginners

Cheat Sheet

robots.txt

# View robots.txt
curl https://target.com/robots.txt
# Common sensitive paths
/admin/
/backup/
/private/
/tmp/
/config/

.well-known

# Common endpoints
/.well-known/security.txt
/.well-known/openid-configuration
/.well-known/change-password
/.well-known/assetlinks.json
/.well-known/mta-sts.txt

What to Look For

  • Hidden directories
  • API endpoints
  • Authentication flows
  • Misconfigurations
  • Sensitive metadata

Pro Tips

  • Always check robots.txt first
  • Treat disallowed paths as targets, not restrictions
  • Use .well-known to map auth systems
  • Combine with crawling, fingerprinting, and Certificate Transparency (CT) logs

Key Takeaways

  • robots.txt reveals where crawlers are asked not to look, which is often exactly where you should look
  • .well-known reveals how authentication, security policies, and integrations are wired together
  • Together, they provide high-value reconnaissance with minimal effort