When performing web reconnaissance, some of the most valuable insights don't come from scanning tools — they come from standardized files that developers often overlook.
Two of the most powerful yet underestimated sources are:
- robots.txt
- .well-known/ URIs
These are not vulnerabilities by themselves — but they often point directly to them.
robots.txt: The "Do Not Enter" Sign That Everyone Can Read
Think of robots.txt as a guideline for bots. It tells crawlers what they should and shouldn't access.
However, here's the reality:
It doesn't enforce anything — it only reveals intentions.
What is robots.txt?
A simple text file located at:
https://target.com/robots.txt

It follows the Robots Exclusion Standard (RFC 9309) and contains rules for web crawlers.
Example robots.txt
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/
User-agent: Googlebot
Crawl-delay: 10
Sitemap: https://example.com/sitemap.xml

What This Tells You
- /admin/ → likely admin panel
- /private/ → sensitive content
- /public/ → intentionally exposed
- Sitemap → full site structure
This is direct reconnaissance intelligence.
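Pulling those Disallow paths out takes only a few lines of shell. The snippet below inlines the sample robots.txt from above so it runs offline; in a real sweep you would replace the inlined string with the output of `curl -s https://target.com/robots.txt` (target.com is a placeholder):

```shell
# Sample robots.txt inlined for illustration; in practice, replace with:
#   robots=$(curl -s https://target.com/robots.txt)
robots='User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/'

# Extract every Disallow path -- each one is a recon lead
disallowed=$(printf '%s\n' "$robots" | awk -F': *' '$1 == "Disallow" {print $2}')
echo "$disallowed"
```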
Why robots.txt is Gold in Recon
1. Hidden Directories
Developers often list sensitive paths:
/admin/
/backup/
/internal/
/tmp/

These are high-value targets.
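A quick way to triage such paths is to request each one and note the status code. This sketch uses a hypothetical host and prints the curl commands instead of running them, so it is safe to read and run offline; drop the `echo` to probe a target you are authorized to test:

```shell
base="https://target.com"   # placeholder host

for path in /admin/ /backup/ /internal/ /tmp/; do
    # Prints the command only; remove `echo` to actually send the request.
    # -o /dev/null discards the body, -w prints just the HTTP status code.
    echo "curl -s -o /dev/null -w '%{http_code}' $base$path"
done
```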
2. Website Structure Mapping
You can quickly understand:
- Important directories
- Restricted areas
- Application layout
3. Crawler Traps (Honeypots)
Sometimes fake paths are added to detect attackers.
Example:
/do-not-enter/
/trap/

If accessed → you may get flagged.
.well-known: The Standardized Intelligence Hub
The .well-known/ directory is defined in RFC 8615.
It provides a central place for metadata, configs, and security-related info.
Location
https://target.com/.well-known/

Why It Matters
Unlike robots.txt, this directory often contains:
- Security policies
- Authentication configs
- API endpoints
- Cryptographic data
Important .well-known Files
- security.txt → Vulnerability reporting info
- change-password → Password reset page
- openid-configuration → Auth provider config
- assetlinks.json → App-domain verification
- mta-sts.txt → Email security policy
Deep Dive: openid-configuration
One of the most powerful recon endpoints:
https://target.com/.well-known/openid-configuration

Example Output
{
"issuer": "https://example.com",
"authorization_endpoint": "https://example.com/oauth2/authorize",
"token_endpoint": "https://example.com/oauth2/token",
"userinfo_endpoint": "https://example.com/oauth2/userinfo",
"jwks_uri": "https://example.com/oauth2/jwks"
}

What You Gain
Endpoint Discovery
- Login endpoints
- Token endpoints
- User data endpoints
Cryptographic Info
- JWT signing keys (JWKS)
Supported Features
- Auth flows
- Scopes
- Algorithms
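The endpoints can be pulled straight out of that JSON. The response above is inlined here so the sketch runs without network access; `jq` is the cleaner tool when installed, but plain `sed` is enough for flat keys like these:

```shell
# openid-configuration response inlined; normally fetched with:
#   config=$(curl -s https://target.com/.well-known/openid-configuration)
config='{
  "issuer": "https://example.com",
  "authorization_endpoint": "https://example.com/oauth2/authorize",
  "token_endpoint": "https://example.com/oauth2/token",
  "userinfo_endpoint": "https://example.com/oauth2/userinfo",
  "jwks_uri": "https://example.com/oauth2/jwks"
}'

# List every *_endpoint URL, then the JWKS key-set URL
endpoints=$(printf '%s\n' "$config" | sed -n 's/.*"[a-z_]*_endpoint": "\([^"]*\)".*/\1/p')
jwks=$(printf '%s\n' "$config" | sed -n 's/.*"jwks_uri": "\([^"]*\)".*/\1/p')
echo "$endpoints"
echo "$jwks"
```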
Real Recon Strategy
Combine both:
Step 1: Check robots.txt
curl https://target.com/robots.txt

Step 2: Check .well-known
curl https://target.com/.well-known/security.txt
curl https://target.com/.well-known/openid-configuration

Step 3: Analyze
- Look for hidden paths
- Extract endpoints
- Map authentication flow
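The three steps can be tied together in a small sweep script. Everything below is a sketch: target.com is a placeholder, and a DRY_RUN flag keeps the curl calls from firing so the script can be read and run offline:

```shell
target="https://target.com"   # placeholder -- set to an authorized host
DRY_RUN=1                     # set to 0 to send real requests

fetch() {
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "[dry-run] GET $1"
    else
        # Print "<status> <url>" for each probe
        curl -s -o /dev/null -w "%{http_code} $1\n" "$1"
    fi
}

fetch "$target/robots.txt"
fetch "$target/.well-known/security.txt"
fetch "$target/.well-known/openid-configuration"
```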
Why This Technique Is Powerful
- Passive reconnaissance (low noise)
- No brute force needed
- High signal-to-noise ratio
- Often overlooked by beginners
Cheat Sheet
robots.txt
# View robots.txt
curl https://target.com/robots.txt
# Common sensitive paths
/admin/
/backup/
/private/
/tmp/
/config/

.well-known
# Common endpoints
/.well-known/security.txt
/.well-known/openid-configuration
/.well-known/change-password
/.well-known/assetlinks.json
/.well-known/mta-sts.txt

What to Look For
- Hidden directories
- API endpoints
- Authentication flows
- Misconfigurations
- Sensitive metadata
Pro Tips
- Always check robots.txt first
- Treat disallowed paths as targets, not restrictions
- Use .well-known to map auth systems
- Combine with:
- Crawling
- Fingerprinting
- CT logs
Key Takeaways
- robots.txt reveals where not to look — which is exactly where you should look
- .well-known reveals how the system works internally
- Together, they provide high-value reconnaissance with minimal effort