When performing web reconnaissance, some of the most valuable insights don't come from scanning tools — they come from standardized files that developers often overlook.
Two of the most powerful yet underestimated sources are:
- robots.txt
- .well-known/ URIs
These are not vulnerabilities by themselves — but they often point directly to them.
robots.txt: The "Do Not Enter" Sign That Everyone Can Read
Think of robots.txt as a guideline for bots. It tells crawlers what they should and shouldn't access.
However, here's the reality:
It doesn't enforce anything — it only reveals intentions.
What is robots.txt?
A simple text file located at:
https://target.com/robots.txt

It follows the Robots Exclusion Standard (RFC 9309) and contains rules for web crawlers.
Example robots.txt
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/
User-agent: Googlebot
Crawl-delay: 10
Sitemap: https://example.com/sitemap.xml

What This Tells You
- /admin/ → likely admin panel
- /private/ → sensitive content
- /public/ → intentionally exposed
- Sitemap → full site structure
This is direct reconnaissance intelligence.
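Pulling those Disallow paths out takes only a few lines of shell. The snippet below inlines the sample robots.txt from above so it runs offline; in a real sweep you would replace the inlined string with the output of `curl -s https://target.com/robots.txt` (target.com is a placeholder):

```shell
# Sample robots.txt inlined for illustration; in practice, replace with:
#   robots=$(curl -s https://target.com/robots.txt)
robots='User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/'

# Extract every Disallow path -- each one is a recon lead
disallowed=$(printf '%s\n' "$robots" | awk -F': *' '$1 == "Disallow" {print $2}')
echo "$disallowed"
```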
Why robots.txt is Gold in Recon
1. Hidden Directories
Developers often list sensitive paths:
/admin/
/backup/
/internal/
/tmp/

These are high-value targets.
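A quick way to triage such paths is to request each one and note the status code. This sketch uses a hypothetical host and prints the curl commands instead of running them, so it is safe to read and run offline; drop the `echo` to probe a target you are authorized to test:

```shell
base="https://target.com"   # placeholder host

for path in /admin/ /backup/ /internal/ /tmp/; do
    # Prints the command only; remove `echo` to actually send the request.
    # -o /dev/null discards the body, -w prints just the HTTP status code.
    echo "curl -s -o /dev/null -w '%{http_code}' $base$path"
done
```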
2. Website Structure Mapping
You can quickly understand:
- Important directories
- Restricted areas
- Application layout
3. Crawler Traps (Honeypots)
Sometimes fake paths are added to detect attackers.
Example:
/do-not-enter/
/trap/

If accessed → you may get flagged.
.well-known: The Standardized Intelligence Hub
The .well-known/ directory is defined in RFC 8615.
It provides a central place for metadata, configs, and security-related info.
Location
https://target.com/.well-known/

Why It Matters
Unlike robots.txt, this directory often contains:
- Security policies
- Authentication configs
- API endpoints
- Cryptographic data
Important .well-known Files
- security.txt → Vulnerability reporting info
- change-password → Password reset page
- openid-configuration → Auth provider config
- assetlinks.json → App-domain verification
- mta-sts.txt → Email security policy
Deep Dive: openid-configuration
One of the most powerful recon endpoints:
https://target.com/.well-known/openid-configuration

Example Output
{
"issuer": "https://example.com",
"authorization_endpoint": "https://example.com/oauth2/authorize",
"token_endpoint": "https://example.com/oauth2/token",
"userinfo_endpoint": "https://example.com/oauth2/userinfo",
"jwks_uri": "https://example.com/oauth2/jwks"
}

What You Gain
Endpoint Discovery
- Login endpoints
- Token endpoints
- User data endpoints
Cryptographic Info
- JWT signing keys (JWKS)
Supported Features
- Auth flows
- Scopes
- Algorithms
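The endpoints can be pulled straight out of that JSON. The response above is inlined here so the sketch runs without network access; `jq` is the cleaner tool when installed, but plain `sed` is enough for flat keys like these:

```shell
# openid-configuration response inlined; normally fetched with:
#   config=$(curl -s https://target.com/.well-known/openid-configuration)
config='{
  "issuer": "https://example.com",
  "authorization_endpoint": "https://example.com/oauth2/authorize",
  "token_endpoint": "https://example.com/oauth2/token",
  "userinfo_endpoint": "https://example.com/oauth2/userinfo",
  "jwks_uri": "https://example.com/oauth2/jwks"
}'

# List every *_endpoint URL, then the JWKS key-set URL
endpoints=$(printf '%s\n' "$config" | sed -n 's/.*"[a-z_]*_endpoint": "\([^"]*\)".*/\1/p')
jwks=$(printf '%s\n' "$config" | sed -n 's/.*"jwks_uri": "\([^"]*\)".*/\1/p')
echo "$endpoints"
echo "$jwks"
```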
Real Recon Strategy
Combine both:
Step 1: Check robots.txt
curl https://target.com/robots.txt

Step 2: Check .well-known
curl https://target.com/.well-known/security.txt
curl https://target.com/.well-known/openid-configuration

Step 3: Analyze
- Look for hidden paths
- Extract endpoints
- Map authentication flow
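The three steps can be tied together in a small sweep script. Everything below is a sketch: target.com is a placeholder, and a DRY_RUN flag keeps the curl calls from firing so the script can be read and run offline:

```shell
target="https://target.com"   # placeholder -- set to an authorized host
DRY_RUN=1                     # set to 0 to send real requests

fetch() {
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "[dry-run] GET $1"
    else
        # Print "<status> <url>" for each probe
        curl -s -o /dev/null -w "%{http_code} $1\n" "$1"
    fi
}

fetch "$target/robots.txt"
fetch "$target/.well-known/security.txt"
fetch "$target/.well-known/openid-configuration"
```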
Why This Technique Is Powerful
- Passive reconnaissance (low noise)
- No brute force needed
- High signal-to-noise ratio
- Often overlooked by beginners
Cheat Sheet
robots.txt
# View robots.txt
curl https://target.com/robots.txt
# Common sensitive paths
/admin/
/backup/
/private/
/tmp/
/config/

.well-known
# Common endpoints
/.well-known/security.txt
/.well-known/openid-configuration
/.well-known/change-password
/.well-known/assetlinks.json
/.well-known/mta-sts.txt

What to Look For
- Hidden directories
- API endpoints
- Authentication flows
- Misconfigurations
- Sensitive metadata
Pro Tips
- Always check robots.txt first
- Treat disallowed paths as targets, not restrictions
- Use .well-known to map auth systems
- Combine with:
- Crawling
- Fingerprinting
- CT logs
Key Takeaways
- robots.txt reveals where not to look — which is exactly where you should look
- .well-known reveals how the system works internally
- Together, they provide high-value reconnaissance with minimal effort