"The internet forgets nothing. It only hides things badly."
Disclaimer: All techniques in this document are for authorized security research, bug bounty programs, and educational purposes only. Unauthorized use against systems you do not own or have explicit written permission to test is illegal. The author assumes no liability.
Table of Contents
- The Philosophy of Passive Reconnaissance
- Operator Logic: The Grammar of Dorking
- Advanced Operator Chaining
- Vulnerability Case Studies
- 4.1 Exposed Environment & Config Files
- 4.2 SQL Dumps & Database Backups
- 4.3 Open Directories & Sensitive Log Files
- 4.4 IoT Devices & Exposed Dashboards
- The OSINT Operator's Reference Sheet
- Mastery Lab: 15 Progressive Exercises
- Operational Security & Legal Boundaries
- Further Resources
1. The Philosophy of Passive Reconnaissance
Before a single packet hits a target network, before a scanner spins up, before social engineering enters the picture — there is Google Dorking. It is the art of weaponizing the world's most powerful indexing engine against its own corpus.
Search engines don't just index content. They index mistakes: misconfigured web servers, forgotten backup files, publicly accessible admin panels, credentials left in config files, and directory listings that were never meant to face the internet. Every one of these is a data point. Aggregate enough data points and you have a target profile — all without generating a single log entry on the target's systems.
This is the core advantage of passive recon: zero footprint. You are not connecting to the target. You are reading what the internet already knows about them.
Dorking is built on Google's Advanced Search Operators — boolean and syntactic filters that narrow the search index with surgical precision. The technique is also known as Google Hacking, popularized by Johnny Long's Google Hacking Database (GHDB) and the Exploit-DB project. The same principles apply to Bing, DuckDuckGo, Shodan, Censys, and ZoomEye.
This document teaches you the complete grammar of that language.
2. Operator Logic: The Grammar of Dorking
Think of Google's index as a relational database. Every web page is a row. Search operators are WHERE clauses. Combine them correctly and you SELECT only the rows that matter.
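To make the analogy concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, its column names, and the sample rows are invented for illustration; Google's real index is not a SQL table. The query is only a rough equivalent of the dork site:example.com filetype:pdf.

```python
import sqlite3

# Toy "index": each row stands for one indexed page.
# Table and column names are hypothetical, chosen to mirror the analogy.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pages (domain TEXT, url_path TEXT, file_extension TEXT, page_title TEXT)"
)
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?, ?)",
    [
        ("example.com", "/reports/q1.pdf", "pdf", "Q1 Report"),
        ("example.com", "/index.html", "html", "Home"),
        ("other.org", "/whitepaper.pdf", "pdf", "Whitepaper"),
    ],
)

# Rough equivalent of the dork: site:example.com filetype:pdf
rows = conn.execute(
    "SELECT url_path FROM pages WHERE domain = ? AND file_extension = ?",
    ("example.com", "pdf"),
).fetchall()
matching_paths = [row[0] for row in rows]
print(matching_paths)
```

Each additional operator in a dork behaves like one more AND condition in the WHERE clause, which is why chained dorks return fewer but more relevant rows.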
2.1 Core Operator Reference
site:
Function: Restricts results to a specific domain or subdomain.
Syntax: site:example.com
Logic: WHERE domain = 'example.com'
site:tesla.com filetype:pdf
site:*.gov confidential
site:staging.example.com
Inversion: -site:www.example.com excludes the primary domain, forcing Google to surface subdomains and alternative properties.
site:example.com -site:www.example.com
intitle:
Function: Matches search terms against the HTML <title> tag of a page.
Syntax: intitle:"keyword"
Logic: WHERE page_title LIKE '%keyword%'
Multi-word variant: allintitle:word1 word2 requires both words in title.
intitle:"index of"
intitle:"admin panel" site:example.com
allintitle:login portal admin
Why it matters: Page titles are rarely sanitized by developers. Admin panels, error pages, and directory listings broadcast their nature in the title.
inurl:
Function: Matches search terms against the URL path.
Syntax: inurl:"/admin"
Logic: WHERE url_path CONTAINS '/admin'
Multi-word variant: allinurl:word1 word2
inurl:"/wp-admin/admin-ajax.php"
inurl:"/phpinfo.php" site:example.com
inurl:"?id=" inurl:".php"
Why it matters: URL structures reveal application frameworks, parameter names, and hidden endpoints. A URL containing /api/v1/internal/ tells you more than any documentation.
filetype: / ext:
Function: Restricts results to a specific file extension. filetype: and ext: are functionally identical on Google.
Syntax: filetype:pdf or ext:sql
Logic: WHERE file_extension = 'pdf'
filetype:env site:example.com
ext:sql "INSERT INTO"
filetype:log inurl:"/var/log"
ext:bak site:example.com
Why it matters: This is the single most powerful operator for finding exposed sensitive files. Developers deploy backup files (.bak, .old, .orig), leave configuration files (.env, .config, .yaml) accessible, and forget SQL dumps in web-accessible directories.
intext:
Function: Searches the full body text (content) of indexed pages.
Syntax: intext:"password"
Logic: WHERE page_body CONTAINS 'password'
intext:"DB_PASSWORD" filetype:env
intext:"-----BEGIN RSA PRIVATE KEY-----"
intext:"mysql_connect" filetype:php
cache:
Function: Returns Google's cached (archived) snapshot of a page.
Syntax: cache:example.com/page
Use case: Accessing content that has since been removed or taken offline. Useful for recovering deleted data during an investigation.
cache:example.com/admin/config
Note: Google retired its cached-page feature in 2024, so cache: and direct webcache.googleusercontent.com URLs no longer return snapshots reliably. For content that has been removed, check the Internet Archive's Wayback Machine (web.archive.org) instead.
before: and after:
Function: Filters results by index date.
Syntax: before:YYYY-MM-DD / after:YYYY-MM-DD
site:example.com after:2023-01-01 filetype:pdf
intext:"password" filetype:env after:2022-06-01
Boolean Modifiers
| Modifier | Symbol | Behavior |
|----------|--------|----------|
| AND | (implicit / space) | Both terms must appear |
| OR | OR or \| | Either term may appear |
| NOT | - (minus prefix) | Exclude this term |
| Exact match | "..." (quotes) | Literal string match |
| Wildcard | * (asterisk) | Matches any single word |
"admin" OR "administrator" inurl:login site:example.com
intitle:"index of" -"parent directory" # Exclude false positives
"password" * "username" filetype:txt
2.2 Operator Scope Rules
Understanding scope prevents false positives and wasted time:
- Operators without quotes match partial strings: inurl:admin matches /admin, /administrator, /adminpanel
- Quoted operators enforce exact substring: inurl:"/admin/" matches only /admin/
- Multiple operators combine with AND logic by default: intitle:login inurl:admin requires BOTH conditions
- Minus operator must precede the term with no space: -site:example.com (correct) vs - site:example.com (wrong)
- site: is the scope anchor, so always place it first or last for readability
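The scope rules above can be checked mechanically before a query is sent. The following is a minimal sketch, not an official grammar: it tokenizes a dork with Python's shlex module, records negation, and rejects a detached minus sign. The Term type, the KNOWN_OPERATORS set, and the parse_dork name are all illustrative assumptions.

```python
import shlex
from typing import List, NamedTuple, Optional


class Term(NamedTuple):
    negated: bool
    operator: Optional[str]  # None for a bare keyword or phrase
    value: str


# Operators covered in this guide; the set is illustrative, not exhaustive.
KNOWN_OPERATORS = {
    "site", "intitle", "allintitle", "inurl", "allinurl",
    "filetype", "ext", "intext", "cache", "before", "after",
}


def parse_dork(query: str) -> List[Term]:
    """Split a dork into terms. Quotes group words but are stripped from values."""
    terms = []
    for token in shlex.split(query):
        if token == "-":
            # Mirrors the rule: "- site:example.com" is wrong.
            raise ValueError("'-' must be attached to its term with no space")
        negated = token.startswith("-")
        if negated:
            token = token[1:]
        name, sep, rest = token.partition(":")
        if sep and name.lower() in KNOWN_OPERATORS:
            terms.append(Term(negated, name.lower(), rest))
        else:
            terms.append(Term(negated, None, token))
    return terms
```

Calling parse_dork('site:example.com inurl:"/admin/" -"access denied"') yields three terms, the last one negated with no operator. A side effect of using shlex is that the distinction between quoted and unquoted values is lost, so this sketch is suited to linting rather than to round-tripping queries.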
3. Advanced Operator Chaining
Individual operators are picks. Chained operators are crowbars.
3.1 The Chaining Formula
[SCOPE] + [LOCATION FILTER] + [CONTENT FILTER] + [FILE TYPE] + [NEGATIVE EXCLUSIONS]
Exemplar chain:
site:example.com inurl:"/backup" ext:sql -"access denied"
Decomposed:
- site:example.com → Target scope
- inurl:"/backup" → Location filter (URL must contain /backup)
- ext:sql → File type filter
- -"access denied" → Exclude pages that are protected (reduces noise)
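The chaining formula lends itself to a small helper that assembles the parts in a fixed order. This is a sketch under stated assumptions: build_dork and its parameter names are hypothetical, and it only produces the query string; it does not run any search.

```python
from typing import Iterable, Optional


def _quote(value: str) -> str:
    """Wrap a value in double quotes so it is matched as an exact string."""
    return '"' + value.replace('"', "") + '"'


def build_dork(
    scope: str,
    url_contains: Optional[str] = None,
    text_contains: Optional[str] = None,
    extension: Optional[str] = None,
    exclude: Iterable[str] = (),
) -> str:
    """Assemble [SCOPE] + [LOCATION] + [CONTENT] + [FILE TYPE] + [EXCLUSIONS]."""
    parts = [f"site:{scope}"]
    if url_contains:
        parts.append(f"inurl:{_quote(url_contains)}")
    if text_contains:
        parts.append(f"intext:{_quote(text_contains)}")
    if extension:
        parts.append(f"ext:{extension.lstrip('.')}")
    # Each exclusion gets the minus prefix with no space before the term.
    parts.extend(f"-{_quote(term)}" for term in exclude)
    return " ".join(parts)


exemplar = build_dork(
    "example.com", url_contains="/backup", extension="sql", exclude=["access denied"]
)
print(exemplar)
```

With these arguments the helper reproduces the exemplar chain. Keeping the order fixed makes generated queries easy to diff and review before they are used in an engagement.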
3.2 High-Yield Chain Patterns
Pattern: Subdomain Enumeration via site: inversion
site:*.example.com -site:www.example.com -site:mail.example.com
Google indexes subdomains. This surfaces staging, dev, api, and internal subdomains that certificate transparency logs may miss.
Pattern: Parameter Discovery for Injection Testing
site:example.com inurl:"?" inurl:".php" -inurl:".js"
Surfaces PHP pages with GET parameters, the primary attack surface for SQLi and LFI.
Pattern: Exposed Panels & Interfaces
intitle:"dashboard" inurl:"/admin" site:example.com
intitle:"phpMyAdmin" inurl:"/phpmyadmin" -demo -tutorial
intitle:"Kibana" inurl:":5601"
Pattern: Version Disclosure for CVE Matching
intitle:"Apache/2.4.49" "server at"
intext:"Powered by WordPress 5.8" site:example.com
intitle:"IIS Windows Server" intext:"IIS/10.0"
Extract version strings, cross-reference with NVD or Exploit-DB.
Pattern: Cloud & Infrastructure Enumeration
site:s3.amazonaws.com "example"
site:blob.core.windows.net "example"
site:storage.googleapis.com "example"
Discovers public cloud storage buckets associated with a target, a goldmine for misconfiguration findings.
Pattern: API Key & Token Exposure in GitHub (via Google)
site:github.com "example.com" "api_key" OR "secret_key" OR "access_token"
site:pastebin.com "example.com" password
Pattern: Error Message Exploitation
Error messages are intelligence leakage. They reveal stack traces, internal paths, database schemas, and framework versions.
site:example.com intext:"Fatal error" intext:"on line"
site:example.com intext:"mysqli_connect()" intext:"failed"
site:example.com intext:"Warning: include(" intext:"failed to open stream"
Pattern: The "Dork of Dorks" (Credential Search)
filetype:txt intext:"username" intext:"password" intext:"email" site:example.com
filetype:csv intext:"password" site:example.com
intext:"login:" intext:"password:" filetype:log
4. Vulnerability Case Studies
4.1 Exposed Environment & Config Files
Environment files (.env, .config, .ini, .yaml) are the single most common critical misconfiguration found via dorking. They contain database credentials, API keys, JWT secrets, OAuth tokens, and SMTP credentials — often for production systems.
Dork Strings
# Primary .env discovery
filetype:env "DB_PASSWORD"
filetype:env "APP_KEY"
filetype:env intext:"DB_HOST" intext:"DB_USER"
ext:env "SECRET_KEY"
# Laravel / Symfony / Node
filetype:env "APP_ENV=production"
filetype:env intext:"MAIL_PASSWORD"
filetype:env intext:"STRIPE_SECRET"
filetype:env intext:"AWS_SECRET_ACCESS_KEY"
# Configuration files across frameworks
ext:config intext:"connectionString"
ext:xml intext:"<password>" intext:"<username>"
ext:yml intext:"password:" intext:"host:"
ext:yaml intext:"db_pass:" -site:github.com
ext:ini intext:"password" intext:"user"
# Exposed application config backups
inurl:"config.php.bak"
inurl:"wp-config.php.bak" OR inurl:"wp-config.php~"
filetype:bak inurl:"config"
ext:old inurl:"settings"
What You're Looking For
# Typical .env content (DO NOT use against real systems)
DB_HOST=prod-db.example.internal
DB_DATABASE=customers_production
DB_USERNAME=root
DB_PASSWORD=Sup3rS3cur3!
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
STRIPE_SECRET=sk_live_XXXXXXXXXXXXXXXXXXXX
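JWT_SECRET=my-super-secret-jwt-key
Why This Happens
Developers use .env files locally, copy them to production servers inside the web root, and never block HTTP access to them. Listing the file in .gitignore keeps it out of the repository but does nothing on the server, and robots.txt only asks crawlers to stay away while advertising the path. Unless Apache or Nginx is configured to deny requests for dotfiles, the file is served like any other static asset.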
4.2 SQL Dumps & Database Backups
Database backups are time-bombs. DBAs and developers routinely dump databases for migration, testing, or backup purposes and place them in web-accessible directories "temporarily."
Dork Strings
# SQL dumps
ext:sql "INSERT INTO" "VALUES"
ext:sql intext:"CREATE TABLE"
ext:sql intext:"DROP TABLE IF EXISTS"
filetype:sql intext:"-- phpMyAdmin SQL Dump"
filetype:sql intext:"Dumping data for table"
# Compressed backups (often contain full DB dumps)
ext:gz inurl:"backup" intext:"sql"
ext:zip inurl:"backup" intext:"database"
ext:tar.gz inurl:"db_backup"
ext:7z inurl:"database" site:example.com
# MySQL-specific
filetype:sql intext:"mysql_connect"
inurl:"mysqldump" ext:sql
filetype:sql intext:"ENGINE=InnoDB"
# Specific backup file naming conventions
inurl:"backup_db" ext:sql
inurl:"db_dump" ext:sql
inurl:"database.sql"
inurl:"prod_backup" ext:sql
Why This Happens
mysqldump writes to standard output, and by convention that output is redirected into a .sql file. Developers often place the file at the web root (/var/www/html/backup.sql) for easy SCP or SFTP retrieval, then forget it. Apache's default configuration serves any readable file in the web root.
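From the defender's side, the same file patterns can be checked locally before a search engine ever sees them. The sketch below walks a directory you control and lists files whose names or extensions match the sensitive types discussed in this guide. The function name, the two sets, and the choice of extensions are assumptions for illustration, not a complete audit.

```python
from pathlib import Path
from typing import List

# Extensions and file names drawn from the dork examples in this guide.
RISKY_SUFFIXES = {".sql", ".bak", ".old", ".orig", ".log", ".pem", ".key", ".ppk"}
RISKY_NAMES = {".env", ".htpasswd"}


def find_risky_files(web_root: str) -> List[Path]:
    """Return paths (relative to web_root) that look like forgotten sensitive files."""
    root = Path(web_root)
    hits = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        # Dotfiles such as ".env" have an empty suffix, so match them by name.
        if path.name in RISKY_NAMES or path.suffix.lower() in RISKY_SUFFIXES:
            hits.append(path.relative_to(root))
    return sorted(hits)
```

Running find_risky_files("/var/www/html") on your own server returns a sorted list of candidates to move out of the web root. It only inspects names, so a dump saved under an innocuous extension would still be missed.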
Impact Assessment
A single SQL dump can contain:
- Full user table with bcrypt/MD5 hashes (crackable offline)
- PII: email addresses, phone numbers, physical addresses
- Payment metadata (if GDPR/PCI compliance was not enforced at schema level)
- Internal business logic exposed via stored procedures
4.3 Open Directories & Sensitive Log Files
An "open directory" (or "directory listing") occurs when a web server is configured to render a browsable list of files in a directory when no index.html/index.php exists. This is the digital equivalent of leaving your filing cabinet open.
Dork Strings — Open Directories
# Classic open directory signatures
intitle:"index of"
intitle:"Index of /" "parent directory"
intitle:"index of" "last modified"
# Targeted directory types
intitle:"index of" "backup"
intitle:"index of" "/etc"
intitle:"index of" "logs"
intitle:"index of" ".ssh"
intitle:"index of" "private"
intitle:"index of" "passwords"
intitle:"index of" "/uploads"
# Combined with site scope
intitle:"index of" site:example.com
intitle:"index of" ext:sql site:example.com
Dork Strings: Log Files
Log files are a treasure chest for OSINT and red team operators. They contain usernames, internal IP addresses, error stack traces, API endpoints, and sometimes plaintext passwords (from failed login attempts that included the password in the URL).
# General log file exposure
ext:log inurl:"/logs/"
ext:log intext:"error" intext:"warning"
filetype:log "password"
filetype:log intext:"username" intext:"failed"
# Application-specific logs
ext:log intext:"PHP Fatal error"
ext:log inurl:"access_log"
ext:log inurl:"error_log"
ext:log intext:"Apache" intext:"GET /"
# Authentication logs (extremely sensitive)
filetype:log intext:"authentication failure"
filetype:log intext:"Invalid user" intext:"ssh"
filetype:log intext:"Failed password for"
# Application framework logs
inurl:"/storage/logs/laravel.log"
inurl:"/application/logs/"
filetype:log intext:"Stack trace:"
Dork Strings: SSH & Cryptographic Key Material
ext:pem intext:"BEGIN RSA PRIVATE KEY"
ext:key intext:"PRIVATE KEY"
filetype:ppk intext:"PuTTY-User-Key-File"
intext:"-----BEGIN OPENSSH PRIVATE KEY-----" filetype:txt
intext:"-----BEGIN EC PRIVATE KEY-----"
4.4 IoT Devices & Exposed Dashboards
The real-time web is full of industrial control systems, IP cameras, network appliances, and monitoring dashboards that were deployed with default credentials and zero authentication — fully indexed by Google and Shodan.
Dork Strings — Network Devices
# Router & switch admin panels
intitle:"RouterOS" inurl:"/winbox"
intitle:"NETGEAR" inurl:"/setup.cgi"
intitle:"Cisco IOS" inurl:"/level/15/exec/-/show"
intitle:"DD-WRT" inurl:"/Management.asp"
intitle:"pfSense" inurl:"/index.php" -demo
# Printers (often expose network configs and recent print jobs)
intitle:"Printer Status" inurl:"/printer/main.html"
intitle:"HP LaserJet" inurl:"/hp/device/info_deviceStatus.xml"
intitle:"RICOH" inurl:"/web/guest/en/websys/"
Dork Strings: IP Cameras & Video Feeds
intitle:"Live View / - AXIS"
intitle:"webcamXP" inurl:":8080"
inurl:"/axis-cgi/mjpg/video.cgi"
intitle:"Network Camera" inurl:"/view/index.shtml"
intitle:"IP Camera" inurl:"/view/viewer_index.shtml"
inurl:"/CgiStart?page=Single"
Dork Strings: Industrial Control Systems (ICS/SCADA)
intitle:"SCADA" inurl:"/main.html"
intitle:"Modbus" inurl:":502"
intitle:"Schneider Electric" inurl:"/index.html"
intext:"PLCopen" inurl:"/webvisu.htm"
intitle:"Siemens" inurl:"/portal/portal.mwsl"
⚠️ Critical Warning: ICS/SCADA systems control physical infrastructure. Unauthorized access is a crime in most jurisdictions and can cause physical harm. Research access ONLY in authorized lab environments.
Dork Strings — Monitoring & Analytics Dashboards
# Grafana (unauthenticated instances)
intitle:"Grafana" inurl:":3000"
inurl:"/d/" intitle:"Grafana"
# Kibana (exposed Elasticsearch data)
intitle:"Kibana" inurl:":5601/app/kibana"
# Prometheus
intitle:"Prometheus" inurl:":9090/graph"
# Jupyter Notebooks (often contain API keys in cells)
intitle:"Jupyter Notebook" inurl:"/tree"
# Exposed Jenkins CI/CD
intitle:"Dashboard [Jenkins]"
intitle:"Jenkins" inurl:"/manage"
Dork Strings: Database Admin Interfaces
intitle:"phpMyAdmin" inurl:"/phpmyadmin"
intitle:"phpMyAdmin" "running on" inurl:"main.php"
intitle:"Adminer" inurl:"/adminer.php"
intitle:"MongoDB" inurl:":27017"
intitle:"Redis" inurl:":6379"
inurl:"/solr/admin"
intitle:"Elasticsearch" inurl:":9200/_cat/indices"
5. The OSINT Operator's Reference Sheet
A consolidated cheat sheet for field use:
┌─────────────────────────────────────────────────────────────────────┐
│ DORKING QUICK REFERENCE │
├──────────────────┬──────────────────────────────────────────────────┤
│ OPERATOR │ USE CASE │
├──────────────────┼──────────────────────────────────────────────────┤
│ site: │ Scope to domain / enumerate subdomains │
│ intitle: │ Page title match → admin panels, dir listings │
│ inurl: │ URL path match → endpoints, params, file paths │
│ filetype:/ext: │ File extension → env, sql, log, bak, pem, yml │
│ intext: │ Body content match → credentials, error msgs │
│ cache: │ View archived/deleted page content │
│ before:/after: │ Time-bound searches for historical data │
│ - │ Exclude term → reduce noise, filter false pos │
│ " " │ Exact phrase match → reduce false positives │
│ * │ Wildcard → flexible phrase matching │
│ OR / | │ Multi-term matching → broaden results │
└──────────────────┴──────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────┐
│ TOP SENSITIVE EXTENSIONS │
├───────────┬──────────────────────────────────────────────────────┤
│ .env │ App environment — DB creds, API keys, secrets │
│ .sql │ Database dumps — schemas, user hashes, data │
│ .log │ App/server logs — usernames, IPs, errors, passwords │
│ .bak/.old │ Backup files — source code, configs, DB dumps │
│ .pem/.key │ Cryptographic keys — SSH, SSL, code signing │
│ .yaml/.yml│ Infrastructure configs — K8s, Docker, Ansible │
│ .config │ Application configs — often contain credentials │
│ .ppk │ PuTTY private keys │
│ .htpasswd │ Apache password files │
│ .csv/.xls │ Data exports — PII, customer records, credentials │
└───────────┴──────────────────────────────────────────────────────┘
6. Mastery Lab: 15 Progressive Exercises
The following tasks are progressive. Each builds on the last. All tasks should be performed on domains you own, in authorized bug bounty scopes, or against legal OSINT practice targets (e.g., scanme.nmap.org, hackthebox.eu, tryhackme.com, vulnhub.com, HackerOne/Bugcrowd public programs).
Tier 1 — Fundamentals (Tasks 1–4)
Task 1: Basic Site Enumeration
Objective: Map the visible surface of a target domain. Difficulty: ★☆☆☆☆
site:example.com
Expected output: All indexed pages under the domain.
Next steps: Count results. Check for subdomains, unusual pages, admin paths in URLs.
Variations to try:
site:example.com -site:www.example.com # Subdomain enumeration
site:example.com filetype:pdf # Document enumeration
site:example.com inurl:login # Authentication surface
Task 2: File Type Mapping
Objective: Enumerate all file types publicly indexed for a domain. Difficulty: ★☆☆☆☆
Run this sequence for a target domain:
site:example.com filetype:pdf
site:example.com filetype:doc OR filetype:docx
site:example.com filetype:xls OR filetype:xlsx
site:example.com filetype:txt
site:example.com filetype:xml
Analysis prompt: What can the document metadata (author, creation date, software version) reveal about internal tooling? Use ExifTool on downloaded files.
Task 3: Title-Based Panel Discovery
Objective: Identify exposed admin and management interfaces. Difficulty: ★★☆☆☆
intitle:"admin" OR intitle:"dashboard" OR intitle:"control panel" site:example.com
Bonus: Cross-reference found panels against searchsploit for known CVEs matching the software version disclosed in the page title.
Task 4: Open Directory Hunting
Objective: Find directory listings on a target domain. Difficulty: ★★☆☆☆
intitle:"index of" site:example.com
intitle:"index of" "parent directory" site:example.com
Document: List every exposed directory. Note file types present. Flag any directories containing config, backup, log, or credential files for escalation.
Tier 2 — Intermediate (Tasks 5–9)
Task 5: Credential File Discovery
Objective: Locate exposed .env and config files.
Difficulty: ★★★☆☆
site:example.com filetype:env
site:example.com ext:config
site:example.com inurl:".env" -inurl:".env."
site:example.com intext:"DB_PASSWORD"
Report format:
URL: [discovered URL]
Severity: Critical
Contents: [types of credentials found]
Remediation: Remove file from web root; rotate all exposed credentials immediately
Task 6: Log File Exposure Mapping
Objective: Surface application and server log files. Difficulty: ★★★☆☆
site:example.com ext:log
site:example.com inurl:"/logs/" -inurl:"robots.txt"
site:example.com filetype:log intext:"error"
OSINT extraction targets within logs:
- Internal IP address ranges (RFC 1918 vs. public)
- Usernames from authentication failures
- API endpoint paths not visible in public documentation
- Software/framework version strings from stack traces
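The first extraction target, separating internal from public addresses, can be sketched with the standard-library ipaddress module. The function name and the simple IPv4 regex are assumptions for illustration. Note that is_private covers the RFC 1918 ranges plus other reserved blocks such as loopback and link-local.

```python
import ipaddress
import re
from typing import Dict, Set

# Loose IPv4 matcher; invalid candidates are filtered out by ipaddress below.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def classify_ips(log_text: str) -> Dict[str, Set[str]]:
    """Bucket every IPv4 address in the text as 'private' or 'public'."""
    found: Dict[str, Set[str]] = {"private": set(), "public": set()}
    for candidate in IPV4_PATTERN.findall(log_text):
        try:
            addr = ipaddress.ip_address(candidate)
        except ValueError:
            continue  # e.g. 999.1.1.1 looks like an IP but is not valid
        bucket = "private" if addr.is_private else "public"
        found[bucket].add(candidate)
    return found
```

Private addresses in a leaked log hint at internal network layout, while public ones usually belong to clients or upstream services. The sketch ignores IPv6, which would need a separate pattern.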
Task 7: SQL Dump & Backup Discovery
Objective: Identify exposed database artifacts. Difficulty: ★★★☆☆
site:example.com ext:sql
site:example.com inurl:"backup" ext:zip OR ext:gz OR ext:tar
site:example.com inurl:"db_backup"
site:example.com filetype:sql intext:"INSERT INTO"
Escalation path: If a .sql dump is found, analyze the schema for user tables. Extract password hashes for offline cracking with Hashcat: hashcat -m 3200 hashes.txt rockyou.txt (bcrypt mode).
Task 8: Cloud Storage Bucket Enumeration
Objective: Find publicly accessible cloud storage linked to a target. Difficulty: ★★★☆☆
site:s3.amazonaws.com "targetname"
site:blob.core.windows.net "targetname"
site:storage.googleapis.com "targetname"
inurl:"targetname.s3.amazonaws.com"
Verification: Once a bucket is found, attempt enumeration with aws s3 ls s3://bucketname --no-sign-request. A successful listing indicates a public bucket. Document ALL exposed files before reporting.
Task 9: Version Fingerprinting for CVE Identification
Objective: Extract software version information for vulnerability mapping. Difficulty: ★★★☆☆
site:example.com intext:"Powered by WordPress"
site:example.com intext:"jQuery v" intext:"Copyright"
intitle:"Apache" intext:"Server at" intext:"Port 80" site:example.com
site:example.com intext:"PHP/" -intext:"documentation"
Workflow:
- Extract version string from dork result
- Query NVD: https://nvd.nist.gov/vuln/search?query=wordpress+5.8
- Cross-reference: searchsploit wordpress 5.8
- Correlate with available exploits in Metasploit: search type:exploit name:wordpress
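The first two workflow steps can be sketched in a few lines. The regex below is a deliberately simple assumption that handles strings shaped like "Apache/2.4.49" or "WordPress 5.8"; real banners vary widely. The function names are hypothetical, and the URL follows the NVD search form shown in the workflow.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlencode

# Matches "<Product>/<version>" or "<Product> <version>", with an optional "v" prefix.
VERSION_PATTERN = re.compile(r"\b([A-Za-z][A-Za-z-]*)[/ ]v?(\d+(?:\.\d+)+)")


def extract_version(snippet: str) -> Optional[Tuple[str, str]]:
    """Return (product, version) from the first match in a result snippet, or None."""
    match = VERSION_PATTERN.search(snippet)
    if match is None:
        return None
    return match.group(1).lower(), match.group(2)


def nvd_search_url(product: str, version: str) -> str:
    """Build an NVD keyword-search URL in the same form as the workflow example."""
    return "https://nvd.nist.gov/vuln/search?" + urlencode(
        {"query": f"{product} {version}"}
    )
```

For example, extract_version("Powered by WordPress 5.8") returns ("wordpress", "5.8"), and passing that pair to nvd_search_url gives the URL used in the workflow. Treat any extracted version as a lead to verify, since banners are often stale or spoofed.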
Tier 3 — Advanced (Tasks 10–13)
Task 10: IoT Device Surface Mapping
Objective: Enumerate internet-exposed devices associated with a target organization. Difficulty: ★★★★☆
Use a combination of Google and Shodan for maximum coverage.
Google:
intitle:"Network Camera" site:example.com
intitle:"SCADA" OR intitle:"HMI" site:example.com
intitle:"phpMyAdmin" site:example.com
Shodan (supplement):
org:"Target Organization Name" port:22,23,80,443,8080,8443,9200,6379,5432,3306,27017
Deliverable: A device inventory table:
| IP/URL | Device Type | Firmware | Auth? | CVE |
|--------|-------------|----------|-------|-----|
Task 11: Multi-Platform Credential Harvesting
Objective: Search paste sites, code repositories, and document hosts for credential exposure. Difficulty: ★★★★☆
site:pastebin.com "example.com" password
site:github.com "example.com" "api_key" OR "secret"
site:gitlab.com "example.com" filetype:env
site:trello.com "example.com" password
site:docs.google.com "example.com" "password"
Automation reference: Tools like GitDorker, trufflehog, and gitleaks automate credential scanning across GitHub. The manual dork approach surfaces what automation misses: plain text pastes and document embeds.
Task 12: Error Message Intelligence Extraction
Objective: Map internal architecture from exposed error messages. Difficulty: ★★★★☆
site:example.com intext:"Fatal error:" intext:"on line"
site:example.com intext:"SQLSTATE[" -intext:"documentation"
site:example.com intext:"ORA-01" -intext:"tutorial"
site:example.com intext:"Exception in thread" intext:"at java."
site:example.com intext:"Traceback (most recent call last)"
Intelligence extraction table:
| Error Type | What It Reveals |
|------------|-----------------|
| PHP Fatal error | Framework version, file system path, function names |
| SQLSTATE | DB type (MySQL/MSSQL/Oracle), query structure |
| Java exception | Package structure, internal class names, server type |
| Python traceback | Python version, library versions, file paths |
| Ruby on Rails | MVC structure, gem names, environment (dev/prod) |
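When triaging many results, the error types above can be labelled automatically from the page text. This is a minimal sketch: the signature strings are taken from the Task 12 dorks, while the list structure and the classify_error name are illustrative assumptions. Ruby on Rails is omitted because no signature string for it appears in the dorks.

```python
from typing import Optional

# (substring to look for, label from the intelligence extraction table)
ERROR_SIGNATURES = [
    ("Fatal error:", "PHP Fatal error"),
    ("SQLSTATE[", "SQLSTATE"),
    ("Exception in thread", "Java exception"),
    ("Traceback (most recent call last)", "Python traceback"),
]


def classify_error(page_text: str) -> Optional[str]:
    """Return the first matching error label, or None if nothing matches."""
    for needle, label in ERROR_SIGNATURES:
        if needle in page_text:
            return label
    return None
```

A call such as classify_error("Fatal error: Uncaught Error in /var/www/app.php on line 12") returns "PHP Fatal error". Substring matching is crude, so a tutorial page quoting an error would be labelled the same way as a live one.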
Task 13: Exposed Monitoring Infrastructure
Objective: Locate unauthenticated monitoring, logging, and CI/CD systems. Difficulty: ★★★★☆
intitle:"Grafana" inurl:":3000" site:*.example.com
intitle:"Kibana" inurl:":5601" site:*.example.com
intitle:"Jenkins" inurl:"/manage" site:*.example.com
intitle:"Prometheus" inurl:":9090" site:*.example.com
inurl:"/actuator" site:example.com # Spring Boot actuator
inurl:"/_cat/indices" site:example.com # Elasticsearch
intitle:"Jupyter Notebook" site:*.example.com
Why this is critical: Grafana with anonymous access exposes business metrics and internal service names. Jenkins with no auth allows arbitrary code execution on build agents. Kibana exposes raw log data that may contain credentials.
Tier 4 — Expert (Tasks 14–15)
Task 14: Full Subdomain & Infrastructure Profile
Objective: Build a complete infrastructure map of a target organization using passive techniques only. Difficulty: ★★★★★
This is a multi-phase exercise combining several techniques:
Phase 1: Subdomain Harvesting
site:*.example.com -site:www.example.com -site:mail.example.com
site:*.*.example.com # Third-level subdomains
Phase 2: Cloud Infrastructure
site:s3.amazonaws.com "example"
site:blob.core.windows.net "example"
site:azurewebsites.net "example"
site:*.amazonaws.com "example.com"
Phase 3: Certificate Transparency (supplement)
Query crt.sh: https://crt.sh/?q=%.example.com for all issued TLS certificates.
Phase 4: Technology Stack
site:example.com intext:"Powered by" -intext:"Google"
site:example.com intext:"Built with" -intext:"love"
site:example.com ext:js inurl:"/static/" -inurl:".min.js" # Unminified JS
Phase 5: Employee & Organizational OSINT
site:linkedin.com "example.com" "engineer" OR "developer" OR "DevOps"
Deliverable: Complete infrastructure dossier with:
- All discovered subdomains and their likely functions
- Technology stack per property
- Cloud assets and their access levels
- Exposed services requiring remediation
- Risk-ranked finding list
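The first dossier item, a list of discovered subdomains, can be derived from result URLs gathered in Phase 1. The sketch below groups URLs by hostname and keeps only hosts under the target's apex domain. The function name and the sample URLs in the usage note are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List
from urllib.parse import urlparse


def group_by_host(result_urls: Iterable[str], apex: str) -> Dict[str, List[str]]:
    """Map each in-scope hostname to the sorted list of URLs found on it."""
    grouped = defaultdict(list)
    for url in result_urls:
        host = (urlparse(url).hostname or "").lower()
        # Require an exact match or a dot boundary, so "notexample.com" is excluded.
        if host == apex or host.endswith("." + apex):
            grouped[host].append(url)
    return {host: sorted(urls) for host, urls in sorted(grouped.items())}
```

Given URLs on staging.example.com and api.example.com, the result has one key per subdomain, which feeds the "likely functions" column of the dossier. URLs without a scheme have no parsed hostname and are skipped.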
Task 15: Advanced Persistent Dork — Automated Monitoring Rig
Objective: Build a dork-based monitoring system that alerts on new exposures. Difficulty: ★★★★★
This task moves beyond manual search into operational infrastructure. The goal: automate daily dorking and alert on new results.
Toolchain:
# Install requirements
pip install googlesearch-python requests
Python skeleton:
#!/usr/bin/env python3
"""
Dork Monitor: Passive OSINT Alert System
WARNING: Rate-limit your queries. Excessive querying violates Google ToS.
Use Google Custom Search API with proper API key for production.
"""
import hashlib
import json
import os
from datetime import datetime

DORK_LIST = [
    'site:example.com filetype:env',
    'site:example.com ext:sql intext:"INSERT INTO"',
    'intitle:"index of" site:example.com',
    'site:*.example.com -site:www.example.com',
    'site:example.com intext:"Fatal error:"',
]
RESULTS_FILE = "dork_baseline.json"


def hash_results(results: list) -> str:
    # Sort first so the hash does not depend on result ordering.
    serialized = json.dumps(sorted(results), sort_keys=True)
    return hashlib.sha256(serialized.encode()).hexdigest()


def load_baseline() -> dict:
    if os.path.exists(RESULTS_FILE):
        with open(RESULTS_FILE) as f:
            return json.load(f)
    return {}


def save_baseline(baseline: dict) -> None:
    with open(RESULTS_FILE, "w") as f:
        json.dump(baseline, f, indent=2)


def alert(dork: str, new_results: list) -> None:
    timestamp = datetime.now().isoformat()
    print(f"\n[!] ALERT [{timestamp}]")
    print(f"    Dork: {dork}")
    print("    New results detected:")
    for r in new_results:
        print(f"    → {r}")


def run_dork(dork: str) -> list:
    """
    Replace this with Google Custom Search API call.
    DO NOT use scraping against Google in production.
    API endpoint: https://customsearch.googleapis.com/customsearch/v1
    """
    # Placeholder: implement with API key
    return []


def main() -> None:
    baseline = load_baseline()
    for dork in DORK_LIST:
        results = run_dork(dork)
        current_hash = hash_results(results)
        previous_hash = baseline.get(dork, {}).get("hash")
        if current_hash != previous_hash:
            # Only URLs absent from the previous run count as new.
            previous_results = baseline.get(dork, {}).get("results", [])
            new_items = [r for r in results if r not in previous_results]
            if new_items:
                alert(dork, new_items)
        baseline[dork] = {
            "hash": current_hash,
            "results": results,
            "last_checked": datetime.now().isoformat(),
        }
    save_baseline(baseline)
    print("[*] Dork monitor cycle complete.")


if __name__ == "__main__":
    main()
Production enhancements:
- Integrate with Google Custom Search API (cx parameter + API key)
- Push alerts to Slack webhook or email via SMTP
- Store results in SQLite for trend analysis
- Schedule with cron (Linux) or Task Scheduler (Windows)
- Add Shodan API integration for exposed service monitoring
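The first enhancement can be sketched as follows, using only the standard library. It assumes the Custom Search JSON API accepts key, cx, and q as query parameters and returns matches under an items array with a link field. The function names are illustrative, and this run_dork takes two extra arguments compared with the placeholder in the skeleton. Quota handling, paging, and error handling are left out.

```python
import json
import urllib.request
from typing import List
from urllib.parse import urlencode

API_ENDPOINT = "https://customsearch.googleapis.com/customsearch/v1"


def build_request_url(dork: str, api_key: str, engine_id: str) -> str:
    """Encode the dork and credentials into a Custom Search request URL."""
    params = {"key": api_key, "cx": engine_id, "q": dork}
    return f"{API_ENDPOINT}?{urlencode(params)}"


def extract_links(response_body: dict) -> List[str]:
    """Pull result URLs out of a decoded response; 'items' is absent when empty."""
    return [item["link"] for item in response_body.get("items", [])]


def run_dork(dork: str, api_key: str, engine_id: str) -> List[str]:
    """Issue one API request and return the result URLs from the first page."""
    url = build_request_url(dork, api_key, engine_id)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return extract_links(json.load(resp))
```

Splitting URL construction and response parsing from the network call keeps both halves testable offline. Read the API key and engine ID from environment variables rather than hard-coding them in the script.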
7. Operational Security & Legal Boundaries
The Law
Dorking itself — querying a public search engine — is generally legal. However:
| Action | Legal Status |
|--------|--------------|
| Querying Google for public info | ✅ Legal in most jurisdictions |
| Accessing an exposed file via direct URL | ⚠️ Legal grey area; governed by CFAA (US), CMA (UK), etc. |
| Downloading exposed data you are not authorized to access | ❌ Illegal under CFAA, GDPR, and equivalents |
| Testing systems without written authorization | ❌ Illegal |
Key principle: Finding a URL via Google does not grant you permission to access, download, or interact with the resource. Viewing an accidentally public file may constitute unauthorized access under strict interpretations of the CFAA (18 U.S.C. § 1030).
OPSEC for Researchers
When performing authorized assessments:
- Route through authorized infrastructure: Use your engagement's designated exit node or VPN. Don't dork from personal IPs.
- Log everything: Your engagement authorization, timestamps, query strings, and findings. Defense against legal blowback.
- Use Google Custom Search API: Reduces fingerprinting versus browser-based searches.
- Screenshot, don't download: For evidence documentation of sensitive files, screenshot the browser. Do not download files you lack authorization to access.
- Rotate user agents: Automated dorking tools should rotate User-Agent strings to avoid Google CAPTCHA blocks.
Responsible Disclosure
If you discover exposed sensitive files on a system you do not have authorization to test:
- Do not download or store the data
- Identify the correct contact: security@domain.com, HackerOne/Bugcrowd program, or via WHOIS
- Report with minimal detail initially: "I found an exposed configuration file at [URL]. It contains what appears to be database credentials."
- Give reasonable remediation time: 90 days is standard per coordinated vulnerability disclosure norms
- Document the discovery and your actions in case of legal questions
8. Further Resources
Essential Tools
| Tool | Purpose | URL |
|------|---------|-----|
| Google Hacking Database | Pre-built dork library | exploit-db.com/google-hacking-database |
| Shodan | IoT/service search engine | shodan.io |
| Censys | Certificate/host search | censys.io |
| FOFA | Chinese-market IoT search | fofa.info |
| ZoomEye | Cyberspace mapping | zoomeye.org |
| crt.sh | Certificate transparency | crt.sh |
| VirusTotal | URL/domain intelligence | virustotal.com |
| theHarvester | OSINT automation | github.com/laramies/theHarvester |
| SpiderFoot | OSINT platform | spiderfoot.net |
| Maltego | Visual link analysis | maltego.com |
Recommended Reading
- "Google Hacking for Penetration Testers" — Johnny Long, Bill Gardner, Justin Brown
- "The Web Application Hacker's Handbook" — Stuttard & Pinto
- "Open Source Intelligence Techniques" — Michael Bazzell (7th edition)
- PTES Technical Guidelines: pentest-standard.org
- OWASP Testing Guide: owasp.org/www-project-web-security-testing-guide
Practice Platforms (Authorized Only)
- HackTheBox.eu: Retired machine write-ups for OSINT practice
- TryHackMe.com: Structured OSINT rooms
- VulnHub.com: Downloadable vulnerable VMs
- HackerOne/Bugcrowd: Real-world targets with legal authorization
- OPSEC Labs: Build your own by deploying a misconfigured Docker stack locally
Closing Transmission
The hidden web is not hidden by encryption or access controls. It is hidden by assumption — the assumption that because something is hard to find, no one will find it. Dorking destroys that assumption with a search bar.
Every .env file sitting exposed in a web root is a developer who assumed their mistake would go unnoticed. Every database dump in a /backup directory is a DBA who assumed Google doesn't index file extensions. Every default-credential admin panel is a sysadmin who assumed obscurity was security.
Your job — as a red teamer, penetration tester, or security researcher — is to think like an adversary before an adversary does. Google is their first tool. It should be yours too.
Master the grammar. Query with precision. Report what you find.
The Dorking Manifesto is a living document. The Google Hacking Database grows daily. The exposed attack surface grows with it.
Until the next powerful blog: keep growing, keep practising, and bye bye!
Tags: #OSINT #RedTeam #Dorking #GoogleHacking #PenTest #InfoSec #Recon #BugBounty #CyberSecurity #PassiveRecon
© 2026— Red Team Operations Research | Credits: @rot-ig