The Hidden Fingerprints of Bot Protection: How Every Major Vendor Leaves Traces in Your Browser


Dima Kynal

April 13, 2026 (Updated: April 13, 2026)

Every time you load a webpage, a silent profiling session begins. Before you've read a single word, before you've clicked anything — the site has already made a judgment about you. Are you a human? A crawler? A credential-stuffing bot? A scraper?

The systems making that judgment don't advertise themselves. They're designed to be invisible. But invisibility isn't the same as leaving no trace. Every major bot protection vendor — Cloudflare, Akamai, PerimeterX, DataDome, Kasada, and a dozen others — leaves a distinct fingerprint in your browser: specific HTTP response headers, injected cookies, third-party script URLs, and JavaScript globals hanging off the window object.

If you know what to look for, you can read those fingerprints like a label on a product. This article documents what each vendor leaves behind and why — not as an evasion guide, but as an anatomy of how modern bot protection actually works at the protocol level.

Why Bot Protection Systems Leave Traces At All

You might wonder: if these systems are meant to be invisible, why leave any fingerprint at all?

The answer is that invisibility and functionality are in tension. A bot protection system needs to:

Set a session cookie so it can track behavioral signals across requests
Inject a JavaScript agent so it can collect client-side telemetry (mouse movement, keystroke timing, WebGL fingerprints)
Add response headers so its edge nodes can communicate with each other and with the origin server
Load third-party scripts from its own CDN to run challenge logic

Each of these requirements leaves something observable in the browser. The cookie has a name. The script comes from a domain. The header has a key. The global has a property name on window. Vendors minimize these traces — they use opaque cookie values, obfuscated scripts, and cryptic header names — but the metadata itself is hard to hide.

There's also a secondary reason: interoperability. When Akamai's edge nodes need to pass bot-scoring metadata to an origin server, they do it via headers. When Cloudflare's challenge system needs to communicate a clearance token to downstream infrastructure, it uses a cookie. These aren't security mistakes — they're architectural necessities that happen to be observable.

The Four Signal Types

Across all vendors, fingerprints cluster into four categories:

Response headers are the most reliable signal. They're set by the vendor's infrastructure before the response reaches the browser, and they're difficult to spoof from the client side. The naming conventions are often vendor-specific enough to be definitive: cf-ray only comes from Cloudflare, akamai-grn only comes from Akamai, x-kpsdk-ct only comes from Kasada.

Cookies are the most behavioral signal. They store session state, challenge clearance tokens, and encoded telemetry. They change across requests and often carry cryptographic signatures that tie them to a specific session. The names are stable and identifiable even when the values are opaque.

Script URLs reveal vendor relationships. A bot protection system's JavaScript agent has to be loaded from somewhere — usually the vendor's own CDN for integrity and update agility. The presence of challenges.cloudflare.com or client-api.arkoselabs.com in the page's script list is a reliable vendor identifier regardless of any other signals.

JavaScript globals are the most ephemeral but often the most revealing. When a vendor's script runs, it typically attaches state to the window object. bmak is Akamai. grecaptcha is Google. _pxAppId is PerimeterX. These globals expose not just the vendor's presence but sometimes their configuration — an _pxAppId value tells you the specific site's account ID with PerimeterX.
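To make the four categories concrete, here is a minimal sketch of how a signature catalog might be laid out. The VendorSignature record, the SIGNATURES list, and the match_signals helper are hypothetical names invented for illustration; the entries use only identifiers mentioned above and are nowhere near a complete catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VendorSignature:
    """Observable traces for one vendor, grouped by the four signal types."""
    name: str
    headers: frozenset = frozenset()       # response header names, lowercase
    cookies: frozenset = frozenset()       # exact cookie names
    script_hosts: frozenset = frozenset()  # hostnames that serve the agent
    js_globals: frozenset = frozenset()    # property names on window


# Illustrative entries only, built from identifiers named in this article.
SIGNATURES = [
    VendorSignature("Cloudflare", headers=frozenset({"cf-ray"}),
                    script_hosts=frozenset({"challenges.cloudflare.com"})),
    VendorSignature("Akamai", headers=frozenset({"akamai-grn"}),
                    js_globals=frozenset({"bmak"})),
    VendorSignature("Kasada", headers=frozenset({"x-kpsdk-ct"})),
    VendorSignature("PerimeterX", js_globals=frozenset({"_pxAppId"})),
]


def match_signals(sig, headers, cookies, script_hosts, js_globals):
    """Return the matched identifiers per signal type for one vendor."""
    observed = {
        "headers": {h.lower() for h in headers},  # header names are case-insensitive
        "cookies": set(cookies),
        "script_hosts": {s.lower() for s in script_hosts},
        "js_globals": set(js_globals),
    }
    return {kind: sorted(getattr(sig, kind) & seen)
            for kind, seen in observed.items()
            if getattr(sig, kind) & seen}
```

An empty result means nothing matched for that vendor; a result with several keys means independent signal types agree, which is usually stronger evidence than one hit alone.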

Vendor by Vendor

Cloudflare

Cloudflare is the most commonly encountered protection layer on the web, which means its fingerprints are the most familiar. The canonical signal is cf-ray — a unique identifier attached to every request that passes through Cloudflare's network. It looks like cf-ray: 8a1b2c3d4e5f6789-LHR, where the suffix is the edge datacenter that handled the request.
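As a small illustration, the header value can be split into its two parts. The regular expression below assumes the shape shown in the example (a hex identifier, a hyphen, a three-letter location code); that shape is inferred from the sample value, not taken from a published grammar, and parse_cf_ray is a hypothetical helper name.

```python
import re

# Assumed shape: hex identifier, hyphen, three-letter datacenter code.
CF_RAY = re.compile(r"^(?P<ray_id>[0-9a-f]+)-(?P<colo>[A-Z]{3})$")


def parse_cf_ray(value):
    """Split a cf-ray value into (ray_id, datacenter), or None if it doesn't fit."""
    match = CF_RAY.match(value.strip())
    if match is None:
        return None
    return match.group("ray_id"), match.group("colo")


print(parse_cf_ray("8a1b2c3d4e5f6789-LHR"))
```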

For bot management specifically, the key signals are the __cf_bm cookie (Bot Management behavioral token), cf_clearance (issued after a JS or CAPTCHA challenge is passed), and the _cfuvid cookie used for rate limiting. On the script side, anything loaded from challenges.cloudflare.com indicates active challenge logic — this is where Turnstile and the managed challenge system live.

Cloudflare's architecture means you often see it as a CDN and a bot protection layer simultaneously. cf-cache-status: HIT tells you Cloudflare is serving cached content; __cf_bm tells you it's also running bot scoring on that traffic.
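A rough sketch of telling those roles apart from a single response might look like this. The role labels and the cloudflare_roles function are made up for illustration; the header and cookie names are the ones discussed above.

```python
def cloudflare_roles(header_names, cookie_names):
    """Guess which Cloudflare roles are visible from header and cookie names."""
    headers = {h.lower() for h in header_names}
    cookies = set(cookie_names)
    roles = set()
    # cf-ray and cf-cache-status indicate traffic passing through the network.
    if "cf-ray" in headers or "cf-cache-status" in headers:
        roles.add("cdn")
    # __cf_bm indicates bot scoring on this traffic.
    if "__cf_bm" in cookies:
        roles.add("bot-scoring")
    # cf_clearance is issued after a challenge has been passed.
    if "cf_clearance" in cookies:
        roles.add("challenge-passed")
    return roles
```

A site that returns only the "cdn" role is fronted by Cloudflare without visible bot management on that response, which is a different situation from one that also sets __cf_bm.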

Akamai Bot Manager

Akamai operates at a different tier — primarily enterprise, often financial services and e-commerce. Its bot management system is called Bot Manager, and it's built on a foundation of behavioral telemetry that's collected via an injected JavaScript agent.

The primary cookie is _abck. This is Akamai's workhorse: it accumulates encoded behavioral data across the session and is periodically refreshed as the client sends telemetry. The bm_sz cookie carries the initial sizing/configuration data. On the JavaScript side, bmak is the global object through which the telemetry agent operates — it collects mouse movements, keyboard timing, touch events, and device fingerprints and encodes them into sensor_data payloads sent back to Akamai's servers.

The header fingerprints are also distinctive: x-akamai-request-id, akamai-grn (Global Request Number), and x-akamai-edgescape (which can expose rich geolocation and network metadata about the visitor).

DataDome

DataDome positions itself as a real-time bot protection layer that returns a verdict within a few milliseconds. Its integration typically involves a tag loaded from tags.datadome.co or dd.js, which activates a DataDome global on the window.

The primary cookie is datadome — a base64-encoded token that carries session classification data. The x-datadome-request response header indicates the result of DataDome's real-time analysis. DataDome also exposes a ddCaptcha global when a challenge needs to be triggered client-side, which makes it possible to detect not just the presence of DataDome but whether the current session is being challenged.

PerimeterX / HUMAN Security

PerimeterX merged with HUMAN Security in 2022, but the client-side fingerprints remain largely unchanged in production deployments. The cookie family is distinctive: _px, _px2, _px3, _pxvid, and _pxde form a layered session state, with each generation representing an evolution in the token format. _px3 in particular carries an HMAC-signed token.

On the header side, x-px-block-score appears on responses where PerimeterX has intervened — it carries the risk score that triggered the block. The _pxAppId global is particularly useful: it exposes the site's specific account identifier with PerimeterX, formatted as a string like PX1a2b3c4d.
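Here is one way those PerimeterX traces could be summarized from a snapshot of cookie names and window properties. The "PX plus alphanumerics" pattern is an assumption based on the example value above, and perimeterx_summary is a hypothetical helper; window_globals stands in for whatever dictionary of globals a collector has gathered.

```python
import re

# Assumed format, inferred from the example "PX1a2b3c4d".
PX_APP_ID = re.compile(r"^PX[0-9A-Za-z]+$")
PX_COOKIES = {"_px", "_px2", "_px3", "_pxvid", "_pxde"}


def perimeterx_summary(cookie_names, window_globals):
    """Report which PerimeterX cookies are present and the app ID, if any."""
    app_id = window_globals.get("_pxAppId")
    if not (isinstance(app_id, str) and PX_APP_ID.match(app_id)):
        app_id = None  # missing, or not in the assumed format
    return {
        "cookies": sorted(PX_COOKIES & set(cookie_names)),
        "app_id": app_id,
    }
```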

Kasada

Kasada is less well-known than Cloudflare or Akamai but is the protection layer behind several major e-commerce and travel platforms. Its fingerprint is unusually distinctive: the company uses a specific UUID — 149e9513-01fa-4fb0-aad4-566afd725d1b — embedded in script paths, making detection almost trivially reliable from script URL inspection alone.

The headers are also unique: x-kpsdk-ct (client token), x-kpsdk-cd (client data, carrying an encoded device fingerprint), and x-kpsdk-v (SDK version). These headers are sent with every protected request, making Kasada one of the more header-verbose vendors.
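Checking a script URL for that UUID takes only a few lines. The sketch below looks for the UUID as a whole path segment rather than anywhere in the URL, so a query string that happens to contain it doesn't count; the example hostname and file name in the comment are hypothetical.

```python
from urllib.parse import urlsplit

KASADA_UUID = "149e9513-01fa-4fb0-aad4-566afd725d1b"


def is_kasada_script(url):
    """True when the Kasada UUID appears as a path segment of the script URL."""
    path = urlsplit(url).path.lower()
    return KASADA_UUID in path.split("/")


# Hypothetical URL for illustration:
# is_kasada_script("https://shop.example/149e9513-01fa-4fb0-aad4-566afd725d1b/loader.js")
```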

Imperva Incapsula

Imperva (formerly Incapsula) generates a characteristic cookie family: incap_ses_* (session cookies with a numeric suffix tied to the Imperva account) and visid_incap_* (visitor ID cookies, similarly suffixed). These prefix patterns are reliable identifiers even without knowing the specific account numbers.

The x-iinfo response header is Imperva's internal routing metadata — it's present on nearly all Imperva-proxied traffic and contains encoded request classification data. More recent Imperva deployments also use a cookie called reese84, which is a JavaScript-generated token used in newer versions of their bot detection challenge.
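Because the Imperva cookie names carry account-specific suffixes, matching has to be done on prefixes rather than exact names. A minimal sketch using shell-style patterns from Python's standard library (the imperva_cookies helper and the sample cookie names in the usage line are illustrative):

```python
from fnmatch import fnmatchcase

# Prefix patterns for the suffixed cookie families, plus the exact reese84 name.
IMPERVA_PATTERNS = ("incap_ses_*", "visid_incap_*", "reese84")


def imperva_cookies(cookie_names):
    """Return the cookie names that match any Imperva pattern, in input order."""
    return [name for name in cookie_names
            if any(fnmatchcase(name, pattern) for pattern in IMPERVA_PATTERNS)]


print(imperva_cookies(["incap_ses_123_456", "sessionid", "visid_incap_456"]))
```

fnmatchcase is used instead of fnmatch so the comparison stays case-sensitive on every platform.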

AWS WAF

AWS WAF is the native web application firewall for AWS-hosted applications. Its bot control feature issues an aws-waf-token cookie after a browser integrity check passes. The x-amzn-waf-action header appears when AWS WAF interrupts a request with a challenge or captcha action and names the action applied; ordinary allowed responses don't carry it. For applications using AWS API Gateway or Lambda, x-amzn-requestid and x-amzn-trace-id (the X-Ray trace ID) are also present, though these are infrastructure signals rather than bot-protection-specific.

The JavaScript integration appears via an awsWafIntegration or AwsWafIntegration global, depending on the SDK version.

F5 BIG-IP / Shape Security

F5 acquired Shape Security in 2020, bringing with it one of the most sophisticated JavaScript obfuscation systems in the industry. Shape's protection is often characterized by a heavily obfuscated script that changes on every deployment — the actual code is nearly impossible to statically analyze. Detection must therefore rely on the cookie and header signals rather than script content.

The TS01* cookie family (set by BIG-IP's application security module) and BIGipServer* cookies (BIG-IP's load balancer persistence cookies) are F5 infrastructure artifacts that appear alongside Shape's bot protection layer. The x-sh-pointer header is a Shape-specific telemetry marker.

Arkose Labs (FunCaptcha)

Arkose Labs (formerly FunCaptcha) takes a different approach: instead of invisible bot scoring, it presents users with interactive puzzles such as 3D object rotation, image matching, and audio questions. The philosophy is that making attacks expensive in human time is more durable than trying to detect automation.

The primary detection signal is the script endpoint: client-api.arkoselabs.com for the challenge API, funcaptcha.com for the legacy domain (still active). The Arkose session script is served from a path containing dapib and is cryptographically tied to the session — it must be executed to compute a response parameter called tguess, which means bots can't skip running the JavaScript.

Arkose is used by Microsoft (account creation), Roblox, and a growing number of fintech platforms.

Google reCAPTCHA

reCAPTCHA is the most widely deployed CAPTCHA system in the world. The v3 variant is invisible: it runs in the background and scores user behavior without presenting any challenge. The grecaptcha global is the public API surface, while _grecaptcha holds internal state. The NID cookie is a Google identity cookie that reCAPTCHA uses for cross-site scoring continuity.

Script URLs are reliable identifiers: google.com/recaptcha/api.js for v2/v3 and google.com/recaptcha/enterprise.js for the Enterprise tier.
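A sketch of classifying a script URL by those two paths follows. It compares the parsed hostname and path exactly instead of doing a substring search, so a lookalike URL with google.com buried in the path doesn't match. The accepted host list and the recaptcha_tier name are assumptions for illustration; real deployments may load the script from other hosts that this sketch ignores.

```python
from urllib.parse import urlsplit

# Assumed host list for this sketch; other hosts are not handled.
RECAPTCHA_HOSTS = {"google.com", "www.google.com"}


def recaptcha_tier(script_url):
    """Return 'enterprise', 'v2/v3', or None for a script URL."""
    parts = urlsplit(script_url)
    host = (parts.hostname or "").lower()
    if host not in RECAPTCHA_HOSTS:
        return None
    if parts.path == "/recaptcha/enterprise.js":
        return "enterprise"
    if parts.path == "/recaptcha/api.js":
        return "v2/v3"
    return None
```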

hCaptcha

hCaptcha emerged as a privacy-focused alternative to reCAPTCHA and became the default CAPTCHA for Cloudflare's older challenge pages, Discord, and many other high-traffic platforms. Its fingerprint is straightforward: scripts from hcaptcha.com and newassets.hcaptcha.com, with an hcaptcha global providing the public API.

The hc_accessibility cookie stores the user's accessibility preferences (enabling audio challenges), which is a reliable hCaptcha indicator even when the challenge widget isn't immediately visible.

GeeTest

GeeTest is the dominant CAPTCHA and bot protection provider across Asia-Pacific markets and is increasingly deployed on globally facing platforms. It's known for its slider puzzles (drag a piece into a gap) and icon-click challenges. The v3 and v4 APIs each inject their own initialization functions: initGeetest and initGeetest4 respectively. These globals are reliable detection signals.

Static assets load from static.geetest.com, and the challenge API runs through api.geetest.com. GeeTest cookies typically use a geetest_ prefix.

Fastly (+ Signal Sciences)

Fastly is primarily a CDN, but after acquiring Signal Sciences in 2020 it also offers WAF and bot management capabilities. The CDN signals — x-fastly-request-id, x-served-by (which reveals the specific edge POP), x-timer (request timing trace), and x-cache — are present on all Fastly-backed traffic. The WAF signals — _sigsci global, signalsciences.net script URLs — only appear when Signal Sciences is actively deployed.

This dual identity means a detector should see at least two corroborating signals before classifying Fastly as a security layer rather than just a CDN.
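One possible reading of that rule, sketched in code: count CDN and WAF signals separately, and only report a security layer when there are at least two signals in total and at least one of them is WAF-specific. That second condition is an assumption of this sketch, as are the function name and the returned labels.

```python
FASTLY_CDN_HEADERS = {"x-fastly-request-id", "x-served-by", "x-timer", "x-cache"}


def classify_fastly(header_names, script_hosts, js_globals):
    """Classify Fastly's visible role from observed headers, hosts, and globals."""
    cdn_hits = FASTLY_CDN_HEADERS & {h.lower() for h in header_names}
    waf_hits = set()
    if "_sigsci" in js_globals:
        waf_hits.add("global:_sigsci")
    for host in script_hosts:
        host = host.lower()
        if host == "signalsciences.net" or host.endswith(".signalsciences.net"):
            waf_hits.add("script:signalsciences.net")
    total = len(cdn_hits) + len(waf_hits)
    # Assumption: a security verdict needs a WAF signal plus one corroborating signal.
    if waf_hits and total >= 2:
        return "security-layer"
    if cdn_hits:
        return "cdn-only"
    return "inconclusive" if waf_hits else "absent"
```

Headers like x-cache and x-served-by also show up on non-Fastly stacks, so even the "cdn-only" verdict is best treated as a hint.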

Radware Bot Manager

Radware Bot Manager (formerly ShieldSquare) uses an Intent-based Deep Behavior Analysis engine and deploys via a JavaScript tag. Legacy deployments still surface SUID cookies and shieldsquare.com script references — a reliable indicator of integrations that predate the Radware rebrand. Newer deployments reference radware.com and sdp.radware.com. The sdp global (Security Defense Platform) is the primary JavaScript object.

Netacea

Netacea is architecturally the most unusual vendor on this list. It's a server-side-only bot detection system with zero client-side JavaScript injection, a deliberate design choice that makes it invisible to attackers who rely on client-side reverse engineering. Detection from the browser must rely entirely on response headers (x-netacea-info) and any cookies set by the server (_nt, _ntv). In a signature catalog, Netacea's script and global lists come up empty. That isn't an oversight; it's a product feature.

What Stacked Protection Looks Like

In practice, bot protection is rarely a single vendor. Major e-commerce and fintech sites frequently run two or three layers simultaneously:

Cloudflare as CDN (always-present cf-ray) + DataDome as bot manager (DataDome tag loaded, datadome cookie set). The CDN handles caching and DDoS; DataDome handles behavioral bot detection.
Akamai as edge CDN (x-akamai-request-id) + Arkose Labs on sensitive endpoints (login, signup, checkout). The CDN carries the traffic; Arkose fires only when a high-risk action is attempted.
Fastly as CDN + AWS WAF as the rule engine. The Fastly headers (x-served-by) reveal the CDN layer; aws-waf-token reveals the WAF layer underneath.

Stacking is common because no single vendor covers all threat surfaces equally well. Cloudflare is excellent at edge-level filtering but lighter on behavioral analysis. DataDome is strong on behavioral scoring but isn't a CDN. Arkose is uniquely good at defending against human fraud farms but is too friction-heavy for all traffic. Organizations mix and match based on their threat model.
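Detecting a stack is mostly a matter of not stopping at the first match. The sketch below walks a small table of (signal type, identifier, vendor) rows and collects every vendor that shows up; the table covers only the three pairings described above, and LAYER_SIGNALS and detect_stack are hypothetical names.

```python
# (signal type, identifier, vendor) rows drawn from the pairings above.
LAYER_SIGNALS = [
    ("header", "cf-ray", "Cloudflare"),
    ("cookie", "datadome", "DataDome"),
    ("header", "x-akamai-request-id", "Akamai"),
    ("script", "client-api.arkoselabs.com", "Arkose Labs"),
    ("header", "x-served-by", "Fastly"),  # weak on its own; other stacks use it too
    ("cookie", "aws-waf-token", "AWS WAF"),
]


def detect_stack(header_names, cookie_names, script_hosts):
    """Return every vendor with at least one matching signal, in table order."""
    observed = {
        "header": {h.lower() for h in header_names},
        "cookie": set(cookie_names),
        "script": {s.lower() for s in script_hosts},
    }
    stack = []
    for kind, identifier, vendor in LAYER_SIGNALS:
        if identifier in observed[kind] and vendor not in stack:
            stack.append(vendor)
    return stack
```

A result with two or more vendors is the stacked case; which one acts as CDN and which as bot manager still has to be judged from the kind of signal each contributed.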

The Arms Race Underneath

The fingerprints documented here are a snapshot of an ongoing arms race. Bot operators read the same signals and use them to detect detection — knowing which protection system is running tells an attacker which evasion playbook to reach for. Vendors respond by rotating cookie names, obfuscating script content, shortening header values, and moving signals from client-visible locations to server-side channels.

This is why Netacea's server-side-only approach is philosophically interesting: by eliminating client-side footprints entirely, it denies attackers the reconnaissance step. But it comes with a tradeoff — without client-side telemetry, the behavioral signals are less rich and the detection surface narrows to request metadata.

The equilibrium that's emerged is roughly this: vendors minimize their observable fingerprint while maximizing their collected signal. The cookie name is terse, but the cookie value packs encoded telemetry into the few kilobytes a browser allows per cookie, and larger payloads travel in request bodies. The script URL is generic, but the script itself performs thousands of browser API calls to build a device fingerprint. Less to identify from the outside; more to analyze on the inside.

Understanding these fingerprints doesn't require any special tooling — DevTools is enough. Response headers are in the Network tab. Cookies are in Application. Globals are in the Console. Script URLs are in Sources. The information has always been there. Knowing what it means is the only skill required.
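If you copy response headers out of the Network tab, the cookie names can be pulled from the Set-Cookie values with plain string handling. This is a minimal sketch that reads only the name before the first "=" and ignores attributes; cookie_names is a hypothetical helper and the sample values in the usage line are invented.

```python
def cookie_names(set_cookie_values):
    """Extract cookie names from raw Set-Cookie header values."""
    names = []
    for value in set_cookie_values:
        first_pair = value.split(";", 1)[0]        # drop Path, HttpOnly, etc.
        name, sep, _ = first_pair.partition("=")   # name is left of the first "="
        if sep and name.strip():
            names.append(name.strip())
    return names


print(cookie_names(["__cf_bm=abc123; path=/; HttpOnly", "datadome=xyz; Secure"]))
```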

Interested in automating this detection? The Bot Shield Detector Firefox extension does exactly that — intercepting HTTP signals, scanning DOM-level footprints, and using AI to reason about what it finds. The original write-up on how it was built is here.
