When a file containing a credential is committed, Git writes a blob holding the file's content, a tree object referencing that blob, and a commit object referencing the tree. Git's object model is append-only: those objects are never overwritten. A subsequent commit that removes the file creates new objects alongside the original ones. The credential remains in the history, reachable from every earlier commit and through the reflog, and present in every clone made before any rewrite.
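As a minimal sketch of why the original blob survives, the Python below computes the ID Git assigns to a blob in the default SHA-1 object format: the SHA-1 of a "blob <size>\0" header followed by the raw content. The ID depends only on the content, so a later commit that deletes the file writes a new tree and a new commit but never touches this object. The function name and the sample line are illustrative; repositories using the SHA-256 object format hash differently.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object ID Git assigns to a blob (SHA-1 object format)."""
    # Git hashes a header of the form "blob <size-in-bytes>\0" followed by the raw bytes.
    header = f"blob {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Hypothetical leaked config line, used only for illustration.
    leaked = b"AWS_SECRET_ACCESS_KEY=example-not-a-real-key\n"
    print(git_blob_id(leaked))
```

For the same bytes, printf 'hello world\n' | git hash-object --stdin and git_blob_id(b"hello world\n") agree on 3b18e512dba79e4c8300dd08aeb37f8e728b8dad. Until every reference to such an ID is gone and garbage collection prunes it, git cat-file -p <id> prints the content back in any clone that holds the object.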
This is the mechanical fact that determines everything else about where a scanner should run.
In practice, the path to a committed credential is shorter than most post-incident reviews suggest. A developer testing against a cloud sandbox needs real credentials to get real responses. Those credentials live in environment variables or configuration files, or are hardcoded in the code under test. The .gitignore entry that should exclude the config file was written for a previous directory layout. A git add . stages everything in scope. The commit happens.
Three seconds after the push, the credential is in the CI runner's working copy. Twelve seconds later, it's in the clone the contractor pulls for the code review. By the time a post-push scanner fires and opens an alert, the blast radius is already unknown — which means remediation begins with investigation, not rotation.
Rotating the credential closes the exposure, but it does not close the incident. The next question is always where the credential propagated before rotation, and that question gets harder to answer the longer the scanner waited. The Internet Archive learned this in October 2024: API keys leaked from a GitLab breach were not rotated after the initial compromise was discovered. Two weeks later, attackers used those same keys to access the organisation's Zendesk platform and exfiltrate over 800,000 support tickets. The attackers noted the failure directly in a mass email to affected users:
"It's dispiriting to see that even after being made aware of the breach 2 weeks ago, IA [Internet Archive] has still not done the due diligence of rotating many of the API keys that were exposed."
The rotation that would have been a ten-minute remediation became the evidence in a second breach notification.
Between February and March 2026, a misconfigured GitHub Actions workflow in the Trivy repository exposed a Personal Access Token. Aqua Security detected the breach and rotated credentials, but the rotation was incomplete. Three weeks later, attackers used the credentials that survived to impersonate legitimate maintainers, push a backdoored binary through the official release pipeline, and install an infostealer on the workstations of every developer who pulled the compromised version. By the time the second incident was identified, the question was no longer which credential had been exposed, but how many developer machines had already leaked data to the attackers.
The Signal Problem
Source code contains high-entropy strings for reasons unrelated to credentials: compiled assets, hashes, UUIDs, encoded configuration values, test fixtures. A regular expression matching the structural pattern of an AWS key will also match strings that share that structure but were never credentials. DB_PASSWORD=placeholder and DB_PASSWORD=aK93$mNxQ... look identical at the variable name level.
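A minimal Python sketch of the problem, using a hypothetical two-rule set: one structural pattern (the commonly cited AKIA-prefixed shape of an AWS access key ID) and one name-based pattern for DB_PASSWORD. The structural rule fires on AKIAIOSFODNN7EXAMPLE, the example key that appears in AWS's own documentation, and the name-based rule cannot tell a placeholder from a real value. The rule names and the second password value are illustrative assumptions, not rules taken from any specific scanner.

```python
import re
from typing import List

# Hypothetical rule set for illustration only.
RULES = {
    # Commonly cited shape of an AWS access key ID: "AKIA" + 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Name-based rule: any non-empty value assigned to DB_PASSWORD.
    "db_password_assignment": re.compile(r"\bDB_PASSWORD\s*=\s*\S+"),
}


def scan_line(line: str) -> List[str]:
    """Return the names of every rule whose pattern matches somewhere in the line."""
    return [name for name, pattern in RULES.items() if pattern.search(line)]


if __name__ == "__main__":
    samples = [
        # AWS documentation's example key: the right shape, never a live credential.
        'aws_key = "AKIAIOSFODNN7EXAMPLE"',
        # Both assignments hit the same rule; the pattern alone cannot separate them.
        "DB_PASSWORD=placeholder",
        "DB_PASSWORD=s3cr3t-Value!",  # hypothetical value
    ]
    for line in samples:
        print(f"{line!r} -> {scan_line(line)}")
```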
Broad pattern matching produces a scanner that fires frequently. A team that deploys it accumulates a backlog of dismissed alerts within weeks. The dismissal isn't careless: the developer scanned the list, recognised the noise, and moved on. When an actual credential surfaces in the same output, it gets processed the same way. This failure mode has a name in every other detection context: at the SIEM level it's alert fatigue; at the EDR level it's detection noise. At the commit level the compressed timeline makes it worse, because the developer's decision window is seconds, not hours.
Matching a pattern and then scoring the matched value by its Shannon entropy works well for one category of secret and fails structurally for another. A secret generated by a cryptographically secure pseudorandom number generator (an AWS secret access key, an OAuth token, a JWT signing secret) has an expected entropy approaching the theoretical maximum for its character set. No human-constructed placeholder consistently gets close to that value, so the entropy filter separates the two reliably.
Infrastructure passwords do not behave the same way. admin123 and Tr0ub4dor&3 are both the kind of password that ends up guarding production systems, and both score low enough on Shannon entropy to be discarded by a uniform threshold. The same threshold that keeps noise out of token-class detection keeps genuine credential-class secrets out of the results. That is not because the threshold is miscalibrated; the two populations overlap in the entropy dimension by their nature. No single threshold resolves that: raising it increases false negatives on credential-class secrets, and lowering it floods the output with false positives.
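To make the overlap concrete, this sketch computes the empirical Shannon entropy of a string, H = -sum over characters c of p(c) * log2 p(c), and applies one uniform threshold. The 4.5 bits-per-character cutoff is an assumption chosen for illustration, not a value taken from any particular tool.

```python
import math
import secrets
from collections import Counter

# Illustrative uniform cutoff in bits per character. This is an assumption, not a
# value from any specific scanner; real tools often tune it per character set.
ENTROPY_THRESHOLD = 4.5


def shannon_entropy(value: str) -> float:
    """Empirical Shannon entropy H = -sum(p * log2 p) over the string's characters."""
    if not value:
        return 0.0
    n = len(value)
    return -sum((count / n) * math.log2(count / n) for count in Counter(value).values())


def passes_entropy_gate(value: str, threshold: float = ENTROPY_THRESHOLD) -> bool:
    """True when the value looks random enough to be kept as a finding."""
    return shannon_entropy(value) >= threshold


if __name__ == "__main__":
    samples = {
        "placeholder (noise)": "placeholder",
        "admin123 (real password)": "admin123",
        "Tr0ub4dor&3 (real password)": "Tr0ub4dor&3",
        "CSPRNG token": secrets.token_urlsafe(48),  # 64 URL-safe characters
    }
    for label, value in samples.items():
        print(f"{label:<28} {shannon_entropy(value):.2f} bits/char  kept={passes_entropy_gate(value)}")
```

Here placeholder scores about 3.10 bits per character, admin123 exactly 3.00 and Tr0ub4dor&3 about 3.28, so all three are discarded together, and the real password admin123 actually scores below the placeholder. A 64-character token from secrets.token_urlsafe typically lands above the cutoff.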
The implication for scanner design is that entropy should gate token-class patterns and be absent from credential-class patterns entirely. A password= assignment needs no entropy gate: the signal is the variable name and a non-empty value in a known credential namespace, not the statistical properties of the string that follows.
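One way to encode that split, sketched in Python under stated assumptions: each rule carries an optional minimum entropy, token-class rules set one, and credential-class rules leave it as None so the match alone is reported. Both rules, their regular expressions and the 4.5 threshold are hypothetical examples, not a recommended rule set.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


def shannon_entropy(value: str) -> float:
    """Empirical Shannon entropy of a string, in bits per character."""
    if not value:
        return 0.0
    n = len(value)
    return -sum((c / n) * math.log2(c / n) for c in Counter(value).values())


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: "re.Pattern[str]"
    # Minimum entropy for the captured value; None means no entropy gate at all.
    min_entropy: Optional[float]


RULES = [
    # Token-class (hypothetical): machine-generated values, so gate on entropy.
    Rule("api_token",
         re.compile(r"api_token\s*[=:]\s*['\"]?([A-Za-z0-9_\-]{20,})"),
         min_entropy=4.5),
    # Credential-class (hypothetical): human-chosen passwords, no entropy gate.
    Rule("password_assignment",
         re.compile(r"password\s*[=:]\s*['\"]?([^'\"\s]+)", re.IGNORECASE),
         min_entropy=None),
]


def scan_line(line: str) -> List[Tuple[str, str]]:
    """Return (rule name, captured value) for every rule that reports the line."""
    findings = []
    for rule in RULES:
        match = rule.pattern.search(line)
        if match is None:
            continue
        value = match.group(1)
        if rule.min_entropy is not None and shannon_entropy(value) < rule.min_entropy:
            continue  # token-shaped, but too regular to be CSPRNG output
        findings.append((rule.name, value))
    return findings


if __name__ == "__main__":
    for line in [
        "api_token = 'xxxxxxxxxxxxxxxxxxxxxxxx'",  # dropped by the entropy gate
        "password = 'admin123'",                   # reported: this class has no gate
        "DB_PASSWORD=Tr0ub4dor&3",                 # reported
    ]:
        print(f"{line!r} -> {scan_line(line)}")
```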
The resulting false-positive rate represents something specific operationally. A scanner that rarely raises false alarms trains developers to take its output seriously. When it fires, it is almost always right. That reputation is built over months of consistent signal, and it, rather than the detection logic or the deployment configuration, is the actual mechanism by which the control functions. The scanner works because developers read it. Developers read it because it has earned that.
Where to Intercept
A pre-commit hook runs before the commit object is written — before the credential enters the persistent object graph, before it propagates to any runner or clone. Remediation is local: rotate, restage, recommit. The blast radius at this stage is exactly one developer's working copy.
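A minimal sketch of such a hook in Python, assuming it is saved as .git/hooks/pre-commit and made executable. It reads the staged changes with git diff --cached, scans only added lines, and exits non-zero, which makes Git abort the commit. The single password rule and the messages are hypothetical placeholders for a curated rule set.

```python
#!/usr/bin/env python3
"""Sketch of a pre-commit hook: scan only the lines the commit is about to add."""
import os
import re
import subprocess
import sys
from typing import Iterator, List, Tuple

# Hypothetical single credential-class rule, for illustration only.
PASSWORD_ASSIGNMENT = re.compile(r"password\s*[=:]\s*['\"]?([^'\"\s]+)", re.IGNORECASE)


def added_lines(diff_text: str) -> Iterator[Tuple[str, str]]:
    """Yield (path, added line) pairs from a unified diff produced by git diff."""
    path, in_hunk = None, False
    for raw in diff_text.splitlines():
        if raw.startswith("diff --git "):
            path, in_hunk = None, False  # new file section: headers come next
        elif not in_hunk and raw.startswith("+++ "):
            target = raw[4:]
            # "+++ /dev/null" means the file is being deleted; nothing is added.
            path = target[2:] if target.startswith("b/") else None
        elif raw.startswith("@@"):
            in_hunk = True
        elif in_hunk and path is not None and raw.startswith("+"):
            yield path, raw[1:]


def find_secrets(diff_text: str) -> List[Tuple[str, str]]:
    return [(p, line) for p, line in added_lines(diff_text) if PASSWORD_ASSIGNMENT.search(line)]


def main() -> int:
    # --cached: the staged changes, i.e. exactly what this commit will record.
    # -U0: no context lines, so hunks contain only additions and removals.
    diff = subprocess.run(
        ["git", "diff", "--cached", "-U0", "--no-color"],
        capture_output=True, text=True, check=True,
    ).stdout
    findings = find_secrets(diff)
    for path, line in findings:
        print(f"possible credential in {path}: {line.strip()}", file=sys.stderr)
    if findings:
        print("commit blocked: remove the value, re-stage, and commit again", file=sys.stderr)
        return 1  # any non-zero exit status makes git abort the commit
    return 0


# Run the scan only when executed as the hook file itself, so the functions above
# can also be imported and tested on their own.
if __name__ == "__main__" and os.path.basename(sys.argv[0]) == "pre-commit":
    sys.exit(main())
```

Because the scan sees only the staged diff, the blocked credential never reaches the object database, and git commit --no-verify remains the explicit escape hatch.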
The hook can be bypassed with --no-verify, and where bypass is possible, it will happen. Basak et al.'s industrial study found that developers bypassed tool warnings for both confirmed secrets and false positives; the bypass behaviour was not selective. A scanner that fires with equal urgency on a placeholder in a test fixture and an active token in a CI configuration file trains developers to treat the interrupt as noise. The bypass starts deliberate and becomes reflexive.
This is where exposure context matters as a signal. A credential found in a CI configuration file compromises the pipeline execution environment and every secret the pipeline can reach. The same pattern found in a test fixture carries a contained blast radius and lower operational urgency. Collapsing both into the same binary output — blocked or not blocked — optimises for neither. It generates friction on low-risk findings and fails to communicate the stakes on high-risk ones.
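A sketch of how exposure context could shape the output instead of a binary block: classify the path of each finding and map it to a severity. The path conventions below (GitHub Actions workflows, .gitlab-ci.yml, Jenkinsfile, .circleci/, common test layouts) are hypothetical, non-exhaustive heuristics; a real deployment would tune them to its own repository layout.

```python
from enum import IntEnum
from fnmatch import fnmatchcase


class Severity(IntEnum):
    LOW = 1       # contained blast radius, e.g. a test fixture
    HIGH = 2      # default: application code and configuration
    CRITICAL = 3  # reaches the pipeline and every secret it can access


# Hypothetical, non-exhaustive path conventions; tune them to the repository.
CI_PATTERNS = [".github/workflows/*", ".gitlab-ci.yml", "Jenkinsfile", ".circleci/*"]
TEST_PATTERNS = ["tests/*", "test/*", "fixtures/*", "*/fixtures/*", "*_test.go", "test_*.py"]


def exposure_severity(path: str) -> Severity:
    """Classify a finding by where it sits, not just by what matched."""
    # fnmatchcase's "*" also matches "/", so "tests/*" covers nested directories.
    if any(fnmatchcase(path, p) for p in CI_PATTERNS):
        return Severity.CRITICAL
    filename = path.rsplit("/", 1)[-1]
    if any(fnmatchcase(path, p) or fnmatchcase(filename, p) for p in TEST_PATTERNS):
        return Severity.LOW
    return Severity.HIGH


if __name__ == "__main__":
    for path in [".github/workflows/deploy.yml", "tests/fixtures/users.json", "app/settings.py"]:
        severity = exposure_severity(path)
        action = "block" if severity >= Severity.HIGH else "warn"
        print(f"{path}: {severity.name} -> {action}")
```

A hook built this way could block on CRITICAL and HIGH findings and print a non-blocking warning for LOW ones, so the interrupt itself tells the developer how much is at stake.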
A CI gate at the PR stage catches what slipped past the hook, at the cost of the credential having already been committed and propagated to every runner and reviewer that touched the branch before the alert fired. Post-merge scanning runs later still, against a surface that cleared both prior controls; the developer's context for what happened may be hours old, and the blast radius is unknown until investigation completes.
Running all three is defensible, as long as it is clear that the cost of a catch is not equivalent at each stage. Pre-commit remains the only stage where the catch is genuinely preventive, but only if the scanner earns its interruptions consistently enough that --no-verify stays a deliberate choice rather than a habit.
The scanner that fires at git commit is not doing more of what a CI gate does. It is doing something categorically different.
Placement determines when intervention is possible. Precision determines whether it happens. A scanner positioned at the commit boundary but generating high false-positive rates on credential-class patterns (infrastructure passwords, LDAP bind credentials, IaC secrets that sit below the entropy threshold) produces a bypass rate that compounds over time into a control that exists on paper and fails in practice. The two conditions are not independent. Neither is sufficient without the other.
Further Reading
On the mechanics of secret leakage in public repositories — scale, timing, and primary vectors: Meli, McNiece & Reaves (2019) — How Bad Can It Git? Characterizing Secret Leakage in Public Repositories (NDSS) → https://www.ndss-symposium.org/ndss-paper/how-bad-can-it-git-characterizing-secret-leakage-in-public-repositories/
On developer bypass behaviour and the operational cost of false positives: Basak, Reaves & Williams (2022) — Why Secret Detection Tools Are Not Enough → https://link.springer.com/article/10.1007/s10664-021-10109-y
On the Internet Archive breach sequence — unrotated tokens and second compromise: BleepingComputer (October 2024) — Internet Archive Breached Again Through Stolen Access Tokens → https://www.bleepingcomputer.com/news/security/internet-archive-breached-again-through-stolen-access-tokens/
On the Trivy supply chain compromise — incomplete rotation, credential reuse, and cascading impact: Wiz Research (March 2026) — Trivy Compromised by TeamPCP → https://www.wiz.io/blog/trivy-compromised-teampcp-supply-chain-attack
On Git's object model — the persistence properties referenced in this article: Chacon & Straub — Pro Git, Chapter 10: Git Internals (free) → https://git-scm.com/book/en/v2/Git-Internals-Git-Objects
André "Hadnu" Ataíde is a security analyst with a focus on threat intelligence and application security. He builds open-source security tooling at github.com/had-nu.