Part 2 of 2: How leakfix detects, classifies, and fixes secrets in Git using dual scanners, entropy analysis, and local LLMs, all with zero data leaving your machine.
Reading time: ~15 minutes | Tags: software-architecture, llm, security, python, open-source, developer-tools, ollama
Missed Part 1? Start there: 29 Million Secrets Leaked on GitHub: Detect Secret Leaks in AI-Generated Code (Part 1)
Why Build This?
The agentic AI era has fundamentally changed how code gets written. Tools like Claude Code, GitHub Copilot, and Cursor write entire modules in seconds, including, sometimes, real credentials.
GitGuardian's 2026 report showed that 29 million secrets were leaked on GitHub last year, with AI-assisted development showing roughly double the secret leak rate of manual coding. The existing tooling (Gitleaks, ggshield, GitGuardian) solves detection. Nobody had solved remediation.
leakfix was built to fill that gap. This article is about how: the design decisions, the architecture, the tradeoffs, and why a local LLM with 0.6 billion parameters can outperform a complex rule-based system for false-positive classification.
The Core Design Principles
Before diving into architecture, it's worth naming the principles that shaped every design decision:
- Privacy-first: Your secrets (the very things you're trying to protect) should never leave your machine during analysis
- Accuracy over speed: False negatives (missing real secrets) are catastrophic; false positives (flagging test values) destroy developer trust
- Complete workflow: Detection → Classification → Remediation → Prevention in one tool
- Local-first, cloud-optional: Ollama for local; any OpenAI-compatible API as a fallback
- Build on proven tools: Don't reinvent secret detection; orchestrate Gitleaks and ggshield intelligently
High-Level Architecture (HLD)
┌─────────────────────────────────────────────────────────────────────
│ leakfix CLI v1.9.1  (click + rich + textual + terminaltexteffects)
├─────────────────────────────────────────────────────────────────────
│ TUI LAYER
│
│   wizard_app.py – Textual App (leakfix ui / leakfix setup)
│     ShimmerHeader (animated, full-width, Apple Intelligence style)
│     OptionList · ProgressBar · Buttons · Reactive state
│
│   ui.py – Rich console output: banners · panels · gradient rules
│     3-tier fallback: TTE animated → rich-gradient → Rich shimmer
├─────────────────────────────────────────────────────────────────────
│ SCAN LAYER           │ CLASSIFY LAYER      │ REMEDIATE LAYER
│                      │                     │
│   Scanner            │   Classifier       │   Fixer
│     Gitleaks         │     Heuristics     │     File patching
│     ggshield         │       Entropy      │     git-filter-repo
│     Merge &          │       Patterns     │     History rewrite
│     Deduplicate      │       File type    │
│                      │     LLM Layer      │   Git History
│                      │       Ollama       │     Rewriter
│                      │       OpenAI API   │
├─────────────────────────────────────────────────────────────────────
│ PREVENTION LAYER
│
│   Pre-commit Hook   ·   Guard / Watcher   ·   .gitignore Mgr
├─────────────────────────────────────────────────────────────────────
│ REPORTING LAYER
│
│   Reporter (HTML / JSON / Terminal / Rich tables)
├─────────────────────────────────────────────────────────────────────
│ ORG / ENTERPRISE LAYER
│
│   OrgScanner: GitHub API / GitLab API / Local directory sweep
└─────────────────────────────────────────────────────────────────────

Each layer has a single responsibility and communicates through well-defined data models. Let's go through each one.
Low-Level Design (LLD): Module by Module
1. The Data Model: Finding
Everything starts here. A Finding is the atomic unit of information that flows through the entire system:
@dataclass
class Finding:
    secret_value: str  # The actual leaked value
    file: str          # Relative path from repo root
    line: int          # Line number
    commit: str        # Git commit SHA (for history findings)
    author: str        # Author email
    date: str          # Commit date
    rule_id: str       # e.g., "aws-access-key", "github-pat"
    entropy: float     # Shannon entropy score
    severity: str      # "high" | "medium" | "low"
    scanner: str       # "gitleaks" | "ggshield" | "both"

The scanner field is critical. When it's "both", the classifier can immediately confirm the finding without further analysis: dual-scanner agreement is a strong signal.
2. The Scan Layer: Scanner
Scanner
├── scan_working_directory()   → Scan files on disk
├── scan_staged()              → Scan only staged files (pre-commit)
├── scan_history()             → Scan full git history
├── scan_all()                 → Both of the above combined
│
├── _run_gitleaks()            → Subprocess: gitleaks detect/protect
├── _run_ggshield()            → Subprocess: ggshield secret scan
│
├── _scan_with_both_scanners() → ThreadPoolExecutor (parallel)
├── _merge_scanner_findings()  → Deduplicate + mark "both"
│
└── _filter_ignored()          → Apply .leakfixignore patterns

The Dual-Scanner Parallel Architecture
with ThreadPoolExecutor(max_workers=2) as executor:
    future_gitleaks = executor.submit(gitleaks_func)
    future_ggshield = executor.submit(self._run_ggshield, ggshield_history)

    for future in as_completed([future_gitleaks, future_ggshield]):
        if future is future_gitleaks:
            gitleaks_findings = future.result()
        else:
            ggshield_findings = future.result()

return self._merge_scanner_findings(gitleaks_findings, ggshield_findings)

Both scanners run concurrently, so wall-clock time is max(gitleaks_time, ggshield_time) rather than the sum.
Merging Strategy
The merge function normalizes rule names across scanners (e.g., github-pat vs GitHub Personal Access Token) and deduplicates by (file, line, normalized_rule_id). When both scanners agree, the merged finding is tagged scanner="both" β a high-confidence indicator.
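A minimal sketch of that merge step might look like this (illustrative names; leakfix's actual implementation differs in details such as its rule-normalization tables):

```python
def merge_findings(gitleaks, ggshield, normalize):
    """Deduplicate findings by (file, line, normalized rule) and tag agreement.

    `normalize` maps scanner-specific rule names onto one vocabulary, e.g.
    "GitHub Personal Access Token" -> "github-pat".
    """
    merged = {}
    for finding in gitleaks + ggshield:
        key = (finding["file"], finding["line"], normalize(finding["rule_id"]))
        if key in merged and merged[key]["scanner"] != finding["scanner"]:
            merged[key]["scanner"] = "both"  # dual-scanner agreement
        elif key not in merged:
            merged[key] = dict(finding)
    return list(merged.values())
```

With this shape, a finding reported at the same (file, line) under two differently named but equivalent rules collapses into one entry tagged `scanner="both"`.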
Severity Derivation
Gitleaks doesn't provide severity natively. leakfix derives it:
HIGH_RULES = {
    "generic-api-key", "aws-access-key", "github-pat",
    "slack-token", "private-key", "openssh-private-key", ...
}

def _derive_severity(rule_id: str, entropy: float) -> str:
    if rule_id.lower() in HIGH_RULES or entropy >= 4.5:
        return "high"
    if entropy >= 3.5:
        return "medium"
    return "low"

Shannon entropy is the key signal here. Real secrets generated by services (AWS, GitHub, OpenAI) have high entropy because they're randomly generated. Placeholder values like CHANGEME or your-token-here have low entropy because they contain repeated patterns and readable words.
3. The Classification Layer: Classifier
This is the most sophisticated component. It's a 10-stage decision pipeline that combines fast heuristics with LLM-based contextual analysis:
Finding
   │
   ▼
┌──────────────────────────────────────────────────────────────
│ Stage 0: Dual-scanner agreement?
│ → scanner == "both" → CONFIRMED (skip all other stages)
└──────────────────────────────────────────────────────────────
   │ No
   ▼
┌──────────────────────────────────────────────────────────────
│ Stage 1: File pattern check (O(n) string matching)
│ → .example, .sample, .template, /docs/, /fixtures/
│ → README, CONTRIBUTING, CHANGELOG
│ → Returns: LIKELY_FALSE_POSITIVE
└──────────────────────────────────────────────────────────────
   │ No match
   ▼
┌──────────────────────────────────────────────────────────────
│ Stage 2: Known tool placeholder patterns (regex)
│ → "glpat-your-gitlab-personal-access-token"
│ → "ghp_" + 32 x's (GitHub placeholder)
│ → "sk-" + 32 x's (OpenAI placeholder)
│ → Returns: LIKELY_FALSE_POSITIVE
└──────────────────────────────────────────────────────────────
   │ No match
   ▼
┌──────────────────────────────────────────────────────────────
│ Stage 3: Generic placeholder substrings (case-insensitive)
│ → "your-", "xxx", "placeholder", "changeme", "dummy"
│ → "replace-me", "todo", "insert-here"
│ → Returns: LIKELY_FALSE_POSITIVE
└──────────────────────────────────────────────────────────────
   │ No match
   ▼
┌──────────────────────────────────────────────────────────────
│ Stage 4: Constant name detection
│ → regex: ^[A-Za-z0-9_]+$ with uppercase + underscore
│ → e.g., "SECRET_KEY_LABEL", "API_TOKEN_VALUE"
│ → Returns: LIKELY_FALSE_POSITIVE
└──────────────────────────────────────────────────────────────
   │ No match
   ▼
┌──────────────────────────────────────────────────────────────
│ Stage 4b: Word-like value detection
│ → All alpha+digits, no symbols, entropy < 3.2
│ → e.g., "password", "mysecret", "testvalue"
│ → Returns: LIKELY_FALSE_POSITIVE
└──────────────────────────────────────────────────────────────
   │ No match
   ▼
┌──────────────────────────────────────────────────────────────
│ Stage 5: Shannon entropy threshold
│ → entropy < 3.0 → definitely not a real secret
│ → Real AWS keys: entropy ~3.8–4.5
│ → "changeme123": entropy ~2.7
│ → Returns: LIKELY_FALSE_POSITIVE
└──────────────────────────────────────────────────────────────
   │ No match (entropy ≥ 3.0)
   ▼
┌──────────────────────────────────────────────────────────────
│ Stage 6: LLM with code context (if enabled)
│ → Load ±15 lines around the secret
│ → Load first 5 lines of file (header/docstring context)
│ → Build structured prompt with metadata
│ → Call Ollama or OpenAI-compatible API
│ → Parse REAL/PLACEHOLDER verdict
│ → Returns: CONFIRMED or LIKELY_FALSE_POSITIVE
└──────────────────────────────────────────────────────────────
   │ LLM returned REVIEW_NEEDED or not enabled
   ▼
┌──────────────────────────────────────────────────────────────
│ Stage 7: Comment line context
│ → Is the secret on a comment line? (#, //, /*, etc.)
│ → Returns: REVIEW_NEEDED
└──────────────────────────────────────────────────────────────
   │ No match
   ▼
┌──────────────────────────────────────────────────────────────
│ Stage 8: Test directory check
│ → /test, /spec, /mock, /fixture, /stub in path
│ → Returns: LIKELY_FALSE_POSITIVE
└──────────────────────────────────────────────────────────────
   │ No match
   ▼
┌──────────────────────────────────────────────────────────────
│ Stage 9: Medium entropy without LLM
│ → entropy < 3.5 and LLM not enabled
│ → Returns: REVIEW_NEEDED
└──────────────────────────────────────────────────────────────
   │ No match (entropy ≥ 3.5, all heuristics clear)
   ▼
┌──────────────────────────────────────────────────────────────
│ Stage 10: Default
│ → High entropy + no placeholder patterns found
│ → Returns: CONFIRMED 🚨
└──────────────────────────────────────────────────────────────

Why This Order?
The pipeline is ordered by computational cost (cheapest first) and by confidence (highest-confidence signals first). Stage 0 (dual-scanner agreement) and Stages 1–5 (pure string operations) are O(1) or O(n) checks that run in microseconds. The LLM call (Stage 6) involves a network round-trip to a local server and adds ~100ms per finding. By deferring the LLM to Stage 6, leakfix only invokes it for genuinely ambiguous cases.
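Condensed into code, that ordering looks roughly like this (an illustrative sketch, not leakfix's actual classifier; only a few representative patterns per stage are shown):

```python
def classify(finding, llm=None):
    """Cheap, high-confidence checks first; the LLM only for the gray zone."""
    if finding["scanner"] == "both":
        return "CONFIRMED"                       # Stage 0: dual-scanner agreement
    value = finding["secret_value"].lower()
    path = finding["file"].lower()
    if any(p in path for p in (".example", ".sample", ".template", "/docs/", "/fixtures/")):
        return "LIKELY_FALSE_POSITIVE"           # Stage 1: file pattern check
    if any(p in value for p in ("your-", "xxx", "placeholder", "changeme", "dummy")):
        return "LIKELY_FALSE_POSITIVE"           # Stages 2-3: placeholder patterns
    if finding["entropy"] < 3.0:
        return "LIKELY_FALSE_POSITIVE"           # Stage 5: low entropy
    if llm is not None:
        return llm(finding)                      # Stage 6: contextual LLM verdict
    # Stages 9-10: no LLM available, fall back on entropy alone
    return "CONFIRMED" if finding["entropy"] >= 3.5 else "REVIEW_NEEDED"
```

Each early return corresponds to one of the stages above; everything before the `llm(finding)` call is pure string work.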
Parallel Classification
For large scan results (> 3 findings), classifications run in parallel:
with ThreadPoolExecutor(max_workers=4) as executor:
    future_to_idx = {
        executor.submit(self.classify_finding, f, llm_enabled): i
        for i, f in enumerate(findings)
    }
    for future in as_completed(future_to_idx):
        idx = future_to_idx[future]
        results[idx] = future.result()

With 4 workers and an Ollama response time of ~100ms per finding, classifying 20 findings takes ~500ms instead of ~2000ms.
4. The LLM Integration Layer
This is where leakfix diverges most sharply from existing tools.
The Prompt Architecture
The classification prompt is carefully structured to elicit a consistent, parseable response:
You are a senior security engineer reviewing code for leaked credentials.
## Secret Details
- Detected value: `{secret_value}` (masked to first 7 chars + ***)
- Secret type (gitleaks rule): {rule_id}
- File: {file_path}
- Entropy score: {entropy:.2f} (real secrets typically > 4.5)
## File Header (first 5 lines):
{file_header}
## Code Context (±15 lines around the secret):
{context_lines}
## Classification Rules
[...explicit rules for REAL vs PLACEHOLDER...]
## Decision
Answer with EXACTLY one word on the first line: REAL or PLACEHOLDER
Then: "Reason: <one sentence>"

Three design choices here matter a lot:
- Constrained output format: Asking for REAL or PLACEHOLDER as the first word makes parsing trivial and prevents the model from giving wishy-washy responses.
- Context window management: ±15 lines gives enough context to understand the code's purpose without exceeding the tiny models' context limits.
- Temperature = 0: For classification tasks, we want the most deterministic answer. No creativity needed.
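Given that constrained format, the verdict parser can be a few lines. This is a hypothetical sketch of such a parser (not leakfix's actual code), with anything off-format falling back to human review:

```python
def parse_verdict(response: str) -> str:
    """Map the model's first line onto a classification verdict."""
    lines = response.strip().splitlines()
    first = lines[0].strip().upper() if lines else ""
    if first.startswith("REAL"):
        return "CONFIRMED"
    if first.startswith("PLACEHOLDER"):
        return "LIKELY_FALSE_POSITIVE"
    return "REVIEW_NEEDED"  # off-format output is never trusted silently
```

The safe default matters: a rambling or empty response degrades to REVIEW_NEEDED rather than being misread as a verdict.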
Multi-Provider Support
leakfix abstracts the LLM provider behind a routing function:
Config: llm_provider
   │
   ├─ "ollama"            → Client(host="http://localhost:11434")
   │                        client.chat(model=model, messages=[...], temperature=0)
   │
   └─ "openai_compatible" → urllib.request (no dependencies)
                            POST {base_url}/chat/completions
                            Bearer {api_key}

The OpenAI-compatible path uses Python's built-in urllib.request, so no openai package is required. This means any compatible endpoint works: LM Studio, Jan, LocalAI, Groq, Together AI, or actual OpenAI.
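Here is a sketch of what a dependency-free call like that looks like with urllib.request. The endpoint shape follows the standard /chat/completions convention; the helper name and arguments are assumptions for illustration, not leakfix internals:

```python
import json
import urllib.request

def build_chat_request(base_url, api_key, model, prompt):
    """Build a POST request for any OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,  # deterministic output for classification
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

# Sending is then one line: urllib.request.urlopen(req)
# (omitted here so the sketch stays runnable without a live endpoint).
```

Because the payload and headers are the whole contract, pointing base_url at LM Studio, Groq, or OpenAI requires no code change.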
Why Local LLMs Work for This Task
The key insight that makes local LLMs viable here: secret classification is a low-complexity reasoning task.
You don't need GPT-4-level intelligence to determine that:
# .env.example file
API_KEY = "your-api-key-here" # Replace with your actual keyβ¦is a placeholder. A 0.6B parameter model like qwen3:0.6b handles this correctly in every test. The code context (surrounding lines, file name, comments) provides all the information needed.
This is the fundamental principle of task-model matching: use the smallest model that can reliably solve your specific task. For binary classification with structured context, tiny models excel.
5. The Fix Layer: Fixer
This is what makes leakfix unique. Once a secret is confirmed, the Fixer performs two operations:
Operation 1: File Patching
For secrets in the working directory:
- Load the file and locate the exact line
- Call the LLM with a fix prompt (or use a simple substitution rule)
- Replace the secret value with a safe placeholder (CHANGE_ME, empty string, or REDACTED)
- Write the file back atomically
The fix prompt is designed to be minimally invasive:
You are a security engineer. Return ONLY the safe replacement VALUE.
Rules:
- Make the MINIMAL change possible: only replace the secret value itself
- If SECRET is a default like ${VAR:-secret}, return empty string ""
- If SECRET is a variable like PASSWORD=secret, return empty string ""
- If SECRET is hardcoded like password = "secret", return "CHANGE_ME"
- If SECRET is an API key or token, return "CHANGE_ME"
- When in doubt, return "REDACTED"
Respond: {"replacement": "<value>", "reason": "<why>", "confident": true/false}

The confident flag is important: if the LLM isn't sure its replacement is syntactically correct, it returns "REDACTED" as a safe fallback rather than breaking the code.
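The fallback logic around that confident flag can be sketched as follows (a hypothetical helper; leakfix's real fixer may differ):

```python
import json

def choose_replacement(llm_response: str) -> str:
    """Use the model's suggestion only when it is parseable AND confident."""
    try:
        parsed = json.loads(llm_response)
    except json.JSONDecodeError:
        return "REDACTED"            # unparseable output -> safe fallback
    if not parsed.get("confident", False):
        return "REDACTED"            # model unsure -> don't risk breaking code
    return parsed.get("replacement", "REDACTED")
```

Every failure mode (malformed JSON, missing keys, low confidence) degrades to the same safe placeholder instead of a guessed edit.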
Operation 2: Git History Rewriting
For secrets buried in commit history:
leakfix fix --history
   │
   ▼
Scan history → Find secret SHA + file + value
   │
   ▼
Build git-filter-repo mailmap
   │
   ▼
git-filter-repo --blob-callback "replace(secret_value, replacement)"
   │
   ▼
Force-push rewritten history (with user confirmation)

git-filter-repo is the Git project's own recommended tool for history rewriting: faster and safer than git filter-branch. leakfix generates the appropriate blob callback and coordinates the entire rewrite process.
⚠️ Architecture note: History rewriting changes all commit SHAs. For shared repositories, this requires force-pushing and team coordination. leakfix always prompts for confirmation before any destructive operation.
6. The TUI Layer: Apple Intelligence Design System
In v1.9.0, leakfix gained a full terminal UI, accessible via leakfix ui or leakfix setup. It's built on two modules:
ui.py β Rich Console Components
ui.py owns the stateless output layer: banners, panels, gradient rules, and progress displays. It defines the Apple Intelligence design system used across all CLI output:
# Apple Intelligence color palette
APPLE_PURPLE = "#BC82F3" # Primary brand / selected state
APPLE_BLUE_PURPLE = "#8D9FFF" # Accent / section labels
APPLE_PINK = "#F5B9EA" # Highlight / hover state
APPLE_WHITE = "#F5F5F7" # Primary text
APPLE_DIM = "#6E6E73" # Subtitles / hints
APPLE_SUCCESS = "#30D158" # Confirmations
APPLE_ERROR = "#FF6778" # Errors only
APPLE_DARK_BG = "#1C1C1E" # App background

The print_banner() function has a three-tier fallback:

- terminaltexteffects animated beams (if installed)
- rich-gradient static gradient
- Plain Rich shimmer animation with Live: always works with zero extra deps
wizard_app.py β Textual TUI Application
wizard_app.py is a Textual application (Textual is a reactive, event-driven terminal UI framework). After the v1.9.1 redesign, ShimmerHeader moved from a centered 60%-width panel to a full-width docked header in the style of Claude Code's terminal:
/* Before (v1.9.0): centered, border all sides, 7 rows */
ShimmerHeader {
    width: 60%;
    height: 7;
    border: heavy #BC82F3;
    content-align: center middle;
}

/* After (v1.9.1): full-width, bottom border only, 5 rows */
ShimmerHeader {
    width: 100%;
    height: 5;
    background: #1C1C1E;
    border-bottom: heavy #BC82F3;  /* animates through purple → blue → pink */
    dock: top;
    text-align: left;
}

The bottom border cycles through GLOW_GRADIENT = [APPLE_PURPLE, APPLE_BLUE_PURPLE, APPLE_PINK, APPLE_VIOLET] at 20 fps via set_interval(1/20, self._tick).
The leakfix ui Command
leakfix ui      # open the TUI wizard directly
leakfix setup   # same TUI, focused on LLM configuration

Both launch LeakfixWizardApp, a Textual App[dict | None] that returns a config dict on exit. The wizard guides through dependency checks, model selection (qwen3:0.6b / llama3.2:3b / phi4), and download progress via async subprocess streaming. The Textual reactive system means all UI updates, including the animated border, are scheduled on the event loop without blocking the setup logic.
7. The Prevention Layer
Pre-Commit Hook
#!/bin/bash
# leakfix pre-commit hook
leakfix scan --staged --hook-mode [--smart]
exit_code=$?
if [ $exit_code -eq 1 ]; then
    echo "❌ Commit blocked: secrets detected in staged files"
    echo "Run: leakfix fix to remove them"
    exit 1
fi
exit 0

The --hook-mode flag enables a compact output format optimized for terminal display during commit. The --smart flag activates LLM filtering in the hook, reducing false positives that would otherwise block legitimate commits.
Guard Mode (File Watcher)
Watcher
 │
 ├── Uses: watchdog (cross-platform file system events)
 │
 ├── Watches for patterns:
 │     .env*, *.pem, *.key, *secret*, *credential*,
 │     *token*, *.p12, *.pfx, firebase*.json, *adminsdk*
 │
 ├── On CREATE or MODIFY:
 │     → Alert immediately
 │     → Log to ~/.leakfix/guard.log
 │     → (Optional) auto-add to .gitignore
 │
 └── Daemon mode: ~/.leakfix/guard.pid for process management

Guard mode catches the problem before staging, as soon as a dangerous file appears on disk.
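The pattern list above maps directly onto stdlib fnmatch globbing. A minimal sketch of the filename check such a watcher needs (the helper name is hypothetical, the patterns are the documented ones):

```python
from fnmatch import fnmatch

# Patterns from the watcher description above
WATCH_PATTERNS = [
    ".env*", "*.pem", "*.key", "*secret*", "*credential*",
    "*token*", "*.p12", "*.pfx", "firebase*.json", "*adminsdk*",
]

def is_sensitive(filename: str) -> bool:
    """True if the filename matches any dangerous-file glob."""
    return any(fnmatch(filename, pattern) for pattern in WATCH_PATTERNS)
```

A real watcher would call a check like this from its on-created/on-modified event handlers before alerting or logging.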
8. The Org Scanner
For enterprise-scale scanning:
OrgScanner
 │
 ├── scan_directory(path)
 │     → Find all .git repos under path (os.walk)
 │     → ThreadPoolExecutor(max_workers=parallel)
 │     → Clone each repo to temp dir, scan, aggregate
 │
 ├── scan_github(org, token)
 │     → GitHub REST API: /orgs/{org}/repos (paginated)
 │     → Filter by language, archived status, fork status
 │     → Clone each → scan → delete temp clone
 │
 └── scan_gitlab(group, token, url)
       → GitLab REST API: /groups/{group}/projects (paginated)
       → Same clone-scan-delete pattern

All clones are done to a temporary directory and cleaned up after scanning: no permanent storage of org code.
Data Flow: End-to-End
User: leakfix scan --smart
   │
   ▼
┌──────────────────────────────────────────────────────────────
│ CLI Layer (cli.py)
│  - Parse args
│  - Check LLM setup
│  - Instantiate Scanner, Classifier, Reporter
└──────────────────────────────────────────────────────────────
   │
   ▼
┌──────────────────────────────────────────────────────────────
│ Scanner.scan_working_directory()
│  ├── ThreadPoolExecutor: gitleaks + ggshield in parallel
│  ├── Parse JSON output from each
│  ├── Normalize file paths (relative to repo root)
│  ├── Merge findings: mark "both" where both scanners agree
│  ├── Deduplicate by (file, line)
│  └── Filter .leakfixignore patterns
└──────────────────────────────────────────────────────────────
   │ List[Finding]
   ▼
┌──────────────────────────────────────────────────────────────
│ Classifier.classify_findings(findings, llm_enabled=True)
│  ├── ThreadPoolExecutor: 4 workers for parallel classification
│  ├── For each Finding:
│  │     Run 10-stage pipeline
│  │     Stage 6: Ollama(qwen3:0.6b) with code context
│  └── Return List[ClassifiedFinding]
└──────────────────────────────────────────────────────────────
   │ List[ClassifiedFinding]
   ▼
┌──────────────────────────────────────────────────────────────
│ Reporter._format_smart_scan()
│  ├── Rich table: severity | file:line | type | scanner | cls
│  ├── Color coding: red=CONFIRMED, yellow=REVIEW, green=FP
│  └── Summary: X confirmed, Y FP filtered, Z need review
└──────────────────────────────────────────────────────────────
   │
   ▼
User sees: prioritized, classified findings

The Shannon Entropy Deep Dive
Entropy is the mathematical backbone of secret detection. Here's why it works:
Shannon entropy measures the randomness of a string:
H = -Σ p(c) × log₂(p(c))   for each unique character c

Where p(c) is the probability of character c appearing.
String                                       Entropy   Notes
──────────────────────────────────────────────────────────────────────────
"aaaaaaaaa"                                   0.0      No randomness
"changeme"                                    2.81     Human-readable word
"your-api-key"                                3.17     Placeholder text
"sk-abc123def456"                             3.64     Borderline
"AKIAIOSFODNN7EXAMPLE"                        3.82     AWS key format
"wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"    4.21     Real-looking AWS secret
"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."     4.58     JWT token

leakfix uses a threshold of 3.0 for the "definitely not a secret" cutoff, and 3.5 for the "probably needs LLM review" zone. The LLM then handles everything in that 3.0–4.5 gray zone with full code context.
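The formula above is only a few lines of Python. This sketch matches the definition used in the table (computed values may differ from the table by a few hundredths depending on rounding):

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """H = -sum(p(c) * log2(p(c))) over the unique characters c of s."""
    if not s:
        return 0.0
    length = len(s)
    counts = Counter(s)  # occurrences of each unique character
    return -sum((n / length) * math.log2(n / length) for n in counts.values())
```

A single repeated character yields 0.0, readable words land well under 3.0, and service-generated keys land comfortably above the review threshold.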
Why This Architecture Wins
vs. Pure Regex (Gitleaks standalone)
Gitleaks catches 90%+ of real secrets but also generates significant false positives on template files, documentation, and test fixtures. Without a classification layer, every finding looks equally urgent. Alert fatigue sets in.
vs. Cloud Classification (GitGuardian)
GitGuardian's secret validation sends findings to GitGuardian's cloud for analysis. For a tool designed to catch secret leaks, this is an uncomfortable irony: your secrets transit a third party's infrastructure during the detection process.
With leakfix + Ollama, the loop is completely closed: secrets are detected locally, analyzed locally, and fixed locally.
vs. Rule-Based Classification Alone
Pure heuristics (pattern matching + entropy) work well for obvious cases but fail on:
- Secrets with medium entropy (3.0–4.0) that are real but only moderately random
- Template files with realistic-looking placeholder values
- Config files mixing real and example values on adjacent lines
The LLM's understanding of code context bridges this gap. It reads the surrounding code like a developer would: "this is an .env.example file, the comments say # copy to .env and fill in, the other values are all placeholders, so this one is too."
Configuration System
leakfix stores config in ~/.leakfix/config.json:
{
  "llm_enabled": true,
  "llm_provider": "ollama",
  "llm_model": "qwen3:0.6b",
  "llm_base_url": "http://localhost:11434",
  "llm_api_key": "",
  "setup_complete": true
}

The setup_wizard.py handles:
- Dependency checking (gitleaks, git-filter-repo, ollama CLI, ollama Python package)
- Model selection and download via ollama pull
- Validation that the selected model can respond
Config is loaded at startup and cached, so there are no repeated disk reads during classification.
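One standard-library way to get that "load once, then cache" behavior is functools.lru_cache. This is a sketch with assumed defaults, not leakfix's actual loader; the path is the documented ~/.leakfix/config.json location:

```python
import json
from functools import lru_cache
from pathlib import Path

@lru_cache(maxsize=1)
def load_config(path: str = "~/.leakfix/config.json") -> dict:
    """Read the config file once; later calls return the cached dict."""
    config_file = Path(path).expanduser()
    if not config_file.exists():
        # Safe defaults when setup has never been run
        return {"llm_enabled": False, "setup_complete": False}
    return json.loads(config_file.read_text())
```

Repeated calls during a classification run hit the cache, so the disk is touched at most once per process.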
Extending leakfix
Adding Custom Rules
leakfix uses an extended Gitleaks config (gitleaks-extended.toml) alongside the default rules. To add custom rules for your organization:
# .gitleaks.toml in your repo root (overrides the built-in config)
[[rules]]
id = "my-company-internal-token"
description = "Company internal service token"
regex = '''myco-[0-9a-f]{32}'''
keywords = ["myco-"]

Adding Custom Ignore Patterns
Create .leakfixignore in your repo root:
# Ignore test fixtures
tests/fixtures/
tests/data/
# Ignore example configs
*.example
*.sample
config/

Performance Characteristics
On a typical mid-sized repository (~50,000 lines, 500 files):
Mode                            Time         Notes
────────────────────────────────────────────────────────────────
scan (no LLM)                   ~2–5s        Gitleaks only
scan (dual scanner)             ~3–8s        Parallel gitleaks + ggshield
scan --smart with qwen3:0.6b    ~5–15s       + LLM for uncertain findings
scan --history                  ~15–60s      Depends on commit count
fix (working dir)               ~5–30s       Depends on finding count
fix --history                   ~30s–10min   Depends on repository size

The LLM overhead (Stage 6) is proportional to the number of findings that survive the heuristics: typically 20–40% of raw findings in a codebase with test fixtures and documentation.
Lessons Learned
1. Tiny models are surprisingly capable for classification tasks
qwen3:0.6b correctly classifies 95%+ of test cases when given good code context. The structured prompt (constrained output, explicit rules, binary decision) is the key. General-purpose tasks need big models; focused classification tasks don't.
2. Parallelism matters more than you'd expect
Scanning a repository involves many subprocess calls (git, gitleaks, ggshield). Threading with ThreadPoolExecutor reduced wall-clock time by 40–60% versus sequential execution in tests.
3. The false positive problem is the hardest part
Getting accurate detection is mostly solved (gitleaks is excellent). Getting accurate classification, i.e. distinguishing a real AWS key from your-aws-key-here, is the hard part. The 10-stage pipeline with LLM fallback is the result of extensive iteration.
4. History rewriting requires serious UX care
git-filter-repo is powerful and destructive. Getting the confirmation flows right, the backup reminders, and the force-push warnings took as much effort as the underlying implementation.
What's Next
The architecture has room to grow:
- VSCode extension: Inline secret warnings as you type, powered by the same classifier
- CI/CD integration: GitHub Actions / GitLab CI native integration with PR comments
- Secret validity checking: Not just detection, but verifying if a found AWS key actually has active permissions
- Custom model fine-tuning: A leakfix-specific model trained on millions of real/fake secret examples
Already shipped in v1.9.1:

- ✅ Apple Intelligence TUI (leakfix ui): full-width animated header, Claude Code terminal aesthetic, Textual reactive architecture
- ✅ ui.py design system: shared Rich components, gradient banners, 3-tier animation fallback
Contribute
leakfix is MIT-licensed and actively welcoming contributions. The codebase is clean Python 3.11+ with no complex dependencies:
leakfix/
cli.py # CLI entrypoint (click); all commands including `ui`
scanner.py # Dual-scanner coordination (gitleaks + ggshield)
classifier.py # 10-stage classification pipeline
fixer.py # File patching + history rewriting
hooks.py # Pre-commit hook management
watcher.py # Guard mode file watcher
org_scanner.py # GitHub/GitLab/directory org scanning
reporter.py # Output formatting (rich, HTML, JSON)
setup_wizard.py # Interactive setup + dependency check
gitignore.py # .gitignore management
ui.py # Apple Intelligence design system (Rich banners, panels, shimmer)
wizard_app.py # Textual TUI β animated header, LLM setup wizard
utils.py # Shared utilities

📦 GitHub: princebharti/gitleakfix
📦 PyPI: leakfix
Summary
leakfix's architecture is built on five core innovations:
- Dual-scanner parallelism: Gitleaks + ggshield run concurrently; agreement between scanners means high-confidence confirmation
- 10-stage classification pipeline: Fast heuristics first, LLM last; only invoke the expensive path when needed
- Local-first LLM with code context: 0.6B-parameter models can classify secrets accurately when given structured context and a constrained output format; secrets never leave the machine
- Integrated remediation: The only open-source tool that closes the loop from detection to fix, including git history rewriting
- Apple Intelligence TUI: A Textual reactive UI with animated shimmer headers, full-width Claude Code aesthetic, and a 3-tier graceful fallback design system in ui.py
The combination makes leakfix not just another detection tool, but a complete secret management CLI for the era of agentic AI development.
Want to try it right now?
brew tap princebharti/tap
brew install gitleakfix
leakfix ui # interactive TUI wizard
leakfix scan --smart # or jump straight to scanning

Have questions about the architecture? Open an issue or start a discussion on GitHub.