Part 2 of 2 β€” How leakfix detects, classifies, and fixes secrets in Git using dual scanners, entropy analysis, and local LLMs β€” all with zero data leaving your machine.

Reading time: ~15 minutes | Tags: software-architecture, llm, security, python, open-source, developer-tools, ollama

Missed Part 1? Start there β†’ 29 Million Secrets Leaked on GitHub: Detect Secret Leaks in AI-Generated Code (Part 1)

Why Build This?

The agentic AI era has fundamentally changed how code gets written. Tools like Claude Code, GitHub Copilot, and Cursor write entire modules in seconds β€” including, sometimes, real credentials.

GitGuardian's 2026 report showed that 29 million secrets were leaked on GitHub last year, with AI-assisted development showing roughly double the secret leak rate of manual coding. The existing tooling β€” Gitleaks, ggshield, GitGuardian β€” solves detection. Nobody had solved remediation.

leakfix was built to fill that gap. This article is about how β€” the design decisions, the architecture, the tradeoffs, and why a local LLM with 0.6 billion parameters can outperform a complex rule-based system for false positive classification.

The Core Design Principles

Before diving into architecture, it's worth naming the principles that shaped every design decision:

  1. Privacy-first: Your secrets (the very things you're trying to protect) should never leave your machine during analysis
  2. Accuracy over speed: False negatives (missing real secrets) are catastrophic; false positives (flagging test values) destroy developer trust
  3. Complete workflow: Detection β†’ Classification β†’ Remediation β†’ Prevention in one tool
  4. Local-first, cloud-optional: Ollama for local; any OpenAI-compatible API as a fallback
  5. Build on proven tools: Don't reinvent secret detection β€” orchestrate Gitleaks and ggshield intelligently

High-Level Architecture (HLD)

╔══════════════════════════════════════════════════════════════════════════╗
β•‘                       leakfix CLI  v1.9.1                                β•‘
β•‘              (click + rich + textual + terminaltexteffects)              β•‘
╠═══════════════════════════════════════════════════════════════════════════╣
β•‘                           TUI LAYER                                       β•‘
β•‘                                                                           β•‘
β•‘   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β•‘
β•‘   β”‚  wizard_app.py β€” Textual App (leakfix ui / leakfix setup)        β”‚   β•‘
β•‘   β”‚    ShimmerHeader (animated, full-width, Apple Intelligence style) β”‚   β•‘
β•‘   β”‚    OptionList Β· ProgressBar Β· Buttons Β· Reactive state           β”‚   β•‘
β•‘   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β•‘
β•‘   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β•‘
β•‘   β”‚  ui.py β€” Rich console output: banners Β· panels Β· gradient rules  β”‚   β•‘
β•‘   β”‚    3-tier fallback: TTE animated β†’ rich-gradient β†’ Rich shimmer  β”‚   β•‘
β•‘   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β•‘
╠══════════════════════╦═══════════════════════╦════════════════════════════╣
β•‘    SCAN LAYER        β•‘   CLASSIFY LAYER       β•‘    REMEDIATE LAYER        β•‘
β•‘                      β•‘                        β•‘                           β•‘
β•‘  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β•‘  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β•‘  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β•‘
β•‘  β”‚  Scanner     β”‚    β•‘  β”‚   Classifier     β”‚  β•‘  β”‚      Fixer         β”‚  β•‘
β•‘  β”‚              β”‚    β•‘  β”‚                  β”‚  β•‘  β”‚                    β”‚  β•‘
β•‘  β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚    β•‘  β”‚ Heuristics       β”‚  β•‘  β”‚ File patching      β”‚  β•‘
β•‘  β”‚ β”‚ Gitleaks β”‚ β”‚    β•‘  β”‚   β”œβ”€ Entropy     β”‚  β•‘  β”‚ git-filter-repo    β”‚  β•‘
β•‘  β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚    β•‘  β”‚   β”œβ”€ Patterns    β”‚  β•‘  β”‚ History rewrite    β”‚  β•‘
β•‘  β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚    β•‘  β”‚   └─ File type   β”‚  β•‘  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β•‘
β•‘  β”‚ β”‚ ggshield β”‚ β”‚    β•‘  β”‚                  β”‚  β•‘                           β•‘
β•‘  β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚    β•‘  β”‚ LLM Layer        β”‚  β•‘  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β•‘
β•‘  β”‚      ↓       β”‚    β•‘  β”‚   β”œβ”€ Ollama      β”‚  β•‘  β”‚  Git History       β”‚  β•‘
β•‘  β”‚   Merge &    β”‚    β•‘  β”‚   └─ OpenAI API  β”‚  β•‘  β”‚  Rewriter          β”‚  β•‘
β•‘  β”‚  Deduplicate β”‚    β•‘  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β•‘  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β•‘
β•‘  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β•‘                        β•‘                           β•‘
╠══════════════════════╩═══════════════════════╩════════════════════════════╣
β•‘                         PREVENTION LAYER                                  β•‘
β•‘                                                                            β•‘
β•‘   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β•‘
β•‘   β”‚  Pre-commit Hook β”‚   β”‚   Guard / Watcher  β”‚   β”‚ .gitignore Mgr   β”‚   β•‘
β•‘   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β•‘
╠════════════════════════════════════════════════════════════════════════════╣
β•‘                          REPORTING LAYER                                   β•‘
β•‘                                                                            β•‘
β•‘           Reporter (HTML / JSON / Terminal / Rich tables)                  β•‘
╠════════════════════════════════════════════════════════════════════════════╣
β•‘                    ORG / ENTERPRISE LAYER                                  β•‘
β•‘                                                                            β•‘
β•‘      OrgScanner: GitHub API / GitLab API / Local directory sweep          β•‘
β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•

Each layer has a single responsibility and communicates through well-defined data models. Let's go through each one.

Low-Level Design (LLD): Module by Module

1. The Data Model: Finding

Everything starts here. A Finding is the atomic unit of information that flows through the entire system:

from dataclasses import dataclass

@dataclass
class Finding:
    secret_value: str    # The actual leaked value
    file: str            # Relative path from repo root
    line: int            # Line number
    commit: str          # Git commit SHA (for history findings)
    author: str          # Author email
    date: str            # Commit date
    rule_id: str         # e.g., "aws-access-key", "github-pat"
    entropy: float       # Shannon entropy score
    severity: str        # "high" | "medium" | "low"
    scanner: str         # "gitleaks" | "ggshield" | "both"

The scanner field is critical. When it's "both", the classifier can immediately confirm the finding without further analysis β€” dual scanner agreement is a strong signal.

2. The Scan Layer: Scanner

Scanner
  β”œβ”€β”€ scan_working_directory()     β†’ Scan files on disk
  β”œβ”€β”€ scan_staged()                β†’ Scan only staged files (pre-commit)
  β”œβ”€β”€ scan_history()               β†’ Scan full git history
  β”œβ”€β”€ scan_all()                   β†’ Both above combined
  β”‚
  β”œβ”€β”€ _run_gitleaks()              β†’ Subprocess: gitleaks detect/protect
  β”œβ”€β”€ _run_ggshield()              β†’ Subprocess: ggshield secret scan
  β”‚
  β”œβ”€β”€ _scan_with_both_scanners()   β†’ ThreadPoolExecutor (parallel)
  β”œβ”€β”€ _merge_scanner_findings()    β†’ Deduplicate + mark "both"
  β”‚
  └── _filter_ignored()           β†’ Apply .leakfixignore patterns

The Dual-Scanner Parallel Architecture

# gitleaks_func is a zero-arg closure wrapping _run_gitleaks in the right mode
with ThreadPoolExecutor(max_workers=2) as executor:
    future_gitleaks = executor.submit(gitleaks_func)
    future_ggshield = executor.submit(self._run_ggshield, ggshield_history)
    for future in as_completed([future_gitleaks, future_ggshield]):
        if future is future_gitleaks:
            gitleaks_findings = future.result()
        else:
            ggshield_findings = future.result()
return self._merge_scanner_findings(gitleaks_findings, ggshield_findings)

Both scanners run concurrently. The wall-clock time is max(gitleaks_time, ggshield_time) rather than sum.

Merging Strategy

The merge function normalizes rule names across scanners (e.g., github-pat vs GitHub Personal Access Token) and deduplicates by (file, line, normalized_rule_id). When both scanners agree, the merged finding is tagged scanner="both" β€” a high-confidence indicator.
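The merge can be sketched as a keyed dictionary pass. The alias table and field names below are illustrative, not leakfix's actual internals:

```python
# Hypothetical alias table; leakfix's real normalization map is larger.
RULE_ALIASES = {
    "github personal access token": "github-pat",
    "aws access key id": "aws-access-key",
}

def normalize_rule(rule_id: str) -> str:
    key = rule_id.strip().lower()
    return RULE_ALIASES.get(key, key)

def merge_findings(gitleaks: list[dict], ggshield: list[dict]) -> list[dict]:
    """Deduplicate by (file, line, normalized rule); tag agreements 'both'."""
    merged = {}
    for f in gitleaks:
        key = (f["file"], f["line"], normalize_rule(f["rule_id"]))
        merged[key] = dict(f, scanner="gitleaks")
    for f in ggshield:
        key = (f["file"], f["line"], normalize_rule(f["rule_id"]))
        if key in merged:
            merged[key]["scanner"] = "both"  # dual-scanner agreement
        else:
            merged[key] = dict(f, scanner="ggshield")
    return list(merged.values())
```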

Severity Derivation

Gitleaks doesn't provide severity natively. leakfix derives it:

HIGH_RULES = {
    "generic-api-key", "aws-access-key", "github-pat",
    "slack-token", "private-key", "openssh-private-key", ...
}
def _derive_severity(rule_id: str, entropy: float) -> str:
    if rule_id.lower() in HIGH_RULES or entropy >= 4.5:
        return "high"
    if entropy >= 3.5:
        return "medium"
    return "low"

Shannon entropy is the key signal here. Real secrets generated by services (AWS, GitHub, OpenAI) have high entropy because they're randomly generated. Placeholder values like CHANGEME or your-token-here have low entropy because they contain repeated patterns and readable words.

3. The Classification Layer: Classifier

This is the most sophisticated component. It's a 10-stage decision pipeline that combines fast heuristics with LLM-based contextual analysis:

Finding
  β”‚
  β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Stage 0: Dual-scanner agreement?                                β”‚
β”‚    β†’ scanner == "both"  β†’  CONFIRMED (skip all other stages)   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
  β”‚ No
  β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Stage 1: File pattern check (O(n) string matching)             β”‚
β”‚    β†’ .example, .sample, .template, /docs/, /fixtures/          β”‚
β”‚    β†’ README, CONTRIBUTING, CHANGELOG                           β”‚
β”‚    β†’ Returns: LIKELY_FALSE_POSITIVE                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
  β”‚ No match
  β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Stage 2: Known tool placeholder patterns (regex)               β”‚
β”‚    β†’ "glpat-your-gitlab-personal-access-token"                 β”‚
β”‚    β†’ "ghp_" + 32 x's (GitHub placeholder)                     β”‚
β”‚    β†’ "sk-" + 32 x's (OpenAI placeholder)                      β”‚
β”‚    β†’ Returns: LIKELY_FALSE_POSITIVE                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
  β”‚ No match
  β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Stage 3: Generic placeholder substrings (case-insensitive)     β”‚
β”‚    β†’ "your-", "xxx", "placeholder", "changeme", "dummy"        β”‚
β”‚    β†’ "replace-me", "todo", "insert-here"                       β”‚
β”‚    β†’ Returns: LIKELY_FALSE_POSITIVE                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
  β”‚ No match
  β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Stage 4: Constant name detection                               β”‚
β”‚    β†’ regex: ^[A-Za-z0-9_]+$ with uppercase + underscore       β”‚
β”‚    β†’ e.g., "SECRET_KEY_LABEL", "API_TOKEN_VALUE"               β”‚
β”‚    β†’ Returns: LIKELY_FALSE_POSITIVE                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
  β”‚ No match
  β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Stage 4b: Word-like value detection                            β”‚
β”‚    β†’ All alpha+digits, no symbols, entropy < 3.2               β”‚
β”‚    β†’ e.g., "password", "mysecret", "testvalue"                 β”‚
β”‚    β†’ Returns: LIKELY_FALSE_POSITIVE                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
  β”‚ No match
  β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Stage 5: Shannon entropy threshold                             β”‚
β”‚    β†’ entropy < 3.0 β†’ definitely not a real secret              β”‚
β”‚    β†’ Real AWS keys: entropy ~3.8-4.5                           β”‚
β”‚    β†’ "changeme123": entropy ~2.7                               β”‚
β”‚    β†’ Returns: LIKELY_FALSE_POSITIVE                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
  β”‚ No match (entropy β‰₯ 3.0)
  β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Stage 6: LLM with code context (if enabled)                    β”‚
β”‚    β†’ Load Β±15 lines around the secret                          β”‚
β”‚    β†’ Load first 5 lines of file (header/docstring context)     β”‚
β”‚    β†’ Build structured prompt with metadata                     β”‚
β”‚    β†’ Call Ollama or OpenAI-compatible API                      β”‚
β”‚    β†’ Parse REAL/PLACEHOLDER verdict                            β”‚
β”‚    β†’ Returns: CONFIRMED or LIKELY_FALSE_POSITIVE               β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
  β”‚ LLM returned REVIEW_NEEDED or not enabled
  β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Stage 7: Comment line context                                  β”‚
β”‚    β†’ Is the secret on a comment line? (#, //, /*, etc.)        β”‚
β”‚    β†’ Returns: REVIEW_NEEDED                                     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
  β”‚ No match
  β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Stage 8: Test directory check                                  β”‚
β”‚    β†’ /test, /spec, /mock, /fixture, /stub in path              β”‚
β”‚    β†’ Returns: LIKELY_FALSE_POSITIVE                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
  β”‚ No match
  β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Stage 9: Medium entropy without LLM                            β”‚
β”‚    β†’ entropy < 3.5 and LLM not enabled                        β”‚
β”‚    β†’ Returns: REVIEW_NEEDED                                     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
  β”‚ No match (entropy β‰₯ 3.5, all heuristics clear)
  β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Stage 10: Default                                              β”‚
β”‚    β†’ High entropy + no placeholder patterns found              β”‚
β”‚    β†’ Returns: CONFIRMED 🚨                                      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Why This Order?

The pipeline is ordered by computational cost (cheapest first) and confidence (highest-confidence signals first). Stage 0 (dual-scanner agreement) and Stages 1–5 (pure string and entropy checks) are O(1) or O(n) operations that run in microseconds. The LLM call (Stage 6) involves a network round-trip to a local server and adds ~100 ms per finding. By deferring the LLM to Stage 6, leakfix only invokes it for genuinely ambiguous cases.
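The ordering idea can be sketched as a list of cheap predicate functions, each returning a verdict or deferring to the next. Stage names and thresholds follow the article; the function and field names are illustrative, and only a representative subset of stages is shown:

```python
# Three representative cheap stages; each returns a verdict or None to defer.
def stage_file_pattern(finding: dict):
    markers = (".example", ".sample", ".template", "/docs/", "/fixtures/")
    if any(m in finding["file"].lower() for m in markers):
        return "LIKELY_FALSE_POSITIVE"
    return None

def stage_placeholder_substring(finding: dict):
    markers = ("your-", "xxx", "placeholder", "changeme", "dummy")
    if any(m in finding["secret_value"].lower() for m in markers):
        return "LIKELY_FALSE_POSITIVE"
    return None

def stage_low_entropy(finding: dict):
    if finding["entropy"] < 3.0:
        return "LIKELY_FALSE_POSITIVE"
    return None

PIPELINE = [stage_file_pattern, stage_placeholder_substring, stage_low_entropy]

def classify(finding: dict) -> str:
    if finding.get("scanner") == "both":  # Stage 0: dual-scanner agreement
        return "CONFIRMED"
    for stage in PIPELINE:                # cheapest checks first
        verdict = stage(finding)
        if verdict is not None:
            return verdict
    return "CONFIRMED"                    # default: high entropy, no placeholder signs
```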

Parallel Classification

For large scan results (> 3 findings), classifications run in parallel:

with ThreadPoolExecutor(max_workers=4) as executor:
    future_to_idx = {
        executor.submit(self.classify_finding, f, llm_enabled): i
        for i, f in enumerate(findings)
    }
    for future in as_completed(future_to_idx):
        idx = future_to_idx[future]
        results[idx] = future.result()

With 4 workers and an Ollama response time of ~100ms per finding, classifying 20 findings takes ~500ms instead of ~2000ms.

4. The LLM Integration Layer

This is where leakfix diverges most sharply from existing tools.

The Prompt Architecture

The classification prompt is carefully structured to elicit a consistent, parseable response:

You are a senior security engineer reviewing code for leaked credentials.
## Secret Details
- Detected value: `{secret_value}`  (masked to first 7 chars + ***)
- Secret type (gitleaks rule): {rule_id}
- File: {file_path}
- Entropy score: {entropy:.2f}  (real secrets typically > 4.5)
## File Header (first 5 lines):
{file_header}
## Code Context (Β±15 lines around the secret):
{context_lines}
## Classification Rules
[...explicit rules for REAL vs PLACEHOLDER...]
## Decision
Answer with EXACTLY one word on the first line: REAL or PLACEHOLDER
Then: "Reason: <one sentence>"

Three design choices here matter a lot:

  1. Constrained output format: Asking for REAL or PLACEHOLDER as the first word makes parsing trivial and prevents the model from giving wishy-washy responses.
  2. Context window management: Β±15 lines gives enough context to understand the code's purpose without exceeding the tiny models' context limits.
  3. Temperature = 0: For classification tasks, we want the most deterministic answer. No creativity needed.
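Under that constrained format, verdict parsing reduces to inspecting the first word. A minimal sketch (the function name is ours, not leakfix's):

```python
def parse_verdict(response_text: str) -> str:
    """Map the model's constrained first line onto a classification.

    Anything other than an exact REAL or PLACEHOLDER falls through to
    REVIEW_NEEDED instead of guessing.
    """
    stripped = response_text.strip()
    first_line = stripped.splitlines()[0].strip().upper() if stripped else ""
    first_word = first_line.split()[0] if first_line else ""
    if first_word == "REAL":
        return "CONFIRMED"
    if first_word == "PLACEHOLDER":
        return "LIKELY_FALSE_POSITIVE"
    return "REVIEW_NEEDED"
```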

Multi-Provider Support

leakfix abstracts the LLM provider behind a routing function:

Config: llm_provider
  β”‚
  β”œβ”€ "ollama"            β†’ Client(host="http://localhost:11434")
  β”‚                         client.chat(model=model, messages=[...], temperature=0)
  β”‚
  └─ "openai_compatible" β†’ urllib.request (no dependencies)
                           POST {base_url}/chat/completions
                           Bearer {api_key}

The OpenAI-compatible path uses Python's built-in urllib.request β€” no openai package required. This means any compatible endpoint works: LM Studio, Jan, LocalAI, Groq, Together AI, or actual OpenAI.
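A stdlib-only request for such an endpoint takes only a few lines. This is a sketch of the approach, not leakfix's actual code:

```python
import json
import urllib.request

def build_chat_request(base_url: str, api_key: str, model: str, prompt: str):
    """Build a stdlib-only request for any OpenAI-compatible endpoint.

    Send the returned object with urllib.request.urlopen(); the URL and
    payload shape follow the standard chat-completions convention.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,  # deterministic classification
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```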

Why Local LLMs Work for This Task

The key insight that makes local LLMs viable here: secret classification is a low-complexity reasoning task.

You don't need GPT-4-level intelligence to determine that:

# .env.example file
API_KEY = "your-api-key-here"  # Replace with your actual key

…is a placeholder. A 0.6B parameter model like qwen3:0.6b handles this correctly in every test. The code context (surrounding lines, file name, comments) provides all the information needed.

This is the fundamental principle of task-model matching: use the smallest model that can reliably solve your specific task. For binary classification with structured context, tiny models excel.

5. The Fix Layer: Fixer

This is what makes leakfix unique. Once a secret is confirmed, the Fixer performs two operations:

Operation 1: File Patching

For secrets in the working directory:

  1. Load the file and locate the exact line
  2. Call the LLM with a fix prompt (or use a simple substitution rule)
  3. Replace the secret value with a safe placeholder (CHANGE_ME, empty string, or REDACTED)
  4. Write the file back atomically
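Step 4 deserves a concrete sketch: write the patched content to a temporary file in the same directory, then swap it over the original with os.replace(), so an interrupted write can never leave a half-patched file. The function name and signature here are illustrative, not leakfix's actual API:

```python
import os
import tempfile

def patch_line_atomically(path: str, line_no: int, old: str, new: str) -> None:
    """Replace a secret on one line, then swap the file into place atomically."""
    with open(path, "r", encoding="utf-8") as f:
        lines = f.readlines()
    lines[line_no - 1] = lines[line_no - 1].replace(old, new)
    # Temp file in the same directory keeps os.replace() on one filesystem,
    # which is what makes the final swap atomic.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.writelines(lines)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```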

The fix prompt is designed to be minimally invasive:

You are a security engineer. Return ONLY the safe replacement VALUE.
Rules:
- Make the MINIMAL change possible β€” only replace the secret value itself
- If SECRET is a default like ${VAR:-secret}, return empty string ""
- If SECRET is a variable like PASSWORD=secret, return empty string ""
- If SECRET is hardcoded like password = "secret", return "CHANGE_ME"
- If SECRET is an API key or token, return "CHANGE_ME"
- When in doubt, return "REDACTED"
Respond: {"replacement": "<value>", "reason": "<why>", "confident": true/false}

The confident flag is important: if the LLM isn't sure its replacement is syntactically correct, it returns "REDACTED" as a safe fallback rather than breaking the code.
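Parsing that JSON with the safe fallback might look like this sketch (the exact field handling in leakfix may differ):

```python
import json

def safe_replacement(llm_response: str) -> str:
    """Extract the replacement value, defaulting to REDACTED on any doubt."""
    try:
        data = json.loads(llm_response)
    except (json.JSONDecodeError, TypeError):
        return "REDACTED"           # malformed output: never trust it
    if not data.get("confident", False):
        return "REDACTED"           # the model itself is unsure
    replacement = data.get("replacement")
    return "REDACTED" if replacement is None else replacement
```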

Operation 2: Git History Rewriting

For secrets buried in commit history:

leakfix fix --history
     β”‚
     β–Ό
Scan history β†’ Find secret SHA + file + value
     β”‚
     β–Ό
Build git-filter-repo mailmap
     β”‚
     β–Ό
git-filter-repo --blob-callback "replace(secret_value, replacement)"
     β”‚
     β–Ό
Force-push rewritten history (with user confirmation)

git-filter-repo is the Git project's own recommended tool for history rewriting β€” faster and safer than git filter-branch. leakfix generates the appropriate blob callback and coordinates the entire rewrite process.

⚠️ Architecture note: History rewriting changes all commit SHAs. For shared repositories, this requires force-pushing and team coordination. leakfix always prompts for confirmation before any destructive operation.
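The blob-callback flow above is one way in; another common route is git-filter-repo's --replace-text expressions file, where each `literal:<old>==><new>` line rewrites that byte sequence in every blob across history. This sketch builds the expressions file contents and command line; the helper names are ours, and leakfix's actual invocation may differ:

```python
def build_replace_expressions(secrets: dict[str, str]) -> str:
    """Build the expressions file consumed by git-filter-repo --replace-text."""
    return "\n".join(
        f"literal:{secret}==>{replacement}"
        for secret, replacement in secrets.items()
    ) + "\n"

def build_filter_repo_command(expressions_path: str) -> list[str]:
    # --force is required when operating on a repo that is not a fresh clone.
    return ["git", "filter-repo", "--replace-text", expressions_path, "--force"]
```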

6. The TUI Layer: Apple Intelligence Design System

In v1.9.0, leakfix gained a full terminal UI β€” accessible via leakfix ui or leakfix setup. It's built on two modules:

ui.py β€” Rich Console Components

ui.py owns the stateless output layer: banners, panels, gradient rules, and progress displays. It defines the Apple Intelligence design system used across all CLI output:

# Apple Intelligence color palette
APPLE_PURPLE      = "#BC82F3"   # Primary brand / selected state
APPLE_BLUE_PURPLE = "#8D9FFF"   # Accent / section labels
APPLE_PINK        = "#F5B9EA"   # Highlight / hover state
APPLE_WHITE       = "#F5F5F7"   # Primary text
APPLE_DIM         = "#6E6E73"   # Subtitles / hints
APPLE_SUCCESS     = "#30D158"   # Confirmations
APPLE_ERROR       = "#FF6778"   # Errors only
APPLE_DARK_BG     = "#1C1C1E"   # App background

The print_banner() function has a three-tier fallback:

  1. terminaltexteffects animated beams (if installed)
  2. rich-gradient static gradient
  3. Plain Rich shimmer animation with Live β€” always works with zero extra deps

wizard_app.py β€” Textual TUI Application

wizard_app.py is a Textual application β€” a reactive, event-driven terminal UI framework. After the v1.9.1 redesign, ShimmerHeader moved from a centered 60%-width panel to a full-width docked header in the style of Claude Code's terminal:

/* Before (v1.9.0): centered, border all sides, 7 rows */
ShimmerHeader {
    width: 60%;
    height: 7;
    border: heavy #BC82F3;
    content-align: center middle;
}
/* After (v1.9.1): full-width, bottom border only, 5 rows */
ShimmerHeader {
    width: 100%;
    height: 5;
    background: #1C1C1E;
    border-bottom: heavy #BC82F3;  /* animates through purple→blue→pink */
    dock: top;
    text-align: left;
}

The bottom border cycles through GLOW_GRADIENT = [APPLE_PURPLE, APPLE_BLUE_PURPLE, APPLE_PINK, APPLE_VIOLET] at 20 fps via set_interval(1/20, self._tick).

The leakfix ui Command

leakfix ui          # open the TUI wizard directly
leakfix setup       # same TUI, focused on LLM configuration

Both launch LeakfixWizardApp, a Textual App[dict | None] that returns a config dict on exit. The wizard guides through dependency checks, model selection (qwen3:0.6b / llama3.2:3b / phi4), and download progress via async subprocess streaming. The Textual reactive system means all UI updates β€” including the animated border β€” are scheduled on the event loop without blocking the setup logic.

7. The Prevention Layer

Pre-Commit Hook

#!/bin/bash
# leakfix pre-commit hook
leakfix scan --staged --hook-mode [--smart]
exit_code=$?
if [ $exit_code -eq 1 ]; then
    echo "❌ Commit blocked: secrets detected in staged files"
    echo "Run: leakfix fix    to remove them"
    exit 1
fi
exit 0

The --hook-mode flag enables a compact output format optimized for terminal display during commit. The --smart flag activates LLM filtering in the hook, reducing false positives that would otherwise block legitimate commits.

Guard Mode (File Watcher)

Watcher
  β”‚
  β”œβ”€β”€ Uses: watchdog (cross-platform file system events)
  β”‚
  β”œβ”€β”€ Watches for patterns:
  β”‚     .env*, *.pem, *.key, *secret*, *credential*,
  β”‚     *token*, *.p12, *.pfx, firebase*.json, *adminsdk*
  β”‚
  β”œβ”€β”€ On CREATE or MODIFY:
  β”‚     β†’ Alert immediately
  β”‚     β†’ Log to ~/.leakfix/guard.log
  β”‚     β†’ (Optional) auto-add to .gitignore
  β”‚
  └── Daemon mode: ~/.leakfix/guard.pid for process management

Guard mode catches the problem before staging β€” as soon as a dangerous file appears on disk.
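The real watcher rides on watchdog's filesystem events, but the core decision — does this path look dangerous? — is plain pattern matching. A stdlib sketch using fnmatch (the helper name is ours):

```python
import fnmatch
import os

# The watch patterns listed above, matched against the bare filename.
SENSITIVE_PATTERNS = [
    ".env*", "*.pem", "*.key", "*secret*", "*credential*",
    "*token*", "*.p12", "*.pfx", "firebase*.json", "*adminsdk*",
]

def is_sensitive(path: str) -> bool:
    """Decide whether a created/modified file should trigger a guard alert."""
    name = os.path.basename(path).lower()
    return any(fnmatch.fnmatch(name, pattern) for pattern in SENSITIVE_PATTERNS)
```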

8. The Org Scanner

For enterprise-scale scanning:

OrgScanner
  β”‚
  β”œβ”€β”€ scan_directory(path)
  β”‚     β†’ Find all .git repos under path (os.walk)
  β”‚     β†’ ThreadPoolExecutor(max_workers=parallel)
  β”‚     β†’ Clone each repo to temp dir, scan, aggregate
  β”‚
  β”œβ”€β”€ scan_github(org, token)
  β”‚     β†’ GitHub REST API: /orgs/{org}/repos (paginated)
  β”‚     β†’ Filter by language, archived status, fork status
  β”‚     β†’ Clone each β†’ scan β†’ delete temp clone
  β”‚
  └── scan_gitlab(group, token, url)
        β†’ GitLab REST API: /groups/{group}/projects (paginated)
        β†’ Same clone-scan-delete pattern

All clones are done to a temporary directory and cleaned up after scanning β€” no permanent storage of org code.
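The GitHub path boils down to paginated REST calls plus a per-repo filter. A sketch of the two pure pieces (the function names and the type=all default are illustrative; the endpoint path is GitHub's documented REST API):

```python
def org_repos_url(org: str, page: int, per_page: int = 100) -> str:
    """One page of GitHub's 'list organization repositories' endpoint.

    Callers loop, incrementing page, until a response holds fewer than
    per_page items.
    """
    return (
        f"https://api.github.com/orgs/{org}/repos"
        f"?per_page={per_page}&page={page}&type=all"
    )

def should_scan(repo: dict, include_forks: bool = False,
                include_archived: bool = False) -> bool:
    """Apply the fork/archived filters described above to one repo record."""
    if repo.get("archived") and not include_archived:
        return False
    if repo.get("fork") and not include_forks:
        return False
    return True
```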

Data Flow: End-to-End

User: leakfix scan --smart
         β”‚
         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ CLI Layer (cli.py)                                             β”‚
β”‚   - Parse args                                                  β”‚
β”‚   - Check LLM setup                                            β”‚
β”‚   - Instantiate Scanner, Classifier, Reporter                  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚
         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Scanner.scan_working_directory()                               β”‚
β”‚   β”œβ”€β”€ ThreadPoolExecutor: gitleaks + ggshield in parallel     β”‚
β”‚   β”œβ”€β”€ Parse JSON output from each                              β”‚
β”‚   β”œβ”€β”€ Normalize file paths (relative to repo root)            β”‚
β”‚   β”œβ”€β”€ Merge findings: mark "both" where both scanners agree   β”‚
β”‚   β”œβ”€β”€ Deduplicate by (file, line)                             β”‚
β”‚   └── Filter .leakfixignore patterns                          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚  List[Finding]
         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Classifier.classify_findings(findings, llm_enabled=True)       β”‚
β”‚   β”œβ”€β”€ ThreadPoolExecutor: 4 workers for parallel classificationβ”‚
β”‚   β”œβ”€β”€ For each Finding:                                        β”‚
β”‚   β”‚     Run 10-stage pipeline                                  β”‚
β”‚   β”‚     Stage 6: Ollama(qwen3:0.6b) with code context        β”‚
β”‚   └── Return List[ClassifiedFinding]                          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚  List[ClassifiedFinding]
         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Reporter._format_smart_scan()                                    β”‚
β”‚   β”œβ”€β”€ Rich table: severity | file:line | type | scanner | cls    β”‚
β”‚   β”œβ”€β”€ Color coding: red=CONFIRMED, yellow=REVIEW, green=FP       β”‚
β”‚   └── Summary: X confirmed, Y FP filtered, Z need review         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚
         β–Ό
User sees: prioritized, classified findings
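In code, the scanner stage amounts to two subprocess calls fanned out through a thread pool plus a merge keyed on (file, line). A hedged sketch assuming simplified finding dicts with file/line keys β€” the scanner CLI flags and the merge shape are illustrative, not leakfix's actual implementation:

```python
import json
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_scanner(cmd: list[str]) -> list[dict]:
    """Run one scanner CLI and parse its JSON findings; [] on any failure."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=300)
        return json.loads(proc.stdout) if proc.stdout.strip() else []
    except (OSError, subprocess.TimeoutExpired, json.JSONDecodeError):
        return []

def merge_findings(gitleaks: list[dict], ggshield: list[dict]) -> list[dict]:
    """Deduplicate by (file, line); mark 'both' where the scanners agree."""
    merged: dict = {}
    for source, findings in (("gitleaks", gitleaks), ("ggshield", ggshield)):
        for f in findings:
            key = (f["file"], f["line"])
            if key in merged:
                merged[key]["scanner"] = "both"   # high-confidence agreement
            else:
                merged[key] = {**f, "scanner": source}
    return list(merged.values())

# Fan out both scanners concurrently (flags shown are illustrative).
with ThreadPoolExecutor(max_workers=2) as pool:
    gl = pool.submit(run_scanner, ["gitleaks", "detect", "--report-format", "json"])
    gg = pool.submit(run_scanner, ["ggshield", "secret", "scan", "path", "."])
    findings = merge_findings(gl.result(), gg.result())
```

The merge is where dual scanning pays off: a finding reported by both tools carries far more signal than either alone.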

The Shannon Entropy Deep Dive

Entropy is the mathematical backbone of secret detection. Here's why it works:

Shannon entropy measures the randomness of a string:

H = -Ξ£ p(c) Γ— logβ‚‚(p(c))
    for each unique character c

Where p(c) is the probability of character c appearing.

String                                       Entropy    Notes
─────────────────────────────────────────────────────────────────────────
"aaaaaaaaa"                                    0.00     No randomness
"changeme"                                     2.75     Human-readable word
"your-api-key"                                 3.25     Placeholder text
"AKIAIOSFODNN7EXAMPLE"                         3.47     AWS access key ID format
"sk-abc123def456"                              3.91     Borderline
"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."     4.58     JWT token
"wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"    4.66     Real-looking AWS secret

leakfix uses a threshold of 3.0 as the "definitely not a secret" cutoff and 3.5 as the start of the "probably needs LLM review" zone. The LLM then handles everything in the 3.0–4.5 gray zone with full code context.

Why This Architecture Wins

vs. Pure Regex (Gitleaks standalone)

Gitleaks catches 90%+ of real secrets but also generates significant false positives on template files, documentation, and test fixtures. Without a classification layer, every finding looks equally urgent. Alert fatigue sets in.

vs. Cloud Classification (GitGuardian)

GitGuardian's secret validation sends findings to GitGuardian's cloud for analysis. For a tool designed to catch secret leaks, this is an uncomfortable irony β€” your secrets transit a third party's infrastructure during the detection process.

With leakfix + Ollama, the loop is completely closed: secrets are detected locally, analyzed locally, and fixed locally.

vs. Rule-Based Classification Alone

Pure heuristics (pattern matching + entropy) work well for obvious cases but fail on:

  • Secrets with medium entropy (3.0–4.0) that are real but moderately random
  • Template files with realistic-looking placeholder values
  • Config files mixing real and example values on adjacent lines

The LLM's understanding of code context bridges this gap. It reads the surrounding code like a developer would: "this is an .env.example file, the comments say # copy to .env and fill in, the other values are all placeholders β€” so this one is too."
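The split between cheap heuristics and the LLM can be made concrete. This is a hypothetical sketch of the heuristic layer only β€” the patterns, return labels, and "uncertain" escalation are illustrative, not leakfix's actual 10-stage pipeline:

```python
import re

# Patterns and labels below are hypothetical, for illustration only.
PLACEHOLDER = re.compile(
    r"changeme|your[-_]?\w*[-_]?(key|token|secret)|example|xxxx", re.IGNORECASE
)
TEMPLATE_SUFFIXES = (".example", ".sample", ".template")

def heuristic_verdict(value: str, path: str) -> str:
    """Cheap, deterministic checks first; 'uncertain' escalates to the LLM."""
    if path.endswith(TEMPLATE_SUFFIXES) or "/fixtures/" in path:
        return "false-positive"   # template or test-fixture location
    if PLACEHOLDER.search(value):
        return "false-positive"   # human-readable placeholder text
    if value.startswith(("AKIA", "ghp_", "sk-")) and len(value) >= 20:
        return "confirmed"        # known provider prefix + plausible length
    return "uncertain"            # gray zone -> ask the LLM with code context
```

Anything the deterministic stages cannot settle falls through to the LLM stage, which sees the surrounding lines rather than just the flagged value.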

Configuration System

leakfix stores config in ~/.leakfix/config.json:

{
  "llm_enabled": true,
  "llm_provider": "ollama",
  "llm_model": "qwen3:0.6b",
  "llm_base_url": "http://localhost:11434",
  "llm_api_key": "",
  "setup_complete": true
}

The setup wizard (setup_wizard.py) handles:

  1. Dependency checking (gitleaks, git-filter-repo, ollama CLI, ollama Python package)
  2. Model selection and download via ollama pull
  3. Validation that the selected model can respond

Config is loaded at startup and cached β€” no repeated disk reads during classification.
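The load-once-and-cache behavior comes almost for free with functools.lru_cache. A sketch assuming the defaults shown above; leakfix's actual loader may differ:

```python
import json
from functools import lru_cache
from pathlib import Path

# Defaults mirror the config example above; merged under any on-disk values.
DEFAULTS = {
    "llm_enabled": True,
    "llm_provider": "ollama",
    "llm_model": "qwen3:0.6b",
    "llm_base_url": "http://localhost:11434",
}

@lru_cache(maxsize=1)
def load_config() -> dict:
    """Read ~/.leakfix/config.json once; every later call hits the cache."""
    path = Path.home() / ".leakfix" / "config.json"
    try:
        return {**DEFAULTS, **json.loads(path.read_text())}
    except (FileNotFoundError, json.JSONDecodeError):
        return dict(DEFAULTS)
```

With maxsize=1 and no arguments, repeated calls during classification return the same cached dict without touching the disk again.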

Extending leakfix

Adding Custom Rules

leakfix uses an extended Gitleaks config (gitleaks-extended.toml) alongside the default rules. To add custom rules for your organization:

# .gitleaks.toml in your repo root (overrides the built-in config)
[[rules]]
id = "my-company-internal-token"
description = "Company internal service token"
regex = '''myco-[0-9a-f]{32}'''
keywords = ["myco-"]

Adding Custom Ignore Patterns

Create .leakfixignore in your repo root:

# Ignore test fixtures
tests/fixtures/
tests/data/
# Ignore example configs
*.example
*.sample
config/
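A sketch of how such patterns might be applied to findings. The matching semantics here (fnmatch globs plus prefix matching for trailing-slash directory patterns) approximate rather than reproduce leakfix's behavior:

```python
from fnmatch import fnmatch
from pathlib import Path

def load_ignore_patterns(repo_root: str) -> list[str]:
    """Read .leakfixignore, skipping blank lines and comments."""
    path = Path(repo_root) / ".leakfixignore"
    if not path.exists():
        return []
    lines = path.read_text().splitlines()
    return [ln.strip() for ln in lines if ln.strip() and not ln.startswith("#")]

def is_ignored(file_path: str, patterns: list[str]) -> bool:
    """True if a finding's path matches any .leakfixignore-style pattern."""
    for pat in patterns:
        if pat.endswith("/") and file_path.startswith(pat):
            return True   # directory pattern: anything under it is ignored
        if fnmatch(file_path, pat) or fnmatch(Path(file_path).name, pat):
            return True   # glob pattern against full path or basename
    return False
```

Findings whose paths match are dropped before classification, so test fixtures never even reach the pipeline.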

Performance Characteristics

On a typical mid-sized repository (~50,000 lines, 500 files):

Mode                              Time          Notes
────────────────────────────────────────────────────────────────
scan (no LLM)                     ~2–5s         Gitleaks only
scan (dual scanner)               ~3–8s         Parallel gitleaks + ggshield
scan --smart with qwen3:0.6b      ~5–15s        + LLM for uncertain findings
scan --history                    ~15–60s       Depends on commit count
fix (working dir)                 ~5–30s        Depends on finding count
fix --history                     ~30s–10min    Depends on repository size

The LLM overhead (Stage 6) is proportional to the number of findings that survive heuristics β€” typically 20–40% of raw findings in a codebase with test fixtures and documentation.

Lessons Learned

1. Tiny models are surprisingly capable for classification tasks

qwen3:0.6b correctly classifies 95%+ of test cases when given good code context. The structured prompt (constrained output, explicit rules, binary decision) is the key. General-purpose tasks need big models; focused classification tasks don't.
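The structured-prompt idea looks roughly like this. The prompt wording, the qwen3:0.6b call through the ollama Python package, and the one-word answer contract are a hypothetical reconstruction, not leakfix's actual prompt:

```python
def build_prompt(path: str, context: str, value: str) -> str:
    """Constrained, binary-decision prompt (illustrative shape only)."""
    return (
        "You are a secret-classification assistant.\n"
        "Decide if the flagged value is a REAL credential or a FALSE_POSITIVE.\n"
        "Rules:\n"
        "- Placeholder, example, and test values are FALSE_POSITIVE.\n"
        "- Reply with exactly one word: REAL or FALSE_POSITIVE.\n\n"
        f"File: {path}\nContext:\n{context}\n\nFlagged value: {value}\nAnswer:"
    )

def classify_with_ollama(path: str, context: str, value: str) -> str:
    import ollama  # pip install ollama; needs a local Ollama server running
    resp = ollama.chat(
        model="qwen3:0.6b",
        messages=[{"role": "user", "content": build_prompt(path, context, value)}],
        options={"temperature": 0},   # deterministic, constrained output
    )
    return resp["message"]["content"].strip()
```

Constraining the model to a single token of output is what makes a 0.6B model reliable here: there is no room for it to ramble, only to decide.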

2. Parallelism matters more than you'd expect

Scanning a repository involves many subprocess calls (git, gitleaks, ggshield). Threading with ThreadPoolExecutor reduced wall-clock time by 40–60% versus sequential execution in tests.

3. The false positive problem is the hardest part

Getting accurate detection is mostly solved (gitleaks is excellent). Getting accurate classification β€” distinguishing a real AWS key from your-aws-key-here β€” is the hard part. The 10-stage pipeline with LLM fallback is the result of extensive iteration.

4. History rewriting requires serious UX care

git-filter-repo is powerful and destructive. Getting the confirmation flows right, the backup reminders, and the force-push warnings took as much effort as the underlying implementation.

What's Next

The architecture has room to grow:

  • VSCode extension: Inline secret warnings as you type, powered by the same classifier
  • CI/CD integration: GitHub Actions / GitLab CI native integration with PR comments
  • Secret validity checking: Not just detection, but verifying if a found AWS key actually has active permissions
  • Custom model fine-tuning: A leakfix-specific model trained on millions of real/fake secret examples

Already shipped in v1.9.1:

  • βœ… Apple Intelligence TUI (leakfix ui) β€” full-width animated header, Claude Code terminal aesthetic, Textual reactive architecture
  • βœ… ui.py design system β€” shared Rich components, gradient banners, 3-tier animation fallback

Contribute

leakfix is MIT-licensed and actively welcoming contributions. The codebase is clean Python 3.11+ with no complex dependencies:

leakfix/
  cli.py          # CLI entrypoint (click) β€” all commands including `ui`
  scanner.py      # Dual-scanner coordination (gitleaks + ggshield)
  classifier.py   # 10-stage classification pipeline
  fixer.py        # File patching + history rewriting
  hooks.py        # Pre-commit hook management
  watcher.py      # Guard mode file watcher
  org_scanner.py  # GitHub/GitLab/directory org scanning
  reporter.py     # Output formatting (rich, HTML, JSON)
  setup_wizard.py # Interactive setup + dependency check
  gitignore.py    # .gitignore management
  ui.py           # Apple Intelligence design system (Rich banners, panels, shimmer)
  wizard_app.py   # Textual TUI β€” animated header, LLM setup wizard
  utils.py        # Shared utilities

πŸ“¦ GitHub: princebharti/gitleakfix
πŸ“¦ PyPI: leakfix

Summary

leakfix's architecture is built on five core innovations:

  1. Dual-scanner parallelism β€” Gitleaks + ggshield run concurrently; agreement between scanners means high-confidence confirmation
  2. 10-stage classification pipeline β€” Fast heuristics first, LLM last; only invoke the expensive path when needed
  3. Local-first LLM with code context β€” 0.6B parameter models can classify secrets accurately when given structured context and constrained output format; secrets never leave the machine
  4. Integrated remediation β€” The only open-source tool that closes the loop from detection to fix, including git history rewriting
  5. Apple Intelligence TUI β€” A Textual reactive UI with animated shimmer headers, full-width Claude Code aesthetic, and a 3-tier graceful fallback design system in ui.py

The combination makes leakfix not just another detection tool, but a complete secret management CLI for the era of agentic AI development.

Want to try it right now?

brew tap princebharti/tap
brew install gitleakfix
leakfix ui           # interactive TUI wizard
leakfix scan --smart # or jump straight to scanning

Have questions about the architecture? Open an issue or start a discussion on GitHub.
