This architecture is designed to be entirely self-contained, utilizing an S3-compatible object storage backend (MinIO) as the state machine for a sequential vulnerability scanning pipeline.
The Core Concept: The "Inbox" Model
The pipeline is driven by "Inboxes" (buckets). Each stage of the pipeline reads a JSON ticket from its input inbox, appends findings or verdicts, and moves the updated ticket to the next stage.
- Logic: If no findings are detected after all scanners, the ticket is discarded.
- Confidence: High-confidence findings move to the sandbox for exploitation proof.
- Triage: Uncertain or low-confidence findings move to a "Potato" queue for manual human review.
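The three routing outcomes above can be sketched as a small verdict function. The confidence thresholds and the ticket field names (`findings`, `confidence`) are illustrative assumptions; the article does not pin down the exact JSON schema.

```python
# Hypothetical thresholds -- tune these against your own false-positive rate.
HIGH_CONFIDENCE = 0.8
LOW_CONFIDENCE = 0.4

def route_ticket(ticket: dict) -> str:
    """Decide the next inbox for a ticket after all scanners have run."""
    findings = ticket.get("findings", [])
    if not findings:
        return "inbox-trash"        # zero findings: discard
    confidence = max(f.get("confidence", 0.0) for f in findings)
    if confidence >= HIGH_CONFIDENCE:
        return "inbox-jul"          # sandbox for exploitation proof
    if confidence >= LOW_CONFIDENCE:
        return "inbox-pot"          # uncertain: manual "Potato" review
    return "inbox-trash"

ticket = {"repo": "https://example.com/repo.git",
          "findings": [{"id": "CVE-2024-0001", "confidence": 0.9}]}
print(route_ticket(ticket))  # inbox-jul
```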
The Sequential Pipeline
1. inbox-prime (The Entry Point)
- Purpose: Starting point for new tickets.
- Trigger: A Watcher service detects a new repository or a file change.
- Content: Basic ticket JSON (repo URL, commit SHA, etc.).
- Next Stage: Trivy scanning.
2. inbox-tri (Vulnerability & Misconfig)
- Process: Initial scan for vulnerabilities and misconfigurations.
- Next Stage: Deep secret scanning.
3. inbox-lea (Deep Secret Discovery)
- Process: Deep Git history scanning for leaked credentials.
- Sysadmin Note: Verified secrets are appended under leak_findings.
- Next Stage: SAST pattern probing.
4. inbox-pat (Static Analysis & Custom Rules)
- Process: Pattern matching for specific code flaws (C++, CUDA, etc.) using custom SAST rules.
- Next Stage: Dependency analysis.
5. inbox-dep (Dependency CVEs)
- Process: Precise scanning for dependency-level CVEs via OSV-Scanner.
- Next Stage: Automated LLM verdict.
6. inbox-jul (The "Juliet" Pre-Sandbox Queue)
- Purpose: High-confidence findings verified by a local LLM "ladder."
- Process: Ready for exploit confirmation.
- Next Stage: Exploit sandbox.
7. inbox-lim (The "Lima" Proven Gold)
- Purpose: Exploit confirmed via automated sandbox (crash/RCE/leak proof).
- Result: Final artifacts (logs/dumps) are ready for CVE submission.
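The stage-to-stage hand-off can be sketched as a worker loop that checks a ticket out of one inbox, runs its scanner, and writes the updated ticket to the next inbox. In-memory dicts stand in for the MinIO buckets here; a real worker would use an S3 client (e.g. the `minio` Python SDK). The stage names follow the article, but `run_scanner` is a placeholder for the real tools (Trivy, TruffleHog, OSV-Scanner, ...).

```python
import json

# In-memory stand-ins for the MinIO buckets.
buckets = {name: {} for name in
           ["inbox-prime", "inbox-tri", "inbox-lea", "inbox-pat",
            "inbox-dep", "inbox-jul", "inbox-lim"]}

NEXT_STAGE = {"inbox-prime": "inbox-tri", "inbox-tri": "inbox-lea",
              "inbox-lea": "inbox-pat", "inbox-pat": "inbox-dep",
              "inbox-dep": "inbox-jul", "inbox-jul": "inbox-lim"}

def run_scanner(stage: str, ticket: dict) -> dict:
    """Placeholder for the real scanner at this stage."""
    ticket.setdefault("history", []).append(stage)
    return ticket

def process_inbox(stage: str) -> None:
    """Read each ticket, append findings, move it to the next inbox."""
    for key in list(buckets[stage]):
        ticket = json.loads(buckets[stage].pop(key))
        ticket = run_scanner(stage, ticket)
        buckets[NEXT_STAGE[stage]][key] = json.dumps(ticket)

# Seed a new ticket and push it through the whole pipeline.
buckets["inbox-prime"]["ticket-001.json"] = json.dumps(
    {"repo": "https://example.com/repo.git", "commit": "abc123"})
for stage in NEXT_STAGE:
    process_inbox(stage)
print(json.loads(buckets["inbox-lim"]["ticket-001.json"])["history"])
```

Because each worker only ever pops from one bucket and pushes to the next, the ticket itself is the only state, which is what makes the scan workers stateless and restartable.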
Triage and Discard Paths
inbox-pot (The "Potato" Queue)
For findings where the automated verdict is uncertain or low-confidence. These require manual intervention to review the ticket JSON and decide whether to move it to the sandbox or the trash.
inbox-trash
Final discard for zero-finding tickets or rejected "potatoes."
Operations & Monitoring
Visualizing the Flow
Monitoring is handled via a centralized dashboard (e.g., Grafana). Key metrics to track:
- Potato Queue: Fire a high-priority alert when more than 5 tickets have been stale for over 24 hours.
- Confidence Histogram: Visualizing verdict accuracy to tune LLM thresholds.
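The stale-ticket alert can be sketched as a simple count over ticket timestamps. The `created` field and the exporter wiring into Grafana are assumptions; only the ">5 tickets, 24 hours" rule comes from the metrics above.

```python
import time

STALE_SECONDS = 24 * 3600   # 24-hour staleness window
ALERT_THRESHOLD = 5         # alert when more than 5 tickets are stale

def stale_potato_count(tickets, now):
    """Count tickets that have sat in inbox-pot longer than 24 hours."""
    return sum(1 for t in tickets if now - t["created"] > STALE_SECONDS)

now = time.time()
tickets = [{"id": i, "created": now - 25 * 3600} for i in range(6)]
if stale_potato_count(tickets, now) > ALERT_THRESHOLD:
    print("ALERT: Potato queue is backing up")
```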
Maintenance Tips
- Logs: Monitor consumer services on each node using journalctl.
- Emergency Stop: All workers can be halted by stopping the consumer service across the cluster.
Field Report: Rebuilding and Results
Every homelabber knows the "Hardware Problem Monster." While the physical infrastructure is currently undergoing a complete rebuild, the data from the initial run has proven the concept's value.
The Filter in Action: 70 down to 16
The primary goal of this pipeline was to reduce "alert fatigue" for the sysadmin (the Potato Queue). In the most recent production run, the four-scanner battery flagged over 70 potential vulnerabilities.
In a traditional setup, that's 70 tickets requiring manual triage. However, once the LLM "Ladder" (Ollama running CPU-optimized LLMs) processed those findings:
- Initial Scans: 70+ Potential CVEs.
- LLM Filtered: 16 High-Confidence Targets.
- Noise Reduction: ~77%
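The noise-reduction figure follows directly from the two counts:

```python
# Re-deriving the noise-reduction figure from the run above.
initial, filtered = 70, 16
reduction = (initial - filtered) / initial
print(f"Noise reduction: {reduction:.0%}")  # Noise reduction: 77%
```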
By using the LLM layer as a "Verdict Vault," the pipeline successfully discarded dozens of false positives and low-priority misconfigurations before a human ever had to look at a screen.
Lessons from the Rebuild
Rebuilding the lab from the ground up has allowed for a few optimizations:
- Compute Decoupling: Ensuring that the scan workers (Trivy, TruffleHog, etc.) are entirely stateless. If a node goes down, the ticket simply stays in its MinIO inbox until a new worker checks it out.
- CPU-Only Inference: Doubling down on the decentralized, CPU-only approach for the LLM verdicts to ensure the pipeline remains accessible without requiring high-end GPU clusters.
- Enhanced Monitoring: Integrating more granular Grafana tracking for the "Potato" queue to identify exactly where tickets are getting stuck in the manual triage phase.
The Clean Room Protocol: Protecting the Lab
One of the biggest challenges in modern AI-driven research is the privacy-vs-performance trade-off. To solve this, the pipeline utilizes a Sanitization Gateway before any data leaves the local mesh.
Once the local "Ironclaw" director has a high-confidence finding, it triggers a sanitization routine:
- PII & Metadata Stripping: All local paths, developer names, and repository metadata are scrubbed.
- Contextual Anonymization: The code is abstracted into a "generic" form that retains the logic flaw but removes the specific identity of the target.
- The Secure Handshake: This anonymized "logic packet" is sent to a secure cloud LLM for a final, high-reasoning sanity check.
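The stripping and anonymization steps can be sketched as a small scrubbing pass. The gateway's real rules are not published, so the regex patterns and the `<PATH>`/`<EMAIL>`/`<TARGET>` placeholders below are illustrative assumptions.

```python
import re

def sanitize(snippet: str, repo_name: str) -> str:
    """Strip local paths, emails, and the target's identity from a finding."""
    snippet = re.sub(r"/home/\S+", "<PATH>", snippet)            # local paths
    snippet = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "<EMAIL>", snippet)
    snippet = snippet.replace(repo_name, "<TARGET>")             # repo identity
    return snippet

raw = "strcpy at /home/alice/src/foo.c, report to alice@example.com (proj-x)"
print(sanitize(raw, "proj-x"))
```

The scrubbed "logic packet" still describes the flaw (an unsafe `strcpy`), but nothing in it ties the finding back to the lab or the target repository.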
This creates a "One-Way Valve" for information: Intelligence flows in from the cloud's reasoning capabilities, but sensitive lab data never flows out.
The mission remains the same: Building a decentralized, autonomous security research lab that respects privacy and runs on local iron.