Abstract
This article introduces a new approach to automatically finding software vulnerabilities by combining the power of Large Language Models (LLMs) with traditional code analysis techniques. The proposed system is a pipeline of different tools working together, which helps address some of the weaknesses of current methods: static analysis often raises false alarms, while LLMs can hallucinate or miss important details due to limited context.
At the heart of our approach is converting code in languages like C, C++, or Java into a Code Property Graph (CPG), which captures both the structure and the semantics of the code. This CPG is then examined by a system of multiple specialized agents that look for security risks, identify dangerous method call chains, and generate examples of potential exploits, known as Proofs of Concept (PoCs).
After that, a testing and validation step checks these PoCs, reducing false alarms and improving the accuracy of the results. Early tests on publicly available data show that this approach detects vulnerabilities more accurately, with fewer false positives, suggesting that combining LLMs with traditional code analysis can lead to more reliable and scalable security solutions.
Introduction
The old-school tools we use, the ones that check code before it runs (static analysis) and the ones that check it while it's running (dynamic analysis), are just not cutting it anymore. They're too slow to keep up with how sophisticated modern bugs are.
Static Analysis (SAST)
Static Analysis basically means reading the code without running it. The problem is that it gets confused easily: it flags tons of things that are actually fine (high false positives) because it can't figure out exactly how the data will flow or what the full context is.
Dynamic Analysis
Dynamic Analysis is better because it actually executes the code. But it has its own major flaw: it can't check everything. If the tool doesn't hit a specific feature or obscure execution path, it'll miss a vulnerability completely (high false negatives).
We need something smarter because these traditional methods are basically overwhelmed.
Enter Large Language Models (LLMs)
Large Language Models (LLMs), the same technology behind tools like GitHub Copilot, have completely changed how we think about analyzing code.
Why LLMs Matter
LLMs are amazing at actually understanding code: not just what it says line-by-line, but what it means, how different pieces interact, and what the developer intended.
But They Aren't Perfect
If you try to use them directly for vulnerability detection, two major problems appear:
- Hallucination: they can confidently produce incorrect security findings.
- Limited Context Window: they cannot process very large codebases at once.
The Proposed Hybrid Solution
So how do we fix this? We stop relying on a single approach.
This article proposes a hybrid, multi-agent system that combines traditional static analysis with LLM intelligence.
Step 1: Build a Structural Map (CPG)
Static analysis constructs a Code Property Graph (CPG), a structured representation of:
- Function calls
- Control flow
- Data flow
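As a rough illustration of what a CPG encodes, here is a toy sketch in Python. The edge labels (`DATA_FLOW`, `CONTROL_FLOW`) are a simplified subset of what a real CPG generator such as Joern or a Soot-based pipeline would produce, and the node names are hypothetical.

```python
# Toy Code Property Graph for the two-line snippet:
#   data = read_input()
#   sink(data)
# Each edge is (src, dst, label); a real CPG would also carry AST edges.
edges = [
    ("read_input", "data", "DATA_FLOW"),      # return value flows into `data`
    ("data", "sink", "DATA_FLOW"),            # `data` is passed to sink()
    ("read_input", "sink", "CONTROL_FLOW"),   # execution order
]

def reaches(src, dst, label):
    """Depth-first search restricted to edges carrying the given label."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(v for u, v, l in edges if u == node and l == label)
    return False

# Query: does input data reach the sink along data-flow edges?
print(reaches("read_input", "sink", "DATA_FLOW"))  # True
```

The same reachability query, restricted to different edge labels, is what lets later phases ask structural questions ("is this call reachable?", "does this value flow there?") without re-reading the source.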
Step 2: Feed Structured Data to LLM
Instead of raw code, the LLM receives a clean structured graph and performs taint analysis to detect vulnerabilities.
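The taint-analysis step can be sketched as a fixed-point propagation over the graph's data-flow edges. The source, sink, and sanitizer names below are hypothetical; a real implementation would derive them from the CPG rather than hard-code them.

```python
# Minimal taint propagation over a data-flow edge list (toy example).
SOURCES = {"read_input"}       # untrusted data enters here
SINKS = {"exec_query"}         # dangerous if reached by tainted data
SANITIZERS = {"escape_sql"}    # taint is cleared at these nodes

flows = [
    ("read_input", "user_id"),
    ("user_id", "exec_query"),   # direct, unsanitized path -> reported
    ("user_id", "escape_sql"),
    ("escape_sql", "safe_id"),
    ("safe_id", "exec_query"),   # sanitized path -> not reported
]

def tainted_nodes(flows):
    """Propagate taint until a fixed point; sanitizers stop propagation."""
    tainted = set(SOURCES)
    changed = True
    while changed:
        changed = False
        for src, dst in flows:
            if src in tainted and dst not in SANITIZERS and dst not in tainted:
                tainted.add(dst)
                changed = True
    return tainted

findings = SINKS & tainted_nodes(flows)
print(findings)  # {'exec_query'}
```

The sink is flagged because an unsanitized path exists, even though a sanitized path to the same sink also exists, which matches how taint analyses conservatively report any dirty path.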
Step 3: Automatic Verification
For every detected vulnerability, the system generates a Proof of Concept (PoC) exploit.
- If the PoC works → the vulnerability is real
- If it fails → the finding is discarded
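A minimal sketch of this verification step, assuming the generated PoCs are Python scripts and that a nonzero exit code (crash, failed assertion inside the PoC) signals a reproduced bug. A production system would run this inside a sandbox, not directly on the host.

```python
import subprocess
import sys

def poc_confirms(poc_source: str, timeout: int = 10) -> bool:
    """Run a generated PoC in a subprocess and report whether it
    reproduced the bug (nonzero exit). A hang counts as inconclusive."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", poc_source],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False  # hung PoC: discard the finding as unconfirmed
    return result.returncode != 0

# A PoC that "triggers" the bug by raising, and one that runs cleanly:
print(poc_confirms("raise MemoryError('overflow reproduced')"))  # True
print(poc_confirms("print('input handled safely')"))             # False
```

Tying the report to an executed PoC rather than to the model's claim is what filters hallucinated findings out of the final results.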
Step 4: Multi-Agent System
Different specialized agents collaborate to:
- Detect vulnerabilities
- Validate findings
- Improve reliability
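The collaboration can be sketched as a quorum vote over per-agent verdicts. The agent functions below are stand-ins for what would really be separate LLM calls with different roles and prompts; all names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    sink: str
    cwe: str

# Stand-in "agents": each inspects a finding and votes True/False.
def detection_agent(f):
    return True  # it proposed the finding in the first place

def verification_agent(f):
    return f.cwe != "CWE-000"  # rejects findings with no real CWE class

def poc_agent(f):
    return f.sink == "exec_query"  # stands in for an actual PoC replay

AGENTS = [detection_agent, verification_agent, poc_agent]

def consensus(finding, agents=AGENTS, quorum=2):
    """Report the finding only if at least `quorum` agents agree."""
    votes = sum(agent(finding) for agent in agents)
    return votes >= quorum

print(consensus(Finding("exec_query", "CWE-89")))  # True
print(consensus(Finding("log_write", "CWE-000")))  # False
```

Requiring agreement from independent agents is the reliability mechanism: a single hallucinating agent cannot push a finding into the report on its own.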
Key Contributions
1. Smarter Bug Detection
A hybrid architecture that combines structural code understanding (CPG) with LLM reasoning.
2. Reducing False Positives
Automatic PoC generation helps ensure that only real vulnerabilities are reported.
3. Strong Benchmark Performance
The system outperforms existing tools on benchmark datasets.
Literature Survey
Traditional Methods
Static Analysis Tools
Examples: FindBugs, Fortify SCA, Coverity
- Use pattern matching and data flow analysis
- High false positives due to limited context
Dynamic Analysis
- Uses runtime execution (e.g., fuzzing)
- Limited coverage → false negatives
Deep Learning-Based Detection
- Early models: Bi-LSTM (VulDeePecker, SySeVR)
- Advanced models: GNNs on CPG (Devign, REVEAL)
- Problem: Performance drops in real-world datasets
LLM-Based Detection
Approaches
- Base Prompting → simple but unreliable
- Fine-Tuning → improves performance significantly
- RAG (Retrieval-Augmented Generation) → reduces hallucination
- Hybrid Approaches → combine static analysis + LLMs
Positioning of This Article
This article combines:
- Static analysis (CPG)
- Fine-tuned LLM
- Multi-agent validation
- PoC-based verification
Result → a more reliable and scalable vulnerability detection system.
System Architecture
Phase 1: Multi-Modal Code Analysis
- CPG generation using static analysis
- Code slicing for focused context
- RAG for external knowledge retrieval
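The code-slicing step of this phase can be sketched as a backward slice: starting from a suspicious statement, keep only the statements it depends on, so the LLM receives a focused context instead of the whole file. The `deps` map here is hypothetical; a real slicer would derive it from the CPG's data- and control-flow edges.

```python
# Toy program as numbered statements; line 5 is the suspicious target.
program = {
    1: "n = read_input()",
    2: "greeting = 'hello'",
    3: "buf = allocate(n)",
    4: "print(greeting)",
    5: "copy_into(buf, n)",   # potential overflow
}
deps = {3: {1}, 5: {1, 3}}    # statement -> statements it depends on

def backward_slice(target):
    """Collect the target and everything it transitively depends on."""
    keep, stack = set(), [target]
    while stack:
        line = stack.pop()
        if line not in keep:
            keep.add(line)
            stack.extend(deps.get(line, ()))
    return [program[i] for i in sorted(keep)]

for stmt in backward_slice(5):
    print(stmt)
# Lines 2 and 4, which are unrelated to the target, are dropped.
```

Shrinking the input this way is also what makes the limited context window of the LLM less of a problem.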
Phase 2: Multi-Agent Detection
- Detection Agent → finds vulnerabilities
- Verification Agent → validates results
- Consensus Mechanism → ensures reliability
Phase 3: Automated Mitigation
- Patch generation
- Static re-check
- Unit testing
Phase 4: Explainability & Reporting
- XAI techniques (SHAP, LIME)
- Developer-friendly reports
- Visualization of model reasoning

Implementation Plan
Phase 1: Foundation
- Toolchain setup (Python, Neo4j, Soot)
- CPG generation
- Basic taint analysis
Phase 2: LLM Integration
- Fine-tuning models (CodeBERT, WizardCoder)
- Prompt engineering
- Multi-agent system design
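Prompt engineering here means serializing CPG facts into the prompt rather than pasting raw source code. A hypothetical template for the detection agent might look like this (the field names and the snippet filling them in are illustrative, not the system's actual prompt):

```python
# Hypothetical detection-agent prompt: CPG-derived facts, not raw code.
PROMPT = """You are a security analyst. Given these code-property-graph facts:

Data flow: {source} -> {variable} -> {sink}
Sanitizers on path: {sanitizers}

Answer with VULNERABLE or SAFE, then one sentence of justification."""

prompt = PROMPT.format(
    source="read_input()",
    variable="user_id",
    sink="exec_query()",
    sanitizers="none",
)
print(prompt.splitlines()[0])  # the instruction line of the final prompt
```

Keeping the prompt structured like this makes the agent's answers easier to parse and vote on in the multi-agent stage.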
Phase 3: Validation
- PoC execution system
- End-to-end testing
- Performance evaluation
Conclusion
By combining structured code analysis with intelligent LLM reasoning and multi-agent validation, this article presents a robust and scalable approach to vulnerability detection.
The integration of CPG, PoC validation, and agent collaboration addresses the key limitations of both traditional and modern techniques, resulting in improved accuracy, reduced false positives, and practical real-world applicability.