Abstract

This article introduces a new approach to automatically finding software vulnerabilities by combining the power of Large Language Models (LLMs) with traditional code analysis techniques. The proposed system is a pipeline of different tools working together, which helps address the weaknesses of current methods: static analysis often raises false alarms, while LLMs can hallucinate findings or miss important details because of their limited context.

At the heart of our approach is the process of turning code written in languages like C, C++, or Java into a Code Property Graph (CPG), a representation that captures both the structure and the semantics of the code. This CPG is then examined by a system of multiple agents, each specialized to look for security risks, identify dangerous method call chains, and generate examples of potential exploits, known as Proofs of Concept (PoCs).
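To make the CPG idea concrete, here is a minimal sketch of how such a graph can be represented: nodes for code entities, plus typed edges for control flow (CFG) and data flow (DFG). The node names and the modeled snippet are purely hypothetical; real CPGs are produced by dedicated static analysis tools.

```python
# Toy Code Property Graph: nodes are code entities, edges are typed relations.
# CFG edges capture execution order, DFG edges capture value flow.
# Hypothetical snippet being modeled: data = read_input(); sink(data)

nodes = {
    "n1": {"kind": "Call", "code": "read_input()"},
    "n2": {"kind": "Identifier", "code": "data"},
    "n3": {"kind": "Call", "code": "sink(data)"},
}

edges = [
    ("n1", "n2", "DFG"),   # value of read_input() flows into `data`
    ("n2", "n3", "DFG"),   # `data` flows into the sink call
    ("n1", "n3", "CFG"),   # read_input() executes before sink(data)
]

def successors(node, edge_type):
    """All nodes reachable from `node` in one step over edges of a given type."""
    return [dst for src, dst, t in edges if src == node and t == edge_type]
```

Querying `successors("n1", "DFG")` walks the data-flow layer of the graph, which is exactly the layer the later taint analysis relies on.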

After that, a testing and validation step checks these PoCs, reducing false alarms and improving the accuracy of the results. Early tests on publicly available data show that this approach detects vulnerabilities more accurately, with fewer false positives, suggesting that combining LLMs with traditional code analysis can lead to more reliable and scalable security solutions.

🧠 Introduction

The old-school tools we use, the ones that check code before it runs (static analysis) and the ones that check it while it's running (dynamic analysis), are just not cutting it anymore. They're too slow to keep up with how sophisticated modern bugs have become.

Static Analysis (SAST)

Static analysis is basically reading the code without running it. The problem is, it gets confused easily: it flags tons of things that are actually fine (high false positives) because it can't figure out exactly how the data will flow or what the full context will be.

Dynamic Analysis

Dynamic Analysis is better because it actually executes the code. But it has its own major flaw: it can't check everything. If the tool doesn't hit a specific feature or obscure execution path, it'll miss a vulnerability completely (high false negatives).

We need something smarter because these traditional methods are basically overwhelmed.

🤖 Enter Large Language Models (LLMs)

Large Language Models (LLMs), the same technology behind tools like GitHub Copilot, have completely changed how we think about analyzing code.

Why LLMs Matter

LLMs are amazing at actually understanding code: not just what it says line-by-line, but what it means, how different pieces interact, and what the developer intended.

But They Aren't Perfect

If you try to use them directly for vulnerability detection, two major problems appear:

  1. Hallucination: they can confidently produce incorrect security findings.
  2. Limited context window: they cannot process very large codebases at once.

โš™๏ธ The Proposed Hybrid Solution

So how do we fix this? We stop relying on a single approach.

This article proposes a hybrid, multi-agent system that combines traditional static analysis with LLM intelligence.

Step 1: Build a Structural Map (CPG)

Static analysis constructs a Code Property Graph (CPG) โ€” a structured representation of:

  • Function calls
  • Control flow
  • Data flow

Step 2: Feed Structured Data to LLM

Instead of raw code, the LLM receives a clean, structured graph and performs taint analysis on it to detect vulnerabilities, tracking how untrusted input flows toward dangerous operations.
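A minimal sketch of the kind of taint analysis meant here: a breadth-first search from attacker-controlled sources to dangerous sinks along the graph's data-flow edges. The edge list and the source/sink names are hypothetical stand-ins for what a real CPG would provide.

```python
from collections import deque

# Hypothetical data-flow edges extracted from a CPG: (from_node, to_node).
dfg_edges = [
    ("read_input", "data"),
    ("data", "query"),
    ("query", "execute_sql"),   # execute_sql is a dangerous sink
    ("config", "logger"),       # unrelated flow, should not be flagged
]

SOURCES = {"read_input"}        # attacker-controlled entry points
SINKS = {"execute_sql"}         # security-sensitive operations

def find_taint_paths(edges, sources, sinks):
    """Breadth-first search returning every source-to-sink path over DFG edges."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    paths = []
    for source in sources:
        queue = deque([[source]])
        while queue:
            path = queue.popleft()
            node = path[-1]
            if node in sinks:
                paths.append(path)      # tainted data reaches a sink
                continue
            for nxt in graph.get(node, []):
                if nxt not in path:     # avoid revisiting nodes (cycles)
                    queue.append(path + [nxt])
    return paths
```

Each returned path is a candidate vulnerability: a concrete chain the detection agents can then reason about and try to exploit.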

Step 3: Automatic Verification

For every detected vulnerability, the system generates a Proof of Concept (PoC) exploit.

  • If the PoC works → the vulnerability is confirmed as real
  • If it fails → the finding is treated as a likely false positive and discarded
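The validation step can be sketched as a small harness that executes each generated PoC in a subprocess and keeps the finding only if the exploit demonstrably triggers. This sketch assumes a PoC signals success via a non-zero exit code; a production system would run PoCs inside a proper sandbox with resource limits.

```python
import subprocess
import sys

def verify_poc(poc_source, timeout=10):
    """Run a generated PoC in a subprocess; keep the finding only if the
    exploit triggers (signaled here by a non-zero exit code)."""
    result = subprocess.run(
        [sys.executable, "-c", poc_source],
        capture_output=True, timeout=timeout,
    )
    return result.returncode != 0   # exploit "worked" -> finding is real

# Hypothetical stand-in PoCs: one that trips an assertion, one that does not.
crashing_poc = "assert 1 + 1 == 3, 'bug reproduced'"
benign_poc = "print('no crash')"
```

The same gate applies to every finding, which is what drives the false-positive rate down: anything the system cannot demonstrate is dropped.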

Step 4: Multi-Agent System

Different specialized agents collaborate to:

  • Detect vulnerabilities
  • Validate findings
  • Improve reliability
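The steps above can be sketched as a toy pipeline, with plain functions standing in for LLM calls; the keyword checks are hypothetical placeholders for actual model reasoning.

```python
def detection_agent(function_summary):
    """Stand-in for an LLM detection call: flags summaries that mention
    unsanitized input. A real agent would reason over the CPG slice."""
    if "unsanitized" in function_summary:
        return {"finding": "possible injection", "summary": function_summary}
    return None

def verification_agent(finding):
    """Stand-in verifier: accepts only findings that name a concrete sink."""
    return finding is not None and "sink" in finding["summary"]

def analyze(function_summaries):
    """Detection proposes, verification filters; only agreed findings survive."""
    confirmed = []
    for summary in function_summaries:
        finding = detection_agent(summary)
        if verification_agent(finding):
            confirmed.append(finding)
    return confirmed
```

The key design point is the separation of roles: one agent is free to over-report, because a second, independent agent must confirm before anything reaches the user.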

๐Ÿ” Key Contributions

1. Smarter Bug Detection

A hybrid architecture that combines structural code understanding (CPG) with LLM reasoning.

2. Reducing False Positives

Automatic PoC generation filters the results so that only validated vulnerabilities are reported.

3. Proven Performance

In preliminary evaluations on public benchmark datasets, the system outperforms existing tools.

📚 Literature Survey

Traditional Methods

Static Analysis Tools

Examples: FindBugs, Fortify SCA, Coverity

  • Use pattern matching and data flow analysis
  • High false positives due to limited context

Dynamic Analysis

  • Uses runtime execution (e.g., fuzzing)
  • Limited coverage → false negatives

Deep Learning-Based Detection

  • Early models: Bi-LSTM (VulDeePecker, SySeVR)
  • Advanced models: GNNs on CPG (Devign, REVEAL)
  • Problem: performance drops on real-world datasets

LLM-Based Detection

Approaches

  • Base Prompting → Simple but unreliable
  • Fine-Tuning → Improves performance significantly
  • RAG (Retrieval-Augmented Generation) → Reduces hallucination
  • Hybrid Approaches → Combine static analysis + LLMs

Positioning of This Article

This article combines:

  • Static analysis (CPG)
  • Fine-tuned LLM
  • Multi-agent validation
  • PoC-based verification

Result → A more reliable and scalable vulnerability detection system.

๐Ÿ—๏ธ System Architecture

Phase 1: Multi-Modal Code Analysis

  • CPG generation using static analysis
  • Code slicing for focused context
  • RAG for external knowledge retrieval
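Code slicing can be sketched as a backward pass that keeps only the statements feeding the variable at a potential sink, shrinking the context handed to the LLM. This toy slicer matches assignment targets textually; a real implementation would follow the CPG's data-flow edges instead. All variable and function names are illustrative.

```python
def backward_slice(lines, sink_var):
    """Keep only lines that (transitively) assign variables feeding sink_var.
    Assumes one simple `x = ...` assignment per line."""
    relevant = {sink_var}
    kept = []
    for line in reversed(lines):             # walk backward from the sink
        target, _, rhs = line.partition("=")
        target = target.strip()
        if target in relevant:
            kept.append(line)
            # every name on the right-hand side now matters too
            relevant.update(tok.strip("(),") for tok in rhs.split())
    return list(reversed(kept))

# Hypothetical snippet: only a, c, q feed the query; b and d are noise.
snippet = [
    "a = user_input()",
    "b = 42",
    "c = sanitize( a )",
    "d = b + 1",
    "q = build_query( c )",
]
```

Slicing away the irrelevant lines is what keeps large functions within the LLM's limited context window.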

Phase 2: Multi-Agent Detection

  • Detection Agent → Finds vulnerabilities
  • Verification Agent → Validates results
  • Consensus Mechanism → Ensures reliability
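The consensus mechanism can be as simple as a vote over repeated agent verdicts; this sketch assumes each agent run yields a boolean verdict, with the threshold a tunable parameter rather than anything prescribed by the architecture.

```python
def consensus(verdicts, threshold=0.5):
    """Report a vulnerability only if strictly more than `threshold` of the
    agent runs agree. `verdicts` is a list of booleans, one per run."""
    if not verdicts:
        return False
    return sum(verdicts) / len(verdicts) > threshold
```

Requiring agreement across independent runs damps out individual hallucinations: a single over-eager agent cannot push a finding through on its own.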

Phase 3: Automated Mitigation

  • Patch generation
  • Static re-check
  • Unit testing
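The mitigation loop above can be sketched as: apply a candidate patch, re-run a static check, and run tests before accepting it. The patch format and both check functions are hypothetical stand-ins for the real pipeline stages.

```python
def apply_patch(code, patch):
    """Naive patch application: replace the vulnerable fragment (hypothetical)."""
    return code.replace(patch["before"], patch["after"])

def passes_static_recheck(code):
    # Stand-in for re-running the CPG/taint pipeline on the patched code.
    return "execute_raw" not in code

def passes_unit_tests(code):
    # Stand-in for the project's test suite; here we only check that the
    # function's public name survived the patch.
    return "def run_query" in code

def mitigate(code, patch):
    """Accept a patch only if it removes the flaw and keeps behavior intact."""
    patched = apply_patch(code, patch)
    if passes_static_recheck(patched) and passes_unit_tests(patched):
        return patched
    return None

# Hypothetical vulnerable function and its candidate fix.
vulnerable = "def run_query(q):\n    return execute_raw(q)"
patch = {"before": "execute_raw(q)", "after": "execute_param(q, [])"}
```

The re-check step closes the loop: a patch that merely moves the flaw, or breaks the tests, is rejected rather than shipped.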

Phase 4: Explainability & Reporting

  • XAI techniques (SHAP, LIME)
  • Developer-friendly reports
  • Visualization of model reasoning

๐Ÿ› ๏ธ Implementation Plan

Phase 1: Foundation

  • Toolchain setup (Python, Neo4j, Soot)
  • CPG generation
  • Basic taint analysis

Phase 2: LLM Integration

  • Fine-tuning models (CodeBERT, WizardCoder)
  • Prompt engineering
  • Multi-agent system design

Phase 3: Validation

  • PoC execution system
  • End-to-end testing
  • Performance evaluation

📌 Conclusion

By combining structured code analysis with intelligent LLM reasoning and multi-agent validation, this article presents a robust and scalable approach to vulnerability detection.

The integration of CPG, PoC validation, and agent collaboration addresses the key limitations of both traditional and modern techniques, resulting in improved accuracy, reduced false positives, and practical real-world applicability.