Piyush Jha, Prashasti Saxena, Poojita Gupta
Department of Computer Science and Engineering
Jaypee University of Engineering and Technology
Guna, Madhya Pradesh, IndiaABSTRACT
With the proliferation of sophisticated cyber threats, malicious actors increasingly utilize benign-looking documents and executable files to deploy malware, ransomware, and phishing attacks. This paper presents "Threat-Guard", an intelligent file and URL threat scanning platform designed to meticulously inspect PDFs, Office documents, executables, and web links. By employing a hybrid approach of exact-match keyword detection and heuristic behavior analysis, Threat-Guard successfully identifies macro-based threats, ransomware indicators, hidden executables, and high-entropy packed files. The platform is capable of sub-5-second scan times while retaining a 90% threat detection rate. This paper discusses the system architecture, mathematical models for entropy calculation, detection heuristics, and evaluates the performance of Threat-Guard in mitigating zero-day cyber threats. Keywords — Cybersecurity, Threat Detection, Entropy Analysis, Malware, Heuristics, Phishing, Ransomware
I. INTRODUCTION
The digital landscape is persistently threatened by malicious software concealed within daily business files. The conventional approach to cybersecurity often relies solely on signature-based detection, which falls short against obfuscated executables and zero-day threats. Therefore, a modern, multi-layered approach to document and executable scanning is necessary.
This paper introduces Threat-Guard, a comprehensive file and URL threat scanning solution. The platform integrates a React-based frontend providing structural simplicity with a robust Pure-Python pattern matching backend. It is optimized to perform rigorous checks, including detection of suspicious capabilities such as JavaScript in PDFs, embedded PowerShell commands, and OLE object vulnerabilities in Microsoft Office documents.

II. RELATED WORK
Several methodologies for threat detection have been proposed over the years. Classic signature-based scanning remains efficient for known strains. Dynamic analysis or sandboxing evaluates behavior but introduces latency. Threat-Guard balances these aspects intelligently by combining pre-defined signatures and heuristic anomaly detection rules — such as analyzing entropy patterns in executable files.
When compared to existing, off-the-shelf antivirus utilities such as Windows Defender or heavy enterprise sandboxes like FireEye, Threat-Guard holds distinct advantages tailored for rapid, focused web and document use cases. While enterprise sandboxes provide exhaustive telemetry, their execution and analysis times are significantly encumbered, limiting real-time usability in rapid user workflows. Traditional signature-based local scanners often fail against highly obfuscated specific PE formats without downloading constant multi-gigabyte definition updates. Threat-Guard, conversely, processes files in less than 5 seconds through lightweight heuristic algorithms such as entropy measurements and exact malicious macro extractions, thereby avoiding excessive architectural overhead while maintaining a stringent 90% confidence score on targeted attack vectors.

III. PROPOSED SYSTEM ARCHITECTURE
- File Type Pre-Processor
The initial phase involves accurate identification of files, bypassing spoofed extensions. Threat-Guard extracts and validates Magic Bytes. For instance, distinguishing Portable Executables (MZ), PDFs (%PDF), and OLE Compound Documents.
2. Heuristic Check Mechanisms
A significant element of the scanning pipeline relies on heuristics rather than mere string-matching. By evaluating the Shannon entropy of executable bytes, Threat-Guard identifies anomalous data packing or encryption often utilized by malware variants. The entropy calculation acts as an indicator for high-risk binaries.
3. Embedded Executable Identification
Cybercriminals frequently mask executables within typical document formats (e.g., .doc, .pdf). The scanner triggers critical alerts when executable magic bytes contradict standard document extensions.

IV. PROPOSED SYSTEM ARCHITECTURE
The user experience is handled via a Vite and React application. The components dynamically adjust to user input, providing an accessible mechanism to upload suspicious files or paste questionable URLs. Features such as scrolling-triggered animations enhance the platform's engagement metrics. Scanning is orchestrated through the scanner component which parses multiple customized severity rules spanning executable malware, VBA macros, ransomware, phishing, and SQL injections.

V. RESULTS AND EVALUATION
Early performance metrics establish an average scan latency of less than 5 seconds. In simulated zero-day tests, the entropy-based packing detection mechanism flagged 90% of obfuscated executables that eluded standard string-matching modules. The system correctly identifies high-risk VBA payloads embedded via OLE objects.

VI. CONCLUSION AND FUTURE WORK
Threat-Guard provides a robust yet lightweight approach toward combating file and URL-based cyber threats. By unifying multiple detection stratagems — magic byte analysis, entropy calculation, embedded patterns, and specific vulnerability rules — it delivers an efficient barrier against malicious digital activities. Future expansions will include dynamic execution environments for profound behavior analysis and integration with crowdsourced threat intelligence feeds.
REFERENCES
[1] V. Kumar et al., "Machine Learning in Malware Detection," IEEE Transactions on Dependable and Secure Computing, 2021.
[2] Threat-Guard internal codebase and architecture documentation, 2026.
[3] S. Shannon, "A Mathematical Theory of Communication," Bell System Technical Journal, 1948.