• Wazuh on Azure AKS delivers a zero-license-cost SIEM/XDR infrastructure that collects data from endpoints, network devices, Azure AD, and EHR audits through a single Helm installation.
  • A fine-tuned Mistral 7B model deployed in Azure Container Apps performs automated Tier 1 alert triage. For a 50-endpoint business, the stack typically costs $350 to $800 per month in Azure cloud resources.
  • Microsoft Presidio enforces PHI isolation by detecting most HIPAA Safe Harbor identifiers before data reaches any AI model, training process, or vector index; the remaining identifier types are handled by custom recognizers.
  • Azure API Management acts as a single AI control panel for authentication via JWT, tiered rate limiting, circuit breaker, cost metering, and immutable HIPAA headers in all responses.
  • The cold start problem is addressed by seeding training with the CIC-IDS-2017 and UNSW-NB15 datasets, supplemented by GPT-4o-generated synthetic alerts for under-represented attack types.
  • The rollout spans seven months and gates AI autonomy behind precision/recall thresholds so automation is never granted prematurely.

The Problem of SOC Talent and Costs

According to the 2023 ISC² Cybersecurity Workforce Study, the global cybersecurity talent shortage stands at roughly 3.4 million and grows with every reporting year. Small and mid-sized businesses feel this most acutely: they cannot match the salaries offered by larger organizations, yet they hold comparable volumes of sensitive data and face the same threats. The result is an understaffed SOC or an over-reliance on manual processes that fail as analysts retire.

Managed Security Service Providers (MSSPs) can mitigate the staffing gap, but genuine 24/7 MSSP monitoring costs $3,000 to $5,000 per month. Commercial SIEM solutions compound the problem with per-GB ingestion pricing that penalizes growth. This paper describes an architectural approach that addresses both problems: an open source stack on Azure covering all three SOC tiers (Tier 1, 2, and 3), aligned with HIPAA and NIST CSF requirements, at an infrastructure cost within SMB budgets.

Reference: https://www.isc2.org/Insights/2025/12/2025-ISC2-Cybersecurity-Workforce-Study

Architecture Description

The system is structured in five logical tiers, from data to human review: log collection; the Wazuh XDR/SIEM core on Azure Kubernetes Service; the Azure API Management gateway, which enforces all security and compliance policies; the three-tier autonomous AI system; and the human oversight tier. Each tier can be deployed independently, so teams can roll out the AI tiers gradually and run Wazuh as a traditional SIEM while the models train.

Reference: https://documentation.wazuh.com/current/cloud-security/azure/index.html

Figure 1: End-to-end architecture — ingestion → Wazuh AKS → APIM → Tier 1/2/3 AI → human oversight and compliance

Ingestion layer

Windows and Linux agents forward events to the Wazuh manager over an encrypted TLS link, and network devices send events over Syslog. Azure AD sign-in and audit logs reach Wazuh through the Wazuh Azure plugin, which queries the Azure Monitor REST API and Microsoft Graph APIs without requiring an agent on Azure resources. HL7 and FHIR audit logs generated by electronic health record (EHR) systems are decoded by custom Wazuh XML decoders. Wazuh's built-in Security Configuration Assessment and File Integrity Monitoring modules handle compliance scanning and ship with an out-of-the-box HIPAA ruleset.

Reference: https://documentation.wazuh.com/current/compliance/hipaa/index.html
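To make the decoding step concrete, the sketch below shows in plain Python the kind of field extraction a custom HL7 decoder performs. The sample message, application names, and field choices are hypothetical, and a production Wazuh deployment would express this logic as XML decoder rules rather than Python.

```python
# Illustration only: the field extraction a custom decoder applies to a
# pipe-delimited HL7 v2 message. The sample message and app names are
# made up; Wazuh itself encodes this logic as XML decoder rules.
def parse_hl7_segments(message: str) -> dict:
    """Split an HL7 v2 message into {segment_id: field_list}."""
    segments = {}
    for line in message.strip().split("\r"):  # HL7 separates segments with CR
        fields = line.split("|")
        segments[fields[0]] = fields
    return segments

# Hypothetical ADT (admit/discharge/transfer) audit event.
sample = (
    "MSH|^~\\&|EHR_APP|HOSPITAL|WAZUH|SOC|20250101120000||ADT^A01|MSG0001|P|2.5\r"
    "EVN|A01|20250101120000"
)
seg = parse_hl7_segments(sample)
msg_type = seg["MSH"][8]  # MSH-9: message type (the field separator counts as MSH-1)
```

A Wazuh XML decoder would capture the same fields with regex groups and map them to alert metadata.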

Wazuh core on AKS

The Wazuh manager, indexer (bundled, based on OpenSearch), and dashboard are deployed as a Helm workload on AKS. A two-node cluster of Standard_DS2_v2 VMs is sized for deployments of fewer than 100 endpoints. OpenSearch serves as both the real-time alert store and the historical store for Tier 3 anomaly detection. Lifecycle policies archive data older than 90 days to the Azure Blob Storage cold tier at about $0.001 per GB per month.

Three-Tier AI Pipeline

Each tier corresponds to an analyst role and is exposed as an HTTP endpoint secured behind APIM.

Tier 1 — Alert triage: A fine-tuned Mistral 7B model, served by Ollama on Azure Container Apps, evaluates each alert and classifies it as true positive, likely false positive, or escalation, along with a confidence score. When the score exceeds the configured threshold, automated action fires: an IP block via Wazuh active response, host isolation, or ticket closure.
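The routing rule above can be sketched as a small decision function; the verdict labels, action names, and the 0.90 threshold are illustrative stand-ins, not the deployed configuration.

```python
# Hypothetical sketch of Tier 1 routing: the verdict and confidence would
# come from the Mistral 7B triage call; threshold and action names are
# illustrative only.
AUTO_ACTION_THRESHOLD = 0.90

def route_alert(verdict: str, confidence: float) -> str:
    """Map a triage (verdict, confidence) pair to a SOC action."""
    if confidence < AUTO_ACTION_THRESHOLD:
        return "escalate_to_human"      # not confident enough to automate
    if verdict == "true_positive":
        return "active_response"        # e.g. Wazuh IP block or host isolation
    if verdict == "likely_false_positive":
        return "close_ticket"
    return "escalate_to_human"          # model explicitly requested escalation
```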

Tier 2 — Incident investigation: A RAG pipeline backed by a pgvector index on Azure Database for PostgreSQL Flexible Server retrieves MITRE ATT&CK techniques, CVE descriptions, and internal runbooks. A second LLM call enriches the escalated alert, builds a correlated event timeline, and produces a HIPAA-compliant incident report.
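As a minimal sketch of the retrieval step, the snippet below ranks documents by cosine similarity. In the actual pipeline the vectors come from an embedding model and the search runs inside PostgreSQL via pgvector; the three-dimensional vectors and document names here are fabricated for illustration.

```python
# Dependency-free stand-in for the pgvector similarity lookup. Real
# embeddings are high-dimensional and produced by an embedding model;
# these toy vectors and document names are made up.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

corpus = {
    "T1003_credential_dumping": [0.9, 0.1, 0.0],
    "T1071_c2_over_https":      [0.1, 0.8, 0.3],
    "runbook_host_isolation":   [0.2, 0.2, 0.9],
}

def top_k(query_vec, k=2):
    """Return the k corpus documents most similar to the query vector."""
    ranked = sorted(corpus, key=lambda d: cosine(query_vec, corpus[d]), reverse=True)
    return ranked[:k]
```

In production this ranking is a single SQL query against the pgvector index rather than an in-memory scan.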

Tier 3 — Threat hunting: An Isolation Forest anomaly detection model runs nightly against the OpenSearch index, surfaces behavioral anomalies, and maps them to MITRE ATT&CK tactics via an LLM agent. IOC correlation runs against STIX/TAXII feeds from a self-hosted OpenCTI instance. Findings feed back into the Tier 1 and Tier 2 training loops.

HIPAA Compliance: Establishing the PHI Border

For healthcare deployments, the critical architectural decision is where to draw the PHI boundary: here it sits between the raw logs and everything downstream. The system is built so that no AI model, training dataset, vector index, or report ever contains PHI.

This is enforced by Microsoft Presidio, an open source Python library from Microsoft that combines named-entity recognition with deterministic regular expressions to detect PHI. Presidio's built-in recognizers cover the majority of HIPAA Safe Harbor identifiers: names, dates, geographic subdivisions smaller than a state, telephone and fax numbers, email addresses, Social Security numbers, medical record numbers, health plan beneficiary numbers, account numbers, URLs, IP addresses, and biometric identifiers. The remainder, such as vehicle identifiers, device identifiers, and certain code systems, require custom recognizers.

SHA-256 hash prefixes are applied to detected values before AES-256 encryption, so the same value detected in multiple alerts always yields the same token. This preserves the cross-event correlations Tier 2 depends on without exposing the underlying PHI. Detokenization requires Azure Key Vault read access granted via Azure RBAC.
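A minimal sketch of deterministic tokenization follows. It substitutes a keyed HMAC-SHA-256 digest prefix for the plain SHA-256 prefix described above, and omits the AES-256 encryption of the original value used for detokenization; the key constant and token format are placeholders.

```python
# Sketch of deterministic PHI tokenization: the same input value always
# yields the same token, preserving cross-alert correlation. HMAC keys
# the hash so tokens are not guessable from the value alone; in
# production the key would live in Azure Key Vault, and AES-256
# encryption of the raw value (for authorized detokenization) happens
# alongside. All names here are placeholders.
import hashlib
import hmac

TOKEN_KEY = b"placeholder-fetch-from-key-vault"

def phi_token(value: str, entity_type: str) -> str:
    digest = hmac.new(TOKEN_KEY, value.encode(), hashlib.sha256).hexdigest()
    return f"<{entity_type}:{digest[:12]}>"  # short, stable token

# The same MRN seen in two different alerts maps to one stable token.
t1 = phi_token("MRN-0042317", "MEDICAL_RECORD_NUMBER")
t2 = phi_token("MRN-0042317", "MEDICAL_RECORD_NUMBER")
```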

Figure 2: PHI Tokenization Data Flow — Presidio identifies PHI, AES-256 Key Vault encrypts the data, and the delineation between sanitized AI & MLOps workflows and identifying data is clearly demarcated.

HIPAA: Any call missing the X-PHI-Sanitized: true header is automatically rejected by APIM with an HTTP 400 response and raises an Azure Monitor Severity 0 alert flagging a potential bypass of the sanitization boundary.

On completeness: Presidio's built-in recognizers cover most Safe Harbor identifiers, but not all. Device serial numbers, for example, require custom recognizers. Regulated deployments should supplement Presidio with custom patterns and conduct a de-identification risk assessment per 45 CFR §164.514(b).
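For illustration, here is the regex core that a custom device-serial recognizer might wrap. The three-letters-plus-eight-digits format is entirely hypothetical; real deployments derive patterns from their own device inventory and validate coverage in the §164.514(b) risk assessment.

```python
# Hypothetical pattern for a custom device-serial recognizer. The serial
# format (3 uppercase letters, hyphen, 8 digits) is invented for this
# sketch; a Presidio PatternRecognizer would wrap a pattern like this.
import re

DEVICE_SERIAL = re.compile(r"\b[A-Z]{3}-\d{8}\b")

def find_device_serials(text: str) -> list:
    """Return all device serial numbers matching the hypothetical format."""
    return DEVICE_SERIAL.findall(text)
```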

MLOps: Solving the Cold Start Problem and Leveraging Continuous Retraining

The classic objection to AI-based SOC tooling is the cold start problem: a model cannot make meaningful decisions without thousands of labeled examples, and an SMB cannot produce those labels without employing the very analyst the model is meant to replace. This architecture addresses the problem in two phases.

Phase 1: Seed training

The Tier 1 model is seeded with CIC-IDS-2017 (2.8 million labeled network flows across 14 classes, from the Canadian Institute for Cybersecurity) and UNSW-NB15 (2.5 million labeled instances across nine attack types, from the Australian Centre for Cyber Security), plus alert samples aligned to MITRE ATT&CK for more realistic labels. Because DNS tunneling, LSASS credential dumping, and HL7 message tampering are under-represented in these datasets, synthetic Wazuh alerts in JSON format are generated with GPT-4o. The seed model runs in shadow mode during months one and two: it labels and logs every alert while human judgment remains authoritative. Shadow mode itself produces the labeled dataset that feeds the continuous loop.

References:
https://www.unb.ca/cic/datasets/ids-2017.html
https://research.unsw.edu.au/projects/unsw-nb15-dataset

Phase 2: Weekly retraining

From month three, human annotations captured in Label Studio feed an automated weekly Azure ML pipeline. The pipeline runs: data preparation with PHI consistency checks, deduplication, and an 80/20 train/evaluation split; LoRA fine-tuning on an Azure ML spot GPU cluster; and an evaluation gate that discards any model whose precision falls below 92% or whose recall falls below 88%. Passing models are registered in the MLflow model registry, which mandates compliance tagging, and rolled out to the application containers via blue-green deployment.
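The evaluation gate reduces to a precision/recall check on the 20% hold-out split. The 92%/88% thresholds are from the text; the metric computation below assumes binary labels (1 = true positive alert) for simplicity.

```python
# Sketch of the weekly evaluation gate: reject the candidate model if
# precision < 0.92 or recall < 0.88 on the hold-out set. Binary labels
# are assumed here; the real gate runs inside the Azure ML pipeline.
def precision_recall(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def passes_gate(y_true, y_pred, min_precision=0.92, min_recall=0.88):
    """True only if the candidate clears both thresholds."""
    p, r = precision_recall(y_true, y_pred)
    return p >= min_precision and r >= min_recall
```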

Figure 3: MLOps pipeline — seed bootstrapping → weekly LoRA fine-tuning on spot GPU → precision/recall gate → blue/green deployment → drift-triggered retraining

FINOPS: The NC4as_T4_v3 spot GPU cluster scales to zero after ten minutes of inactivity. Azure ML automatically retries any job interrupted by spot preemption, resuming from the last checkpoint. Spot instances cost roughly 60–70% less than on-demand in most regions; exact savings vary by region and time of day.

Azure API Management as the AI Control Plane

In this architecture, all three AI tiers sit behind a single APIM instance, which enforces every security and compliance policy so that no individual model service carries those responsibilities. Models can be swapped, or a new tier added, by changing backend registrations and policies alone, without touching the ingestion or compliance layers.

Every call must present a valid Azure AD JWT carrying a role claim: SOC.Tier1, SOC.Tier2, SOC.Tier3, or SOC.Admin. The inbound policy validates the JWT signature and forwards the request only on success; failures return HTTP 401 and are logged to Event Hub for the HIPAA audit trail. The gateway retrieves secrets from Key Vault at runtime using its managed identity.

Requests are routed by the X-SOC-Tier header (Tier 1, Tier 2, Tier 3), which selects both the destination service and the per-tier rate limit: 1,000 requests/min for Tier 1 (real-time alert processing), 200 requests/min for Tier 2 (compute-intensive RAG), and 50 requests/min for Tier 3 (batch processing). Failed backend calls are retried up to three times with exponential backoff before a circuit breaker opens for subsequent requests. Each successful call increments the ai_call_count metric in Azure Monitor, and an alert fires above 500,000 calls per hour, a level chosen to catch runaway loops.
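APIM expresses the retry and circuit-breaker behavior in policy XML; purely to illustrate the control flow, here is the same logic in Python with made-up timings: an initial attempt plus three exponential-backoff retries, after which the breaker opens and fails fast.

```python
# Illustrative mirror of the gateway's retry/circuit-breaker flow. APIM
# implements this in policy XML; retry count matches the text, delays
# are made up.
import time

class CircuitOpen(Exception):
    pass

class Breaker:
    def __init__(self, max_retries=3, base_delay=0.01):
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.open = False

    def call(self, fn):
        if self.open:
            raise CircuitOpen("breaker open; failing fast")
        for attempt in range(self.max_retries + 1):  # initial try + retries
            try:
                return fn()
            except Exception:
                if attempt == self.max_retries:
                    self.open = True  # trip after exhausting retries
                    raise
                time.sleep(self.base_delay * (2 ** attempt))  # backoff
```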

Infrastructure as Code: One Module, Multiple Scales

The entire Azure footprint is described by a single Terraform module parameterized by asset count. This one variable drives the VM SKU, the AKS node count, the GPU cluster size, and the budget alert thresholds.

variable "asset_count" { default = 50 }

locals {
  aks_vm_sku = var.asset_count < 100 ? "Standard_DS2_v2" : "Standard_DS4_v2"
  ml_vm_sku  = var.asset_count < 100 ? "Standard_NC4as_T4_v3" : "Standard_NC8as_T4_v3"
}

Budget alerts are set at 80% of actual spend and 100% of forecasted spend, so overages are flagged before they happen rather than after. The GPU cluster scales to zero after ten minutes of idle time and contributes nothing to monthly cost on the six days a week when no training runs.

Code Repository: The accompanying code repository contains the Terraform module, the PHI tokenizer pipeline, the Azure ML training code, the APIM policy XML, the FinOps alerting ARM templates, and the GitHub Actions workflow. The stack is production-ready and deploys with a single terraform apply.

Deployment cost estimates by scale

These estimates cover Azure resources only and use East US 2 pricing as of Q1 2026. Wazuh, Mistral 7B, Presidio, OpenCTI, and Label Studio incur no license fees.


AKS is the largest line item because the cluster runs around the clock. GPU compute costs less than AKS because the spot cluster runs only about six hours per week for training. Where occasional downtime is tolerable in a single-region deployment, the APIM Developer SKU at $49 per month is sufficient, though it carries no uptime SLA; the Standard SKU at $280 per unit per month provides a 99.95% uptime guarantee.

A phased rollout is recommended for a small IT team (1–2 people): a single big-bang deployment would overwhelm the team and leave too little time for testing. The following timeline spans seven months, including the labeling of the training set.


The confidence threshold starts high (90%) and is lowered gradually as the model's predictions prove out against reality. The process mirrors a CI/CD rollout: shadow, then canary, then full release.
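One way to express that schedule is a step function over rollout months. Aside from the 90% starting point, the month boundaries and floor value below are illustrative assumptions.

```python
# Hypothetical confidence-threshold schedule mirroring shadow, canary,
# then full release. Only the 90% starting point comes from the text;
# the month boundaries and the 0.80 floor are illustrative.
def confidence_threshold(month: int) -> float:
    if month <= 2:
        return 1.01   # shadow mode: no score can trigger automation
    if month <= 4:
        return 0.90   # canary: only very confident verdicts act
    if month <= 6:
        return 0.85
    return 0.80       # steady-state floor after the seven-month rollout
```

Setting the shadow-mode value above 1.0 guarantees that no confidence score can trigger automation during the first two months.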

CI/CD Pipeline

Infrastructure and model updates flow through a five-stage GitHub Actions pipeline. OIDC federation with Azure AD means no static credentials anywhere in the process. Stage 2 runs four quality gates: pytest suites for the PHI tokenizer and APIM policy schema; Trivy scans of containers and Terraform IaC for vulnerabilities and misconfigurations; terraform validate and terraform plan; and a Docker build pushed to Azure Container Registry. All four must pass before Stage 3 begins.

Stage 4 deploys to a staging environment mirroring production, runs a load test at 1,000 alerts per minute, and performs an automated PHI boundary audit. A GitHub environment protection rule requires explicit approval from a named security engineer before Stage 5. Stage 5 then performs a zero-downtime rolling update via Helm, switches the Container App revision, and runs smoke tests that revert to the last known good revision on failure.

Figure 4: CI/CD pipeline — GitHub Actions trigger → parallel quality gates → dev → staging with manual approval → production blue/green with auto-rollback

Known Tradeoffs and Engineering Constraints

  • Model regression surfaces as friction before it surfaces as harm

Fine-tuning Mistral 7B does not degrade its instruction-following, so a model that has regressed against a newly introduced attack type still emits fluent, confident-looking classifications rather than obvious errors. The evaluation gate and the confidence threshold expose that degradation as a deepening human-review queue before any incident is missed.

  • Spot preemption is handled by Azure ML, not application code

Azure ML manages the training environment and resumes from the last checkpoint when spot nodes are reclaimed, so engineers can write training code without preemption handling. Schedule the job window at roughly twice the expected training time to leave room for Azure ML's retries after preemption.

  • The APIM Developer SKU carries no SLA

During a gateway outage only the AI layer stops; Wazuh continues alerting and queuing incidents. Organizations that need 99.95% availability should run the Standard SKU at $280 per unit per month. Multi-region active/active deployment requires the Premium SKU, which sits outside the SMB segment this architecture targets.

  • Presidio needs custom recognizers for full coverage

Presidio detects most HIPAA Safe Harbor identifiers out of the box but misses others, notably device serial numbers and organization-specific identifier types. Regulated organizations must add custom recognizers for these and conduct a de-identification risk assessment per 45 CFR §164.514(b).

Conclusion

Security operations centers remain one of the last corners of software delivery untouched by the MLOps toolchain that has transformed how software is built and shipped. The obstacle has never been technical readiness; the models, cloud infrastructure, and open source libraries have been capable for some time. The obstacle has been wiring log ingestion, model inference, and controls together in a way that makes sense for a small team without a full security operation.

This architecture closes that gap for the segment hit hardest by the analyst shortage: small to mid-sized companies in regulated sectors. A 1–2 person IT team deploying the platform described here gains coverage that would traditionally require five to eight dedicated analysts, at an infrastructure cost well below the monthly salary of a single analyst, with the PHI boundary enforced at every layer of the stack.
