Phase 3 (Part 2): Watching Governance Catch Its Author
Part 1 covered the wiring: Log Analytics workspace, managed identity permissions, the expanded initiative, the version bump. Useful work, but inert: none of it had been exercised.
This post is about exercising it. Spinning up something deliberately broken, watching the system find it, and letting the remediation engine do its job. It's also about the thing I didn't expect: the system catching a mistake I hadn't noticed I'd made.
The Test Resource
I created a storage account called governancelabstorage1 in rg-governance-lab: tagged correctly so it wouldn't fail the Phase 1 tag policy, deployed in the right region so it wouldn't fail Allowed locations, but with no diagnostic settings configured. That made it a clean target for the DeployIfNotExists policy: compliant on everything except the one thing I wanted to test.
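For reference, a roughly equivalent CLI creation of the test resource might look like the sketch below. The region and tag key/value are assumptions, placeholders for whatever your Allowed locations assignment and Phase 1 tag policy actually require:

```shell
# Recreate the deliberately incomplete test resource from the CLI.
RG="rg-governance-lab"
SA="governancelabstorage1"
LOCATION="uksouth"            # assumption: an allowed region in the lab

# Guarded so the snippet is harmless outside a logged-in Azure session.
if command -v az >/dev/null 2>&1; then
  az storage account create \
    --name "$SA" \
    --resource-group "$RG" \
    --location "$LOCATION" \
    --sku Standard_LRS \
    --tags environment=lab    # assumption: a tag the Phase 1 policy requires
  # Deliberately no diagnostic settings configured - that is the point.
fi
```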
Or so I thought.
Forcing the Scan
Azure Policy evaluates new resources automatically, but the dashboard takes its time to update. I forced it from Cloud Shell:
az policy state trigger-scan --resource-group rg-governance-lab

A few minutes later, compliance dropped from 100% to 75%. Two policies were flagging the test resource as non-compliant. One of those was a surprise.
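If you'd rather not wait on the Portal dashboard at all, the same shell can pull the numbers directly. This is a sketch; the `--query` path assumes the standard shape of `az policy state summarize` output:

```shell
RG="rg-governance-lab"

if command -v az >/dev/null 2>&1; then
  # Kick off an on-demand evaluation, then summarise the results.
  az policy state trigger-scan --resource-group "$RG"
  az policy state summarize --resource-group "$RG" \
    --query "results.{nonCompliantResources: nonCompliantResources, nonCompliantPolicies: nonCompliantPolicies}"
fi
```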
The Expected Failure
The diagnostic settings policy fired exactly as designed. The new storage account had no diagnostic settings, the policy noticed, marked it non-compliant, and waited for remediation.
I created a remediation task scoped to the resource group. The managed identity, which I'd granted Log Analytics Contributor and Monitoring Contributor at the management group level back in Part 1, used those roles to deploy a diagnostic setting named storageAccountsDiagnosticsLogsToWorkspace, wired to the law-governance-lab workspace exactly as parameterised in the initiative.
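The same remediation task can be created from the CLI. The assignment name and definition reference ID below are hypothetical placeholders for the initiative assignment from Part 1; substitute your own:

```shell
RG="rg-governance-lab"
ASSIGNMENT="governance-lab-initiative"    # assumption: your initiative assignment name
REF_ID="storageDiagnosticsToWorkspace"    # assumption: the policy's reference ID inside the initiative

if command -v az >/dev/null 2>&1; then
  # Scope the remediation to the resource group, targeting the one
  # DeployIfNotExists policy inside the initiative.
  az policy remediation create \
    --name "remediate-storage-diagnostics" \
    --resource-group "$RG" \
    --policy-assignment "$ASSIGNMENT" \
    --definition-reference-id "$REF_ID"
fi
```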
The full chain held without manual intervention. Policy evaluated, marked the gap, remediation kicked off, managed identity authenticated, deployment landed. Self-healing, end to end.
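One way to confirm the chain actually landed is to read the diagnostic setting back off the storage account, resolving the full resource ID first rather than hand-writing it:

```shell
RG="rg-governance-lab"
SA="governancelabstorage1"

if command -v az >/dev/null 2>&1; then
  # Resolve the storage account's full resource ID, then read back the
  # diagnostic setting the remediation task should have deployed.
  SA_ID=$(az storage account show --name "$SA" --resource-group "$RG" --query id -o tsv)
  az monitor diagnostic-settings show \
    --resource "$SA_ID" \
    --name "storageAccountsDiagnosticsLogsToWorkspace"
fi
```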
The Unexpected Failure
The other non-compliance flag was the interesting one. My custom Block Public Blob Storage policy was reporting the test storage account as non-compliant, which shouldn't have been possible, because I'd been careful about the toggles during creation.
Except I hadn't. Looking at the storage account's Configuration page, Allow Blob anonymous access was Enabled. The Portal's "Advanced" tab during creation has two near-identical-sounding toggles, and I'd missed which one set the account-level property.
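The Portal fix is two clicks; the CLI equivalent is a single property update on the account-level setting the toggle controls:

```shell
RG="rg-governance-lab"
SA="governancelabstorage1"

if command -v az >/dev/null 2>&1; then
  # Flip the account-level property behind the Portal's
  # "Allow Blob anonymous access" toggle.
  az storage account update \
    --name "$SA" \
    --resource-group "$RG" \
    --allow-blob-public-access false
fi
```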
A two-click fix to disable it, another forced scan, and the policy flipped to compliant. But the more interesting part is what just happened:
The policy caught a misconfiguration I didn't know I'd made.
That's a different thing from "I deployed a deliberately bad resource and the policy correctly flagged it." It's the system doing the job I built it for, without me having to pay attention.
What I Learned
A few things crystallised here that the wiring work in Part 1 didn't surface.
Audit mode earns its place. The public-network-access policy I added in Audit is still showing the test resource as non-compliant, and that's correct. It's quietly accumulating evidence I'll use in Phase 5 to scope an Audit-then-Deny rollout properly. Cold-deploying that policy to Deny on day one would have been the wrong move: there's a real configuration in the lab that violates it, and I want to see it before I block it.
Portal UX is a governance hazard. Two toggles, similar names, defaults that surprised me. That's not a problem with me, or even with the Portal; it's the reason preventative controls exist. Humans miss things. Policies don't.
75% compliance is information, not a failure. I caught myself wanting to see 100% on the dashboard. But 75% accurately reflects reality: four policies clean, one policy correctly flagging a real audit-mode finding I haven't fixed yet. Chasing 100% by silencing the audit policy would defeat the entire purpose.
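The audit-mode evidence described above can be pulled on demand rather than read off the dashboard. This query sketch lists the current non-compliant resource/policy pairs in the lab resource group:

```shell
RG="rg-governance-lab"

if command -v az >/dev/null 2>&1; then
  # List every non-compliant resource/policy pair - the evidence that
  # will scope the Phase 5 Audit-then-Deny rollout.
  az policy state list --resource-group "$RG" \
    --filter "complianceState eq 'NonCompliant'" \
    --query "[].{policy: policyDefinitionName, resource: resourceId}" \
    -o table
fi
```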
What's Next
Phase 3 is closed. The full preventative-and-remedial pattern is wired, exercised, and proven against real misconfiguration.
Phase 4 is Policy as Code. Everything I've built in the Portal so far comes out as Bicep, gets committed to a Git repo, and gets deployed through GitHub Actions with OIDC federated authentication. Pull requests get a what-if preview of policy changes before they merge. Versioning becomes git history instead of a metadata field.
That's the phase where governance stops being a thing I build and starts being a thing other people can review, audit, and roll back. It's also the phase where the project earns its "production-ready" framing, because nothing built only in the Portal really is.
The system caught its author. Twice in one phase. That's not a flaw in the test; that's the test working.