March 2026

The Missing Conversation

For months I have been watching AI coding tools take center stage in the software engineering conversation — simultaneously celebrated as an existential threat and an exponential growth multiplier. More software, more capabilities, more speed. Millions of lines of code generated in minutes. Fleets of agentic coding machines orchestrated by lead engineers who declare, with genuine conviction, that we are witnessing a paradigm shift in how software is built.

They are not entirely wrong. The productivity numbers are real. The velocity is real. What is missing from the conversation is everything else.

Who succeeds the lead engineers when they are gone — and how will those future lead engineers develop the judgment required to maintain the integrity of what the agents produce, when that judgment has always been learned through hard experience? When an agentic tool decides to replatform a core banking system from COBOL to Python, passes an extensive test suite with every pipeline light green, and declares victory — what has been silently lost will not announce itself. Not immediately. The test suite cannot test for what it does not know to look for. Transaction semantics built over forty years of disciplined engineering do not appear in a requirements document. They live in the depths of the code running systems today and in the people who learned them the hard way.

Here is where the ghost in the machine manifests. Not the AI. The vanishing knowledge the AI was never given — and was never designed to account for.

How We Got Here

Looking back — not to glorify the past, but to understand how we arrived here — the degradation has been sustained enough to feel normal at every step. In the name of agility and minimum viable everything, the industry collectively chose to prioritize velocity over robustness, release cadence over resilience, and the functional illusion of this sprint's feature list over long-term integrity.

Software versions that once required zero regressions to ship, and exhaustive documentation of known issues so customers could make informed decisions about the risks they were accepting, quietly became the exception rather than the rule. The maniacal focus on partially developed new capability in the current release meant the foundations that made products valuable in the first place stopped evolving, while continuing to carry the full weight of everything customers cared about and depended on. The evidence is not anecdotal. A landmark empirical study of Firefox's switch to rapid release cycles — published in Empirical Software Engineering — documented what practitioners already suspected: shorter release cycles caused bugs to appear significantly earlier in runtime execution, a pattern the researchers termed the "crash-faster phenomenon." A companion study found that while issues were addressed faster on paper, they took a median of fifty days longer to actually integrate — complexity masked as velocity. This is the code that filled the repositories. This is what the models were trained on.

The engineers who grew up in this environment are not less intelligent. They are less constrained — and constraint, it turns out, was the curriculum. The forcing function that built deep intuition about failure modes, interface contracts, and the long-term consequences of architectural decisions was relaxed, and nothing deliberate was put in its place.

What filled the gap left by eroding engineering discipline was not better methodology, frameworks or coded knowledge. It was tribal knowledge — expertise held in individuals rather than encoded in process, undocumented by design or neglect, invisible to anyone who didn't already possess it. In technology companies, that knowledge at least lives among engineers who can sometimes articulate it. In the broader corporate world, it lives somewhere far more precarious.

In the vast institutional landscape where software exists not as a product but as operational infrastructure (the systems that run finance, compliance, supply chains, and core business processes in banks, insurance companies, hospital chains, and government institutions), engineering discipline was never deeply rooted to begin with. What took its place, and what has kept these systems functioning for decades, is the many Marys from accounting. Mary knows that the month-end reconciliation process requires the batch jobs to run in a specific sequence that is documented nowhere. Mary knows that the field labeled "customer reference" in the legacy system actually carries a different meaning for transactions originated before 2011. Mary knows what the system does, not because she was trained on it, but because she has absorbed it over fifteen years of daily proximity.

Mary is not a workaround. Mary is the architecture.

And Mary is retiring.
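Mary's "customer reference" example can be sketched in code. This is a hedged illustration, not the real system: the field name, the 2011 cutoff, and the branch-code format are all invented here to show how a semantic split can be invisible to any test suite written against recent data.

```python
from datetime import date

# Hypothetical cutoff and formats, invented for this sketch: nothing
# like this is recorded in the real system's schema or documentation.
CUTOFF = date(2011, 1, 1)

def resolve_customer(reference: str, originated: date) -> str:
    """Return the customer ID carried by the "customer reference" field."""
    if originated < CUTOFF:
        # Pre-2011 rows prefix the value with a three-character branch
        # code that must be stripped before matching to a customer.
        return reference[3:]
    return reference

# A replatforming that treats every reference uniformly passes every
# test written against post-2011 data and silently corrupts older rows.
print(resolve_customer("BR1445872", date(2009, 6, 1)))  # 445872
print(resolve_customer("445872", date(2015, 6, 1)))     # 445872
```

Nothing in the schema distinguishes the two populations; only Mary does.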

Speed Is Not a Paradigm

The most concerning element of this moment is not the technology itself. It is how shallow the analysis of what the technology is actually doing remains — even in the most prestigious publications, even from the voices with the credibility to say difficult things clearly.

They describe parts of the problem. The narrowing talent pipeline. The security vulnerabilities in generated code. The risks of vibe coding. And then they stop just short of where the argument leads — whether from a genuine blind spot or a reluctance to confront an economic machine destined to transform the world as we know it.

So let me be direct about what is and is not happening.

Generative AI applied to software development is not a paradigm shift. What we have is an exponential accelerator applied to a discipline that was already losing its foundations. The tools produce code at speed and scale while remaining unaware of the semantic obligations that code must fulfill.

The Communications of the ACM published an example this week that says it better than any abstraction can. An AI agent, given a crashing program, identified a race condition — and fixed it by inserting a delay. The tests passed. The agent declared success. The race condition was still there, masked rather than resolved. Only an engineer who understood synchronization protocols could recognize the difference. An early-career developer, raised in the environment we just described, most likely would have shipped it. Anthropic's own 2026 controlled study confirms what that scenario implies: developers who used AI assistance scored 17% lower on debugging and code comprehension tests than those who coded by hand — nearly two letter grades. The largest gap was specifically on debugging, the precise skill needed to catch what the agent got wrong.
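The masking pattern in that example is easy to reproduce in miniature. The sketch below is generic, not the published CACM case: inserting a sleep() into the unsafe version would only make the lost updates rare enough for tests to pass, while the lock removes the race by construction.

```python
import threading

def increment_unsafe(counter, n):
    # Read-modify-write without synchronization: two threads can read
    # the same value and one update is lost. A sleep() inserted here
    # would make the bad interleaving rarer, not impossible -- the
    # "fix by delay" an agent might ship.
    for _ in range(n):
        counter["value"] += 1

def increment_safe(counter, lock, n):
    # The actual fix: make the read-modify-write atomic under a lock.
    for _ in range(n):
        with lock:
            counter["value"] += 1

def run(worker, *args, threads=4):
    ts = [threading.Thread(target=worker, args=args) for _ in range(threads)]
    for t in ts: t.start()
    for t in ts: t.join()

lock = threading.Lock()
safe = {"value": 0}
run(increment_safe, safe, lock, 50_000)
print(safe["value"])  # always 200000: the invariant holds by construction
```

A test that only checks the final count on a lightly loaded machine cannot tell these two versions apart; an engineer who understands the memory model can.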

That is not a paradigm shift. That is the same trajectory, just much faster.

The Ghost Is Already Codified

Over decades, the engineering substance of software construction has itself become a ghost of what it once was. Not through a single decision or a dramatic collapse, but through the slow, compounding effect of an industry that learned it could trade rigor for velocity without immediate consequence. Formalism gave way to market-focused narratives. Minimum viable replaced correct and complete. The legal frameworks that removed liability from software failures — protections no other engineering discipline enjoys — nurtured an industry comfortable building on foundations it had quietly stopped maintaining.

And then we built AI on top of it.

The large language models that now generate code at industrial scale were trained on the world's largest repositories of human-written software. That software is the artifact of everything described in this article — the incomplete practices, the masked failures, the tribal knowledge never documented, the architectural shortcuts normalized by a thousand sprint reviews, and the semantics that cannot be inferred from looking at code alone. The ghost of an engineering discipline, the shadow of what rigorous software construction once required, is now codified. It is being reproduced at a speed no human team could match, by tools that have no way of knowing what was lost — because what was lost was never written down.

The 2025 DORA Report — Google's annual study of software delivery performance drawn from nearly 5,000 technology professionals — confirms the amplification dynamic directly: AI adoption correlates with higher instability, more change failures, increased rework, and longer cycle times to resolve issues. High-performing teams use AI to accelerate what already works. Struggling teams find that AI magnifies existing dysfunction. GitClear's longitudinal analysis of 211 million lines of code tells the same story from a different angle: as AI coding assistance rose, refactoring collapsed from 25% of changed lines in 2021 to under 10% in 2024, while copy-pasted code blocks grew by nearly 50%. The cleanup work — the discipline of consolidating, improving, and maintaining — stopped happening precisely as AI took over the writing. The ghost was not just inherited. It was actively being re-encoded.

This is not a paradigm shift. It is an inheritance. And what has been inherited is precisely the problem.

Why Governance Alone Cannot Close This Gap

We will be tempted to reach for regulatory frameworks as the answer. Establish sufficient guardrails, impose enough accountability on the organizations building with and deploying these tools, and we create at least the conditions for managed risk. The DO-178C standard for airborne software exists as proof that when consequence is severe enough, the industry can be made to engineer with discipline. It is a legitimate argument.

It is not sufficient.

Let's take the U.S. Treasury's CRI framework, developed this year by 108 financial institutions in collaboration with NIST. It contains 230 control objectives covering governance, model risk, bias, explainability, third-party risk, and the full AI lifecycle. It is a serious document produced by serious people.

Not one of its 230 control objectives addresses the fact that the agent doesn't know what CICS syncpoint semantics are. Or that IMS segment hierarchies carry implicit ordering assumptions — assumptions that downstream processes depend on but that no test suite thinks to verify, and that no agent will ever flag as missing.
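The ordering point deserves a concrete illustration. The hierarchy and names below are invented, loosely in the spirit of IMS segments: two flattenings produce the same set of records, so a set-based test suite stays green, while the implicit ordering contract that downstream sequential consumers depend on is broken.

```python
# Hypothetical parent/child hierarchy, invented for this sketch.
# Downstream, records are consumed in order: children after their parent.
hierarchy = {"ACCT01": ["DEP-1", "DEP-2"], "ACCT02": ["DEP-3"]}

def flatten_ordered(h):
    # Preserves the implicit contract: each child follows its parent.
    out = []
    for parent in sorted(h):
        out.append(parent)
        out.extend(h[parent])
    return out

def flatten_rewritten(h):
    # A plausible "modernized" version: all parents first, children after.
    return sorted(h) + [c for cs in h.values() for c in cs]

a, b = flatten_ordered(hierarchy), flatten_rewritten(hierarchy)
print(set(a) == set(b))   # True: a set-based test suite passes
print(a == b)             # False: the ordering contract is broken
```

No record is missing and no value is wrong, which is exactly why nothing flags it.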

Formal verification doesn't solve ignorance of what needs to be verified. Governance frameworks cannot mandate the understanding that was never developed. You cannot specify what you don't understand — and the people who understood it are leaving, taking with them knowledge that was never made explicit because the industry decided, a long time ago, that explicit took too long and delivered neither market impact nor instantaneous revenue growth.

The Window

Where this leads is not entirely clear, but the trajectory is visible. Code generated by tools that mask rather than resolve, approved by engineers who lack the depth to know the difference, will find its way — slowly, quietly, through a hundred reasonable decisions — into systems where the integrity of what is being processed has real consequences for real people.

When that happens, and the historical pattern of every previous engineering shortcut suggests it will, nobody will be able to identify where the integrity broke or why. The audit trail will be clean. The tests will have passed. The ghost will have declared success. And the response, almost certainly, will be to ask the ghost to fix what the ghost built — without understanding the original failure, without the ability to verify the new solution doesn't carry the same hidden assumption in a different form.

The scenario is not hypothetical at the edges of the industry. Fully agentic payment reconciliation systems are being tested today, where autonomous agents interact with upstream and downstream processes without human review of each decision. In an environment like that, operating over real transactions, an agent that masks a reconciliation failure to maintain system coherence is not exhibiting a bug. It is exhibiting goal-directed behavior. The difference between those two things may not be visible until the discrepancy compounds beyond containment.

We do not need to speculate about what systemic failure looks like. In July 2024, a single faulty content update from CrowdStrike — a cybersecurity company — took down 8.5 million Windows systems simultaneously, grounding flights, halting hospital operations, and disrupting financial services globally. Estimated impact: $5.4 billion. The cause was not a breach, not a cyberattack. It was a routine update that bypassed the validation that should have caught it. In February 2025, Barclays customers were locked out of their accounts for nearly twenty-four hours following an IT failure during a critical payment processing period. These are not edge cases. They are the predictable consequence of building on foundations that were never robust enough to carry the weight now placed on them. AI does not change that trajectory. It accelerates it.

Some researchers argue the only real solution is to make semantics explicit and machine-checkable — formal contracts, verified specifications, semantic metamodels that constrain the space in which AI operates rather than leaving it to improvise. They are right. They are also describing a level of formal specification that requires precisely the depth and breadth of engineering knowledge that the industry spent two decades making optional.
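What machine-checkable semantics can mean in practice is sometimes as modest as an executable invariant. The rule and names below are illustrative, not a proposed standard: the reconciliation knowledge that lives in Mary's head, written down in a form that a tool, or an agent, is forced to confront rather than work around.

```python
from decimal import Decimal

def check_reconciliation(ledger_total: Decimal, statement_total: Decimal,
                         pending: Decimal) -> None:
    # Explicit contract: the ledger may differ from the bank statement
    # only by the amount still pending settlement. An agent that "fixes"
    # a failure by adjusting totals now violates a named, checkable rule
    # instead of silently restoring green pipelines.
    if ledger_total - statement_total != pending:
        raise AssertionError(
            f"reconciliation broken: ledger={ledger_total} "
            f"statement={statement_total} pending={pending}"
        )

check_reconciliation(Decimal("1050.00"), Decimal("1000.00"), Decimal("50.00"))
print("reconciliation invariant holds")
```

Writing such contracts presupposes someone who still knows what the invariant is, which is the author's point: the specification problem is a knowledge problem first.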

The intent of this article is not to declare defeat. It is to name the problem clearly enough that the people who can act on it recognize both the urgency and the opportunity. Because there is a window — and it is real, and it is closing.

The engineers who learned the hard way are still here. The people who debugged race conditions without AI assistance, who understood why CICS syncpoint semantics existed before they were abstracted away, who remember what zero-regression shipping actually required and why — they are still working, still reachable, still capable of transmitting what they know. That knowledge has not yet fully left the building. But it will. And when it does, the ghost will be all that remains.

The work that needs to happen is not mysterious. It requires codifying the hard-won lessons of rigorous software engineering into forms that can be taught, transmitted, and ultimately embedded into the behavior of the tools themselves. The SEI built the frameworks. INCOSE refined the methods. Decades of engineering discipline in aerospace, medical devices, and critical infrastructure proved that consequence-driven rigor produces systems that can be trusted. That body of knowledge needs to find its way into the coding agents that are now being handed the keys to civilization-scale infrastructure.

This means mentoring the next generation on the fundamentals — not as nostalgia, but as survival. It means working with the institutions and researchers building these tools to extend their capabilities beyond syntax and pattern into semantics and obligation. It means demanding that the governance frameworks being written today go deeper than control objectives and reach the engineering layer underneath.

The window to do this is now. Not in five years when the few engineers who remember are gone. Not after the first systemic failure makes the argument undeniable. Now — while the knowledge still exists in people who can share it, while the tools are young enough that their defaults are not yet permanent, while the regulatory conversation is live enough that the right arguments can still shape it.

My contribution starts with this article. I hope others will join.

The ghost was already fading. AI just turns out the lights.

This article was published on LinkedIn on March 11, 2026 [Link]

Key References

1. Raiola, R. (2026). Redefining the Software Engineering Profession for AI. Communications of the ACM. The source of the race condition example and the 'narrowing pyramid hypothesis' — the most credible academic framing of the talent pipeline problem published to date. The Harvard data on 13% employment drop among 22–25 year olds in AI-exposed roles is cited here.

2. CRI Financial Services AI Risk Management Framework (2026). U.S. Department of the Treasury / FSSCC / Cyber Risk Institute. The 230 control objectives document referenced in the governance section. Developed by 108 financial institutions in collaboration with NIST. Available at fsscc.org.

3. Karpathy, A. (December 2025). X thread on AI-driven development. 14 million views. The most honest practitioner account of what it actually feels like to lose conceptual footing in the new paradigm — describing the need to build mental models for 'fundamentally stochastic, fallible, unintelligible and changing entities suddenly intermingled with what used to be good old fashioned engineering.'

4. Google DORA (2025). State of AI-Assisted Software Development. Based on nearly 5,000 technology professionals globally. Central finding: AI adoption correlates with higher instability, more change failures, and increased rework. AI amplifies existing organizational dynamics — accelerating high-performing teams and magnifying the dysfunction of struggling ones. Primary source: cloud.google.com/resources/content/2025-dora-ai-assisted-software-development-report

5. Anthropic (2026). How AI Assistance Impacts the Formation of Coding Skills. Randomized controlled trial with software developers. Developers using AI scored 17% lower on debugging and comprehension tests than those coding by hand — nearly two letter grades. The largest gap was on debugging: precisely the skill needed to catch what AI gets wrong. anthropic.com/research/AI-assistance-coding-skills

6. GitClear (2025). AI Copilot Code Quality: 2025 Data Suggests 4x Growth in Code Clones. Longitudinal analysis of 211 million lines of code (2020–2024). Refactoring declined from 25% of changed lines in 2021 to under 10% in 2024. Copy-pasted code blocks grew by nearly 50%. The discipline work stopped as AI took over the writing. gitclear.com/ai_assistant_code_quality_2025_research

7. Khomh, F. et al. (2014). Understanding the Impact of Rapid Releases on Software Quality: The Case of Firefox. Empirical Software Engineering. Documents the "crash-faster phenomenon": shorter release cycles caused bugs to surface significantly earlier in runtime. Companion MSR 2016 study shows addressed issues took a median of 50 days longer to actually integrate under rapid release. The degraded codebase this produced is part of what LLMs were trained on. swat.polymtl.ca/~foutsekh/docs/EMSE-published-version.pdf

8. CrowdStrike / Cloud Security Alliance (2024–2025); The Guardian (2025). Real-world infrastructure failures. CrowdStrike outage, July 19, 2024: 8.5 million Windows systems taken down by a single faulty content update, $5.4B estimated impact across financial services, aviation, and healthcare. Analysis: cloudsecurityalliance.org/blog/2025/07/03/what-we-can-learn-from-the-2024-crowdstrike-outage. Barclays IT failure locking customers out of accounts for nearly 24 hours, February 2025 (The Guardian). Both illustrate the consequence pattern: systemic brittleness surfacing at scale, with clean audit trails and no single point of blame.

Pablo Irassar is an Executive Technical Leader at IBM, where he currently leads technology engagement with RBC. With over three decades of experience at the intersection of enterprise technology and financial services, his work focuses on the evolution of engineering discipline and the architectural implications of AI. The views expressed in this article are his own and do not necessarily represent IBM's positions or strategies.