2026-03-03

The illusion of control in AI-assisted engineering

Dashboards are green. Reviews complete on time. Audits pass. And the organization is slowly losing track of what its own systems actually do.

I keep running into this in teams that use AI-assisted engineering a lot. The controls are all still there. They're just not controlling what anyone thinks they're controlling.

The distinction nobody separates

When teams adopt AI-assisted development, the first question management asks is: Is a human still reviewing the output? Yes. Code review gates are there. Architecture sign-offs happen. Compliance checklists get filled in.

But the word "oversight" covers two different things. One is operational control: can you approve, reject, or modify the output? The other is epistemic control: do you actually understand what was built, why it works this way, and what it assumes?

AI-assisted workflows keep the first and slowly destroy the second. Every governance framework I've seen was designed when these two things were inseparable. The engineer who wrote a module was the same person who reviewed it and signed off on it. They understood it because they built it. That coupling is broken now, and most orgs haven't noticed.

What governance theater actually looks like

I watched this happen on a project last year. A team was building an integration layer between an enterprise platform and a third-party data provider. AI tooling generated the service contracts, transformation logic, and error-handling paths. Engineers reviewed everything. And here's what surprised me: the code review was thorough. More thorough than most reviews I've seen. Line-by-line, comments on the PR, long discussions about edge cases. Architecture board signed off.

Six months later, a cascading failure traced back to a silent retry pattern in the error-handling logic that misbehaved under load. The postmortem asked who understood the design intent behind that module. Nobody. The AI generated it. The engineer reviewed it carefully, caught formatting issues and a null-check bug, but never asked why the retry logic was structured that way because the review was focused on "is this correct," not "what is this assuming."

The review wasn't sloppy. That's what made it disturbing. The process did exactly what it was designed to do. It just wasn't designed for code that arrived without anyone having thought through the design.
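To make the failure mode concrete, here is a hypothetical sketch (not the actual code from that incident) of what a "silent retry" wrapper often looks like. Every line passes review, but the wrapper encodes an unstated assumption that failures are transient and independent, so under sustained overload each failing request is multiplied by the retry count:

```python
import time

def call_with_retry(fn, attempts=3):
    """Retry wrapper similar in shape to generated error-handling code.

    Reviewable line by line, yet it hides two design decisions nobody
    made deliberately: failures are swallowed silently, and retries
    fire immediately with no backoff or jitter. Under load, a slowdown
    becomes a retry storm.
    """
    last_exc = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as exc:  # silent: nothing logged, nothing surfaced
            last_exc = exc
            time.sleep(0)  # no backoff, no jitter
    raise last_exc

# Under saturation, every failing request triples downstream traffic:
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    raise TimeoutError("downstream saturated")

try:
    call_with_retry(flaky)
except TimeoutError:
    pass  # 1 logical request produced calls["n"] == 3 downstream calls
```

A reviewer asking "is this correct" approves this. A reviewer asking "what is this assuming" catches it.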

The compliance version is worse. Imagine a software company under financial regulation gets asked during an external audit to explain the reasoning behind an algorithmic approach touching customer data classification. The honest answer? It was AI-generated and the team validated the outputs without tracing the reasoning. No compliance framework accepts that. You either make up a rationale after the fact or admit a gap the auditors weren't looking for.

How responsibility diffuses

In a normal engineering workflow, you can point at someone and say: you designed this, you understood it, you signed off. Architecture review boards exist partly to create that paper trail. When something breaks, you know who to ask.

AI-assisted workflows put something in the authorship chain that can't be asked, can't explain itself, and can't take responsibility when a regulator shows up. Responsibility doesn't move to the AI. It just gets spread thin across everyone who touched the output. Everyone reviewed it. Nobody wrote it. When an incident review or an auditor asks "who understood this design," nobody answers.

You don't notice this in any one sprint. But after a year of it, an organization ends up with a growing pile of production systems where nobody recorded the design intent, nobody captured the assumptions, and the engineers maintaining the code can't explain why a service boundary is where it is or why the retry logic works the way it does. The code passes tests. The architecture diagrams exist. The understanding doesn't.

The slower risks

Leadership conversations about AI in engineering tend to focus on the obvious stuff: hallucinated code, security vulnerabilities, IP leakage. Those are real. The risks that worry me more take longer to show up.

One is architectural drift. When dozens of teams across an organization generate code through AI without shared context, each team's output embeds slightly different assumptions about data models, error handling, and service contracts. Service A assumes retries are idempotent. Service B doesn't. Both pass their own tests. The inconsistency only surfaces when they're under load together, and by then the damage spreads fast.
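A minimal, hypothetical sketch of that mismatch (the service names and the payment example are invented for illustration): Service B's handler was generated without deduplication, Service A's client was generated with retry-on-timeout. Each is correct against its own tests; composed, they double-apply the effect:

```python
# Service B: a generated handler that assumes every request is unique.
ledger = []

def service_b_charge(payment_id, amount):
    # No idempotency key, no dedup -- the handler's implicit assumption.
    ledger.append((payment_id, amount))
    return {"status": "ok"}

# Service A: a generated client that assumes the call is safe to repeat
# after an ambiguous failure (e.g. a timeout after the request landed).
def service_a_pay(payment_id, amount):
    service_b_charge(payment_id, amount)   # succeeds downstream...
    # ...but the response times out, so the retry fires anyway:
    service_b_charge(payment_id, amount)

service_a_pay("p-1", 100)
# ledger now holds the same charge twice. Both services' own test
# suites are green; only the composition is wrong.
```

Neither team's review could have caught this, because the contradicting assumption lived in the other team's generated code.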

Another: AI-generated systems are unusually hard to change later. Normal technical debt is annoying but manageable because somebody on the team once understood the code. When a system was generated and reviewed but not really authored, the next team inherits a black box. They can't refactor confidently because nobody can tell them what the original constraints were. So they wrap it, route around it, or just don't touch it. The codebase hardens in a way that's different from normal decay.

Regulators are catching up, too. In financial services and telecom especially, the questions about AI-assisted code generation are getting specific. The organizations I'd worry about aren't the ones using AI the most. They're the ones that adopted it without changing their traceability and governance to match.

Reframing the question

The question isn't whether humans are in the loop. They are. The question is whether the human hitting "approve" on a PR could explain the design to an auditor, rebuild the module if the original was lost, or predict how it fails under conditions the tests don't cover.

Reviewing code for correctness and understanding a system's architecture are different activities. Review processes were designed when they were the same activity. AI-assisted workflows have split them apart, and no one updated the process to compensate.

Some of this is fixable. Not every AI-generated module needs deep human comprehension. Boilerplate, scaffolding, config, sure. But integration contracts, error-handling strategies, anything that encodes assumptions about how other systems behave? Someone needs to understand those, and "reviewed the PR" isn't the same as understanding them. Architectural reviews need to start asking "why is this designed this way" instead of just "does this look right." Traceability for AI-generated code needs to capture the assumptions, not just the output.
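One hedged sketch of what "capture the assumptions, not just the output" could mean in practice. This is an invented format, not an established standard: a small record attached to each AI-generated module, filled in at sign-off time.

```python
from dataclasses import dataclass, field

@dataclass
class DesignIntentRecord:
    """Hypothetical traceability record for an AI-generated module.

    Captures what a PR approval alone does not: who can explain the
    design, and which assumptions the code encodes.
    """
    module: str
    generated_by: str                      # tool, not a person
    reviewed_by: str                       # who checked correctness
    design_owner: str                      # who can explain *why*
    assumptions: list = field(default_factory=list)
    failure_modes: list = field(default_factory=list)

record = DesignIntentRecord(
    module="integration/error_handling.py",
    generated_by="code assistant (model unspecified)",
    reviewed_by="engineer who approved the PR",
    design_owner="",  # left empty, this is exactly the gap a postmortem finds
    assumptions=[
        "downstream retries are idempotent",
        "timeouts are transient, not load-related",
    ],
    failure_modes=["retry amplification under sustained overload"],
)
```

The value isn't the data structure; it's that an empty `design_owner` field is visible at sign-off instead of being discovered during an incident.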

If leadership teams keep the review ceremonies running while the understanding underneath thins out, they're not managing risk. They're performing risk management. And that works until someone actually tests it, which in regulated industries tends to happen during an incident or an audit, when the consequences are already serious.

Questions worth sitting with

Can your organization tell the difference between systems your teams approved and systems they actually understand? Does anyone track that? I've never seen it tracked.

If a regulator or an acquirer asked your team to walk through the design intent behind a critical integration, who would take the call? Would they be explaining what they know, or reverse-engineering it on the spot?

Have your sign-off processes changed at all since AI adoption started? Every code review approval and architecture sign-off carries an implicit claim: "I understood this." If that claim has quietly become "I checked this," the paperwork is still flowing but the assurance behind it isn't there anymore.

I don't have good answers to any of this. I'm working through it myself.

To cite this article:

@article{Saf2026The,
    author  = {Krystian Safjan},
    title   = {The illusion of control in AI-assisted engineering},
    journal = {Krystian Safjan's Blog},
    year    = {2026},
}