Perspective · Apr 16, 2026

The Code Nobody Understands

Your marketing infrastructure runs on code nobody reviewed. AI has made that code easier to produce. It hasn't made it easier to understand. Here's what that costs, and what to do about it.

The "dark code" concept, and the core argument about AI-generated code that ships without comprehension, comes from Nate B Jones. The application to marketing infrastructure is ours. Watch the original video or read the Substack article.

Your marketing stack runs on code now

Marketing infrastructure is no longer config-and-clicks. Your GTM container has custom JavaScript tags. Your CRM automations are branching logic with API calls. Your consent orchestration is conditional code that determines what data gets collected, from whom, under which jurisdiction. Your AI-assisted content pipelines are prompt chains with evaluation logic. These are code — and they behave like code, which means they break like code.
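To make "conditional code" concrete, here is a minimal Python sketch of the kind of decision a consent layer makes before a tag fires. The region list, the category names, and the opt-in/opt-out split are illustrative assumptions. They are not any consent platform's API and not a statement of what a given law requires.

```python
from dataclasses import dataclass, field

# Hypothetical policy: treating these regions as opt-in is an assumption
# for illustration, not a statement of what any regulation requires.
OPT_IN_REGIONS = {"EU", "UK"}

@dataclass
class Visitor:
    region: str
    granted: set[str] = field(default_factory=set)  # categories the visitor accepted
    denied: set[str] = field(default_factory=set)   # categories the visitor rejected

def may_collect(category: str, visitor: Visitor) -> bool:
    """Decide whether a tag in `category` may fire for this visitor."""
    if category == "necessary":
        return True                           # strictly necessary: no consent check in this sketch
    if visitor.region in OPT_IN_REGIONS:
        return category in visitor.granted    # opt-in: no answer means no
    return category not in visitor.denied     # opt-out: no answer means yes

print(may_collect("analytics", Visitor("EU")))             # False
print(may_collect("analytics", Visitor("US")))             # True
print(may_collect("ads", Visitor("EU", granted={"ads"})))  # True
```

Each of the three branches encodes a business or legal judgment. If an AI generated this function and nobody checked the region list, it would still run without errors.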

When this code was written by a person who understood the business constraints — the attribution model, the compliance requirements, the consent architecture — the system reflected intent. The person who wrote the GTM tag knew why the trigger was scoped to a specific URL pattern. The person who built the CRM workflow knew what should happen when a deal moves backward.

When the code is AI-generated at speed, it reflects the prompt — which may not capture intent. The AI doesn't know your attribution model. It doesn't know your compliance environment. It produces code that works in the narrow sense of executing without errors. Whether it does the right thing is a question nobody asked.

What dark code actually is

Dark code is code that was generated by AI, passed automated checks, shipped to production — and was never understood by anyone. Not the AI that wrote it. Not the person who prompted it. Not the team that deployed it.

This is not buggy code. Buggy code fails visibly. This is not tech debt. Tech debt is a known trade-off. This is not spaghetti code. Spaghetti code was understood by someone, once — the person who wrote it made choices, even bad ones. Dark code was never comprehended by a human mind. It exists because the tooling made it trivially easy to produce and nobody had a process that required understanding it before shipping.

Two forces multiply it. First, AI-generated code is structurally harder to understand than human-written code — no human reasoning trail, no iterative build-up of logic that a reader can follow. Second, AI velocity discourages pausing to read the diff. When you can generate a working feature in thirty seconds, spending fifteen minutes reading and understanding the output feels like friction. The economics push toward shipping, not comprehending.

The marketing version of this is already familiar: a GTM container with 47 tags added by three people, none of whom understands all of them. A CRM with automation workflows built by a contractor who left, modified by an intern who improvised, and maintained by nobody. AI accelerates this pattern sharply. The container doesn't have 47 tags anymore. It has 147. And nobody read the code in any of them.

What makes this worse than traditional code sprawl: each person prompts the AI in a separate session. There's no shared context between sessions — no accumulated understanding of what was already built, what the architectural constraints are, or what the last person's code assumed. Three people across five sessions produce five isolated outputs with no holistic context layer connecting them. The AI doesn't remember. The humans don't coordinate. The infrastructure accumulates code that was never part of a coherent whole.

Why the obvious fixes don't work

The instinct is to solve dark code with more tooling. Three approaches come up repeatedly. Each addresses symptoms. None addresses whether anyone understood the code before it shipped.

"We have observability."

Observability tells you what broke. It does not tell you why the code exists, what it was supposed to do, or whether it's doing something subtly wrong that hasn't triggered an alert yet. A dashboard showing green metrics on a misconfigured consent tag is not observability — it's false confidence with a professional interface. Monitoring tells you when the system deviates from expected behavior. It cannot tell you whether the expected behavior was correct in the first place.

"We use AI to review AI-generated code."

This adds layers. When something fails, you troubleshoot through dark code reviewing dark code — an AI's assessment of code that neither it nor you wrote, evaluated against criteria that may themselves be AI-generated. The review is confident. The review may also be wrong. And you have no way to distinguish AI overconfidence from AI correctness without the domain knowledge that would have prevented the problem in the first place.

"AI will fix it when it breaks."

The companies building the AI tools themselves — the ones with the most sophisticated AI-assisted development workflows on the planet — require human comprehension before code ships. Amazon rebuilt their AI coding assistant around the principle that specs must precede code generation. If the organizations with the best AI can't trust AI to self-correct, the assumption that your marketing infrastructure can is not optimism. It's exposure.

The pattern across all three: each fix assumes the problem is operational. The problem is epistemic. Nobody knows what the code does. Better monitoring of code nobody understands gives you faster alerts about failures nobody can diagnose.

What actually works — three layers

The organizations shipping reliably with AI — including the ones building the AI — converge on three structural disciplines. None of them are tools. All of them are practices that require someone to understand the system before it ships.

Layer 1: Spec before code

Define what the system should do before building it. The spec becomes the evaluation criteria — you can test whether the code does what the spec says, not just whether it runs without errors. Amazon rebuilt Kiro — their AI coding assistant that turns natural-language prompts into structured requirements and task lists before generating a single line of code — around this principle after a December 2025 outage. This is what the Blueprint does in an Architect engagement: the spec is reviewed and approved, then the build executes against it.

In practice: A company asks an AI tool to "set up conversion tracking." Without a spec, the AI generates tags that fire on page loads. Six months later, conversions are double-counted because nobody specified which events count as conversions, which are duplicates, and which pages should be excluded. A one-page spec — defining conversion events, deduplication rules, and page scope — would have caught it before the first tag fired.
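A one-page spec can be small enough to express as data. The Python sketch below encodes the three decisions the example names: conversion events, deduplication, and page exclusions. It then uses the spec as the test. The field names, event names, and the `transaction_id` deduplication key are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ConversionSpec:
    conversion_events: frozenset[str]    # which events count as conversions
    dedup_key: str                       # field that identifies one real conversion
    excluded_prefixes: tuple[str, ...]   # URL path prefixes that never count

# Hypothetical one-page spec, agreed before any tag is built.
SPEC = ConversionSpec(
    conversion_events=frozenset({"purchase", "demo_request"}),
    dedup_key="transaction_id",
    excluded_prefixes=("/test/", "/internal/"),
)

def count_conversions(events: list[dict], spec: ConversionSpec) -> int:
    """Count conversions the way the spec defines them, not the way tags fire."""
    seen = set()
    for event in events:
        if event["name"] not in spec.conversion_events:
            continue                                       # not a conversion event
        if event["page"].startswith(spec.excluded_prefixes):
            continue                                       # excluded page
        seen.add((event["name"], event[spec.dedup_key]))   # duplicates collapse here
    return len(seen)

events = [
    {"name": "page_view", "page": "/checkout/done", "transaction_id": None},
    {"name": "purchase", "page": "/checkout/done", "transaction_id": "T1"},
    {"name": "purchase", "page": "/checkout/done", "transaction_id": "T1"},  # page reload
    {"name": "purchase", "page": "/test/checkout", "transaction_id": "T2"},
]

# The spec doubles as the evaluation criterion.
assert count_conversions(events, SPEC) == 1
```

A tag that fired on every purchase event would report three conversions for this stream. The spec says there is one.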

Layer 2: Self-describing systems

The system describes itself, so nobody has to track down a person to learn what it is. Three practices make that possible. Structural context: a manifest per module that states what it does, what it depends on, and what depends on it. Semantic context: behavioral contracts that define what happens when a component fails, what its retry semantics are, what data it expects, and what it produces. Eval-driven development: tests that verify both correctness and comprehension, asking not just "does it run?" but "does someone understand why it's built this way?"
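A structural manifest does not need special tooling. The Python sketch below shows one hypothetical shape. Each module states its purpose, what it depends on, and what uses it. Two small checks make the manifests useful: declared links must agree in both directions, and you can ask what sits downstream of a change. The module names and fields are invented for illustration.

```python
# Hypothetical manifests for three marketing modules.
MANIFESTS = {
    "consent_banner": {
        "purpose": "Records visitor consent choices per category and region.",
        "depends_on": [],
        "used_by": ["ga4_tags", "crm_sync"],
    },
    "ga4_tags": {
        "purpose": "Sends page and conversion events to GA4 when analytics consent is granted.",
        "depends_on": ["consent_banner"],
        "used_by": [],
    },
    "crm_sync": {
        "purpose": "Pushes form submissions to the CRM for consented visitors.",
        "depends_on": ["consent_banner"],
        "used_by": [],
    },
}

def inconsistencies(manifests):
    """List dependencies that are declared on one side but not the other."""
    problems = []
    for name, m in manifests.items():
        for dep in m["depends_on"]:
            if name not in manifests.get(dep, {}).get("used_by", []):
                problems.append(f"{name} depends on {dep}, but {dep} does not list it")
        for user in m["used_by"]:
            if name not in manifests.get(user, {}).get("depends_on", []):
                problems.append(f"{name} lists {user}, but {user} does not depend on it")
    return problems

def downstream(manifests, name):
    """Everything that could break if `name` changes (transitive users)."""
    seen, stack = set(), list(manifests[name]["used_by"])
    while stack:
        module = stack.pop()
        if module not in seen:
            seen.add(module)
            stack.extend(manifests.get(module, {}).get("used_by", []))
    return seen

print(inconsistencies(MANIFESTS))                        # []
print(sorted(downstream(MANIFESTS, "consent_banner")))   # ['crm_sync', 'ga4_tags']
```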

In practice: A CRM automation sends a winback email when a deal moves to "Lost." But nobody documented the behavioral contract: what happens when a deal moves to Lost and back to Active in the same day? Without a documented failure mode, the automation fires anyway — and the prospect who just re-engaged gets a "we're sorry to see you go" email. A behavioral contract would have specified: "If deal status changes twice within 24 hours, suppress the winback trigger."
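The contract in that example is precise enough to execute. Below is a minimal Python sketch of it. The sketch makes explicit one assumption the example leaves implicit: for the rule to prevent the email, the automation has to hold the winback and decide later. A message sent at the moment of the status change cannot be suppressed by what happens next. This sketch assumes a decision 24 hours after the move to "Lost". The function name and data shape are hypothetical.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)  # from the contract: "twice within 24 hours"

def should_send_winback(changes, now):
    """
    changes: chronological list of (timestamp, new_status) for one deal.
    now: evaluation time. This sketch assumes the automation holds the winback
         and evaluates it 24 hours after the deal moves to "Lost".
    """
    if not changes or changes[-1][1] != "Lost":
        return False                       # deal is no longer Lost
    recent = [status for ts, status in changes if now - ts <= WINDOW]
    return len(recent) < 2                 # two or more changes in the window: suppress

t0 = datetime(2026, 4, 1, 9, 0)
lost_then_active = [(t0, "Lost"), (t0 + timedelta(hours=5), "Active")]
just_lost = [(t0, "Lost")]

print(should_send_winback(lost_then_active, t0 + WINDOW))  # False
print(should_send_winback(just_lost, t0 + WINDOW))         # True
```

The count-based check also suppresses a deal that flaps from Lost to Active and back to Lost within the window. That is one reading of the contract's wording, and whoever owns the automation should confirm it.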

Layer 3: Comprehension gate

An AI-mediated filter that asks senior-engineer questions before code ships. Not "does it work?", but "can someone explain why it's built this way?" If the answer is no, the code doesn't ship — regardless of whether the tests pass. The gate doesn't replace human judgment. It enforces the practice of requiring it.
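The gate's decision rule is simple even if the questioning is not. This Python sketch models only that rule. A change ships when its tests pass and every required question has a written answer from a person. In the article's version an AI generates and asks the questions. Here they are a fixed, hypothetical list, and the `Change` and `gate` names are invented for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical fixed questions; an AI-mediated gate would tailor them per change.
REQUIRED_QUESTIONS = (
    "What does this change do, in business terms?",
    "Why is it built this way rather than a simpler way?",
    "What happens when its inputs change or it fails?",
)

@dataclass
class Change:
    name: str
    tests_passed: bool
    # question -> explanation written by the person responsible for the change
    answers: dict[str, str] = field(default_factory=dict)

def gate(change: Change) -> tuple[bool, list[str]]:
    """Return (may_ship, unanswered_questions). Passing tests alone is not enough."""
    unanswered = [q for q in REQUIRED_QUESTIONS if not change.answers.get(q, "").strip()]
    return change.tests_passed and not unanswered, unanswered

ok, missing = gate(Change("custom GA4 tag", tests_passed=True))
print(ok)  # False: tests pass, but nobody has explained the change
```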

In practice: A GTM container has a custom JavaScript tag that fires on every page. It was added by an AI assistant during a tracking sprint. The tag works — events appear in GA4. But nobody can explain what the regex in the tag is filtering. When the site redesign changes URL structure, the tag silently stops matching half the pages. A comprehension check would have flagged: "What does this regex match, and what happens when URL patterns change?"
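The failure in that example is easy to reproduce. The pattern and URLs below are hypothetical. They show how an anchored regex that is correct for one URL structure silently matches fewer pages after a redesign. Listing the unmatched pages turns "What does this regex match?" into something a reviewer can check.

```python
import re

# Hypothetical filter of the kind a custom tag might contain: it only matches
# product pages directly under the site root, e.g. /products/blue-widget
PRODUCT_PAGE = re.compile(r"^/products/[a-z0-9-]+/?$")

def match_rate(paths):
    """Fraction of paths on which the tag's filter would fire."""
    return sum(1 for p in paths if PRODUCT_PAGE.match(p)) / len(paths)

def unmatched(paths):
    """Paths the filter misses: the silent part of the failure, made visible."""
    return [p for p in paths if not PRODUCT_PAGE.match(p)]

before = ["/products/blue-widget", "/products/red-widget",
          "/products/green-widget", "/products/gift-card"]
# After a redesign that moves half the catalog under a locale prefix
after = ["/products/blue-widget", "/products/red-widget",
         "/en/products/green-widget", "/en/products/gift-card"]

print(match_rate(before))  # 1.0
print(match_rate(after))   # 0.5
print(unmatched(after))    # ['/en/products/green-widget', '/en/products/gift-card']
```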

What to ask your vendors — and yourself

Five questions. If you can't answer them about your own infrastructure, the perception gap isn't just in your data; it's in your source code.

1. Show me the spec. What document defined what this system should do before someone built it? If the answer is "the Jira ticket" or "the Slack thread," there is no spec.

2. Who reviewed this? Not who approved the pull request, but who read the code and can explain why it's structured the way it is.

3. What percentage was AI-generated? Not as a judgment, but as a risk indicator. AI-generated code that was reviewed and understood is fine. AI-generated code that shipped without comprehension is dark code.

4. Can your team explain a random module? Pick any component of your marketing infrastructure. Can the person responsible for it walk you through what it does and why? If the answer is "we'd need to look at it," nobody holds the mental model.

5. When did someone last read a critical path end-to-end? Not check a dashboard. Not review a report. Read the actual logic (the tag code, the automation workflow, the consent configuration) from trigger to output.

If you're a founder shipping with AI (and increasingly, founders are), the risk is higher for you, not lower. Nobody else looks at your commits. Every line of code your AI produces ships with at most one person's understanding: yours, and only if you read it. The countermeasure is structural: spec before code, self-describing systems, and a gate that asks whether someone understands the code, not just whether it works.

The perception gap doesn't just live in your data. It lives in your source code. The question is whether you know it's there.

Start with what your infrastructure is actually doing

The forensic diagnostic surfaces your tracking accuracy, compliance posture, and attribution reliability across ten pillars. Five minutes, no sales call required.