
AI ROI Map: Where It Pays, Where It Pretends To

Most AI projects I encounter are solving for the demo, not the dollar. Teams ship a chatbot, leadership applauds, and six months later nobody can point to a number that moved. The gap isn't technology — it's that most companies deploy AI in the wrong places first, then wonder why the business case fell apart.

There's a structural reason for this. AI ROI isn't uniform across a business. It's sharply concentrated in a small number of activity types — and diffuse, sometimes negative, everywhere else. If you can't name the specific mechanism by which AI is compounding value in your context, you're probably in the diffuse zone.

Let me give you the map.

The Three Zones Where AI Actually Pays

The returns concentrate in three places across fintech, logistics, travel, and ops-heavy industries:

1. High-frequency, low-variance decisions at scale

This is the richest vein. Think: loan pre-screening, fraud flagging, customer tier routing, inventory reorder triggers. These decisions are made thousands of times per day, each one is cheap, and the quality bar is "better than a tired human following a flowchart." AI clears that bar easily. The ROI is multiplicative because you're not automating one decision — you're shifting the distribution of thousands of decisions simultaneously.

The signal that you're here: a human is currently making the same judgment call repeatedly with minimal new information each time. If you can describe the decision logic in a paragraph, you can probably encode it.
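As one hedged illustration of "describe it in a paragraph, then encode it," here is an inventory reorder trigger. The rule (reorder when stock on hand plus stock on order falls to or below expected lead-time demand plus safety stock) is a common textbook convention that I'm assuming for the example, not a rule taken from any specific system, and every name and number is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StockPosition:
    sku: str
    on_hand: int
    on_order: int
    avg_daily_demand: float
    lead_time_days: int
    safety_stock: int


def reorder_point(pos: StockPosition) -> float:
    # Expected demand while waiting for a delivery, plus a buffer.
    return pos.avg_daily_demand * pos.lead_time_days + pos.safety_stock


def should_reorder(pos: StockPosition) -> bool:
    # The whole "decision" a human would otherwise make by eyeballing a report.
    return pos.on_hand + pos.on_order <= reorder_point(pos)


# Hypothetical SKUs for illustration only.
widget = StockPosition("WIDGET-1", on_hand=120, on_order=0,
                       avg_daily_demand=20.0, lead_time_days=5, safety_stock=40)
gadget = StockPosition("GADGET-7", on_hand=300, on_order=0,
                       avg_daily_demand=20.0, lead_time_days=5, safety_stock=40)

print(should_reorder(widget), should_reorder(gadget))
```

A rule this small is the baseline: a learned model only earns its place if it beats it on the same decisions.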

2. Bottlenecks that gate downstream value

Some tasks don't happen thousands of times per day — but they block everything else when they're slow. Document review before a contract can close. Compliance checks before a feature ships. Translation/localization before a product can enter a new market. When AI removes the bottleneck, it doesn't just speed up one step — it unlocks the entire downstream pipeline.

Imagine a team where every legal contract requires 3 days of manual review before procurement can proceed. AI-assisted review that cuts that to 4 hours doesn't just save review time — it changes the company's ability to move. That's where ROI gets asymmetric.

3. Knowledge retrieval at the frontline

Customer-facing and ops-facing staff spend a disproportionate share of their day hunting for information: policy docs, product specs, procedure manuals, previous case notes. An internal RAG system (retrieval-augmented generation, grounding the LLM in your actual corpus) can compress "find the answer" from 8 minutes to 45 seconds. Consider a contact center with 50 agents handling 40 queries each per day: that's 2,000 lookups daily. At even a modest $25/hour fully-loaded cost, shaving 7 minutes per lookup recovers about 233 agent-hours a day, worth roughly $1.46 million annually over 250 working days, before you count the downstream customer satisfaction impact. Unlike hallucination-prone generative tasks, retrieval quality is measurable and improvable.

This is also one of the fastest things to ship. A basic RAG pipeline over internal docs can be production-ready in about two weeks, not months. The slow part is always data hygiene and access permissions, not the engineering.
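The contact-center arithmetic is worth working through explicitly, because the result is very sensitive to the inputs. This sketch uses the figures from the example (50 agents, 40 queries each, 7 minutes saved per lookup, $25/hour). The 250 working days per year is my assumption, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LookupScenario:
    agents: int
    queries_per_agent_per_day: int
    minutes_saved_per_lookup: float
    hourly_cost_usd: float
    working_days_per_year: int  # assumption: not stated in the example

    def daily_lookups(self) -> int:
        return self.agents * self.queries_per_agent_per_day

    def hours_saved_per_day(self) -> float:
        return self.daily_lookups() * self.minutes_saved_per_lookup / 60

    def annual_recovered_capacity_usd(self) -> float:
        return (self.hours_saved_per_day()
                * self.hourly_cost_usd
                * self.working_days_per_year)


contact_center = LookupScenario(
    agents=50,
    queries_per_agent_per_day=40,
    minutes_saved_per_lookup=7,
    hourly_cost_usd=25.0,
    working_days_per_year=250,
)

print(contact_center.daily_lookups())                         # 2000
print(round(contact_center.hours_saved_per_day(), 1))         # 233.3
print(round(contact_center.annual_recovered_capacity_usd()))  # 1458333
```

Treat the output as recovered capacity, not cash: it only turns into savings or revenue if the freed hours are actually redeployed.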

The Three Zones Where AI Pretends to Pay

1. Replacing human judgment in high-stakes, high-variance situations

AI is not good at novel situations. When the inputs are unfamiliar, the output degrades badly — and in high-stakes contexts, that degradation is expensive. The mistake is confusing "AI can do this task" with "AI should own this task end-to-end." For medical diagnosis edge cases, nuanced enterprise sales, or crisis communications, AI as a drafting assistant is useful. AI as the decision-maker is a liability.

The tell: if you'd need to explain the outcome to a regulator, a board, or a customer who was harmed, a human needs to be in the loop — not as a rubber stamp, but as a genuine check.

2. Automating broken processes

AI faithfully executes the process you give it. If that process is broken, you now have faster failure. Teams spend months building AI workflows around a handoff process that should have been eliminated entirely. Before you automate, ask: if this process worked perfectly, would we still want it? If the answer is "actually no," fix the process first.

This is the AI version of the old automation trap — and it's more seductive now because the technology makes it so easy to build the wrong thing quickly.

3. Vanity AI features nobody asked for

AI summaries of documents users don't read. Chatbots that sit on pages nobody visits. "Smart" recommendations in a product where users already know exactly what they want. These ship because they're easy to demo, not because they solve a problem. They consume infra budget, token budget, and engineering attention — and the only metric they move is "we have an AI feature now."

The test: can you name a specific user behavior you expect to change, and how you'll measure it? If the answer is "it'll be a nice-to-have," deprioritize it.

The AI ROI Diagnostic: A Decision Table

| Activity Type | Volume | Reversibility | Current Quality | ROI Potential |
| --- | --- | --- | --- | --- |
| Repetitive decision at scale | High | High | Inconsistent | Very High |
| Bottleneck task gating pipeline | Low-Med | Med | Slow | High |
| Knowledge retrieval / triage | High | High | Slow/variable | High |
| Novel, high-stakes judgment | Low | Low | Expert-level | Low / Risky |
| Broken process requiring redesign | Any | Any | Already bad | Negative |
| Vanity feature, no behavior change | Any | High | N/A | Zero |

Use this table as a filter before you commit engineering time. The question is never "can AI do this?" — it's "does this activity sit in a zone where AI compounds?"
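If you want the filter to live somewhere other than a slide, it can be encoded as a lookup. This is a minimal sketch, not a scoring model: the ROI labels are copied from the table, while the candidate names and the `shortlist` helper are hypothetical.

```python
from enum import Enum
from typing import NamedTuple


class Activity(Enum):
    REPETITIVE_DECISION = "Repetitive decision at scale"
    BOTTLENECK = "Bottleneck task gating pipeline"
    RETRIEVAL = "Knowledge retrieval / triage"
    NOVEL_JUDGMENT = "Novel, high-stakes judgment"
    BROKEN_PROCESS = "Broken process requiring redesign"
    VANITY_FEATURE = "Vanity feature, no behavior change"


# ROI potential per activity type, as given in the diagnostic table.
ROI_POTENTIAL = {
    Activity.REPETITIVE_DECISION: "Very High",
    Activity.BOTTLENECK: "High",
    Activity.RETRIEVAL: "High",
    Activity.NOVEL_JUDGMENT: "Low / Risky",
    Activity.BROKEN_PROCESS: "Negative",
    Activity.VANITY_FEATURE: "Zero",
}


class Candidate(NamedTuple):
    name: str
    activity: Activity


def shortlist(candidates: list[Candidate]) -> list[str]:
    """Keep only candidates whose activity type is rated High or Very High."""
    worth_building = {"Very High", "High"}
    return [c.name for c in candidates
            if ROI_POTENTIAL[c.activity] in worth_building]


# Hypothetical backlog for illustration.
backlog = [
    Candidate("Fraud flagging", Activity.REPETITIVE_DECISION),
    Candidate("Contract review assist", Activity.BOTTLENECK),
    Candidate("AI summary on the settings page", Activity.VANITY_FEATURE),
    Candidate("Automate the legacy handoff", Activity.BROKEN_PROCESS),
]

print(shortlist(backlog))
```

The hard part is the classification step, which stays a human judgment: deciding honestly which row a proposed project belongs in.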

Why Most AI Readiness Assessments Miss the Point

Standard AI readiness frameworks ask about data maturity, cloud infrastructure, and team upskilling. Those things matter. But the more important question is: do you know which decisions in your business are high-frequency and low-variance?

Most organizations don't have a clean map of this. They know their org chart, not their decision architecture. The companies that get ROI fast are the ones who can answer: "here are the 5 decisions made 1,000 times per week in this business, and here is their current error rate and cost-per-decision." That's not a data science problem — it's a business analysis problem. And it takes a week to do, not a quarter.
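A decision map can start as a plain list of records. The sketch below assumes the three fields named above (volume, error rate, cost-per-decision) plus an owner. Ranking by weekly spend is one possible ordering that I'm assuming here, and all rows are made-up placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    name: str
    owner: str
    weekly_volume: int
    error_rate: float             # share of decisions that turn out wrong
    cost_per_decision_usd: float  # fully-loaded cost of making it once

    @property
    def weekly_cost_usd(self) -> float:
        return self.weekly_volume * self.cost_per_decision_usd

    @property
    def weekly_errors(self) -> float:
        return self.weekly_volume * self.error_rate


def high_frequency(decisions: list[Decision],
                   min_weekly_volume: int = 1000) -> list[Decision]:
    """Decisions made at least min_weekly_volume times, costliest first."""
    frequent = [d for d in decisions if d.weekly_volume >= min_weekly_volume]
    return sorted(frequent, key=lambda d: d.weekly_cost_usd, reverse=True)


# Hypothetical rows; replace with what your own operations audit finds.
decision_map = [
    Decision("Loan pre-screen", "Credit ops", 4000, 0.06, 3.50),
    Decision("Fraud flag review", "Risk", 12000, 0.03, 1.20),
    Decision("Contract exception", "Legal", 40, 0.10, 180.00),
]

for d in high_frequency(decision_map):
    print(f"{d.name}: ${d.weekly_cost_usd:,.0f}/week, "
          f"~{d.weekly_errors:.0f} errors/week")
```

Low-volume rows such as the contract exception drop out of this view on purpose; they belong in the bottleneck analysis, not the high-frequency one.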

Once you have that map, the technology choices are mostly obvious. A fine-tuned classifier for high-volume routing. A RAG pipeline for knowledge retrieval. A human-in-the-loop workflow for anything touching compliance or irreversibility. The build-vs-buy question becomes answerable because you know exactly what you need the system to do.

The Compounding Trap: Why Sequence Matters

Here's the thing most roadmaps get wrong: they treat AI initiatives as independent bets. They're not. Where you start determines what's possible next.

Teams that start in Zone 1 (high-frequency decisions) generate immediate, measurable signal. That signal — accuracy rates, error patterns, edge cases — becomes training data for the next initiative. They build internal fluency with AI systems in a low-stakes environment before moving into higher-stakes territory. Their teams develop judgment about what AI can and can't do, grounded in their own production data.

Teams that start with the vanity feature or the broken process get none of that. They get a demo, a post-mortem, and a leadership team that's newly skeptical of AI investment.

AI isn't magic — it's a compounding capability. The early wins need to be real and measurable, or the organizational muscle never develops.

One Principle Most Teams Ignore

The best AI projects aren't the ones with the most sophisticated models — they're the ones with the highest ratio of AI decision volume to human oversight cost.

A simple classifier running 50,000 inferences per day with a 2% human review queue generates more compounding value than a complex agent requiring review on 40% of outputs — even if the agent is technically more impressive. Design for throughput and confidence thresholds, not for demo-ability.
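One way to put a number on that ratio is decisions handled per human review. That reading is my assumption about how to operationalize it; it uses review count as a stand-in for oversight cost, which ignores how long each review takes. The function names are hypothetical.

```python
def decisions_per_review(review_rate: float) -> float:
    """How many AI decisions are made for each one a human has to check."""
    if not 0 < review_rate <= 1:
        raise ValueError("review_rate must be in (0, 1]")
    return 1 / review_rate


def daily_reviews(daily_inferences: int, review_rate: float) -> int:
    """Size of the human review queue per day."""
    return round(daily_inferences * review_rate)


# Figures from the comparison above.
print(daily_reviews(50_000, 0.02))           # 1000 reviews/day for the classifier
print(round(decisions_per_review(0.02), 1))  # 50.0 decisions per review
print(round(decisions_per_review(0.40), 1))  # 2.5 decisions per review for the agent
```

On this measure the simple classifier gets twenty times more leverage out of each human review than the agent does.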

This is why I push teams to instrument their AI systems from day one: not just accuracy metrics, but confidence distributions and escalation rates. If you're reviewing more than 15-20% of outputs manually in a supposed automation workflow, you haven't automated; you've added a step. The article "AI Product Stability Stack: What to Wire Up Before You Move Fast" covers the instrumentation baseline in detail.
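A minimal sketch of that instrumentation, assuming each output is logged with a confidence score and a flag for whether it went to a human. The record shape, the decile bucketing, and the 0.15 ceiling (the lower end of the 15-20% range) are illustrative choices, not a prescribed schema.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LoggedOutput:
    confidence: float    # model confidence in [0, 1]
    sent_to_human: bool  # True if this output went to manual review


def escalation_rate(outputs: Sequence[LoggedOutput]) -> float:
    """Share of outputs that were routed to a human."""
    if not outputs:
        return 0.0
    return sum(o.sent_to_human for o in outputs) / len(outputs)


def confidence_deciles(outputs: Sequence[LoggedOutput]) -> Counter:
    """Count outputs per confidence decile; decile 9 covers 0.9 through 1.0."""
    return Counter(min(int(o.confidence * 10), 9) for o in outputs)


REVIEW_CEILING = 0.15  # assumed ceiling, taken from the 15-20% rule of thumb


def is_really_automated(outputs: Sequence[LoggedOutput]) -> bool:
    return escalation_rate(outputs) <= REVIEW_CEILING


# Synthetic log for illustration: 88 confident outputs, 12 escalated.
sample = [LoggedOutput(0.95, False)] * 88 + [LoggedOutput(0.55, True)] * 12

print(escalation_rate(sample))
print(sorted(confidence_deciles(sample).items()))
print(is_really_automated(sample))
```

Watching the decile counts over time is the useful part: a distribution sliding toward the middle buckets usually shows up before the accuracy metric moves.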

What to Actually Do

  1. Map your decision architecture this week. List every repetitive decision made in your core operations. Note volume, current error rate, and who makes it. This exercise alone will surface 2-3 high-ROI targets you're not currently pursuing.

  2. Score each candidate against the diagnostic table above. Anything in the top three rows goes on the shortlist, with the high-volume candidates first. Everything else needs a stronger case before it gets engineering time.

  3. Start with retrieval before generation. If you haven't deployed an internal knowledge RAG system, that's your fastest path to real ROI with manageable hallucination risk. Ship it in two weeks, measure query resolution rate and time-to-answer, iterate.

  4. Set a confidence threshold policy before you build. Decide what percentage of AI outputs will route to human review, and at what confidence score. This is a business decision, not an engineering decision — make it before the system is live, not after.

  5. Tie every AI initiative to a specific operational metric. Not "improved efficiency" — a named metric with a baseline and a target. If you can't name it before you build, you won't find it after.
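Step 4 can be written down as a small policy object before any model exists. This is a sketch under assumed values: the 0.85 confidence floor and the 15% review budget are placeholders for whatever the business agrees to, and the class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ReviewPolicy:
    min_confidence: float    # below this score, a human reviews the output
    max_review_share: float  # agreed ceiling on the share of outputs reviewed

    def route(self, confidence: float) -> str:
        return "auto" if confidence >= self.min_confidence else "human_review"

    def within_budget(self, confidences: Sequence[float]) -> bool:
        """Check observed scores against the agreed review budget."""
        if not confidences:
            return True
        reviewed = sum(1 for c in confidences
                       if self.route(c) == "human_review")
        return reviewed / len(confidences) <= self.max_review_share


# Placeholder values: set these with the business owner, not in a sprint.
policy = ReviewPolicy(min_confidence=0.85, max_review_share=0.15)

print(policy.route(0.92))                         # auto
print(policy.route(0.60))                         # human_review
print(policy.within_budget([0.9] * 9 + [0.6]))    # True: 10% reviewed
print(policy.within_budget([0.9, 0.6]))           # False: 50% reviewed
```

If `within_budget` fails on real traffic, the choice is explicit: improve the model, lower the confidence floor and accept the risk, or stop calling the workflow automated.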

The companies winning with AI aren't the ones running the most experiments — they're the ones running the right experiments in the right order. Find your Zone 1. Ship it. Measure it. Then move.
