How to calculate the real ROI of an AI investment
This pattern repeats often enough in our advisory work to be worth naming. The same AI use case, evaluated by three teams inside the same enterprise, produces three different ROI numbers.
The CFO models cost avoidance and sees payback in months. The innovation office models capability gain and arrives at a number two or three times higher. IT evaluates integration, supervision, and risk, and concludes the analysis is indeterminate.
The disagreement is not arithmetic. It is conceptual. Each team is building a different ROI model and calling it the same thing.
This is the central problem with how organizations evaluate AI today. Both the numerator and the denominator are unstable. Benefits hide in places traditional finance does not look. Costs accumulate in categories the project sponsor does not own. Option value compounds across use cases that have not yet been imagined.
Most AI business cases are not wrong. They are incomplete in predictable ways.
What follows is a complete framework: a structured Total Cost of Ownership, a four-component value model, a risk-adjusted NPV calculation, and a portfolio scoring rubric, applied end-to-end to a single use case, with numbers.
Why classical ROI breaks for AI
The classical ROI formula was designed for capital projects with stable scope, known costs, and benefits that materialize through familiar mechanisms. AI projects violate all three assumptions.
Scope is endogenous
AI projects rarely leave the underlying process unchanged. A system introduced to accelerate a workflow often reveals redundancies, hidden dependencies, or quality issues that were previously invisible.
The "after" state is not a faster version of the "before" state; it is a different process. ROI calculations anchored to the original baseline systematically understate value.
Costs are distributed
A naive business case counts model and software costs. The real cost structure is broader: data preparation, integration, evaluation, governance, supervision, and change management, often spread across functions that do not report to the project sponsor.
Aggregated, these costs typically exceed visible technology spend by a factor of three to five.
Benefits compound nonlinearly
A first AI use case is the foundation on which subsequent ones run more cheaply. Reusable connectors, evaluation datasets, and governance patterns become inputs to the second and third use cases. A complete ROI model must therefore account for option value, the expected NPV of follow-on investments that become feasible because the first exists. Classical NPV does not handle this unless option value is made explicit.
Any framework that ignores these three properties will produce numbers that look precise and are systematically biased.
“The spreadsheet is not wrong. It is looking at the wrong company: the company as it operates today rather than the company AI makes possible.”
The Total Cost of Ownership of an AI use case
AI is often treated as a project competing for budget. It is more useful to treat it as an organizational capability that strengthens with use.
The first use case looks expensive because it pays for learning. The tenth looks cheap because that learning has become infrastructure. This is why AI investments should be evaluated over a three-year horizon, not as isolated projects.
The first discipline is to refuse to evaluate benefits without a complete cost model. AI total cost of ownership distributes across eight categories:
| Category | Typical share of Year 1 cost |
|---|---|
| Technology · LLM usage, vector DBs, observability, infra | 10–20% |
| Integration · APIs, data pipelines, identity, workflow | 20–30% |
| Data preparation · cleaning, indexing, access control, metadata | 10–20% |
| Governance · legal, compliance, risk, model usage policies | 5–10% |
| Evaluation · golden datasets, regression tests, monitoring | 10–15% |
| Human supervision · review, escalation, expert validation | 10–20% |
| Onboarding · user training, workflow redesign, change mgmt | 10–15% |
| Maintenance · prompt/model updates, drift, cost optimization | 5–10% |
Two patterns deserve attention. First, technology is rarely the dominant cost. Executives accustomed to SaaS economics underestimate the rest by an order of magnitude. Second, the cost profile shifts over time. Integration, data preparation, and evaluation are heavily front-loaded. Maintenance, supervision, and technology become the steady state from Year 2: typically 30–50% of Year 1 total.
This is why use cases that look marginal in Year 1 often look excellent on a three-year horizon and why pilots evaluated solely on first-year economics are routinely killed before they cross the value threshold.
Any AI business case should be presented over a minimum three-year horizon, with Year 1 costs decomposed into one-time setup and recurring operations.
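The three-year roll-up can be sketched in a few lines. The steady-state ratio of 0.4 and the input figures are illustrative assumptions, not benchmarks:

```python
def three_year_tco(one_time: float, year1_recurring: float,
                   steady_state_ratio: float = 0.4) -> float:
    """Three-year TCO: Year 1 pays one-time setup plus operations;
    Years 2 and 3 settle at a steady state, typically 30-50% of the
    Year 1 total (0.4 assumed here)."""
    year1_total = one_time + year1_recurring
    return year1_total + 2 * steady_state_ratio * year1_total

# e.g. EUR 760k one-time setup and EUR 150k Year 1 recurring (illustrative)
tco = three_year_tco(760_000, 150_000)  # EUR 1,638,000 over three years
```

The decomposition matters more than the formula: presenting one-time and recurring costs separately is what lets a reviewer see the Year 2+ economics at all.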
A four-component value framework
Costs are easier to enumerate than benefits because costs are paid in cash. Benefits arrive in four distinct forms, each requiring a different measurement approach.
Direct value (Vd)
The cash-equivalent productivity gain on the targeted task: hours saved, headcount avoided, processing cost reduced. The standard calculation is:

Vd = annual volume × time saved per unit (hours) × fully loaded hourly cost
This number is real but partial. A use case justified only on Vd will usually be a back-office automation: fine as a starting point, but it misses the larger sources of value below.
Quality and risk value (Vq)
The cash-equivalent improvement in error rates, compliance, consistency, or risk exposure. The calculation requires a baseline error rate, an estimated post-AI error rate, and a cost per error:

Vq = annual volume × (baseline error rate − post-AI error rate) × cost per error
This is where AI often produces its most consequential effects. A two-percentage-point reduction in mis-routed claims or incorrectly classified support tickets routinely produces more value than the entire direct-time-saved figure.
Knowledge activation value (Vk)
Most organizations sit on large reserves of latent knowledge that employees cannot easily access. AI changes the economics of retrieval.
Vk captures second-order effects: faster expert work, better customer answers, fewer escalations caused by missing context.
The discipline is to commit to an estimate, document the assumptions, and update them after the first measurement window.
Real option value (Vo)
The expected NPV of follow-on use cases that become feasible because the first one exists. If a first deployment establishes the data pipeline, evaluation framework, and governance pattern that follow-on use cases will reuse, a portion of their NPV should be credited to the first investment:

Vo = probability of pursuit × combined NPV of follow-on use cases × attribution share
The key is to make this value visible rather than letting it disappear into "strategic benefits."
Putting it together

The risk-adjusted cash value for year t is:

Vt = pt × (Vd + Vq + Vk), with Vo reported separately as attributed option value rather than cash flow

The probability factor pt forces explicit acknowledgment of execution risk. Honest pt values for Year 1 are 0.6–0.8, not 1.0. From Year 2, after the system has stabilized and adoption is documented, pt can reasonably move to 0.85–0.95.
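In code, the roll-up is a one-liner; the figures in the assertion below are illustrative:

```python
def risk_adjusted_value(vd: float, vq: float, vk: float, p_t: float) -> float:
    """Risk-adjusted cash value for year t: p_t * (Vd + Vq + Vk).
    Vo (option value) is tracked separately as an attribution and is
    deliberately excluded from the discounted cash flows."""
    return p_t * (vd + vq + vk)
```

Keeping Vo out of the function signature is the point: it belongs in the strategic case, visible but never mixed into the cash number.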
A worked example
The most successful AI use cases are rarely those with the most impressive demos. They are the ones where business, IT, data, risk, and operations agree on the same unit of value.
Consider a mid-sized European property and casualty insurer evaluating an LLM-assisted claims triage system. The AI proposes routing and reasoning; adjusters validate before action.
Baseline
240,000 claims per year. Current triage: 12 minutes per claim at €42/hr fully loaded. Mis-routing rate: 8%, each costing €180 in rework and delays.
- Annual triage labor: 240,000 × (12/60) × €42 = €2,016,000
- Annual quality cost: 240,000 × 0.08 × €180 = €3,456,000
Projected post-AI state
Triage time drops to four minutes. Mis-routing drops from 8% to 3%, confirmed by a four-week pilot.
| Component | Annual value | Calculation | Driver |
|---|---|---|---|
| Direct (Vd) | €1,344,000 / yr | 240,000 × (8/60) × €42 | labor saved on triage time per claim |
| Quality (Vq) | €2,160,000 / yr | 240,000 × 0.05 × €180 | five-point drop in mis-routing rate |
| Knowledge (Vk) | €48,750 / yr | 1,500 × (25/60) × €78 | fewer senior-adjuster escalations |
| Option (Vo) | €1,008,000 | 0.6 × €4.2M × 0.4 | attributable to three identified follow-ons |
During the pilot, adjusters consulted historical similar cases more often, because the AI surfaced relevant precedents alongside the routing recommendation. This reduced senior-adjuster escalations by an estimated 1,500 per year, each saving roughly 25 minutes of senior time at €78 per hour. Small relative to Vd and Vq, but genuine and invisible without explicit modeling.
The investment also establishes a claims-document indexing pipeline, an evaluation framework, and a governance pattern that three identified follow-on use cases will reuse: fraud detection assistance, customer correspondence drafting, and broker query handling. Conservative estimates suggest each follow-on use case will cost 40% less to build because of reusable assets, with combined three-year NPV of approximately €4.2M and a 60% probability of being pursued. This is an attribution, not a cash flow. It belongs in the strategic case, not the cash NPV.
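The four component figures reduce to a few lines of arithmetic, using the operating assumptions stated above:

```python
# Value components of the claims-triage example, from stated assumptions.
claims = 240_000                    # claims triaged per year
vd = claims * (8 / 60) * 42         # 8 minutes saved per claim at EUR 42/hr
vq = claims * (0.08 - 0.03) * 180   # mis-routing 8% -> 3%, EUR 180 per error
vk = 1_500 * (25 / 60) * 78         # avoided escalations, 25 min at EUR 78/hr
vo = 0.6 * 4_200_000 * 0.4          # p(pursued) x follow-on NPV x attribution
```

Note that Vq alone exceeds Vd: the quality component, not the time saved, carries the case.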
Costs
Year 1 setup and operating costs:
| Category | Amount |
|---|---|
| Technology · LLM API at ~€0.04/claim, infra, observability | €60,000 |
| Integration with claims management system (one-time) | €240,000 |
| Data preparation and indexing (one-time) | €140,000 |
| Governance, legal, model documentation | €80,000 |
| Evaluation framework and golden test set (one-time) | €120,000 |
| Human supervision · Year 1, ~30% of claims double-checked | €50,000 |
| Onboarding, training, change management (one-time) | €180,000 |
| Maintenance reserve | €40,000 |
| Year 1 total | €910,000 |
Year 2 and Year 3 recurring costs drop to approximately €350,000 per year as one-time setup expenses fall away and supervision overhead reduces.
The full calculation
Year 1 cash value, before risk adjustment:
Vd + Vq + Vk = €1,344,000 + €2,160,000 + €48,750 = €3,552,750
Applying p1 = 0.7 to reflect first-year execution risk:
Risk-adjusted Year 1 value: €3,552,750 × 0.7 = €2,486,925
Year 1 net: €2,486,925 − €910,000 = €1,576,925
For Year 2 and Year 3, raising pt to 0.9 to reflect a stabilized system:

Annual net: (€3,552,750 × 0.9) − €350,000 = €2,847,475

Three-year NPV at a 10% discount rate:

| Year | Net cash flow | Discount factor | Present value |
|---|---|---|---|
| 1 | €1,576,925 | 1.10 | €1,433,568 |
| 2 | €2,847,475 | 1.21 | €2,353,285 |
| 3 | €2,847,475 | 1.331 | €2,139,350 |
| NPV (cash) | | | €5,926,203 |
| + Attributable option value | | | +€1,008,000 |
Payback period: approximately four months on the one-time investment of ~€760,000 (integration, data prep, eval framework, governance setup, and onboarding combined).
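Recomputing directly from the stated inputs, with exact intermediate values (so totals may differ from hand-rounded table figures by a few hundred euros):

```python
# Three-year cash NPV of the claims-triage case, from its stated inputs.
annual_value = 1_344_000 + 2_160_000 + 48_750   # Vd + Vq + Vk = EUR 3,552,750
year1_net = 0.7 * annual_value - 910_000        # p1 = 0.7, Year 1 TCO
steady_net = 0.9 * annual_value - 350_000       # pt = 0.9, recurring costs
rate = 0.10

npv = sum(cf / (1 + rate) ** t
          for t, cf in enumerate([year1_net, steady_net, steady_net], start=1))

# Payback on the ~EUR 760k one-time investment, at the Year 1 value run rate
payback_months = 760_000 / ((0.7 * annual_value) / 12)
```

Option value stays out of `npv` by construction; it is added as a labeled line in the strategic case, never discounted alongside cash.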
On a Vd-only basis the case would still pay back, but it would be approximately 60% less compelling, and the strategic narrative around follow-on use cases would be entirely missing. The components matter not because they make the number bigger, but because they make the number honest.
The cost of delay
The framework above evaluates whether to invest. It does not evaluate when.
For AI, timing is often more consequential than the investment itself. A 12-month delay can erase a significant share of first-year value while competitors build the same capabilities and accumulate the organizational learning that comes with them.
In practice, "waiting" only makes sense if the organization is actively building the capabilities needed to adopt later: data readiness, governance discipline, evaluation infrastructure. Otherwise, it is not a strategy. It is a way of falling behind.
The familiar counterargument runs: today's bespoke build will be tomorrow's commodity feature, so waiting reduces the cost of building. Sometimes true, and often used to rationalize inaction rather than articulate strategy.
Is the organization using the waiting period to build the capabilities that will make eventual adoption faster? If not, "we'll wait" is a euphemism for falling behind.
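One rough way to put a number on waiting, under two loud simplifications: a 12-month delay forfeits the risk-adjusted Year 1 net (figures from the claims-triage example), and pushes the attributable option value back one year:

```python
year1_net = 1_576_925     # risk-adjusted Year 1 net value foregone
option_value = 1_008_000  # attributable follow-on value, arriving a year late
rate = 0.10

# Cost of a 12-month delay: the lost Year 1 net, plus the time value
# lost by receiving the option value one year later than planned.
cost_of_delay = year1_net + option_value * (1 - 1 / (1 + rate))
```

The model is deliberately crude; its purpose is to make "wait" a line item that must defend itself, not a default.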
Portfolio scoring: a weighted rubric
Single-use-case ROI calculations support investment decisions. They do not support portfolio decisions. For prioritization across competing AI investments, executives need a multi-criteria scoring model.
Four dimensions, with the following weights for most enterprise contexts:
Value
Magnitude of Vtotal, weighted by frequency, criticality, and revenue exposure.
Feasibility
Realism of execution given available data, systems, users, and governance.
Risk
Severity of failure modes and quality of available controls.
Learning
Reusability of capabilities built: connectors, eval data, governance patterns.
Each dimension is scored 1–5. The composite score is the weighted sum.
The Learning dimension deserves emphasis because it is the one most often missing from portfolio reviews. A use case with moderate immediate ROI but high learning value is frequently a better first investment than a narrow automation with attractive short-term savings and no reuse potential. The first builds capability; the second exhausts it.
For early-stage portfolios: weight Learning at 30% rather than 20% for the first three to five use cases, then shift back to 20% as the platform matures.
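A sketch of the composite score. The rubric fixes Learning at 20% (30% early-stage); the split across Value, Feasibility, and Risk shown here is an assumed allocation, not one the rubric prescribes:

```python
def composite_score(scores: dict, early_stage: bool = False) -> float:
    """Weighted composite of four dimension scores (each 1-5; a higher
    Risk score means better-controlled risk). Early-stage portfolios
    shift 10 weight points from Value to Learning."""
    weights = {
        "value":       0.25 if early_stage else 0.35,  # assumed split
        "feasibility": 0.25,                           # assumed split
        "risk":        0.20,                           # assumed split
        "learning":    0.30 if early_stage else 0.20,  # per the rubric
    }
    return sum(weights[d] * scores[d] for d in weights)
```

A use case scoring {value: 3, feasibility: 4, risk: 2, learning: 5} moves from about 3.45 to 3.65 under early-stage weighting: exactly the reordering toward high-reuse first investments the rubric intends.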
Three horizons of value
The framework above evaluates use cases as proposed. It does not address the harder question of what use cases the organization should imagine in the first place. Three horizons help:
Do the same work faster (H1)
The entry point: productivity gains on existing tasks.
Easy to measure, quick to deploy, but limited in value ceiling.
Use cases here pay back quickly and build organizational confidence.
Do the same work better (H2)
AI improves quality, consistency, and decision preparation.
Harder to quantify than H1 but typically larger, because it captures Vq and Vk effects that H1 misses entirely.
Do different work (H3)
The operating model itself changes.
Static reports become conversational decision systems.
Reactive support becomes proactive guidance. The largest value horizon, and the hardest to model.
Most enterprises overinvest in H1 and underinvest in H2 and H3. The ROI framework above naturally corrects for this bias because Vq, Vk, and Vo are explicit components, but only if the team is willing to estimate them rather than round them to zero.
In any AI portfolio review, no more than 60% of use cases should sit purely in Horizon 1. If the portfolio is more concentrated, the organization is treating AI as automation rather than as a new operating capability.
The boardroom checklist
Before approving an AI use case, executives should be able to answer twelve questions in writing:
What business problem are we solving?
And what is the operational unit of value: claim, ticket, quote, decision?
What is the current baseline?
Cost, time, error rate, customer experience: quantified, not qualitative.
What will AI actually do?
Draft, retrieve, classify, recommend, decide, monitor: name the verb.
What is Vd?
Direct value: explicit assumptions on volume, time saved, and unit cost.
What is Vq?
Quality value: explicit assumptions on error rates and cost per error.
What is Vk?
Knowledge activation value: name the decisions affected.
What is Vo?
Option value: name the follow-on use cases and reuse percentages.
What is the full TCO?
All eight cost categories, over three years, decomposed into one-time and recurring.
What execution risk factor (pt)?
And why: what evidence supports it, and how does it evolve over time?
What human supervision is required?
And how does it phase down as the system stabilizes?
How will we evaluate accuracy and safety?
What is in the golden dataset, and who maintains it?
What is the cost of delay?
And what is our deliberate timing strategy: fast-follow, lead, or wait?
The most consequential question is the last. In a classical investment, the alternative to investing is preserving capital. In an AI investment, the alternative is often falling behind on a capability that will eventually become mandatory. That gap rarely appears in a spreadsheet, but it shows up in the strategy.
From ROI to Return on Intelligence
Return on Investment is necessary. For AI, it is incomplete. AI investments produce a second kind of return that classical finance does not capture: an improvement in the organization's ability to convert data, knowledge, and expertise into action.
Call this Return on Intelligence. It is not a soft concept. It has operational measures: how fast can the organization activate its existing knowledge for a new decision? How quickly does a new employee reach productivity? How many decisions per week are made with the right context rather than fragmentary information?
The contrast with conventional practice is direct. The old scorecard:

- Cost avoidance modelled in isolation
- Single-project payback periods
- Technology cost as the headline number
- Strategic benefits as a footnote

The honest scorecard:

- Four-component value, fully modelled
- Three-year horizons, option value visible
- TCO across eight categories, owned cross-functionally
- Cost of delay as a first-class number
“The companies that win with AI over the next decade will not be those that calculate the most optimistic ROI. They will be those that build the most honest one, and then act on it with discipline.”
The right boardroom question is no longer "How much will this AI use case save?"
It is: "What kind of company will we become if we treat intelligence as infrastructure, and what does it cost us to wait?"
Honesty over optimism
Model all four value components and all eight cost categories — even the ones that are uncomfortable to estimate.
Three years over one
Year-1 economics kill use cases that look excellent on a stabilized horizon. Pay for the learning; harvest the platform.
Delay as a number
The cost of waiting belongs in the spreadsheet. Otherwise inaction wins by default and capability gaps compound.
Building AI investments that survive both a CFO review and a board review.
We help organizations build the financial, governance, and architectural discipline that turns AI ambition into auditable returns: TCO modelling, four-component value cases, portfolio scoring, and the operating model that scales them.