How do you measure the ROI of an AI system after deployment?

You measure AI ROI by comparing post-deployment performance against pre-deployment baselines across four dimensions: time saved per task, cost reduction, revenue impact, and error rate changes. Track these metrics monthly and report them quarterly. The firms that measure rigorously expand their AI programmes faster and more successfully because investment decisions are based on evidence rather than gut feeling.

Short answer: Track time saved, cost reduction, revenue impact, and error rates against pre-deployment baselines. Measure monthly, report quarterly. Evidence drives expansion.

Why measurement matters more than you think

Most professional services firms that deploy AI skip rigorous measurement. They have a general sense that things are better, staff seem to be saving time, and the system seems to be working. But “seems to be” is not good enough when the partnership asks whether to invest another £50,000 in AI expansion.

The firms that measure well have a decisive advantage: when the managing partner asks “should we extend AI to the conveyancing team?”, they can answer with “the employment team’s intake system saved £78,000 in its first year against a £25,000 investment. The conveyancing team has similar volume and a comparable workflow. Projected savings are £60,000 to £90,000 per year.” That gets funded. “It seems to be working well for employment, so we think conveyancing would benefit too” does not.

Measurement also catches problems early. If an AI system is underperforming, you want to know in week 3, not month 6. Early detection means you can adjust the system, improve training, or address adoption issues before the investment is written off.

Setting the baseline

You cannot measure improvement without knowing where you started. Establish baselines 4 to 8 weeks before deploying any AI system.

Time metrics

Time per task. How long does the current process take? Measure the complete workflow, not just the active work time. An intake call might take 15 minutes of conversation, but the total process including preparation, note-taking, CRM entry, and follow-up might be 35 minutes.

Tasks per day/week. How many of this task does the team complete? This establishes volume, which is essential for calculating total time savings.

Turnaround time. How long from start to completion? For intake, how long from enquiry to qualified lead? For document drafting, how long from instruction to first draft?

Financial metrics

Cost per task. Calculate the fully loaded cost: staff time multiplied by their hourly cost (salary plus overhead, typically 1.5 to 2x the salary cost). A solicitor costing £70,000 per year with overhead costs roughly £50 to £65 per hour.

Total monthly cost. Cost per task multiplied by monthly volume. This is your baseline for cost savings calculations.

Revenue per task. Where applicable, what revenue does each task generate? For intake, what is the conversion rate and average matter value?
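The baseline cost arithmetic above can be sketched in a few lines. This is illustrative only: the 1.75x overhead multiplier, the 2,080 working hours per year, and the 120 intakes per month are assumed figures, not prescriptions.

```python
def hourly_cost(salary: float, overhead_multiplier: float = 1.75,
                working_hours_per_year: int = 2080) -> float:
    """Fully loaded hourly cost: salary plus overhead, spread over working hours."""
    return salary * overhead_multiplier / working_hours_per_year

def cost_per_task(minutes_per_task: float, hourly: float) -> float:
    """Fully loaded cost of one task, given total workflow time in minutes."""
    return minutes_per_task / 60 * hourly

# A £70,000 solicitor at 1.75x overhead costs roughly £59/hour.
rate = hourly_cost(70_000)         # ≈ 58.9
intake = cost_per_task(35, rate)   # 35-minute intake workflow ≈ £34
monthly_cost = intake * 120        # 120 intakes/month ≈ £4,120 baseline
```

Vary the overhead multiplier between 1.5 and 2 to reproduce the £50 to £65 per hour range quoted above.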

Quality metrics

Error rate. What percentage of tasks contain errors that require rework? This might be data entry errors in intake, drafting errors in documents, or categorisation errors in filing.

Rework rate. What percentage of completed work comes back for revision? At most firms, this is higher than anyone expects.

Compliance incidents. Any regulatory issues, client complaints, or near-misses related to the process being automated.

Satisfaction metrics

Staff satisfaction. A simple 1 to 10 score from the team doing the work. “How satisfied are you with the current intake process?” Survey before and after deployment.

Client feedback. Response times, complaint rates, and satisfaction scores if you track them.

The four measurement dimensions

Dimension 1: Efficiency

This is the most straightforward dimension and usually the first to show results.

Primary metric: Time saved per task. Compare average task completion time before and after deployment. Express as both absolute time (25 minutes saved per intake) and percentage (70% reduction in intake processing time).

Secondary metric: Throughput. Can the team now handle more volume? If the AI system handles routine intake and the team focuses on complex cases, total throughput should increase.

How to track: Most AI systems log processing times. Compare against your baseline time measurements. For the first 8 weeks post-deployment, track every task to establish the new normal. After that, sample weekly.

What good looks like: A well-targeted AI system should reduce task time by 40 to 70 percent for the specific workflow it addresses. Below 30 percent may indicate the system is not well-matched to the workflow or adoption is poor.
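The before-and-after comparison reduces to simple arithmetic. A minimal sketch, with illustrative times (a 35-minute intake workflow cut to 10 minutes):

```python
def time_saved(baseline_minutes: float, post_minutes: float) -> tuple[float, float]:
    """Return (absolute minutes saved per task, percentage reduction)."""
    saved = baseline_minutes - post_minutes
    return saved, saved / baseline_minutes * 100

saved, pct = time_saved(35, 10)
# saved == 25 minutes per task, pct ≈ 71% reduction
```

Report both numbers: the absolute figure feeds the financial calculation, the percentage shows whether the system clears the 40 to 70 percent benchmark.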

Dimension 2: Financial

This is what partners care about, so get it right.

Primary metric: Cost savings. Time saved multiplied by staff cost per hour. If the intake system saves 50 hours per month at a blended cost of £60 per hour, that is £3,000 per month or £36,000 per year in direct savings.

Include running costs. The AI system itself costs money: API fees, hosting, maintenance. Subtract these from gross savings to get net savings. A system saving £36,000 per year but costing £6,000 to run delivers £30,000 in net savings.

Calculate ROI. Net annual savings divided by total investment (build cost plus year-one running costs). Express as a percentage. If the system cost £25,000 to build and £6,000 to run in year one, and saves £36,000, year-one ROI is (£36,000 − £6,000 − £25,000) ÷ £31,000 = 16%. Year-two ROI is (£36,000 − £6,000) ÷ £31,000 = 97%. Cumulative ROI climbs rapidly.

Calculate payback period. How many months of savings does it take to recoup the investment? For the example above: £31,000 total investment / £2,500 net monthly savings = 12.4 months. Faster payback means lower risk.
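The ROI and payback formulas above can be sketched directly, using the worked figures from the text (£25,000 build, £6,000 per year running costs, £36,000 per year gross savings):

```python
def year_one_roi(gross_savings: float, running_costs: float, build_cost: float) -> float:
    """Year-one ROI: net savings minus build cost, over total year-one investment."""
    investment = build_cost + running_costs
    return (gross_savings - running_costs - build_cost) / investment

def payback_months(build_cost: float, running_costs: float, gross_savings: float) -> float:
    """Months of net savings needed to recoup the total year-one investment."""
    investment = build_cost + running_costs
    monthly_net = (gross_savings - running_costs) / 12
    return investment / monthly_net

print(round(year_one_roi(36_000, 6_000, 25_000) * 100))  # 16 (%)
print(round(payback_months(25_000, 6_000, 36_000), 1))   # 12.4 (months)
```

Note that year-one ROI is depressed by the one-off build cost; from year two onwards, only running costs are subtracted, which is why cumulative ROI climbs quickly.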

Dimension 3: Quality

Quality improvements are real but harder to quantify financially.

Error rate reduction. Compare error rates before and after deployment. AI systems typically reduce data entry errors, categorisation mistakes, and consistency issues. A system that reduces errors from 8 percent to 2 percent does not just save rework time. It reduces downstream problems.

Compliance improvement. Fewer missed deadlines, fewer regulatory issues, fewer complaints. These are low-frequency events, so you need a longer measurement window (6 to 12 months) to see statistically meaningful changes.

Consistency. AI processes the hundredth task the same way it processes the first. Human performance varies with workload, mood, and fatigue. Measure output consistency by comparing a sample of AI-processed and human-processed work.

Dimension 4: Revenue impact

The hardest to measure but often the largest.

Conversion rate. If the AI system handles client intake, measure enquiry-to-client conversion before and after. Faster response times and 24/7 availability typically improve conversion by 10 to 30 percent.

Average matter value. Better qualification means higher-quality matters. Track whether average matter value changes post-deployment.

Capacity utilisation. If AI frees solicitor time, is that time being used for billable work? Track billable hour changes for staff whose workflow is affected.

Client retention. Automated communications, faster responses, and more consistent service can reduce client churn. Measure retention rates over 6 to 12 month windows.

Building the reporting framework

Weekly tracking

Automated where possible. Your AI system should log: tasks processed, processing time, error flags, and system costs. Review these weekly for anomalies.

Monthly reporting

A one-page report to the project champion and management team:

  • Tasks processed this month versus baseline
  • Time saved this month (hours and financial equivalent)
  • Running costs this month
  • Net savings this month and cumulative
  • Notable issues or improvements
  • Usage rate (percentage of eligible tasks processed through AI)
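The monthly roll-up above can be sketched as a small data structure. Field names and figures are illustrative, not a prescribed schema; the example uses the 50 hours saved and £60 per hour blended cost from the financial dimension.

```python
from dataclasses import dataclass

@dataclass
class MonthlyAIReport:
    tasks_processed: int
    eligible_tasks: int            # total tasks the AI could have handled
    minutes_saved_per_task: float
    hourly_cost: float             # blended fully loaded staff cost
    running_costs: float           # API fees, hosting, maintenance this month

    @property
    def hours_saved(self) -> float:
        return self.tasks_processed * self.minutes_saved_per_task / 60

    @property
    def net_savings(self) -> float:
        return self.hours_saved * self.hourly_cost - self.running_costs

    @property
    def usage_rate(self) -> float:
        """Share of eligible tasks actually routed through the AI system."""
        return self.tasks_processed / self.eligible_tasks

report = MonthlyAIReport(tasks_processed=120, eligible_tasks=150,
                         minutes_saved_per_task=25, hourly_cost=60,
                         running_costs=500)
# hours_saved = 50, net_savings = £2,500, usage_rate = 80%
```

A usage rate well below 100 percent is often the first anomaly worth investigating: low adoption explains poor ROI more often than poor system performance.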

Quarterly review

A deeper analysis for partnership or board:

  • Cumulative ROI against original business case
  • Quality metrics and trend analysis
  • Staff and client satisfaction changes
  • Recommendations: expand, adjust, or discontinue
  • Proposal for next phase if results warrant

Common measurement mistakes

Measuring the wrong thing. Tracking system accuracy when the business cares about time savings. Track what matters to the business decision, not what is easiest to measure.

Not controlling for other variables. If you deployed AI and hired two new staff in the same month, how do you attribute the improvement? Control for concurrent changes or acknowledge the limitation in your reporting.

Reporting only successes. If the AI system is underperforming in one area, include that in the report. Honest measurement builds credibility. Cherry-picked metrics undermine trust.

Stopping measurement too early. Some benefits take months to materialise. Revenue impact, client retention, and quality improvements are slow-burn metrics. Commit to 12 months of measurement for a complete picture.

What we track at Formulaic

For every system we deploy, we track: tasks processed, time per task (before and after), system costs, net savings, error rates, and user satisfaction. We provide monthly reports to clients and conduct formal quarterly reviews.

Our median payback period across 30 production systems is 10 weeks. Our fastest was 3 weeks. Our slowest was 5 months. No system has failed to achieve positive ROI within 6 months.

The measurement discipline is not overhead. It is the engine that drives AI programme expansion. When a partner asks “does AI work?”, having a spreadsheet of actual results is worth more than any consultancy’s promises.

FAQ — RELATED QUESTIONS
What metrics should I track for AI ROI?

Four categories: efficiency (time saved per task, tasks processed per hour), financial (cost savings, revenue impact, cost per transaction), quality (error rates, rework rates, compliance incidents), and adoption (usage rates, user satisfaction, support tickets). Track all four, but lead with financial metrics for partner reporting.

How do I establish a baseline before deploying AI?

Measure the current state for 4 to 8 weeks before deployment: time per task, cost per task, error rate, volume processed, and staff satisfaction. Use time-tracking tools, process observation, and existing management data. Without a baseline, you cannot prove improvement.

How quickly should I expect to see measurable ROI?

Most well-targeted AI systems show measurable efficiency gains within 2 to 4 weeks of deployment. Financial ROI (cost savings exceeding investment) typically materialises within 2 to 5 months. Full programme ROI including indirect benefits takes 6 to 12 months to quantify.

How do I measure revenue impact from AI?

Track conversion rate changes (enquiry to client), time to onboard (faster onboarding means faster billing), client retention rates, and capacity freed for billable work. Revenue impact is harder to isolate than cost savings but often larger. Use before-and-after comparisons over 3-month windows.

What if the AI system is not delivering expected ROI?

First, verify your measurement is correct. Then check adoption rates. Low usage often explains low ROI better than system performance. If adoption is good but results are poor, investigate data quality, workflow fit, and whether the system is addressing the right problem. Honest assessment early prevents wasted investment.

How often should I report on AI ROI?

Track metrics weekly, report to stakeholders monthly, and conduct formal reviews quarterly. Monthly reporting maintains momentum and catches issues early. Quarterly reviews are the right cadence for strategic decisions about expansion or adjustment.

Should I include soft benefits like staff satisfaction in ROI calculations?

Track them separately but do not include them in the headline ROI figure. Partners and boards want hard financial metrics. Soft benefits like improved staff morale, reduced overtime, and better client feedback are real but subjective. Present them as supporting evidence, not primary justification.

How do you attribute results to AI versus other changes?

The cleanest approach is A/B comparison: one team or practice area uses AI while another does not, and you compare results. If that is not practical, use time-series analysis: measure performance before deployment, after deployment, and control for other changes (new hires, seasonal variation, process changes).

Andy Lackie

Founder, Formulaic. 12+ years building growth systems for professional services firms. Shipped 30 production AI systems across 6 clients.


Want personalised recommendations?

Take the AI Opportunity Scorecard for a benchmarked readiness score and three prioritised use cases specific to your firm. 3 minutes. Free.