REF-011 · UK · LEGAL · LEGAL-TECH

UK employment-law AI stack at 12× lower LLM cost. Corpus + claim helper + lawyer handoff — production in 11 phases.

A legal-tech firm needed three things at once: a corpus of UK employment-tribunal decisions structured for retrieval, an AI claim helper for inbound enquiries that hands off to lawyers when needed, and an LLM cost structure that didn't kill the unit economics. We built all three.

001_OUTCOMES
12×
LLM COST REDUCTION (vs HAIKU 4.5)
11
PRODUCTION PHASES SHIPPED
$38
ONE-TIME CORPUS BUILD COST
3
AI SURFACES — CORPUS · AGENT · FUNNEL
002_THE PROBLEM

Three problems at once: a corpus, a claim agent, and unit economics.

Mercer Employment Law is a UK legal-tech firm building AI-augmented services for unfair dismissal, discrimination, and settlement claims. Three things needed to work together: a structured, queryable corpus of UK employment-tribunal decisions; an AI claim helper that could engage inbound enquiries 24/7 and triage them; and a smooth handoff to qualified lawyers when a claim was worth pursuing.

The hard constraint was unit economics. Inbound enquiry volume was projected at thousands of conversations per month, and each conversation could touch the corpus multiple times for grounding. At the projected volumes, running that on Sonnet 4.5 or even Haiku 4.5 would have pushed monthly LLM cost into a band where the per-claim economics no longer worked.
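The cost pressure described above is simple arithmetic: conversations, times corpus calls per conversation, times tokens per call, times price per token. A minimal sketch follows. The function names are illustrative, and every number in it is a placeholder rather than a real price, volume, or token count from this project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """USD per million tokens. Any values passed in are the caller's assumptions."""
    input_per_mtok: float
    output_per_mtok: float


def monthly_llm_cost(conversations: int, corpus_calls_per_conversation: int,
                     input_tokens_per_call: int, output_tokens_per_call: int,
                     price: ModelPrice) -> float:
    """Monthly spend = number of calls x tokens per call x price per token."""
    calls = conversations * corpus_calls_per_conversation
    input_cost = calls * input_tokens_per_call / 1_000_000 * price.input_per_mtok
    output_cost = calls * output_tokens_per_call / 1_000_000 * price.output_per_mtok
    return input_cost + output_cost


def cost_per_conversation(monthly_cost: float, conversations: int) -> float:
    """Spread the monthly spend evenly across conversations."""
    return monthly_cost / conversations


# Placeholder inputs only: not real prices, volumes, or token counts.
placeholder_price = ModelPrice(input_per_mtok=1.0, output_per_mtok=2.0)
total = monthly_llm_cost(1_000, 4, 5_000, 500, placeholder_price)
print(f"monthly: ${total:.2f}, per conversation: ${cost_per_conversation(total, 1_000):.4f}")
```

Swapping in a second `ModelPrice` and comparing the two totals gives the cost ratio between candidate models for the same workload.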

The corpus itself was non-trivial. Tribunal decisions are dense legal documents: multi-page narratives with the tribunal's reasoning, statutory citations, factual findings, and quantum calculations. Extracting structured fields (case type, employer size, claim outcome, awarded compensation, key precedents cited) at corpus scale with off-the-shelf models was technically feasible but financially painful.
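The structured fields listed above could be captured in a small record type with validation applied to each model response. This is a sketch under assumptions: the field names, the choice of required fields, and the JSON shape are hypothetical, not the firm's actual schema.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TribunalDecisionRecord:
    """Hypothetical target schema for one extracted decision."""
    case_type: str
    claim_outcome: str
    employer_size: Optional[str] = None
    awarded_compensation_gbp: Optional[float] = None
    precedents_cited: list[str] = field(default_factory=list)


# Assumption: these two fields must always be present for a record to be usable.
REQUIRED_FIELDS = ("case_type", "claim_outcome")


def parse_extraction(raw_json: str) -> TribunalDecisionRecord:
    """Validate one model response and convert it into a typed record."""
    data = json.loads(raw_json)
    missing = [name for name in REQUIRED_FIELDS if not data.get(name)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    compensation = data.get("awarded_compensation_gbp")
    return TribunalDecisionRecord(
        case_type=str(data["case_type"]),
        claim_outcome=str(data["claim_outcome"]),
        employer_size=data.get("employer_size"),
        awarded_compensation_gbp=float(compensation) if compensation is not None else None,
        precedents_cited=[str(p) for p in (data.get("precedents_cited") or [])],
    )
```

Rejecting incomplete records at parse time keeps malformed extractions out of the retrieval index instead of surfacing them later as bad citations.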

003_WHAT WE BUILT

Corpus on Gemini Flash-Lite. Agent on Sonnet 4.5 with parallel sampling. Handoff via verified email + lawyer dashboard.

The corpus extraction pipeline runs on Gemini Flash-Lite. We benchmarked it against Haiku 4.5 on structured-JSON extraction from legal documents and found accuracy parity at roughly 12× lower per-token cost, a finding we now apply to other dense-document extraction workloads. The entire UK employment-tribunal corpus was extracted for a one-time spend of $38. Embeddings from voyage-law-2, built on top of the structured corpus, power citation-grounded retrieval.
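A benchmark of this kind can be scored as per-field exact-match accuracy against a hand-labelled reference set, with "parity" defined as a tolerance on the difference. The sketch below assumes that scoring method and a 2-point tolerance. Both are illustrative choices, not the benchmark protocol actually used.

```python
def field_accuracy(predictions: list[dict], gold: list[dict], fields: list[str]) -> dict[str, float]:
    """Share of documents where each extracted field exactly matches the reference."""
    if not gold:
        return {name: 0.0 for name in fields}
    correct = {name: 0 for name in fields}
    for predicted, reference in zip(predictions, gold):
        for name in fields:
            if predicted.get(name) == reference.get(name):
                correct[name] += 1
    return {name: correct[name] / len(gold) for name in fields}


def at_parity(acc_a: dict[str, float], acc_b: dict[str, float], tolerance: float = 0.02) -> bool:
    """True when every field's accuracy differs by no more than the tolerance."""
    return all(abs(acc_a[name] - acc_b[name]) <= tolerance for name in acc_a)


# Toy data for illustration only.
FIELDS = ["case_type", "claim_outcome"]
gold = [
    {"case_type": "unfair_dismissal", "claim_outcome": "upheld"},
    {"case_type": "discrimination", "claim_outcome": "dismissed"},
]
model_a = [dict(row) for row in gold]  # matches the reference on every field
model_b = [gold[0], {"case_type": "discrimination", "claim_outcome": "upheld"}]

print(field_accuracy(model_a, gold, FIELDS))
print(field_accuracy(model_b, gold, FIELDS))
```

Exact match suits categorical fields. Free-text or numeric fields such as compensation amounts would likely need a normalisation step before comparison.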

The claim helper runs on Sonnet 4.5 with parallel sampling for high-stakes decisions (ET1 form completion, settlement scenario modelling, demand letter drafting). The agent has access to the corpus for precedent-grounded reasoning, the firm's intake criteria for claim qualification, and the firm's published quantum benchmarks for compensation modelling.
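One way to apply parallel sampling to a high-stakes output is to request several candidates concurrently and flag the result for lawyer review when they disagree. The sketch below assumes majority agreement as the selection rule, a 0.6 threshold, and a hypothetical `generate(prompt, seed)` callable standing in for the model call. The source does not say how the samples are actually combined.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def sample_in_parallel(generate: Callable[[str, int], str], prompt: str, n: int = 5) -> list[str]:
    """Request n candidates concurrently. `generate` is a hypothetical model call."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(lambda seed: generate(prompt, seed), range(n)))


def select_by_agreement(candidates: list[str], min_agreement: float = 0.6) -> tuple[str, bool]:
    """Return (most common candidate, needs_review).

    needs_review is True when the share of samples agreeing with the winner
    falls below min_agreement (an assumed threshold).
    """
    counts = Counter(candidate.strip() for candidate in candidates)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(candidates) < min_agreement


def stub_generate(prompt: str, seed: int) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return "B" if seed == 3 else "A"


candidates = sample_in_parallel(stub_generate, "Which claim type applies?", n=5)
answer, needs_review = select_by_agreement(candidates)
```

Exact-string agreement only works for short categorical outputs such as a claim type. Long drafts like demand letters would need a different comparison, for example a separate judging step.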

The work shipped in 11 production phases. Each phase was a discrete capability shipped to staging, validated by legal-team review, then promoted to production. The funnel is frictionless: claimant arrives → agent engages → claim is assessed against firm criteria → if the firm is pursuing it, a structured handoff goes to a qualified lawyer with the case summary, suggested next steps, and full conversation log already attached.
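The funnel above can be modelled as a small state machine with a handoff package produced at the final step. This is a sketch under assumptions: the stage names, the `DECLINED` branch, and the package fields are hypothetical, chosen to mirror the steps in the text rather than the production data model.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(str, Enum):
    ARRIVED = "arrived"
    ENGAGED = "engaged"
    ASSESSED = "assessed"
    HANDED_OFF = "handed_off"
    DECLINED = "declined"


# Each stage may only move to the stages listed for it.
ALLOWED_TRANSITIONS = {
    Stage.ARRIVED: {Stage.ENGAGED},
    Stage.ENGAGED: {Stage.ASSESSED},
    Stage.ASSESSED: {Stage.HANDED_OFF, Stage.DECLINED},
}


def advance(current: Stage, target: Stage) -> Stage:
    """Move to the target stage, rejecting any step the funnel does not allow."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


@dataclass
class HandoffPackage:
    claimant_email: str
    case_summary: str
    suggested_next_steps: list[str]
    conversation_log: list[str]


def build_handoff(stage: Stage, email_verified: bool, claimant_email: str,
                  case_summary: str, next_steps: list[str], log: list[str]) -> HandoffPackage:
    """Assemble the lawyer handoff once the claim is assessed and the email verified."""
    advance(stage, Stage.HANDED_OFF)  # raises unless the claim has been assessed
    if not email_verified:
        raise ValueError("claimant email must be verified before handoff")
    return HandoffPackage(claimant_email, case_summary, list(next_steps), list(log))
```

Encoding the allowed transitions explicitly means a conversation cannot reach a lawyer without passing through assessment first.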

The lawyer dashboard surfaces every active conversation, every pending handoff, and every model output the agent produced — fully auditable. Lawyers can intervene mid-conversation if they spot an issue. Every claim that reaches a lawyer arrives already documented; lawyers spend their time on judgement, not transcription.

01
CORPUS EXTRACTION PIPELINE

UK employment-tribunal decisions structured on Gemini Flash-Lite — 12× cheaper than Haiku at equivalent extraction accuracy

02
VOYAGE EMBEDDINGS LAYER

voyage-law-2 + voyage-4-large for retrieval-grounded answers; corpus build was $38 one-time

03
CLAIM HELPER AGENT (SONNET 4.5)

Inbound engagement, intake triage, parallel sampling on high-stakes outputs

04
ET1 FORM AUTO-DRAFTING

From conversation transcript, claim type, and firm methodology; the lawyer reviews rather than writes

05
QUANTUM MODELLING

Compensation scenarios from corpus precedents + firm's published benchmarks

06
DEMAND LETTER DRAFTING

Structured demand templates populated from case facts + extracted precedents

07
FRICTIONLESS LAWYER HANDOFF

Verified email + structured case summary + conversation log delivered to assigned lawyer

08
LAWYER REVIEW DASHBOARD

Every active conversation, every handoff, every model output — fully auditable

09
LANGFUSE TRACING

Prompt management + trace replay for legal-team review and continuous improvement

10
OPENROUTER FALLBACK

Multi-provider LLM routing for reliability — agent doesn't go down when one provider does

11
STAGING ENVIRONMENT

Full mirror at internal subdomain for legal-team review of every new capability before production
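The retrieval step behind the embeddings layer (item 02 above) comes down to ranking corpus vectors by similarity to a query vector. Here is a minimal sketch using cosine similarity over plain Python lists. The case identifiers and two-dimensional vectors are toy placeholders, and a production system would obtain vectors from the embedding model and would likely use a vector index rather than a linear scan.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: list[float], corpus: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k corpus entries most similar to the query, best first."""
    scored = [(cosine(query_vec, vec), case_id) for case_id, vec in corpus.items()]
    scored.sort(reverse=True)
    return [(case_id, score) for score, case_id in scored[:k]]


# Toy vectors standing in for real embeddings.
toy_corpus = {"case-a": [1.0, 0.0], "case-b": [0.0, 1.0], "case-c": [1.0, 1.0]}
results = top_k([1.0, 0.0], toy_corpus, k=2)
```

Returning the case identifier alongside the score is what lets an answer cite the specific decision it was grounded in.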

004_THE OUTCOME

Unit economics work. The 12× LLM cost reduction on corpus extraction is the headline number; the same principle (right model for the workload, not the most capable model for everything) was applied to extraction-style operations throughout the stack. Per-claim cost is now well inside the band where the agent + lawyer model is profitable.

11 production phases shipped on cadence. Each was de-risked by legal-team review in staging before promotion to production. There were no surprises in production and no rollbacks.

The corpus is now firm infrastructure: it powers the agent, it powers an internal research tool the lawyers use directly, and it's the foundation for future productised offerings (settlement calculators, claim assessment widgets) that the firm is rolling out next quarter.

The frictionless handoff has changed the firm's intake economics. Lawyers spend their first five minutes on a case reading the AI-produced summary, not transcribing the claimant's first call. Per-lawyer claim throughput is materially higher.

005_MODEL-CHOICE RATIONALE

Three principles drove model selection. First, match the model to the workload — Gemini Flash-Lite for structured extraction (cheap, accurate enough); Sonnet 4.5 for client-facing reasoning + parallel sampling on high-stakes outputs. Second, route via OpenRouter so a single provider outage doesn't take the agent down. Third, instrument everything via Langfuse so the legal team can replay any conversation, any model output, any prompt.
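The second principle, surviving a single-provider outage, can be illustrated as an ordered fallback chain. The sketch below is client-side and generic. The provider names and callables are hypothetical stand-ins, and it does not reproduce OpenRouter's own API or routing behaviour.

```python
from typing import Callable, Sequence


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has failed."""


def complete_with_fallback(prompt: str,
                           providers: Sequence[tuple[str, Callable[[str], str]]]) -> tuple[str, str]:
    """Try each (name, call) pair in order and return (provider_name, completion)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # any provider error triggers the next fallback
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def failing_provider(prompt: str) -> str:
    """Stand-in for a provider that is currently down."""
    raise TimeoutError("simulated outage")


def working_provider(prompt: str) -> str:
    """Stand-in for a healthy provider."""
    return f"echo: {prompt}"


used, reply = complete_with_fallback(
    "Summarise the claim.",
    [("primary", failing_provider), ("secondary", working_provider)],
)
```

Returning the provider name with the completion makes it possible to record in the trace which provider actually served each response.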

Most legal-tech builds default to Sonnet 4.5 (or Sonnet 4.6 now) for everything — it's safe and convenient. The cost difference at corpus scale is meaningful. The cost difference at agent-conversation scale is the difference between a viable business and a venture-subsidised one.

What's your LLM cost ceiling — and what's actually driving it?

Book a 30-minute call to talk through where matching the model to the workload would change your firm's unit economics.
