UK employment-law AI stack at 12× lower LLM cost. Corpus + claim helper + lawyer handoff — production in 11 phases.
A legal-tech firm needed three things at once: a corpus of UK employment-tribunal decisions structured for retrieval, an AI claim helper for inbound enquiries that hands off to lawyers when needed, and an LLM cost structure that didn't kill the unit economics. We built all three.
Three problems at once: a corpus, a claim agent, and unit economics.
Mercer Employment Law is a UK legal-tech firm building AI-augmented services for unfair dismissal, discrimination, and settlement claims. Three things needed to work together: a structured, queryable corpus of UK employment-tribunal decisions; an AI claim helper that could engage inbound enquiries 24/7 and triage them; and a smooth handoff to qualified lawyers when a claim was worth pursuing.
The technical challenge was unit economics. Inbound enquiry volume was projected at thousands of conversations per month. Each conversation could touch the corpus multiple times for grounding. Running that on Sonnet 4.5 or even Haiku 4.5 at the projected volumes pushed monthly LLM cost into a band where the per-claim economics no longer worked.
The corpus itself was non-trivial. Tribunal decisions are dense legal documents — multi-page narratives with judgement reasoning, statutory citations, factual findings, and quantum calculations. Extracting structured fields (case type, employer size, claim outcome, awarded compensation, key precedents cited) at corpus scale with off-the-shelf models was technically feasible but financially painful.
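As an illustration of what "structured fields" could look like in practice, here is a minimal sketch of a typed record plus a validator for one model response. The class name, field names, types, and the choice of required fields are all assumptions made for this example; the source lists the fields but does not publish the firm's schema.

```python
import json
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical schema: the fields mirror those listed in the text above,
# but the exact names and types are assumptions for illustration.
@dataclass
class TribunalDecision:
    case_type: str
    employer_size: Optional[str]
    claim_outcome: str
    awarded_compensation_gbp: Optional[float]
    key_precedents: list[str] = field(default_factory=list)

# Assumption: these two fields must always be present in a usable record.
REQUIRED = ("case_type", "claim_outcome")

def parse_extraction(raw_json: str) -> TribunalDecision:
    """Validate one JSON extraction and convert it to a typed record."""
    data = json.loads(raw_json)
    missing = [key for key in REQUIRED if not data.get(key)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    comp = data.get("awarded_compensation_gbp")
    return TribunalDecision(
        case_type=data["case_type"],
        employer_size=data.get("employer_size"),
        claim_outcome=data["claim_outcome"],
        awarded_compensation_gbp=float(comp) if comp is not None else None,
        key_precedents=list(data.get("key_precedents", [])),
    )
```

Rejecting incomplete records at parse time keeps malformed extractions out of the retrieval index, which matters when the extraction model is chosen for cost rather than maximum capability.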
Corpus on Gemini Flash-Lite. Agent on Sonnet 4.5 with parallel sampling. Handoff via verified email + lawyer dashboard.
The corpus extraction pipeline runs on Gemini Flash-Lite. We benchmarked it against Haiku 4.5 on structured-JSON extraction of legal documents and found accuracy parity at roughly 12× lower per-token cost — a material economic finding that we now apply to other dense-document extraction workloads. The entire UK employment-tribunal corpus was extracted for a one-time spend of $38. Voyage-law-2 embeddings on top of the structured corpus power citation-grounded retrieval.
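To put the two reported figures side by side, the arithmetic can be sketched as follows. It uses only the $38 spend and the approximate 12× ratio stated above, and assumes both models would consume the same number of tokens for the same documents, which is a simplification. The function and constant names are hypothetical.

```python
FLASH_LITE_SPEND_USD = 38.0  # one-time corpus spend reported above
COST_RATIO = 12.0            # approximate per-token cost ratio reported above

def equivalent_spend(spend_usd: float, ratio: float) -> float:
    """Estimate the same workload's cost on the more expensive model.

    Assumes identical token counts on both models (a simplification).
    """
    return spend_usd * ratio

haiku_estimate = equivalent_spend(FLASH_LITE_SPEND_USD, COST_RATIO)
savings = haiku_estimate - FLASH_LITE_SPEND_USD
print(f"estimated equivalent spend: ${haiku_estimate:.0f}, difference: ${savings:.0f}")
# prints "estimated equivalent spend: $456, difference: $418"
```

On a one-time corpus build the absolute difference is modest; the ratio matters more when the same extraction pattern runs continuously on new documents.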
The claim helper runs on Sonnet 4.5 with parallel sampling for high-stakes decisions (ET1 form completion, settlement scenario modelling, demand letter drafting). The agent has access to the corpus for precedent-grounded reasoning, the firm's intake criteria for claim qualification, and the firm's published quantum benchmarks for compensation modelling.
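The source does not say how the parallel samples are reconciled. One common approach for discrete decisions (for example, whether a claim meets intake criteria) is a majority vote with an agreement score, sketched below. The `generate` callable is a stand-in for a real model call, and the vote rule is an assumption; free-form outputs such as letter drafts would need a different comparison step.

```python
import asyncio
from collections import Counter
from typing import Awaitable, Callable

async def sample_in_parallel(
    generate: Callable[[str], Awaitable[str]],
    prompt: str,
    n: int = 5,
) -> tuple[str, float]:
    """Run n independent generations concurrently.

    Returns the most common answer and the fraction of samples that agreed.
    """
    answers = await asyncio.gather(*(generate(prompt) for _ in range(n)))
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n

def make_stub(canned: list[str]) -> Callable[[str], Awaitable[str]]:
    """Build a fake model that replays canned answers (for demonstration only)."""
    remaining = iter(canned)

    async def generate(prompt: str) -> str:
        await asyncio.sleep(0)  # yield control, as a real network call would
        return next(remaining)

    return generate
```

A low agreement score is a natural trigger for escalating the decision to a lawyer rather than acting on the majority answer.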
The project shipped in 11 production phases. Each phase was a discrete capability deployed to staging, validated by legal-team review, then promoted to production. The funnel is designed to be frictionless: the claimant arrives, the agent engages, and the claim is assessed against the firm's criteria. If the firm decides to pursue the claim, the agent makes a structured handoff to a qualified lawyer, with the case summary, suggested next steps, and full conversation log already attached.
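The handoff step described above can be pictured as a small, typed payload that is only built when a claim passes the firm's criteria. This is a minimal sketch: the `Handoff` class, its field names, and the `build_handoff` helper are hypothetical, and the real criteria check and email delivery are outside its scope.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical payload: it carries the three attachments named in the text.
@dataclass
class Handoff:
    lawyer_email: str
    case_summary: str
    next_steps: list[str]
    conversation_log: list[str]

def build_handoff(
    meets_criteria: bool,
    lawyer_email: str,
    case_summary: str,
    next_steps: list[str],
    conversation_log: list[str],
) -> Optional[Handoff]:
    """Return a handoff payload for claims being pursued, otherwise None."""
    if not meets_criteria:
        return None
    return Handoff(
        lawyer_email=lawyer_email,
        case_summary=case_summary,
        next_steps=list(next_steps),
        conversation_log=list(conversation_log),  # full log, copied unmodified
    )
```

Returning `None` for claims that are not pursued keeps the decision explicit at the call site, so nothing reaches a lawyer's queue by accident.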
The lawyer dashboard surfaces every active conversation, every pending handoff, and every model output the agent produced — fully auditable. Lawyers can intervene mid-conversation if they spot an issue. Every claim that reaches a lawyer arrives already documented; lawyers spend their time on judgement, not transcription.
UK employment-tribunal decisions structured on Gemini Flash-Lite — 12× cheaper than Haiku at equivalent extraction accuracy
voyage-law-2 + voyage-4-large for retrieval-grounded answers; corpus build was $38 one-time
Inbound engagement, intake triage, parallel sampling on high-stakes outputs
Drafted from the conversation transcript, claim type, and firm methodology; the lawyer reviews rather than writes
Compensation scenarios from corpus precedents + firm's published benchmarks
Structured demand templates populated from case facts + extracted precedents
Verified email + structured case summary + conversation log delivered to assigned lawyer
Every active conversation, every handoff, every model output — fully auditable
Prompt management + trace replay for legal-team review and continuous improvement
Multi-provider LLM routing for reliability — agent doesn't go down when one provider does
Full mirror at internal subdomain for legal-team review of every new capability before production
Unit economics work. The 12× LLM cost reduction on corpus extraction is the headline number; the same principle (right model for the workload, not the most capable model for everything) was applied to extraction-style operations throughout the stack. Per-claim cost is now well inside the band where the agent + lawyer model is profitable.
11 production phases shipped on cadence. Each was de-risked by legal-team review at staging before production promotion. No surprises in prod; no rollbacks.
The corpus is now firm infrastructure: it powers the agent, it powers an internal research tool the lawyers use directly, and it's the foundation for future productised offerings (settlement calculators, claim assessment widgets) that the firm is rolling out next quarter.
The frictionless handoff has changed the firm's intake economics. Lawyers spend their first 5 minutes on a case reading the AI-produced summary, not transcribing the customer's first call. Per-lawyer claim throughput is materially higher.
Three principles drove model selection. First, match the model to the workload — Gemini Flash-Lite for structured extraction (cheap, accurate enough); Sonnet 4.5 for client-facing reasoning + parallel sampling on high-stakes outputs. Second, route via OpenRouter so a single provider outage doesn't take the agent down. Third, instrument everything via Langfuse so the legal team can replay any conversation, any model output, any prompt.
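The failover idea behind the second principle can be sketched generically as an ordered list of providers tried in turn. This is not OpenRouter's API, which handles routing as a managed service; the function name, exception class, and provider callables below are hypothetical stand-ins for real client calls.

```python
from typing import Callable, Sequence

class AllProvidersFailed(RuntimeError):
    """Raised when every configured provider has errored."""

def complete_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each provider in order; return (provider_name, completion).

    Each callable is a stand-in for a real client call and is expected
    to raise an exception on outage or timeout.
    """
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

Returning the provider name alongside the completion makes it straightforward to record which backend served each response in the trace log.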
Most legal-tech builds default to Sonnet 4.5 (or Sonnet 4.6 now) for everything — it's safe and convenient. The cost difference at corpus scale is meaningful. The cost difference at agent-conversation scale is the difference between a viable business and a venture-subsidised one.
What's your LLM cost ceiling — and what's actually driving it?
Book a 30-minute call to talk through where matching the model to the workload would change your firm's unit economics.