Legal AI Might Be Accurate… And Still Not *Right*

Everyone knows about hallucinations. Well, apparently not everyone, which is why hallucinations provide so much amusement. Lawyers keep putting them into their briefs and, sometimes, lying about it when caught. Judges are even getting in on the action with hallucinations of their own. The plague of hallucinations remains the most discussed AI threat for lawyers.

But one AI weak spot that gets almost no attention, despite being arguably more dangerous, is the case where AI is both perfectly accurate and fundamentally incomplete.

Hallucinations, while pernicious, should be caught by a human. Sending a brief out the door without cite checking is a you problem, not an AI problem. Sure, the AI hype cycle and seductively confident interface may be making lawyers dumber, but to borrow from Smokey the Bear: only YOU can prevent yourself from stupidity.

But incompleteness arises under a whole different set of circumstances. It’s one thing to search a few hundred cases for helpful precedent, and another to scour millions of documents to make sure there’s nothing harmful in there. This is work that humans simply can’t manage on their own and there’s no equivalent to cite-checking when the whole assignment is to “prove a negative.” If AI set to that task misses a document, it’s an “unknown unknown.”

And missing a lone prior-art document buried in the weeds can mean millions in patent litigation.

A new case study delves into this risk of unknown unknowns and how to get ahead of it. Melange, a patent analytics company, set out to build patent search, monitoring, and mapping tools to aid clients in high-stakes intellectual property issues. Finding those obscure prior art gems hidden within hundreds of millions of global patent filings, machine-translated foreign documents, obscure academic papers, and technical manuals can have massive repercussions in litigation. The average cost of patent litigation runs between $2.3 million and $4 million and the average damages clock in around $24 million. A single missed prior-art document can materially shift those numbers.

“When we find that one killer piece of prior art that the customer thinks might win their case, we lock in the customer for life,” Melange CEO Joshua Beck says.

What Melange discovered is that the primary risk in this work isn’t model quality or the dreaded hallucinations, but infrastructure reliability. Scaling AI search to deal with hundreds of millions of global patents and technical papers takes a lot, and a self-hosted system will struggle with incomplete recall and downtime. And when Melange sought to scale from a manageable 40 million documents to the full global patent corpus of roughly 450 million, they identified this precise problem. Those drawbacks could cost a client money.

So they hooked up with Pinecone, a vector database provider, to address the underlying infrastructure. Using AI for massive searches is limited by recall. “Intelligence, provided by the LLM, is a ‘frozen’ reasoning engine that knows how to think and process language,” Pinecone CEO Ash Ashutosh explained. “Knowledge, on the other hand, represents the dynamic, factual state of the world that the LLM must draw upon. Without infrastructure that derives knowledge from proprietary data, even the most intelligent model is prone to costly, inaccurate, and incomplete conclusions.” An infrastructure that supports these insanely high levels of recall is what keeps “index structures stable and query performance predictable,” as the study notes. For most tasks a 90 percent recall is all well and good, but when seeking out prior art, a 10 percent failure rate isn’t acceptable. With Pinecone’s help, Melange scaled beyond 600 million documents without reliability issues.
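For readers who want to see what "90 percent recall" means in practice, here is a minimal sketch in Python. It is not taken from the case study, and the document IDs and the `recall` helper are hypothetical, made up purely for illustration. It shows how a search can return nothing but real, relevant documents and still leave one behind.

```python
def recall(retrieved_ids, relevant_ids):
    """Fraction of the truly relevant documents that the search returned."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant documents")
    found = relevant & set(retrieved_ids)
    return len(found) / len(relevant)


# Hypothetical example: ten prior-art documents actually matter to the case.
relevant = {f"doc-{i}" for i in range(10)}

# The search surfaces nine of them. Every hit is genuine, so nothing is hallucinated.
retrieved = [f"doc-{i}" for i in range(9)]

score = recall(retrieved, relevant)
missed = relevant - set(retrieved)

print(score)   # 0.9
print(missed)  # {'doc-9'}
```

Every result in that list would survive a cite check, which is exactly why the missing tenth document is so hard to notice. The catch is that in a real prior-art search nobody hands you the full `relevant` set up front, so recall can only be estimated against test collections where the answers are already known.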

In the human quest to be distracted by shiny objects, we’re obsessed with debating the merits of these new algorithms and pointing and laughing at the hallucinations. “Lawyers should stop focusing solely on the brain (the model) and start asking about the nervous system (the infrastructure),” Beck explained. “Can this system accurately and reliably scale to the full universe of data without degrading?” If your vendor can’t answer that clearly, well, that is your answer.

Of course, patent litigation is just one context. Run-of-the-mill discovery faces the same problems. At conferences, attendees chatter about new context limits and clever workarounds slowly chipping away at the problem, but building infrastructure capable of handling the load is a key part of the equation. Because the model itself may be “accurate,” but if it’s incomplete, it’s still not right.

Millions at Stake: How Melange’s High-Recall Retrieval Prevents Litigation Collapse [Pinecone]

Joe Patrice is a senior editor at Above the Law and co-host of Thinking Like A Lawyer. Feel free to email any tips, questions, or comments. Follow him on Twitter or Bluesky if you’re interested in law, politics, and a healthy dose of college sports news. Joe also serves as a Managing Director at RPN Executive Search.
