Getting Over the Trust Hump: Navigating the Current State of Contract AI
Delivering data that can be trusted is the key to unlocking AI's potential in contract management.
In a nutshell, there’s reason to be excited about AI’s impact on contracts, but first we need to get over the trust hump. That’s my TL;DR takeaway from last month’s three-day digital event hosted by World Commerce & Contracting, dedicated to exploring the transformative impact of AI on contracting practices.
Here’s a summary of my personal observations:
Limited Adoption Despite Big Potential: Why Contract AI Is Still in the Early Stages
- There are many potential use cases for leveraging AI to solve contract challenges, but what people are actually doing right now is still quite narrow. Indeed, WorldCC’s latest survey data on the “current reality” of AI use cases lists five things that are all variants on “tell me what’s in my contracts” (February 2024). The top five items are metadata extraction, clause extraction, contract summarization, contract analytics, and obligation extraction. In each case, the proportion of businesses who say they’re actually doing it right now falls in the range of 9-12%, a relatively low adoption rate. Why so low? My best guess is that disappointing AI performance, uneven data quality, and the high cost of human review are the logical explanation. People are doing it, but only where the risk or value is high and the scope is tight, as in M&A due diligence contract reviews.
- Despite the large number of AI-powered playbook and redlining demos we see from vendors, the current reality for that particular use case appears to be even lower. The WorldCC survey results suggest that AI-influenced workflows are being used by only 4% of businesses and negotiation support by just 6%. The vast majority simply are not using AI for these things in the real world. Why not? Again, my best guess is real-world performance challenges. It looks okay in a demo but is not quite fit for purpose.
- Trust remains a persistent hurdle for AI customers. WorldCC survey data shows that almost half of the community (46%) cite data quality and lack of trust as barriers to AI adoption, second only to privacy and security concerns (57%). Even when AI gets it right most of the time, it still gets things wrong often enough to matter. This was true before Gen AI came along and remains true even with powerful new Gen AI models. In a sense, Gen AI has made the trust issue worse, because LLMs produce output that is designed to look right even when it isn’t. So now you know there are errors, but they are harder to detect, casting a shadow over large-scale use cases.
The Hallucination Dilemma: When 80% Accuracy Isn’t Enough
- How bad is the LLM hallucination problem? In various sessions, experts confirmed the 80% ballpark accuracy benchmark and the fact that it’s not clear which 80% you can trust (the believability problem). That means hallucinations are a problem at least 20% of the time.
- A few experts noted that retrieval-augmented generation (RAG) can reduce the rate of hallucinations but not eliminate them. RAG is a method of feeding factual data to an LLM and asking it to ground its answer in those facts. It can help, but as Agiloft’s Thomas Levi put it, anyone who says RAG works 100% of the time is “selling magic beans”.
- If AI can’t be trusted, what’s the solution? For most people, so far, their answer is “get humans to check everything”. This is why co-pilots are popular. There’s usually an expert human-in-the-loop to step in when the AI steps out of line. LegalSifter even has a term for the approach of bundling people + AI to automate contract processes: Contract Operations.
- A few of my favorite quotes from the event came via Fraser Hill at Shell: “bad data + insight = bad insight” and “bad data + AI = bad AI”. He also made the point that “if your underlying data – your data definitions, data models, data governance – are not in place, it doesn’t matter how much AI you have, it will not solve problems.” This is good advice. Get your data house in order before you attempt anything too fancy with AI.
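To make the RAG idea mentioned above concrete, here is a minimal sketch in Python. All names (`retrieve`, `build_grounded_prompt`) and the simple keyword-overlap scoring are illustrative assumptions, not anyone’s actual product: real systems typically use embedding-based retrieval over a vector store, and the resulting prompt would then be sent to an LLM.

```python
import re

def tokenize(text):
    """Lowercase word tokens; a stand-in for real embedding-based similarity."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(question, clauses, top_k=2):
    """Rank contract clauses by naive keyword overlap with the question."""
    q = tokenize(question)
    ranked = sorted(clauses, key=lambda c: len(q & tokenize(c)), reverse=True)
    return ranked[:top_k]

def build_grounded_prompt(question, clauses):
    """Ask the model to answer only from the retrieved excerpts."""
    context = "\n".join(f"- {c}" for c in retrieve(question, clauses))
    return (
        "Answer using ONLY the contract excerpts below. "
        "If the answer is not present, reply 'not found'.\n"
        f"Excerpts:\n{context}\n"
        f"Question: {question}"
    )

clauses = [
    "Either party may terminate this Agreement on 30 days written notice.",
    "Payment is due within 45 days of invoice receipt.",
    "This Agreement is governed by the laws of England and Wales.",
]
prompt = build_grounded_prompt("termination notice period", clauses)
print(prompt)
# Grounding the answer in retrieved text reduces hallucinations but cannot
# eliminate them: the model may still misread or embellish the excerpts.
```

The instruction to answer only from the excerpts is the grounding step; as the experts at the event noted, it lowers but does not remove the hallucination risk.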
AI Hype vs. Reality
- People and businesses are still trying to figure out just how much AI is real and useful and how much is hype. Personal enthusiasm for AI has grown, but organizational enthusiasm is much more evenly split between enthusiasts, skeptics and those who aren’t sure. Kingsley Martin, from KM Standards, seems like a classic enthusiast, with visions of autonomous AI negotiation agents and bots. Bernadette Bulacan, from Icertis, called herself a “realist”, which seems like a smart position to take. Ian Radford, a CCM and SRM expert, commented that much of AI feels like “we’re all in a dark room groping for the light switch.” (I think this is true for many potential users of the technology.)
- A few people commented on AI regulation. New regulations are popping up rapidly around the globe, but these are still very much works in progress, and not really consistent. The EU seems to be heading down a generalized AI regulatory path similar to its GDPR privacy regime. The UK, by contrast, is taking a more sector-specific approach.
- My final comment is a word of gratuitous advice for my fellow vendors. I am seeing too many poorly designed demos that are not based on real-world examples. When your demo looks flaky, it only fans the flames of distrust among potential users and customers of your product. If your prime-time demo doesn’t really work, keep improving your product until it does. All too often, watching AI demos in slow motion reveals that the tools (a) miss and fail to correct obvious typos, (b) suggest and implement edits that are already covered and therefore pointless, and (c) only work on clauses that have been artificially manufactured. If the Theranos implosion teaches us nothing else, it should be that “fake it till you make it” is not a great motto to live by.
For those seeking trusted data without the need for large teams of humans, keep an eye on what we have recently launched. Our newest feature AI Matching is specifically designed to combine Generative AI and Analytical AI in ways that deliver hallucination-free data with zero human effort. For the data geeks out there, this is the breakthrough you’ve been waiting for.