Artificial Intelligence is disrupting every industry, everywhere, at a speed never seen before. It’s hard to keep up and feel ready for whatever good or bad AI will bring into our work and lives.
As part of this interview series by Website Planet, I talk to AI experts and executives from the best tech companies in the world, who share their insights and tips to help us navigate the practical and often under-discussed realities of AI.
James Evans is the Founder and CEO of Command AI (formerly CommandBar), a startup developing a search plugin for web apps. The company has so far raised $24 million in funding and built a user base of over 20 million, including companies like Hashicorp and Hubspot. Before establishing Command AI, Evans co-founded codePost and worked as a Private Equity Analyst at Bain Capital. With Command AI, James has pioneered a pragmatic approach to dealing with AI hallucinations, a topic that many in the industry would rather avoid.
James will help us understand why AI hallucinations are inevitable and explain the thought process that helped Command AI create a robust system to detect, address, and continuously improve its AI models.
What are AI hallucinations, and why do they happen in generative models?
Hallucinations happen when an AI model asserts something that is not “correct”. Of course, AI only knows correct from incorrect based on its training set, so a hallucination is usually something the AI made up that was not contained within the training set. This can range from an incorrect extrapolation of the training data to claims that don’t seem grounded in the training data at all.
In a “raw” AI system that isn’t architected specifically to avoid them, hallucinations occur because a generative model has no concept of right or wrong. It is simply a prediction of the text that should occur in response to another string of text (usually a question or command by the human interacting with the AI).
Generative AI has no “correctness engine” to ensure that what it is saying is correct; it is more of a “plausibility engine”.
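To make the “plausibility engine” point concrete, here is a toy sketch in Python (purely illustrative, with made-up probabilities, not Command AI’s code): the model only weighs how likely each continuation is, and nothing in the sampling step checks whether the chosen continuation is true.

```python
import random

# Toy illustration: a generative model ranks continuations by plausibility,
# not by truth. The probabilities below are invented for the example.
next_token_probs = {
    "Paris": 0.55,      # plausible and correct
    "Lyon": 0.25,       # plausible but wrong
    "Marseille": 0.20,  # plausible but wrong
}

def sample_next_token(probs):
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]

# "The capital of France is ..." -> usually "Paris", but sometimes a
# confident-sounding wrong answer; nothing in this step checks facts.
print(sample_next_token(next_token_probs))
```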
Why do so many companies claim to be ‘hallucination-free’ despite evidence to the contrary?
Eliminating hallucinations is the holy grail of generative AI. I really like the mental model that Nat Friedman came up with for generative AI as it exists today: Imagine we encountered a planet with 100 billion mid-level IQ people willing to work for free. What would we do with them?
Eliminating hallucinations is like turning those 100 billion people into extremely conscientious, fact-checking-obsessed people who would rarely, if ever, make mistakes.
The problem is that the definition of hallucination is too broad to make any AI hallucination-free. At best, you can make smaller claims like “our AI won’t reference any facts outside of its training set”. But even that is hard because the whole point of generative AI is to string together information found in a source into personalized responses to a user’s query. And doing so almost always requires filling in the gaps a bit.
Vendors like to claim their AI is hallucination-free because the chief concern many companies have when considering the adoption of AI – whether for an internal use case or an external-facing one like a support bot – is that the AI will say something very wrong and there will be consequences.
Command AI assumes hallucinations will happen from the start. Why is this assumption critical to your approach? In what ways is it an advantage?
We’ve made the decision that we’re not going to train our own foundation models. We want to focus on being world-class experts at utilizing foundation models (partnering with companies like OpenAI, Google, etc) for the problem of helping users and customers with websites and software generally. That means we have to work within the bounds of today’s foundation models, which include occasional hallucinations.
Starting with this assumption is critical for two reasons:
It helps build customer trust. Because hallucinations always happen, the customer must expect that from the get-go. Otherwise, you are selling snake oil that will eventually spoil.
It pushes us to invest in tooling and monitoring that help detect and then mitigate hallucinations.
You’ve developed a process to automatically surface hallucinations. Can you explain how it works?
For example, we have alerting in our product that detects when certain topics are receiving a higher-than-expected level of negative user feedback. That’s a signal that the result is incorrect in some way. So we rely on the end-user to give us a signal that something is off. The end user’s interpretation of the information is a critical source of ground truth for us.
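A system like the one James describes could be sketched roughly as follows; this is a hypothetical illustration with made-up function names and thresholds, not Command AI’s actual alerting pipeline. The idea is simply to flag topics whose negative-feedback rate runs well above the overall baseline.

```python
from collections import defaultdict

def topics_to_review(feedback_events, min_volume=20, multiplier=2.0):
    """feedback_events: iterable of (topic, is_negative) tuples from end users."""
    totals, negatives = defaultdict(int), defaultdict(int)
    for topic, is_negative in feedback_events:
        totals[topic] += 1
        negatives[topic] += int(is_negative)

    # Baseline negative-feedback rate across all topics.
    overall_rate = sum(negatives.values()) / max(sum(totals.values()), 1)

    flagged = []
    for topic, count in totals.items():
        if count < min_volume:
            continue  # not enough signal on this topic yet
        rate = negatives[topic] / count
        if rate > multiplier * overall_rate:
            flagged.append((topic, rate))

    # Worst offenders first, ready for a human to review the underlying answers.
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```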
Once a hallucination is detected, what’s the next step?
Once alerted, our customers have several tools available to guide our bot (Copilot) away from the hallucination. For example, they can provide a custom answer override. Or provide instructions in a primitive called a “Workflow”.
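The override idea can be sketched in a few lines; the names and lookup logic below are hypothetical illustrations, not Command AI’s API. A curated answer, when one exists, takes precedence over whatever the model would generate.

```python
# Hypothetical sketch of an answer override: curated answers win over the model.
CUSTOM_ANSWERS = {
    "how do i reset my password": "Go to Settings > Security and click 'Reset password'.",
}

def answer(question, call_model):
    key = question.strip().lower().rstrip("?")
    if key in CUSTOM_ANSWERS:
        return CUSTOM_ANSWERS[key]   # human-curated override
    return call_model(question)      # otherwise defer to the generative model

# Example: answer("How do I reset my password?", call_model=lambda q: "model answer")
```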
Sometimes, the answer is “This isn’t a hallucination”. In those instances, a customer can explain why the answer is receiving negative feedback. Users give negative feedback for reasons other than “this answer was incorrect”. For example, they may simply not like the answer (e.g. if they asked about a return policy and didn’t like the response!).
Do you think the industry is moving towards more transparency, or what can be done to improve in this area?
Transparency in the limit is inevitable. Right now, there’s a lot of market confusion around what is and isn’t possible from foundation models, and many companies are claiming “breakthroughs” to get attention and get their foot in the door. But hallucinations are easy to see when they happen, so I can’t see that strategy working out long-term.
Will we ever reach a point where hallucinations are entirely avoidable? What new technologies or methodologies can help reduce or better manage hallucinations?
Just because we assume hallucinations will happen, doesn’t mean we shouldn’t try to stop them. Our architecture includes several steps to mitigate hallucinations:
An “adversarial” step where we compare several possible answers from our AI agent and, if they are too different from one another, instruct the main agent to clarify the user’s question rather than providing an answer (a rough sketch of this idea follows the list)
Allowing our AI agents to learn from previous user interactions. For example, if users haven’t responded well to an answer that includes certain information, our agents will know that and can choose to answer differently.
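The adversarial comparison step could be sketched roughly like this, assuming a simple word-overlap similarity and made-up thresholds; it is an illustration of the idea, not Command AI’s implementation.

```python
import itertools

def jaccard_similarity(a, b):
    # Crude word-overlap similarity between two answers.
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    return len(words_a & words_b) / max(len(words_a | words_b), 1)

def answer_or_clarify(question, sample_answer, n=3, min_agreement=0.5):
    """sample_answer(question) returns one candidate answer from the model."""
    candidates = [sample_answer(question) for _ in range(n)]

    # Average pairwise agreement across the sampled answers.
    pairs = list(itertools.combinations(candidates, 2))
    agreement = sum(jaccard_similarity(a, b) for a, b in pairs) / len(pairs)

    # If the candidates disagree too much, ask for clarification instead of guessing.
    if agreement < min_agreement:
        return "Could you clarify your question? I want to make sure I answer the right thing."
    return candidates[0]
```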
What key takeaways would you like our listeners to remember when dealing with AI hallucinations?
Assume hallucinations will happen with GenAI, keep a watchful eye on your interaction logs (especially if you’re building user/customer-facing AI), and remember that humans hallucinate too 🙂