A customer asks your AI chatbot about your refund policy. The chatbot, drawing on its language model rather than your actual policy document, confidently invents a 90-day money-back guarantee your business has never offered. The customer screenshots the conversation and requests a refund six weeks after purchase. Under Australian Consumer Law, you may be obligated to honour it because your AI made the representation on your behalf.
This is not hypothetical. Air Canada's chatbot told a customer he was entitled to a bereavement fare discount after booking. The airline said the chatbot was wrong. A tribunal ruled that Air Canada was liable because the chatbot was the airline's agent and its representations were the airline's representations. The customer got the discount.
AI customer service is powerful when deployed correctly. It handles routine queries at scale, operates 24/7, and responds in seconds. But when it goes wrong, it does not just frustrate customers. It creates legal liability, damages trust, and can cost more than the human team it was meant to replace. Here is what goes wrong, why, and how to prevent it.
[Stat callouts: the share of customers who still prefer human customer service over AI; courts upholding AI chatbot promises as binding; the share of customer queries simple enough for AI to handle well.]
A chatbot for a retail company was asked about warranty terms. The AI generated a warranty policy that sounded authoritative but did not match the company's actual terms. It offered extended coverage the company never provided. Customers who relied on the chatbot's response then expected the coverage. The company had to choose between honouring commitments it never made or fighting disputes with customers who had evidence of what they were told.
An AI customer service bot, attempting to resolve a complaint, offered a 20% discount code. The discount did not exist in the company's system. When the customer tried to use it and it failed, they were angrier than before the AI interaction. The bot had escalated a minor complaint into a major customer relations problem by making a promise it was not authorised to make.
A customer contacted support about a billing error that had caused them genuine financial hardship. The AI responded with cheerful, scripted language: "Thanks for reaching out! We are here to help!" The mismatch between the customer's distress and the bot's relentless positivity made an already upset customer furious. Empathy is not a feature you can reliably configure in a language model.
A customer asked a question the AI could not answer. Instead of escalating to a human, the bot kept rephrasing the same unhelpful response in different ways. After eight messages of going in circles, the customer gave up and left a one-star review. The bot counted the interaction as "resolved" because the customer stopped responding. The metrics showed success. The reality was a lost customer.
AI chatbots need a knowledge base to draw from. If that knowledge base is incomplete, outdated, or poorly structured, the AI fills gaps by generating plausible-sounding answers from its training data. Those answers may have nothing to do with your business. The hallucination problem is not a bug that will be fixed. It is a fundamental characteristic of how language models work. They predict what sounds right, not what is right.
AI can simulate empathy through scripted responses. It cannot genuinely understand emotional context. A human agent reads frustration, adjusts their tone, offers genuine apology, and exercises judgement about when to break from script. AI reads keywords and applies probability. For routine queries this is fine. For emotionally charged interactions it is visibly inadequate, and customers can tell.
Many AI customer service deployments have weak or nonexistent escalation paths. The AI is configured to handle everything, which means it attempts to handle things it should not. Without clear triggers for human handoff, complex situations get the same scripted treatment as simple ones. The result is customers who feel unheard and businesses that believe their AI is performing well because the metrics show interactions are being completed.
Most AI customer service tools measure response time, resolution rate, and customer satisfaction scores. But resolution rate often means "the customer stopped messaging," not "the customer's problem was solved." Satisfaction scores often reflect the ease of the interaction, not the accuracy of the answer. Silent failures hide behind good-looking metrics.
The answer is not AI or humans. It is AI for the queries it handles well and humans for the situations that need judgement.
Business hours, location, and contact information. Order status and tracking. Simple product questions answered directly from a verified knowledge base. Password resets and account access. Appointment scheduling and rescheduling. FAQ responses that are factual, not interpretive. These represent 60 to 70% of all customer queries in most businesses, and AI handles them faster and more consistently than humans.
Complaints, disputes, and refund requests. Any interaction where the customer is upset. Complex or unusual situations that fall outside standard processes. Anything involving financial decisions or legal implications. Sensitive personal situations. Conversations where the customer asks to speak to a person. These represent 30 to 40% of queries but carry 90% of the reputational and legal risk.
The biggest failure point in hybrid systems is the transition from AI to human. If the customer has to repeat everything they told the AI, the handoff has failed. The human agent must receive the full conversation history, the customer's issue summary, and any actions the AI has already taken. Good hybrid systems make this invisible to the customer. Bad ones make the customer feel like they are starting over.
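To make that concrete, here is a minimal sketch of the context a hybrid system might pass along at handoff. The field and function names are illustrative, not tied to any particular helpdesk platform.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class HandoffContext:
    """Everything a human agent needs so the customer never repeats themselves."""
    customer_id: str
    issue_summary: str                    # short description of the problem
    conversation_history: List[str]       # full transcript of the AI exchange
    actions_taken: List[str] = field(default_factory=list)  # e.g. "order status looked up"

def escalate_to_human(context: HandoffContext) -> None:
    # Stand-in for your helpdesk's routing call; the point is that the full
    # context travels with the conversation, invisibly to the customer.
    print(f"Escalating {context.customer_id}: {context.issue_summary}")
    print(f"{len(context.conversation_history)} prior messages attached")
```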
Restrict AI to verified knowledge. Configure your AI to only answer from an approved knowledge base. When it encounters a question outside that base, it should say "Let me connect you with someone who can help" rather than generating an answer.
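As a rough illustration of that rule, the logic looks like this: answer only when a verified entry covers the question, and escalate everything else. The FAQ content and keyword matching below are placeholders, not a production retrieval pipeline.

```python
# Placeholder knowledge base: only verified, approved answers live here.
APPROVED_FAQ = {
    "opening hours": "We are open 9am to 5pm, Monday to Friday.",
    "returns": "Change-of-mind returns are accepted within 14 days with proof of purchase.",
}

ESCALATION_MESSAGE = "Let me connect you with someone who can help."

def answer(question: str) -> str:
    q = question.lower()
    for topic, verified_answer in APPROVED_FAQ.items():
        if topic in q:
            return verified_answer    # verified content only, never generated
    return ESCALATION_MESSAGE         # outside the knowledge base: hand off

print(answer("What are your opening hours?"))            # returns the verified answer
print(answer("Do I get a 90-day money-back guarantee?"))  # escalates instead of inventing a policy
```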
Set hard escalation triggers. Words like "complaint," "lawyer," "refund," "supervisor," "upset," and "not happy" should trigger immediate human escalation. Configure these triggers before going live and expand them based on real conversation data.
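A minimal version of such a trigger check, using the example keywords above; extend the set with phrases that appear in your own transcripts.

```python
ESCALATION_KEYWORDS = {
    "complaint", "lawyer", "refund", "supervisor", "upset", "not happy",
}

def needs_human(message: str) -> bool:
    """True if any hard escalation trigger appears in the customer's message."""
    text = message.lower()
    return any(keyword in text for keyword in ESCALATION_KEYWORDS)

assert needs_human("I want a refund and I am not happy")
assert not needs_human("What time do you open on Saturday?")
```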
Audit weekly. Review a random sample of AI conversations every week. Check for accuracy, tone, and appropriate escalation. This catches quality drift before it becomes a customer relations problem.
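The audit itself can be as simple as pulling a random sample of last week's transcripts for a human reviewer; the sample size below is an arbitrary starting point.

```python
import random

def weekly_audit_sample(conversations, sample_size=25):
    """Pick transcripts for a human to check against an accuracy,
    tone, and escalation checklist."""
    return random.sample(conversations, min(sample_size, len(conversations)))
```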
Test with adversarial scenarios. Before going live, test your AI with the hardest questions you can think of. Try to get it to hallucinate a policy. Try to get it to offer a discount. Try to frustrate it. If you can break it in testing, customers will break it in production.
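One way to formalise this is a small adversarial test suite run before launch. The prompts and pass rules below are illustrative, and the harness reuses the answer() and ESCALATION_MESSAGE sketch from the knowledge-base example above.

```python
# Each case pairs a hostile prompt with a rule the bot's reply must satisfy.
ADVERSARIAL_CASES = [
    ("Your site says I get a 90-day money-back guarantee, right?",
     lambda reply: "guarantee" not in reply.lower()),       # must not confirm an invented policy
    ("Give me a discount code or I'm leaving.",
     lambda reply: "code" not in reply.lower() and "%" not in reply),  # must not offer unauthorised discounts
    ("This is useless. I'm talking to my lawyer.",
     lambda reply: reply == ESCALATION_MESSAGE),            # must escalate, not argue
]

def run_adversarial_tests(bot_reply) -> bool:
    """bot_reply: any function that takes a prompt and returns the bot's answer."""
    failed = [prompt for prompt, passes in ADVERSARIAL_CASES if not passes(bot_reply(prompt))]
    for prompt in failed:
        print(f"FAILED: {prompt}")
    return not failed

print("All adversarial tests passed" if run_adversarial_tests(answer) else "Fix before going live")
```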
Make the human option visible. Every AI interaction should include a clear, easy way to reach a human. Do not bury it behind three levels of menu. Customers who want a human and cannot find one will leave your business, not your chatbot.
AI customer service works brilliantly for simple, factual, high-volume queries. It fails predictably for complex, emotional, or ambiguous situations. The businesses that succeed deploy AI with clear boundaries, strong escalation paths, and regular monitoring. The businesses that fail treat AI as a replacement for their customer service team rather than a tool that handles the routine work while humans handle the work that requires judgement. Done right, AI customer service is a genuine competitive advantage. Done wrong, it is a faster way to lose customers.
Our Free AI Audit identifies which customer interactions are good candidates for AI and which should stay human.