AI Chatbots Fail Consumer Safety Tests: Which? Study Reveals Critical Risks

A comprehensive Which? investigation reveals that popular AI chatbots frequently provide dangerously misleading advice on consumer rights, financial, and legal matters, with systematic failures that could put users at significant risk. The study found critical gaps in accuracy across major platforms, highlighting enterprise risks and the need for better consumer protections in AI systems.

Consumer advocacy group Which? tested several leading AI assistants, including ChatGPT, Google Gemini, and Microsoft Copilot, and found that they frequently gave misleading, and sometimes dangerous, advice on consumer rights, financial, and legal questions. The failures were systematic rather than isolated, and could put users at real risk when they turn to chatbots for guidance on important life decisions.

The Testing Methodology and Scope

Which? researchers put 80 consumer-focused scenarios to the chatbots, spanning high-stakes categories where inaccurate information could have serious consequences: legal rights, financial planning, healthcare decisions, and technical product safety. Expert assessors scored each response for accuracy, completeness, and safety.
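The article does not publish Which?'s scoring rubric. As a purely hypothetical illustration of how per-criterion expert scores might be combined, the sketch below assumes each response receives a 1 to 5 score for accuracy, completeness, and safety, and treats any safety score below 3 as a failure regardless of the average. The scale, the threshold, and all names (`AssessorScore`, `overall`, `is_unsafe`, `SAFETY_FLOOR`) are assumptions for illustration, not Which?'s method.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical rubric: 1 (poor) to 5 (excellent) per criterion.
# The scale and the safety floor are illustrative assumptions only.
SAFETY_FLOOR = 3


@dataclass
class AssessorScore:
    accuracy: int
    completeness: int
    safety: int

    def __post_init__(self) -> None:
        # Reject scores outside the assumed 1-5 scale.
        for name in ("accuracy", "completeness", "safety"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")


def overall(score: AssessorScore) -> float:
    """Equal-weighted mean of the three criteria."""
    return mean([score.accuracy, score.completeness, score.safety])


def is_unsafe(score: AssessorScore) -> bool:
    """Fail a response outright if safety is below the floor,
    even when its average looks acceptable."""
    return score.safety < SAFETY_FLOOR


if __name__ == "__main__":
    # A fluent, mostly accurate answer that omits a key safety caveat.
    answer = AssessorScore(accuracy=4, completeness=4, safety=2)
    print(round(overall(answer), 2), is_unsafe(answer))
```

Keeping safety as a separate pass/fail check, rather than folding it into the average, reflects the article's concern that a confident but harmful answer can otherwise look acceptable.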

No chatbot performed perfectly: every major platform showed gaps in its ability to give reliable consumer guidance. Some failed to recognize when they lacked enough information to answer safely and instead gave confident but incorrect responses that could steer users toward harmful decisions.

Critical Safety Failures Identified

Legal and Rights Misinformation

One of the most troubling findings involved legal advice scenarios. When asked about tenant rights regarding deposit disputes, several chatbots provided outdated information that referenced legislation no longer in force. In one instance, a chatbot incorrectly advised that landlords could retain deposits for minor wear and tear, contradicting current UK tenant protection laws.

Similarly, when questioned about consumer rights for faulty products, some AI systems failed to mention key statutory protections or provided incorrect timeframes for returns and refunds. This type of misinformation could prevent consumers from exercising their legitimate rights and seeking appropriate remedies.

Financial Guidance Risks

The financial advice category revealed particularly dangerous shortcomings. When asked about investment strategies, some chatbots recommended approaches that financial regulators have specifically warned against as high-risk for retail investors. Others provided oversimplified tax advice that failed to account for individual circumstances, potentially leading to compliance issues or unexpected tax liabilities.

One chatbot suggested a pension withdrawal strategy that could have resulted in significant tax penalties, while another recommended a mortgage product unsuitable for the hypothetical user's financial situation. These findings highlight the limitations of AI systems in understanding complex, context-dependent financial regulations.

Healthcare Misinformation Concerns

Although healthcare was not the main focus of the consumer rights testing, the chatbots' answers to health questions often contradicted established NHS guidelines. Some systems recommended unproven alternative treatments without appropriate disclaimers, while others downplayed symptoms that typically warrant immediate medical attention.

The Citation and Provenance Problem

A central issue identified in the Which? research involves what experts call the "citation provenance" problem: AI systems frequently provide information without clear sourcing or reference to authoritative documents, which makes it difficult for users to verify the advice they receive.

Many chatbots presented legal and regulatory information as factual without indicating whether it reflected current statutes or outdated provisions. In some cases, systems appeared to be working from training data that included superseded legislation or discontinued consumer protection schemes.
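One way a business deploying a chatbot might catch the superseded-legislation problem is to check every source a response cites against a maintained register of what is currently in force. The sketch below assumes such a register exists as a simple lookup table; the register, its name, and the function `provenance_issues` are hypothetical. The two sample entries reflect that the Consumer Rights Act 2015 replaced the Sale of Goods Act 1979 for trader-to-consumer sales from October 2015, but a real register would need authoritative, regularly updated data.

```python
from enum import Enum


class Status(Enum):
    IN_FORCE = "in force"
    SUPERSEDED = "superseded"
    UNKNOWN = "unknown"


# Hypothetical register for questions about buying from a trader. A real
# deployment would need an authoritative, regularly refreshed data source,
# not a hard-coded dict.
CONSUMER_SALES_REGISTER = {
    "Consumer Rights Act 2015": Status.IN_FORCE,
    # Replaced by the 2015 Act for trader-to-consumer sales; it still governs
    # other sales, so this status only holds for this topic.
    "Sale of Goods Act 1979": Status.SUPERSEDED,
}


def provenance_issues(cited_sources: list[str]) -> list[str]:
    """List problems with the sources a chatbot response cites."""
    if not cited_sources:
        return ["no sources cited"]
    issues = []
    for source in cited_sources:
        status = CONSUMER_SALES_REGISTER.get(source, Status.UNKNOWN)
        if status is Status.SUPERSEDED:
            issues.append(f"cites superseded source: {source}")
        elif status is Status.UNKNOWN:
            issues.append(f"cannot verify source: {source}")
    return issues


if __name__ == "__main__":
    print(provenance_issues([]))
    print(provenance_issues(["Sale of Goods Act 1979"]))
    print(provenance_issues(["Consumer Rights Act 2015"]))
```

Treating an unrecognised source as a problem, rather than passing it through, errs toward the same caution the article recommends to consumers.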

Hallucination and Fabrication Issues

The study documented numerous instances of AI "hallucinations" where chatbots invented non-existent consumer protections or fabricated regulatory requirements. One system confidently described a "14-day cooling-off period" for car purchases that doesn't exist in UK law, while another invented consumer rights regarding flight delays that bore no resemblance to actual regulations.

These fabrications are particularly dangerous because they're presented with the same confidence as accurate information, making it difficult for non-experts to distinguish between legitimate advice and AI-generated fiction.

Enterprise Risk Implications

For businesses integrating AI chatbots into customer service operations, the findings highlight significant enterprise risk considerations. Companies relying on AI systems to handle customer inquiries could inadvertently provide misleading guidance that exposes them to regulatory action, legal liability, or reputational damage.

The research suggests that organizations need robust verification systems and human oversight when deploying AI for customer-facing functions, particularly in regulated industries like finance, healthcare, and legal services.
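A minimal sketch of that human-oversight step, assuming the organisation holds back any drafted answer that touches a regulated topic until a human reviewer has checked it. The keyword lists are crude placeholders for whatever topic classifier a real system would use (substring matching will misfire, for example "tax" inside "taxi"), and the names `route`, `detect_topics`, `Routing`, and `REGULATED_TOPICS` are hypothetical, not a recommended configuration.

```python
from dataclasses import dataclass

# Placeholder keyword lists; a production system would use a proper classifier.
REGULATED_TOPICS = {
    "finance": ("pension", "mortgage", "invest", "tax"),
    "legal": ("deposit", "tenant", "refund", "contract"),
    "health": ("symptom", "medication", "treatment"),
}


@dataclass
class Routing:
    send_automatically: bool
    reasons: list[str]


def detect_topics(question: str) -> list[str]:
    """Return the regulated topics whose keywords appear in the question."""
    text = question.lower()
    return [
        topic
        for topic, keywords in REGULATED_TOPICS.items()
        if any(word in text for word in keywords)
    ]


def route(question: str, cited_sources: list[str]) -> Routing:
    """Decide whether a drafted chatbot answer may go out without human review."""
    topics = detect_topics(question)
    reasons = [f"regulated topic: {t}" for t in topics]
    if topics and not cited_sources:
        # Recorded for the reviewer; it does not change the routing decision.
        reasons.append("no supporting sources for a regulated topic")
    return Routing(send_automatically=not topics, reasons=reasons)


if __name__ == "__main__":
    print(route("Can my landlord keep my deposit?", []))
    print(route("What are your opening hours?", []))
```

Sending every regulated-topic answer to a person trades speed for safety; a business could relax this for low-risk topics, but the sketch deliberately errs toward review.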

Industry Response and Accountability

Following the Which? findings, AI developers have emphasized that their systems are designed to supplement rather than replace professional advice. Most include disclaimers warning users not to rely on AI responses for critical decisions, though these warnings are often easy to miss or ignore during typical interactions.

Some companies have pointed to ongoing improvements in their systems' accuracy and safety features, while others have acknowledged the need for better mechanisms to prevent the dissemination of harmful misinformation.

Regulatory and Policy Considerations

The research arrives amid growing regulatory scrutiny of AI systems worldwide. In the UK, the government is developing an AI regulatory framework that could include specific provisions for high-risk applications, including consumer advice systems. The European Union's AI Act already categorizes certain AI uses as high-risk, potentially subjecting consumer advice chatbots to stricter requirements.

Consumer protection agencies in multiple jurisdictions are examining whether existing regulations adequately cover AI-generated advice and what additional safeguards might be necessary to protect consumers from misleading information.

Practical Guidance for Consumers

Based on the research findings, consumer advocates recommend several precautions when using AI chatbots for important decisions:

Verify critical information with official sources like government websites, regulatory bodies, or licensed professionals
Look for clear citations and check whether the information comes from authoritative, current sources
Be skeptical of absolute claims about legal rights or financial strategies without supporting evidence
Use AI as a starting point for research rather than a definitive answer source
Pay attention to disclaimers that acknowledge the limitations of AI-generated advice

The Path Forward for AI Safety

The Which? study underscores that while AI chatbots represent remarkable technological achievements, they remain imperfect tools that require careful handling. Developers face ongoing challenges in improving the reliability of these systems, particularly for high-stakes applications where inaccurate information can cause real harm.

Industry experts suggest that several technical approaches could help address these issues, including better training data curation, improved fact-checking mechanisms, enhanced transparency about information sources, and more sophisticated risk assessment for different types of queries.

As AI systems become increasingly integrated into daily life, the balance between convenience and safety remains a critical consideration. The Which? findings serve as an important reminder that while AI can be a powerful tool for information gathering, human judgment and verification remain essential for important decisions affecting legal rights, financial wellbeing, and personal safety.

The consumer advocacy group has called for stronger industry standards and more transparent labeling of AI limitations to help users make informed decisions about when to trust chatbot advice and when to seek human expertise.
