Top Chatbots Are Giving Horrible Financial Advice

Despite claims about AI revolutionizing everything from medicine to money management, when it comes to offering sound financial advice, even the most advanced chatbots can’t balance the books. A recent study by researchers at the Walter Bradley Center for Natural and Artificial Intelligence reveals that today’s top AI models still fumble basic finance, delivering answers that are confident, verbose, and frequently wrong.

In their assessment, researchers Gary Smith, Valentina Liberman, and Isaac Warshaw tested four of the most widely used language models: OpenAI’s ChatGPT-4o, DeepSeek-V2, Elon Musk’s Grok 3 Beta, and Google’s Gemini 2. The models were given 12 straightforward finance questions meant to test not only factual knowledge but also numerical accuracy and basic analytical reasoning.

The results? In the words of the researchers, the chatbots were “consistently verbose but often incorrect.” On the scoring scale, “1” meant a response was both financially and mathematically accurate, “0.5” meant conceptually right but numerically flawed, and “0” meant completely off-base. No model cracked even 50% accuracy.

ChatGPT-4o led with a modest 5.0 out of 12. DeepSeek-V2 trailed at 4.0. Grok 3 Beta managed only 3.0. Gemini 2, surprisingly, landed at the bottom with a dismal 1.5.
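To put those point totals in perspective, here is a minimal Python sketch (our own, using only the scores reported above) that converts each into a percentage of the 12 available points:

```python
# Published scores out of a maximum of 12 points
# (1 per fully correct answer, 0.5 for conceptually right
# but numerically flawed, 0 for off-base responses).
scores = {
    "ChatGPT-4o": 5.0,
    "DeepSeek-V2": 4.0,
    "Grok 3 Beta": 3.0,
    "Gemini 2": 1.5,
}

MAX_POINTS = 12

for model, points in scores.items():
    print(f"{model}: {points}/{MAX_POINTS} = {points / MAX_POINTS:.1%}")

# ChatGPT-4o, the best performer, works out to just 41.7%.
```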

Some of the errors were not just mathematical slips—they were baffling. For instance, when asked to total monthly costs for a Caribbean rental with $3,700 rent and $200 utilities, Grok reported a total of $4,900 rather than the obvious $3,900, somehow losing track of simple arithmetic.
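For anyone keeping score at home, the arithmetic the model flubbed fits in three lines of Python (figures taken from the study’s prompt):

```python
rent = 3_700       # monthly rent in dollars
utilities = 200    # monthly utilities in dollars
print(rent + utilities)  # 3900 -- not the $4,900 Grok reported
```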

Even worse, the chatbots often packaged their flawed conclusions in polished language and tidy formatting, giving off an aura of credibility. As the researchers observed, the bots displayed “a reassuring illusion of human-like intelligence, along with a breezy conversational style enhanced by friendly exclamation points.”

This illusion can be particularly dangerous in the realm of finance, where poor advice can carry serious consequences. The team pointed out that while chatbots might correctly explain general financial terms like Roth IRAs, their responses to more nuanced or mathematical queries often leaned on shallow internet-sourced content and failed to demonstrate true critical analysis.

This study echoes earlier research by Smith, who documented similar problems with LLMs a year earlier in the Journal of Financial Planning. Then, too, the models were “consistently grammatically correct and seemingly authoritative but riddled with arithmetic and critical-thinking mistakes.”

The researchers closed with a pointed warning: “It is still the case that the real danger is not that computers are smarter than us, but that we think computers are smarter than us and consequently trust them to make decisions they should not be trusted to make.”
