Artificial intelligence (AI) can solve complicated equations, write essays, and analyze enormous volumes of data. But when it comes to something as simple as spelling “strawberry,” AI can make surprisingly funny mistakes. Consider GPT-4 and Claude, two powerful AI models that frequently miscount the letters in the word “strawberry”: the letter “r” actually appears three times, yet these models often claim it appears only twice.
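For comparison, ordinary code has no trouble with this task. A one-line Python check (just an illustration of plain string counting, not how an LLM works) gives the right answer:

```python
# Plain character counting: the letter "r" appears three times in "strawberry".
print("strawberry".count("r"))  # -> 3
```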
This failure in large language models (LLMs) is a reminder that AI, while incredibly advanced, does not think the way humans do. LLMs don’t “understand” letters or syllables as we do because they operate differently. Models like GPT-4 are built on transformers, which break text down into smaller pieces called tokens. Tokens can be full words, syllables, or even individual characters, depending on the model’s configuration, but the model never processes the letters the way we see them.
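To see what that looks like in practice, here is a minimal sketch using OpenAI’s open-source tiktoken tokenizer (chosen purely for illustration; other models use different tokenizers, and the exact split depends on the encoding):

```python
# Minimal sketch: how a BPE tokenizer splits "strawberry" into sub-word pieces.
# Assumes the `tiktoken` package is installed; the exact split is encoding-dependent.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
token_ids = enc.encode("strawberry")
pieces = [enc.decode_single_token_bytes(t).decode("utf-8") for t in token_ids]
print(pieces)  # sub-word chunks rather than ten individual letters
```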
As Matthew Guzdial, an AI researcher, explains, the text is converted into numerical representations. The model knows that “straw” and “berry” together form “strawberry,” but it doesn’t have a fundamental grasp of the individual letters that make up the word. This issue is embedded in the architecture of LLMs, making it a tough nut to crack.
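A toy illustration (with made-up, purely hypothetical token IDs) shows why: once the text has been mapped to numbers, nothing about the spelling survives.

```python
# Toy vocabulary with hypothetical IDs: the model sees integers, not spellings.
vocab = {"straw": 4021, "berry": 1977}

token_ids = [vocab["straw"], vocab["berry"]]  # "strawberry" as the model sees it
print(token_ids)  # [4021, 1977] -- nothing here reveals how many "r"s the word contains
```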
Fixing this problem is not simple, especially given the diversity of human languages. Tokenization, the process of splitting text into smaller units, works very differently for languages like English, Chinese, and Japanese, and those differences make it challenging for LLMs to become truly adept at every language.
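A quick way to see this is to run the same tokenizer over the word “strawberry” in a few languages (again assuming tiktoken; the counts are whatever the encoding produces, not fixed numbers):

```python
# Sketch: the same tokenizer splits different scripts very differently.
# Assumes `tiktoken` is installed; token counts depend entirely on the encoding.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for text in ["strawberry", "いちご", "草莓"]:  # English, Japanese, and Chinese for "strawberry"
    print(text, "->", len(enc.encode(text)), "tokens")
```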
So even though AI can produce amazing results in many domains, it still stumbles on tasks as basic as accurately counting the letters in the word “strawberry.” These limitations are a reminder that machines still have a long way to go before they can match human cognition, at least for the time being.