In a nostalgic but surprising experiment, ChatGPT played a game of chess against a 1977 Atari 2600 and lost, humiliatingly. The match was sparked by Citrix engineer Robert Caruso, who had been chatting with the AI about the history of chess-playing machines. Caruso set up the game using the Stella emulator to run Atari's Video Chess. What ensued was not so much a clash of minds as a showcase of LLM shortcomings.
The Atari, an eight-bit machine from the era of disco and the original Star Wars, played well within its modest means: its engine appears to look only a single move ahead. ChatGPT, meanwhile, struggled. It confused rooks with bishops, lost track of where pieces stood, and repeatedly suggested illegal or nonsensical moves. Even after switching to standard algebraic notation, it still could not play correctly.
After 90 minutes of coaching the AI on fundamentals and correcting its stream of mistakes, Caruso called the match off, and ChatGPT conceded defeat. The irony: Atari's chess engine dates from an era when merely generating a legal move was an engineering achievement.

This defeat tells us something larger about AI: it is not monolithic. LLMs like ChatGPT are trained to predict text rather than apply rules, so they are not built for logic-heavy games like chess. They lack a persistent memory of the board state, struggle with symbolic reasoning, and are prone to hallucinations and errors in structured tasks. Chess engines, by contrast, are highly specialized systems built around brute-force calculation, deep search trees, and exactness. They can crush grandmasters, but they cannot hold a conversation.
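To make the contrast concrete, here is a minimal sketch of what a dedicated engine does that an LLM does not: hold an exact board state and mechanically search the tree of legal moves. This is an illustration, not any particular engine's code; it assumes the open-source python-chess library and uses a bare material count as its evaluation. At depth 1 it roughly matches the Atari's shallow lookahead; real engines differ mainly in searching enormously deeper, with far better evaluations and pruning.

```python
# Minimal negamax sketch (assumes `pip install chess`). Illustrative only:
# the point is explicit state plus exhaustive legal-move search.
import chess

PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0}

def material(board: chess.Board) -> int:
    """Material balance from the perspective of the side to move."""
    score = 0
    for piece in board.piece_map().values():
        value = PIECE_VALUES[piece.piece_type]
        score += value if piece.color == board.turn else -value
    return score

def negamax(board: chess.Board, depth: int) -> float:
    """Exhaustively search the game tree to a fixed depth."""
    if depth == 0 or board.is_game_over():
        return material(board)
    best = -float("inf")
    for move in board.legal_moves:  # only legal moves: nothing can be hallucinated
        board.push(move)            # the state is updated exactly...
        best = max(best, -negamax(board, depth - 1))
        board.pop()                 # ...and restored exactly, never "forgotten"
    return best

def best_move(board: chess.Board, depth: int = 2) -> chess.Move:
    """Pick the highest-scoring move. depth=1 is roughly the Atari-style
    lookahead; modern engines search vastly deeper."""
    def score(move: chess.Move) -> float:
        board.push(move)
        value = -negamax(board, depth - 1)
        board.pop()
        return value
    return max(board.legal_moves, key=score)

if __name__ == "__main__":
    print(best_move(chess.Board()))  # a legal move from the starting position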
The mismatch isn't about intelligence; it's about purpose. AI isn't a singular intelligence but a collection of narrow tools, each with distinct strengths. In chess, ChatGPT isn't the player; it's the commentator. As Grandmaster David Bronstein put it, the essence of chess is thinking about what chess is. That, at least, remains beyond the chatbot's reach.