Scientists Asked ChatGPT To Solve A Math Problem From More Than 2,000 Years Ago – How It Answered Surprised Them

Over two thousand years ago, Plato recounted a lesson in which Socrates challenged a student with the classic “doubling the square” problem. When asked to create a square with twice the area of the original, the student incorrectly doubled the side length, unaware that the true solution required using the diagonal. The ancient puzzle has long been used to probe questions about learning, reasoning, and whether mathematical insight comes from innate understanding or experience. Today, researchers are turning the same problem toward artificial intelligence.
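A quick worked check makes the geometry explicit (the side length $s$ below is a symbol introduced here for illustration, not part of Plato's account). Doubling the side quadruples the area rather than doubling it, while the square built on the diagonal gives exactly twice the original:

\[
(2s)^2 = 4s^2 \neq 2s^2, \qquad d = s\sqrt{2} \;\Rightarrow\; d^2 = 2s^2 .
\]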

A team from Cambridge University and Jerusalem’s Hebrew University posed the problem to ChatGPT to test its mathematical reasoning. They chose it specifically because, as they explained, the solution is not obvious and is unlikely to appear directly in the model’s training data, which is drawn largely from text rather than diagrams. If ChatGPT could solve it unaided, that might suggest the AI can “learn” through reasoning rather than simply retrieving memorized knowledge.

Initially, ChatGPT stumbled when asked to apply similar logic to rectangles. It incorrectly claimed that there was no geometric solution to doubling a rectangle’s area, insisting that the diagonal could not help. Visiting scholar Nadav Marco noted that the likelihood of this exact error appearing in training data was “vanishingly small.” To him, the response revealed something more improvisational, as though the AI were testing hypotheses based on earlier parts of the conversation.
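A short calculation shows why the diagonal shortcut genuinely breaks down for rectangles, and why the model's stronger claim was still wrong (the side lengths $a$ and $b$ are symbols introduced here for illustration). For a rectangle with sides $a$ and $b$, the square built on the diagonal has area

\[
d^2 = a^2 + b^2 \;\ge\; 2ab,
\]

with equality only when $a = b$, that is, when the rectangle is already a square. So the diagonal alone cannot reproduce Socrates’ construction, yet a doubled rectangle is still easy to obtain geometrically, for instance an $a \times 2b$ rectangle, which makes ChatGPT’s “no geometric solution” claim an error rather than a subtlety.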

“When we face a new problem, our instinct is often to try things out based on our past experience,” Marco explained. “In our experiment, ChatGPT seemed to do something similar. Like a learner or scholar, it appeared to come up with its own hypotheses and solutions.”

This interpretation was echoed by Cambridge professor Andreas Stylianides. Both scholars suggested the AI’s reasoning resembled a concept familiar in education: the zone of proximal development (ZPD), the space between what someone already knows and what they might learn with proper guidance. In this sense, ChatGPT appeared to act in a “learner-like” fashion, capable of navigating new challenges even while making mistakes reminiscent of Socrates’ student.

The study, published in the International Journal of Mathematical Education in Science and Technology on September 17, raises broader questions about how AI “thinks.” Because large language models are essentially black boxes, the steps they take to reach conclusions remain hidden. Still, the researchers emphasized that this experiment demonstrates an opportunity: by recognizing how AI improvises, educators might use it as a partner for exploration rather than just a tool for answers.

Stylianides cautioned against blind trust, warning that “unlike proofs found in reputable textbooks, students cannot assume that ChatGPT’s proofs are valid.” Instead, he argued, education must now include the ability to evaluate AI-generated reasoning. That may also require better prompt engineering—asking AI to “explore the problem together” rather than commanding it to provide an answer.

The team remains careful not to overstate their findings, but they see fertile ground for future work. Testing newer AI models on broader sets of mathematical problems, or combining them with dynamic geometry systems, could create more collaborative digital environments for teaching.
