AI models are starting to crack high-level math problems

Over the weekend, Neel Somani, a software engineer, former quant researcher, and startup founder, was testing the math skills of OpenAI’s new model when he made an unexpected discovery. After pasting the problem into ChatGPT and letting it think for 15 minutes, he came back to a full solution. He evaluated the proof and formalized it with a tool from Harmonic, and it all checked out.

“I was curious to establish a baseline for when LLMs are effectively able to solve open math problems compared to where they struggle,” Somani said. The surprise was that, using the latest model, the frontier started to push forward a bit.

ChatGPT’s chain of thought is even more impressive, rattling off mathematical results like Legendre’s formula, Bertrand’s postulate, and the Star of David theorem. Eventually, the model found a Math Overflow post from 2013, where Harvard mathematician Noam Elkies had given an elegant solution to a similar problem. But ChatGPT’s final proof differed from Elkies’ work in important ways, and gave a more complete solution to a version of the problem posed by legendary mathematician Paul Erdős, whose vast collection of unsolved problems has become a proving ground for AI.

For anyone skeptical of machine intelligence, it’s a surprising result, and it’s not the only one. AI tools have become ubiquitous in mathematics, from formalization-oriented LLMs like Harmonic’s Aristotle to literature review tools like OpenAI’s deep research. But since the release of GPT 5.2, which Somani describes as “anecdotally more skilled at mathematical reasoning than previous iterations,” the sheer volume of solved problems has become difficult to ignore, raising new questions about large language models’ ability to push the frontiers of human knowledge.

Somani was looking at the Erdős problems, a set of over 1,000 conjectures by the Hungarian mathematician that are maintained online. The problems have become a tempting target for AI-driven mathematics, varying significantly in both subject matter and difficulty. The first batch of autonomous solutions came in November from a Gemini-powered model called AlphaEvolve, but more recently, Somani and others have found GPT 5.2 to be remarkably adept with high-level math.

Since Christmas, 15 problems have been moved from “open” to “solved” on the Erdős website, and 11 of the solutions have specifically credited AI models as involved in the process.

The revered mathematician Terence Tao has a more nuanced look at the progress on his GitHub page, counting eight different problems where AI models made meaningful autonomous progress on an Erdős problem, with six other cases where progress was made by locating and building on previous research. It’s a long way from AI systems being able to do math without human intervention, but it’s clear that there’s an important role for large models to play.

On Mastodon, Tao conjectured that the scalable nature of AI systems makes them “better suited for being systematically applied to the ‘long tail’ of obscure Erdős problems, many of which actually have straightforward solutions.”

“As such, many of these easier Erdős problems are now more likely to be solved by purely AI-based methods than by human or hybrid means,” Tao continued.

Another driving force is a recent shift toward formalization, a labor-intensive task that makes mathematical reasoning easier to verify and extend. Formalization doesn’t require the use of AI or even computers, but a new crop of automated tools has made the process far easier. The open-source “proof assistant” Lean, which was developed at Microsoft Research in 2013, has become widely used within the field as a way of formalizing proofs, and AI tools like Harmonic’s Aristotle promise to automate much of the work of formalization.

For Harmonic founder Tudor Achim, the sudden jump in solved Erdős problems is less important than the fact that the world’s top mathematicians are starting to take those tools seriously. “I care more about the fact that math and computer science professors are using [AI tools],” Achim said. “These people have reputations to protect, so when they’re saying they use Aristotle or they use ChatGPT, that’s real evidence.”

Russell Brandom has been covering the tech industry since 2012, with a focus on platform policy and emerging technologies. He previously worked at The Verge and Rest of World, and has written for Wired, The Awl and MIT’s Technology Review. He can be reached at russell.brandom@techcrunch.com or on Signal at 412-401-5489.
