• Education
    • Higher Education
    • Scholarships & Grants
    • Online Learning
    • School Reforms
    • Research & Innovation
  • Lifestyle
    • Travel
    • Food & Drink
    • Fashion & Beauty
    • Home & Living
    • Relationships & Family
  • Technology & Startups
    • Software & Apps
    • Startup Success Stories
    • Startups & Innovations
    • Tech Regulations
    • Venture Capital
    • Artificial Intelligence
    • Cybersecurity
    • Emerging Technologies
    • Gadgets & Devices
    • Industry Analysis
  • About us
  • Contact
  • Advertise with Us
  • Privacy & Policy
Today Headline
  • Home
  • World News
    • Us & Canada
    • Europe
    • Asia
    • Africa
    • Middle East
  • Politics
    • Elections
    • Political Parties
    • Government Policies
    • International Relations
    • Legislative News
  • Business & Finance
    • Market Trends
    • Stock Market
    • Entrepreneurship
    • Corporate News
    • Economic Policies
  • Science & Environment
    • Space Exploration
    • Climate Change
    • Wildlife & Conservation
    • Environmental Policies
    • Medical Research
  • Health
    • Public Health
    • Mental Health
    • Medical Breakthroughs
    • Fitness & Nutrition
    • Pandemic Updates
  • Sports
    • Football
    • Basketball
    • Tennis
    • Olympics
    • Motorsport
  • Entertainment
    • Movies
    • Music
    • TV & Streaming
    • Celebrity News
    • Awards & Festivals
  • Crime & Justice
    • Court Cases
    • Cybercrime
    • Policing
    • Criminal Investigations
    • Legal Reforms
No Result
View All Result
  • Home
  • World News
    • Us & Canada
    • Europe
    • Asia
    • Africa
    • Middle East
  • Politics
    • Elections
    • Political Parties
    • Government Policies
    • International Relations
    • Legislative News
  • Business & Finance
    • Market Trends
    • Stock Market
    • Entrepreneurship
    • Corporate News
    • Economic Policies
  • Science & Environment
    • Space Exploration
    • Climate Change
    • Wildlife & Conservation
    • Environmental Policies
    • Medical Research
  • Health
    • Public Health
    • Mental Health
    • Medical Breakthroughs
    • Fitness & Nutrition
    • Pandemic Updates
  • Sports
    • Football
    • Basketball
    • Tennis
    • Olympics
    • Motorsport
  • Entertainment
    • Movies
    • Music
    • TV & Streaming
    • Celebrity News
    • Awards & Festivals
  • Crime & Justice
    • Court Cases
    • Cybercrime
    • Policing
    • Criminal Investigations
    • Legal Reforms
No Result
View All Result
Today Headline
No Result
View All Result
Home Science & Environment

Mathematicians Question AI Performance at International Math Olympiad todayheadline

August 7, 2025
in Science & Environment
Reading Time: 5 mins read
A A
0
2
SHARES
4
VIEWS
Share on FacebookShare on Twitter


A defining memory from my senior year of high school was a nine-hour math exam with just six questions. Six of the top scorers won slots on the U.S. team for the International Math Olympiad (IMO), the world’s longest running math competition for high school students. I didn’t make the cut, but became a tenured mathematics professor anyway.

This year’s olympiad, held last month on Australia’s Sunshine Coast, had an unusual sideshow. While 110 students from around the world went to work on complex math problems using pen and paper, several AI companies quietly tested new models in development on a computerized approximation of the exam. Right after the closing ceremonies, OpenAI and later Google DeepMind announced that their models earned (unofficial) gold medals for solving five of the six problems. Researchers like Sébastien Bubeck of OpenAI celebrated these models’ successes as a “moon landing moment” by industry.

But are they? Is AI going to replace professional mathematicians? I’m still waiting for the proof.


On supporting science journalism

If you’re enjoying this article, consider supporting our award-winning journalism by subscribing. By purchasing a subscription you are helping to ensure the future of impactful stories about the discoveries and ideas shaping our world today.


The hype around this year’s AI results is easy to understand because the olympiad is hard. To wit, in my senior year of high school, I set aside calculus and linear algebra to focus on olympiad-style problems, which were more of a challenge. Plus the cutting-edge models still in development did so much better at the exam than the commercial models already out there. In a parallel contest administered by MathArena.ai, Gemini 2.5 pro, Grok 4, o3 high, o4-mini high and DeepSeek R1 all failed to produce a single completely correct solution. It shows that AI models are getting smarter, their reasoning capabilities improving rather dramatically.

Yet I’m still not worried.

The latest models just got a good grade on a single test—as did many of the students—and a head-to-head comparison isn’t entirely fair. The models often employ a “best-of-n” strategy, generating multiple solutions and then grading themselves to select the strongest. This is akin to having several students work independently, then get together to pick the best solution and submit only that one. If the human contestants were allowed this option, their scores would likely improve too.

Other mathematicians are similarly cautioning against the hype. IMO gold medalist Terence Tao (currently a mathematician at the University of California, Los Angeles) noted on Mastodon that what AI can do depends on what the testing methodology is. IMO president Gregor Dolinar said that the organization “cannot validate the methods [used by the AI models], including the amount of compute used or whether there was any human involvement, or whether the results can be reproduced.”

Besides, IMO exam questions don’t compare to the kinds of questions professional mathematicians try to answer, where it can take nine years, rather than nine hours, to solve a problem at the frontier of mathematical research. As Kevin Buzzard, a mathematics professor at Imperial College London, said in an online forum, “When I arrived in Cambridge UK as an undergraduate clutching my IMO gold medal I was in no position to help any of the research mathematicians there.”

These days, mathematical research can take more than one lifespan to acquire the right expertise. Like many of my colleagues, I’ve been tempted to try “vibe proving”—having a math chat with an LLM as one would with a colleague, asking “Is it true that…” followed by a technical mathematical conjecture. The chatbot often then supplies a clearly articulated argument that, in my experience, tends to be correct when it comes to standard topics but subtly wrong at the cutting edge. For example, every model I’ve asked has made the same subtle mistake in assuming that the theory of idempotents behaves the same for weak infinite-dimensional categories as it does for ordinary ones, something that human experts (trust me on this) in my field know to be false.

I’ll never trust an LLM—which at its core is just predicting what text will come next in a string of words, based on what’s in its dataset—to provide a mathematical proof that I can’t verify myself.

The good news is, we do have an automated mechanism for determining whether proofs can be trusted. Relatively recent tools called “proof assistants” are software programs (they don’t use AI) designed to check whether a logical argument proves the stated claim. They are increasingly attracting attention from mathematicians like Tao, Buzzard and myself who want more assurance that our own proofs are correct. And they offer the potential to help democratize mathematics and even improve AI safety.

Suppose I received a letter, in unfamiliar handwriting, from Erode, a city in Tamil Nadu, India, purporting to contain a mathematical proof. Maybe its ideas are brilliant, or maybe they’re nonsensical. I’d have to spend hours carefully studying every line, making sure the argument flowed step-by-step, before I’d be able to determine whether the conclusions are true or false.

But if the mathematical text were written in an appropriate computer syntax instead of natural language, a proof assistant could check the logic for me. A human mathematician, such as I, would then only need to understand the meaning of the technical terms in the theorem statement. In the case of Srinivasa Ramanujan, a generational mathematical genius who did hail from Erode, an expert did take the time to carefully decipher his letter. In 1913 Ramanujan wrote to the British mathematician G. H. Hardy with his ideas. Luckily, Hardy recognized Ramanujan’s brilliance and invited him to Cambridge to collaborate, launching the career of one of the all-time mathematical “greats.”

What’s interesting is that some of the AI IMO contestants submitted their answers in the language of the Lean computer proof assistant so that the computer program could automatically check for errors in their reasoning. A start-up called Harmonic posted formal proofs generated by their model for five of the six problems, and ByteDance achieved a silver-medal level performance by solving four of the six problems. But the questions had to be written to accommodate the models’ language limitations, and they still needed days to figure it out.

Still, formal proofs are uniquely trustworthy. While so-called “reasoning” models are prompted to break problems down into pieces and explain their “thinking” step by step, the output is as likely to produce an argument that sounds logical but isn’t, as to constitute a genuine proof. By contrast, a proof assistant will not accept a proof unless it is fully precise and fully rigorous, justifying every step in its chain-of-thought. In some circumstances, a hand-waving or approximate solution is good enough, but when mathematical accuracy matters, we should demand that AI-generated proofs are formally verifiable.

Not every application of generative AI is so black and white, where humans with the right expertise can determine whether the results are correct or incorrect. In life, there is a lot of uncertainty and it’s easy to make mistakes. As I learned in high school, one of the best things about math is the fact that you can prove definitively that some ideas are wrong. So I’m happy to have an AI try to solve my personal math problems, but only if the results are formally verifiable. And we aren’t quite there, yet.

This is an opinion and analysis article, and the views expressed by the author or authors are not necessarily those of Scientific American.

Previous Post

The US can get to the moon first — and still lose

Next Post

Sri Lanka sells extra Rs8.2bn Treasury bills after auction todayheadline

Related Posts

cell

Even a few mutated cells can significantly impact how blood cancers develop, study finds

August 7, 2025
3
YouTube Thumbnail

An Astrophysicist Proposes We Send a Spacecraft to Visit a Black Hole : ScienceAlert todayheadline

August 7, 2025
5
Next Post

Sri Lanka sells extra Rs8.2bn Treasury bills after auction todayheadline

  • Trending
  • Comments
  • Latest
Family calls for change after B.C. nurse dies by suicide after attacks on the job

Family calls for change after B.C. nurse dies by suicide after attacks on the job

April 2, 2025
Pioneering 3D printing project shares successes

Product reduces TPH levels to non-hazardous status

November 27, 2024

Police ID man who died after Corso Italia fight

December 23, 2024

Hospital Mergers Fail to Deliver Better Care or Lower Costs, Study Finds todayheadline

December 31, 2024
Harris tells supporters 'never give up' and urges peaceful transfer of power

Harris tells supporters ‘never give up’ and urges peaceful transfer of power

0
Des Moines Man Accused Of Shooting Ex-Girlfriend's Mother

Des Moines Man Accused Of Shooting Ex-Girlfriend’s Mother

0

Trump ‘looks forward’ to White House meeting with Biden

0
Catholic voters were critical to Donald Trump’s blowout victory: ‘Harris snubbed us’

Catholic voters were critical to Donald Trump’s blowout victory: ‘Harris snubbed us’

0
cell

Even a few mutated cells can significantly impact how blood cancers develop, study finds

August 7, 2025
Ailing bull moose on Adirondack mountain hiking trail in New York put down

Ailing bull moose on Adirondack mountain hiking trail in New York put down

August 7, 2025
UK interest rates cut to lowest level in more than two years

UK interest rates cut to lowest level in more than two years

August 7, 2025
Will the Last Surviving Parent of a Japanese Abductee Get Closure?

Will the Last Surviving Parent of a Japanese Abductee Get Closure? – The Diplomat

August 7, 2025

Recent News

cell

Even a few mutated cells can significantly impact how blood cancers develop, study finds

August 7, 2025
3
Ailing bull moose on Adirondack mountain hiking trail in New York put down

Ailing bull moose on Adirondack mountain hiking trail in New York put down

August 7, 2025
6
UK interest rates cut to lowest level in more than two years

UK interest rates cut to lowest level in more than two years

August 7, 2025
6
Will the Last Surviving Parent of a Japanese Abductee Get Closure?

Will the Last Surviving Parent of a Japanese Abductee Get Closure? – The Diplomat

August 7, 2025
4

TodayHeadline is a dynamic news website dedicated to delivering up-to-date and comprehensive news coverage from around the globe.

Follow Us

Browse by Category

  • Africa
  • Asia
  • Basketball
  • Business & Finance
  • Climate Change
  • Crime & Justice
  • Cybersecurity
  • Economic Policies
  • Elections
  • Entertainment
  • Entrepreneurship
  • Environmental Policies
  • Europe
  • Football
  • Gadgets & Devices
  • Health
  • Medical Research
  • Mental Health
  • Middle East
  • Motorsport
  • Olympics
  • Politics
  • Public Health
  • Relationships & Family
  • Science & Environment
  • Software & Apps
  • Space Exploration
  • Sports
  • Stock Market
  • Technology & Startups
  • Tennis
  • Travel
  • Uncategorized
  • Us & Canada
  • Wildlife & Conservation
  • World News

Recent News

cell

Even a few mutated cells can significantly impact how blood cancers develop, study finds

August 7, 2025
Ailing bull moose on Adirondack mountain hiking trail in New York put down

Ailing bull moose on Adirondack mountain hiking trail in New York put down

August 7, 2025
  • Education
  • Lifestyle
  • Technology & Startups
  • About us
  • Contact
  • Advertise with Us
  • Privacy & Policy

© 2024 Todayheadline.co

Welcome Back!

OR

Login to your account below

Forgotten Password?

Retrieve your password

Please enter your username or email address to reset your password.

Log In
No Result
View All Result
  • Business & Finance
  • Corporate News
  • Economic Policies
  • Entrepreneurship
  • Market Trends
  • Crime & Justice
  • Court Cases
  • Criminal Investigations
  • Cybercrime
  • Legal Reforms
  • Policing
  • Education
  • Higher Education
  • Online Learning
  • Entertainment
  • Awards & Festivals
  • Celebrity News
  • Movies
  • Music
  • Health
  • Fitness & Nutrition
  • Medical Breakthroughs
  • Mental Health
  • Pandemic Updates
  • Lifestyle
  • Fashion & Beauty
  • Food & Drink
  • Home & Living
  • Politics
  • Elections
  • Government Policies
  • International Relations
  • Legislative News
  • Political Parties
  • Africa
  • Asia
  • Europe
  • Middle East
  • Artificial Intelligence
  • Cybersecurity
  • Emerging Technologies
  • Gadgets & Devices
  • Industry Analysis
  • Basketball
  • Football
  • Motorsport
  • Olympics
  • Climate Change
  • Environmental Policies
  • Medical Research
  • Science & Environment
  • Space Exploration
  • Wildlife & Conservation
  • Sports
  • Tennis
  • Technology & Startups
  • Software & Apps
  • Startup Success Stories
  • Startups & Innovations
  • Tech Regulations
  • Venture Capital
  • Uncategorized
  • World News
  • Us & Canada
  • Public Health
  • Relationships & Family
  • Travel
  • Research & Innovation
  • Scholarships & Grants
  • School Reforms
  • Stock Market
  • TV & Streaming
  • Advertise with Us
  • Privacy & Policy
  • About us
  • Contact

© 2024 Todayheadline.co