Comments (198)
- amluto: I've contemplated this a bit, and I think I have a bit of an unconventional take.

  First, this is really impressive.

  Second, with that out of the way, these models are not playing the same game as the human contestants, in at least two major regards. First, and quite obviously, they have massive amounts of compute power, which is kind of like giving a human team a week instead of five hours. But the models that are competing also have absolutely massive memorization capacity, whereas the teams are allowed to bring a 25-page PDF with them and need to manually transcribe anything from that PDF that they actually want to use in a submission.

  I think that, if you gave me the ability to search the pre-contest Internet and a week to prepare my submissions, I would be kind of embarrassed if I didn't get gold, and I'd find the contest to be rather less interesting than I would find the real thing.
- modeless: More information on OpenAI's result (which seems better than DeepMind's) from the X thread:

  > our OpenAI reasoning system got a perfect score of 12/12
  > For 11 of the 12 problems, the system's first answer was correct. For the hardest problem, it succeeded on the 9th submission. Notably, the best human team achieved 11/12.
  > We had both GPT-5 and an experimental reasoning model generating solutions, and the experimental reasoning model selecting which solutions to submit. GPT-5 answered 11 correctly, and the last (and most difficult) problem was solved by the experimental reasoning model.

  I'm assuming that "GPT-5" here is a version with the same model weights but higher compute limits than even GPT-5 Pro, with many instances working in parallel, and some specific scaffolding and prompts. Still, extremely impressive to outperform the best human team. The stat I'd really like to see is how much money it would cost to get this result using their API (with a realistic cost for the "experimental reasoning model").
- flimflamm: Good to note that OpenAI solved 12/12 and DeepMind 10/12.
- JohnKemeny: I went to ICPC's web pages, downloaded the first problem (problem A), and gave it to GPT-5, asking it for code to solve it (stating it was a problem from a recent competitive programming contest).

  It thought for 7m 53s and replied: "# placeholder # (No solution provided)"
- birktj: They apparently managed gold in the IOI as well, a result that was extremely surprising to me and causes me to rethink a lot of assumptions I have about current LLMs. Unfortunately there was very little transparency about how they managed those results, and the only source was a Twitter post. I want to know: was there any third-party oversight? What kind of compute did they use, how much power, what kind of models, and how were they set up? In this case I see that DeepMind at least has a blog post, but as far as I can see it does not answer any of my questions.

  I think this is huge news, and I cannot imagine anything other than models with this capability having a massive impact all over the world. It makes me more worried than excited; it is very hard to tell what this will lead to, which is probably what makes it scary for me.

  However, with so little transparency from these companies and extreme financial pressure to perform well in these contests, I have to be quite sceptical of how truthful these results are. If true, I think it is really remarkable, but I want some more solid proof before I change my worldview.
- smokel: The best thing about the ICPC is the first C, which stands for "collegiate". It means that you get to solve a set of problems with three people, but with only one computer.

  This means that you have to be smart about who is going to spend time coding, thinking, or debugging. The time pressure is intense, and it really is a team sport.

  It's also extra fun if one of the team members prefers a Dvorak keyboard layout and vi, and the others do not.

  I wonder how three different AI vendors would cooperate. It would probably lift reinforcement learning to the next level.
- ferguess_k: I think in the future information will be more walled, because AI companies are not paying anyone for that information. I encourage everyone to put their knowledge on their own website and, for each page, put up a few URLs that humans won't be able to find (but can still click if they know where to look), yet can be crawled by AI. These link to pages containing falsified information (such as "oh, the information at URL blah is actually incorrect, here you can find the correct version, with all those explanations, blah blah" -- when of course the original page is the only correct version).

  Essentially, we need to poison AI in all possible ways, without impacting human reading. They either have to hire more humans to filter the information, or hire more humans to improve the crawlers.

  Or we can simply stop sharing knowledge. I'm fine with that, TBF.
- patrickhogan1: This is impressive.

  Here is the published 2025 ICPC World Finals problemset. The "Time limit: X seconds" printed on each ICPC World Finals problem is the maximum runtime your program is allowed. If any judged run of your program takes longer than that, the submission fails, even if other runs finish in time.

  https://worldfinals.icpc.global/problems/2025/finals/problem...
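  That all-runs-must-pass rule can be sketched in a few lines. This is a simplified illustration, not ICPC's actual judge; the names `judge_run` and `judge_submission` are hypothetical:

  ```python
  import subprocess
  import sys

  def judge_run(cmd, input_data: str, time_limit_s: float) -> bool:
      # One judged run: the program must finish within the time limit
      # and exit cleanly; exceeding the limit counts as a failure.
      try:
          proc = subprocess.run(
              cmd, input=input_data, capture_output=True,
              text=True, timeout=time_limit_s,
          )
          return proc.returncode == 0
      except subprocess.TimeoutExpired:
          return False

  def judge_submission(cmd, tests, time_limit_s: float) -> bool:
      # A submission passes only if EVERY judged run finishes in time;
      # a single slow run sinks the whole submission.
      return all(judge_run(cmd, t, time_limit_s) for t in tests)

  # A trivial "solution" that echoes its input passes comfortably:
  ok = judge_submission([sys.executable, "-c", "print(input())"], ["1", "2"], 5.0)
  print(ok)
  ```

  The point the comment makes falls out of the `all(...)`: worst-case runtime over all judged tests is what counts, not the average.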
- sinuhe69: I wonder whether they allowed human input for the AI besides the initial generic prompt. Could they provide guidance for the AI?

  We all know that with this kind of problem, intuition and guiding principles to transform the problem are all you need. A human may not be fast or error-free enough to correctly sample the already-restricted solution space, but a machine can, and for the machines that's a huge advantage. So did they allow human input (as part of a centaur team!) or not?

  These AI teams often include some of the best (ex-)competitive programmers.
- NitpickLawyer: So this year SotA models have gotten gold at the IMO, IOI, and ICPC, and beat 9/10 humans in that AtCoder contest that tested optimisation problems. Yet the most reposted headlines and rhetoric are "wall this", "stagnation that", "model regression", "winter", "bubble", doom, etc.
- Imnimo: My understanding is that the way they do this is to have some number of model instances generating solution proposals, and then another model which chooses which candidates to submit.

  I haven't been able to find information on how many proposals were generated before a solution was chosen for submission. I'm curious to know whether this is "you can get ICPC gold medal performance with a handful of GPT-5 instances" or "you will drown yourself in API credit debt if you try this".

  Still extremely impressive either way.
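  The generate-then-select pattern described above can be sketched as follows. This is a minimal illustration only; `generate` and `score` are hypothetical stand-ins for the generator and selector models, since neither lab has published implementation details:

  ```python
  from typing import Callable, List, TypeVar

  S = TypeVar("S")

  def generate_and_select(
      problem: str,
      generate: Callable[[str], S],
      score: Callable[[S], float],
      n_candidates: int = 8,
  ) -> S:
      # Many generator instances each propose a candidate solution...
      candidates: List[S] = [generate(problem) for _ in range(n_candidates)]
      # ...and a separate selector ranks them; only the top candidate is submitted.
      return max(candidates, key=score)

  # Deterministic stubs standing in for the two models:
  it = iter(range(8))
  best = generate_and_select("problem A", generate=lambda p: next(it), score=lambda s: s)
  print(best)  # 7: the selector picked the highest-scoring of 8 candidates
  ```

  The open cost question is exactly `n_candidates`: the same scaffold covers both "a handful of instances" and "API credit debt".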
- HarHarVeryFunny: ICPC = the International Collegiate Programming Contest. These are college-level programmers, not elite competitive programmers.

  Apparently Gemini solved one problem (running on who knows what kind of cluster) by burning 30 min of "thinking" time on it, and at a cost that Google has declined to provide.

  According to one prior competition participant, writing in the comments section of this Ars Technica coverage, each year the problemset includes one "time sink" problem that smart humans will avoid until they have tackled everything else.

  https://arstechnica.com/google/2025/09/google-gemini-earns-g...

  This would all seem to put a rather different spin on things. It's not a case of Google outwitting the world's best programmers, but rather that by searching for solutions for 30 min on god knows what kind of cloud hardware, they were able to get something done that the college kids did not have time to complete, or did not deem worthwhile starting.
- Vegenoid: While very cool, this feels like another instance of the kind of thing we already know they are good at: self-contained, perfectly specified problems that can be done by humans in a short timespan (especially when a team of highly skilled engineers behind the model is wielding it). Yes, it's amazing that a computer can do this; consider what they could do 10 years ago versus today, and so on. But I don't see this and go "holy shit", I see this and go "yep".

  I wish they went into more detail about how exactly the interaction with the LLM works. I'm pretty sure there's significantly more to it than "drop the paper with the problems into a text box and hit go".
- ototot: Given that ICPC problems are in general easier than IOI problems, I wouldn't be surprised to see them get gold (even perfect scores) at the ICPC.

  Nonetheless, I'm still questioning what the cost is and how long it will take for us to be able to access these models.

  Still great work, but it's less useful if the cost is actually higher than hiring someone at the same level.
- jaggs: I think it's becoming clear that these mega AI corps are juggling their models at inference time to produce unrealistically good results. By that I mean they're just cranking up the compute beyond reasonable levels in order to gain PR points against each other.

  The fact is most ordinary mortals never get access to a fraction of that kind of power, which explains the commonly reported issues with AI models failing to complete even rudimentary tasks. It's now turned into a whole marketing circus (maybe to justify these ludicrous billion-dollar valuations?).
- m3kw9: I'm still waiting for LLMs to give us one profound science breakthrough.
- z7: Current cope collection:

  - It's not a fair match, these models have more compute and memory than humans
  - Contestants weren't really elite, they're just college-level programmers, not the world's best
  - This doesn't matter for the real world, competitive programming is very different from regular software engineering
  - It's marketing, they're just cranking up the compute to unrealistic levels to gain PR points
  - It's brute force, not intelligence
- ChrisArchitect: Sharing links to a couple of tweets is not a blog post.

  Google source post: https://deepmind.google/discover/blog/gemini-achieves-gold-l... (https://news.ycombinator.com/item?id=45278480)

  OpenAI tweet: https://x.com/OpenAI/status/1968368133024231902 (https://news.ycombinator.com/item?id=45279514)
- d--b: Two words: Uh oh
- antegamisou: Make that shit cure cancer/disease and abstain from this modern Space Race equivalent BS, ffs.
- bgwalter: A database is good at leetcode, who would have thought. Give humans a database and they'll outperform your "AI" (which probably uses an extraordinary amount of graphics cards and electricity).

  It is an idiotic benchmark, in line with the rest of the "AI" propaganda.
- sameermanek: What's the point? These models are still unreliable in everyday work. And they're getting fat! For a moment they were getting cheaper, but now they are only getting bigger, and this is not going to be cheap in the future. The point is, what are we investing a trillion dollars in?