A recent experience with ChatGPT 5.5 Pro

_alternator_

Comments (181)

kang
> The lower bound for contributing to mathematics will now be to prove something that LLMs can’t prove, rather than simply to prove something that nobody has proved up to now and that at least somebody finds interesting.

5.5 Pro is amazing, but this implication might not be true, and it is the core argument of this piece. AI will prove all sorts of things: interesting, boring, and incorrect. Sorting them will be the task of the PhD.
ziotom78
I am a physics professor and often use Gemini to check my papers. It is a formidable tool: it was able to find a clerical error (a missing imaginary unit in a complex mathematical expression) that I had not been able to find for days, and it often highlights connections between concepts and ideas that I overlooked. However, it often makes conceptual errors that I can spot only because I have good knowledge of the topic I am discussing. For instance, in 3D Clifford algebras it repeatedly confuses exponentials of bivectors with exponentials of pseudoscalars. Good to know that ChatGPT 5.5 Pro can produce a publishable paper, but from what I have seen so far with Gemini, it seems to me that it is better to consider LLMs as very efficient students who can read papers and books in no time but still need a lot of mentoring.
iandanforth
I found the section on publishing very interesting. Even if the quality of the output is up to snuff, where should it go? arXiv doesn't allow AI-written work. The author proposes that only work that has been certified by a human should be published. However, the field is now in the same boat as software engineering, where we are facing a glut of pull requests and not enough time and people to review them.
pmontra
It's a very long post with a mix of technical (math) and philosophical sections. Here are the most striking points to reflect upon, IMHO.

> It seems to me that training beginning PhD students to do research [...] has just got harder, since one obvious way to help somebody get started is to give them a problem that looks as though it might be a relatively gentle one. If LLMs are at the point where they can solve “gentle problems”, then that is no longer an option. The lower bound for contributing to mathematics will now be to prove something that LLMs can’t prove, rather than simply to prove something that nobody has proved up to now and that at least somebody finds interesting.

Training must start from the basics, though. Of course, everybody's training in math starts with summing small integers, which calculators have been doing without any mistakes for a long time. The point is perhaps confirmed by another comment further down in the post:

> by solving hard problems you get an insight into the problem-solving process itself, at least in your area of expertise, in a way that you simply don’t if all you do is read other people’s solutions. One consequence of this is that people who have themselves solved difficult problems are likely to be significantly better at using solving problems with the help of AI, just as very good coders are better at vibe coding than not such good coders

People pay coders to build stuff that they will use to make money, and I can happily use an AI to deliver faster and keep being hired. I'm not sure if there is a similar point with math. Again from the post:

> suppose that a mathematician solved a major problem by having a long exchange with an LLM in which the mathematician played a useful guiding role but the LLM did all the technical work and had the main ideas. Would we regard that as a major achievement of the mathematician? I don’t think we would.
MrDrDr
> "Even though I can motivate it in retrospect, ChatGPT’s idea to use h^2-dissociated sets to control relations of order at most h feels quite ingenious. As far as I can tell, this idea is completely original."

The question that keeps bothering me is: can an LLM generate an idea that is truly novel? How would/could that actually happen? But then that leads to the question - what are we actually doing when we think? Perhaps it's as simple as the ability to just make mistakes that matters, the same thing that powers evolution. As long as the LLM can make mistakes, it's capable of generating something genuinely novel. And it can make more mistakes much faster than we can.
mxwsn
> Here’s a thought experiment: suppose that a mathematician solved a major problem by having a long exchange with an LLM in which the mathematician played a useful guiding role but the LLM did all the technical work and had the main ideas. Would we regard that as a major achievement of the mathematician? I don’t think we would.

This is a cultural choice. It makes sense that in the mathematics culture we currently have, this is alien. But already, other fields, and many individuals, would disagree and say that the human did have a major achievement here. As long as human-AI collaborations are producing the best results, there is meaningful contribution by the humans, and people who are deeper experts and skilled LLM whisperers should be able to make outsized contributions. The real shoe drops when pure AI beats humans and human-AI collaboration.
zkmon
>> but it was definitely a non-trivial extension of those ideas, and for a PhD student to find that extension it would be necessary to invest quite a bit of time digesting Isaac’s paper

The "non-trivial" is relative to human abilities. The weights lifted by a crane are also "non-trivial". People keep getting amazed at machines' abilities. Just as a radio telescope can see things humans can't and a microscope can see detail humans can't, we need not be amazed. The sensory perception of patterns is at a different level for AI. It's a machine.
NotOscarWilde
As a TCS assistant professor from Eastern Europe, I am always a little jealous of the biggest names in math having such easy access to the expensive, long-thinking models. Paying for Pro from any of my current academic budgets is completely out of the realm of reality here: all budgets tend to have restricted uses, and software payments fit into very few categories. Effectively, I'd have to ask for a brand new grant and hope that the grant rules allow for large software payments and that I won't encounter an anti-AI reviewer; such a thing would take one year at least. As a nail in the coffin, I was recently "denied" all Claude Opus access as part of Microsoft's clampdown on individual (and academic) use of Copilot. (ChatGPT 5.5 Plus does not seem sufficient for any deeper investigations into new research topics; I've tried.) Apologies for the rant.
few
> So if your aim in doing mathematics is to achieve some kind of immortality, so to speak, then you should understand that that won’t necessarily be possible for much longer, not just for you, but for anybody.

This made me a little sad.
MinimalAction
As a graduate student, this piece made me sad. I always believed that my work speaks for itself and transcends beyond my limited time on this cosmic experience. This notion of immortality was just a small intangible bonus I hoped for when I jumped into grad school. AI is making me feel less worthy.
bustermellotron
I saw Tim Gowers give a talk at the AMS-MAA joint meeting in Seattle about ten years ago where he predicted that in 100 years humans would no longer be doing research mathematics. I wonder if he’s adjusted his timeline. At the time I thought the key missing tool was a natural language search that acted like mathoverflow, where you could explain your problem or ideas as you understood them and get references to relevant literature (possibly outside your experience or vocabulary).
amelius
Makes sense as a mathematician basically has two powers (1) using their intuition and (2) an enormous amount of mental stamina. A mathematician builds their intuition by reading maths books. It is thus not surprising that an LLM is well equipped to take over the tasks of the mathematician.
momojo
Sorry, I'm reposting a comment I made yesterday that seems fitting:

> This reminds me of Antirez's "Don't fall into the anti-AI hype". In a sentence: These foundation models are really good at optimizing these extremely high level, extremely well defined problem spaces (i.e., multiply matrices faster). In Antirez's case, it's "make Redis faster".
arjie
The question of where the creative input is was a big thing around Experiments in Musical Intelligence and co-composing. But it seems perhaps that it’s a transient state we needn’t spend too much effort on. The machine has failed to disappoint repeatedly. Perhaps this is as far as it gets, or perhaps we will be like the people in Catching Crumbs from the Table by Ted Chiang, where almost all science is interpretation of papers by vastly greater intellects.
dabinat
I feel like this experiment was successful because those prompting the AI were knowledgeable enough to ask the right questions and verify the output was correct. This shows that there is still a place for expertise, even if the LLM does the actual research.
lysecret
There is a great recent episode of Latent Space about a similar topic. It’s worth a watch, even with the clickbaity thumbnail and title: https://youtu.be/9d899Ram9Bs?is=pQMoVmlWVsTNKfRK
iTokio
On complex problems with lengthy proofs, the first step I would have taken is to ask 5.5 Pro, in a new, unrelated session, to be very critical and to try to find flaws in the arguments. And certainly not to send it to a colleague to ask their opinion first. LLMs are certainly becoming capable of coding, finding vulnerabilities, and solving mathematical problems, but we need to avoid putting their work in production, or in front of other humans, without assessing it by every possible means. Otherwise tech leads, maintainers, and experts get overwhelmed, and this is how "AI slop" fatigue begins. To be clear, I’m talking about this step:

> That preprint would have been hard for me to read, as that would have meant carefully reading Rajagopal’s paper first, but I sent it to Nathanson, who forwarded it to Rajagopal, who said he thought it looked correct.
zingar
The post talks about LLM+human contributions being recognized in some different category from human-only. But is it possible to spot the difference between the two?
fulafel
Link to source blog post: https://gowers.wordpress.com/2026/05/08/a-recent-experience-...
adammdaw
This is certainly interesting, though I would say that, based on my understanding of how the current models work, combinatorial problems would be an area where they could be particularly successful. They are pretty good at combinatorial creativity; it's the exploratory and transformational aspects that are still pretty tricky, and I expect those would come to bear in other areas of mathematics.
ionwake
One thing I was wondering: if LLMs are word completers that seem to come up with new solutions, could this just be because stuff that was kept secret no longer is, due to ingestion? I don't know enough about it, though.
__rito__
> So maybe there should be a different repository where AI-produced results can live.

Does the author know about CAISc 2026 [0]?

[0]: https://caisc2026.github.io
incrediblylarge
A month ago my PhD supervisor told me it rips on proofs but he also said it's useless when formalising arguments in Lean - is this still the case?
CharlesLau
Is the assessment system of undergraduate mathematics education no longer effective?
globular-toast
I wish people would stop generating stuff they don't understand only to forward it to someone who does. Something about that really rubs me the wrong way.
locknitpicker
From the article:

> Conversely, for problems where one’s initial reaction is to be impressed that an LLM has come up with a clever argument, it often turns out on closer inspection that there are precedents for those arguments, so it is still just about possible to comfort oneself that LLMs are merely putting together existing knowledge rather than having truly original ideas. How much of a comfort that is I will not discuss here, other than to note that quite a lot of perfectly good human mathematics consists in putting together existing knowledge and proof techniques.

This is exactly what leads me to believe that the real impact of LLMs in human history is yet to come. My work as a researcher was mostly spent on two classes of workloads: reading recently published papers to gather ideas and keep up with the state of the art, and working on a selection of ideas gathered from said papers to build my research upon. It turns out that LLMs excel at the most critical component of both workloads: parsing existing content and using it, when prompted, to generate additional content based on specific goals and constraints. I mean, papers are already a way to store and distribute context.
adaml_623
"It is the sort of idea I would be very proud to come up with after a week or two of pondering, and it took ChatGPT less than an hour"

This comment about time is very interesting to me. I know it's "just" doing mathematical proofs, but the possibilities of speeding up planning, proposals, and decision making in the physical world should excite people.
SubiculumCode
I honestly can't say this isn't AGI anymore. AGI shouldn't be a bar so taboo that it has to be at the extreme capability in every domain. What human is? This is as AGI as it needs to be to get my vote. And it's scary.
slopinthebag
AI generated article btw. Maybe if you find AI to be doing stuff you find impressive, the stuff you were doing wasn't that impressive? Worth ruminating on your priors at least.
zuogl
The HTML generation is surprisingly good because the training corpus for markup is cleaner than most programming languages.
bambax
> quite a lot of perfectly good human mathematics consists in putting together existing knowledge and proof techniques

Creativity is connecting ideas from different domains and seeing if something from one field applies to another. I do think AI is overhyped generally, but a major benefit from AI could be that after ingesting all the existing human knowledge (something no single human can ever hope to achieve) it would "mix and connect" it and come up with novel insights. Most published research sits ignored and unread; AI can uncover and use everything.
einrealist
"After 16 minutes and 41 seconds, it came back" ... "further 47 minutes and 39 seconds" ... "After 13 minutes and 33 seconds" ... "After 9 minutes and 12 seconds" ... "After 31 minutes and 40 seconds" ... plus other computations.

Anyone spotting the issue here? What did that really cost? I am not against compute being used for scientific or other important problems. We did that before LLMs. However, the major LLM gatekeepers want to make all industries and companies dependent on their models. And, at some point, they need to charge them the actual, unsubsidized costs for the compute. In the meantime, companies restructure in the hopes that the compute costs remain cheap.