
Comments (44)

  • kraddypatties
    I feel like most of this recent Autoresearch trend boils down to reinventing hyper-parameter tuning. Is the SOTA still Bayesian optimization when given a small cluster? It was ~3 years ago when I was doing this kind of work; I haven't kept up since then.

    Also, shoutout SkyPilot! It's been a huge help for going multi-cloud with our training and inference jobs (getting GPUs is still a nightmare...)!
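For readers unfamiliar with the comparison the comment makes: Bayesian optimization spends its evaluation budget by fitting a cheap surrogate to the results seen so far and then picking the candidate the surrogate finds most promising. Below is a minimal, self-contained sketch of that idea; the objective, the nearest-neighbor surrogate, and the exploration coefficient are all made up for illustration, not taken from any library or from the article.

```python
import random

# Toy objective standing in for "validation loss after a short run".
# The true optimum is lr = 0.1; everything here is hypothetical.
def objective(lr):
    return (lr - 0.1) ** 2

def nearest_surrogate(history, x):
    """Crude surrogate: predicted loss = loss of the nearest evaluated
    point, minus an exploration bonus that grows with distance
    (a very rough stand-in for a GP acquisition function)."""
    xn, yn = min(history, key=lambda p: abs(p[0] - x))
    return yn - 0.5 * abs(xn - x)  # lower score = more promising

def bayes_opt(n_rounds=20, n_candidates=50, seed=0):
    rng = random.Random(seed)
    x0 = rng.uniform(0, 1)
    history = [(x0, objective(x0))]
    for _ in range(n_rounds):
        # Propose random candidates, evaluate only the one the
        # surrogate likes best -- this is where the budget is saved.
        candidates = [rng.uniform(0, 1) for _ in range(n_candidates)]
        x = min(candidates, key=lambda c: nearest_surrogate(history, c))
        history.append((x, objective(x)))
    return min(history, key=lambda p: p[1])

best_lr, best_loss = bayes_opt()
```

Real setups would use a Gaussian-process or tree-based surrogate (e.g. what packages like scikit-optimize or Optuna provide), but the loop structure is the same.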
  • pbkhrv
    > How parallelism changed the agent’s research strategy
    > With a single GPU, the agent is stuck doing greedy hill-climbing: try one thing, check the result, pick a direction, try the next thing. With 16 GPUs, the strategy shifts. ...skip... 12 experiments in a single 5-minute wave. This makes it much harder to get stuck in local optima and much easier to find interaction effects between parameters.

    The agent could in theory design a protocol that runs those same 12 experiments one by one and only then decides which branch to explore next, which I think would lead to the same outcome?

    But in this case, it stumbled on this particular outcome only because it never got a chance to execute a greedy strategy after the first one or two results.

    Worse experiment design + parallelism = better experiment design + serialized execution?
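The contrast the quoted passage draws can be made concrete with a toy sketch. The loss function, starting point, and wave size below are all invented for illustration: greedy hill-climbing follows one neighbor at a time and can settle into a local basin, while a wave of independent experiments samples broadly before committing.

```python
import random

# Toy stand-in for "validation loss after a 5-minute run": two basins,
# a global minimum at x = 0.2 and a local one at x = 0.8. Hypothetical.
def loss(x):
    return min((x - 0.2) ** 2, (x - 0.8) ** 2 + 0.1)

def greedy(x=0.9, step=0.02, budget=60):
    """Single-GPU strategy: try one neighbor at a time, hill-climb."""
    for _ in range(budget):
        best = min((x - step, x, x + step), key=loss)
        if best == x:
            break  # stuck: both neighbors are uphill
        x = best
    return x, loss(x)

def waves(wave_size=12, n_waves=5, seed=0):
    """16-GPU strategy: launch wave_size experiments at once per wave,
    keep the global best across all waves."""
    rng = random.Random(seed)
    best_x, best_y = None, float("inf")
    for _ in range(n_waves):
        for _ in range(wave_size):
            x = rng.uniform(0, 1)
            y = loss(x)
            if y < best_y:
                best_x, best_y = x, y
    return best_x, best_y

gx, gy = greedy()   # converges to the local basin at x = 0.8
wx, wy = waves()    # broad sampling finds the global basin near x = 0.2
```

Of course, as the comment notes, a serial agent could run the same 12 experiments one by one and get the same answer; the sketch only shows why the *greedy* serial policy, which commits after each result, behaves differently.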
  • herf
    This "early velocity only" approach seems like a problem: how do you know, with 5-minute training runs, that you aren't affecting the overall asymptote? E.g., what if the AI picks a quantizer that happens to be faster in the first five minutes but has a noise floor so high that it can't make further progress?
  • zhwu
    The most surprising part: the agent had access to both H100s and H200s. Without being told, it noticed H200s scored better and started screening ideas on H100s, then promoting winners to H200s for validation. That strategy emerged entirely on its own.
  • fabmilo
    I am fascinated by this example of using AI to improve AI. I won a small prize using this technique on Helion kernels at a PyTorch hackathon in SF.

    The next steps are:
    - give the agent the whole deep-learning research literature and do tree search over the various ideas that have been proposed in the past;
    - have some distributed notepad that any of these agents can read and improve upon.
  • covi
    This feels like the chimpanzee with a power drill. An agent is honestly just brute-force search, but guided.
  • ipsum2
    A cluster is 2 nodes? That's technically true, but not very exciting.
  • saberience
    Wait, "Karpathy's Autoresearch"? You mean a loop that prompts the agent to improve a thing given a benchmark? People have been doing this for a year or more (Ralph loops, etc.).

    I hate the strange Twitter world of hero worship for folks that seems to arise just out of large followings. Joe No-Followers does this six months ago and nobody cares; Karpathy writes a really basic loop and it's suddenly a kind of AI miracle, prompting tons of grifters, copy-cats, and weird hype.

    I do wonder if LLMs have just made everyone seriously, seriously dumber all of a sudden. Most of the "Autoresearch" posts I see are complete rubbish: AI optimizing for nonsense benchmarks, and people failing to understand the graphs they are looking at. So yes, the AI made itself better at a useless benchmark while also making the code worse in 10 other ways you don't actually understand.