Comments (105)
- lukeinator42: The internal dialogue breakdowns from Claude Sonnet 3.5 when the robot's battery was dying are wild (pages 11-13): https://arxiv.org/pdf/2510.21860
- ummonk: I wonder whether that LLM has actually lost its mind, so to speak, or was just attempting to emulate humans who lose their minds? Or to put it another way: if the writings of humans who have lost their minds (and the dialogue of characters who have lost their minds) were entirely missing from the LLM's training set, would the LLM still output text like this?
- koeng: 95% for humans. Who failed to get the butter?
- ghostly_s: Putting aside success at the task, can someone explain why this emerging class of autonomous helper-bots is so damn slow? I remember Google unveiled their experiments in this recently, and even the sped-up demo reels were excruciating to sit through. We generally think of computers as able to think much faster than us, even if they are making wrong decisions quickly, so what's the source of latency in these systems?
- Reason077: The most surprising thing is that 5% of humans apparently failed this task! Where are they finding these test subjects?!
- fentonc: I built a whimsical LLM-driven robot to provide running commentary for my yard: https://www.chrisfenton.com/meet-grasso-the-yard-robot/
- ge96: Funny, I was looking at the chart like, "what model is Human?"
- Animats: Using an LLM for robot actuator control seems like pounding a screw. Wrong tool for the job. Someday, and given the billions being thrown at the problem, not too far out, someone will figure out what the right tool is.
- amelius: > The results confirm our findings from our previous paper Blueprint-Bench: LLMs lack spatial intelligence.
  But I suppose that if you can train an LLM to play chess, you can also train it to have spatial awareness.
- WilsonSquared: Guess it has no purpose then
- Finnucane: I have a cat that will never fail to find the butter. Will it bring you the butter? Ha ha, of course not.
- zzzeek: Will no one claim the Rick and Morty reference? I've seen that show, like, once, and somehow I know this?
- DubiousPusher: I guess I'm very confused as to why just throwing an LLM at a problem like this is interesting. I can see how the LLM is great at decomposing user requests into commands. I had great success with this on a personal assistant project I helped prototype: the LLM did a great job of understanding user intent and even extracting parameters for the requested task.
  But it seems pretty obvious to me that after decomposition and parameterization, coordination of a complex task would be much better handled by a classical AI algorithm like a planner. After all, even humans don't put into words every individual action that makes up a complex task. We do this more while first learning a task, but if we had to do it for everything, we'd go insane.
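  A minimal sketch of the split this comment describes: the LLM reduces the request to a goal, and a classical planner searches for the action sequence. Here the "planner" is just a toy BFS over a hand-made state space; all action names and state fields are illustrative, not from the paper.

  ```python
  from collections import deque

  # Toy action model: each action maps a state dict to a successor state,
  # or to None when its precondition fails. Names are made up for illustration.
  ACTIONS = {
      "goto_fridge":  lambda s: {**s, "at": "fridge"},
      "grasp_butter": lambda s: {**s, "holding": True} if s["at"] == "fridge" else None,
      "goto_table":   lambda s: {**s, "at": "table"},
      "release":      lambda s: ({**s, "holding": False, "delivered": True}
                                 if s["holding"] and s["at"] == "table" else None),
  }

  def plan(start, goal):
      """Breadth-first search for a shortest action sequence reaching `goal`."""
      frontier = deque([(start, [])])
      seen = set()
      while frontier:
          state, path = frontier.popleft()
          if goal(state):
              return path
          key = tuple(sorted(state.items()))
          if key in seen:
              continue
          seen.add(key)
          for name, effect in ACTIONS.items():
              nxt = effect(state)
              if nxt is not None:
                  frontier.append((nxt, path + [name]))
      return None  # no plan exists

  start = {"at": "door", "holding": False, "delivered": False}
  print(plan(start, lambda s: s["delivered"]))
  ```

  The point is that once the LLM has emitted "deliver the butter to the table" as a goal predicate, sequencing is deterministic search, with no token-by-token reasoning in the control loop.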
- bhewes: Someone actually paid for this?
- yieldcrv: 95% pass rate for humans. Waiting for the Hugging Face LoRA.
- sam_goody: The error messages were truly epic; I got quite a chuckle. But boy, am I glad that this is just in the play stage. If someone were in a self-driving car that had 19% battery left and it started making comments like those, they would definitely not be amused.
- hidelooktropic: How can I get early access to this "Human" model on the benchmarks? /s
- fsckboy: > Our LLM-controlled office robot can't pass butter
  Was the script of Last Tango in Paris part of the training data? Maybe it's just scared...
- throwawayffffas: It feels misguided to me. I think the real value of LLMs for robotics is in human language parsing: turning "pass the butter" into a list of tasks the rest of the system is trained to perform — locate an object, pick up the object, locate a target area, drop off the object.
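  A rough sketch of that division of labor: the language step is only allowed to emit a whitelisted vocabulary of primitives, which a separately trained robot stack then dispatches. Every name here (the primitives, `plan_from_request`, `execute`) is hypothetical, purely for illustration.

  ```python
  # Fixed vocabulary of skills the robot stack is assumed to have been
  # trained on; the language model may only compose these.
  PRIMITIVES = {"locate_object", "pick_up", "locate_target", "drop_off"}

  def plan_from_request(obj: str, target: str) -> list[tuple[str, str]]:
      """Stand-in for the LLM step: map a parsed request to primitives."""
      steps = [
          ("locate_object", obj),
          ("pick_up", obj),
          ("locate_target", target),
          ("drop_off", obj),
      ]
      # Reject anything outside the trained skill set.
      assert all(name in PRIMITIVES for name, _ in steps)
      return steps

  def execute(steps: list[tuple[str, str]]) -> list[str]:
      # A real stack would dispatch each primitive to a trained skill
      # (vision, grasping, navigation); here we just log the calls.
      return [f"{name}({arg})" for name, arg in steps]

  print(execute(plan_from_request("butter", "table")))
  ```

  The appeal of this layering is that a hallucinated step fails the whitelist check instead of reaching the actuators.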