Better link: https://iquestlab.github.io/

But yes, sadly it looks like the agent cheated during the eval.

TL;DR: they didn't clean the repo (the .git/ folder), so the model just reward-hacked its way through by looking up future commits that contained the fixes. Credit goes to everyone in this thread for solving this: https://xcancel.com/xeophon/status/2006969664346501589

(Given that IQuestLab published their SWE-Bench Verified trajectory data, I want to be charitable and assume genuine oversight rather than "benchmaxxing". It's probably an easy thing to miss if you are new to benchmarking.)

https://www.reddit.com/r/LocalLLaMA/comments/1q1ura1/iquestlabiquestcoderv1_swebench_score_is/
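For anyone wondering what "future commits in .git/" means mechanically, here is a rough sketch of the check an eval harness could run before handing a repo to an agent. It is a toy model, not IQuestLab's or SWE-Bench's actual tooling: the commit graph is a plain dict, and the names `ancestors`, `leaked_commits`, `parents`, `refs`, and `base` are all made up for illustration. The idea is that any commit reachable from a ref but not part of the base commit's history is something the agent could read the fix from.

```python
from collections import deque


def ancestors(parents, start):
    """Return `start` plus every commit reachable by following parent links."""
    seen = set()
    queue = deque([start])
    while queue:
        commit = queue.popleft()
        if commit in seen:
            continue
        seen.add(commit)
        queue.extend(parents.get(commit, []))
    return seen


def leaked_commits(parents, refs, base):
    """Commits reachable from any ref that are NOT in the history of `base`.

    Hypothetical helper: a non-empty result means the checkout still exposes
    commits made after the task's base commit.
    """
    allowed = ancestors(parents, base)
    reachable = set()
    for tip in refs.values():
        reachable |= ancestors(parents, tip)
    return reachable - allowed


# Toy history (assumed for illustration): a <- b <- c <- d <- e,
# where c is the task's base commit and d contains the fix.
parents = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"], "e": ["d"]}
refs = {"HEAD": "c", "refs/remotes/origin/main": "e"}

print(sorted(leaked_commits(parents, refs, "c")))  # prints ['d', 'e']
```

In the toy example the working tree is checked out at the base commit, but a leftover remote-tracking ref still points past it, so `d` and `e` show up as leaked. Dropping that ref leaves nothing to find, which is roughly what "cleaning the repo" would have to achieve.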

In my experience, GLM-4.7 in opencode is the only open-source model that comes close, and they probably did use some Claude data, since I see the occasional "You’re absolutely right" in there.

A 40B-parameter model that beats Sonnet 4.5 and GPT 5.1? Can someone explain this to me?

This is a lie, so why is it still on the front page?

Has anyone run this yet, either on their own machine or via a hosted API somewhere?

IQuest-Coder: A new open-source code model beats Claude Sonnet 4.5 and GPT 5.1 [pdf]
