AI outperforms law professors in Stanford Law study

berlianta

Comments (299)

godelski
I find this study quite suspect. I'd have to dive deeper, but there are definitely significant alarm bells that should be going off for anyone reading. Figure 2 (page 6) screams problems. There are only 16 professors (3k comparisons each?!?!) and the professors are all over the place. That's very high variance, suggesting the study has no meaningful statistical power. Poor instructor 16 can't catch a break lol. There's also really clear bias given that the main results only feature Google models. Other models show up elsewhere, why not there? I'm no lawyer, but I'm a pretty competent statistician and can confidently say this paper has a smell to it. I can't call it bullshit, but there are red flags all over.
aristofun
In general it is not surprising, even if this particular study is bad. There are certain areas of law work that are about analyzing large amounts of text, drawing conclusions and writing other texts based on that, and nothing more. That is literally the bread and butter of LLMs. Those types of lawyers should be the first in line for unemployment, not programmers, not even close.
causal
As a software engineer I have some intuition for what the risks are of letting agents do some tasks vs others. I don't have a similar intuition calibrated for what could go wrong when asking AI to draft a legal document. Some things seem harmless, e.g. drafting a will, but I don't really know; our legal system is notoriously rife with footguns.
finnborge
I understand why the conversation on this article looks like it does, but the study is specifically focused on the potential for LLMs to operate as tutors for law students. I enjoy the extrapolation out to whether LLMs will replace lawyers, but did not find that to be discussed in the study itself. In the framing of using LLMs as legal tutors, with the implication of lowering the cost of legal training, this seems like a socially-positive outcome. Furthermore, it feels kind of intuitive to me that any contemporary system operating with an LLM and access to legal reference material will be prepared to answer _student-originated questions_ comprehensively and with breadcrumbs or direct references to educational/source materials, as seems to have been found in the study. The authors explicitly and intentionally emphasize that many legal questions require contextualization, as opposed to some discrete calculated answer. The result of the study implies that the LLM-based systems were capable of using what many of us here understand to be the "stochastic best-fit algorithmic generation" of a contemporary language model to adequately contextualize a student's question, providing insight into the trade-offs or complications implicit in the question, while then, critically, _meeting the professional standards of legal educators in explaining that complexity to a student_. Realistically, I would hope this provides some confidence to readers of HN that they can actually ask a legal question to an LLM and expect the response will explain the complexity of the law in relation to the question. This is great news, and is likely the minimal pre-work any of us should do before actually consulting a lawyer, if time permits. On the other hand, I do _not_ think that this study provides any indication that an LLM is prepared to actually provide direct legal counsel. Possibly in the same way that a legal textbook does not replace legal counsel, or perhaps more accurately, the same way that stumbling upon a legal case study for approximately the same situation you're in doesn't guarantee you'll have the same result.
quantisan
I'm surprised Stanford Law would go along with this over-reaching press release title. How about "For common first-year contracts-law questions, law professors preferred AI-generated answers to professor-generated answers"
ulrischa
By its very nature, the field of law is ideally suited for AI language models. Fundamentally, everything is based on interconnected texts. I believe that even larger waves of layoffs could loom here than in the IT sector. However, it is likely that a more powerful lobby will be at work here—one that will grossly inflate the perceived value of their work and shield it from outside intrusion.
chewbacha
My best guess is that Gemini was trained on the textbooks that the questions are meant to test against, thus they are probably better at explicit recall of those questions or related questions. This is a pretty limited introductory course based on what it says in the methods of the paper itself.
damnesian
Does the "outperforming" conclusion incorporate the appropriateness of decisions? Or just if things are technically correct. Without human eyes on cases, things could easily get very off track. AI can do a lot of data wrangling, but there is no conscience.
Danox
Sure it does AI multiple IPOs incoming...
applicative
What the LLM cannot do is explain why it said what it said, when cross-examined. It simply hallucinates the best account of why someone would have said such a thing as it said, same as it can give a probable account of why someone else said something different. The question 'But why did you say this not that ...?' does not lead it to make explicit its grounds for what it said, but just to make a new more complicated statement.
TrackerFF
In many (most?) countries you can defend yourself and waive your court-appointed attorney. You are of course highly discouraged from doing so. But sometimes people do it, mostly for smaller claims where they don't want to rack up legal bills that might cost more than what is at stake. But it makes me wonder: will clients be able to use these AI-attorney systems in court in the future? Where they basically either just parrot what the model is instructing them to do, or - I dunno - give the model permission to speak for them (while waiving liabilities). I have no doubt that some complex AI system can perform better than a bottom-tier, overworked lawyer.
rockskon
I do question at what point AI could be useful as a teaching aid. The quality of LLMs depends heavily on, among other things, how you word your questions. Knowing the correct questions to ask is not something most students know how to do given that it tends to require a fair bit of pre-existing domain knowledge.
piker
Having been a law student and practicing lawyer, it's clear to me that law professors aren't really representative of much if any part of private practice. Most of the things they think and reason about are quite theoretical and academic, and it doesn't surprise me that the models would regurgitate a more average response which most human graders would prefer. That's the entire point, though! The legal academy is supposed to have outlying opinions on things and present novel philosophical answers to questions. (And questions to answers!) So in addition to the statistical arguments against this paper made elsewhere, to me it doesn't reveal much new information.
songting591
The interesting shift isn't whether AI beats law professors on tests; it's what happens to the value chain after that threshold is crossed. When AI clears the knowledge bar in a domain, the remaining moat becomes trust, accountability, and local regulatory context. That's actually good news for niche SaaS builders targeting specific jurisdictions: the generic AI layer commoditizes, but the "AI + local compliance + human accountability" bundle still has real pricing power. Curious whether anyone has seen this play out already in contract review or compliance tooling outside the US.
mchl-mumo
16 is such a small number for what they phrase as an important finding. It really couldn't be much harder to coordinate with 100+ professors.
galaxyLogic
I'm going to need some legal help for my startup. But I can't pay much. So I figured I will ask AI all the relevant questions, have it fill in forms etc., perhaps even create a patent application for me. THEN I find a human lawyer, give the AI's answers to them and say "Can you find any errors in this? Can you improve it?" That way I think my legal bills should be smaller because the AI has already done most of the work. What do you think? Which LLM is best for legal work?
throw7
Oh, a "Human-Centered" study by an AI lover: Julian Nyarko, Professor of Law, Co-Chair, Stanford Law AI Initiative, Senior Fellow, Stanford Institute for Human-Centered AI (HAI). LOL!
weatherlite
It is important for society to understand it is not merely programmers and customer support who are at risk of losing their jobs. Clearly A.I can do much more than just program.
epicureanideal
One way to make legal services more affordable and accessible would be to put the burden of ensuring the AI legal services are accurate on a private-public partnership with the government. If a person using the service is given inaccurate legal advice and acts on that advice, the person can't be charged with a crime, can't be given any civil penalties, etc., as long as the law in question is non-obvious. Obviously if by some exploit, some fundamentally obvious crime (murder, theft, obvious fraud, etc.) is said to be legal, that wouldn't apply, but of course the service should try to prevent those kinds of exploits anyway. Could limit this to something like business regulations to begin with, or even specifically for small businesses, or contracts within some time limit and dollar amount that would otherwise be coverable by small claims court, etc.
motbus3
As others pointed out, it kind of implies it surpasses professors, but reading more carefully it seems more like the mythos situation. There was a single professor or test that it surpasses. Reading it makes me extremely suspicious of how cherry-picked this was.
francisdavey
I'm not a law lecturer. I spend most of my time wrangling contracts and advising about data law. But I did a stint of part-time work teaching a masters in law. My experience then (this was back before "Attention Is All You Need", I hadn't met the output of generative models) was that students tended to produce work that did not have a proper thread of reasoning in it. There was a tendency to repeat things they had read but rehashed in various ways. Reviewing some of their texts it was clear that much of the writing - by law tutors - was of the same kind. Much was incorrect. The fact that someone at some time had said a particular case was a proposition for something meant that got repeated from book to book. Many authors simply didn't read their sources or check their references. Students repeated what they had been told incuriously. Note: this was a graduate level course. Not wet-behind-the-ears undergraduates. The worst material was little potted notes produced for law students. Utterly awful material in most cases. Anyway, when LLMs became a thing, a lot of what did not feel right about their output, and many of their error patterns, reminded me of the experience of teaching masters' students. One of the saving graces of English court room practice (when I did that sort of thing) was that judges would say to you "where does it say that?" in a case you cited. You had better have them all at your fingertips and know exactly where you had cited. That avoided a lot of hallucination. Just a random remark which might be of interest.
aitchnyu
Tangential, is there a "test suite/CI" for AI writing legal documents? Long back in terms of AI progress, a lawyer filed something with hallucinated sources. Do new tools prevent this?
RataNova
I'd read this less as "AI replaces law professors" and more as "AI may be a surprisingly strong first-pass tutor, especially when the student knows enough to question it"
elnatro
When I see news pieces like this I wonder about the failures. Maybe the failure percentage is low, but what happens if a bot gives bad counsel? Who is responsible then? Attorneys will be using LLMs for convenience but they will not disappear, because there needs to be a human ultimately responsible for the decisions.
KnuthIsGod
In the hands of a domain expert, AI is useful. In the hands of the naive, it is a foot gun. I killed my Arch installation and was stuck at the GRUB prompt. Unwilling to brush up my rusty knowledge of GRUB syntax, I asked Gemini for help. The commands Gemini suggested would have wiped my hd... Once Gemini was told that I was using BTRFS, the suggestion from Gemini looked a bit more sane, but still looked incorrect to me. It was only after I informed Gemini that I was using an NVMe drive with BTRFS that it finally produced a sane command.
dguest
I'm not a lawyer, I program. My understanding is that Civil Law (most of the world excluding UK, US, AU) is like a program: you feed it a situation, it outputs a decision, every once in a while you edit it. Common Law (UK, US) isn't really a program, but you could stretch and say it's a state machine that has been running since the country started. Every interaction sets a new precedent and changes the state. But the programming analogy falls apart because no one in their right mind would design such a program. LLMs might actually be the best example of such a program though: Common Law is basically one long chat with an LLM, hundreds of years long. Before LLMs came along, a Common Law system seemed to have a finite time limit before it's co-opted by wealthy people with the resources to read the whole history. Now I think maybe we can push it a bit further. But it's still a terrible program.
eichi_uehara
I beat lawyers twice before generative AI even existed. Recently I asked Gemini a few questions about personal conflicts in everyday life. It's often too conservative, with views too shallow for the problem. So I still handle human conflicts myself. I only outsource the templated stuff like routine chat replies or marketing copy, though it saves me a huge amount of time. People who quote AI in serious conflicts are too weak to handle them on their own.
airstrike
Yes, LLMs are great at search. That's not news.
Aperocky
> rated AI responses significantly higher than answers written by other professors, with AI winning 75% of head-to-head matchups. That's the problem, you never know when the 25% deliver a true stink bomb, and that's not considering prompting - while a fair prompt/question may be considered objective, it's very easy to stray.
u1hcw9nx
After a quick look at the study details and statistics, it does not look very definitive one way or another. I mean, LLMs do OK with tutoring, but it depends more on how unique the questions are, not how difficult they are.
Esophagus4
Yeah this could be interesting. A lot of the spotlight has been on “law firm stuff” like demand letters and writing contracts… But imagine if a dev team didn’t have to go engineer -> product manager -> legal team to get a question answered on local data retention requirements. You could ship that much faster.
tipsytoad
Curious how they do a “blind” preference test. To any evaluator I’m sure it’s quite clear which answer is AI vs human.
himata4113
There is quite a simple solution for many of the problems described in the comments: Make drafting legal papers a defined interface. If you think about it and extract the semantics of any law you get something that looks familiar, sort of like code. Of course there are some complexities where certain phrases can mean different things, but legal papers in a way are written like they're programming languages already, especially when it comes to law. First we would have to define a language that can handle ambiguous operations, and we already have this with programmatic proofs where n should land in x. So in the end I'd assume it would look something like this in a two party dispute. This is a very simplified, pseudo-like language; writing out a full contract would be as long as a real contract.

    DEFINE DEFENDANT "A Corp"
    DEFINE PLAINTIFF "B Corp"
    DEFINE CONTRACT CONTRACT(PLAINTIFF, DEFENDANT, 3054-41-95) // attaching extracted requirements, definitions and obligations of contract
    FACT PLAINTIFF delivered(goods) ON 7054-34-99
    FACT DEFENDANT paid(0) OF CONTRACT.amount
    CLAIM breach WHEN obligation(DEFENDANT, "pay") IS NOT satisfied
    PROVE breach:
        REQUIRE PLAINTIFF performed
        REQUIRE DEFENDANT.paid < CONTRACT.amount
        ASSERT delay WITHIN reasonable(time)
    IF PROVE(breach):
        AWARD PLAINTIFF (CONTRACT.amount - DEFENDANT.paid) + interest()
    ELSE:
        DISMISS

Then you would run a proof based LLM to generate it into the target language, and since we already had an example of this from one of the AI labs we know it works. Automatic citations and supporting proof would be automatically populated from reviewed legal -> DSL extracted papers as supporting evidence. I am sure that many AI labs are working on something similar already and we will see something like that in the near future as proof based LLMs evolve.
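For anyone who wants to poke at the idea, here is a rough Python sketch of the breach check from the pseudocode in the comment above. It is only an illustration under stated assumptions: the `Contract` and `Facts` classes, the `prove_breach` and `resolve` functions, and the flat `interest` parameter are all hypothetical names, not part of any real legal DSL, and the `ASSERT delay WITHIN reasonable(time)` step is left out because the comment does not say how it would be evaluated.

```python
from dataclasses import dataclass


@dataclass
class Contract:
    plaintiff: str
    defendant: str
    amount: float  # total owed under the contract (hypothetical field)


@dataclass
class Facts:
    plaintiff_performed: bool  # e.g. the goods were delivered
    defendant_paid: float      # amount actually paid so far


def prove_breach(contract: Contract, facts: Facts) -> bool:
    """Mirror the two REQUIRE lines: plaintiff performed, defendant underpaid."""
    return facts.plaintiff_performed and facts.defendant_paid < contract.amount


def resolve(contract: Contract, facts: Facts, interest: float = 0.0) -> float:
    """Return the award if breach is proven, or 0.0 to stand in for DISMISS."""
    if prove_breach(contract, facts):
        return (contract.amount - facts.defendant_paid) + interest
    return 0.0


# Illustrative values only; the amounts are made up for the example.
contract = Contract(plaintiff="B Corp", defendant="A Corp", amount=1000.0)
facts = Facts(plaintiff_performed=True, defendant_paid=0.0)
award = resolve(contract, facts, interest=50.0)
print(f"award: {award}")
```

The hard part the comment points at, deciding whether "performed" or "reasonable" holds in a given dispute, sits entirely in how the boolean and numeric facts get filled in, which this sketch takes as given.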
vessenes
* Gemini 2.5 Pro (no outside resources), and * NotebookLM (not versioned -- with added legal resources). NotebookLM was considered slightly better than 2.5 Pro by the evaluators.
wilg
> In a blind evaluation of nearly 3,000 anonymized comparisons, professors rated AI responses significantly higher than answers written by other professors, with AI winning 75% of head-to-head matchups. 75% win rate seems pretty good! Paper link: https://law.stanford.edu/wp-content/uploads/2026/06/salinas_...
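For a rough sense of scale on the quoted figures, here is a minimal Python sketch of a Wilson score interval for a 75% win rate. Everything numeric beyond the 75% is an assumption: `n = 3000` rounds the "nearly 3,000" in the quote, the 16 raters come from other comments in this thread, the equal split of comparisons per rater is assumed, and the intra-rater correlation `icc = 0.1` is purely hypothetical (it is not from the paper). The point is only that comparisons clustered within a few raters carry less information than the same number of independent ones.

```python
import math


def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = wins / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def effective_n(n: float, cluster_size: float, icc: float) -> float:
    """Effective sample size under the standard design effect 1 + (m - 1) * icc."""
    return n / (1 + (cluster_size - 1) * icc)


n = 3000                    # assumption: "nearly 3,000" rounded
wins = round(0.75 * n)      # 75% win rate from the quote
naive = wilson_interval(wins, n)

raters = 16                 # from other comments in the thread
icc = 0.1                   # hypothetical, NOT taken from the paper
n_eff = effective_n(n, cluster_size=n / raters, icc=icc)
adjusted = wilson_interval(round(0.75 * n_eff), round(n_eff))

print("naive interval:   ", naive)
print("effective n:      ", n_eff)
print("adjusted interval:", adjusted)
```

With any positive `icc` the adjusted interval is wider than the naive one; how much wider depends entirely on a correlation that would have to be estimated from the paper's actual data.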
teiferer
Question is: if a legal question is answered incorrectly by an LLM, who is going to be held responsible?
king_zee
I think there will be a market for firms that aggressively market themselves as non-AI, and then as more people turn towards that human connection we'll go full circle
atleastoptimal
And this was done with Gemini 2.5. By the time any research study on AI is published, the models are already 0.5-1 generation ahead. Even this bullish outcome for AI models and their ability to perform useful work does not reflect how good they are now.
iLoveOncall
The title of the study "Law Professors Prefer AI Over Peer Answers" is VERY different from the title on HackerNews. This is completely clickbait at this point.
lp4v4n
Honestly it's not surprising that AI provided answers that were flagged less often as "pedagogically harmful" if we take into account that LLMs somehow create an "average" of all the knowledge they ingested.
gaiagraphia
Incredible that the common people will be able to wrestle the right to rule of law away from the bloated legal caste, who have built themselves quite the moat. The inaccessibility of justice is a huge driver of inequality. Any tools which bridge this gap will help make a more just society.
homeonthemtn
Personally I think this is very good. One of the hardest things out there is maintaining a society in the face of changing times and it's because law is dense and slow. I think, in the right hands, this could be huge.
tj_hustler_1966
interesting
gamblor956
While they provided the questions that professors and LLMs were asked to respond to, they don't include any of the answers from either the humans or the LLMs, so there's no way to independently verify that the LLMs actually returned "better" answers. Given the number of responses the professors were asked to rate (200 each), they probably graded them the same way that bar exam responses are graded: quickly and superficially. Not surprising that LLMs achieved higher scores in this scenario, since they excel at producing superficially nice answers that don't hold up under scrutiny. Also... unless statistics has changed in the past 2 decades, the math in the charts doesn't math. That's probably why they're leaving out the actual numerical data. I also wouldn't be surprised if we learn in the coming days that the charts were AI generated.
Thaxll
AI will never convince a jury though.
xyzal
This contradicts my anecdata. Recently, I tasked Opus 4.6 to study a new Czech building permit law in conjunction with some waste disposal regulations and the result was disappointing. The model could not stop drawing conclusions from obsolete regulations in its training dataset, even when given the full text of the new law. The usual "you are totally right" also applied, and its conclusions were most of the time obviously wrong even to a human with cursory knowledge of the subject. I ended up studying the relevant regulations myself over the weekend.
cess11
I skimmed portions of the study but didn't manage to figure out whether this actually measures a preference for confident mediocrity.
Eufrat
What is the point of this conclusion? That law professors like the tone and verbosity of AI slop? Okay?
t0lo
Library outperforms student... more news at 9
34981t
He is basically an AI professor for law. This study just confirms his existence: https://juliannyarko.com/ Stanford and its donors of course want to replace anyone but its administrators, so they cheer on such anti-intellectual nonsense.
flanked-evergl
...
rimliu
Yes yes, the IPO is near.
t0lo
More great news from the prestigious university where 40% of students claim they are disabled https://fortune.com/article/rise-in-elite-students-seeking-a... and where they wanted to ban words such as "chief", "stupid", "karen" and "American" https://reason.com/2022/12/21/stanford-elimination-harmful-l...
bko
Marc Andreessen argued that we've already reached AGI. He says that the top AI models give better answers than 99% of people he has access to, and he has access to some of the best people in their field. I'm getting more convinced. I mean, sure it makes dumb mistakes sometimes, but it's a particular set of self-serving mistakes, like commenting out tests in order to pass. We obv don't want this behavior but I wouldn't say it's dumb. It'll be like the Turing test, which we just blew past years ago and no one cared. After all the hand-wringing about sentience and rights of the AI if it passes the Turing test, now we just have AI bots running 24/7 writing slop. How does everyone else feel?
IFC_LLC
This is exactly what LLMs are designed to do. Double up a lot of data and find connections and patterns in it. So no wonder on this point. One thing I want to mention: Law != Justice. So while LLMs are awesome at the study of law, they will suck at justice, just because one has to solve very emotional problems with it at times. And LLMs are not that good at finding the correct emotion.