Comments (144)
- michaelt
> We surveyed students before releasing grades to capture their experience. [...] Only 13% preferred the AI oral format. 57% wanted traditional written exams. [...] 83% of students found the oral exam framework more stressful than a written exam. [...]

> Take-home exams are dead. Reverting to pen-and-paper exams in the classroom feels like a regression.

Yeah, not sure the conclusion of the article really matches the data. Students were invited to talk to an AI. They did so, and having done so they expressed a clear preference for written exams - which can be taken under exam conditions to prevent cheating, something universities have hundreds of years of experience doing.

I know some universities started using the square wheel of online assessment during covid and I can see how this octagonal wheel seems good if you've only ever seen a square wheel. But they'd be even better off with a circular wheel, which really doesn't need re-inventing.
- lifetimerubyist
This is all so crazy to me.

I went to school long before LLMs were even a Google engineer's brainfart for the transformer paper, and the way I took exams was already AI-proof.

Everything hand-written in pen in a proctored gymnasium. No open books. No computers or smartphones, especially ones connected to the internet. Just a department-sanctioned calculator for math classes.

I wrote assembly and C++ code by hand, and it was expected to compile. No, I never got a chance to try to compile it myself before submitting it for grading. I had three hours to do the exam. Full stop. If there was a whiff of cheating, you were expelled. Do not pass go. Do not collect $200.

Cohorts for programs with a thousand initial students had fewer than 10 graduates. This was the norm. You were expected to learn the gd material. The university thanks you for your donation.

I feel like I'm taking crazy pills when I read things about trying to "adapt" to AI. We already had the solution.
- ordu
> We love you FakeFoster, but GenZ is not ready for you.

Don't tell me about GenZ. I had oral exams in calculus as an undergrad, and our professor was intimidating. I barely passed each time I got him as examiner, though I did reasonably well when dealing with his assistant. I could normally keep my emotions in check, but not with my professor. Though maybe in that case the trigger was not just the professor's tone, but the sheer difference between the tone he used normally (very friendly) and at exam time. It was absolutely unexpected at my first exam, and repeated exposure to it didn't help. I'd say it got worse each time. Today I'd overcome such issues easily, since I know some techniques now, but I didn't when I was green.

OTOH, I wonder if an AI could have such an effect on me. I can't treat an AI as a human being even if I wanted to; it is just a shitty program. I can curse a compiler refusing to accept a perfectly valid borrow of a value, so I can curse an AI making my life difficult. Mostly I have another emotional issue with AI: I tend to become impatient and even angry at an AI for every small mistake it makes, but that one I could overcome easily.
- Twirrim
So what's next? Students using AIs with text-to-speech to orally respond to the "oral" exam questions from an AI? Where do we go from there? At some point soon I think this is going to have to come firmly back to real people.
- Aurornis
> Many students who had submitted thoughtful, well-structured work could not explain basic choices in their own submission after two follow-up questions.

When I was doing a lot of hiring we offered the option (don't roast me, it was an alternative they could choose if they wanted) of a take-home problem they could do on their own. It was reasonably short, the kind of problem an experienced developer could do in 10-15 minutes and then add some polish, documentation, and submit in under an hour.

Even though I told candidates that we'd discuss their submission as part of the next step, we would still get candidates submitting solutions that seemed entirely foreign to them a day later. This was on the cusp of LLMs being useful, so I think a lot of solutions were coming from people's friends or copied from something on the internet without much thought.

Now that LLMs are both useful and well known, the temptation to cheat with them is huge. For various reasons I think students and applicants see using LLMs as not-cheating in the same situations where they wouldn't feel comfortable copying answers from a friend. The idea is that the LLM is an available tool and therefore they should be able to use it. The obvious problem with that argument is that we're not testing students or applicants on their ability to use an LLM; we're using synthetic problems to explore their own skills and communication.

Even some of the hiring managers I know who went all in on allowing LLMs during interviews are changing course now. The LLM-assisted interviews were just turning into an exercise in how familiar the candidate was with the LLM being used.

I don't really agree with some of the techniques they're using in this article, but the problem they're facing is very real.
- A_Duck
Being interrogated by an AI voice app... I am so grateful I went to university in the before time.

If this is the only way to keep the existing approach working, it feels like the only real solution for education is something radically different, perhaps without assessment at all.
- philipallstar
> I had prepared thoroughly and felt confident in my understanding of the material, but the intensity of the interviewer's voice during the exam unexpectedly heightened my anxiety and affected my performance. The experience was more triggering than I anticipated, which made it difficult to fully demonstrate my knowledge. Throughout the course, I have actively participated and engaged with the material, and I had hoped to better demonstrate my knowledge in this interview.

This sounds as though it was written by an LLM too.
- eaglefield
At the price per student, it probably makes sense to run some voluntary trial exams during the semester. This would give students a chance to get acquainted with the format, help them check their understanding, and, if the voice is very intimidating, allow them to get used to that as well.

As an aside, I'm surprised oral exams aren't possible at 36 students. I feel like I've taken plenty of courses with more participants and oral exams. But the break-even point is probably very different from country to country.
- YakBizzarro
I seriously don't get it. In my time at university, ALL the exams were oral. And most had one or two written parts beforehand (one even had three; the professor called it "written-for-the-oral"). Sure, the orals took two days for the big exams at the beginning, but still, professors and their assistants managed to offer six sessions per year.
- semilin
This seems like a mistake. On the one hand, other commenters' experiences provide additional evidence that oral communication is a vastly different skill from the written word and ought to be emphasized more in education. Even if a student truly understands a concept, they might struggle to talk about it in a realtime context. For many real-world cases, this is unacceptable. Therefore the skill needs to be taught.

On the other hand, can an AI exam really simulate the conditions necessary for improving at this skill? I think this is unlikely. The students' responses indicate not just a general lack of expertise in oral communication but also a discomfort with this particular environment. While the author is taking steps to improve the environment, I think it is fundamentally too different from actual human-to-human discussion to test a student's ability in oral communication. Even if a student could learn to succeed in this environment, it won't produce much improvement in their real-world ability.

But maybe that's not the goal, and it's simply to test understanding. Well, as other commenters have stated, this seems trivially cheatable. So it neither succeeds at improving one's ability in oral communication nor at testing understanding. Other solutions have to be thought of.
- acbart
I have a lot of complicated feelings and thoughts about this, but one thing that immediately jumps to my mind: was the IRB (Institutional Review Board) consulted on this experiment? If so, I would love to know more details about the protocol used. If not, then yikes!
- viccis
> 0.42 USD per student (15 USD total)

Reminder: this professor's school costs $90k a year, with over $200k total cost to get an MBA. If that tuition isn't going down because the professor cut corners to do an oral exam of ~35 students for literally less than a dollar each, then this is nothing more than a professor valuing getting to slack off higher than they value your education.

> And here is the delicious part: you can give the whole setup to the students and let them prepare for the exam by practicing it multiple times. Unlike traditional exams, where leaked questions are a disaster, here the questions are generated fresh each time. The more you practice, the better you get. That is... actually how learning is supposed to work.

No, students are supposed to learn the material and have an exam that fairly evaluates this. Anyone who has spent time on those old terrible online physics coursework sites like Mastering Physics understands that grinding away at practice exams doesn't improve your understanding of the material; it just improves your ability to pass the arbitrary evaluation criteria. It's the same with practicing leetcode before interviews. Doing yet another dynamic programming practice problem doesn't really make you a better SWE.

Minmaxing grades and other external rewards is how we got to the place we're at now. Please stop enshittifying education further.
- TehShrike
My ability to recall and express things that I have learned is different when writing versus speaking. I suspect this is true for others as well. I would prefer to write responses to textual questions rather than respond verbally to spoken questions in most cases.
- rpcope1
Oral quals were OK and even kind of fun with faculty who I knew and who knew me, especially in the context of grad school, where it was more a "we know you know this but want to watch you think and haze you a little bit". Having an AI do its poor simulacrum of this sounds like absolute hell on earth, and I can't believe this person thinks it's a good idea.
- Yossarrian22
I predict that by the very next semester students will be weaponizing Reasonable Accommodation requests against any further attempts at this.
- phren0logy
I had plenty of oral exams throughout my education and training. It's interesting to see their resurgence, and easy to understand the appeal. If they can be done rigorously and fairly (no easy thing), then they go much further than multiple choice can in demonstrating understanding of concepts. But they are inherently more stressful. I agree with the article that the increased pressure is a feature, not a bug. It's much more real-world for many kinds of knowledge.
- Levitz
Humanization and responsibility issues aside (I worry that the author seems to validate the AI's judgement with no second thought), education is one sector that isn't talked about enough in terms of possible progress with AI.

Ask any teacher: scalability is a serious issue. Students being in classes above and below their level is a serious issue. Non-interactive learning, leading to rote memorization as a result of having to choose scalable methods of learning, is a serious issue. All of these can be adjusted to a personal level through AI; it's trivial to do so, even.

I'm definitely not sold on the idea of oral exams through AI though. I don't even see the point. Exams themselves are specifically an analysis of knowledge at one point in time. Far from ideal, but we never got anything better. How else can you measure a student's worth?

Well, now you could just run all of that student's activity in class through that AI. In the real world you don't know someone is competent because you run an exam; you know he is competent because he consistently shows competency. Exams are a proxy for that: you can't have a teacher looking at a student 24/7 to see they know their stuff. Except now you can gather the data and parse it. What do I care if a student performs 10 exercises poorly on a specific day at a specific time if they have shown they can do perfectly well, as can be ascertained by their performance over the past week?
- bagrow
If you can use AI agents to give exams, what is stopping you from using them to teach the whole course?

Also, with all the progress in video gen, what does recording the webcam really do?
- owenbrown
A regular paper and pencil exam would be a better experience for the students.
- CuriouslyC
Just let students use whatever tool they want and make them compete for top grades. Distribution curving is already normal in education. If an AI answer is the grading floor, whatever they add will be visible signal. People who just copy and paste a lame prompt will rank at the bottom and fail without any cheating gymnastics. Plus, this is more like how people work.

https://sibylline.dev/articles/2025-12-31-how-agent-evals-ca...
- alwa
> We can publish exactly how the exam works—the structure, the skills being tested, the types of questions. No surprises. The LLM will pick the specific questions live, and the student will have to handle them.

I wonder: with a structure like this, it seems feasible to make the LLM exam itself available ahead of time, in its full authentic form.

They say the topic randomization is happening in code, and that this whole thing costs 42¢ per student. Would there be drawbacks to offering more-or-less unlimited practice runs until the student decides they're ready for the round that counts?

I guess the extra opportunities might allow an enterprising student to find a way to game the exam, but vulnerabilities are something you'd want to fix anyway...
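(For illustration only: the article says just that randomization happens "in code", so the topic pool and function below are entirely hypothetical. The point is that if each session draws a fresh topic set before the examiner model ever sees it, there is no fixed question bank to leak, which is what makes unlimited practice runs cheap to offer:)

```python
import random

# Hypothetical topic pool; the real course's topics are not public.
TOPICS = [
    "logistic regression", "train/test leakage", "lift curves",
    "overfitting", "feature engineering", "expected value framing",
]

def draw_exam_topics(n: int = 3) -> list[str]:
    """Draw a fresh topic set per session, so every attempt differs."""
    rng = random.Random()  # fresh entropy each call, deliberately unseeded
    return rng.sample(TOPICS, n)  # n distinct topics, no repeats

# The drawn topics would then be injected into the examiner's prompt.
topics = draw_exam_topics()
prompt = "Examine the student on: " + ", ".join(topics)
```

Under that kind of setup, practice attempts and the graded attempt go through the same draw, so publishing the whole harness ahead of time gives nothing away.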
- schainks
My Italian friends went through only oral exams in high school, and it worked very well for them.

The key implementation detail to me is that the whole class sits in on your exam (not super scalable, sure), so you are literally proving to your friends you aren't full of shit when doing an exam.
- gaborcselle
Curious why the setup had 3 different LLMs?
- EdNutting
I wrote a related thought piece recently on the return of oral vivas. But damn, I didn't anticipate someone doing them using voice apps and LLMs. That's completely fucked up.

https://ednutting.com/2025/11/25/return-of-the-viva.html
- Wowfunhappy
...if I were a student, I just fundamentally don't think I'd want to be tested by an AI. I understand the author's reasoning, but it just doesn't feel respectful for something that is so high-stakes for the student.

Wouldn't a written exam--or even a digital one, taken in class on school-provided machines--be almost as good? As long as it's not a hundred-person class or something, you can also have an oral component taken in small groups.
- dvh
Students cheat when grades are more valuable than knowledge.
- cryptonector
Is there an evaluation of how good the questioning was? Did TFA review the transcripts for that? Did I miss it?

> The grading was stricter than my own default. That's not a bug. Students will be evaluated outside the university, and the world is not known for grade inflation.

Good!

> 83% of students found the oral exam framework more stressful than a written exam.

That's alright -- that's how life goes. This reminds me of a history teacher I had in middle school who told us how oral exams were done at the university he had studied at: in class, each student would come up to the front, pick three topics at random from a lottery-ball-picker type setup, and then they'd have a few minutes in which to explain how all three are related. I would think that would be stressful except to those who enjoy the topic (in this case: history) and have mastered the material.

> Accessibility defaults. Offer practice runs, allow extra time, and provide alternatives when voice interaction creates unnecessary barriers.

Yes, obviously this won't work for deaf students. But why must it be an oral examination anyway? In the real world (see the above example) you can't cheat at an oral examination because you're physically present, with no cheat sheets, just you, and you have to answer in real time. But these are take-at-home oral exams, so they had to add a requirement of audio/video recording to restore the value of the "physically present" part of old-school oral exams -- if you could do something like that for written exams, surely you would?

Clearly a take-home written exam would be prone to cheating even with a real-time AI examiner, but the real-time requirement might be good enough in many cases, and probably always for in-class exams.

Oh, that brings me to: TFA does not explicitly say it, but it strongly implies that these oral exams were take-at-home exams! This is a very important detail. Obviously the students couldn't do concurrent oral exams in class, not unless they were all wearing high-quality headsets (and even then). The exams could have been held in school facilities with one student present at a time, but that would have taken a lot of time and would not have required that the students provide webcam+audio recordings -- the school would have performed those recordings itself.

My bottom-line take: you can have a per-student AI examiner, and this is more important than the exam being oral, as long as you can prevent cheating where the exam is not oral.

PS: A sample of FakeFoster would have been nice. I found videos online of Foster Provost speaking, but it's hard to tell from those how intimidating FakeFoster might have been.
- baq
It's dehumanizing to be grilled by an AI, whether it is a job interview or a university exam.

...but OTOH, if cheating is so easy it's impossible to resist, and when everyone cheats the honest students are the ones getting all the bad grades, what else can you do?
- neilv
Instead of funneling more business/hype to the AI bro industry to police the AI bro industry that fully expected this effect from their cheating-on-your-homework/plagiarism services (oh, I see this is a business school)...

First, the business school administration and faculty firmly commit that plagiarism, including with AI, means prompt dismissal.

Then, the first time you have a suspicion of plagiarism, you investigate.

After the first student of a class year is found guilty and smacked to the curb, all the other students will know, and I bet your problem is mostly solved for that class year.

Then, one coked-up nepo baby sociopath will think they are too smart or meritorious to "fail" by getting caught. Bam! Smacked to the curb.

Then one of those two will try to sue, and the university PR professionals will laugh at them for putting their name in the news as someone who got kicked out of business school for cheating. The business school will take this opportunity to bolster their reputation for excellence.

At this point, it will become standard advice for subsequent class years that cheating at this school is something only an idiot loser does, not a winner MBA.
- throwaway81523
Great, so we'll see chatbots taking the exams that are administered by other chatbots. Sorry, but this whole scheme is mega cringe.