Comments (135)
- andy99: This is a problem everywhere now, and not just in code. It now takes zero effort to produce something, whether code or a work plan or “deep research”, and then lob it over the fence, expecting people to review and act upon it. It’s an extension of the asymmetric bullshit principle IMO, and I think all workplaces / projects now need norms about this.
- softwaredoug: Anyone else feel like we're cresting the LLM coding hype curve? Like a recognition that there's value there, but we're passing the frothing-at-the-mouth stage of replacing all software engineers?
- r4victor: So far I prefer Hashimoto's solution to this, that "AI tooling must be disclosed for contributions": https://news.ycombinator.com/item?id=44976568
  I use it like this: if a PR is LLM-generated, you as a maintainer either merge it if it's good or close it if it's not. If it's human-written, you may spend some time reviewing the code and iterating on the PR as you used to. Saves your time without discarding LLM PRs completely.
- jamesbelchamber:
  > “I am closing this but this is interesting, head over to our forum/issues to discuss”
  I really like the way Discourse uses "levels" to slowly open up features as new people interact with the community, and I wonder if GitHub could build in a way of allowing people to open PRs only after a certain amount of interaction, too (for example, you can only raise a large PR if you have spent enough time raising small PRs). This could of course be abused and/or lead to unintended restrictions (e.g. a small change in lots of places), but that's also true of Discourse and it seems to work pretty well regardless.
- jrochkind1: The essay is way more interesting than the title, which doesn't actually capture it.
- Bengalilol: Shouldn't there be guidelines for open source projects clearly stipulating that code submitted for review must follow the project's code format and conventions?
- gwbas1c:
  > You can usually tell a prototype that is pretending to be a human PR, but a real PR a human makes with AI assistance can be indistinguishable.
  A couple of weeks ago I needed to stuff some binary data into a string, in a way where it wouldn't be corrupted by whitespace changes. I wrote some Rust code to generate the string. After I typed "}" to end the method:
  1. Copilot suggested a 100% correct method to parse the string back to binary data, and then
  2. Suggested a 100% correct unit test.
  I read both methods, and they were identical to what I would write. It was as if Copilot could read my brain. BUT: if I relied on Copilot to come up with the serialization format, or even to know that it needed to pick something that wouldn't be corrupted by whitespace, it might have picked something completely wrong that didn't meet what the project needed.
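(The comment doesn't say which encoding was used; one common whitespace-safe choice is base64, since its alphabet contains no whitespace characters. A minimal sketch in Python, with hypothetical helper names:)

```python
import base64

def encode_blob(data: bytes) -> str:
    # Base64 output uses only [A-Za-z0-9+/=], so the resulting string
    # survives reflowing, re-indentation, and trailing-whitespace
    # trimming that editors and formatters routinely apply.
    return base64.b64encode(data).decode("ascii")

def decode_blob(text: str) -> bytes:
    # Strip any whitespace that may have been inserted into the string
    # before decoding it back to the original bytes.
    return base64.b64decode("".join(text.split()))

blob = bytes(range(256))
encoded = encode_blob(blob)
# Simulate whitespace corruption: hard-wrap the string at 40 columns.
wrapped = "\n".join(encoded[i:i + 40] for i in range(0, len(encoded), 40))
assert decode_blob(wrapped) == blob
```

This is the kind of project-specific constraint (the round-trip must tolerate whitespace changes) that an assistant won't know to honor unless it is told.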
- lccerina: I have a framework: don't use it; if you never used it, don't start using it; publicly shame people; stop talking about it. Slow down. Think long and deep about your problems. Write less code. There is NOTHING inevitable about this stuff.
- quxbar: If one claims to be able to write good code with LLMs, it should be just as easy to write comprehensive e2e tests. If you don't hold your code to a high testing standard, then you were always going off 'vibes', whether they came from a silicon neural network or your human meatware biases.
- chrischen: Maybe what we need is AI-based code review.
- darkwater: The title doesn't do justice to the content. I really liked the paragraph about LLMs being "alien intelligence":
  > Many engineers I know fall into 2 camps: either the camp that finds the new class of LLMs intelligent, groundbreaking and shockingly good. In the other camp are engineers that think of all LLM-generated content as “the emperor’s new clothes”: the code they generate is “naked”, fundamentally flawed and poison. I like to think of the new systems as neither. I like to think about the new class of intelligence as “Alien Intelligence”. It is both shockingly good and shockingly terrible at the exact same time. Framing LLMs as “super-competent interns” or some other type of human analogy is incorrect. These systems are aliens, and the sooner we accept this the sooner we will be able to navigate the complexity that injecting alien intelligence into our engineering process leads to.
  It's an analogy I find compelling. The way they produce code and the way you have to interact with them really does feel "alien", and when you start humanizing them you bring emotions into the interaction, which isn't right. I mean, I do get emotional and frustrated even when good old deterministic programs misbehave and there is some bug to find and squash or work around, but LLM interactions can take that to a whole new level. So, we need to remember they are "alien".
- prymitive: The problem with AI isn't new; it's the same old problem with technology: computers don't do what you want, only what you tell them. A lot of PRs can be judged by how well they are described and justified, because the code itself isn't that important; the problem you are solving with it is. People are often great at defining problems, AIs less so IMHO. Partially because they simply have no understanding, partially because they over-explain everything to the point where you just stop reading, and so you never get to the core of the problem. And even if you do, there's a good chance the AI misunderstood the problem and the solution is wrong in some more or less subtle way. This is further made worse by the sheer overconfidence of AI output, which quickly erodes any trust that it did understand the problem.
- gordonhart:
  > As engineers it is our role to properly label our changes.
  I've found myself wanting line-level blame for LLMs. If my teammate committed something that was written directly by Claude Code, it's more useful to me to know that than to have the blame assigned to the human through the squash+merge PR process. Ultimately somebody needs to be on the hook. But if my teammate doesn't understand it any better than I do, I'd rather that be explicit and avoid the dance of "you committed it, therefore you own it," which is better in principle than in practice IMO.
- bloppe: Maybe we need open source credit scores. PRs from talented engineers with proven track records of high-quality contributions would be presumed good enough for review. Unknown, newer contributors could have a size limit on their PRs, with massive PRs rejected automatically.
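(A minimal sketch of how such a gate might behave. The thresholds and the merged-PR count are illustrative assumptions, not an existing GitHub feature:)

```python
# Hypothetical triage gate for the "credit score" idea above.
# All names and threshold values are illustrative.
TRUSTED_MERGED_PRS = 5      # track record needed for unlimited PR size
NEWCOMER_LINE_LIMIT = 200   # max changed lines for unknown authors

def triage(merged_prs: int, changed_lines: int) -> str:
    """Decide what happens to an incoming PR."""
    if merged_prs >= TRUSTED_MERGED_PRS:
        return "review"        # proven contributor: always gets eyes
    if changed_lines <= NEWCOMER_LINE_LIMIT:
        return "review"        # small PR from a newcomer: reviewable
    return "auto-reject"       # massive PR, no track record: closed

assert triage(merged_prs=0, changed_lines=5000) == "auto-reject"
```

The design choice here is that reputation only buys the right to submit *large* changes; small PRs from anyone still reach a reviewer, so newcomers can build a track record.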
- specproc: A bit of a brutal title for what's a pretty constructive and reasonable article. I like the core: AI-produced contributions are prototypes, belong in branches, and require transparency and commitment as a path to being merged.
- Lerc: Is it possible that some projects could benefit from triage volunteers? There are plenty of open source projects where it is difficult to get up to speed with the intricacies of the architecture, which limits the ability of talented coders to contribute on a small scale. There might be merit in having a channel for AI contributions that casual helpers can assess to see if they pass a minimum threshold, before passing them on to a project maintainer who assesses how the change works within the context of the overall architecture. It would also be fascinating to see how good an AI would be at assessing the quality of a set of AI-generated changes absent the instructions that generated them. They may not be able to clearly identify whether a change would work, but can they at least rank a collection of submissions to select the ones most worth looking at? At the very least, the pile of PRs counts as data about things that people wanted to do; even if the code was completely unusable, placing it into a pile somewhere might make it minable for the intentions of would-be contributors.
- jcgrillo: I guess the main question I'm left with after reading this is "what good is a prototype, then?" In a few of the companies I've worked at there was a quarterly or biannual ritual called "hack week" or "innovation week" or "hackathon", where engineers form small teams and try to bang out a pet project super fast. Sometimes these projects get management's attention and get "promoted" to a product or feature. Having worked on a few of these "promoted" projects, to the last they were unmitigated disasters. See, "innovation" doesn't come from a single junior engineer's 2AM beer-and-pizza-fueled fever dream. And when you make the mistake of believing otherwise, what seemed like some bright spark's clever little dream turns into a nightmare right quick. The best thing you can do with a prototype is delete it.
- andai:
  > That said, there is a trend among many developers of banning AI. Some go so far as to say “AI not welcome here”, find another project.
  > This feels extremely counterproductive and fundamentally unenforceable to me. Much of the code AI generates is indistinguishable from human code anyway. You can usually tell a prototype that is pretending to be a human PR, but a real PR a human makes with AI assistance can be indistinguishable.
  Isn't that exactly the point? Doesn't this achieve exactly what the whole article is arguing for? A hard "No AI" rule filters out all the slop, and all the actually good stuff (which may or may not have been made with AI) makes it in. When the AI-assisted code is indistinguishable from human code, that's mission accomplished, yeah?
  Although I can see two counterarguments. First, it might just be covert slop: slop that goes under the radar. And second, there might be a lot of baby thrown out with that bathwater: stuff that was made in conjunction with AI and contains a lot of "obviously AI", but where a human did indeed put in the work to review it. I guess the problem is there's no way of knowing that? Is there a proof of work for code review? (And a proof of competence, to boot?)
- anal_reactor: An idea occurred to me. What if:
  1. Someone raises a PR.
  2. Entry-level maintainers skim through it and either reject it or pass it higher up.
  3. If the PR has sufficient quality, it gets reviewed by someone who actually has merge permissions.
- insane_dreamer: Related discussion: https://news.ycombinator.com/item?id=45330378
- jongjong: 2 months ago, after I started using Claude Code on my side project, within the space of days I went from not allowing a single line of AI code into my codebase to almost 100% AI-written code. It basically codes in my exact style, and I know ahead of time what code I expect to see, so reviewing is really easy. I cannot justify to myself writing code by hand when there is literally no difference in the output from how I would have done it myself. It might as well be reading my mind; that's what it feels like. For me, vibe coding is essentially a 5x speed increase with no downside. I cannot believe how fast I can churn out features. All the stuff I used to type out by hand now seems impossibly boring. I just don't have the patience to hand-code anymore. I've stuck to vanilla JavaScript because I don't have the patience to wait for the TypeScript transpiler. TS iteration speed is too slow; by the time it finishes transpiling, I can't even remember what I was trying to do. So you bet I don't have the patience to write by hand now. I really need momentum (fast iteration speed) when I code, and LLMs provide that.
- lapcat:
  > That said it is a living demo that can help make an idea feel more real. It is also enormously fun. Think of it as a delightful movie set.
  [pedantry] It bothers me that the photo for "think of prototype PRs as movie sets" is clearly not a movie set but rather the set of the TV show Seinfeld. Anyone who watched the show would immediately recognize Jerry's apartment.
- dearilos: We're fixing this slop problem: engineers write rules that are enforced on PRs. It fixes the problem pretty well so far.
- mattlondon: The way we do it is to use AI to review the PR before a human reviewer sees it. Obvious errors, inconsistent patterns, weirdness, etc. are flagged before it goes any further. "Vibe coded" slop usually gets caught, but "vibe engineered" surgical changes that adhere to common patterns and standards, and have tests etc., get to be seen by a real live human for their normal review. It's not rocket science.
- jcgrillo:
  > Some go so far as to say “AI not welcome here”, find another project. This feels extremely counterproductive and fundamentally unenforceable to me.
  But it's trivially enforceable. Accept PRs from unverified contributors, look at them for inspiration if you like, but don't ever merge one. It's probably not a satisfying answer, but if you want or need to ensure your project hasn't been infected by AI-generated code, you need to only accept contributions from people you know and trust.
- ninju: Well... just have AI review the PR to have it highlight the slop /s
- colesantiago: I wouldn't call it "vibe coded slop"; the models are getting way better, and I can work with my engineers a lot faster. I am the founder and a product person, so it helps in reducing the number of engineers needed at my business. We are currently doing $2.5M ARR and the engineers aren't complaining; in fact it is the opposite, they are actually more productive. We still prioritize architecture planning, testing and having a CI, but code is getting less and less important in our team, so we don't need many engineers.
- Toby1VC: Nice Jewish word, mostly meant to mock. Why would I care what a plugin that I don't even see in use has to say to my face (since I had to read this with all the interpretation potential and receptiveness available)? The same kind of inserted judgment that lingers, similar to "Yes, I will judge you if you use AI".