Comments (282)
- ChrisArchitect[dupe] Discussion on source: https://news.ycombinator.com/item?id=47521799
- kepanoI've been saying this since 2023: > If your data is stored in a database that a company can freely read and access (i.e. not end-to-end encrypted), the company will eventually update their ToS so they can use your data for AI training; the incentives are too strong to resist. https://news.ycombinator.com/item?id=37124188
- martinwoodwardNo we won’t. Details here https://github.blog/news-insights/company-news/updates-to-gi... For users of Free, Pro and Pro+ Copilot, if you don’t opt out then we will start collecting usage data of Copilot for use in model training. If you are a subscriber for Business or Pro we do not train on usage. The blog post covers more details but we do not train on private repo data at rest, just interaction data with Copilot. If you don’t use Copilot this will not affect you. However you can still opt out now if you wish and that preference will be retained if you decide to start using Copilot in the future. Hope that helps.
- landl0rdThis headline is false; it will not go take your private repos and dump them into a training dataset. Rather, GitHub will train on your copilot interactions with your private repos. If you do not use copilot, this makes no difference to you, though you should probably still turn it off.
- lanxevo3To be precise: the opt-out is for GitHub Copilot training specifically, which has always required opt-in for public repos under their policy. The change Apr 24 is about private repos being included by default unless you opt out. If you're using Copilot in your private repos, definitely opt out unless you're comfortable with that. The setting is at github.com/settings/copilot — takes 30 seconds.
- ubermanIf even one person in a repo does not disable this, will copilot have full access to the repo? How can I determine if other members of my team have turned this off or not?
- munk-aThe only setting I'm seeing is on a per-user basis. Does anyone know how to blanket disable training on an organizational basis? Is there any information about how much information from an organization managed repo may be trained on if an individual user has this flag enabled? Will one leaky account cause all of our source code to be considered fair game?
- hedayetTo Github's credit, they have been showing a banner consistently. To my discredit, I never bothered to read that banner until I saw this HN headline.
- parsimo2010Joke's on them, my private repos are total dog dookie. If nobody but me can see the code then I don't have to worry about style, structure, comments, or any other best practices. You don't want an LLM trained on my private repos. Trust me.
- SunshineTheCatRIP all the people who have been paying Github for years and never happened to see the notice.
- mxtbccagmailcomTime to put adversarial code into GitHub to pollute the training set?
- w10-1 https://github.com/settings/copilot/features The feature to opt out is at the bottom under privacy: "Allow GitHub to use my data for AI model training". TIL: you cannot opt out of a copilot-pro subscription. How is it a subscription if I can't cancel? (Honestly, who has time to evade all these traps? Or to migrate 150+ repos on 6+ machines...)
- sedatkI have an individual GitHub Copilot Pro subscription and also am a member of an Enterprise account that has one of its GitHub Copilot Business seats assigned to me. The opt-out setting doesn't appear on my individual profile anymore. However, I want to be able to use individual GitHub Copilot subscription for my individual work, and it seems like I can't do it anymore as Enterprise has taken over all my preferences. What a mess.
- maplethorpeWhat's the best way to poison my repos to sabotage LLM training? Asking for a friend.
- kristianpWhat's a good alternative for free private repos?
- prmoustacheWhile I understand the network effect of github for public projects, I don't really understand why one would want to use it for private repos. There are tons of git providers including free ones that include full gitlab/gitea/forgejo to get similar features to github and there is nothing easier to self-host or host on a vps with near zero maintenance.
- _pdp_Rather than defending this absurd decision, GitHub could instantly win back trust by admitting they f***ed up and reversing it entirely. If they want to incentivise people to contribute their sources and copilot sessions, they could easily make it opt-in on a per-repository basis and provide some incentive, like an increased token quota. This is not hard.
- GMoromisatoI'm sure this is just me, but I don't mind if AI trains on my public or private repos. I suspect my imagination is just not good enough to come up with downsides. So far it's been a benefit because coding agents seem to understand my code and can follow my style. I don't store client data (much less credentials) in my repos (public or private) so I'm not worried about data leaks. And I don't expect any of my clients to decide to replace me and vibe code their way to a solution. I do worry (slightly) about large company competitors using AI to lower their prices and compete with me, but that's going to happen regardless of whether anyone trains on my code. And my own increases in efficiency due to AI have made up for that.
- Esophagus4There’s a lot of furor in this thread, but people felt the same way when Google Street View came out. Eventually they worked through most of the thorny bits and people use Street View now. I suspect MSFT is in a similar spot. If they don’t train on more data, they’ll be left behind by Anthropic/OAI. If they do, they’ll annoy a few diehards for a while, they’ll work through the kinks, then everyone will get used to it.
- jacameraLots of hair splitting in the comments. The service is so unreliable at this point that I don’t trust them to not train on private repos even accidentally. You’re one vibe-coded PR away from having all your data scooped up regardless of any policy or intention.
- jmward01They just lost my repos. I cannot believe they snuck this in. My level of anger right now is far higher than I ever wanted to feel. I went to API access for Anthropic, paying more in the process, to avoid them training on my code. And GH just -adds- this, without telling me? Without a prompt. They are dead to me.
- endofreachHow did people forget that github was purchased by that one company?
- bonestamp2Thanks for the heads up, I assumed they had already done this with my data.
- yonatan8070How do I opt out of this for my own private repos? I don't see anything related to this as I've got a ton of settings for Copilot itself (I have access to Copilot through my work org)
- bszaI've been encrypting my private git repos for a while because I had suspected they were going to do something like this. https://github.com/flolu/git-gcrypt It's very easy to set up and integrates nicely into git. Obviously only works if you don't need Actions or anything that requires Github to know what's in your repo (duh).
- JonChesterfieldDon't give your code to Microsoft if you don't want them to have your code. This setting will make no difference to whether your code is fed into their training set. "Oops we accidentally ignored the private flag years ago and didn't realise, we are very sorry, we were trying to not do that".
- mrledI'm curious about specific consequences of this. I tend to think the importance of code secrecy has always been exaggerated (there are specific exceptions like hedge fund strategies and malware), even more so now in this post-Claude world. Does anyone have specific things they're trying to avoid by opting out of this?
- rrgokI'm gonna put a license fee on all my repos. 10% of revenue if my private repos have been used for AI training. 5% on all my other repos.
- roegerleDo people not browse GitHub? All I’m reading is “I’m never at the web ui”. I love falling into a rabbit hole looking at people’s projects.
- bolangiHah, github can have my crap code. Anyone trained on it will be in for a world of hurt :-)
- kace91How's the codeberg experience nowadays? I think it's finally time to switch for me.
- wilsonjholmesAt least they are finally being honest about the direction of the business. I have thought for a long while that they were already doing this and just not telling anyone...
- rakel_rakelI'm looking forward to the class action lawsuit, even if only to establish a precedent! I don't have much hope, but I wish that ignoring software licensing and attribution at scale becomes harder than it currently seems.
- sethops1When Louis Rossmann started describing tech leadership as having a "rapist mentality" I brushed him off as being sensationalist. But actions like this make me think more and more he's right. The product managers pushing for changes like this are despicable scum.
- shamelessdevThis is the exact reason I vibe coded “artifact”. Not for commercial success, just wanted a git and github like experience for my new game project. Then I started getting into features specific to game dev like moving away from LFS and properly diffing binaries. paganartifact.com/benny/artifact Mirror: GitHub bennyschmidt/artifact
- UhhrrrPut an ORM in your private repo which randomly 1% of the time calls DROP TABLE.
- maxlohContext: https://github.com/orgs/community/discussions/188488 TLDR: As long as you aren't using Copilot, your code should be safe (according to GitHub). What data are you collecting? When an individual user has this setting enabled, the interaction data we may collect includes: - Outputs accepted or modified by the user - Inputs sent to GitHub Copilot, including code snippets shown to the model - Code context surrounding the user’s cursor position - Comment and documentation that the user wrote - File names, repository structure, and navigation patterns - Interactions with Copilot features including Chat and inline suggestions
- VladVladikoffThe most shocking part of this news to me is that they aren’t doing this already.
- hiltiOh - they didn't train silently already?! ;-) Going to move my repositories then next week.
- jokoonweren't they already using repos for training?
- jollyllamaIt's not clear to me what happens to personal repos if you're getting Copilot for work, or where to disable it there.
- jambuttersWhere does it say it will train on private? This seems like a security nightmare if it trains on hardcoded keys
- frizlabIs there a way to disable training on repositories that are in organizations?
- dalemhurleyAt least they are giving you the option to opt out, many other providers just trained on the source code.
- piekvorstPersonally, I don’t mind. Train however you want.
- hexage1814If you opt out... they will also train on your private repos.
- tartoranIf you opt out Github will probably still train on your private repo. Just migrate.
- totierne2There are always other people's FTP servers, as Linus used to say.
- mondainxGet ready for some dope code... ;)
- woodylondonjoke's on them - all the code in all my repos is written by AI :)
- Sohcahtoa82I wonder how effective it would be to sabotage the training by publishing deliberately bad code. A FizzBuzz with O(n^2) complexity. A function named "quicksort" that actually implements bogosort. A "filter_xss" function that's a no-op or just does something else entirely. The possibilities are endless. I thought of this after remembering seeing a post a couple months ago about how it doesn't take a significant amount of bad data to poison an LLM's training.
- victorbjorklundThanks for the heads up.
- mxtbccagmailcomTime to place some adversarial code into GitHub to pollute training set?
- pokot0while I agree, I understood this is only when you use copilot? if not, their communication is very misleading
- livinglistThanks for posting this, I was never made aware of this by GitHub..
- i7lThanks for flagging this!
- yakbarbertrain on my private code? jokes on them
- daft_pinkis there an easy way to shift all your repos to gitlab or to private if you don’t use ci/etc?
- holodukeFor 5 bucks you can host your own gitea with most GitHub functionality. I moved my 500 repos to it. Actions are working perfectly fine. I make daily snapshots on hetzner. Trust them for that backup part.
- harikbThe UI options are also shady af. The setting reads: Enabled - "You will have access to this feature" as help text. Disabled - "You will not have access to this feature". WTF does that mean?
- contingenciesThank you.
- ljmNever have I seen a company try so damn hard to make something a thing than Microsoft and Copilot.And it is absolute dogshit. And offensive to actual copilots.
- gafferongamesIf you guys didn't already realize that Microsoft was a garbage company in the 90s I really don't know what to say...
- leej111Based
- jpcrsGood luck to them, my private repos are probably some of the worst code humanity has produced.
- AncalagonThis is the worst year of enshittification I can recall. Literally everything is going to shit.
- AndrewKemendoI started self hosting my own git on a DigitalOcean droplet with Gitea (1). It’s been an unbelievably fantastic and trivially easy to manage experience, and I can make them public and invite contributors and do integrations … I see zero downsides. I see no reason to ever go back to holding my code elsewhere. Don’t forget git is fairly new. When I first started doing production code it was pre-GitHub, so we used some other kind of repo management system. This is a perfect example of where they’re starting to cannibalize their base, and now we have the ability to get away from them entirely. (1) https://about.gitea.com/
- moralestapiaIs this the case even if you're a paid customer? If so, this might be illegal.
- apiNot your storage, not your data (unless it's encrypted with keys you control).
- nitrogen99So? It’s not like some human is spying on your private emails or chats. This is just code. Relax.
- shevy-javaMicroslop tries to make money off of our data on github. Not a big surprise though.
- 13415It is the feature "Allow GitHub to use my data for AI model training" that needs to be disabled. Right? Or am I missing some trick / dark GUI pattern? Just want to make sure.
- uwagarwhy all u programmers cant make ur own website and host ur own git servers?
- jongjongWow. This is theft. Should be illegal! It's like if I own a vault storage business and I am keeping other people's gold in my vaults and then I just take all the gold for myself and claim that the customers should have opted out of me stealing their gold but they missed the deadline...
- starkeeperSo now CoPilot will be EVEN better at writing viruses, worms and malware!
- tantalor"Don't touch my garbage!"
- bdangubicThat training will be like “OMG this is horrible… WAIT I wrote this shit”
- shell0xShouldn’t this be “Tell HN”?