Comments (282)

  • ChrisArchitect
  • kepano
    I've been saying this since 2023:
    > If your data is stored in a database that a company can freely read and access (i.e. not end-to-end encrypted), the company will eventually update their ToS so they can use your data for AI training - the incentives are too strong to resist
    https://news.ycombinator.com/item?id=37124188
  • martinwoodward
    No we won’t. Details here https://github.blog/news-insights/company-news/updates-to-gi...
    For users of Free, Pro and Pro+ Copilot, if you don’t opt out then we will start collecting usage data of Copilot for use in model training.
    If you are a subscriber for Business or Enterprise we do not train on usage.
    The blog post covers more details but we do not train on private repo data at rest, just interaction data with Copilot. If you don’t use Copilot this will not affect you. However you can still opt out now if you wish and that preference will be retained if you decide to start using Copilot in the future.
    Hope that helps.
  • landl0rd
    This headline is false; it will not go take your private repos and dump them into a training dataset. Rather, GitHub will train on your copilot interactions with your private repos. If you do not use copilot, this makes no difference to you, though you should probably still turn it off.
  • lanxevo3
    To be precise: the opt-out is for GitHub Copilot training specifically, which has always required opt-in for public repos under their policy. The change Apr 24 is about private repos being included by default unless you opt out. If you're using Copilot in your private repos, definitely opt out unless you're comfortable with that. The setting is at github.com/settings/copilot — takes 30 seconds.
  • uberman
    If even one person in a repo does not disable this, will Copilot have full access to the repo? How can I determine if other members of my team have turned this off or not?
  • munk-a
    The only setting I'm seeing is on a per-user basis. Does anyone know how to blanket disable training on an organizational basis?
    Is there any information about how much information from an organization-managed repo may be trained on if an individual user has this flag enabled? Will one leaky account cause all of our source code to be considered fair game?
  • hedayet
    To Github's credit, they have been showing a banner consistently. To my discredit - I never bothered to read that banner until I saw this HN headline
  • parsimo2010
    Joke's on them, my private repos are total dog dookie. If nobody but me can see the code then I don't have to worry about style, structure, comments, or any other best practices.
    You don't want an LLM trained on my private repos. Trust me.
  • SunshineTheCat
    RIP all the people who have been paying Github for years and never happened to see the notice.
  • mxtbccagmailcom
    Time to put adversarial code into GitHub to pollute the training set?
  • w10-1
    https://github.com/settings/copilot/features
    The feature to opt out is at the bottom under privacy: "Allow GitHub to use my data for AI model training"
    TIL: you cannot opt out of a copilot-pro subscription. How is it a subscription if I can't cancel?
    (Honestly, who has time to evade all these traps? Or to migrate 150+ repos on 6+ machines...)
  • sedatk
    I have an individual GitHub Copilot Pro subscription and am also a member of an Enterprise account that has one of its GitHub Copilot Business seats assigned to me. The opt-out setting doesn't appear on my individual profile anymore. However, I want to be able to use my individual GitHub Copilot subscription for my individual work, and it seems like I can't do it anymore as Enterprise has taken over all my preferences. What a mess.
  • maplethorpe
    What's the best way to poison my repos to sabotage LLM training? Asking for a friend.
  • kristianp
    What's a good alternative for free private repos?
  • prmoustache
    While I understand the network effect of github for public projects, I don't really understand why one would want to use it for private repos.
    There are tons of git providers, including free ones that offer full gitlab/gitea/forgejo to get similar features to github, and there is nothing easier to self host or host on a vps with near zero maintenance.
  • _pdp_
    Rather than defending this absurd decision, GitHub could instantly win back trust by admitting they f***ed up and reversing it entirely.
    If they want to incentivise people to contribute their sources and copilot sessions, they could easily make it opt-in on a per-repository basis and provide some incentive, like an increased token quota.
    This is not hard.
  • GMoromisato
    I'm sure this is just me, but I don't mind if AI trains on my public or private repos. I suspect my imagination is just not good enough to come up with downsides.
    So far it's been a benefit because coding agents seem to understand my code and can follow my style.
    I don't store client data (much less credentials) in my repos (public or private) so I'm not worried about data leaks. And I don't expect any of my clients to decide to replace me and vibe code their way to a solution.
    I do worry (slightly) about large company competitors using AI to lower their prices and compete with me, but that's going to happen regardless of whether anyone trains on my code. And my own increases in efficiency due to AI have made up for that.
  • Esophagus4
    There’s a lot of furor in this thread, but people felt the same way when Google Street View came out. Eventually they worked through most of the thorny bits and people use Street View now.
    I suspect MSFT is in a similar spot. If they don’t train on more data, they’ll be left behind by Anthropic/OAI. If they do, they’ll annoy a few diehards for a while, they’ll work through the kinks, then everyone will get used to it.
  • jacamera
    Lots of hair splitting in the comments. The service is so unreliable at this point that I don’t trust them to not train on private repos even accidentally. You’re one vibe-coded PR away from having all your data scooped up regardless of any policy or intention.
  • jmward01
    They just lost my repos. I cannot believe they snuck this in. My level of anger right now is far higher than I ever wanted to feel. I went to API access for anthropic, paying more in the process, to avoid them training on my code. And GH just -adds- this, without telling me? Without a prompt. They are dead to me.
  • endofreach
    How did people forget that github was purchased by that one company?
  • bonestamp2
    Thanks for the heads up, I assumed they had already done this with my data.
  • yonatan8070
    How do I opt out of this for my own private repos? I don't see anything related to this as I've got a ton of settings for Copilot itself (I have access to Copilot through my work org)
  • bsza
    I've been encrypting my private git repos for a while because I had suspected they were going to do something like this.
    https://github.com/flolu/git-gcrypt
    It's very easy to set up and integrates nicely into git. Obviously only works if you don't need Actions or anything that requires Github to know what's in your repo (duh).
  • JonChesterfield
    Don't give your code to Microsoft if you don't want them to have your code.
    This setting will make no difference to whether your code is fed into their training set. "Oops we accidentally ignored the private flag years ago and didn't realise, we are very sorry, we were trying to not do that".
  • mrled
    I'm curious about specific consequences of this. I tend to think the importance of code secrecy has always been exaggerated (there are specific exceptions like hedge fund strategies and malware), even more so now in this post-Claude world. Does anyone have specific things they're trying to avoid by opting out of this?
  • rrgok
    I'm gonna put a license fee on all my repos. 10% of revenue if my private repos have been used for AI training. 5% on all my other repos.
  • roegerle
    Do people not browse GitHub? All I’m reading is “I’m never at the web ui”.
    I love falling into a rabbit hole looking at people’s projects.
  • bolangi
    Hah, github can have my crap code. Anyone trained on it will be in for a world of hurt :-)
  • kace91
    How's the codeberg experience nowadays? I think it's finally time to switch for me.
  • wilsonjholmes
    At least they are finally being honest about the direction of the business. I have thought for a long while that they were already doing this and just not telling anyone...
  • rakel_rakel
    I'm looking forward to the class action lawsuit, even if only to establish a precedent!
    I don't have much hope, but I wish that ignoring software licensing and attribution at scale becomes harder than it currently seems.
  • sethops1
    When Louis Rossmann started describing tech leadership as having a "rapist mentality" I brushed him off as being sensationalist. But actions like this make me think more and more he's right. The product managers pushing for changes like this are despicable scum.
  • shamelessdev
    This is the exact reason I vibe coded “artifact”.
    Not for commercial success, just wanted a git and github like experience for my new game project.
    Then I started getting into features specific to game dev like moving away from LFS and properly diffing binaries.
    paganartifact.com/benny/artifact
    Mirror: GitHub bennyschmidt/artifact
  • Uhhrrr
    Put an ORM in your private repo which randomly 1% of the time calls DROP TABLE.
  • maxloh
    Context: https://github.com/orgs/community/discussions/188488
    TLDR: As long as you aren't using Copilot, your code should be safe (according to GitHub).
    What data are you collecting?
    When an individual user has this setting enabled, the interaction data we may collect includes:
    - Outputs accepted or modified by the user
    - Inputs sent to GitHub Copilot, including code snippets shown to the model
    - Code context surrounding the user’s cursor position
    - Comment and documentation that the user wrote
    - File names, repository structure, and navigation patterns
    - Interactions with Copilot features including Chat and inline suggestions
  • VladVladikoff
    The most shocking part of this news to me is that they aren’t doing this already.
  • hilti
    Oh - they didn't train silently already?! ;-) Going to move my repositories then next week.
  • jokoon
    weren't they already using repos for training?
  • jollyllama
    It's not clear to me what happens to personal repos if you're getting Copilot for work, or where to disable it there.
  • jambutters
    Where does it say it will train on private? This seems like a security nightmare if it trains on hardcoded keys
  • frizlab
    Is there a way to disable training on repositories that are in organizations?
  • dalemhurley
    At least they are giving you the option to opt out, many other providers just trained on the source code.
  • piekvorst
    Personally, I don’t mind. Train however you want.
  • hexage1814
    If you opt out... they will also train on your private repos.
  • tartoran
    If you opt out Github will probably still train on your private repo. Just migrate.
  • totierne2
    There are always other people's ftp servers, as Linus used to say.
  • mondainx
    Get ready for some dope code... ;)
  • woodylondon
    joke's on them - all the code in all my repos is written by AI :)
  • Sohcahtoa82
    I wonder how effective it would be to sabotage the training by publishing deliberately bad code. A FizzBuzz with O(n^2) complexity. A function named "quicksort" that actually implements bogosort. A "filter_xss" function that's a no-op or just does something else entirely.
    The possibilities are endless. I thought of this after remembering seeing a post a couple months ago about how it doesn't take a significant amount of bad data to poison an LLM's training.
  • victorbjorklund
    Thanks for the heads up.
  • mxtbccagmailcom
    Time to place some adversarial code into GitHub to pollute training set?
  • pokot0
    while I agree, I understood this is only when you use copilot? if not, their communication is very misleading
  • livinglist
    Thanks for posting this, I was never made aware of this by GitHub..
  • i7l
    Thanks for flagging this!
  • yakbarber
    train on my private code? jokes on them
  • daft_pink
    is there an easy way to shift all your repos to gitlab or to private if you don’t use ci/etc?
  • holoduke
    For 5 bucks you can host your own gitea with most GitHub functionality. I moved my 500 repos to it. Actions are working perfectly fine. I make daily snapshots on hetzner. Trust them for that backup part.
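For the repo move asked about and described in the two comments above, here is a minimal sketch, assuming plain git mirroring is enough (issues, pull requests and CI settings are not carried over this way). The function names `mirror_commands` and `mirror_repo` and all URLs are hypothetical; only `git clone --mirror` and `git push --mirror` are standard git. The destination URL is assumed to point at an empty repository already created on the new host.

```python
import subprocess
from typing import List


def mirror_commands(source_url: str, dest_url: str, workdir: str) -> List[List[str]]:
    """Build the two git commands that copy all refs from source_url to dest_url."""
    return [
        # A mirror clone is bare and fetches every ref (branches, tags, notes).
        ["git", "clone", "--mirror", source_url, workdir],
        # Push every local ref to the new host; refs missing locally are deleted there.
        ["git", "-C", workdir, "push", "--mirror", dest_url],
    ]


def mirror_repo(source_url: str, dest_url: str, workdir: str, dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them, stopping on the first failure."""
    for cmd in mirror_commands(source_url, dest_url, workdir):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical URLs; replace with your own source and destination.
    mirror_repo(
        "git@github.com:example-user/example-repo.git",
        "git@gitlab.com:example-user/example-repo.git",
        "example-repo.git",
    )
```

The default is a dry run that only prints the commands, so the list can be reviewed before anything is pushed; looping `mirror_repo` over a list of repositories would cover a bulk move.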
  • harikb
    The UI options are also shady af. The setting reads:
    Enabled - "You will have access to this feature" as help text.
    Disabled - "You will not have access to this feature".
    WTF does that mean?
  • contingencies
    Thank you.
  • ljm
    Never have I seen a company try so damn hard to make something a thing than Microsoft and Copilot.
    And it is absolute dogshit. And offensive to actual copilots.
  • gafferongames
    If you guys didn't already realize that Microsoft was a garbage company in the 90s I really don't know what to say...
  • leej111
    Based
  • jpcrs
    Good luck to them, my private repos are probably some of the worst code humanity has produced.
  • Ancalagon
    This is the worst year of enshittification I can recall. Literally everything is going to shit.
  • AndrewKemendo
    I started self hosting my own git on a digital ocean droplet with Gitea (1). It’s been an unbelievably fantastic and trivially easy to manage experience, and I can make them public and invite contributors and do integrations … I see zero downsides.
    I see no reason to ever go back to holding my code elsewhere.
    Don’t forget git is fairly new.
    When I first started doing production code it was pre-github, so we used some other kind of repo management system.
    This is a perfect example of where they’re starting to cannibalize their base and now we have the ability to get away from them entirely.
    (1) https://about.gitea.com/
  • moralestapia
    Is this the case even if you're a paid customer?
    If so, this might be illegal.
  • api
    Not your storage, not your data (unless it's encrypted with keys you control).
  • nitrogen99
    So? It’s not like some human is spying on your private emails or chats. This is just code. Relax.
  • shevy-java
    Microslop tries to make money off of our data on github. Not a big surprise though.
  • 13415
    It is the feature "Allow GitHub to use my data for AI model training" that needs to be disabled. Right?
    Or am I missing some trick / dark GUI pattern? Just want to make sure.
  • uwagar
    why all u programmers cant make ur own website and host ur own git servers?
  • jongjong
    Wow. This is theft. Should be illegal! It's like if I own a vault storage business and I am keeping other people's gold in my vaults and then I just take all the gold for myself and claim that the customers should have opted out of me stealing their gold but they missed the deadline...
  • starkeeper
    So now CoPilot will be EVEN better at writing viruses, worms and malware!
  • tantalor
    "Don't touch my garbage!"
  • bdangubic
    That training will be like “OMG this is horrible… WAIT I wrote this shit”
  • shell0x
    Shouldn’t this be “Tell HN”?