GitHub Will Train AI Models on Copilot User Interactions by Default Starting April 24, Sparking Developer Backlash

Overview

GitHub announced on March 25 that it will begin using interaction data from Copilot Free, Pro, and Pro+ users to train and improve its AI models, effective April 24, 2026. The policy is opt-out by default, meaning developers must actively disable the setting to prevent their code snippets, prompts, and surrounding context from feeding into model training. Copilot Business and Enterprise customers are excluded, as are students and teachers enrolled in GitHub Education, according to the company’s changelog post.

The announcement has drawn immediate and overwhelmingly negative reactions from the developer community, with many calling it a betrayal of earlier commitments that Copilot would not use individual user data for training.

What Data Is Collected

The scope of data collection is broad. According to GitHub’s policy update blog post, the program collects accepted or modified code outputs, inputs sent to Copilot including code snippets, code context surrounding cursor position, comments and documentation, file names and repository structure, interaction patterns with Copilot features, and user feedback such as thumbs-up and thumbs-down ratings on suggestions.

GitHub’s Chief Product Officer Mario Rodriguez justified the change by citing internal data: “Real-world data from Microsoft employees showed meaningful improvements, including increased acceptance rates in multiple languages,” according to the policy update announcement.

The Private Repository Question

Perhaps the most contentious aspect concerns private repositories. GitHub states that stored private repository contents “at rest” are not used for training. However, the company’s FAQ clarifies that code snippets from private repositories can be collected for model training when users actively engage Copilot, as The Register reported. This distinction effectively redefines what “private” means on the platform: the repository itself is not crawled, but any code a developer feeds through Copilot while working in a private repository is fair game unless the user has opted out.

Data collected under this program may be shared with GitHub affiliates, including parent company Microsoft, though GitHub has committed to not sharing inputs or outputs with third-party AI model providers for their own independent training, according to the updated privacy statement.

Developer Reaction

The community response has been sharply negative. A GitHub community discussion thread about the policy received 102 upvotes on a comment demanding opt-in rather than opt-out consent, while the announcement post itself drew 59 thumbs-down reactions compared to just three positive ones, The Register noted. Among 39 comments at the time of reporting, only Martin Woodward, GitHub’s VP of developer relations, endorsed the initiative.

Developers raised specific concerns about intellectual property exposure for contractors working with proprietary client codebases, the inability to control data collection on a per-repository basis, and the risk that forked repository code from opted-out users could still reach training pipelines if the fork owner has not disabled the setting. Several developers mentioned evaluating alternatives including Codeberg, GitLab, and self-hosted Gitea instances.

How to Opt Out

Developers who wish to prevent their interaction data from being used can navigate to their GitHub settings at /settings/copilot/features under the Privacy section. Users who previously opted out of data collection for product improvements will retain that preference automatically, according to the policy update. The opt-out toggle is not currently accessible through GitHub’s mobile app.

What We Don’t Know

Several questions remain unanswered. It is unclear how GitHub will handle data already collected between April 24 and whenever a user discovers the setting and opts out. The company has not specified whether previously collected interaction data can be retroactively deleted from training datasets, nor has it addressed how the policy interacts with data protection regulations like the GDPR, which generally requires affirmative consent for data processing beyond the original service agreement. The distinction between an opt-out model described by GitHub as based on “US norms” and the stricter opt-in requirements typical in European jurisdictions could invite regulatory scrutiny, as The Register observed.

The broader context is difficult to ignore: GitHub Copilot’s underlying models were originally fine-tuned on publicly available code hosted on GitHub, a practice that itself drew legal challenges. This latest policy change extends the data pipeline from public repositories to active developer sessions, including those involving private code.