A mate of mine put it bluntly: “Whenever Anthropic says they’re vibe coding Claude, they’re also saying out loud ‘we have no rights over this product.’” He’s not wrong. The legal foundations under AI-generated code are shaky in ways most developers haven’t considered.
The paradox is simple. AI-generated code likely can’t be copyrighted by the company that ships it. But it might infringe someone else’s copyright. You get all the liability and none of the protection.
The Training Data Problem
AI coding models were trained on billions of lines of code scraped from GitHub. Much of that code was published under open source licenses: MIT, GPL, Apache, BSD. Those licenses come with conditions. Attribution. Share-alike. Notice preservation.
The models don’t track provenance. When Copilot suggests a function, it doesn’t tell you whether it was synthesized from GPL-licensed code, MIT-licensed code, or something proprietary. The attribution requirements from those licenses? Gone.
This is exactly what Doe v. GitHub alleges. The class action, filed in 2022, claims Copilot strips copyright notices and license terms from its output. Most original claims were dismissed, but two survived: breach of open-source license contract and DMCA Section 1202 violations for removing copyright management information.
The Ninth Circuit is currently deciding whether DMCA violations require an identical copy or whether near-identical output counts. If the bar is set at “substantially similar,” every AI coding assistant is exposed.
GitHub’s own research admits Copilot generates suggestions matching training code in roughly 1% of cases. That sounds small. At enterprise scale, across millions of suggestions per day, it’s a massive surface area for accidental license violations. And traditional software composition analysis tools miss it entirely: they scan declared dependencies, not AI-generated inline code.
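To put that surface area in numbers, here's a back-of-envelope sketch. The developer count and per-developer suggestion volume are assumptions picked purely for illustration; the only figure taken from GitHub's research is the roughly 1% match rate.

```python
# Back-of-envelope estimate of how often a memorized-looking suggestion
# could land in a codebase. Headcount and volume are illustrative
# assumptions; only the ~1% match rate comes from GitHub's research.
suggestions_per_dev_per_day = 100   # assumption
developers = 5_000                  # assumption: a large enterprise
match_rate = 0.01                   # ~1% of suggestions resemble training code

matches_per_day = suggestions_per_dev_per_day * developers * match_rate
print(f"Suggestions matching training code per day: {matches_per_day:,.0f}")
# ~5,000 per day, none of which a manifest-based SCA scan will flag
```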
The Ownership Void
Even if the training data problem were solved, there’s a more fundamental issue: who owns AI-generated code?
In the US, the answer is increasingly clear: nobody.
The US Copyright Office ruled in January 2025 that AI-generated outputs can only be copyrighted where a human author has contributed “sufficient expressive elements.” Prompting alone - even sophisticated, iterative prompting - isn’t enough. The Office uses a “spinning wheel” metaphor: selecting from AI outputs is like spinning a roulette wheel, not authorship.
The Office has concluded that prompts function more like instructions to a commissioned artist than like the creative tools wielded by an author. The human does not control the expressive elements of the output with sufficient specificity to be considered the author.
— US Copyright Office, Part 2 Report (2025)
The courts agree. In Thaler v. Perlmutter, the D.C. Circuit affirmed in March 2025 that “the Copyright Act requires all eligible work to be authored in the first instance by a human being.” Thaler petitioned the Supreme Court in October 2025. The DOJ opposes review. As of February 2026, SCOTUS hasn’t decided whether to hear it.
The practical consequence: if your product is substantially AI-generated, you likely can’t enforce copyright on it. Competitors can copy it. You can’t sue for infringement on code that was never yours to begin with.
The Global Patchwork
It gets worse if you ship internationally. Every major jurisdiction has a different answer.
- United States: Humans only. No copyright for purely AI-generated works.
- Japan: The most permissive approach globally. Broadly allows AI training without consent and has signaled openness to AI output protections.
- China: Experimentally granting copyright in AI-assisted cases. A 2023 Beijing court allowed copyright for an AI-generated image where the user made 150+ prompt iterations.
- United Kingdom: Has existing protection for “computer-generated works” under Section 9(3) of the CDPA 1988, granting 50-year copyright to whoever made the “necessary arrangements.” But the government is considering abolishing it. The Data (Use and Access) Act 2025 punted the decision, requiring a report by March 2026.
- EU: Focused on training transparency rather than output ownership. The AI Act requires GPAI providers to publish training data summaries and respect copyright reservations. Enforcement begins August 2026 with fines up to 3% of global revenue.
India and the UK both attribute authorship of computer-generated works to “the person who undertakes necessary arrangements for their creation.” This is the most developer-friendly framing, but neither jurisdiction has tested it against modern AI coding tools.
A company shipping AI-generated code globally faces fundamentally incompatible legal regimes. Code that’s unprotectable in the US might be copyrightable in China. Code that’s fine in Japan might violate EU training transparency requirements. There’s no harmonized framework, and none is coming soon.
The Indemnity Theater
AI companies know this is a problem. Their solution: contractual indemnity.
Microsoft’s Copyright Commitment promises to defend Copilot Business and Enterprise customers if output infringes third-party IP. Anthropic offers similar protections on enterprise tiers.
OpenAI? “As-is.” If their output gets you sued, you’re on your own.
But even the indemnity providers have fine print. Microsoft’s commitment requires you to use their content filters and safety systems. Disable them and you lose protection. And all these commitments are capped and subject to standard limitations of liability.
More fundamentally: no terms of service can override copyright law. When the ToS say “we assign all rights in the output to you,” they’re assigning something that may not exist. You can’t transfer ownership of what the law says is unownable.
The terms of service assign you ownership. The Copyright Office says there's nothing to own. The training data may infringe someone else's rights. And the courts haven't decided any of this yet.
— A legal paradox
What Developers Should Actually Do
This isn’t a reason to stop using AI coding tools. But it’s a reason to think about how you use them.
- Document human contribution. The more you modify, review, and restructure AI output, the stronger your copyright claim. Accept-all-suggestions workflows are the riskiest.
- Watch for verbatim reproduction. If an AI suggestion looks suspiciously complete or specific, it may be memorized training data. Search for it before shipping (a rough sketch of such a check follows this list).
- Know your license scanning gaps. Traditional SCA tools scan dependency manifests. They don’t catch GPL-licensed snippets inlined by Copilot. Tools like FOSSA and Codacy Guardrails are emerging to fill this gap.
- Check your provider’s indemnity. If you’re on a free or individual tier, you likely have zero IP protection from your provider.
- Treat AI output as unprotectable by default. If a piece of code is competitively critical, make sure a human wrote or substantially transformed it.
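On the verbatim-reproduction point, here's a minimal sketch of a pre-commit check, assuming you keep a local corpus of the third-party source you care about (vendored dependencies, mirrored GPL projects) under a directory like third_party. The directory name, window size, and brute-force substring search are illustrative assumptions, not any vendor's tooling; real tools, including the SCA gap-fillers mentioned above, rely on code fingerprinting rather than exact matching.

```python
"""Minimal sketch: flag AI-accepted snippets that reproduce local third-party code verbatim.

Assumptions, not vendor tooling: a local corpus of third-party source lives
under CORPUS_DIR, and the accepted suggestion is piped in on stdin.
"""
import pathlib
import sys

CORPUS_DIR = pathlib.Path("third_party")  # hypothetical corpus location
WINDOW = 120                               # chars of exact overlap that should trigger review


def normalize(text: str) -> str:
    # Collapse whitespace so formatting differences don't hide a match.
    return " ".join(text.split())


def main() -> int:
    snippet = normalize(sys.stdin.read())
    if len(snippet) < WINDOW:
        return 0  # too short to be a meaningful verbatim match
    chunks = [snippet[i:i + WINDOW] for i in range(len(snippet) - WINDOW + 1)]
    for path in (p for p in CORPUS_DIR.rglob("*") if p.is_file()):
        try:
            haystack = normalize(path.read_text(errors="ignore"))
        except OSError:
            continue
        if any(chunk in haystack for chunk in chunks):
            print(f"possible verbatim reuse of {path}; check its license before shipping")
            return 1  # fail the hook; a human should confirm provenance
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Even a crude check like this catches the worst case: a long, character-for-character reproduction of someone else's licensed code landing in your repository unreviewed.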
What This Doesn’t Solve
The legal framework is moving, but slowly. No major fair use decision is expected in any AI case until mid-to-late 2026. The 51+ active copyright lawsuits against AI companies are grinding through discovery. Congress has multiple bills in play: the COPIED Act, the TRAIN Act, the TRUMP AMERICA AI Act. None have passed.
Even the attempts to avoid regulation have failed. In May 2025, the House narrowly passed (215-214) a 10-year moratorium on state AI laws as part of the “One Big Beautiful Bill” - it would have prevented states from enforcing any regulation on AI models, systems, or automated decision-making. The Senate watered it down to five years, then voted almost unanimously to remove it entirely. The final bill, signed July 4, 2025, had no AI moratorium at all.
The fallback? Executive action. In December 2025, Trump signed an order creating an “AI Litigation Task Force” to challenge state AI laws case-by-case. The legislative branch can’t agree on rules. The executive branch is trying to tear up the ones that exist. And the judicial branch is still deciding the foundational questions.
The practical reality for developers shipping code today: you’re operating in a legal grey zone. The tools are useful. The productivity gains are real. But the legal foundations are unresolved, and the risk surface is larger than most teams acknowledge.
AI coding tools are a productivity multiplier. They’re also a legal experiment running at production scale. The companies building these tools know it. The companies shipping code with them should too.
Every line of AI-generated code you ship is a small bet: that training was fair use, that the output is original enough, that your jurisdiction will eventually protect it. Those are reasonable bets today. But they’re bets, not certainties. And the house rules haven’t been written yet.


