Authors
Partner, Intellectual Property, Ottawa
Partner, Technology, Toronto
Associate, Intellectual Property, Toronto
Key Takeaways
- Fundamental legal questions about the use of copyrighted materials for training AI models are being litigated in Canada and the U.S.
- Pending cases and government consultations will allow courts and Parliament to clarify how Canada’s copyright law applies to AI training and modernize the Copyright Act.
- Organizations developing or building on AI models should audit the datasets and sources used for training and monitor legal developments expected in 2026.
The recent rapid advancement of artificial intelligence (AI) technologies has raised fundamental legal questions regarding whether and how copyrighted materials may lawfully be used to train these technologies. Training refers to the process by which AI systems learn from vast datasets, often containing copyrighted text or images, to develop their capabilities and generate new outputs.
Canada’s Copyright Act was last substantively updated in 2012, long before the rise of the multi-modal language models and agentic AI platforms that are now gaining significant traction, as discussed in our Osler Legal Outlook article. Lawsuits in the U.S. and Canada, whose copyright regimes share similar defences or exceptions to copyright infringement, are testing the applicability of existing copyright frameworks to AI. Recent Canadian government consultations on the implications of AI developments for copyrighted materials also make this area ripe for potential reform.
As we move into 2026, we expect to see the courts in the U.S. and Canada begin to clarify how fair use and fair dealing, respectively, apply to training AI models. The recent government consultations may inform legislative changes to modernize Canada’s Copyright Act for a world increasingly shaped by AI. The result could materially affect the trajectory of AI development in Canada, including how and where models are trained and the conditions under which these technologies can be responsibly used across Canadian industries.
Intersection of AI training and copyrighted material
Modern AI cannot be developed without access to vast quantities of data, much of which contains works protected by copyright. In past years, the focus has been on the rapid development of large language models (LLMs). However, the AI landscape extends beyond such text-based models. The dominant trend over the past 18 months has been multi-modal AI: systems that process and integrate multiple types of data, such as text, images, audio and video, to provide more comprehensive and nuanced outputs.
The shift toward multi-modal AI exacerbates the copyright challenges that already existed with the use of data for AI training. Literary and artistic works are implicated. Other protectable subject matter under the Copyright Act is also at issue, including sound recordings, cinematographic works and performers’ rights, each of which carries distinct rights and related exemptions. As developers compete to build increasingly capable AI models, legal questions surrounding data provenance, ownership and lawful use of datasets have become more pressing.
Fair dealing in Canada and fair use in the U.S.
The U.S. fair use and Canadian fair dealing doctrines are similar. Both are defences or exceptions often invoked when copyright infringement is alleged. Recent U.S. decisions may therefore influence Canadian jurisprudence.
Copyright protects original works of authorship once fixed in a tangible form of expression. The copyright owner has, among other rights, the exclusive right to produce, reproduce, publish or perform the work or a substantial part of the work, and to authorize others to do so. The copyright owner also has the right to make copies available to the public. Infringement can occur when there is an unauthorized use of copyrighted material.
The fair use and fair dealing doctrines are legal exceptions to copyright infringement that balance copyright holders’ rights with public interest in access to information and creative expression. Both doctrines require courts to assess the fairness of a use or dealing by evaluating factors such as the purpose of the use, the nature of the work, the amount used and the effect on the market for the original work. If the use or dealing is determined to be fair, there is no copyright infringement.
Fair use in the U.S. is open-ended and flexible, allowing any purpose to qualify if deemed fair under statutory factors. This adaptability accommodates new technologies and uses. In contrast, Canada’s fair dealing doctrine is limited to specific purposes listed in the Copyright Act, which include research, private study, education, parody or satire, criticism or review and news reporting. While Canadian courts give these purposes a large and liberal interpretation, uses of copyrighted material for other purposes are excluded from the fair dealing exception.
Early U.S. decisions provide insights
Several lawsuits have been commenced in the U.S. concerning the use of copyrighted material in training AI products and models. Decisions in two of these cases could signal how Canadian courts and parties may tackle these novel issues.
In the first fair use decision involving AI, Thomson Reuters sued Ross Intelligence regarding the use of copyrighted content in the training of Ross’s AI legal research search engine. The U.S. District Court rejected Ross’s fair use defence. The court found that, while two of the four fair use factors favoured Ross, the two most important factors (the purpose and character of the use, and the effect on the market) and the overall balancing favoured Thomson Reuters. Ross’s use of the copyrighted material was for a commercial purpose and both companies used the material at issue for very similar purposes. Moreover, Ross meant to compete with Thomson Reuters by developing a market substitute.
Conversely, the U.S. District Court ruled largely in favour of Anthropic in a proposed class action brought by a group of published authors. The authors alleged that Anthropic infringed their copyright by, among other things, copying their works to train Anthropic’s flagship LLM, Claude. The Court held that it was fair use for Anthropic to use copyrighted books to train Claude, no matter how the books were obtained. It was also fair use for Anthropic to digitize purchased print books for internal storage. However, it was not fair use for Anthropic to acquire and retain pirated copies of books.
Three points were critical to the Court’s determination. First, the plaintiffs did not allege, nor did the record show, that Claude’s output included infringing copies or substantial reproductions of the plaintiffs’ works. Second, additional software was in place to ensure that no infringing output ever reached users. Finally, the copies of works used to train the LLMs did not, and would not, displace demand for copies of the plaintiffs’ works.
Following the Court’s decision, the parties entered into a US$1.5-billion settlement. If approved, it is believed to be the largest publicly reported copyright settlement in history.
Canadian cases to watch
Several lawsuits have recently been commenced in Canada based on similar allegations.
One such proceeding is a proposed class action recently commenced in British Columbia concerning the training of MosaicML Inc.’s (Mosaic) LLMs. Mosaic and its parent company, Databricks Inc., are alleged to have breached the Copyright Act by downloading a dataset comprising pirated books to train its various LLMs and using software to remove copyright management information from the books downloaded for that purpose. Mosaic is also alleged to have taken steps to conceal its use of copyrighted material.
In another lawsuit launched in Ontario, several of Canada’s leading media companies and news publishers allege that OpenAI, Inc. and its affiliated companies infringed their copyright. OpenAI is alleged to have improperly scraped copyrighted content from the plaintiffs’ websites and web applications, reproduced the scraped content into datasets and used those datasets to train ChatGPT. It is also alleged that OpenAI circumvented technological protection measures and breached the terms of use governing the plaintiffs’ websites.
While these cases are still in their infancy and no decisions have yet been made, they represent significant potential for judicial clarification of how our current copyright regime, including fair dealing, applies to AI training.
Canadian government consultations
Since 2021, the Government of Canada has launched two consultations regarding Canada’s copyright framework as it applies to AI. A central focus of the most recent consultation was the use of copyright-protected works in the training of AI systems.
The consultation concluded in November 2024 and culminated in the report Consultation on Copyright in the Age of Generative Artificial Intelligence: What We Heard Report. The report highlights a clear tension between stakeholders. Rights holders advocated for a framework that ensures creative works are used only with consent, credit and fair compensation. On the other hand, users, including AI developers and businesses, raised concerns that overly restrictive copyright rules could stifle innovation and hinder Canada’s competitiveness in the global AI economy.
The future trajectory of Canada’s copyright framework will likely hinge on how the government balances these competing interests. The consultations have set the stage for targeted legislative amendments to the Copyright Act, particularly in areas such as text and data mining exceptions, the fair dealing provisions and the delineation of rights and responsibilities in AI training.
Immediate implications for AI developers
The emerging case law and policy discussions carry immediate implications for organizations that train AI. Organizations should take stock of the datasets and sources used to train or fine-tune AI systems they build or license and conduct a risk assessment. They must ensure contractual arrangements and other controls are in place to provide lawful authority. Organizations should develop guidance regarding the risks of using outputs generated by AI tools trained on copyrighted material, which may reproduce substantial parts of copyrighted material.
Customers and users of AI are increasingly asking questions about dataset provenance and rights to data used to train AI tools. In response, we expect organizations to exercise greater caution in how they communicate about the training of their AI systems, whether in public statements or in discussions with clients and other stakeholders. Transparency must be balanced against risks of unnecessary legal exposure.
We also expect to see organizations refine how they allocate copyright and data-related risk in commercial contracts. For instance, LLM providers are offering limited indemnities for copyright infringement in model outputs, covering the risk that an output may reproduce copyrighted material used in training. Customers are seeking assurances that their proprietary data will not be used for further training. Both sides of these transactions are increasingly negotiating representations, warranties and indemnities addressing dataset rights and third-party claims. Where copyright law remains unsettled, the terms of a contract are an important vehicle for risk allocation.
Looking ahead
This coming year is poised to provide clarity, judicial guidance and statutory reform that could significantly influence the future of AI development and how these products are trained.
Decisions in U.S. cases may inform how Canadian courts tackle these novel issues. Decisions in Canadian cases may, in turn, clarify how existing Canadian copyright principles apply to the use of copyrighted materials in training datasets, providing a judicial foundation for any legislative reforms. As the Canadian government considers its next steps, stakeholders will be watching closely to see whether Canada can craft a forward-looking copyright framework that supports both innovation and fairness in the age of AI.


