Welcome back to our series, “Who Is Liable When Artificial Intelligence Causes Harm?” In this installment, we discuss copyright liability in the use of AI training data.
Artificial Intelligence Training Data and Emerging Copyright Risk
Beyond questions of ownership and commercialization discussed previously, artificial intelligence deployment also raises significant legal concerns regarding the use of copyrighted materials in training datasets.
Artificial intelligence systems rely heavily on large datasets to develop predictive and generative capabilities. These datasets frequently contain text, images, software code, audio, video, and other forms of copyrighted material collected from online platforms, databases, or third-party repositories.
As the commercial use of artificial intelligence continues to expand, legal scrutiny regarding the use of copyrighted works in training datasets is increasing significantly. Questions are now emerging regarding whether the use of protected materials for artificial intelligence training may constitute unauthorized reproduction, unlawful use of proprietary content, or infringement of intellectual property rights.
For businesses deploying artificial intelligence technologies, these issues are becoming increasingly important from both compliance and risk management perspectives.
Copyright Protection Under Indonesian Law
Under Law Number 28 of 2014 concerning Copyright (“Copyright Law”), copyright is recognized as an exclusive right arising automatically based on a declaratory principle once a work is manifested in tangible form.
Accordingly, copyright protection does not depend on registration or recordation. Works that are publicly accessible or distributed online may therefore remain protected under Indonesian law notwithstanding the absence of formal registration.
Protected works under the Copyright Law include, among others:
- Literary and scientific works;
- Photographs and visual content;
- Cinematographic works;
- Computer programs and software code;
- Music and audio recordings;
- Databases and compilations;
- Translations and adaptations; and
- Transformed works.
The Copyright Law grants creators exclusive economic and moral rights over protected works, including rights relating to reproduction, publication, distribution, adaptation, communication, and commercial exploitation.
As a result, unauthorized use of copyrighted materials in artificial intelligence training activities may create legal exposure where protected works are copied, processed, or commercially utilized without proper authorization.
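At the same time, Indonesian copyright law also recognizes important limitations. Article 41 of the Copyright Law provides that ideas, procedures, systems, methods, concepts, principles, findings, and data are not themselves protected, even where they have been expressed or incorporated in a work. Protection attaches to the particular expression rather than to the underlying idea or data.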
This distinction between protected expression and unprotected ideas may become increasingly significant in disputes involving artificial intelligence-generated outputs, style imitation, or computational analysis.
Although Indonesian copyright law does not yet specifically regulate artificial intelligence training activities, existing legal principles may still apply to the use of copyrighted materials within machine learning systems.
Does Artificial Intelligence Training Constitute Unauthorized Reproduction?
One of the primary legal issues surrounding artificial intelligence training concerns whether the process constitutes “reproduction” of copyrighted works.
Artificial intelligence systems commonly require datasets to be copied, stored, indexed, or processed during training activities. In practice, this may involve reproducing copyrighted content within servers, databases, or computational systems.
Under Indonesian copyright law, reproduction rights are broadly protected. Consequently, where copyrighted materials are collected or processed without authorization, copyright holders may argue that such activities constitute unlawful reproduction.
This issue becomes particularly significant where:
- Datasets are obtained through automated scraping;
- Copyrighted materials are copied without licensing arrangements;
- Systems are trained using proprietary databases; and
- Commercial artificial intelligence products are developed using protected works.
At present, Indonesian law does not provide a clear statutory exception specifically permitting text and data mining for commercial artificial intelligence development.
Accordingly, organizations relying on large-scale data collection practices may face increasing legal uncertainty.
The Copyright Law does recognize limited exceptions for purposes such as education, research, quotation, and certain non-commercial activities where appropriate attribution is provided. However, the scope of these exceptions remains relatively narrow and may not clearly extend to commercial artificial intelligence development involving large-scale dataset processing.
As artificial intelligence technologies continue to evolve, the absence of explicit statutory guidance regarding machine learning activities is likely to remain a significant legal issue for businesses deploying artificial intelligence systems.
Risks Relating to Artificial Intelligence-Generated Outputs
Copyright risk may also arise from the outputs generated by artificial intelligence systems.
In certain circumstances, artificial intelligence-generated outputs may reproduce substantial portions of protected works, imitate distinctive artistic styles, or replicate copyrighted software code.
Potential legal issues may include:
- Reproduction of copyrighted images or text;
- Unauthorized imitation of artistic works;
- Generation of software code substantially similar to protected repositories; and
- Commercial use of outputs derived from protected materials.
Where generated outputs closely resemble copyrighted works, rights holders may argue that the resulting content infringes their economic rights under the Copyright Law.
As generative artificial intelligence technologies become more sophisticated, disputes over ownership and infringement are likely to increase.
Under Indonesian law, copyright holders may pursue civil claims before the Commercial Court for unauthorized commercial use of protected works. In certain circumstances, infringement involving intentional commercial exploitation may also expose parties to criminal sanctions under the Copyright Law.
Consequently, organizations deploying generative artificial intelligence systems should carefully assess both technical and legal risks associated with generated outputs.
Open-Source Data and Licensing Challenges
Many artificial intelligence systems rely on open-source repositories or publicly accessible online content.
However, public accessibility does not necessarily eliminate copyright protection.
Open-source materials are generally governed by licensing terms that may impose restrictions relating to:
- Attribution obligations;
- Modification rights;
- Redistribution conditions; and
- Commercial usage limitations.
Failure to comply with applicable licensing requirements may expose organizations to both copyright infringement claims and contractual disputes.
For organizations integrating artificial intelligence tools into commercial operations, careful review of dataset licensing arrangements is therefore becoming increasingly important.
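The licensing restrictions listed above can be screened mechanically before a dataset enters a training pipeline. The following is a minimal Python sketch, not a definitive implementation. The license identifiers, the `LICENSE_POLICY` table, and the `screen_dataset` function are all hypothetical and exist only for illustration; in practice the terms recorded for each license would need to be taken from the actual license text and confirmed by legal review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseTerms:
    requires_attribution: bool
    allows_modification: bool
    allows_redistribution: bool
    allows_commercial_use: bool


# Hypothetical license identifiers and terms, for illustration only.
# Real entries must be derived from the actual license text.
LICENSE_POLICY = {
    "EXAMPLE-ATTRIBUTION": LicenseTerms(True, True, True, True),
    "EXAMPLE-NONCOMMERCIAL": LicenseTerms(True, True, True, False),
    "EXAMPLE-NO-MODIFICATION": LicenseTerms(True, False, True, True),
}


def screen_dataset(license_id, commercial, will_modify, will_redistribute):
    """Return (blockers, obligations) for an intended use of a dataset."""
    terms = LICENSE_POLICY.get(license_id)
    if terms is None:
        # Unknown licenses are never approved automatically.
        return ([f"unknown license '{license_id}': manual legal review needed"], [])

    blockers = []
    obligations = []
    if commercial and not terms.allows_commercial_use:
        blockers.append("commercial use not permitted")
    if will_modify and not terms.allows_modification:
        blockers.append("modification not permitted")
    if will_redistribute and not terms.allows_redistribution:
        blockers.append("redistribution not permitted")
    if terms.requires_attribution:
        obligations.append("record and provide attribution")
    return (blockers, obligations)


blockers, obligations = screen_dataset(
    "EXAMPLE-NONCOMMERCIAL",
    commercial=True,
    will_modify=False,
    will_redistribute=False,
)
print(blockers)
print(obligations)
```

A check of this kind only flags datasets for review. It does not determine whether a particular use is lawful, and a dataset with an unrecognized license is routed to manual review rather than treated as cleared.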
Data Governance and Compliance Considerations
Artificial intelligence training activities frequently involve large-scale collection and processing of digital information through cloud infrastructure and third-party technology services.
Where datasets contain personal information in addition to copyrighted materials, organizations may also become subject to obligations under Law Number 27 of 2022 concerning Personal Data Protection (“PDP Law”).
Compliance considerations may therefore include:
- Ensuring lawful acquisition and use of datasets;
- Maintaining documentation regarding dataset sources and processing activities;
- Implementing internal safeguards for data security and confidentiality; and
- Responding promptly to unauthorized disclosures or data incidents.
As artificial intelligence systems become increasingly data-intensive, organizations may face overlapping legal risks involving both intellectual property protection and data governance obligations.
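Documentation of dataset sources and processing activities, as listed above, can be kept as structured records rather than informal notes. The sketch below shows one possible shape for such a record in Python. The `DatasetRecord` fields, the `make_record` helper, the example URL, and the license identifier are assumptions introduced for illustration; they are not drawn from the Copyright Law or the PDP Law.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class DatasetRecord:
    source_url: str               # where the material was obtained
    license_id: str               # license or agreement relied upon
    acquired_at: str              # UTC timestamp of acquisition (ISO 8601)
    sha256: str                   # fingerprint of the exact content acquired
    contains_personal_data: bool  # flags possible personal data obligations


def make_record(content, source_url, license_id, contains_personal_data):
    """Build a provenance record for one acquired file (content is bytes)."""
    return DatasetRecord(
        source_url=source_url,
        license_id=license_id,
        acquired_at=datetime.now(timezone.utc).isoformat(),
        sha256=hashlib.sha256(content).hexdigest(),
        contains_personal_data=contains_personal_data,
    )


# Hypothetical example values.
record = make_record(
    b"example text",
    "https://example.com/corpus.txt",
    "EXAMPLE-ATTRIBUTION",
    contains_personal_data=False,
)
print(json.dumps(asdict(record), indent=2))
```

Storing a content hash alongside the source and license makes it possible to show later exactly which material was acquired and on what basis, which supports both copyright due diligence and data governance reviews.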
Governance and Risk Management Considerations
Organizations deploying artificial intelligence technologies should consider implementing governance frameworks designed to manage intellectual property and compliance risks.
Recommended measures may include:
- Conducting due diligence on dataset sources and licensing status;
- Implementing contractual protections with vendors and technology providers;
- Maintaining records relating to dataset acquisition and usage;
- Establishing internal review procedures for artificial intelligence deployment; and
- Implementing safeguards designed to reduce reproduction of copyrighted materials.
Early implementation of governance mechanisms may significantly reduce legal exposure and support responsible artificial intelligence deployment.
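One example of a safeguard designed to reduce reproduction of copyrighted materials is an automated check that compares generated text against known protected works before release. The following Python sketch uses simple word n-gram overlap. The function names, the n-gram length of 5, and the threshold of 0.5 are assumptions chosen for illustration. The resulting score is a technical screening signal only and is not a legal test for infringement.

```python
def ngrams(text, n):
    """Return the set of word n-grams in text (lowercased, whitespace-split)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(output, protected, n=5):
    """Fraction of the output's n-grams that also appear in the protected text."""
    out = ngrams(output, n)
    if not out:
        # Output shorter than n words: nothing to compare.
        return 0.0
    return len(out & ngrams(protected, n)) / len(out)


def flag_for_review(output, protected_texts, n=5, threshold=0.5):
    """True if the output overlaps heavily with any protected text."""
    return any(overlap_ratio(output, p, n) >= threshold for p in protected_texts)


# Hypothetical example strings.
generated = "one two three four five six"
reference = "one two three four five seven"
print(overlap_ratio(generated, reference))
print(flag_for_review(generated, [reference]))
```

Outputs that are flagged would be routed to the internal review procedure rather than blocked automatically, since verbatim overlap alone does not establish whether a protected work has been reproduced.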
Why Businesses Should Act Now
Legal disputes relating to artificial intelligence training data are increasing globally. At the same time, regulatory authorities are paying greater attention to how organizations collect, process, and utilize digital information.
Businesses that rely on artificial intelligence without clear governance structures may face:
- Intellectual property disputes;
- Regulatory investigations;
- Operational disruption;
- Reputational damage; and
- Contractual liability.
As artificial intelligence adoption continues to accelerate, proactive legal risk management is becoming increasingly important.
Conclusion
The use of protected works in artificial intelligence training datasets presents significant legal uncertainty under Indonesian law. Although existing copyright legislation does not yet specifically regulate machine learning activities, organizations deploying artificial intelligence technologies may still face liability where copyrighted materials are reproduced or commercially utilized without authorization.
As Indonesia’s regulatory framework continues to evolve, organizations should ensure that artificial intelligence deployment remains aligned with intellectual property obligations, data governance requirements, and broader compliance expectations.
Careful legal planning, structured governance mechanisms, and early risk assessment will likely become essential components of responsible artificial intelligence adoption in the digital economy.
If you have further inquiries about the topic discussed above, Schinder Law Firm is a corporate law firm in Indonesia that has handled numerous similar matters and has many experienced corporate and civil lawyers. Feel free to contact us at info@schinderlawfirm.com for further consultation.
Author:
Budhi Satya Makmur