EU issues guidance on public summaries of AI training data

The European Commission has released an explanatory notice and template detailing how providers of general-purpose AI models must disclose training data content, as required under Article 53(1)(d) of the EU AI Act.

The EU AI Act, which came into force on 1 August 2024, obliges providers of general-purpose AI models to publish a public summary of the data used to train their systems. This requirement, applicable from 2 August 2025, aims to improve transparency around datasets, including material protected by copyright. The newly issued explanatory notice outlines how these summaries should be structured, while an annexed template provides a uniform reporting format.

The template is divided into sections covering provider and model identification, training data modalities and sizes, lists of data sources—such as public datasets, licensed private data, web-scraped content, and user data—and details of synthetic data generation. It also requires disclosure of data processing measures, including respect for copyright reservations and steps taken to remove illegal content.
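As a rough illustration only, the kinds of information the template asks for could be organised as in the sketch below. The official template is a narrative document rather than a machine-readable format, and the field names here are hypothetical, chosen only to mirror the sections described above.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch of the information categories covered by the Commission's
# template; the structure and field names are illustrative, not the official format.

@dataclass
class DataSource:
    category: str       # e.g. "public dataset", "licensed private data",
                        # "web-scraped content", "user data"
    description: str    # narrative description of the source

@dataclass
class TrainingDataSummary:
    provider: str                           # provider identification
    model_name: str                         # model identification
    modalities: List[str]                   # e.g. ["text", "image"]
    approximate_size: str                   # overall size per modality, described in narrative terms
    data_sources: List[DataSource] = field(default_factory=list)
    synthetic_data: str = ""                # how synthetic data was generated, if any
    copyright_measures: str = ""            # respect for copyright reservations (opt-outs)
    illegal_content_measures: str = ""      # steps taken to remove illegal content
```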

The Commission emphasises a balance between transparency and the protection of trade secrets, requiring narrative descriptions rather than granular technical detail. Summaries must be updated regularly, particularly after significant retraining, and published when a model is placed on the EU market. Models already on the market must comply by 2 August 2027, with the AI Office overseeing enforcement and non-compliance exposing providers to fines of up to 3% of global annual turnover.

Why does it matter?

This notice matters because it strengthens accountability and transparency in the development of AI models, particularly those with broad and unpredictable applications.

It also addresses potential risks of bias, discrimination, and misuse, since a clearer view of data sources allows developers and downstream users to evaluate diversity and quality. For researchers and civil society, the summaries create opportunities to scrutinise AI systems.

Finally, this obligation encourages fair competition by preventing large providers from concealing practices such as training on competitors’ models or user data, while giving smaller actors and open-source communities a clearer regulatory framework.
