France’s data watchdog issues AI guidance on legal basis and web scraping

France’s data protection authority (CNIL) released two key recommendations aimed at AI developers operating within the European Union. The guidance focuses on two areas critical to AI model development: the legal basis for processing personal data under the General Data Protection Regulation (GDPR), and the collection of data through web scraping.

Legitimate interest as a legal basis for AI development

CNIL indicates that legitimate interest is likely the most appropriate legal basis under the GDPR for AI developers, particularly given the practical difficulty of obtaining user consent at scale. Drawing on European case law and regulatory guidance, CNIL outlines a three-part test for relying on legitimate interest when processing personal data to train AI models:

  1. Legitimacy of the interest: The purpose must be legitimate, such as scientific research, public information access, product improvement, or fraud prevention. CNIL confirms that commercial objectives may also be valid.
  2. Necessity of processing: The data processing must be essential to achieving the stated interest.
  3. Balancing test: The impact on individuals must not outweigh the interest pursued. Developers are advised to implement safeguards to mitigate risks to data subjects, such as using anonymised or synthetic data, and offering opt-out options prior to processing.

To support implementation, CNIL provides practical scenarios showing how this test may be applied in real-world AI development contexts.

Guidance on web scraping

In its second recommendation, CNIL addresses the growing use of web scraping in AI training datasets. Rather than imposing a ban, the authority sets out a list of compliance measures that developers must adopt. These include establishing clear data collection criteria, excluding sensitive or prohibited categories of data, and promptly deleting any irrelevant information.

The guidance also expands on how to apply the legitimate interest test in the context of web scraping. Developers are encouraged to maintain a list of excluded websites, respect opt-outs from scraping, and restrict data gathering to publicly accessible information. These steps aim to align data collection practices with the GDPR’s accountability principle.
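
As a rough illustration of the exclusion list and opt-out measures described above, the following Python sketch checks a URL before it is collected. It is not taken from CNIL's recommendation: the domain names, the crawler name, and the helper functions are hypothetical, and treating robots.txt as an opt-out signal is an assumption of this sketch rather than something the guidance summary specifies.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical exclusion list maintained by the developer.
EXCLUDED_DOMAINS = {"example-forum.test", "health-records.test"}

# Hypothetical crawler name used when reading robots.txt rules.
USER_AGENT = "example-ai-crawler"


def is_excluded(url: str) -> bool:
    """Return True if the URL's host is an excluded domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in EXCLUDED_DOMAINS)


def robots_allows(url: str, robots_txt: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if the given robots.txt content permits fetching the URL.

    Assumption: robots.txt is treated here as one possible opt-out signal.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def may_collect(url: str, robots_txt: str) -> bool:
    """Combine both checks: the site is not excluded and has not opted out."""
    return not is_excluded(url) and robots_allows(url, robots_txt)


# Sample robots.txt content, supplied inline so the sketch needs no network access.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for candidate in (
        "https://news.test/articles/1",
        "https://news.test/private/page",
        "https://sub.example-forum.test/thread/1",
    ):
        print(candidate, may_collect(candidate, SAMPLE_ROBOTS))
```

A check of this kind covers only part of what the guidance describes. Whether a page is genuinely publicly accessible, and whether it contains sensitive categories of data, would still need separate review.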

This guidance is part of CNIL’s broader engagement with AI governance as data protection authorities across Europe adapt to rapid advances in machine learning technologies. Developers operating in the EU are advised to closely follow such updates to ensure ongoing compliance.
