Requirements should apply to high-risk AI systems as regards risk management, the quality and relevance of data sets used, technical documentation and record-keeping, transparency and the provision of information to deployers, human oversight, and robustness, accuracy and cybersecurity. Those requir…
High-quality data and access to high-quality data play a vital role in providing structure and in ensuring the performance of many AI systems, especially when techniques involving the training of models are used, with a view to ensuring that the high-risk AI system performs as intended and safely and…
For the development and assessment of high-risk AI systems, certain actors, such as providers, notified bodies and other relevant entities, such as European Digital Innovation Hubs, testing and experimentation facilities and researchers, should be able to access and use high-quality data sets within the…
The right to privacy and to protection of personal data must be guaranteed throughout the entire lifecycle of the AI system. In this regard, the principles of data minimisation and data protection by design and by default, as set out in Union data protection law, are applicable when personal data ar…
In order to protect the right of others from the discrimination that might result from the bias in AI systems, the providers should, exceptionally, to the extent that it is strictly necessary for the purpose of ensuring bias detection and correction in relation to the high-risk AI systems, subject t…
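The safeguards contemplated in this recital are legal requirements, not technical prescriptions. As a purely illustrative sketch of one privacy-preserving measure commonly named in Union data protection law, pseudonymisation of direct identifiers via keyed hashing might look like the following (the key name and record fields are hypothetical; nothing in the Regulation mandates this particular design):

```python
import hmac
import hashlib

# Illustrative pseudonymisation: direct identifiers are replaced with keyed
# hashes (HMAC-SHA-256), so records remain linkable for bias detection and
# correction without exposing identities. The key must be stored separately
# from the data set (e.g. in a secrets manager), otherwise re-identification
# is trivial.

SECRET_KEY = b"replace-with-a-key-held-outside-the-dataset"

def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable pseudonym for an identifier using HMAC-SHA-256."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

record = {"name": "Jane Doe", "postcode": "1000", "outcome": "approved"}
pseudonymised = {**record, "name": pseudonymise(record["name"])}
```

The same identifier always maps to the same pseudonym under a given key, which preserves linkability across records; rotating or destroying the key severs that link.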
1. High-risk AI systems which make use of techniques involving the training of models with data shall be developed on the basis of training, validation and testing data sets that meet the quality criteria referred to in paragraphs 2 to 5.

2. Training, validation and testing data sets shall be subject to appropriate data governance and management practices. Those practices shall concern in particular:

   (a) the relevant design choices;
   (b) data collection;
   (c) relevant data preparation processing operations, such as annotation, labelling, cleaning, enrichment and aggregation;
   (d) the formulation of relevant assumptions, notably with respect to the information that the data are supposed to measure and represent;
   (e) a prior assessment of the availability, quantity and suitability of the data sets that are needed;
   (f) examination in view of possible biases;
   (g) the identification of any possible data gaps or shortcomings, and how those gaps and shortcomings can be addressed.

3. Training, validation and testing data sets shall be relevant, representative, free of errors and complete. They shall have the appropriate statistical properties, including, where applicable, as regards the persons or groups of persons on which the high-risk AI system is intended to be used. These characteristics of the data sets may be met at the level of individual data sets or a combination thereof.

4. Training, validation and testing data sets shall take into account, to the extent required by the intended purpose, the characteristics or elements that are particular to the specific geographical, behavioural or functional setting within which the high-risk AI system is intended to be used.

5. To the extent that it is strictly necessary for the purposes of ensuring bias monitoring, detection and correction in relation to the high-risk AI systems, the providers of such systems may process special categories of personal data referred to in Article 9(1) of Regulation (EU) 2016/679, Article 10 of Directive (EU) 2016/680 and Article 10(1) of Regulation (EU) 2018/1725, subject to appropriate safeguards for the fundamental rights and freedoms of natural persons, including technical limitations on the re-use and use of state-of-the-art security and privacy-preserving measures, such as pseudonymisation, or encryption where anonymisation may significantly affect the purpose pursued.

6. Appropriate data governance and management practices shall apply for the development of high-risk AI systems other than those which make use of techniques involving the training of models in order to ensure that those high-risk AI systems comply with paragraph 2.
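Paragraphs 2(f), 2(g) and 3 leave the method of bias examination and gap identification open. A minimal sketch of one such check, counting subgroup representation and missing values in a toy data set (the field names and the representativeness threshold are hypothetical, chosen only for illustration), could look like:

```python
from collections import Counter

# Illustrative examination of a training set: flag subgroups whose share falls
# below an arbitrary threshold (a rough proxy for under-representation) and
# collect records with missing fields (a data gap or shortcoming).

def examine(records, group_field, min_share=0.2):
    counts = Counter(r.get(group_field) for r in records)
    total = sum(counts.values())
    under_represented = {
        group: count / total
        for group, count in counts.items()
        if count / total < min_share
    }
    incomplete = [r for r in records if any(v is None for v in r.values())]
    return under_represented, incomplete

data = [
    {"age_band": "18-30", "label": 1},
    {"age_band": "18-30", "label": 0},
    {"age_band": "18-30", "label": 1},
    {"age_band": "60+", "label": None},  # missing label: a shortcoming to address
]
gaps, missing = examine(data, "age_band", min_share=0.3)
```

Such a check is only a starting point: what counts as "representative" or "appropriate statistical properties" depends on the intended purpose and the persons or groups on which the system will be used, and cannot be reduced to a fixed threshold.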
- Paragraph 1: references new Article 4a(1) for data quality
- Paragraph 5: deleted
- Paragraph 6: references Article 4a(1) for systems not using training techniques
As courts and enforcement authorities produce decisions interpreting this provision, they will appear here.
Commission Guidelines on prohibited artificial intelligence practices established by Regulation (EU) 2024/1689 (AI Act)
- Article 10 — §2.1, p. 7
- Article 10 — §2.8, p. 20
- Article 10 — §8.4, p. 95, fn. 179