**Target:** Proposal for a regulation — Article 10 — paragraph 3

## Text proposed by the Commission

3. Training, validation and testing data sets shall be relevant, representative, free of errors and complete. They shall have the appropriate statistical properties, including, where applicable, as regards the persons or groups of persons on which the high-risk AI system is intended to be used. These characteristics of the data sets may be met at the level of individual data sets or a combination thereof.

## Amendment of the European Parliament

3. Training datasets, and where they are used , validation and testing datasets, including the labels, shall be relevant, sufficiently representative , appropriately vetted for errors and be as complete as possible in view of the intended purpose . They shall have the appropriate statistical properties, including, where applicable, as regards the persons or groups of persons in relation to whom the high-risk AI system is intended to be used. These characteristics of the datasets shall be met at the level of individual datasets or a combination thereof.