Collision Course? AI and Data Protection
12th June 2023
We have looked at the regulatory landscape of AI and the implications for intellectual property rights in Parts 1 and 2 of our AI briefing papers. In this part we will discuss what AI could mean for UK and EU GDPR obligations and how businesses can ensure they are compliant with those rules. The use of personal data in AI systems is an area that is actively under review by European data protection authorities. The Italian data protection regulator has temporarily banned OpenAI’s ChatGPT and other European data protection regulators are looking closely at how these types of system can comply with existing data regulations. Is AI on a collision course with data protection rules?
When GDPR came into force in 2018 it had a major impact on the data protection regimes in the EU and the UK. GDPR established itself as arguably the leading piece of data privacy legislation in the world, with other jurisdictions using the regulations as a benchmark for their own rules. Some US states have selected elements of GDPR as the basis for their own privacy legislation.
GDPR was adopted by the UK prior to Brexit and therefore applies essentially in the same way as GDPR does for the EU (when we refer to GDPR in this article we are referring to both the UK and EU versions). Although GDPR pre-dated the dramatic increase in AI we have seen in the past year, the regulations are relevant to the processing of personal data within AI systems and there are also specific provisions relating to automated decision making which apply in certain circumstances.
Is personal data being used?
Personal data may be used in some parts of the development of an AI system and not others. For example, personal data may be used in the training stage of an AI model but removed during implementation. A different data protection analysis may therefore apply to each stage of the AI system lifecycle.
This note assumes that the data used by the AI tool contains personal data at some point, but this should be analysed on a case-by-case and phase-by-phase basis. If personal data is not processed by the AI system or during the implementation of that system then the use of the data will fall outside GDPR. However, it is important to remember that the definition of personal data is very wide under GDPR and includes data that is ‘capable’ of identifying an individual.
Sometimes there is a phase of data conversion when building an AI model. This takes the ‘raw’ data and classifies or simplifies it to allow the training phase to begin. This may involve an element of ‘pseudonymisation’, where some personal identifiers such as contact details are removed. If a controller decides to anonymise the data set it must be genuinely anonymised to fall outside of GDPR. It cannot be the case that the controller retains the ability to re-identify individuals via a data set separate from the anonymised training data. Although pseudonymisation of this kind mitigates the risk of using personal data, the data remains capable of identifying individuals and is therefore still subject to GDPR.
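The distinction between pseudonymisation and genuine anonymisation can be made concrete with a short sketch. This is a hypothetical illustration only; the field names and hashing scheme are assumptions for the example, not a recommended design.

```python
import hashlib

# Hypothetical pseudonymisation step: direct identifiers are replaced
# with a token, but the controller keeps a token-to-identity mapping.
# Because re-identification remains possible via that mapping, the
# output is still personal data under GDPR; only genuine anonymisation
# (no retained key, no realistic re-identification route) takes the
# data out of scope.

def pseudonymise(record, key_store):
    token = hashlib.sha256(record["email"].encode()).hexdigest()[:12]
    key_store[token] = record["email"]  # re-identification key retained
    cleaned = {k: v for k, v in record.items()
               if k not in ("name", "email", "phone")}
    cleaned["subject_token"] = token
    return cleaned

key_store = {}
record = {"name": "A. Smith", "email": "a.smith@example.com",
          "phone": "07700 900000", "purchase_total": 42.0}
print(pseudonymise(record, key_store))
```

The cleaned record contains no contact details, yet the retained `key_store` means the controller can still link the token back to an individual, which is why the data set remains within GDPR.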
Key data protection principles and AI
GDPR sets out a number of key principles which data controllers are accountable for: lawfulness, fairness and transparency, purpose limitation, data minimisation, accuracy, storage limitation, integrity and confidentiality (i.e. security). There are risks that AI’s capabilities could cause data controllers to fail to follow some of these principles.
Assuming that a controller is processing personal data within its AI tool (or contracting with a data processor to use an AI tool on its behalf) then the controller should assess whether the use of AI is in line with the data protection principles. This will apply to underlying training data used by the AI, or any personal data used in the direct operation of the AI tool by the controller or its processors (e.g. question setting or feedback).
Lawfulness, fairness and transparency
Under the principle of lawfulness, the controller must have a lawful basis to process the personal data; these bases include consent, the legitimate interests of the controller, and processing necessary for the performance of a contract. The controller needs to consider the lawful basis on which it is processing the data as part of the AI tool, as this will impact decision making throughout the use of the data.
The controller must be clear, open and honest with people about the way in which their personal data is being used. As AI tools have emerged and become more widely available, privacy policies may need to be updated to account for this.
We cover the concept of fairness in more detail below, covering the ICO guidance on this.
AI has a habit of creating unintended outcomes. It is possible that without clear parameters built into the AI tool a data controller could use personal data in a manner that is not consistent with its privacy notices.
There is a similar issue with the purpose limitation. Personal data cannot be collected for one stated purpose and then further processed in a manner that is incompatible with the stated purpose (with some limited exceptions). If AI is used as a tool to achieve the original purpose that is one thing, but a controller cannot use personal data within the AI tool for a different purpose (even if the AI tool comes up with a great new idea). For example, training data parameters should be matched to the purpose for which any personal data was collected.
The large language models and other generative AI tools that have received so much publicity recently are built around large data sets against which the AI tool can establish patterns and generate output. There is also a benefit in adding to and refining the data set from the outputs that the AI tool makes. There is therefore an inherent advantage in retaining large data sets, which is at odds with the principle of data minimisation for personal data.
Controllers would need to have processes in place to review and purge personal data from the data sets used by their AI tools that go beyond what is necessary for the purpose for which that personal data was collected. The ICO has given guidance on how to apply data minimisation principles in the context of AI, which is summarised below.
A data controller must ensure that the personal data that they process is accurate and, where necessary, kept up to date. GDPR requires that “every reasonable step” is taken to ensure that inaccurate personal data is erased or rectified without delay. It is known that AI tools can “hallucinate” and give inaccurate responses. We are starting to see cases where AI output has created defamatory statements about individuals, which would certainly breach the accuracy principle under data protection law. Initially it seems very difficult for a controller to police an AI system to ensure that the personal data being processed is accurate, although perhaps processes could be put in place to check any output references to individuals.
Controllers can more easily have in place procedures to take reasonable steps to erase or rectify inaccurate data without delay, but may need to ensure access to underlying training data and/or require that its data processors provide this access and/or assistance where necessary.
The ICO guidance on accuracy seems somewhat AI friendly, acknowledging that AI outputs do not need to be 100% statistically accurate.
GDPR requires that data is kept in a form that identifies data subjects for no longer than is necessary for the purposes for which the personal data was processed (with some limited exceptions). Therefore AI tools should not retain personal data indefinitely and controllers would need to ensure that any personal data used by their AI tools are reviewed and purged accordingly.
Integrity and Confidentiality
Appropriate security measures should be taken in respect of personal data, including protection against “unauthorised or unlawful processing and against accidental loss, destruction or damage, using appropriate technical or organisational measures”. AI is therefore no different to any other form of IT system when processing personal data. Both controllers and processors are responsible for ensuring the appropriate level of security when processing personal data via AI tools.
Rights of Data Subjects
There are a number of enforceable rights given to data subjects under GDPR: the right to be informed, right of access, right to rectification, right to erasure, right to restrict processing, right to data portability, right to object and, importantly, rights related to automated decision making including profiling.
The data subject rights are derived from the data protection principles and therefore pose similar issues to those discussed above, for example there is overlap between the principle of transparency and the right to be informed so we will not cover these again. However, there are some specific rights that are worth considering.
Right of access
The data subject has the right to obtain from the controller, confirmation as to whether or not personal data concerning him or her is being processed, and, where that is the case, access to the personal data and certain other information. This includes the recipients or categories of recipients of the data, the expected storage period as well as, if applicable, the automated decision making details referred to below. Therefore, if an AI tool is processing personal data an individual data subject is entitled to obtain a copy of the relevant data and the controller should be prepared to provide this information. The use of AI tools that include the processing of personal data need to account for this access but where large uncontrolled training data sets are used, this may be difficult (although this may become less of an issue in the operational part of the AI lifecycle).
Right to erasure
The “right to be forgotten” is a well-known part of GDPR, giving data subjects the right, in certain circumstances, to obtain from the controller the erasure of his or her personal data without undue delay. The right to request erasure depends to some extent on the lawful basis that the personal data is being processed. However, assuming that the relevant circumstances apply then the data controller will be required to erase all personal data of that data subject. The definition of personal data under GDPR is very wide and therefore the obligation to trawl through the AI data set to remove all of this information is potentially onerous.
We cover the ICO guidance on rights of access and rights of erasure in AI systems later in this note.
Automated Decision Making
GDPR gives specific rights to data subjects in respect of decisions that are based solely on automated decision making. This has direct applicability to AI systems that process personal data and are designed to make decisions on behalf of organisations without human intervention (see below for the guidance on the line between automated and partially automated systems).
Under Article 13.2(f) the controller must provide the data subject with information about the existence of automated decision-making, and meaningful information about “the logic involved, as well as the significance and the envisaged consequences of such processing for the data subject”.
Under Article 22, there is a more restrictive lawful basis for processing personal data in this way. This is because individuals have the right not to be subject to a decision based solely on automated processing, “which produces legal effects concerning him or her or similarly significantly affects him or her”.
However, this restriction does not apply if the decision:
- is necessary for entering into, or performance of, a contract between the data subject and a data controller;
- is authorised by applicable law to which the controller is subject (and which also lays down suitable measures to safeguard the data subject’s rights and freedoms and legitimate interests); or
- is based on the data subject’s explicit consent.
Where the first and third conditions above apply, the controller must implement suitable measures to “safeguard the data subject’s rights and freedoms and legitimate interests”. This means at least: (i) providing the right to obtain human intervention from the controller, (ii) allowing the data subject to express his or her point of view and (iii) allowing the data subject to contest the decision.
In addition, this type of processing activity is considered “high risk” and as a result a Data Protection Impact Assessment is required to be completed beforehand.
There are further protections for automated decision processing with respect to special category data (for example racial or ethnic origin data, genetic data, biometric data, health data or data concerning sexual orientation). In these cases, the conditions permitting automated decisions do not apply unless there is explicit consent from the data subject (a higher bar than “standard” GDPR consent) or the processing is in the substantial public interest.
Therefore any AI tool which processes personal data and automates decisions that have a significant effect, such as employment decisions, needs to be carefully designed. The controller would need to ensure one of the conditions for automated processing is met, and that there are mechanisms for human intervention and appeals. Any AI system that uses special category data such as health information is, in addition, likely to need very clear consents from the individual data subject.
What is clear from considering the extent of data subject rights is that an AI system that includes personal data must be carefully designed to allow the controller to meet data subject requests in exercising these rights. This is likely to need potentially complex processes to track, correct, extract and erase individual personal data within the personal data used by the tool.
Information Commissioner’s Office Guidance
The ICO has provided detailed guidance for best practice in complying with GDPR when using AI. The guidance suggests a risk-based approach, looking at what measures can be put in place to mitigate the risks to individuals. In addition, the ICO has produced an AI risk toolkit to help organisations with practical steps when developing AI systems. This is available at www.ico.org.uk.
Governance and Risk Management Guidance
In order to comply with the accountability principle, the ICO suggests that senior management should be engaged (upskilling where necessary). The tasks and assessments required for managing risk in AI cannot be delegated to data scientists or engineers. The ICO expects companies to be able to demonstrate how they have fulfilled obligations to integrate their data protection obligations into AI using privacy by design and privacy by default (two key principles of GDPR).
A key part of mitigating the issues posed by AI is to carry out a Data Protection Impact Assessment. The ICO considers it very likely that a DPIA is required prior to the use of AI because the type of processing undertaken by AI is often going to pose a high risk to individuals rights. The ICO provides further information on what should be included in a DPIA and how to conduct one on its website.
As a minimum the ICO expects organisations to tell individuals the purposes for processing their personal data; the retention periods for that personal data; and who that organisation will share that data with. Where the data is collected directly from the data subject, this needs to be done at the time of collection (and before it is used for AI data training purposes).
As explained above, there must be a lawful basis for processing the personal data. The ICO suggests that it may be helpful to separate out the research and development phase from the deployment phase. These phases may have different purposes and a different lawful basis. Also, AI systems that are developed for a general purpose may then be deployed for specific purposes, once these specific purposes are known then there needs to be an analysis of the lawful basis.
The ICO guidance also makes the point that at the training model stage no automated decisions are being made; only at the deployment phase will there be decisions that may trigger the automated processing rules.
The ICO goes into some detail about the differences between the requirement for accuracy in a data protection context and the idea of statistical accuracy. The guidance confirms that AI systems do not need to be 100% statistically accurate to comply with GDPR. However, the ICO does expect organisations to record where systems are making statistically informed predictions rather than facts. It also states that where an individual has requested a correction to inaccurate data then additional information should be added to their record ‘countering the incorrect inference’. In summary the ICO guidance advises that:
- adequate training is developed within the organisation to understand the associated statistical accuracy requirements and measures;
- where applicable, data is clearly labelled as inferences and predictions, and not as factual statements;
- common terminology is used to discuss statistical accuracy performance measures, including their limitations and any adverse impact on individuals;
- after deployment, proportionate monitoring should be implemented;
- any claims made by third parties around accuracy of an AI system should be examined as part of the procurement process of AI systems;
- statistical accuracy should be regularly reviewed to guard against changing population data and concept/ model drift; and
- data governance practices and systems should be reviewed to ensure they remain fit for purpose following implementation of AI systems.
We have covered the requirements of lawful basis and transparency above. The guidance goes into some detail about what fairness means in the context of AI.
The key points made by the ICO:
- Organisations should only process personal data in ways that people would “reasonably expect”.
- Personal data should not be used in any way that could have “unjustified adverse effects” on data subjects.
- If the AI system is used to infer data about people, organisations need to ensure that the system is “sufficiently” statistically accurate and avoids discrimination.
- Fairness relates to both how organisations go about the processing and also the outcome or impact of that processing.
- Other legislation may be applicable, for example the Equality Act 2010, depending on how the AI system is used.
The ICO guidance makes some specific points on security:
- Securing training data: large data sets are required for AI systems which may need to be copied and imported, as well as shared with third parties. The ICO advises that data movements should be recorded and documented, with clear audit trails. Intermediate transfer files that are no longer required should be deleted. Privacy measures may need to be added before sharing data with external organisations.
- Relationships with external parties: The guidance notes that a lot of AI development takes place using third party code, much of it being open source. Machine Learning development frameworks often have further external dependencies and organisations implementing the frameworks should take account of any possible additional security vulnerabilities.
- Specific AI security risks are highlighted by the ICO, including: ‘model inversion attacks’ (whereby a hacker infers private data about an individual from the inputs and outputs observed using the AI system), and ‘membership inference attacks’ (where a hacker can deduce whether an individual was part of a training data set).
- Consider the security implications of making the AI output ‘explainable’ so that this does not increase the risk of vulnerability to a privacy attack.
Data Minimisation Guidance
The ICO recognises that the data minimisation principle is challenging for AI systems but that compliance is achievable. The principle is that personal data processed must be ‘adequate, relevant and limited’. A key question is whether the same objective can be achieved by processing less personal data. If the answer is yes then it is unlikely that the organisation is following the data minimisation principle.
The ICO is looking for organisations to implement data minimisation practices and techniques, as well as ensuring that data minimisation is fully considered at the design phase. Where AI systems are supplied by third parties then data minimisation should be part of the procurement and due diligence process.
The ICO guidance notes that data minimisation may need to be balanced against other requirements such as the need to produce appropriately statistically accurate models from larger data sets. So the guidance is that organisations should be looking to achieve statistically accurate models with the least amount of personal data necessary.
Where there are a number of personal data fields in a data set, not all of these may be relevant to the question that the AI system is tasked with answering. There should be an assessment of which of these fields are relevant to the purpose, and those that are not relevant should be excluded. Personal data should not be included in training on the basis that it may be useful in the future, nor should a retrospective justification be established for activities that were outside the original purpose.
The guidance also goes into detail on privacy enhancing methods such as deliberately adding ‘noise’ to data sets, using synthetic, artificially generated data, and federated learning so that there are sub-sets of data that are used to train an AI model and then brought together in a single system later. However, it seems that some of these methods may bring a trade-off between a more robust compliance approach and the ultimate accuracy of the AI system.
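The noise-addition technique mentioned above can be sketched in a few lines. This is a simplified illustration rather than a formally calibrated differential-privacy mechanism; the epsilon parameter, the sensitivity bound and the sample data are all assumptions for the example.

```python
import math
import random

# Illustrative sketch only: Laplace-distributed noise is added to an
# aggregate statistic before it leaves the training environment,
# reducing what any single record reveals at the cost of some
# statistical accuracy -- the compliance/accuracy trade-off noted in
# the guidance.

def laplace_noise(scale):
    # Sample from a Laplace(0, scale) distribution via inverse transform.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_mean(values, epsilon=1.0, value_range=100.0):
    # One record can shift the mean by at most value_range / n (the
    # "sensitivity"); the noise scale is sensitivity / epsilon, so a
    # smaller epsilon means more noise and stronger privacy.
    sensitivity = value_range / len(values)
    true_mean = sum(values) / len(values)
    return true_mean + laplace_noise(sensitivity / epsilon)

ages = [34.0, 29.0, 51.0, 46.0]
print(noisy_mean(ages, epsilon=0.5))  # noisy estimate around the true mean
```

Lowering `epsilon` increases the noise and therefore the privacy protection, while degrading the statistical accuracy of the released figure, which is precisely the trade-off organisations are expected to weigh.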
There is also advice on privacy measures that can be taken at the point the AI system is in operational use. These include: converting data into less human-readable forms, making inferences locally rather than on centralised or cloud servers, and privacy-preserving queries which allow a user to receive the output of the AI system without disclosing all personal data to the party operating the system. It is likely that these techniques will have some effect on the efficiency or scope of use of an AI tool and will therefore involve compromises between privacy and use.
Organisations should also consider data retention periods. If training data does not need to be updated once the system is in use then it should be deleted. If the training data is stated to have a limited period (e.g. 12 months’ worth of data) then older data should be deleted in line with that period.
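A retention sweep of this kind can be expressed very simply. The sketch below is hypothetical; the field names and the 12-month window are assumptions used for illustration.

```python
from datetime import date, timedelta

# Hypothetical retention sweep: training records older than the stated
# retention window (here 12 months) are purged, in line with the
# storage limitation principle. Field names are illustrative.

def purge_expired(records, today, retention_days=365):
    cutoff = today - timedelta(days=retention_days)
    return [r for r in records if r["collected"] >= cutoff]

training_records = [
    {"subject_id": 1, "collected": date(2023, 5, 1)},
    {"subject_id": 2, "collected": date(2022, 1, 1)},
]
print(purge_expired(training_records, today=date(2023, 6, 12)))
```

In practice a sweep like this would run on a schedule, with the retention period taken from the organisation's published privacy notice rather than hard-coded.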
Data Subject Rights Guidance
A data subject may request details of what information is held on him or her within the AI model, including the relevant training data. There is still a requirement on the relevant controller to respond to this in line with the GDPR requirements. Therefore if it is possible to identify which portion of the training data relates to that individual then this must be provided. It is important to note that controllers cannot contract out of these responsibilities by using external suppliers, therefore the controller should carry out due diligence on the supplier’s system to ensure that any such requests can be met and the appropriate assistance obligations are contained in the contract with the processor.
The ICO’s guidance on the right to rectification allows an element of proportionality for controllers. The more important it is for the AI model that the personal data is accurate, the greater the effort it expects the controller to put into checking its accuracy and, if necessary, taking steps to rectify it.
The guidance on the right to erasure is stronger: the ICO anticipates that the erasure of one individual’s personal data from the training data will have little impact on the overall system. Therefore the ICO considers that controllers are unlikely to have a justification for not fulfilling a request to erase personal data from AI training datasets. However, the deletion of the data from a training data set will not require the deletion of the AI model itself, except to the extent it contains that data or can be used to infer it.
There is also a potential for data portability rights to apply to training data, unless the training data is sufficiently amended so that it falls outside the definition of being ‘provided’ by the data subject. This will require the controller to provide the data to the data subject in a ‘structured, commonly used and machine readable format’.
The guidance on the right to be informed contains some useful reminders:
- If data used for training the AI system was initially collected for a different purpose, then the data subjects must be informed, so this is likely to result in a new privacy communication to customers.
- The automated decision notice (where applicable) does not need to be given at the training data stage, this needs to be given when the relevant automated decision is being made, i.e. during operational use of the AI model.
Once personal data is used in the operational stage of the AI model then there are some slightly different implications:
- Where the output data is a prediction rather than a statement of fact, it is not subject to the right to rectification because it is not ‘inaccurate’.
- The output data is out of scope of the data portability rights because it has been subject to further analysis and is not ‘provided to’ the controller by the data subject.
Automated Decision Making Guidance
AI models that are used to make decisions are subject to the additional requirements unless there is ‘meaningful human intervention’ in every decision, i.e. the AI model is simply advising a human decision maker.
For a system to have meaningful human intervention:
- Human reviewers must be involved in checking the system’s recommendation and should not apply the AI recommendation in a routine fashion.
- Reviewers’ involvement must be active and meaningful, not just a token gesture, and they should have the authority within the organisation to go against the recommendation.
- Reviewers must consider all available input data, and also take into account other ‘additional factors’.
Where an AI model falls within the remit of the rules on automated decision making controllers must implement suitable safeguards. A key one of these is the right to obtain human intervention on the decision made by the system. The ICO recognises that the complex data patterns in machine learning systems may make it difficult for a human to understand the process taking place to make the decision. The ICO’s guidance includes the following recommendations:
- Build in the requirements necessary to support a meaningful human review at the system design phase. In particular, effective user-interface design to support human reviews and interventions.
- Design and deliver appropriate training and support for human reviewers.
- Give staff the appropriate authority to address or escalate individuals’ concerns and, if necessary, override the AI system’s decision.
- The process for individuals to exercise their rights should be simple and user friendly. For example, including a contact link on a website that communicates the decision.
- Records must be kept of all decisions made by an AI system, including whether an individual requested human intervention, expressed any views, contested the decision, and whether the decision was changed as a result. The data should be monitored and analysed to improve the AI system.
The ICO is concerned that AI systems will create “automation bias”, where a human reviewer goes along with the AI recommendation where it should have been challenged. As well as training reviewers appropriately, the ICO recommends that they must have access to additional information beyond the AI data set (for example by talking to the individual involved).
There is also a concern about the ‘interpretability’ of AI outputs. If a human reviewer is unable to predict how the system’s output will be affected by different inputs, identify important inputs, or identify when the output might be wrong, then this points to low interpretability. This could cause issues with providing appropriate safeguards on human review of automated decisions.
The ICO suggests that in some cases, organisations could address this via local explanations of discrete parts of the AI model rather than the model as a whole. Confidence scores could also be provided with AI outputs to assist human reviewers in understanding and interpreting the outcomes. These technical solutions do not replace the need for interpretability to be built into the AI model at the design phase or proper risk management and training procedures for staff working with AI models.
The existing GDPR rules will apply to AI models that process personal data in some way. The additional GDPR rules relating to automated decision making could apply to certain AI models. In this case, organisations will need to ensure they have met the additional transparency and process requirements for automated decision making set out in GDPR.
There are some inherent challenges that AI creates for GDPR compliance, such as the use of large data sets for unspecified purposes, and the difficulty of tracking and correcting individual personal data. An AI system that processes personal data but has not been designed with GDPR in mind is likely to be on a collision course with the regulator. The ICO is giving clear guidance that AI systems processing personal data can be compliant, but to do this they must be designed from the ground up to enable those responsible for the AI model to secure personal data, minimise personal data use, meet data subject rights and, where applicable, comply with the rules on automated decisions.
For more information on the issues raised in this note or for any of your IT or data legal issues please get in touch with us:
Codified Legal
7 Stratford Place