We examine two real-life cases to demonstrate the importance of keeping personal data safe. In the first case, a girl under the age of thirteen witnessed her mother's murder, and her father was responsible for this horrible crime. Because her personal data might be disclosed in such a delicate matter, there are serious concerns about the child's safety and privacy. The second case involves a robbery and murder at a jewelry shop. It was a very serious crime, and a picture of a suspect was mistakenly released to the public. This individual had previously been involved in another case, but he had since completely turned his life around: he was working a stable job and actively pursuing a college education. The leaked information damaged his reputation and made people suspicious of him. Situations such as these highlight the significance of strong privacy laws. They remind us that people may suffer lifelong harm if their personal data is disclosed, and they underscore the importance of secure and confidential handling of personal data, especially in the era of AI-driven technology.
In my opinion, we must guarantee that data subjects can feel secure in the knowledge that their information is used only for the purposes to which they have consented. Which systems have well-defined, consent-based data retention policies? Which systems enable data subjects to request information about how their data is used? Which systems let data subjects opt in or opt out of consent?
Most systems use a technique known as role-based access control (RBAC) to restrict who can access specific resources depending on the user's role. Ever wonder what happens to data subjects who withdraw their consent? They might stop seeing certain features, but that doesn't mean the system stops processing their data. Because consent is not always part of the system's initial design, others may still be able to access their data.
Therefore, this is a significant gap. How can we ensure that lawyers understand computing and computer experts understand the law?
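The gap can be made concrete with a small sketch. Below is a minimal, hypothetical Python example of RBAC in which the access decision depends only on the user's role; unless a consent check is added explicitly, a data subject's opt-out changes nothing. All names and data are illustrative, not taken from any real system:

```python
# Minimal RBAC sketch: permissions are granted per role, not per data subject.
ROLE_PERMISSIONS = {
    "analyst": {"read_profile", "read_history"},
    "admin": {"read_profile", "read_history", "delete_profile"},
}

# Illustrative consent registry: which data subjects still allow processing.
consent = {"alice": True, "bob": False}  # bob has opted out

def rbac_allows(role: str, action: str) -> bool:
    """Classic RBAC: the decision depends only on the role."""
    return action in ROLE_PERMISSIONS.get(role, set())

def consent_aware_allows(role: str, action: str, subject: str) -> bool:
    """RBAC extended with an explicit consent check (privacy by design)."""
    return rbac_allows(role, action) and consent.get(subject, False)

# Plain RBAC still lets an analyst read Bob's data after he opted out:
print(rbac_allows("analyst", "read_profile"))                  # True
print(consent_aware_allows("analyst", "read_profile", "bob"))  # False
```

The point of the sketch is that the consent check lives outside the RBAC decision: if developers forget to call it, the system silently keeps processing opted-out data.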

AI's Role in the Modern World
AI is used in many areas, including:
- Smart Devices: AI powers voice recognition and personalization in gadgets (Figure 2).

- Healthcare: AI helps diagnose diseases, suggests treatment, spots abnormalities in X-rays (Figure 3), assists in surgeries, and generates new compounds for drug discovery.

- Transportation: AI helps to improve self-driving cars' safety (Figure 4), decrease energy consumption and CO2 emissions, and optimize traffic flow in traffic systems.

- Business: Companies adopt AI to improve their business operations, like using chatbots to handle frequent customer questions (Figure 5), analyzing customer behavior to tailor product recommendations, and optimizing delivery processes.

- Entertainment: Streaming services use AI for personalized movie and music recommendations based on customer histories (Figure 6). Additionally, it allows customers to customize their interests, which further refines the AI's recommendations.

- Education: AI provides adaptive learning analysis and customizes learning materials based on learner performance in real time, ensuring that no learner gets stuck on a topic (Figure 7).

Enhancing AI Systems with Personal Data
Personal data can help reduce bias in AI models for several reasons:
- Diversity and inclusivity: Using data from diverse and inclusive population groups helps avoid bias arising from limited or specific training data.
- Identification and mitigation of bias: Training AI models with diverse personal data helps developers identify and address any biases, such as when an AI system yields less accurate or unfair outcomes for certain groups.
- Enhancing predictive accuracy: Personal data helps AI make more accurate predictions, such as in healthcare, leading to better diagnoses and treatments based on specific, real-world information.
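The bias-identification point above can be illustrated concretely. The hypothetical sketch below compares a model's accuracy per population group, the kind of disparity check developers run on diverse evaluation data (the records and group labels are made up for illustration):

```python
# Illustrative model predictions with a group attribute per record.
records = [
    {"group": "A", "label": 1, "pred": 1},
    {"group": "A", "label": 0, "pred": 0},
    {"group": "A", "label": 1, "pred": 1},
    {"group": "B", "label": 1, "pred": 0},
    {"group": "B", "label": 0, "pred": 0},
    {"group": "B", "label": 1, "pred": 0},
]

def accuracy(rows):
    """Fraction of rows where the prediction matches the true label."""
    return sum(r["label"] == r["pred"] for r in rows) / len(rows)

def per_group_accuracy(rows):
    """Accuracy broken down by group, to surface disparities."""
    groups = {}
    for r in rows:
        groups.setdefault(r["group"], []).append(r)
    return {g: accuracy(rs) for g, rs in groups.items()}

print(accuracy(records))
print(per_group_accuracy(records))  # group B fares far worse than group A
```

A large gap between groups (here, group B's accuracy is a third of group A's) is exactly the "less accurate or unfair outcomes for certain groups" signal the bullet describes; without diverse data covering both groups, the gap would be invisible.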
AI Model Bias and Data Privacy Law Conflicts
Recent news reveals the conflict between AI bias and privacy laws:
Challenges in complying with privacy laws
Large language models (LLMs) face unique challenges with the 'right to be forgotten,' as specific data points cannot be easily isolated and removed, which complicates the erasure of individual data. Existing solutions:
- Machine Unlearning: A process of removing specific data from a machine learning (ML) model, aimed at preserving privacy without completely retraining the model [3], as shown in Figures 8 and 9.


- Making AI Forget You: Data Deletion in Machine Learning: This study introduces two new k-means clustering algorithms that more efficiently remove specific data from machine learning models without sacrificing the quality of the results [4].
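To give a flavor of how such approaches work, here is a heavily simplified sketch of the sharded-training idea behind machine unlearning [3]: sub-models are trained on disjoint shards, so deleting a data point only requires retraining the one shard that contained it, not the whole model. The "model" here is a trivial mean predictor, purely for illustration:

```python
def train(shard):
    """Toy sub-model: predict the mean of the shard's values."""
    return sum(shard) / len(shard) if shard else 0.0

def build(data, n_shards):
    """Split the data into disjoint shards and train one sub-model each."""
    shards = [data[i::n_shards] for i in range(n_shards)]
    models = [train(s) for s in shards]
    return shards, models

def predict(models):
    """Aggregate the sub-models (here: average their outputs)."""
    return sum(models) / len(models)

def forget(shards, models, value):
    """Remove one data point and retrain only the shard that held it."""
    for i, shard in enumerate(shards):
        if value in shard:
            shard.remove(value)
            models[i] = train(shard)  # only this one shard is retrained
            break
    return models

shards, models = build([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], n_shards=2)
print(predict(models))            # 3.5
models = forget(shards, models, 6.0)
print(predict(models))            # 3.0 -- 6.0 no longer influences the model
```

Real systems shard the training data of neural networks rather than a mean predictor, but the cost argument is the same: an erasure request touches one shard instead of the full training run.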
Balancing privacy rules and AI development
Julie Babayan from Adobe noted that Congress might limit researchers in developing unbiased tools, as these require diverse, inclusive population groups [5].
Legal issues in data collection for AI training
Lawsuits have shown the risks of AI training with personal data. For instance, IBM used public images for an unbiased dataset, while Clearview AI formed a facial recognition database, both without obtaining consent [6].
Impact of data privacy regulations
The Cambridge Analytica-Facebook scandal and similar breaches led to stricter laws like the California Consumer Privacy Act (CCPA) and the General Data Protection Regulation (GDPR), affecting how organizations use personal data in AI training while respecting privacy [7].
The Growth in Personal Data Collection
In 2021, 70% of businesses reported an increase in data collection, while 86% of the public grew concerned about data privacy, and 68% were troubled by companies' data practices [8].
The need for accurate AI predictions and personalized experiences drives the increased collection of large datasets. Therefore, organizations using AI must balance their need for personal data and compliance with privacy regulations.

Principles of Ethical AI
Ethical AI principles ensure that AI technologies behave responsibly and benefit everyone fairly. These principles generally include:
- Fairness: AI shouldn't discriminate against certain population groups based on race, gender, or age.
- Transparency: We should be able to understand the internal workings of AI models and which data or factors their decisions are based on.
- Accountability: It's important to clarify who holds responsibility for the decisions AI systems make.
- Privacy: AI systems should protect user privacy throughout data handling and comply with relevant privacy laws without sacrificing their effectiveness.
- Safety and security: AI systems should be reliable and should not cause accidental or unlawful harm, ensuring safety and security for everyone.
- Beneficence: AI should contribute positively to society by solving problems efficiently across various sectors while avoiding negative consequences.
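For a simple enough model, the transparency principle above can be made literal: report which factors drove each decision. Below is a minimal sketch, assuming a hypothetical linear scoring model; the feature names, weights, and threshold are invented for illustration, not drawn from any real system:

```python
# Hypothetical linear credit-scoring model with human-readable explanations.
WEIGHTS = {"income": 0.5, "debt": -0.8, "years_employed": 0.3}  # assumed weights
THRESHOLD = 1.0

def score(applicant):
    """Score an applicant; per-feature contributions keep it inspectable."""
    contributions = {f: WEIGHTS[f] * applicant[f] for f in WEIGHTS}
    return sum(contributions.values()), contributions

def explain(applicant):
    """Return the decision plus the factors ranked by absolute impact."""
    total, contributions = score(applicant)
    decision = "approve" if total >= THRESHOLD else "decline"
    ranked = sorted(contributions.items(), key=lambda kv: -abs(kv[1]))
    return decision, ranked

decision, ranked = explain({"income": 3.0, "debt": 1.0, "years_employed": 2.0})
print(decision, ranked)  # the top-ranked factor is the main driver
```

Linear models admit this kind of exact attribution for free; for deep models, the same goal motivates approximate explanation techniques, which is why many AI systems are criticized as opaque.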
Relevant Studies
Here are studies on AI's ethical data use across sectors:
Healthcare AI for Disease Prediction [9]
- Improved accuracy: AI improves disease prediction accuracy by processing large datasets.
- Early detection: AI can detect illnesses in their early stages, making treatment more effective.
- Personalized healthcare: AI can customize treatments to suit each patient based on their health profile.
- Efficiency: AI speeds up analysis, which saves time and resources.
- Data integration: AI improves disease prediction by analyzing different kinds of health data.
AI in Education for Personalized Learning [10]
- Customized learning paths: AI can personalize education by analyzing each learner's experiences and needs.
- Adaptive learning: AI helps adapt the pace of learning to each learner.
- Immediate feedback: AI provides instant feedback to learners, promoting quicker understanding.
- Accessibility: AI makes education more accessible, offering personalized learning opportunities regardless of location or resources.
- Efficiency in learning: AI simplifies the learning process by efficiently targeting areas for improvement.
AI in Financial Services for Fraud Detection [11]
- Enhanced detection capabilities: AI can analyze large amounts of transactions to recognize anomalous activities.
- Real-time analysis: AI enables real-time monitoring to detect and respond to fraud.
- Reduced false positives: AI distinguishes more accurately between legitimate and fraudulent transactions, which minimizes false positives.
- Adaptive learning: AI continually improves its ability to detect complex fraud schemes.
- Cost efficiency: Automating fraud detection with AI reduces operational costs.
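The anomaly-detection idea in the first bullet can be illustrated with a statistical baseline far simpler than what production fraud systems use: flag transactions that deviate strongly from a customer's historical spending. The data and threshold below are illustrative only:

```python
import statistics

def flag_anomalies(history, new_transactions, z_threshold=3.0):
    """Flag transactions more than z_threshold standard deviations from
    the customer's historical mean (a toy stand-in for the learned
    models real fraud-detection systems use)."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    return [t for t in new_transactions if abs(t - mean) / stdev > z_threshold]

# Illustrative customer history: small, regular purchases.
history = [20.0, 25.0, 22.0, 24.0, 21.0, 23.0]
print(flag_anomalies(history, [24.0, 500.0, 19.0]))  # [500.0]
```

Tightening `z_threshold` catches more fraud but also flags more honest transactions, which is exactly the false-positive trade-off the bullets describe; learned models improve on this baseline by using many features beyond the amount.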
Common Disadvantages
Typical drawbacks are:
- Bias and fairness: AI might amplify existing biases, causing unfair outcomes.
- Transparency and explainability: Many AI systems lack transparency, which makes their predictions hard to interpret and trust.
- Privacy concerns: Using special categories of personal data in AI makes individuals concerned about their data privacy and security.
- Responsibility and accountability: Someone must be accountable for every decision made by AI, especially when errors occur.
- Ethical implications: Depending too much on AI for decision-making can erode individuals' own decision-making power.

Privacy Issues
Existing issues are:
Unauthorized Surveillance
Privacy violations can corrupt public trust in institutions. For instance, the NSA's surveillance practices revealed by Edward Snowden in 2013 significantly impacted public trust in government agencies (Figure 12).

Data Misuse
This occurs when personal information is collected for one purpose but used for another, often without the knowledge or consent of the individual. For example, someone gives their email to a bookstore to get newsletters. Later, they learn the bookstore gave their email to other companies for ads without telling them or asking if it's okay.
Lack of User Consent
- Sephora's CCPA Violation in California: Sephora had to pay $1.2 million in penalties, inform California customers that it sells their personal data, and honor their requests to opt out (Figure 13).

- TikTok's Child Privacy Issues in the UK: TikTok may have to pay $45.1 million in penalties for failing to safeguard children's privacy in the United Kingdom (UK), as shown in Figure 14.

How do organizations handle individuals' personal data?
How do we know that, behind the scenes, they manage personal data appropriately within the given consent?

General Data Protection Regulation (GDPR)
Important requirements of the GDPR include: 1) Personal data can only be processed with the data subject's consent, 2) Data subjects can withdraw their consent at any time, and 3) Data subjects have the right to access and transfer their data.
The GDPR recognizes the importance of the 'Privacy by Design' (PbD) concept in protecting personal data, but it does not specify details for practical implementation, as shown in Figure 16.
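The three GDPR requirements listed above can be sketched in code. Below is a minimal, hypothetical data store that enforces consent before processing, allows withdrawal at any time, and supports access and portability via a machine-readable export; all names are illustrative, and a real system would additionally need authentication, audit logging, and more:

```python
import json

class PersonalDataStore:
    """Toy store enforcing three GDPR requirements: consent before
    processing, withdrawal at any time, and access/portability."""

    def __init__(self):
        self._data = {}     # subject -> list of personal data records
        self._consent = {}  # subject -> consent flag

    def give_consent(self, subject):
        self._consent[subject] = True

    def withdraw_consent(self, subject):      # requirement 2: withdrawal
        self._consent[subject] = False

    def process(self, subject, record):       # requirement 1: consent first
        if not self._consent.get(subject, False):
            raise PermissionError("no valid consent for " + subject)
        self._data.setdefault(subject, []).append(record)

    def export(self, subject):                # requirement 3: access/portability
        return json.dumps(self._data.get(subject, []))

store = PersonalDataStore()
store.give_consent("alice")
store.process("alice", {"email": "alice@example.com"})
print(store.export("alice"))
store.withdraw_consent("alice")
try:
    store.process("alice", {"phone": "555-0100"})  # now refused
except PermissionError as e:
    print(e)
```

The design choice worth noting is that the consent check sits inside `process` itself, so it cannot be bypassed by forgetting a call site, which is the "Privacy by Design" spirit the GDPR endorses without prescribing.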

Principles of PbD (Ann Cavoukian, Ph.D.)
Most system developers refer to the seven basic principles of PbD as guidelines for engineering system design and development involving personal data. The core idea is that privacy protection must be an integral part of the system, and adding it should not reduce the system's effectiveness. This ensures that processing personal data does not violate privacy.
PbD is used to control the entire engineering design process (Figure 16), which comprises seven fundamentals:
- Proactive not Reactive
- Privacy as the Default
- Privacy Embedded into Design
- Full Functionality
- End-to-End Security
- Visibility and Transparency
- Respect for User Privacy

Consent Management (CM)
Consent management is an essential part of software systems that adopt the PbD concept. It handles permissions and controls access to personal data based on the data subject's consent (Figure 17).

The Consent Lifecycle in CM:
Kurteva et al. [12] reviewed the management of the consent lifecycle, analyzing and summarizing findings from various research works that implement consent management using ontology methods (Figure 18).
This body of research defines classes and attributes related to consent management. The consent lifecycle includes four main stages:
- Consent manipulation: Creating, updating, and revoking consent.
- Consent validation: Verifying that the consent is still valid and not revoked before processing personal data.
- Consent notification: Informing data subjects about the purpose and details of the consent request.
- Consent decision-making: The decision-making process of the data owners, whether to allow or deny the processing of their personal data.
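The four stages above can be sketched as a small consent record, a hypothetical simplification of the ontology-based designs surveyed in [12]; class and method names are invented for illustration:

```python
from datetime import datetime, timedelta

class ConsentRecord:
    """Toy consent record walking through the four lifecycle stages."""

    def __init__(self, subject, purpose, valid_days=365):
        self.subject = subject
        self.purpose = purpose
        self.granted = None  # None = pending, True/False = decided
        self.expires = datetime.now() + timedelta(days=valid_days)

    def notify(self):
        # Consent notification: inform the subject of the request's purpose.
        return f"{self.subject}: your data will be used for '{self.purpose}'. Allow?"

    def decide(self, allow):
        # Consent decision-making: the data subject allows or denies.
        self.granted = allow

    def revoke(self):
        # Consent manipulation: revocation is one manipulation operation.
        self.granted = False

    def is_valid(self):
        # Consent validation: granted and not expired, checked before processing.
        return self.granted is True and datetime.now() < self.expires

consent = ConsentRecord("alice", "personalized recommendations")
print(consent.notify())
consent.decide(allow=True)
print(consent.is_valid())  # True
consent.revoke()
print(consent.is_valid())  # False
```

Validation is deliberately re-checked at processing time rather than cached: a consent that was valid yesterday may have been revoked or expired since, which is why the lifecycle treats validation as its own stage.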

The Competency Questions for CM extended from [12]
Figure 20 shows an example (there are more), consisting of a set of questions from Kurteva et al. that check which aspects of a consent management system cover the requirements of the GDPR. It includes questions about consent, personal data, data subjects, data controllers, data processors, and third parties. Additionally, we have added questions 7, 8, 14, 16, 20, and 21 to cover further GDPR requirements. If you are interested in how to develop consent management in centralized and distributed systems, you can check out my two research papers and my thesis:
- N. Peyrone and D. Wichadakul, "Formal models for consent-based privacy," J. Logical and Algebraic Methods in Programming, vol. 128, p. 100789, 2022, doi: 10.1016/j.jlamp.2022.100789.
- N. Peyrone and D. Wichadakul, "A formal model for blockchain-based consent management in data sharing," J. Logical and Algebraic Methods in Programming, vol. 134, p. 100886, 2023, doi: 10.1016/j.jlamp.2023.100886.
- N. Peyrone, "Formal models for consent management in healthcare software system development," Chulalongkorn University Theses and Dissertations (Chula ETD), no. 5803, 2022. [Online]. Available: https://digital.car.chula.ac.th/chulaetd/5803

Discussion
- What personal data are you comfortable sharing with AI systems?
- How should AI balance privacy and convenience?
Conclusion
While AI offers immense potential through personal data utilization, it raises significant ethical concerns. Balancing innovation with privacy, fairness, and security is crucial. Ethical frameworks and regulations are essential to ensure AI respects individual rights and societal values. As we advance technologically, maintaining ethical integrity in AI development is key to its positive and responsible integration into society.
References
[1] S. Milivojša, S. Ivanović, T. Erić, M. Antić, and N. Smiljković, "Implementation of voice control interface for smart home automation system," in Proc. 2017 IEEE 7th International Conference on Consumer Electronics — Berlin (ICCE-Berlin), Berlin, Germany, 2017, pp. 263–264, doi: 10.1109/ICCE-Berlin.2017.8210646.
[2] N. H. Nguyen, H. Q. Nguyen, N. T. Nguyen, T. V. Nguyen, H. H. Pham, and T. N.-M. Nguyen, "Deployment and validation of an AI system for detecting abnormal chest radiographs in clinical settings," Front. Digit. Health, vol. 4, p. 890759, 2022, doi: 10.3389/fdgth.2022.890759.
[3] L. Bourtoule et al., "Machine Unlearning," arXiv preprint arXiv:2002.09564, 2020.
[4] A. Ginart, M. Y. Guan, G. Valiant, and J. Zou, "Making AI Forget You: Data Deletion in Machine Learning," arXiv preprint arXiv:1912.03817, 2019.
[5] B. Bordelon, "Could Congress fix AI bias with privacy rules?" Morning Tech, POLITICO, Mar. 29, 2022. [Online]. Available: https://www.politico.com/newsletters/morning-tech/2022/03/29/could-congress-fix-ai-bias-with-privacy-rules-00021193. [Accessed: Jan. 23, 2024].
[6] L. Johnston, "Recent Cases Highlight Growing Conflict Between AI and Data Privacy," Haynes and Boone, LLP, Apr. 20, 2020. [Online]. Available: https://www.haynesboone.com/news/publications/recent-cases-highlight-growing-conflict-between-ai-and-data-privacy. [Accessed: Jan. 23, 2024].
[7] R. Schmelzer, "Clashes between AI and data privacy affect model training," TechTarget, Sep. 23, 2019. [Online]. Available: https://www.techtarget.com/searchenterpriseai/feature/Clashes-between-AI-and-data-privacy-affect-model-training. [Accessed: Jan. 23, 2024].
[8] KPMG, "Privacy Concerns Rise as Businesses Report Increased Personal Data Collection," KPMG. [Online]. Available: https://info.kpmg.us/news-perspectives/technology-innovation/data-privacy-survey.html. [Accessed: Jan. 23, 2024].
[9] T. Grote and P. Berens, "On the ethics of algorithmic decision-making in healthcare," J. Med. Ethics, vol. 46, no. 3, pp. 205–211, Mar. 2020, doi: 10.1136/medethics-2019-105586.
[10] V. Dignum, "The role and challenges of education for responsible AI," London Review of Education, vol. 19, no. 1, 2021, doi: 10.14324/LRE.19.1.01.
[11] C. G. B. de Oliveira and E. E. S. Ruiz, "Why Talking about ethics is not enough: a proposal for Fintech's AI ethics," arXiv preprint arXiv:2106.06134, 2021.
[12] A. Kurteva, T. R. Chhetri, H. J. Pandit, and A. Fensel, "Consent through the lens of semantics: state of the art survey and best practices," Semantic Web — Interoperability, Usability, Applicability, IOS Press, in press.