Data Ethics, Big Data Ethics and AI Ethics

Over the last decade or so, there has been considerable hype around terms such as ‘data’, ‘big data’ and ‘AI’. It would be hard to be considered serious about technology and business without having addressed these terms in some way. With this hype comes a passion for interrogating the ethical problems attached to them: data ethics, big data ethics and AI ethics. Most organisations and technology experts, to maintain a reasonable level of credibility, must demonstrate their dedication to at least one of these.

But the boundaries between these three fields are often blurred. This is unsurprising, in a sense, because they are interconnected in many ways. However, an understanding of how they diverge and converge would put you in a better position to identify the right ethicists you need. In this article, therefore, I aim to clarify the following:

  • the distinction between data ethics and big data ethics; and
  • where data ethics and AI ethics intersect and come apart.

Data Ethics and Big Data Ethics

Data refers to ‘Information, in any form, on which computer programs operate’[1]. In my article, ‘Taking Data Ownership Seriously’, I made a distinction between the main types of data which you should handle with great care.[2] Data ethics, in Alex McKeown’s (IGS’s Head of Data Ethics) terms, ‘is a broad, umbrella term covering a range of ethical issues relating to different applications of data’[3]. Many organisations nowadays are particularly interested in the following ethical questions about data:

  • Moral Values Applicable to Data Practices: What should be the moral values (e.g. privacy, transparency, fairness, accountability, consent) that guide data practices, and how should we weigh the relative importance of these values in different circumstances?
  • Privacy: What does it take for individuals, organisations and the state to respect and protect the privacy of individuals, who, in most cases, are data subjects (i.e. individuals to whom personal data relates), in the processing of data?
  • Consent: What does it take for individuals, organisations and the state to process the data of individuals with their consent, which must at least be informed and voluntary? In which scenarios should informed and voluntary consent apply, and in what circumstances is it unnecessary?
  • Transparency: What does it take for data practices to be transparent? What initiatives should organisations take for their data practices to be visible, and to whom? What data practices can legitimately remain opaque?
  • Fairness: What biases in data processing are morally objectionable and should thus be avoided? How can we ensure that data is processed in a non-discriminatory way? How should the benefits and costs of data practices be allocated?
  • Accountability: Who should be morally responsible for (the outcomes of) data practices? What measures should there be to hold people committing irresponsible data practices to account?
  • Institutions: What institutions (e.g. law, regulatory bodies) should be established or sustained, by the state(s) or organisations, to maintain people’s compliance with the key ethical values regulating data practices?

People, however, also speak of ‘data ethics’ and ‘big data ethics’ almost interchangeably. But what is ‘big data’? Neil M. Richards (Professor of Law, Washington University in St. Louis) and Jonathan H. King (Fellow, The Cordell Institute) have a succinct summary of how it ought to be understood:

We are on the cusp of a “Big Data” Revolution. Increasingly large datasets are being mined for important predictions and often surprising insights…The scale of the Big Data Revolution is such that all kinds of human activities and decisions are beginning to be influenced by big data predictions, including dating, shopping, medicine, education, voting, law enforcement, terrorism prevention, and cybersecurity. This transformation is comparable to the Industrial Revolution in the ways our pre-big data society will be left radically changed.[4]

Big data, in other words, has at least two interpretations. First, it can be understood as an era in which mass data collection has become a major means by which people make predictions and create insights. In this sense, mass data collection is a feature of modern society. Second, it can refer to the practice of mass data collection itself.

Such a distinction is important as it points to two big questions troubling big data ethicists:

  • Macro: How should we arrange our social, political and economic institutions, given that mass data collection has become a crucial part of human productivity and has tremendously shaped human interactions?
  • Micro: What practices of mass data collection are morally permissible?

These questions in big data ethics specifically, of course, are themselves questions in data ethics in general. Data ethicists investigate questions about institutions, for instance; they also ask questions about morally permissible data practices, which include practices of mass data collection.

But data ethics and big data ethics differ in at least two ways:

  • Data ethics itself does not make assumptions about mass data collection as a crucial feature of modern society. In other words, someone can legitimately be described as a data ethicist without a deep understanding of the far-reaching effects of mass data collection on human society. For example, a data ethicist could be an expert in what makes data practices transparent, but such expertise need not be founded on any substantive knowledge about the social impact of mass data collection.
  • Big data ethics makes more specific demands on the skills and knowledge one possesses. To be a reasonably qualified big data ethicist, someone must demonstrate a sufficient awareness of the key moral considerations that apply to the design of social, economic and political institutions in general. In short, they should be sensitive to wider questions about social, economic and political justice, instead of merely interrogating the specific moral values applicable to data practices.

Many data ethics experts can provide advice on ethical questions in big data at the micro level, but not many of them can demonstrate sufficient expertise to resolve the macro-level question. Such expertise, in particular, is often accumulated through sustained training in the humanities or social sciences. That is why, if your organisation engages in mass data collection, you are advised to consult a team with people from these backgrounds. A key strength of the IGS data ethics team is that we have experts academically trained in politics and philosophy, who have the skills necessary for sound advice on big data ethics.

Data Ethics and AI Ethics

Similar to data ethics and big data ethics, AI ethics and data ethics are intertwined in various ways. To appreciate this, it is necessary to define what ‘AI’ is first. As Elizabeth Rough and Nikki Sutherland pointed out,

AI can broadly be thought of as technologies that enable computers to simulate elements of human intelligence, such as perception, learning and reasoning. To achieve this, AI systems rely upon large data sets from which they can decipher patterns and correlations, thereby enabling the system to ‘learn’ how to anticipate future events. It does this by drawing upon/creating rules – algorithms – based on the dataset, which it can use to interpret new data.[5]

There is considerable controversy over the proper definition of AI, and I do not want this definition to dictate how we should understand it. I start with it for a simple reason: it coheres with the senses in which AI is normally understood in the key AI ethics questions I discuss below.

AI ethicists are broadly interested in questions about how AI should be designed and regulated, and how we should allocate the benefits and risks produced by AI. Meanwhile, data is the foundation of AI: AI learns from and makes decisions based on the datasets it is given. The key moral considerations applicable to responsible data practices, therefore, will have implications for how AI should be designed and regulated, and how we should distribute its risks and benefits. This is also one reason why moral concepts such as accountability, transparency, fairness and privacy have become a focal point of discussion in both AI ethics and data ethics. But this sometimes leads people to think that, as long as someone masters what is practically required by these values over data practices, they can legitimately claim to be AI ethicists.

The story is not that simple. Data ethicists will be able to engage with AI ethics questions to the extent that such questions are fundamentally about data processing. But some questions in AI ethics are less easily answerable by typical data ethicists, who often focus on whether and how data practices depart from the key moral values underpinning responsible data processing (and in most cases, only on the values highlighted by the key data protection instruments, including the GDPR and the DPA). For instance, a data ethicist might struggle to answer these questions about AI:

  • AI and the Existential Risks to Humanity: The fast growth of AI has raised questions about its long-term influence on the prospects of human society. There is, for example, the view that artificial general intelligence (AGI), which refers to a type of AI that can understand, learn and integrate knowledge across various tasks at a level that is on a par with, if not surpassing, that of human beings, will bring humanity to an end. Whether this view makes sense is a problem that has caught the attention of many AI ethicists. But it lies outside the key concerns of most data ethicists in a commercial setting today.
  • Regulation of AI Use and Development: While data is among the principal resources on which AI runs, knowledge of data ethics alone is inadequate to address comprehensively how AI should be used and developed. For example, the existential risks produced by AI would have implications for how we should regulate AI, but whether these alleged existential risks are well-grounded is not a primary question for most data ethics specialists. Some AI ethicists are also interested in the personhood of human-like AI, and what legal and moral status we should confer on it. But an understanding of data ethics alone would not enable one to handle this question adequately.

My claim is not that data ethicists cannot answer these questions about AI. In our field, there are some extremely competent people who are genuinely specialised in both AI ethics and data ethics, although there are many who simply pretend to be for their narrow commercial interests. Specialising in both should also be an aspiration for us, as data ethicists who have an interest in witnessing a better human future created by technological improvements, and in providing our customers with insights into the interactions between AI and data. Yet we ought not to conflate our aspirations with our actual competence.

Conclusion

We have to admit that, although data ethicists have a lot to offer to AI ethics, AI ethics is concerned with a broader set of questions, some of which may be beyond our capacity to answer satisfactorily. As responsible consultants, we should be honest about our exact expertise, so that our services are in line with what our customers expect. Unfortunately, we are in an era where one must at least claim to be aware of ethical issues in AI and data to look professional and maximise commercial gains. In this climate, it has become increasingly difficult for customers to locate the right experts for the ethical challenges facing them.

Here at IGS, however, we are fully aware of what we can reasonably achieve. Since we have data ethics experts academically trained in law, politics and philosophy, we are highly confident in addressing all your (big) data ethics challenges. We will also be able to address your questions about AI ethics, insofar as they involve data processing. But we are also realistic and responsible: we will not pretend that we can handle questions about how AI will shape the future of humanity, for instance. Our priority is always to ensure that our services are the right investment for you.


[1] Andrew Butterfield, Gerard Ekembe Ngondi, and Anne Kerr, “Data,” in A dictionary of computer science (Oxford University Press, 2016).

[2] William Chan, “Taking Data Ownership Seriously,” Information Governance Services. https://www.informationgovernanceservices.com/taking-data-ownership-seriously/

[3] Alex McKeown,

[4] Neil M. Richards and Jonathan H. King, “Big Data Ethics,” Wake Forest L. Rev. 49 (2014).

[5] Elizabeth Rough and Nikki Sutherland, Debate on Artificial Intelligence (House of Commons Library, 2023), 2.
