On the 22nd and 23rd of November 2019, I attended the 2nd Transatlantic Conference on Data & Ethics.1
The main goal of the conference was to bring ethicists, computer scientists and economists together to talk about the moral problems arising from the recent proliferation of data-driven technologies and research methods. Topics ranged from big data and privacy to value-based engineering. Rather than summarizing the different contributions at the conference, I want to use this blog post to briefly reflect on what we mean when we speak about data and on the connection between data and ethics. Although the term data is ever-present in research and everyday life, definitions are often vague. It is not always necessary to give a precise definition of the words we use, but sometimes conflicting uses of a word can lead to problems in communication.
What do we mean when we speak about data? The philosopher of information Luciano Floridi distinguishes three different meanings of the term.2 The first definition relates data to evidence, while the second relates it to computing machines in general. The third definition explains the meaning of data in terms of information: data in this sense describes everything that provides information about a specific subject. Here I will only discuss the first two definitions, which in my experience are the ones most often used in research.
The first definition of data I want to introduce identifies data with a kind of information that functions as evidence. Evidence includes everything that can be used to confirm or disconfirm a hypothesis. In this sense, X-ray pictures count as evidence for a broken arm, the blood under my fingernails is evidence for my fight with John, and the clock I see on my computer display is evidence for the current time. When scientists colloquially talk about data, they often have this meaning in mind.
Sabina Leonelli gives a very similar definition. For her, data includes
“[…] any product of research activities, ranging from artifacts such as photographs to symbols such as letters or numbers, that is collected, stored, and disseminated in order to be used as evidence for knowledge claims.”3
A different, although not incompatible, definition is sometimes used in computer science. This definition relates data to the basic function of computers: on this understanding, computers are devices that manipulate data. One introductory textbook gives the following definition:
“A digital computer is a sequential device that generally operates on data one step at a time. The data are represented in binary format, and a single transistor is used to represent a binary digit in a digital computer.”4
A popular model of the architecture of a computer, the Von Neumann architecture, defines digital computers in exactly this way.5 Everything that goes through the computer and is manipulated or processed is data. This definition of data is not necessarily in opposition to the evidence-based concept of data mentioned above. Of course, the output of the computational machine can be used as evidence, but not every piece of information within a computer is necessarily used as evidence.
Following the two definitions above, we can also define big data from either the evidence or the computing perspective. If the expression big data is related to evidence, what is potentially meant is that we study large-scale phenomena or that the available data covers a lot of evidence about a subject matter. Big data from the computer science perspective is concerned with quantities of data that cannot be handled with traditional methods. This definition again focuses on the capabilities of the computing machine.
How do we get from data to ethics? Ethics is the philosophical area concerned with what we should or should not do, what is good or bad, and the values that ground our moral judgement. With this in mind, it is easy to see that the recent increase of available data and data analysis leads to morally significant problems. Where is the data we use coming from? Are the privacy rights of people affected? What are we allowed to do with it? What are the consequences for individuals and our society?
Take one recent example that illustrates the ethically problematic use of data and made its way into the headlines. In 2016 the company Cambridge Analytica collected the data of millions of Facebook users. This data, in turn, was used to influence political elections. An analysis like this, with data related to approximately 87 million people, was not possible a few decades ago.6 The prospect of using these new possibilities may excite social scientists or commercial companies, but it also raises significant moral issues.
Those issues are made even more problematic by the fact that we are uncertain about what can and cannot be done with data like this. Here, as is often the case when it comes to digital technologies, commercial, political and scientific interests intersect and we are faced with difficult choices. Ethical problems related to the use of data can arise, on the one hand, from the reliability of the way the data is used and, on the other hand, from the question of whether that use is permissible in the first place. The problem of the reliability of data technologies has often been underestimated. Careless handling of available data invites different kinds of biases and can lead to the establishment of spurious correlations.
In a recent TED talk, Joy Buolamwini nicely illustrated the biases that can affect our software via the route of biased data.
The buzz around big data and data analysis raises important ethical questions. The two definitions of data above were intended to illustrate the different ways in which the expression data can be used. Researchers exploiting the new methods to generate and study data have to be aware that this is not business as usual. The thoughtless use of big data can lead to serious ethical harm. Ethics itself is faced with the challenge of providing suitable answers to those questions. This requires not only a solid understanding of traditional ethical theories but also an active engagement with new technologies, a practice that is still absent in many discussions surrounding this topic.
- Floridi, Luciano. ‘Data’, in William A. Darity (ed.), The International Encyclopedia of the Social Sciences, 2nd edn. Detroit: Macmillan, 2008. p. 234.
- Leonelli, Sabina. Data-Centric Biology: A Philosophical Study. Chicago; London: The University of Chicago Press, 2016. p. 77.
- O’Regan, Gerard. Introduction to the History of Computing. Undergraduate Topics in Computer Science. Cham: Springer International Publishing, 2016. p. 2.