Do Princess Leia and Sherlock Holmes Exist? AKA Why Philosophy Matters for Good Data Governance.

Introduction

On a couple of occasions I’ve written articles which, among other things, try to show how and why skills in philosophical analysis, far from being irrelevant, abstract, or untethered from real-world concerns, are indispensable and irreplaceable for many kinds of practical applications involving data.

I’m going to continue that theme in this article, although using a slightly different approach. Those previous articles either gave an exposition of the benefits and usefulness of philosophical reasoning and ethical theory, or used a case study, considered in depth, to demonstrate how that expertise is valuable. In this article, however, I’m going to present four data-related philosophical challenges which might at first sight seem purely theoretical, and therefore of limited practical relevance, and go on to show how those challenges in turn pose salient problems for robust data governance.

Each of the philosophical challenges poses risks for data governance, and the development of effective solutions depends on having first identified those risks. So, with that in mind, I’m going to show how and why the ability to think philosophically is a vitally important practical skill in any field involving data use.

So, here are the four challenges and the risks that they pose.

Ontological Uncertainty

‘Ontology’ is just the formal or technical name for the study of being: What does and does not exist? What does it mean for something to exist? How do we distinguish things that exist from things that don’t? And so on.

This might, at first, sound like an absurd list of questions to spend one’s time thinking about, since the answers seem so obvious. However, by way of a brief digression, we’ll see that the situation is more ambiguous than it appears.

For instance, we might ask whether Princess Leia or Sherlock Holmes exists; or whether numbers exist. Of course, neither Princess Leia nor Sherlock Holmes physically exists, but they exist as ideas realised in fiction, and ideas definitely exist. Likewise, we use numbers all the time and we seem to be answerable to them in some way: they can be employed to represent features of physical reality, to predict real-world outcomes, and so on. But we have never encountered a number in the world any more than we’ve ever encountered Princess Leia or Sherlock Holmes. We certainly encounter objects, things, people, and so on in the quantities that numbers represent – three chairs, for example – but we’ve never come across the numbers themselves. So, perhaps numbers too are ‘just’ ideas. And if they’re just ideas, perhaps they exist only in the same way that Princess Leia or Sherlock Holmes exist. But this is counter-intuitive, since we tend to think, pre-reflectively, that Princess Leia and Sherlock Holmes obviously don’t exist, but numbers obviously do.

So, what is the relevance of this digression? Well, it’s to show early on that apparently obvious answers to apparently easy questions might be more complex than they seem to be, and so might require more scrutiny than we had thought. So, having got that in mind, let’s think specifically about data.

Does data exist? Yes, certainly it does. But it doesn’t exist in a single unified form; after all, there are many kinds of data.

For example, raw data differs from processed data, which differs from metadata, and it’s not always clear what qualifies as ‘data’ at different stages in the data lifecycle. Or, consider whether a digital data record is fundamentally the same as its analogue counterpart. On one hand, for sure, if both contain the same information, the data are identical, aren’t they? On the other, a digital data record stored in a computer is ontologically different from an analogue data record written on a card and stored in a filing cabinet. So, figuring out exactly how the data is the same or different requires some careful unpicking.

And we can think of another example. Metadata is data about data, which is to say, it describes the information that the original data conveys; so does it have the same ontological status as the data it describes, or not? On one hand, if the same information is conveyed by the original data and the metadata that refers to it, is there any fundamental difference between them, insofar as data just is information? On the other hand, metadata cannot be fundamentally the same as the original data which it describes; after all, a description of a fire engine is not identical to a fire engine, is it? Here, again, once we give it some thought, teasing apart exactly how and why two kinds of data are the same or different is not straightforward.

This is all intellectually interesting, as far as it goes, but it’s still not completely clear why it’s relevant to the practice of good data governance. I said at the start that these philosophical conundrums present real-world governance challenges, so let’s think about what these might be in this first case.

Governance Implications

Clarity is a key determinant of the effectiveness of data governance policies, procedures, and processes. If these features of governance are unclear, ambiguous, under-specified, and so on, it will be difficult to implement them with sufficient confidence that risks have been mitigated successfully. However, as we saw above, there might be respects in which aspects of what must be governed – namely, data – are unclear, ambiguous, and under-specified. And this, at least nominally, poses a risk for the effectiveness of data governance.

For instance, we saw above in the metadata case that there might be uncertainty about what does and doesn’t constitute data. But if what constitutes data is uncertain, this presents a difficulty for setting clear boundaries in data governance policies. In that metadata case, should regulations treat metadata in the same way as the original data which it describes? And on what basis is the answer to this question justified, given that there seem to be grounds both for saying metadata is and is not the same as the original data that it describes?

Clearly, the unanswered questions revealed when one applies sufficient critical scrutiny could realistically pose problems for the reliability of data governance structures. Ambiguities such as those sketched out here could, without attention and further specification of what is and is not meant by terms such as ‘data’ or ‘metadata’, lead to inconsistencies or loopholes in regulatory frameworks. And this in turn would weaken relevant privacy and security measures if they were challenged, or complicate attempts to attribute responsibility and accountability for harm caused by a data breach.
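To make the point concrete, here is a minimal sketch of how the metadata ambiguity surfaces as an explicit decision that any governance policy must make. Everything in this example – the `Record` class, the `classify` function, and the sample records – is hypothetical and purely illustrative, not a real system or API; the point is simply that the policy cannot remain silent on whether metadata inherits the classification of the data it describes.

```python
# Hypothetical sketch: a governance policy must explicitly decide whether
# metadata inherits the classification of the data it describes.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Record:
    content: str
    classification: str                    # e.g. "personal" or "public"
    describes: Optional["Record"] = None   # set if this record is metadata

def classify(record: Record, metadata_inherits: bool) -> str:
    """Return the effective classification under an explicit policy choice."""
    if record.describes is not None and metadata_inherits:
        # Policy A: metadata is treated the same as the original data.
        return record.describes.classification
    # Policy B: metadata is classified on its own terms.
    return record.classification

patient_note = Record("blood test results", "personal")
note_metadata = Record("created 2024-01-05 by lab 3", "public",
                       describes=patient_note)

# The same metadata record is treated differently under each policy:
print(classify(note_metadata, metadata_inherits=True))   # personal
print(classify(note_metadata, metadata_inherits=False))  # public
```

Whichever branch a policy chooses, the decision has to be made and justified; leaving it implicit is exactly the kind of loophole described above.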

Epistemological Relativity

The second challenge also involves a technical term: epistemological. Again, though, what it represents is quite easy to understand. Epistemology is the study of knowledge – what knowledge is and isn’t, what counts as good grounds for making such a distinction, what forms of knowledge there are and for what purposes they are and aren’t reliable, how we acquire knowledge, and so on. But how does this relate to data?

Well, first, we often use data as the basis for acquiring knowledge. We might say that we know something if we have seen the data which provides the evidence necessary for it, and if someone tells us that they know something, we might ask them – depending on what the knowledge claim relates to – to show us the data that can back up the claim. Second, data, and in particular quantitative rather than qualitative data, is often thought of as being objective, neutral, factual, not influenced by personal bias, interpretation, or the limitations of a subjective viewpoint. None of this seems to be especially problematic, so where does the challenge lie with respect to knowledge?

Although data might often be thought of as being objective, neutral, factual, unbiased and so on, the situation is more complex than this. For instance, how data is generated, interpreted, and used, along with other choices that need to be made in the collection and production of data, can depend on subjective perspectives, on biases, and contextual pressures such as cost-effectiveness, and so on. As such, data, despite having the appearance of objectivity by the time it reaches an audience, has necessarily been filtered through a sequence of human choices and judgements before it gets to that point.

What is more, even when data finds an audience, the appearance of objectivity has no bearing on the reality that different stakeholders might view the same set of data differently, draw different conclusions, or use it to support different positions that might be consistent with the data, depending on how it is presented.

Therefore, despite the apparent objectivity of data which we might tend to assume unreflectively, in fact we should not regard knowledge derived from the data as necessarily genuinely objective. This is both because of our own biases and interpretive tendencies, and because of the subjective human choices made in the production and presentation of the data. So, with that in mind, what implications for data governance might there be?

Implications for Governance

The under-recognised epistemological subjectivity of data can pose several problems for ensuring good data governance in practice, which it is essential to recognise and mitigate in an organisation’s governance processes.

One problem is that if interpretation of data varies depending on the observer’s perspective or contextual outlook, then the ability to develop universally applicable governance models is compromised.

Let’s think of this by way of analogy: two people are presented with statistics about the length of prison sentences handed down over the past five years; person A thinks, for their own reasons, that prison sentences should, in general, be tougher and longer; person B thinks, for their own reasons, that prison sentences are already too long and should, in general, be shortened in the interest of a more lenient sentencing policy. All the data shows is the actual lengths of the sentences. So what set of beliefs should drive sentencing policy, given that there is disagreement between two members of society who have different beliefs about what is appropriate? How are we to know that whatever policy is developed on the basis of the data is the right one, ethically speaking?

Likewise, data about, for example, the efficiency of an organisational data pipeline might look to manager C to be overly cautious, such that the pipeline could be made still more efficient by making monitoring less stringent. By contrast, manager D thinks that the risk of breaches in the pipeline is still insufficiently mitigated, such that oversight should be made more stringent, with the resulting decrease in efficiency being a price worth paying longer term in the interests of maintaining a reputation for high standards and trustworthiness. There just is no clear way to adjudicate whether C or D is correct by reference to the data alone. Sure, the data might be reasonably objective; but what the data means, in terms of what we infer should be done, is a matter of interpretation, driven by subjectively held values.

I probably don’t need to push this point much further, so I’ll give one more example, but I’ll treat it more briefly, since the important point here is probably evident by now.

Another problem is that any governance system which assumes data is purely objective might overlook biases or misinterpretations built into the data collection, analysis, and production process, via the choices made in designing that process. And this, in turn, can lead to incomplete or flawed policies. The challenge that arises from this is how the conditions for achieving vital goals constitutive of good – by which I mean not only legal but ethical – data governance, such as fairness, transparency, lack of bias, accountability, and so on, are to be made an integral structural feature of data governance processes and practices.

In the first example here, we saw some epistemological limitations of even reasonably objective data when it comes to making decisions based on it, given that values and opinions are likely to differ among those who have to make decisions. In the second example, we saw risks that follow from a lack of appreciation of threats to objectivity that might be built into the process by those who designed it, through choices driven by their subjective values. In the second example, then, the governance risk follows from insufficient attention to the epistemological uncertainty about whether data is in fact as objective as it appears.

Context-Dependence of Data

This next challenge is connected to some of what was touched on in the previous one, insofar as it focuses on potential differences of interpretation that individuals might make when presented with data about whatever the matter is that requires attention. However, it’s worth underlining the risk of context-dependence in its own right, since the governance challenge that this poses is especially pertinent in the contemporary context of ubiquitous data sharing and linkage, in ways that one might not necessarily always be aware of.

The meaning and significance of data are highly context-dependent. Data that seems trivial in one context might be highly sensitive in another. For instance, a seemingly harmless – and, it’s important to point out, objectively true in this case – GPS location point could become sensitive when combined with other data that reveals patterns of behaviour or private habits that an individual would prefer were not available and could not be known. So, even though individual data points might represent objectively verifiable, and on their own, benign, facts about an individual’s activities, the inferences that might be drawn from this information will expand and vary if and when it’s linked to other data points, via sharing it with the organisations who collected these other data points. And this has potentially quite serious implications for good data governance.

Implications for Governance

Where I refer to ‘good’ governance in the paragraph above, I mean it in the sense of governance carried out something like ‘in accordance with protecting the privacy of individuals whose data is collected and respecting their preferences about the level of monitoring that they are prepared to accept as they go about their lives’. After all, we are well aware that privacy and autonomy are important values that must be protected in data governance, and the ethical significance of this is amplified in line with the potential for data linkage and the expansion of the picture about an individual that this creates.

So, contextuality makes it practically difficult to classify data as sensitive or non-sensitive in any sort of general manner that can be easily met in similarly general governance policies. For example, if location data shows that an individual goes to a pub, is this objective fact to be considered a sensitive matter, or not? Either interpretation is possible and will follow from what else could be known in addition to that data point. For instance, if this person very rarely visits pubs, it is probably not especially sensitive. However, if this individual has been convicted of a crime such as drink-driving or assault while under the influence of alcohol, or if they have recently received a liver transplant due to damage caused by alcoholism, then this single data point could well be extremely sensitive.

Identifying any particular threshold at which linkage of data points converts what is known about an individual from non-sensitive to sensitive is not likely to be possible. The possibilities vary so greatly that it is difficult to specify exhaustively in governance processes what should and should not be considered sensitive, and for what reasons. As such, governance models must, somehow, be designed in such a way that they can take account of the context in which data is collected, processed, and shared.
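The pub example above can be sketched as a toy classification rule, to show why no fixed, context-free threshold works. All of the names, categories, and rules here are hypothetical and purely illustrative – a sketch of the problem, not a real or recommended classification scheme: the sensitivity of one and the same data point changes entirely depending on what other data it is linked to.

```python
# Hypothetical sketch: the sensitivity of a single data point depends on
# what other linked data is known about the individual. The categories and
# rules below are illustrative assumptions, not a real scheme.

def sensitivity(data_point: str, linked_context: set[str]) -> str:
    """Classify one data point, given data linked to it about the individual."""
    if data_point == "visits pub":
        # The same objectively true fact becomes sensitive only in
        # combination with other data points revealed through linkage.
        alcohol_related = {"drink-driving conviction",
                           "liver transplant (alcoholism)"}
        if linked_context & alcohol_related:
            return "highly sensitive"
        return "low sensitivity"
    return "unclassified"

# Identical data point, radically different sensitivity:
print(sensitivity("visits pub", set()))                               # low sensitivity
print(sensitivity("visits pub", {"liver transplant (alcoholism)"}))   # highly sensitive
```

Note that the rule only works because the relevant contexts were enumerated in advance; as the surrounding text argues, the space of possible linkages cannot be enumerated exhaustively in practice, which is precisely why governance models need built-in context-sensitivity rather than fixed lists.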

But, of course, such flexibility can easily introduce significant complexity and create its own ambiguities, if a situation occurs which is not obviously covered in the model. Finally, here, this challenge is particularly practically relevant to the application of privacy regulations such as GDPR, which require data processors to assess the context of data use when designing and monitoring data governance structures and processes.

Ownership and Control Ambiguities

Philosophical questions surrounding ownership and control of data are complex. To show why, we can start with a straightforward question, the answer to which is anything but: who really ‘owns’ data — the individual to whom it pertains, the organisation that collects it, or a third party that processes it? On what grounds can the answers we give be substantiated and backed up? Moreover, in more complex and diffuse data systems, how do we define control over data in decentralised networks such as blockchain, where data is distributed across multiple nodes?

It is fair to note here that, in the UK and EU, no individual is legally designated as ‘owner’ of data about them. Rather, certain control rights, including the right to withdraw consent and to have data held about them deleted, are awarded to individuals to strengthen the protections to which they are entitled. However, in other jurisdictions, there are some legal conceptions of data ownership, based on stipulated conditions that must be satisfied for one to be designated ‘owner’. Nevertheless, however these are constructed, they will be contestable, since ownership is a concept which admits of more than one interpretation or definition. And since any account of ownership will be contestable and open to challenge, it is not a concept that can be completely free of ambiguity. This, for reasons with which we are probably familiar by this stage, poses practical challenges for adequately robust data governance.

Implications for Governance

The straightforward challenge here is that any disagreements over the meaning of data ownership and control can make it difficult to establish clear legal and ethical guidelines for data use. If ownership is unclear, then so are the rights and responsibilities that are associated with data protection, data breaches, consent to certain kinds of use, and so on. And this lack of clarity can, in turn, create gaps in accountability, particularly when data is shared across jurisdictional borders, or between multiple stakeholders in a supply chain.

This final point is especially pertinent given that, as I pointed out above, personal ‘ownership’ of data is not a concept with any legal force in the UK, whereas it is elsewhere. Therefore, any professional business collaboration which must take place across jurisdictions, with different conceptions of what can and should be meant by data ownership, must be governed in a way that takes account of this fundamental difference in conceptual understanding and legal status.

Having looked at four specific challenges for good data governance that arise from philosophical questions about data, what it is and what it means, we can derive some general implications by way of a summary. Needless to say, the four examples I’ve walked us through here do not cover or exhaust the range of potential challenges that might arise from deep ambiguities in understanding data. However, I hope that they give a sense of the depth to which these ambiguities might go, and therefore, how formidable any resulting governance challenges might be.

Summary: Overall Implications for Data Governance

Regulatory Complexity

The four philosophical challenges that I’ve laid out here complicate the creation of one-size-fits-all data regulations and policies, which necessitates the creation of more granular and adaptable governance models.

Ethical Dilemmas

The numerous ways that data can be interpreted and used raise ethical concerns about fairness, bias, and privacy, which governance models must address with a dynamism commensurate to the challenge.

Accountability

Ensuring accountability in the face of ontological doubt about what data is and is not, and given ambiguities about ownership, is a serious challenge, especially when multiple stakeholders are involved in data use and processing.

Transparency and Consent

Effective data governance relies on – among other things – transparency. However, the dynamic and context-dependent nature of data complicates efforts to clearly inform users about how their data will be used, which in turn raises issues around the meaningfulness and validity of consent for data use.

Technological Change

As data practices evolve – for instance, through AI and machine learning, which I’ve not mentioned here but which we have covered in previous articles – governance frameworks must adapt. However, they must do this in a way that aligns with and follows from the consideration of complex philosophical questions about the nature of data. These questions, while seeming abstract at first, are in fact of significant practical relevance. Engaging in deliberation to sufficient depth, and ensuring that governance evolves appropriately as a function of this, can be a slow and uncertain process.

Conclusion

There are two conclusions to which I want to draw your attention in making these final remarks.

The first conclusion is somewhat obvious. It’s clear from the examples we’ve seen, and others that could be presented with more space, that challenges of the kind outlined highlight the need for a flexible, ethically grounded, context-sensitive approach to data governance, capable of adapting to evolving technologies and societal norms.

Although this is of paramount importance, it’s also an unremarkable conclusion, given that this kind of approach is what we would expect of any organisation that is trying to ensure that it governs data – which could include our data – in a responsible way. Helping organisations to do this is our specialty at IGS, so if your organisation needs support in responsible and ethically sound data governance, then get in touch with us for the help that you need.

The second conclusion is – or at least might have been before reading this article – less obvious, but it is equally important with respect to ensuring appropriately high ethical standards of data governance. And this is that many questions about data which might seem purely theoretical philosophical concerns, and therefore irrelevant in any kind of practical way, are in fact of very significant practical relevance. And for this reason, it would be a misapprehension to think that skills in philosophical analysis are unnecessary or not useful for practical purposes in data governance.

As the examples shown here indicate, specialised training in thinking through seemingly abstract questions such as ‘what is data, and on what grounds can we be sure of our answer?’ is of eminent practical relevance. And it’s relevant precisely because, if policies are implemented which, it turns out, require us to ask, for example, whether or not data and metadata are the same, then we need the tools to arrive at an answer. So, if we haven’t given close logical attention to these conceptual issues, but it then becomes clear that it would be helpful to be clear about what they do and don’t mean and why, we will encounter otherwise avoidable governance challenges which we are not properly equipped to resolve.

I won’t labour the point here by supplying further examples. IGS’ approach to consultancy support in data governance encapsulates not only legal compliance, but literacy about practical ethical demands as well. One essential tool in being able to deliver the latter to the required standard – we think – is formal training in philosophical and ethical theory and analysis. This is, to be sure, not the only essential tool, as it’s vital to have the skills for applying that theory and analysis to resolve practical challenges. Nevertheless, without this tool, there is a risk that advice given about ambiguities that arise in data governance dilemmas becomes impoverished, and therefore incomplete and unreliable.

At IGS we offer a comprehensive consultancy service that not only ensures your organisation can be legally compliant, but ethically responsible too. So, as above, for optimally comprehensive and rigorous advice and support across all aspects of data governance, contact IGS.
