Ethics and the presumption of data reuse

Alex McKeown
September 7, 2023

Data (platforms) and widening the presumption of data reuse

The rationale underlying big data-driven healthcare, research, and commerce is that linkage and integration of datasets pertaining to a wide variety of personal indices will be able to provide an increasingly granular understanding of how biological, environmental, and social factors interact to produce correlations that are useful for making predictions about individual behaviour, choices, health states, disease risk, and so on. The approach typically employs algorithmic prediction enabled by advances in machine learning[1], which has the capacity to generate new knowledge more quickly than traditional scientific approaches and can reduce human bias in the collection and analysis of data.

It has been claimed that, where effective, the predictive capability of these innovations can be equal to conventional statistical and epidemiological means[2]. It has also been claimed that, given the current technological trajectory and anticipated advances in these technologies, machine learning methods may enhance, and eventually possibly exceed, human analytic capacities[3]. Crucially, however, these techniques do not always reduce bias, as they can sometimes replicate human biases embedded in the data, deriving from the criteria according to which the data were collected. This is an ethical risk of which you and your organisation should be aware if you engage in this kind of integrative big data analysis.

Assuming this risk has been assessed and managed, however, if the integration of diverse datasets does enable the development of more accurate risk indicators, prognostic factors, better treatments and interventions, more profitable retail algorithms, or any other commercial or business application relevant to an organisation, this establishes the need for the sharing, linkage, and reuse of data. A platform-based approach is an appropriate model for facilitating this.

Data platforms represent a new paradigm for analysis and research. They are increasingly common in healthcare and medicine[4], but they are widely used in a range of sectors beyond this too[5], given the value and novelty of the insights that they may be able to produce. Platform-based approaches thus require new thinking about consent. You can find an overview of some alternative models of consent developed for the big data analytics context here.

In the platform model, datasets are pooled for remote access and analysis, such that by their integration, more granular insights can be achieved for tailoring the service that is offered to specific individuals[6]. This might be, for example, developing more effective, personalised treatment regimens to treat disease, or gaining a more detailed understanding of someone’s retail preferences.

This purported eventual capacity is a central plank of the rationale for machine learning techniques being applied to big data both generally and in the specific context of data platforms. The increasing proliferation of these techniques is contributing to the establishment of a new data infrastructure. If it is true that they can reveal novel, useful associations, then machine learning can underwrite the justification for reuse, as it keeps open the possibility that an individual’s data may have value in future in ways that could otherwise not have been predicted.

In these instances, if your organisation wishes to maximise the value of the data it analyses, you may wish to encourage those whose data you propose to collect to consent to their data being used for any and all unspecified further purposes in future. This is to say, you may wish them to agree to a presumption of reuse, without them stipulating that you must recontact them to seek permission for future uses. However, if you do wish to do this, you and your organisation should think about the ethical implications of such a proposal, if you want to ensure ethical practice with respect to the data that you hold and the purposes to which you wish to put it. How these matters are to be balanced may differ, depending on what it is that your organisation does.

Different contexts for data use and reuse

In one respect the case for a presumption of reuse is easier to defend in some contexts than it is in others. It is more straightforward to justify such a presumption in, for example, health care and research than it is in a commercial context such as retail. We do not mean by this that a presumption of data reuse in commerce is ethically unjustifiable; just that in the health case there is a clear societal imperative to optimise the potential health gains from research and individuals may themselves derive health benefits from their data being used in research. Indeed, it is in view of this imperative that the NHS has already ‘committed to the principle of collect once, use many times’[7].

Crucially, a commitment to the principle of reuse follows from the recognition that imposing too many restrictions on the sharing of data for reuse hinders the data flows needed for the delivery of appropriate care in a modern healthcare system. So, if this is plausible and we think that, wherever possible, it is a good thing to try and optimise the potential population health gains using the technological means available, then there are grounds for arguing for the legitimacy of a presumption of the wider reuse of data than currently permitted by standard models of consent.

The second step in the argument here is also related to the strength of the reasons why an organisation might wish to reuse personal data without seeking consent for doing so. The likelihood of individuals being comfortable with the idea that they should presume the reuse of their data depends on a robust case being made in favour of it which shows that asking for such a presumption is reasonable. Such a case is grounded in three salient circumstantial considerations.

The first two of these considerations are the conditions outlined earlier relevant to data platforms, namely that: (1) optimising the potential benefit available from data requires its sharing and reuse; and (2) the reach of machine learning to uncover otherwise unpredictable associations across and uses for personal data requires a reassessment of consent. The third consideration is (3) that institutions and organisations holding personal data play a fundamental role in establishing reasonable expectations for the reuse of that data.

The third of these considerations deserves examination. We have so far focused mainly on the justification for a presumption of the reuse of data for health research. However, as indicated earlier, there are other professional contexts, industries, and applications whose organisational aims may be best served by unconsented reuse of personal data. Some of these, particularly where profit rather than benefit to health is the primary goal, may be met with more suspicion.

For instance, there has been visible public distrust of how companies[8] such as social media platforms handle personal data. In view of legitimate public concerns about instances where secondary use of data – for whatever purpose – has been inadequately governed[9], it is also vital for the public legitimacy of a wider presumption of data reuse that the institutions handling the data can be relied on to devise their processes responsibly, such that trust in these institutions is not undermined. In the context of health care and research this is so because, given that the aim is to produce public benefit, researchers and institutions are answerable to society[10]. In commercial contexts it is also the case, because companies and other organisations depend on those who wish to use their services being willing to disclose personal information to them.

General public agreement to a greater presumption of the reuse of data, whether for healthcare purposes or otherwise, but increasingly commonly in data platforms, would require at least, in the first instance, putting the arguments in favour of it to those people who are uncomfortable with its implications. Also, it is important to be aware: first, that some proportion of these people will remain unpersuaded and continue to object; and second, that these efforts are time consuming and move at a slower pace than the data technologies for which we are trying to find the ethical governance solutions.

‘Reasonable’ expectations

All of the factors laid out so far underline the moral imperative of ensuring, whatever your organisation wishes to do with respect to data reuse, that the proposals you make to people whose data you hold or wish to hold about how you wish to use their data are reasonable for them to expect[11]. In this section we will think in some more detail about what this might mean.

A model of data reuse justified on the basis that it would be ‘reasonable’ to expect it to occur depends on showing that this evolution, in both ethics and law, aligns with the multilateral nature of contemporary data-driven healthcare and commerce. Given the inadequacy of the standard model of consent in this context, the law attempts to ground ‘reasonableness’ in terms other than the subjective mindset and preferences of a particular individual. Regardless of whether such an individual happened to expect the data sharing, the question is whether a ‘reasonable’ person with ordinary sensibilities in their position would expect their identifiable information to be shared. This characterisation of ‘reasonableness’ creates a more stable, objective basis for sharing information.

Of course, what is at stake here for individuals is a potential breach of privacy. Although privacy is a broad concept, and difficult to define categorically[12], it is generally associated with norms of exclusivity or control[13]. In other words, information is deemed ‘private’ if a reasonable, ordinary person would consider it to be so. This is challenging, though, because the norms that underpin these ‘reasonable’ expectations are not static and are liable to change in line with societal shifts and developments in public discourse.

Also, and of crucial ethical significance, it does not follow from reuse being a ‘reasonable expectation’ in the sense that one would not be surprised to find out that it happened, that the proposal of a presumption of reuse is reasonable in the sense of it being what ought to be the case. For example, just because there might be a ‘reasonable expectation’ in some countries that if I am convicted of a murder, I am likely to be put to death by the State, the expectation is agnostic about whether or not capital punishment itself is reasonable in the sense of being morally justifiable.

In this respect, there is a distinction that must be made explicit between reasonableness in the descriptive or statistical sense (what people actually take to be reasonable) and in the evaluative or normative sense (what is justifiably, or ought to be, taken to be reasonable), to avoid confusion between the two ways in which the term may be understood[14].

Of course, while it is vital that you are aware of this distinction, in terms of ensuring the ethical integrity of your organisation’s data use processes, it does not follow that it will neutralise any and all possible objections from those people whose data you wish to use. The nature of ethics is that it is always contestable; as such, it is imperative that your organisation thinks carefully about its processes and the reasons it has for implementing the ones that it does, so that it can provide the strongest possible justification for them if or when challenged. Given the persistence of moral disagreement, it is worth saying something about it briefly here.

Unavoidable moral disagreement and how to respond to it

Disagreement about ethics can never be fully extinguished. There is no way to prove, in the way that is more readily available in the sciences, that a statement about the right course of action is true. Unlike in the sciences, competing moral positions have competing standards of justification, so they do not share a method by which disagreements might be settled more finally[15]. As such, even universal consensus on a given view would be insufficient for providing proof that it is ethical. In the absence of something more akin to the empirical confirmation available in the sciences, even majority support for a particular view does not provide evidence that the reasons for whatever is being proposed are ethically sound. Here again we could think about the capital punishment case: it would not follow from capital punishment being approved of by the majority of a society that capital punishment is in fact ethically permissible.

What does this persistent challenge mean in the context of a presumption for the reuse of data? Well, we could just concede that progress towards a new norm of data handling, for research and other purposes, in which presumptions of the non-disclosure of data are relaxed might be slow, incremental, and imperfect: it just is a fact about trust that it takes time, effort, and resources to build[16], particularly in the area of educating the public about how this kind of new platform-led data science is done and why it is important or valuable in a range of sectors. Importantly, the pace at which the public gains trust in this new norm may differ between sectors. Irrespective of the sector, however, it is probable that the more visible the benefit for those whose data is used, the greater the likely acceptability.

Given the analysis laid out here, what conclusions could we draw that are relevant to ensuring ethical practice in the reuse of data and the presumption among the public that may need to accompany it? The analysis here entails the conclusion that encouraging a presumption for the unconsented reuse of data is ethically justifiable if the goals of the data use are sufficiently valuable at both the individual and societal level, assuming that the institutions handling the data are trustworthy and can be relied on not to abuse the trust placed in them by the public.

A technical objection to this conclusion, though, is that the claim it makes is question-begging. This means, in other words, that it is precisely the trustworthiness of institutions and organisations holding personal data that cannot be assumed; it is the responsibility of those institutions and organisations to show that they are trustworthy. However, even if trustworthiness cannot be assumed and must be demonstrated, this can be done using rigorous governance, the application of appropriate regulations, and the kind of careful balancing of ethical considerations demonstrated here, for instance with regard to the implications of legal terms such as ‘reasonable expectations’. Finally, the introduction of updated legislative standards such as the General Data Protection Regulation[17] to hold institutions to account if they misuse data may help to strengthen their relationships with the public.

Summary: institutional and organisational trustworthiness is an ethical imperative

In terms of what you and your organisation should consider with respect to ensuring ethical conduct in proposals to reuse data, the most important point to make by way of summary is that the focus should be on trustworthiness, rather than trust. Being trustworthy is within the power of institutions and their governance in a way that being trusted is not: we and you should attempt to ensure that institutions and organisations holding and using data operate and are governed in a trustworthy way, as this will give those individuals whose data are held and used firm grounds on which to trust.

In the contemporary context, it is increasingly likely that data will be linked for reuse according to a platform model. Since the big data analytic approach derives part of its value from being able to make novel predictions and insights based on the integration of diverse datasets, a platform model – in other words, a model specifically designed for the pooling and integration of different datasets – is well-suited to a wide range of contemporary applications in data analysis across healthcare, commerce, and other sectors.

Given that the platform model necessarily involves data linkage to identify hitherto unpredictable future uses, if your organisation wishes to encourage a wider presumption of the reuse of data among those people whose data you hold or wish to hold, you are obliged to consider the ethical dimensions of doing so. The more robust, thorough, and clearly articulated your justification is, the more trustworthy your organisation will be to those people of whose data you are or wish to be a custodian.

At IGS we have the theoretical knowledge and practical experience needed for helping you and your organisation to think clearly about the ethical implications of your data handling practices. In the contemporary context of big data analytics, where it may be in your organisation’s interests to encourage a presumption of the linkage, integration and reuse of data of the people whose data you hold or wish to hold, IGS is ideally placed to provide the guidance you need to ensure that you do so ethically.

[1] https://link.springer.com/article/10.1007/s11948-015-9652-2

[2] https://journals.sagepub.com/doi/10.1177/0162243917736139

[3] https://www.thelancet.com/journals/landig/article/PIIS2589-7500(19)30024-X/fulltext

[4] https://www.researchallofus.org/frequently-asked-questions/?_ga=2.75278546.1902005278.1576753949-1286912872.1576753949

[5] https://tinyurl.com/y2wdd56p

[6] https://www.thelancet.com/journals/lanpsy/article/PIIS2215-0366(16)30089-X/fulltext

[7] https://www.england.nhs.uk/wp-content/uploads/2014/10/memo-underst-25-09-14.pdf

[8] https://www.ft.com/content/4ade8884-1b40-11ea-97df-cc63de1d73f4

[9] https://www.wired.co.uk/article/care-data-nhs-england-closed

[10] https://jme.bmj.com/content/47/12/e26

[11] https://academic.oup.com/medlaw/article/27/3/432/5479980

[12] https://tinyurl.com/3ewys253