‘Topics’ is an application programming interface (API) and one of the ad-targeting components of Google’s Privacy Sandbox, a collaborative initiative by Google to build new privacy-preserving technologies as an alternative to third-party cookies. Launched in January 2022, ‘Topics’ pursues the same goal as the (unsuccessful) Federated Learning of Cohorts: to be a privacy-friendly alternative to third-party cookies in the post-cookie era of ad-targeting. In this article, we will explore how privacy-friendly ‘Topics’ really is and how it is being perceived by regulators such as the Competition and Markets Authority (CMA), which recently published an interesting third update report on Google’s implementation of the binding commitments it accepted to address competition concerns relating to the Privacy Sandbox, ‘Topics’ included.
The concept behind ‘Topics’
The concept behind ‘Topics’ is that the API tracks the user’s interests via their browser in a way that preserves their privacy whilst still showing them relevant content and ads. To do so, the user’s browser infers a handful of recognizable, interest-based categories from their recent browsing history in order to help sites serve them relevant ads. The difference from third-party cookies is that the specific sites an individual has visited are no longer shared across the web, as they were with third-party cookies.
More specifically, the user’s browser will note the themes relating to participating sites they visit based on a limited set of topics. These themes – or topics – are selected from a publicly visible human-curated list. They are recognizable and high level. Google has made sure to emphasize that it aims to exclude from this list all sensitive categories of information like race, sexual orientation and religion. The proposed list contains around 350 topics to reduce the risk of browser fingerprinting. ‘Topics’ then labels the websites a user visits. If they’ve recently visited a sports website, the browser will probably include “sports” as one of their topics, and a website won’t need to know who they are, or which specific websites they’ve been on, to show them an ad about sports.
The topics will be shared with the websites the user visits at a rate of one new topic per week. The user has control over their topics, and will be able to easily access them to see them and remove any they don’t like, or disable them completely in Chrome Settings. The topics will be deleted after three weeks, will be erased after clearing the browsing history and won’t be available when the user browses in incognito mode. So what’s the catch for individuals’ privacy?
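The lifecycle described above – local inference from browsing history, one topic per week, deletion after three weeks – can be sketched in a toy model. The following is an illustrative simulation, not Chrome’s actual implementation: the tiny taxonomy, the site-to-topic mapping and the one-top-topic-per-week rule are simplifying assumptions made here for demonstration only.

```python
from collections import Counter, deque

# Hypothetical, heavily simplified taxonomy and site mapping.
# The real curated list contains around 350 topics.
TAXONOMY = {"sports", "travel", "fitness", "news", "music"}
SITE_TOPICS = {
    "footballnews.example": "sports",
    "cheapflights.example": "travel",
    "gymplans.example": "fitness",
}

class TopicsBrowser:
    MAX_EPOCHS = 3  # topics older than three weeks are deleted

    def __init__(self):
        self.current_visits = Counter()
        # One top topic retained per weekly epoch; deque drops the oldest.
        self.epochs = deque(maxlen=self.MAX_EPOCHS)

    def visit(self, site):
        topic = SITE_TOPICS.get(site)
        if topic in TAXONOMY:  # unlisted/sensitive themes are never recorded
            self.current_visits[topic] += 1

    def end_of_week(self):
        # The browser infers the week's top topic locally from history.
        if self.current_visits:
            top_topic, _ = self.current_visits.most_common(1)[0]
            self.epochs.append(top_topic)
        self.current_visits = Counter()

    def browsing_topics(self):
        # What a calling site would see: at most one topic per recent week,
        # never the underlying list of sites visited.
        return list(self.epochs)

    def clear_history(self):
        # Clearing browsing data also erases the stored topics.
        self.current_visits = Counter()
        self.epochs.clear()

browser = TopicsBrowser()
browser.visit("footballnews.example")
browser.visit("footballnews.example")
browser.visit("cheapflights.example")
browser.end_of_week()
print(browser.browsing_topics())  # ['sports']
```

Note how the site only ever learns coarse themes, never URLs – which is precisely the privacy claim, and precisely what the critics below dispute.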
Described by some as ‘a way for web browsers to watch what people do online in a non-creepy way’, ‘Topics’ has not failed to attract significant criticism. The Technical Architecture Group (TAG) of the World Wide Web Consortium (W3C) – the international body that works to guide the development of web standards – called upon Google in January this year to re-think ‘Topics’, writing that the API “fails to protect users from unwanted tracking and profiling” and “maintains the status quo of inappropriate surveillance on the web.” “We do not want to see it proceed further” was its damning conclusion.
Others, such as the developers of the WebKit browser engine, have warned against pre-existing privacy deficiencies on the web being used as excuses for privacy deficiencies in new specifications and proposals. Mozilla, in turn, deemed ‘Topics’ more likely to reduce the usefulness of the information for advertisers than to meaningfully protect privacy. While this latter criticism is not necessarily privacy-related, it holds water for advertisers: given the abovementioned rate of one topic per week, the slow refresh will likely lead to consumers being targeted with ads for old needs that have already been met, among other practical inefficiencies.
So how privacy-friendly is ‘Topics’?
There are a number of important privacy-related aspects to be considered when it comes to ‘Topics’. What follows is not an exhaustive analysis of all of them, but it can serve as a useful starting point for an interesting discussion.
User control essentially refers to an individual’s ability to control what happens to their data. A problematic element of the utilization of third-party cookies in targeted advertising is that websites used to store them on users’ devices without their permission, before Regulation 6 of the Privacy and Electronic Communications Regulations (which sits alongside the UK GDPR) prohibited such practices. At first glance, ‘Topics’ offers the user much better control over what happens with their data, as it allows users to view their topics, remove any they don’t like, and disable the feature completely in Chrome Settings.
Yet here a point made by Amy Guy of the W3C’s Technical Architecture Group (TAG) comes to mind, namely that individuals can suddenly become vulnerable in unpredictable ways, and so cannot be expected to possess a complete understanding of all possible topics as they relate to their own personal circumstances. Further, people cannot be expected to understand the true consequences of sharing such data with websites and advertisers, nor to continually revise their browser settings as their personal and/or global circumstances change. User control under ‘Topics’ is therefore more limited in practice than it first appears.
Browser fingerprinting refers to the phenomenon whereby websites track users via scripts that identify lots of information about the user’s device and browser that, when stitched together, forms that individual’s unique online fingerprint. The harm? Among numerous other examples, browser fingerprinting can be used to connect a user’s location to not only the ads they see but the prices of the items they are advertised. If an individual is fingerprinted as living in an affluent area, they can expect prices to rise on almost everything they see online.
Third-party cookies did nothing to tackle browser fingerprinting, because while they can easily be blocked by an individual’s browser, browser fingerprinting happens entirely on the server side. It is worth acknowledging that ‘Topics’ does do more to tackle browser fingerprinting, seeing as the proposed list contains around 350 topics, which is a relatively limited amount, meaning companies can build only a comparatively limited profile of the user (as compared to, for example, if there were 1,000 topics). Nonetheless, it is arguable that ultimately nothing really changes. In fact, ‘Topics’ may very well enable browser fingerprinting, by use of the so-called “past observability” rule. This rule could allow for building cross-domain unique user identifiers from ‘Topics’, potentially on a large scale.
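To see why even a 350-topic list leaves room for identification, a rough back-of-the-envelope calculation helps. The figures below rest on illustrative assumptions (a 350-topic taxonomy, three weekly topics observable at once); they are not measurements of the actual API, merely an estimate of how much distinguishing information topics could contribute to a fingerprint.

```python
import math

# Assumptions for illustration only: the curated list has ~350 topics,
# and a site can observe up to three of a user's weekly topics at a time.
TAXONOMY_SIZE = 350
TOPICS_EXPOSED = 3

# Information content of one uniformly-chosen topic, in bits.
bits_per_topic = math.log2(TAXONOMY_SIZE)

# An ordered triple of weekly topics carries roughly three times that.
bits_per_reading = TOPICS_EXPOSED * bits_per_topic

print(f"{bits_per_topic:.2f} bits per topic")
print(f"{bits_per_reading:.2f} bits per three-week reading")
# Around 25 bits is enough, in principle, to distinguish tens of millions
# of browsing profiles -- before combining with other fingerprinting
# signals such as device and browser characteristics.
```

The point is not that any single reading identifies a user, but that repeated observations across sites and weeks could accumulate into exactly the kind of cross-domain identifier the critics describe.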
Discrimination is linked to one of the biggest privacy concerns posed by third-party cookies, precisely due to how they enable ad networks to track and store incredible amounts of information, including personally identifiable information. Because ad networks place third-party cookies across a massive variety of sites, they may have access to sensitive information such as medical history, sexual orientation and gender identity.
‘Topics’ sets out to tackle this issue, for as mentioned above, Google aims to exclude from the topics list any topic that falls within a sensitive category of information. Yet it is relevant to draw attention to what others have not failed to notice: the danger of Google being the arbiter of what “sensitive” data is. Indeed, it is valid to make the point that only individuals themselves should decide what they consider sensitive or not. Apart from the more obvious types of information, such as race and ethnicity, it is arguable that there is no such thing as categorically non-sensitive data: information that is safe to share about one person in one context might be a closely guarded secret for another individual in a different context. For example, ‘Topics’ may very well identify ‘job hunting’ as one of an individual’s topics after they browse through some job hunting websites. This might not be sensitive information to a graduate fresh out of university, yet to an individual who is already employed, it may very well be.
It is also worth drawing attention to Guy’s argument that ‘Topics’ could be used to customise content in a discriminatory manner, using stereotypes, inferences or assumptions based on the topics revealed. In other words, a topic could be used, accurately or inaccurately, to infer protected characteristics, which is thereby used in selecting an ad to show.
The CMA’s take
Google Chrome is the world’s dominant web browser, and as a result the CMA has been closely monitoring the progress of Google’s Privacy Sandbox, including its ‘Topics’ proposal. On 31 January 2023, the CMA published its third update on Google’s implementation of the binding commitments it accepted to address competition concerns relating to Google’s proposals to remove third-party cookies from Chrome and replace them with alternative “Privacy Sandbox” tools and technologies.
The verdict? So far, the CMA is happy. Its January report finds that, based on the evidence available, Google has complied with the commitments it has made. Yet this may change, as readers are reminded that, ‘[b]ased on all the work undertaken to date’, and on submissions and feedback, the CMA’s current overall priorities include ‘[c]ontinuing to engage with Google on the design and development of its Privacy Sandbox proposals, focussing on Topics API and First Party Sets’. In the context of regulatory input, it is worth mentioning that there has been no peep out of the UK’s Information Commissioner’s Office (ICO), despite an attempt by the TAG to reach out with questions.
‘Topics’ poses multiple concerns for individuals’ privacy in ad-targeting. With limitations in user control, the possibility of browser fingerprinting and the threat of discrimination, it is arguable that ‘Topics’ is a better alternative to third-party cookies when it comes to ad-targeting and privacy, yet, ultimately, still fails to live up to an ideal privacy standard. Broader questions remain unanswered at this point: Is individuals’ privacy in ad-targeting simply stuck between a rock and a (slightly softer) hard place? How privacy-friendly can ad-targeting ever get?