Damage Quantification in Algorithmic Abuse Cases – the Elusive Counterfactual

(by Peter Ormosi) On the 5th of February 2024 the Competition Appeal Tribunal published a ruling to determine the carriage dispute in relation to two applications to commence collective proceedings against Amazon regarding Amazon’s Buy Box.[1] Interestingly, the ruling (which went unanimously in favour of Mr Hammond’s application) included a short discussion of the counterfactual.[2] The proposed method for determining the counterfactual in Mr Hammond’s opt-out class action included re-running Amazon’s original algorithm without the abuse (and logging the resulting outcome as the counterfactual). The CAT seemed sympathetic to this solution. I personally think the proposal in Mr Hammond’s application is a viable approach, and I believe the CAT must be commended for understanding the novelty of the issue and being open to a method untested in courts. But this ruling draws attention to a much more general question: how do we find the counterfactual in a case where the offence relates to conduct by an algorithm?

The two class action claims follow investigations by the European Commission (EC) and the Competition and Markets Authority (CMA) into the operation of Amazon’s Buy Box.[3] Both investigations were closed without a binding infringement decision when Amazon offered commitments. For this reason, Mr Hammond’s claim will have to prove not only the quantum, but the infringement as well. But even if these were follow-on claims, the claimant would still have to establish what the counterfactual is, i.e. what the ranking of suppliers in the Buy Box would have been in the absence of Amazon’s infringement, to demonstrate that Amazon’s algorithm caused them harm.

Neither the EC[4] nor the CMA[5] case documents explicitly define what this counterfactual would be, but there are implicit hints in the commitments. The CMA agreed to the commitment that Amazon would offer objectively verifiable and non-discriminatory conditions and criteria in the Buy Box ranking. The EC accepted Amazon’s commitments to (1) treat all sellers equally when ranking the offers, and (2) show not one but two featured offers (let’s put aside this second commitment, as I do not think it will have a material impact on whether the Buy Box ranking offers a level playing field for suppliers).

Both sets of commitments suggest that the EC and the CMA were satisfied that, in the absence of the infringement, all suppliers would be treated in an objectively verifiable, non-discriminatory way for the purposes of the Buy Box ranking. Based on this, one could argue that a non-discriminating Buy Box, or a ranking based on a non-discriminatory algorithm, is an adequate counterfactual.

But what counts as non-discriminatory on the Buy Box? More importantly, what does the term non-discriminatory mean for an algorithm, and what would a non-discriminatory ranking look like? This question has implications beyond the Buy Box case, because finding the counterfactual world in an algorithm that is stripped of its infringing components will be at the heart of many damage claims to follow. But the question also bears on choosing the appropriate remedy in these cases, because requiring a platform to adjust its algorithms to cease discriminating in how they rank suppliers is an increasingly used behavioural remedy (see also Google Shopping[6]).

To illustrate some of the issues, let’s take rule-based algorithms[7] first (I believe that the problematic parts of the Amazon Buy Box are likely to be rule-based). The fact that the ranking is done by an algorithm makes it quantifiable whether an outcome is non-discriminatory. But quantifiable does not necessarily mean clearer for the purposes of establishing a counterfactual. It is easy to demonstrate why through a simple example.

Assume that there is a product, and price is the only factor that matters in ranking the sellers of this product. In this case one could produce an objective ranking of all sellers and use it as the non-discriminatory benchmark or counterfactual. But what if there is a two-dimensional product feature space, where the two features that matter for ranking are price and delivery speed? An algorithm that minimises price would rank offers differently from an algorithm that minimises delivery time. Both would be non-discriminatory. Because consumers vary in how much utility they gain from either (or both) of these factors, the algorithm might also choose to minimise price-adjusted delivery speed in its ranking. This again would lead to a different (but still non-discriminatory) ranking order. Which one of the three possible non-discriminatory rankings should be used as the counterfactual?
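A minimal sketch makes the point concrete. The sellers, prices, delivery times and the one-per-day weighting below are all hypothetical numbers chosen purely for illustration: three equally non-discriminatory ranking rules, applied to the same three offers, produce three different orders.

```python
# A minimal sketch with hypothetical numbers: three equally
# non-discriminatory ranking rules, three different ranking orders.

offers = {
    "Seller A": {"price": 10.0, "delivery_days": 6},
    "Seller B": {"price": 14.0, "delivery_days": 1},
    "Seller C": {"price": 11.0, "delivery_days": 2},
}

def rank(score):
    """Rank sellers in ascending order of the given score (lower is better)."""
    return sorted(offers, key=lambda s: score(offers[s]))

print(rank(lambda o: o["price"]))          # ['Seller A', 'Seller C', 'Seller B']
print(rank(lambda o: o["delivery_days"]))  # ['Seller B', 'Seller C', 'Seller A']
# Price-adjusted delivery speed; the one-per-day weight is itself an
# assumption, and different weights yield yet more candidate rankings.
print(rank(lambda o: o["price"] + 1.0 * o["delivery_days"]))
# ['Seller C', 'Seller B', 'Seller A']
```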

In practice, an algorithm ranks products based on a potentially high-dimensional feature space. It is easy to see how much more complex this question becomes, and how many more possible non-discriminatory counterfactual rankings could be produced, with a larger number of features (see the sketch below). As a sidenote, neither the EC nor the CMA Buy Box commitments provide details of how the algorithm should achieve non-discrimination, which leaves interpretation to the operator of the algorithm and risks creating an ineffective remedy.
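Extending the earlier sketch (again with entirely hypothetical features and numbers), one can enumerate even a coarse grid of feature weightings and count the distinct rankings they produce, each of which is a candidate “non-discriminatory” counterfactual.

```python
# Hypothetical illustration: with three features instead of two, even a
# coarse grid of feature weightings produces several distinct rankings,
# each a defensible "non-discriminatory" candidate counterfactual.

from itertools import product

# (price, delivery_days, return_rate) per seller; lower is better for each.
offers = {
    "Seller A": (10.0, 6, 0.8),
    "Seller B": (14.0, 1, 0.2),
    "Seller C": (11.0, 2, 0.5),
}

rankings = set()
for weights in product([0.0, 0.5, 1.0], repeat=3):  # 27 weightings
    if sum(weights) == 0:
        continue  # skip the degenerate all-zero weighting
    def score(o, w=weights):
        return sum(wi * fi for wi, fi in zip(w, o))
    rankings.add(tuple(sorted(offers, key=lambda s: score(offers[s]))))

print(f"{len(rankings)} distinct rankings from 26 candidate weightings")
```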

Now let’s look at learning-based algorithms.[8] These are more difficult to discuss analytically, but simulations can give us some useful insight. For example, Castellini et al. (2024)[9] examine a specific class of learning algorithms used for product ranking by online platforms: recommender systems. Recommender systems are often at the heart of discussions on self-preferencing abuses.[10]

Castellini et al. (2024) demonstrate a point that is crucial for my argument in this post: even in a simple simulation environment, with a small set of assumptions, the counterfactual is very sensitive to those assumptions. Taking the original algorithm and re-running it without the infringement can give different results depending on the details of various parameter settings. For example, it matters whether we assume that the algorithm is completely reset (no feedback loops) or whether it ‘remembers’ previous self-preferencing through feedback loops. Because it is a learning-based algorithm, it also matters whether consumers have limited attention (something that the EC Google Shopping decision showed to be the case), and to what extent their attention is limited. It also matters what type of learning algorithm the platform uses for ranking items, and so on. When re-running an algorithm without the offending part, many more assumptions will have to be made, and there is no way of knowing which of these assumptions would have held in the counterfactual world.
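A toy example illustrates the ‘reset’ versus ‘feedback loop’ choice. This is a deliberately crude greedy learner, not the Castellini et al. framework; the sellers, click-through rates and starting estimates are all assumptions made for illustration. The only difference between the two runs is whether the algorithm’s learned state is wiped or inherited from the self-preferencing period.

```python
# Toy greedy learner (not the Castellini et al. framework): the platform
# features whichever seller has the higher estimated click-through rate
# and updates that estimate after each impression. All numbers hypothetical.

import random

def simulate(initial_estimates, true_ctr, rounds=10_000, seed=0):
    rng = random.Random(seed)
    est = dict(initial_estimates)
    shown = {s: 1 for s in est}   # pseudo-count avoids division by zero
    clicks = {s: 0 for s in est}
    for _ in range(rounds):
        top = max(est, key=est.get)             # the "featured offer"
        clicks[top] += rng.random() < true_ctr[top]
        shown[top] += 1
        est[top] = clicks[top] / shown[top]
    total = sum(shown.values())
    return {s: round(shown[s] / total, 2) for s in est}

true_ctr = {"rival": 0.30, "platform_own": 0.28}

# Counterfactual 1: full reset, no memory of past self-preferencing.
# The rival competes for, and can win, the featured slot.
print(simulate({"rival": 0.5, "platform_own": 0.5}, true_ctr))

# Counterfactual 2: estimates inherited from the self-preferencing period
# (a feedback loop). The rival's deflated estimate means it is never shown
# again, so the platform's own offer keeps the featured slot throughout.
print(simulate({"rival": 0.1, "platform_own": 0.6}, true_ctr))
```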

Finally, algorithms are dynamic. Take the Amazon Buy Box case. It is difficult to tell whether, in a world without the infringement, Amazon would have continued to use the same algorithm. It is more likely that the algorithm would have changed into something altogether different. If, for example, Amazon had shifted to a learning-based recommender system, it might exhibit popularity bias[11] without any malintent on Amazon’s part. Both Fletcher et al. (2023)[12] and Castellini et al. (2024) showed that biases in a recommender system can make certain products disproportionately popular (and ranked higher) whilst leaving others disproportionately underrepresented in rankings, even where the algorithm is user-centric (i.e. where there is no exclusionary or exploitative practice by the platform). As such, it is entirely possible that a claimant would have a similarly small market share (and be similarly underrepresented in rankings) in a counterfactual world where the abuse did not happen as in the factual world.
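A final sketch shows this mechanism in miniature. It is a bare-bones Pólya-urn-style feedback loop, not the Fletcher et al. or Castellini et al. models, and all parameters are hypothetical: a user-centric, abuse-free ranker that recommends in proportion to past sales leaves identical sellers with wildly unequal shares, driven purely by early random noise.

```python
# Popularity bias in miniature: identical sellers, a recommender that
# favours whatever has sold well so far, and no abuse anywhere. The
# eventual "winner" is decided by early random noise.

import random

def popularity_ranker(n_sellers=5, rounds=10_000, seed=1):
    rng = random.Random(seed)
    sales = [1] * n_sellers   # identical sellers, identical starting point
    for _ in range(rounds):
        # Recommend with probability proportional to accumulated sales,
        # and assume the recommendation converts into a sale.
        pick = rng.choices(range(n_sellers), weights=sales)[0]
        sales[pick] += 1
    total = sum(sales)
    return [round(s / total, 2) for s in sales]

for seed in (1, 2, 3):
    print(popularity_ranker(seed=seed))
# Each run produces a different, typically very unequal, distribution of
# shares: a small factual market share is consistent with a counterfactual
# in which no abuse ever took place.
```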

To conclude, finding the counterfactual in exclusionary abuse cases is always difficult because of the large number of conceivable but-for worlds that could have existed had there been no exclusionary practice. Where the abuse is linked to an algorithm, there is the possibility of analysing some (or all) of the possible counterfactuals in more detail through simulations. But choosing which of these many but-for worlds would have prevailed without the infringement still comes down to a solid understanding of the economics of the case, and to finding the most fitting theory of harm.

Edited by Sebastian Peyer


[1]             Hunter & Hammond (1568 & 1595) – Judgment (Carriage) 5 Feb 2024

[2]             Ibid., paras 19-22.

[3]             The Amazon Buy Box is a prominent feature on Amazon’s user interface. It is the box on the right-hand side of the page where shoppers can add items to their cart or proceed with their purchase directly. The Buy Box showcases price and shipping details of sellers of the given product. The Buy Box is significant for sellers on Amazon because winning it typically leads to increased sales.

[4]             https://competition-cases.ec.europa.eu/cases/AT.40703

[5]             https://www.gov.uk/cma-cases/investigation-into-amazons-marketplace

[6]             Case AT.39740, Google Search (Shopping)

[7]             A rule-based algorithm operates based on a pre-defined set of rules and conditions, typically crafted by the operators of the algorithm.

[8]             Unlike rule-based algorithms, learning-based algorithms learn patterns and relationships from data. They adjust their parameters or model structures based on the input data, allowing them to generalise to new, unseen examples. Learning-based algorithms include techniques such as supervised learning, unsupervised learning, and reinforcement learning.

[9]             Castellini, J., Fletcher, A., Ormosi, P., & Savani, R. (2024). Supplier competition on subscription-based platforms in the presence of recommender systems, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4428125

[10]            Details of how this simulation framework works are given in Castellini et al. (2024), which also shares the source code to run simulations on recommender systems, allowing anyone to carry out simple (or more complex) exercises such as the one highlighted here.

[11]            Popularity bias refers to the tendency for automated recommendations, rankings, or decisions to disproportionately favour items that are already popular or widely known, leading to a reinforcing cycle where popular items receive even more attention and become even more popular. Popularity bias in recommender systems has been widely documented in the computer science literature. For a review, see Fletcher et al. (2023), below.

[12]            Fletcher, A., Ormosi, P. L., & Savani, R. (2023). Recommender systems and supplier competition on platforms. Journal of Competition Law & Economics, 19(3), 397-426.
