Chasing the algorithmic magic: an introduction to the Critical Data Studies Group at the University of Leeds
The Facebook Trends feed is a list of topics and hashtags that have recently spiked in popularity on Facebook. This list is personalised based on a number of factors, including Pages you've liked, your location and what's trending across Facebook.
In May 2016 the service was at the centre of a major controversy. Allegations were made that the human curators involved in selecting and moderating political news items had adopted a liberal bias, favouring left-leaning topics at the expense of conservative ones. The Trends system has been discussed extensively since the accusations, and while some of the claims of systematic political bias may have been exaggerated, it is clear that a significant degree of obfuscation was going on at Facebook. Far from being a neutral and fully computational content aggregator, the Trending feed was, and still is, the result of multiple interactions between humans and machines. One such interaction is what Tarleton Gillespie calls ‘clickworking’. Clickworking is the ‘human computation’ that happens after the Facebook algorithm has identified spikes of activity or other patterns in a data stream. The human curators then go through this content using a proceduralised, repetitive set of steps in order to identify and categorise the trending items. While the curators are asked to make the sorts of decisions that only humans can make, i.e. decisions based on meaning, they are forced to do so by following a set of standardised instructions in order to maximise efficiency and to reduce variations caused by differences in judgement. The curators are fully assimilated into the computational process, acting like human interpreter programs that execute instructions in a scripting language.
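To make the ‘human interpreter program’ image more concrete, here is a purely illustrative sketch of a two-stage pipeline of this kind: an algorithm flags spikes in topic activity, and each flagged item is then handed to a curator along with a fixed checklist of standardised steps. The names, thresholds and checklist items are assumptions made for the sake of the example and do not describe Facebook’s actual system.

```python
# A purely hypothetical sketch of the two-stage pipeline described above:
# an algorithm flags spikes of activity, then each flagged topic is handed
# to a human curator together with a fixed checklist of standardised steps.
# Thresholds, names and steps are invented for illustration only.
from collections import Counter

SPIKE_RATIO = 3.0  # assumed threshold: current activity vs. historical baseline

def detect_spikes(current: Counter, baseline: Counter) -> list:
    """Return topics whose current activity exceeds their baseline by SPIKE_RATIO."""
    return [topic for topic, count in current.items()
            if count / max(baseline.get(topic, 0), 1) >= SPIKE_RATIO]

CURATOR_CHECKLIST = [
    "confirm the topic refers to a real-world event",
    "write a short, neutral headline",
    "assign a category (politics, sport, entertainment, ...)",
    "attach a representative source article",
]

def route_to_curators(spiking_topics: list) -> list:
    """Package each flagged topic with the standardised steps a curator must follow."""
    return [{"topic": t, "steps": CURATOR_CHECKLIST} for t in spiking_topics]

if __name__ == "__main__":
    current = Counter({"#election": 900, "#kittens": 120})
    baseline = Counter({"#election": 100, "#kittens": 100})
    for task in route_to_curators(detect_spikes(current, baseline)):
        print(task["topic"], "->", task["steps"])
```

The point of the sketch is that the ‘human’ step sits inside the pipeline as just another procedure: the curator receives a list of instructions and executes them, much as an interpreter executes a script.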
***
Class Dojo is a school/classroom management app – currently one of the most successful and widely adopted educational technologies in the US and the UK. Last year, the NYT reported that the app was being used by at least one teacher in roughly one out of three schools in the US. The system is, in essence, a gamified, point-based tracking tool that allows educators to score and compare students’ behaviour and share information with parents. The tool streamlines and automates classroom-level behaviour management by integrating, among other things, database and analytics functionalities that greatly enhance tracking and comparison. My friend and colleague Ben Williamson describes Class Dojo as a sociotechnical system in which human and non-human elements interact and intersect, and where computation meets human politics, psychological expertise and traditional school governance – all of them cleverly packaged and marketed as a ground-breaking educational innovation.
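As a rough illustration of what such a point-based tracking tool does computationally, here is a minimal sketch of a behaviour tracker that records scored events and ranks students. The class and field names are hypothetical and are not drawn from Class Dojo’s actual data model.

```python
# A minimal, hypothetical sketch of a point-based behaviour tracker of the kind
# described above. Class and field names are invented for illustration and are
# not drawn from Class Dojo's actual data model.
from collections import defaultdict

class BehaviourTracker:
    def __init__(self):
        # student -> list of (behaviour, points) events
        self.events = defaultdict(list)

    def record(self, student, behaviour, points):
        """Log a positive or negative behaviour event for a student."""
        self.events[student].append((behaviour, points))

    def score(self, student):
        """Aggregate a student's running point total."""
        return sum(points for _, points in self.events[student])

    def leaderboard(self):
        """Rank students by total score - the 'comparing' the app is built around."""
        return sorted(((s, self.score(s)) for s in self.events),
                      key=lambda pair: pair[1], reverse=True)

tracker = BehaviourTracker()
tracker.record("Alice", "helping others", +2)
tracker.record("Bob", "talking out of turn", -1)
print(tracker.leaderboard())  # [('Alice', 2), ('Bob', -1)]
```

Even in this toy form, the scoring and ranking logic makes visible how a simple database of events can turn classroom behaviour into a comparable, reportable metric.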
***
No Man’s Sky is a video game in which an entire universe of 18 quintillion planets is procedurally generated. Playing as a lone intergalactic traveller equipped with only a scrappy spaceship, limited life-supporting resources and a vague set of goals, you fly from one planet to another collecting materials to protect yourself from environmental hazards and to upgrade your vessel, cataloguing the flora and fauna you encounter, fighting or escaping from space pirates, and engaging in simple trade interactions with various alien species. The game’s main selling point is that, by recombining a finite number of variables and conditions established by the developers, it manages to create a staggeringly large number of variations for the player to discover at his or her own pace. After considerable hype in the run-up to its release, the game suffered a significant backlash from players and critics alike. A classic case of overpromising and underdelivering, No Man’s Sky has been criticised for feeding unrealistic expectations and for overselling its procedural generation, which turned out to be far less impressive than originally advertised. While it is certainly true that the game can generate an incredibly large number of life-size planets – all of them reachable and fully explorable – these come across as fairly minor iterations on a small number of ‘types’, offering very few changes in terms of interactive opportunities and game play. In fact, game play in No Man’s Sky appears to be governed by another, less fanciful set of procedural rules, whereby player progression is tied to an arbitrary system of trivial milestones (e.g. the number of steps taken while walking on a planet) that rewards repetitive and mundane behaviours.
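To illustrate what ‘recombining a finite number of variables and conditions’ means in practice, here is a toy sketch of seeded procedural generation: a handful of fixed parameter lists is combined deterministically from a single seed, yielding a combinatorially huge – but type-bound – space of planets. The parameters are invented for the example and bear no relation to the game’s actual implementation.

```python
# A toy sketch of seeded procedural generation: a small, fixed set of 'types'
# and parameter ranges is recombined deterministically from a single integer
# seed, producing a combinatorially huge space of planets without storing any
# of them. The parameters are invented and unrelated to the game's real code.
import random

BIOMES = ["lush", "barren", "toxic", "frozen", "volcanic"]
HAZARDS = ["none", "radiation", "extreme heat", "extreme cold"]
FAUNA = ["absent", "sparse", "abundant"]

def generate_planet(seed):
    """Deterministically derive a planet's traits from its seed."""
    rng = random.Random(seed)  # same seed -> same planet, every time
    return {
        "seed": seed,
        "biome": rng.choice(BIOMES),
        "hazard": rng.choice(HAZARDS),
        "fauna": rng.choice(FAUNA),
        "radius_km": rng.randint(1_000, 10_000),
    }

# Every planet is a recombination of the same handful of variables, which is
# why even vast numbers of them can feel like minor iterations on a few types.
for seed in range(3):
    print(generate_planet(seed))
```

Deterministic seeding is what makes the scale possible – nothing needs to be stored, because every planet can be regenerated from its seed on demand – but it also explains the sense of repetition: the variety is only as rich as the underlying lists of variables.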
***
Although radically different in many respects, the three examples above share one fundamental feature: they are all ‘algorithmic systems’ in which computational, social, economic and human aspects are more or less entangled and interdependent. A growing number of academics and researchers are interested in the critical analysis of these systems, and our Critical Data Studies (CDS) group at the University of Leeds is part of this trend. In this introductory post, I would like to explore a few basic assumptions and lay some groundwork, discussing the empirical and theoretical concerns that brought us together.
To begin with, we borrowed the label ‘critical data studies’ from others who have already done a great deal to situate this emerging area of inquiry at the intersection of critical theory, the humanities and computer science. Rob Kitchin and Tracey Lauriault, for instance, talk about ‘data assemblages’ as (often messy) entanglements of technical, informational and economic factors, and invite us to question the view of these phenomena as benign, objective and non-ideological. Focusing on the epistemological and political implications of data, Kitchin, Lauriault and others ask important questions about power, governance, surveillance and the erosion of privacy. The Critical Data Studies group at Leeds was born because individuals from different disciplinary backgrounds found themselves drawn to such questions for different reasons: methodological curiosity, theoretical interest and, in many cases, specific research challenges. One of these challenges is, for instance, the problem of automation in data assemblages. In fact, the emphasis on automation may lead us to prefer the term ‘algorithmic assemblage’ or indeed ‘algorithmic system’: a system purported to manipulate and process information with reduced or no human intervention. The semantic shift from data (as in ‘data assemblage’) to the relationships between human and non-human agency within and without data (as in ‘algorithmic system’) may even help us identify new lines of inquiry.
Automation
The problem of how to approach and investigate automation lies at the very heart of the relationship between humans and technology, and it is not unique to ‘critical data studies’. Marx famously saw automation as an inevitable development of capitalism – the process by which living labour becomes abstracted and assimilated within the technological process, displacing workers rather than enhancing their productivity. Marx was very critical of the utopian view of automation as an example of technology ‘leaping to the aid of the individual worker’. For him, automation does not enter the production process to enhance individual productivity or to support human labour, but
to reduce massively available labour power to its necessary measure. Machinery enters only where labour capacity is on hand in masses.
For Marx automation is driven by the need for efficiency, that is, by the capitalist’s need to reduce potential losses caused by large amounts of human variation across an abundant, and disposable, multitude of workers. Interestingly, Marx saw this process of displacement as a precursor of the most advanced stage of socialism, as it gives workers more free time to organise collective action. This position is somewhat echoed in recent accounts of the ‘post-human’ opportunities offered by pervasive digitisation. Donna Haraway, for instance, sees clear links between digital automation, the feminisation of work and the rise of the so-called homework economy, with its high levels of underemployment and the proliferation of low-skill/low-wage jobs in which humans simply monitor or assist automated information workflows (the clickworking described by Gillespie). These new forms of entanglement brought about by digital automation are problematic, but according to Haraway they also set the conditions for new cross-gender and cross-race alliances, inherently hybrid and ‘cybernetic’, as growing multitudes of digitally connected men and women contend with similar situations.
Much more could be said about automation and mechanisation as themes of philosophical and theoretical interest, but that goes beyond the scope of this post. I would like to make an exception for Heidegger’s analysis of technology, which I personally find very helpful for gaining some philosophical perspective. In his attempt to describe the ‘essence’ of technology, Heidegger developed the notion of enframing: the process by which the world is reconfigured according to the principles of mechanisation and exploitation. He calls it a ‘technological way of ordering the world’ whose purpose is to turn nature into a regulated series of instrumental opportunities, that is, to assign functions to objects and living things solely on the basis of their actual or potential usefulness as means to an end (the world becomes ‘standing reserve’). Enframing for Heidegger is ‘not merely human doing’, because it never happens ‘exclusively in man, or decisively through man’. Rather, the imposition of enframing is always the work of a partly automatic device, an apparatus (an ensemble of human and technology) that ends up encompassing its own creator.
The algorithmic system as a unit of analysis
In the last part of this post, I would like to discuss a sort of ‘heuristic strategy’ that can help us in the challenge of subjecting algorithmic systems to (critical) empirical scrutiny. Having established that these systems should be viewed as ‘assemblages’ made of technical, political, economic and symbolic components, an important part of the analyst’s task is to engage in a detailed investigation of the specific subcomponents of the assemblage, asking questions that broadly relate to two aspects:
- the juncture points between human and non-human agency;
- a ‘discursive’ aspect as defined by a repertoire of rhetorical positions: disclosure, obfuscation, objectivism, utopianism, and so forth.
This means we can begin to study an algorithmic system through a range of focused, methodologically tractable research questions. For instance: which subcomponents of the system are claimed to operate without human intervention? How do these subcomponents actually (technically) manipulate information? Most importantly, what happens at the juncture points where these subcomponents come into contact with human agency? One such juncture point is, for instance, at the design level, where programmers set the conditions and rules of procedural automation. Alternatively, it could be at the very heart of the computational process, as in the human computation scenarios of Facebook’s clickworking and Amazon’s Mechanical Turk. It could be at the culmination of a traditional production and marketing trajectory, where end-users engage with a system, or one of its subcomponents, as consumers. It could also be somewhere above the system, where human factors shape the governance criteria that regulate the emerging algorithmic landscape, or where investment strategies and decisions determine the economic and material conditions for the system’s very existence, despite being ostensibly removed from its operation or use. As we engage in the detailed mapping of these juncture points, an entire geography begins to emerge.
Concluding…
At least since Alan Turing, the digital computer has been broadly described as an automatic machine ‘intended to carry out any operations which could be done by a human computer’ – albeit a theoretical human equipped with a vast allowance of information storage, individual ‘executive units’ that carry out the ‘various individual operations involved in a calculation’, and a control mechanism that ensures instructions are obeyed, constructed in such a way as to be a necessary component for the correct operation of the overall system. Symbolically, this ‘automatic treatment’ of information has the effect of creating a black box whose functioning is difficult to understand or deliberately obscured, thus leading to a mistaken impression of objectivity and neutral detachment. A familiar, quasi-religious prejudice of mechanistic transcendence (machines transcending human limitations and biases) takes hold of perceptions and judgements whenever automatisms are at play.
In such a scenario, the task of the critical data researcher is always twofold:
- to open up the ‘black box’ of automation and describe – in as much detail as possible – the messy entanglements between humans and machines. This effort can be purely descriptive, but it can also involve the technical development of ‘open bonnet’ systems in which the entanglements are not only described but also acted out and exposed, or in which alternative, less dehumanising or exploitative forms of human-machine interaction can occur.
- to analyse the discursive positions deployed to legitimate, promote and validate the algorithmic system and its subcomponents. Using established discourse-analytic methods, a researcher could examine how claims to objectivity and neutrality are rhetorically constructed, or document and interrogate the emergence of new discursive positions as the debate around big data and automation shifts over the years.
These two research foci should ideally be present at the same time, but the task is probably too demanding for a single analyst. As we set out to study data assemblages and algorithmic systems, the need for collaborative inquiry becomes immediately apparent, as do the benefits that access to computational expertise can grant. These concerns (and the related methodological anxieties) led to the constitution of our group, and I am certain they will keep shaping our discussions as we move forward, trying to establish productive links with other groups and individuals active in this important area of interdisciplinary research.