This is my talk at Unboxing: Algorithms, Data and Democracy. It starts in German, but the talk itself is in English.
If you prefer to read, here is the manuscript of the talk.
Algorithms we need
Initially, I wrote this talk in German, but decided at the last minute to give it in English. However, I hate translating my own texts. So the English you hear now is 85% machine translation and 15% corrections by me. Perhaps you can tell which is which. The accent is 100% me. Or should I say, Canadian English filtered through Swiss German? It's hard to draw boundaries these days.
Anyway, let me start with three assumptions. First, we need algorithms as part of an infrastructure that allows social complexity and dynamics to meet our real challenges. Second, many of these algorithms are made poorly. I think, in particular, of those that shape day-to-day social practices, algorithms that do what sociologists call "social sorting" (David Lyon) or "automatic discrimination" (Oscar H. Gandy). However, and this will be my third point, these issues of poor design are only part of the problem, because there is no autonomous technology, even if it is called "intelligent" or "self-learning".
We need algorithms
When I talk about algorithms, I do not mean isolated computer code, but socio-technical systems and institutional processes that automate parts of decision-making.
That we are talking about this today has three reasons.

First, the volume and quality of the data input has increased enormously in recent years and will continue to increase in the coming years. More and more activities and states, online as well as offline, leave behind detailed data streams which are collected and evaluated. The distinctions between personal and anonymous data, or between data and metadata, which are so important to legislators, have become obsolete. Comprehensively anonymized data is relatively easy to de-anonymize. And metadata is often more meaningful than the content it describes, because it can be evaluated in a standardized way.

Second, the complexity of the algorithms used has grown tremendously, thanks to scientific advances made possible by extensive academic, military and private research, and to the vastly increased computing power available today in data centers. Abilities that until recently were understood as genuinely human, say the meaningful interpretation of images or texts, can now be performed by machines. More and more areas of cognition and creativity are being mechanized. The boundaries between human and machine skills and actions are clearly shifting, and nobody knows today where they will come to lie in the future.

Third, more and more social activity takes place within mediated environments, where algorithms can act particularly well and unobtrusively because there is no material difference between the "world" and the "data input" or "data output" of the algorithm. Online, everything is code, everything is data, everything is generated. Human and machine agency are materially indistinguishable.
So, whether we need algorithms as components of social processes is a moot question, because they are already here, and no criticism and no legislation will get rid of them. Nor, as a general goal, would that be desirable. We need augmented individual and social cognition through new technical procedures. They enable us to move in extremely data-intensive environments without being blinded by data. The price of an open publishing environment like the web is the dependence on search engines. Moreover, algorithms are necessary to create more complex, real-time knowledge about the world than we have at our disposal today, and to be able to act at the level of the tasks we are facing collectively and individually. Efficient energy supply with decentralized energy generation, for example, can only be achieved via intelligent, semi-autonomous grids.
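To make that last point a little more concrete: the coordination problem behind such a grid can be stated in a few lines. The following toy sketch, with producers, loads and numbers entirely of my own invention, shows the kind of real-time matching a semi-autonomous grid has to perform continuously.

```python
# A toy sketch of why decentralized generation needs semi-autonomous
# coordination: many small producers and flexible consumers have to be
# matched in real time. All names and numbers are illustrative.

def balance(supply_kw: dict[str, float], flexible_kw: dict[str, float],
            base_load_kw: float) -> dict[str, float]:
    """Switch flexible loads on, largest first, while surplus generation remains."""
    surplus = sum(supply_kw.values()) - base_load_kw
    schedule = {}
    for name, demand in sorted(flexible_kw.items(), key=lambda kv: -kv[1]):
        schedule[name] = demand if demand <= surplus else 0.0
        surplus -= schedule[name]
    return schedule

supply = {"solar_roof_17": 4.2, "wind_coop": 11.0, "chp_plant": 6.5}
flexible = {"ev_charger": 7.0, "heat_pump": 3.0, "boiler": 2.0}
print(balance(supply, flexible, base_load_kw=13.0))
# {'ev_charger': 7.0, 'heat_pump': 0.0, 'boiler': 0.0}
# Only the largest flexible load fits into today's surplus.
```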
In other words, a progressive politics that does not want to withdraw into the illusory world of reactionary simplification needs new methods of seeing and acting in the world. And algorithms will be part of these new methods. Otherwise, the ever-increasing complexity of an integrating world based on finite resources cannot be mastered.
Algorithms are often poorly made
However, many algorithms are poorly constructed. At the beginning of November 2016, for example, Admiral Insurance, the third-largest car insurer in Great Britain, announced that it would evaluate social media profiles in order to determine the insurance premiums for first-time drivers. The aim was "to find character traits that are related to safe driving." Accuracy and punctuality were assessed positively, determined by whether a user arranges to meet friends at a precise time or simply "tonight". Excessive self-confidence was seen as negative, deduced if someone often used terms such as "always" or "never", or too many exclamation marks, rather than cautious terms like "perhaps". The program was initially to be voluntary and to allow discounts of up to £350 per year. The surprising thing about this program was its public and relatively transparent announcement. The public reaction was overwhelmingly negative, and it took less than 24 hours for Facebook to stress that this use of user data would not be allowed. The program was swiftly withdrawn. Such data use, however, is by no means uncommon today; it is usually just done in the background.
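To see how little may be behind such a "character analysis", here is a hypothetical reconstruction of the kind of heuristic the press reports described. The word lists, the time-of-day pattern, the weights and the mapping to a discount are all my illustrative guesses, not Admiral's actual model.

```python
import re

# A hypothetical reconstruction of a flimsy "character traits" heuristic:
# score social media posts with simple word lists and surface features.

CAUTIOUS = {"perhaps", "maybe", "probably"}
OVERCONFIDENT = {"always", "never", "definitely"}

def score_post(text: str) -> float:
    words = re.findall(r"[a-z']+", text.lower())
    score = 0.0
    score += sum(w in CAUTIOUS for w in words)        # hedged language: positive
    score -= sum(w in OVERCONFIDENT for w in words)   # absolutes: negative
    if re.search(r"\b\d{1,2}[:.]\d{2}\b", text):      # "meet at 19:30", not "tonight"
        score += 1.0
    score -= 0.5 * max(0, text.count("!") - 1)        # too many exclamation marks
    return score

def discount_gbp(posts, cap=350.0):
    """Map the aggregate 'character' score to a premium discount (illustrative)."""
    total = sum(score_post(p) for p in posts)
    return max(0.0, min(cap, total * 25.0))

posts = ["Let's meet at 19:30, perhaps we grab dinner first.",
         "I ALWAYS drive fast!!! Never had a problem!"]
print(discount_gbp(posts))  # 0.0: the 'overconfident' post cancels the discount
```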
What makes the case interesting is that it shows relatively openly how flimsily many of these automatic evaluation and action systems are actually built in practice. Much of the current algorithm critique, if not put forward in the German feuilleton in a fundamentalist way, concentrates on these, one could say, problems of implementation. Cathy O'Neil, a mathematician and prominent critic, identifies four basic pitfalls.
Excessive trust in numbers
With Big Data and the associated algorithms, we are witnessing an intensified return of mathematics and the natural sciences to the organization of the social. This implies a focus on numbers, which are considered to be objective, unambiguous and free of interpretation. 1 is always smaller than 2. Who could argue with that? But this means that all the problems that have always been associated with confidence in numbers are also returning: the blindness to the processes that generate the numbers in the first place; the assumption that numbers speak for themselves, even though not only the selection and the methods of data gathering are already interpretative acts, but each individual mathematical operation adds further interpretation. This obliviousness to the conditionality of the numbers leads to a strong tendency to regard the model within which the numbers gain their meaning no longer as a model, but as an adequate representation of the world, or as the world itself. Over time, the application of the model tends to expand to an ever wider set of cases. The failure of the risk models that contributed significantly to the financial crisis of 2008 revealed all of these problems.
Bad proxies
The problem is exacerbated by the fact that social processes cannot be expressed in numbers in a simple way. How should something as complex as a child's learning progress be captured in a single figure? Far too many things inside and outside the school play a role and would have to be considered. In practice, the problem is circumvented by simply using a different number, which is taken to be representative of this whole complex area. This is called a proxy. In schools, these are the results of standardized tests, such as those of the PISA (Programme for International Student Assessment) studies carried out by the OECD since 1997. Ever since, there has been debate about whether testing success can be equated with learning success. Personally, I don't think so, but at least it is a debatable question. Worse, such numbers are often broken down to levels where they are no longer meaningful even within the logic of the statistics, say to individual schools with a few hundred pupils, or even to classes with fewer than 30 pupils. At that scale, the results become close to random. Yet they are still used to assess, for example, the "performance" of individual teachers.
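A quick simulation shows why such fine-grained breakdowns are close to random. In the sketch below, every pupil has exactly the same "true" ability and only the test noise varies; the numbers are invented, but the effect is generic: the smaller the unit, the wilder the "performance" reading swings.

```python
import random
import statistics

# Proxy problem in miniature: identical 'true' ability everywhere,
# only measurement noise differs. Scale parameters are illustrative.

random.seed(1)

def average_score(n_pupils: int) -> float:
    # each pupil's score = same underlying ability (500) plus test noise
    return statistics.mean(random.gauss(500, 100) for _ in range(n_pupils))

for n in (100_000, 400, 25):   # country, school, single class
    runs = [average_score(n) for _ in range(5)]
    spread = max(runs) - min(runs)
    print(f"n={n:>6}: five 'performance' readings spread over {spread:5.1f} points")
# At n=25 the spread dwarfs any plausible real difference between teachers.
```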
The more complex the social situation that is to be algorithmically monitored and evaluated, the more the models rely on such proxies, simply because otherwise the models become too complex and the data collection too expensive. As a consequence, the world and the people being evaluated move further and further out of view, and instead their proximity to the model's predetermined values is checked.
Even though the results obtained often have little meaning, they are nevertheless used for assessments in all kinds of situations, influencing life chances. For the people concerned, this means being at the mercy of an arbitrary system which, by virtue of real or pretended complexity and pseudoscience, withdraws itself from criticism. The fact that algorithms are generally regarded as business secrets limits the possibilities to challenge their verdicts even further.
In order to counter this situation, the rights of workers and consumers must be strengthened, up to and including a right to compensation if there has been a breach of the duty of care in the algorithmic evaluation or decision-making affecting them.
People adapt their behavior
The knowledge that quantitative assessments are in use means that people adapt their behavior to the expectations of the model and concentrate on delivering the right numbers, even if these are not related to the actions they are supposed to represent. Anyone who has ever filled out his or her boss's Excel table so that it produced the desired value at the end, even if this did not correspond to what actually happened, knows this situation.
In order to manipulate search algorithms, a whole industry, search engine optimization, has emerged, which specializes in understanding the internal logic of an algorithm and adapting the websites of its customers to that logic, regardless of whether this substantively improves the human-readable content or not. In science, where careers increasingly depend on publication indices, self-plagiarism and citation cartels are growing, which has a negative impact on the quality of science but a positive one on scientists' positions in the rankings.
As a consequence, algorithm-based actions are removed even further from the social reality in which they intervene, thereby making the world which they are supposed to organize more chaotic.
Lack of transparency and correctability
One of the most common responses to people adapting their behavior to the logic of quantitative assessment is to keep as much as possible about it secret, so that people are left unclear as to whether and how they are being judged. This is intended to increase the honesty of behavior; that is, people should behave as if they were not being classified numerically, so that the classification remains valid. In this way, however, the power relation between the institutional capacity for action, which is being expanded by the algorithms, and the individuals affected by it becomes even more unequal.
It is very important to deal with these applied problems. Appeals to the self-regulation of industry will not suffice here. We need to adapt the laws, particularly those that protect against discrimination. It must be possible for workers, employees, consumers and citizens to determine whether they have been discriminated against automatically and, should this be the case, to demand compensation. Put simply, we need to increase the costs of automated discrimination; otherwise they will fall on the victims alone.
There is no autonomous technology
But that alone is not enough. It is not enough to improve the craftsmanship of the tools, since tools are never neutral but reflect the values of their developers, their bosses or their research sponsors. There is no autonomous technology. What is understood in the technical disciplines as "self-learning" is extremely narrow: finding the best way from point A to point B by trial and error, after A and B, as well as the criteria for evaluating the best solution, have already been precisely defined. The transfer of this concept into political discourse is grossly misleading, because there it suggests a much more comprehensive autonomy. And that autonomy does not exist. This misleading use of a technical concept is partly the result of technological exuberance or incompetence, which often go together, and partly a strategic maneuver to shield the actual political program from criticism: the setting of points A and B and the definition of what constitutes a solution.
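To make the narrowness concrete, here is a minimal sketch, in toy code of my own invention, of what "self-learning" means in this technical sense. The system searches, but it never questions its start, its goal or its evaluation criterion.

```python
import random

# "Self-learning" in the narrow technical sense: trial-and-error search
# for the best path between a start and a goal that were fixed in
# advance, judged by a cost function that was also fixed in advance.

GRID = 10                      # a toy 1-D world: positions 0..9
START, GOAL = 0, 9             # "point A" and "point B": set by the designer

def cost(path):                # the evaluation criterion: set by the designer
    return len(path)           # here: shorter is better

def random_path():
    """Wander left/right from START until GOAL is reached."""
    pos, path = START, [START]
    while pos != GOAL:
        pos = max(0, min(GRID - 1, pos + random.choice([-1, 1])))
        path.append(pos)
    return path

best = random_path()
for _ in range(1000):          # "learning" = keeping the best trial so far
    trial = random_path()
    if cost(trial) < cost(best):
        best = trial

print(f"best path found: {best} (cost {cost(best)})")
# The system never questions START, GOAL or cost(). That framing is
# exactly the political program the talk is pointing at.
```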
How does that work in practice? If you ask Siri, the smart assistant on your iPhone, "Siri, how can I jailbreak my iPhone?", because you want to remove the restrictions Apple has built into your device, you get answers like "I don't think that's a good idea." or "I can't recommend that." You can interpret this as friendly advice or as a veiled threat. In any case, no matter how smart Siri may be, no matter how well it gets to know you, Siri will always be first and foremost Apple's assistant, and only secondly your own. Because it was Apple who set the points A and B between which Siri can move.
If we now demand that algorithms be made better in this applied sense, we only demand that the program built into them should work better. But this program is not just harmless efficiency: the definition of the problems and of the possible solutions almost always corresponds to a neo-liberal world view. By this I mean three things. First, society is individualized. Everything is attributed to individual, specifiable persons, separated from one another by their differences, and action means first and foremost individual action. Second, the individuals so identified are placed in a competitive relationship with one another, through all sorts of rankings on which one's own position relative to that of others can rise or fall. And third, the group or the collective, which expresses the consciousness of its members as related to one another, is replaced by the aggregate, which is essentially formed without the knowledge of the actors: either because it is supposed to emerge spontaneously, as Friedrich von Hayek thought, or because it is constructed behind the backs of the people in the closed depths of the data centers, visible to a few actors only.
If we want to think about what it would mean to support a non-neo-liberal approach, then we need to start less from the technology and more from its underlying assumptions and programmatics. If an essential element of the neo-liberal program, reinforced by algorithmic systems, is, as I said, to reduce everything to individual action, then an alternative programmatic should be geared towards enabling new fields of collective action.
Technically, the preconditions are excellent. It has never been so easy to see collective patterns in the behavior of large numbers of people. We have, for the first time ever, the data that allows us to capture the composition of society in real-time. From a technical point of view, it would not be a problem to turn this data into information that helps people to become aware that they are already part of collective dynamics. This is already happening.
For example, Google has a feature called "popular times" to indicate when a business or public place is particularly crowded. A few weeks ago, it updated this feature to show this information based on real-time data. But, of course, this knowledge is generated in order to avoid the crowd, and thus to evade the negatively connoted presence of other people. In this sense, it is a classic neo-liberal program that regards unplanned social contact as an obstacle to individual efficiency.
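Mechanically, such a feature is not mysterious. A plausible sketch, my guess at the general mechanism rather than Google's actual pipeline, aggregates anonymized visit counts per hour into a typical profile and compares the live count against it.

```python
from collections import defaultdict

# Hypothetical "popular times" mechanism: build a typical per-hour
# profile from past anonymized visit counts, then flag unusual crowding.
# Margin and sample data are illustrative assumptions.

history = defaultdict(list)   # hour of day -> past visit counts

def record(hour: int, visitors: int) -> None:
    history[hour].append(visitors)

def typical(hour: int) -> float:
    counts = history[hour]
    return sum(counts) / len(counts) if counts else 0.0

def busier_than_usual(hour: int, live_count: int) -> bool:
    return live_count > 1.2 * typical(hour)   # 20% margin, illustrative

for week in range(4):                 # four weeks of Friday-18h observations
    record(18, 60 + 5 * week)
print(typical(18))                    # 67.5 visitors on a typical Friday evening
print(busier_than_usual(18, 90))      # True: right now it is unusually crowded
```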
But the fact that this form of social knowledge is accessible at all makes many new things conceivable. How about, for example, if weather data, traffic data and biomedical self-tracking data were analyzed together, so that traffic could be regulated before fine dust pollution becomes harmful to the people in a particular area, instead of waiting until fixed limits have already been exceeded for a long time?
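Sketched as code, the shift is from a reactive rule ("limit exceeded, therefore act") to a predictive one ("limit will plausibly be exceeded, therefore act now"). Everything below, the threshold, the horizon and the toy forecast model, is an illustrative assumption, not a real system.

```python
from dataclasses import dataclass

# Hypothetical sketch: trigger traffic regulation on a *forecast* of
# particulate pollution instead of on an already-exceeded fixed limit.

LIMIT = 50.0        # e.g. a PM10 limit in µg/m³ (illustrative)
HORIZON_H = 6       # act if the limit is likely to be reached within 6 hours

@dataclass
class AreaReading:
    pm10: float          # current particulate level, µg/m³
    pm10_trend: float    # µg/m³ per hour, estimated from recent sensor data
    wind_speed: float    # m/s; wind disperses particulates

def forecast_pm10(r: AreaReading, hours: float) -> float:
    """Naive linear forecast, damped by wind dispersion."""
    dispersion = 1.0 / (1.0 + 0.2 * r.wind_speed)
    return r.pm10 + r.pm10_trend * hours * dispersion

def should_restrict_traffic(r: AreaReading) -> bool:
    """Regulate traffic *before* the limit is actually hit."""
    return forecast_pm10(r, HORIZON_H) >= LIMIT

reading = AreaReading(pm10=38.0, pm10_trend=3.5, wind_speed=1.0)
print(should_restrict_traffic(reading))  # True: the forecast crosses the limit
```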
As politically complex as building such a system would be, it is important that it be embedded in other collective decision-making processes, such as public consultations or votes, to ensure that the decisions taken at the collective level also take into account the interests of the members of the collective. Otherwise, such an approach opens the door to technocratic authoritarianism and paternalism.
I am convinced that most of us could, in an extended brainstorming session, develop further applications that privilege collective action, and devise methods of democratically legitimating and controlling such algorithms.
But as easy as this is to conceive technically, it is just as difficult to realize politically. Except in the case of combating the global spread of infectious diseases, I am not aware of any algorithmic models that link the awareness of collectivity with options for action that affect the collective as an entity rather than as an aggregate.
The central obstacle to the algorithms we want lies in the neo-liberal program that still dominates all fields of society. But after Brexit and Trump, it no longer looks like a foregone conclusion that this will remain so. This poses new challenges for us: we have to think beyond neo-liberal capitalism without supporting the already strong tendencies towards new forms of authoritarianism. In terms of algorithmic systems, this means that we need to think at the same time about new forms of collective action and about ways of negotiating the goals of this action democratically.