Composite Images. On the Transformation of Visual Truth Claims

This is an older article (2021), but it is perhaps more topical now than when it came out, as we continue to grapple with how to deal with generated images. This is why I am putting it up now. Here’s the nicely laid-out version from the book.

tl;dr: Images, in particular photographic images, have long ceased to serve as an index of reality. We need new methods of establishing visual truth claims that take seriously the artificiality of all images. Here, I examine and expand on an approach that moves away from claiming to represent reality ‘as it is’. Instead, it aims at providing evidence for its factuality in constructing the real.


It was a peculiarity of Western modernity – the cultural constellation that extended, roughly, from the mid-15th to the mid-20th century (McLuhan 1962) – to claim that images not only represent the material world as it appears to a single, distant observer, but also that what can be seen in this way is what counts. In other words, it constituted both a visual and an epistemological regime (Latour 1986).

On the surface, contemporary digital technologies of image-making are extending the visual dimension of this regime by creating ever more ‘realistically’ detailed images (HD, UltraHD, 4K, 8K, Gigapixel, etc.). However, they are also fundamentally undermining it by moving from recording to generative methods in the production of images. The epistemological dimension, too, has come under pressure. Reality is becoming ever more complex and distributed, eroding the ability of the single, distant observer to make sense of it (Stalder 2018a). Second-order cybernetics has argued since the 1970s that the observer is always part of the system he/she/it observes (Von Foerster 2003). From the 1980s onward, feminist science studies have critiqued the position of the distant observer as a “view from nowhere” or “god’s view” and proposed instead the notion of situated knowledge(s): located, embodied, entangled, and partial (Haraway 1988).

While there has been much debate over the last 25 years about the loss of the representational quality of digital images (Batchen 1994), I want to advance a different argument. Digital images, if we understand them as composites rather than indices, offer the possibility of developing a new visual regime in line with the transformed epistemological situation. From this, a new type of realism and new visual truth claims can be created.

Reality – Mirror – Image

The central perspective, first articulated by the architect Filippo Brunelleschi around 1420, was the key element in the modern organization of the visual. Its aim was to represent the visible world by accurately transferring three-dimensional objects onto a two-dimensional surface. The claim to veracity and realism was established in a famous experiment, which the art historian Samuel Edgerton (2009, 46) describes as follows:

After painting the picture [of the San Giovanni Baptistery in Florence] Brunelleschi drilled a small hole through the panel and then had his viewer look through this hole from the backside of the panel. In his other hand, the viewer was to hold a mirror in front of the picture, reflecting the image as seen through this hole.

Figure 1: Brunelleschi’s arrangement allows a viewer to compare a reflected drawing with the thing itself by removing the mirror.

Holding the mirror in one hand, the viewer could not only see the mirror image of the painting but, by moving the mirror away, could also see the Baptistery itself and thus establish a direct correspondence between the external reality that the immobilized eye could see and the representation that the painting showed. The mirror, one could say, was the first in a long line of technical apparatuses to produce images that would represent the world as it could be seen ‘naturally’ by the painter and, later, by the photographer or cameraman, so that others who were absent in time and/or space could see it as well. This direct correspondence between the image and external reality, between the sign and the signified, is called its “indexicality” (Gunning 2004).

The representational character of the images was created and stabilized along two vectors. The first was the technical character of the apparatus that produced the visual image. While its physical set-up was often sophisticated, based on state-of-the-art technologies¹ and thus quite variable over time, the mechanistic premise remained straightforward and consistent across all devices: the apparatus was simply to reflect/record what was in front of it, without any interference with the object and independently of the person setting up the apparatus. It showed the world as an individual person would see it, but this way of seeing was not individualistic in a modern sense; it was, as one would later say, scientific, that is, inter-subjective and objectively rational (i.e. following logical principles). The second vector was a cultural one, which created the image’s epistemological capacities. Images always appeared contextualized within a particular physical and narrative order that implied a certain type of reading. For example, in the modern museum, images are almost always hung at eye level, which, as Daniel Rubinstein (2015) reminds us, “reinforces the rhetorical tropes of perspectival painting inherited from the Renaissance.” Probably even more importantly, they are also placed in a specific and stable narrative order, which frames our expectations and guides our reading of the images. In the modern museum, this context is so stable, and can be taken for granted to such a degree, that it receded into the background, and the idea emerged that the work of art is autonomous, that is, that it can be read independently of its surroundings and contains everything relevant within itself. Outside the rarefied world of auratic originals, too, books, magazines and even commercial catalogs present images as part of a particular cultural narrative that stabilizes their meaning and helps to assess their correspondence to an external reality.

These two elements – the apparatus as a mechanistic image-recording device and the insertion of each image into a specific and decipherable cultural narrative – created the conditions in which the image could assume a representational character. Of course, not all images were representational, and PR and propaganda have long abused this expectation of a correspondence between reality and image, but even these abuses relied on the underlying assumption remaining reasonable in most other cases. However, neither of these elements is still operative in contemporary digital image production.

Politics – Generation – Reality

In the mid-1930s, Walter Benjamin was the first to notice that the ability to mass-reproduce images had a profound effect on their meaning to the viewer. For one, the viewer no longer had to visit the place of the image, often a specialized location such as a museum or church, where it retained some of its cult heritage. Rather, the image would come to the viewer and could be viewed in his or her own surroundings; hence it would not only appear in a different context but also as an object of use, not distinctly different from other objects in that profane surrounding. In the process, as Benjamin wrote, the function of the image is reversed: “instead of being based on ritual, it begins to be based on another practice – politics” (Benjamin 1969). By this he meant that the meaning of the image enters into a process of negotiation, in which its heritage – its original meaning as ascertained by specialists – is no longer the decisive factor. Rather, meaning is based on what is being done with the image in the present. The image opens up, and in that opening different meanings come into conflict with one another; hence it enters the realm of politics.

Second, reproducibility also allows for different and conflicting arrangements of these images. In principle, every image can be brought into direct contact with any other. Continuous narrative arrangements are broken up and replaced by the principle of montage which, as Benjamin put it, created a “shock effect”. This shock effect lay in the unpredictability of the montage and its fragmented aesthetics. For Adorno, similar to Benjamin, “the principle of montage was conceived as an act against a surreptitiously achieved organic unity; it was meant to shock” (Adorno 2002, 157). This positive effect of breaking up outdated conventions and assumptions – about images and about reality – was short-lived, however. As Adorno noted, “once this shock is neutralized, the assemblage once more becomes merely indifferent material; the technique no longer suffices to trigger communication between the aesthetic and the extra-aesthetic, and its interest dwindles to a cultural-historical curiosity” (ibid.).

Under contemporary conditions, the reproducibility of images is, of course, orders of magnitude greater than Benjamin and Adorno could have imagined, and the development they described has reached what is potentially its final point. Images are losing all traces of being rare things that have their more or less fixed place in institutions, books, magazines, or films that exist in a particular location (even one as fleeting as a projection in a cinema). Rather, they reach us as a steady stream of poor images, “cop[ies] in motion” (Steyerl 2009), on our screens, which, thanks to the mobile phone, we take with us everywhere. In other words, images now form a flow that has neither a clear origin nor a clear destination; its very essence is its ability to circulate at high speed (Joselit 2013). That stream, moreover, is endlessly variable; its sequences and composition, at least on the level of signification, are arbitrary (Stalder 2018b). The old classification schemes – stable, coherent, and biased – which still dominate Gutenberg-era institutions such as museums and archives, are no longer capable of organizing this mass of images. They give way to dynamic arrangements that are unstable, without coherence, and differently biased. Anything can be followed by anything else, and the sequence itself does not need to contain a human-readable meaning, very much unlike the montage in a collage or a film. Thus, the process of removing the image from its origin and opening up its meaning has become so extreme that images in and of themselves are losing all their meaning because, depending on a freely variable context, they can mean more or less anything (Rubinstein 2018). Our entire visual experience, it seems, has turned into a gigantic, high-speed, never-ending Kuleshov effect.² Everything is rearranged constantly, producing endless meanings which potentially cancel each other out.

In this sense, both Benjamin, who saw this as an opening for new political engagements, and Adorno, who saw how the culture industry used it for its own controlling purposes, remain relevant. The first view points to the emergence of ever new “communities of practice” (Lave and Wenger 1991), which – by organizing their own selection of materials – produce specialized knowledges and cultures, often in contradiction to mainstream positions. This has led to greater cultural diversity as part of a long historical trend of social liberalization, even if the enemies of this development also profit from the greater freedom it affords (an old problem, which Karl Popper called the “paradox of tolerance”³). But at the same time, as the streams of images and other cultural artifacts became ever larger, the role of algorithms in organizing them has grown ever more pronounced, tipping the balance of meaning-making power decisively in favor of those who control the infrastructure that enables and channels these flows. The problem is that “meaning-making” in this case is not necessarily human-readable, at least not for those swimming in the streams thus generated. Rather, it is a machinic meaning, geared not towards intelligibility but towards metrics, in this case engagement: a stimulus-response mechanism that provides the energy to keep the streams flowing. Meaning in a cultural, narrative sense is not necessary for this strategy. If Adorno and Horkheimer’s culture industry produced false consciousness, contemporary social mass media require no consciousness at all. Excited confusion is all that is necessary. As a consequence, the cultural vectors that stabilized images are no longer operative.

The technical apparatuses of image-making (as distinct from reproduction and circulation) have been transformed at least as profoundly. Even devices that create relatively conventional-looking images, say cameras on mobile phones, are no longer passive recording devices but actively and adaptively generate images, often by combining multiple shots into a single image and processing the underlying data according to some statistically defined criteria of a “good image”. The relation between the recording of an external reality and the computational generation of an image is, in practice, unknown and constantly changing: both in terms of the concrete situation in which the image is being recorded (sometimes more, sometimes fewer computational adjustments are necessary to make it adhere to predefined standards) and in terms of the capacities of the device, which change as the software is updated, with or without the knowledge of its user. But this is only the tip of the iceberg. New post-processing methods are now capable not only of adding or removing static details but of transforming the very structure of the image, including moving images: turning a rainy day into a sunny one, putting the face of one person onto the body of another, and so on. At the same time, computer graphics are constantly advancing, leading to a merger between images that are recorded and images that are generated, to the degree that their combination becomes a question of production expediency rather than ontology. Under such circumstances, images can no longer make reliable representational claims.

Unsurprisingly, the most extreme case of such generated images, called “deep fakes” because they use “deep” learning methods to create non-representational (“fake”) images, has been called a “threat to democracy” because people could no longer trust the images they see on social media (Parkin 2019), reviving, once more, debates about the loss of indexicality of digital images. However, despite the technology to produce deep fakes being widely available and the 2020 US presidential election being fought bitterly over very narrow margins, deep fakes played no role in it. While the reasons are not entirely clear, it has been suggested that the production of deep fakes is still somewhat cumbersome and their results easy to detect and, more importantly, that they are not necessary because there are other, cheaper and more effective means of spreading political disinformation (Simonite 2020). Deep fakes are, however, very damaging in another area. According to estimates from 2019, 96% of all deep fakes are pornographic, usually placing the face of a woman onto the body of an actor in a pornographic scene (Ajder et al. 2019). The problem here is not the untruthfulness of the image. That is usually quite obvious. Yet they are effective, not by revealing an already existing truth but by colonizing the imagination. They are not about the past or the present; they are about the future. Such images cannot be “unseen”, that is, forgotten, even if the viewer knows that they are entirely fabricated. They clearly visualize a personalized misogynist fantasy – the reduction of a particular woman to a sex object – but it is exactly the power of the projection that does the harm, not its confusion with reality. These videos show a vision of how the world should be (in the eyes of those who produce them) that gains its power from being counterfactual. In other words, the sequence of relations has been turned around: as part of a political vision, an image is generated, which then shapes reality.

From Representation to Evidence

This new sequence of relationships creates a challenging situation. It undermines established forms of asserting truth (by way of representational images) and often empowers undemocratic, authoritarian actors (with misogyny being a form of authoritarianism). What should be done about this? There is a tendency to task the providers of the infrastructure with policing the flows of material through their networks, hoping to re-establish the boundary between indexical and generated images.⁴ While some of this might be necessary, it is quite possible that it does more harm than good. First, with digital images, drawing this boundary is more or less arbitrary. Second, delegating this power to social media companies only further concentrates power in their hands or, more precisely, in the hands of their owners (at Facebook, a majority of the voting rights is held by a single person, Mark Zuckerberg). Do we really want to put so much poorly defined power in so few hands?

Another approach might be to drop the simple representational claim of the image inherited from the Renaissance, the central perspective, and the photographic lens. That claim, to stress the point, was a double one. First, that the three-dimensional world could be transferred to a two-dimensional pane corresponding to what a single person could see, at a particular time, from a particular vantage point. Second, that it was possible to make sense of the world from the point of view of a single person, standing at a particular point, at a particular time. While the first claim can still be technically accurate (if less and less practiced), its importance is greatly diminished, because in a world of dynamic, distributed complexity the single point of view is no longer capable of representing reality. In such a context, visual accounts of reality come to rely on data visualization. They do not point to a single thing – neither in front of, in, nor behind the camera – but rather to dynamic networks of relationships. In other words, images are composites. Their truth claim lies in their method of composition rather than in their indexicality.

It is not surprising that the defining image of the Covid-19 pandemic is not a photograph but a graph: the curve representing the rise of infections in relation to the local healthcare system’s capacity to care for patients. This image, representing past, present and future, is not only descriptive but, by being part of the situation it makes accessible, also prescriptive (“flatten the curve!”).

It is perhaps best to first acknowledge the artificiality of all images (even those that cultural habit makes appear natural) and, rather than fighting it, to embrace the composite character of contemporary images, both in the production of individual images and sequences and in how these fragmentary, fragile and contested accounts are put together to create a more complex account of the world beyond the here and now visible to the naked eye.

With this in mind, the artist Paolo Cirio calls for a new approach which he names “evidentiary realism”, to provide a new aesthetics of truth claims. In the catalog of the eponymous exhibition, he writes:

The real is present and concrete, yet complexity, scale, speed, and opacity hide it from sight. The contemporary features of the social landscape are unintelligible at first glance. Although we see the shocking results of our social reality, we are nonetheless often unable to see the systems and processes that generate such conditions. … [Evidentiary] realism looks beyond visible social conditions. [It] examines the underpinning economic, political, legal, linguistic, and cultural structures that impact society at large. These evolving social fields are highly interconnected and often too complex and high-speed to grasp—if not secret, imperceptible, opaque, or manipulated by advanced rhetorical devices. Reality today can only be fully apprehended by pointing at evidence from the language, programs, infrastructures, relations, data, and technology that power structures control, manipulate, and hide. (Cirio 2017, 3)

In other words, given its complexity, scale and speed, contemporary reality can no longer be represented by an image. However, evidence of its constitutive processes can be assembled in images. Perhaps the most advanced example of this post-indexical realism is the work of Forensic Architecture, a multidisciplinary research group based at Goldsmiths, University of London. In this work, complex composite images (still and moving) are created from a myriad of sources, some created with intention, say, a smartphone video, some recorded automatically, say, GPS data. Each of these sources, in isolation, is partial and fragmentary, recording only tiny aspects of a complex situation, and all of them circulate in ways that render their meaning highly unstable and contestable. Rather than looking for the one strong image (or, more generally, a single reliable data source), Forensic Architecture assembles a great number of weak, contested and unstable data points (some of which are images in the traditional sense, though their metadata is equally important) and relates them so that they provide a stabilizing, interpretative context for each other. In this manner, a strong composite emerges from weak materials. Aesthetically, the fact that the images are composites is not hidden; on the contrary, it is a major narrative device that opens up the composite to an examination of the methods of composition and thus makes its truth claims accessible and debatable. It is important to note that Forensic Architecture sees its practice not as disinterested observation but as part of a counter-hegemonic struggle (Weizman 2017). Its practice is often situated in the context of court trials, where the evidence presented is always partisan – presented by one of the parties with an interest in advancing its case – but is also required to claim factuality.

Figure 2: Forensic Architecture and the Invisible Institute, still from Six Durations of a Split Second: The Killing of Harith Augustus (2019).

Embracing the artificiality of the image to create a new kind of realism – de-centred, multi-perspective, composed of heterogeneous sources, transparent to the underlying processes (of data generation and composition) – opens up the most promising approach to overcoming the crisis of the image and generating new visual truth claims. It acknowledges both the new generative character of image-making and the contemporary character of reality: complex, dynamic and drawn out in space and time, full of actors with their unique vantage points, and saturated with diverse data and media. In other words, the composite image creates both an aesthetic and an epistemological regime. It updates the capacity to make debatable truth claims, which lies at the heart of the collective, peaceful contestation of the past, present and future that is democracy. In this spirit, we should embrace, rather than fear, the artificiality of the composite image.

References

Adorno, Theodor W. 2002. Aesthetic Theory. eds. Rolf Tiedemann and Gretel Adorno. London; New York: Continuum.

Ajder, Henry, Giorgio Patrini, Francesco Cavalli, and Laurence Cullen. 2019. The State of Deepfakes: Landscape, Threats, and Impact. Deeptrace. https://regmedia.co.uk/2019/10/08/deepfake_report.pdf.

Batchen, Geoffrey. 1994. “Phantasm: Digital Imaging and the Death of Photography.” Aperture 136: 46–51.

Belting, Hans. 2011. Florence and Baghdad: Renaissance Art and Arab Science. 1st English language ed. Cambridge, Mass: Belknap Press of Harvard University Press.

Benjamin, Walter. 1969. “The Work of Art in the Age of Mechanical Reproduction (1935).” In Illuminations, ed. Hannah Arendt. New York: Schocken Books, 214–18.

Cirio, Paolo. 2017. Evidentiary Realism: Investigative, Forensic, and Documentary Art. Berlin: Nome Gallery.

Edgerton, Samuel Y. 2009. The Mirror, the Window, and the Telescope: How Renaissance Linear Perspective Changed Our Vision of the Universe. Ithaca: Cornell University Press.

Gunning, Tom. 2004. “What’s the Point of an Index? Or, Faking Photographs.” NORDICOM Review 10(1/2): 39–49.

Haraway, Donna. 1988. “Situated Knowledges: The Science Question in Feminism and the Privilege of Partial Perspective.” Feminist Studies 14(3): 575–99.

Joselit, David. 2013. After Art. Princeton: Princeton University Press.

Latour, Bruno. 1986. “Visualization and Cognition: Drawing Things Together.” In Knowledge and Society - Studies in the Sociology of Culture Past and Present, ed. Henrika Kuklick. Stamford, CT: JAI Press Inc., 1–40. http://www.bruno-latour.fr/sites/default/files/21-DRAWING-THINGS-TOGETHER-GB.pdf (February 4, 2021).

Lave, Jean, and Etienne Wenger. 1991. Situated Learning: Legitimate Peripheral Participation. Cambridge, UK; New York: Cambridge University Press.

McLuhan, Marshall. 1962. The Gutenberg Galaxy: The Making of Typographic Man. Toronto: University of Toronto Press.

Parkin, Simon. 2019. “The Rise of the Deepfake and the Threat to Democracy.” The Guardian online (June 22). https://www.theguardian.com/technology/ng-interactive/2019/jun/22/the-rise-of-the-deepfake-and-the-threat-to-democracy (January 6, 2021).

Rubinstein, Daniel. 2015. “What Is 21st Century Photography?” The Photographers’ Gallery Blog (Feb. 23). https://thephotographersgallery.org.uk/content/what-21st-century-photography (January 4, 2021).

Rubinstein, Daniel. 2018. “Post-Representational Photography, or the Grin of Schrödinger’s Cat.” In Photography Reframed: New Visions in Contemporary Photographic Culture, eds. Benedict Burbridge and Annebella Pollen. London; New York: I.B. Tauris, 8–18.

Simonite, Tom. 2020. “What Happened to the Deepfake Threat to the Election?” Wired (Nov. 16). https://www.wired.com/story/what-happened-deepfake-threat-election/ (January 6, 2021).

Stalder, Felix. 2018a. “From Inter-Subjectivity to Multi-Subjectivity: Knowledge Claims and the Digital Condition.” In Being Profiled: Cogitas Ergo Sum, eds. Emre Bayamlioglu, Irina Baraliuc, Liisa Janssens, and Mireille Hildebrandt. Amsterdam: University of Amsterdam Press, 98–101. https://www.aup.nl/en/book/9789463722124/being-profiled-cogitas-ergo-sum (May 3, 2019).

Stalder, Felix. 2018b. The Digital Condition. Cambridge, UK; Medford, MA: Polity Press.

Steyerl, Hito. 2009. “In Defense of the Poor Image.” e-flux Journal 10 (November). https://www.e-flux.com/journal/10/61362/in-defense-of-the-poor-image/ (January 6, 2021).

Von Foerster, Heinz. 2003. “Cybernetics of Cybernetics.” In Understanding Understanding: Essays on Cybernetics and Cognition. New York: Springer, 283–86.

Weizman, Eyal. 2017. Forensic Architecture: Violence at the Threshold of Detectability. Brooklyn, NY: Zone Books.

Source: Stalder, Felix. 2021. “Composite Images. On the Transformation of Visual Truth Claims.” In Automated Photography, edited by Milo Keller, Florian Amoser, and Claus Gunti. Mörel Books & ECAL, 213–22.

1. This also applied to Brunelleschi. At the time, a flat mirror was a sophisticated and advanced apparatus, and the optical/mathematical knowledge had arrived in Europe from the Arab world only recently (Belting 2011).
2. “The Kuleshov effect is a film editing (montage) effect demonstrated by Soviet filmmaker Lev Kuleshov in the 1910s and 1920s. It is a mental phenomenon by which viewers derive more meaning from the interaction of two sequential shots than from a single shot in isolation.” https://en.wikipedia.org/wiki/Kuleshov_effect
3. “The paradox of tolerance states that if a society is tolerant without limit, its ability to be tolerant is eventually seized or destroyed by the intolerant. Karl Popper described it as the seemingly paradoxical idea that in order to maintain a tolerant society, the society must be intolerant of intolerance.” https://en.wikipedia.org/wiki/Paradox_of_tolerance
4. European Commission (2019). Code of Practice on Disinformation. October. https://ec.europa.eu/digital-single-market/en/code-practice-disinformation