Reflections on Digital Source Criticism – Doctoral Training Unit “Digital History & Hermeneutics”

From 30 to 31 October 2017 we had a DTU skills training on Digital Source Criticism, organised by Prof. Dr. Andreas Fickers, Dr. Stefania Scagliola, and Andy O’Dwyer. In this report we reflect on the lecture, the exercises, and the demonstrations of several digitisation tools.

Dealing with meta-sources in the age of big data

(Lecture Prof. Andreas Fickers)

Professor Andreas Fickers started his talk with the observation that there is a growing gap between the technological development in our society and in the humanities. According to Roy Rosenzweig it is a matter of epistemological urgency to keep up with this technological development if the humanities don’t want to become obsolete.¹ For historians this has strong implications for each individual step of the research process. Each of those traditional steps corresponds with a new set of skills (a form of multimodal literacy).

• searching: algorithmic criticism
• documenting: digital source criticism
• analysing: tool criticism
• presenting: interface criticism

Based on this list, digital historians should try to answer the following questions:

1. How does digitization affect the concept and function of the archive / archiving?
2. What new heuristics of search are needed in the age of internet and big data?
3. How to develop a critical methodology of digital source critique?
4. What new historical questions can new digital techniques produce?
5. What are the new possibilities of digital storytelling?
6. Does digital history enable a new public engagement with history?

Digitisation and Archiving

Some of those questions connect to existing problems of source criticism. Two well-known principles archivists use are respect des fonds and respect de l’ordre. Both principles ensure that the temporal and spatial information the sources provide is not distorted. In the digitisation of large parts of our cultural heritage, however, we can see a shift from conservation to accessibility. The digital age has shifted the challenge of scarcity to one of abundance. The source has transformed into a document and has now become data. With the advent of the internet, the control of the archive has been fractured and the power relation between user and institution has changed. Digitisation challenges our traditional concepts of source critique and archival logic such as originality.

Heuristics of Search

Peter Haber described the impact of using search engines in digital collections on traditional heuristics in history as the “Google-syndrome”.² Where our search previously started from browsing through library shelves or archive catalogues, we now search for a specific keyword. Yet search engines are a black box, as Stephen Ramsay points out in his book Reading Machines, and scholars have rarely engaged in algorithmic critique.³ Shouldn’t we understand the mechanisms that provide us with those ‘top results’ and push ‘less relevant’ information to the bottom of the page? The main question that arises is whether the democratisation promised by the internet, and by search engines such as Google in particular, has led to the deprofessionalisation of the historical discipline. Does everyone who can look up information on an event from the past, anyone who can use Wikipedia, qualify as a historian? Do university-trained historians use search engines differently from unqualified amateur historians? Are those academic historians who refuse to use digital tools and search engines better researchers? Or do they perhaps overlook new insights and information that are not published or added to their library, but come from those amateur historians who ask valid and unanswered questions?

Digital Source Critique

Historians in the digital age should be able to understand and interpret the codes and conventions of mediated representations of the past. Digital sources differ from analogue sources because digital sources can be understood as meta-sources going through their own life cycle. There is no original or authentic digital copy since after the creation or digitisation of a source, the material is enriched, edited, retrieved several times, and finally analysed and presented in different ways. Carl Lagoze talks about data integrity in his 2014 article Big Data, data integrity, and the fracturing of the control zone and mentions the six V’s: volume, velocity, variety, validity, veracity, and volatility.⁴ He assesses the data based on Volume or size, Velocity or speed of accumulation, Variety referring to heterogeneous data types and models, Validity or bias and noise, Veracity or correctness and accuracy, and Volatility or life span of the data. Andreas ended this section with an interesting quote from Google’s previous CEO Eric Schmidt: “If content is king, context is its crown.”⁵

Tool and interface criticism

According to James Mussell we went from digital history 1.0 to 2.0 by moving from tools for analysis to tools for manipulation.⁶ One of the main questions we need to ask of our digital tools is whether they are sustainable or not. Is the tool open source for example, and what are its limitations? Are the results you generate reproducible now, in 5 years, in 20 years, in 100 years? Are the workings of a tool clear and do you understand exactly what happens? Andreas reminded us to make the limitations of tools explicit in our research. In the copy-paste culture we live in today, how do we maintain individual research archives, how do we implement strategies for inventorying born-digital materials, and do we have a clear information management strategy? Related to tool criticism, we should also understand the relation between the back-end and the front-end, which Andreas named interface criticism. For example, will a website appear exactly the same in different browsers, on different screens? Will you get the same search results in different countries, or as a different user?

Digital Storytelling and Public History

With the surge of different media, digital literacy has become relevant, especially for historians aiming at public outreach. One of the possibilities to communicate research findings is through transmedia storytelling. Transmedia storytelling implies a multiperspective historical interpretation instead of a single master-narrative. Writing history in the digital age means that historians do not only use digital-born material, but write and create digital-born studies of history. Furthermore, through blogging and other platforms, open peer review, as well as open access have been introduced to the field. These developments allow visibility, even of private collections, and engage the public in new ways, such as crowdsourcing.

In conclusion, what is described here is the uncertainty in the communication of research, uncertainty about methodologies, and of epistemology. Digital history means rethinking historical methods and source criticism on a fundamental level. We should create a shared framework of concepts, definitions and approaches for digital history and lay the foundations for a discipline that is now confronted with developments in other fields and needs to come up with answers.

Literature Review

(A. Tagging Literature)

During the first exercise, we had to tag thirty articles with the help of the abstracts written before the skills training. The terms used for the tagging were provided by the DH glossary and new terms could be suggested in an additional column. On many occasions, the DH glossary did not provide a fitting classification of the topics discussed in the articles, while in some instances (for example the entry digital history) the definitions of the terms were incomplete. After tagging the articles, each group chose those articles that were most accurate in defining and describing digital source criticism. An interesting discovery during this part of the exercise was that many of the articles we had to read were not directly related to digital source criticism. The two texts that stood out for us were Catherina Schreiber’s “Genuine Internetdaten als historische Quellen – Entwurf einer korrealistischen Quellentheorie” and Howard Rheingold’s “On Crap Detection”.⁷ In both, digital-born sources are the main focus.

Source Transformation

(B. Envisioning the Phases of Transformation)

Drawing

One of our hands-on exercises consisted of drawing out the phases of transformation from the original to the digital source. As an example we chose a handwritten document from the 17th century that Thomas brought in printed format. We envisioned the steps taken to digitise the original handwritten document from a technical point of view. Our first stop was the flatbed scanner which transformed paper and ink into bits and bytes. On a smaller scale we looked at the inside of a camera capturing the light through a lens with several filters and sensors creating a matrix of red/green/blue or RGB values stored in a certain file-format such as JPEG, PNG, or TIFF. The second step was to store this information on a hard drive encoding those bits and bytes on a magnetic disc representing 1 and 0 as a pattern of magnetisation. During the third phase the silicon chips inside a computer needed to perform all the work and through the basic input/output system (BIOS), operating system (OS) and applications and/or software send the information to the final stage. The final stage was the monitor or screen that displayed the image of our document using a backlight behind a raster of RGB-pixels. What we didn’t take into account is the information which gets lost in translation. However, we will discuss this issue when we look at the digitisation process in detail.
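The first two stages of this chain can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 2x2 "scan" and its colour values are invented, and the simple uncompressed PPM format stands in for the JPEG, PNG, or TIFF files a real scanner would produce. The helper names `to_ppm` and `to_bits` are hypothetical.

```python
# Hypothetical 2x2 "scan": each pixel is a (red, green, blue) triple, 0-255.
pixels = [
    [(255, 255, 255), (30, 20, 10)],   # paper, ink
    [(30, 20, 10), (255, 255, 255)],   # ink, paper
]


def to_ppm(rows):
    """Serialise an RGB matrix as a binary PPM (P6) file: text header + raw bytes."""
    height = len(rows)
    width = len(rows[0])
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    body = bytes(channel for row in rows for pixel in row for channel in pixel)
    return header + body


def to_bits(data):
    """Render bytes as the string of 1s and 0s that a storage medium encodes."""
    return "".join(f"{byte:08b}" for byte in data)


ppm = to_ppm(pixels)
bits = to_bits(ppm)
print(len(ppm), "bytes =", len(bits), "bits")
print(bits[:24], "...")  # the first three bytes of the file header
```

Even in this toy case the file is more than the image: the header records width, height, and colour depth, which is exactly the kind of contextual information that a change of format can silently alter.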

Capturing the interior of the BBC TV centre

Andy O’Dwyer introduced the group to a feature of Google Maps that we can foresee having numerous implications for the future of architectural history: mapping the interiors of buildings. It has always been a useful feature to drop the Google Maps human icon onto a location to explore a city or town, but the ability to enter buildings and walk around gives a unique perspective into the particular culture of a space. Research questions may include: what artwork was hanging on the walls when this building was populated? How did the building layout reflect the hierarchy (or lack thereof) of professional relationships? What did the landscape around the building look like from various window views at a certain time? This approach to conceptualise immersive environments and visual reconstructions has tremendous implications for the future of virtual reality worlds.

Transforming the interview collection of David Boder

Dr. Stefania Scagliola described the textual and audio trail left behind by David Boder, who interviewed victims of the Holocaust right after the Second World War. He recorded the interviews onto carbon wire and then converted these recordings to steel wire which lasted longer. The recording device and wires caused a logistical mess, but the steel wires were preserved and rediscovered decades later. In the next part of the process, David Boder himself translated most of the interviews into English by voice, which was typed by an assistant and edited and retyped after his final approval. These text transcriptions were then published on microcards with a device called a mimeograph.

The sound is mainly preserved through copies he sent to the funder of the project. The Library of Congress converted the recordings on steel wire to magnetic tape and digitised these tapes with a Sony PCM 1630 onto VHS tape. In 1999 the digital audio tape was transcribed again and the recordings underwent several transformations onto CD and WAV formats. Furthermore the sound was remastered and manipulated through splitting and aggregating the audio. The textual transcriptions were transformed into HTML for the original website, followed by a new website which now contains 50 new interviews that were transcribed digitally in the original language and translated by a professional. The project can be found at http://voices.iit.edu/projectnotes.

What stood out from this process is how quickly technology evolves, leaving behind relics that are not always easy to transform, especially when ‘readers’ such as cassette-players, VHS-players, or software for reading WAV-files disappear, go out of fashion or cannot be repaired any longer. The tendency for archives to publish selected material onto a website has positive and negative consequences. On a positive note, the material often reaches a larger audience. However, websites require constant attention since browsers evolve and the back-end needs to keep up. A website created 10 years ago might not even open today, and if it does open, the layout may look so outdated that it discourages researchers or the public from engaging with the materials provided.

Digitisation

(C. Actual Digitisation of Data)

Book scanner

Considering all of us use books for our research, the book scanner is intuitively of importance. We all digitised books at our disposal, some scanned for us by Andy. Seeing the book scanner in action provided an interesting insight into the labour that goes into scanning. Although the book scanner is advertised as automated scanning, allowing hundreds of pages to be scanned and OCR’ed automatically per hour, the reality is very different. Andy showed us his best practices, including cleaning dust from book and scanner, specifying the page size in the software, and keeping track of whether the machine missed a page by flipping too many pages at once. After 40 minutes, we had digitised about 20 pages. Overall this gave a very interesting reflection on the hidden labour behind digitisation, the tacit knowledge required to operate ‘automated machines’, and the decisions made while digitising a book.

Image scanner

Aurélia Lafontaine showed us the image scanner that scans photos one by one, or in the case of a film roll can scan a number of negatives in a single run. The scanner appears to be relatively “plug-and-play” and it was quite simple to scan a document and save it on the hard drive. We played with both the “professional mode” with many options, as well as the “automatic mode” which only offers a “scan” button, to see whether having all these options results in better scans. The automatic mode worked surprisingly well. We encountered two issues of interest that we could not resolve. First, the colour of a scan on the screen is not the same colour as the picture in your hand. Whether this is an artefact of the screen or of the scanner is hard to tell and would require testing the image on multiple screens. Second, Aurélia advised against turning an image using an image viewer, as this changes the file size without explicating what information is changed or lost. These two issues made us aware that how images are created on-screen is actually a complex process we do not fully comprehend by looking at the scanner itself.
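One way to reason about the rotation issue is to separate the pixel values from the file that stores them. The sketch below is an illustration only, using an invented 3x2 pixel matrix and the hypothetical helpers `rotate_clockwise` and `fingerprint`. It shows that a 90-degree rotation merely rearranges pixel values and can be undone exactly, while the stored bytes, and therefore any checksum of them, change. Whether a given image viewer additionally re-encodes the file in a lossy way is not something our session established, so that part remains an assumption to test per tool.

```python
import hashlib


def rotate_clockwise(rows):
    """Rotate a pixel matrix by 90 degrees clockwise without altering any value."""
    return [list(column) for column in zip(*rows[::-1])]


def fingerprint(rows):
    """SHA-256 digest of the image dimensions plus the flattened pixel values."""
    flat = bytes(channel for row in rows for pixel in row for channel in pixel)
    header = f"{len(rows[0])}x{len(rows)}:".encode("ascii")
    return hashlib.sha256(header + flat).hexdigest()


# Invented example image: 2 rows of 3 RGB pixels.
original = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(10, 10, 10), (128, 128, 128), (250, 250, 250)],
]

rotated = rotate_clockwise(original)

# Three further quarter turns bring the image back to its starting orientation.
restored = rotated
for _ in range(3):
    restored = rotate_clockwise(restored)

print("rotated differs from original: ", fingerprint(rotated) != fingerprint(original))
print("restored identical to original:", fingerprint(restored) == fingerprint(original))
```

Comparing fingerprints before and after an operation, on a copy of the file, is a cheap way to find out whether a tool has changed more than the orientation.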

Sound

Sound technology can be applied to such a wide variety of historical research projects that we wish more time was available to dive into the tools offered at the university, both analog and digital. Together with Stefan Krebs our group focused on translating an “interview,” in this case a short sound bite from each of us using a tape recorder, into a digital object for use within the Pro Tools software environment. We discussed the differences in signal between analog and digital, and ways of recording to avoid irreparable distortions and other preventable problems whenever we’re in the field.
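The difference between an analogue and a digital signal, and one kind of irreparable distortion, can be illustrated with a short Python sketch. It is not what Pro Tools does internally; the sample rate, tone, and gain values are assumptions chosen for illustration, and `sample_tone` and `clipped_count` are hypothetical helper names. A continuous sine wave is measured at fixed intervals (sampling) and each measurement is rounded to a 16-bit integer (quantisation). If the input level is too high, the values that do not fit are cut off (clipping), and the original waveform cannot be recovered afterwards.

```python
import math

SAMPLE_RATE = 8000   # samples per second (assumed for illustration)
MAX_INT16 = 32767    # largest value a signed 16-bit sample can hold
MIN_INT16 = -32768   # smallest value a signed 16-bit sample can hold


def sample_tone(freq_hz, n_samples, gain):
    """Sample a sine wave and quantise it to 16-bit integers, clipping overshoot."""
    samples = []
    for n in range(n_samples):
        analogue = gain * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
        quantised = round(analogue * MAX_INT16)
        samples.append(max(MIN_INT16, min(MAX_INT16, quantised)))
    return samples


def clipped_count(samples):
    """Count samples sitting at the extreme values, i.e. likely clipped."""
    return sum(1 for s in samples if s in (MAX_INT16, MIN_INT16))


safe = sample_tone(440, 80, gain=0.5)      # recording level with headroom
too_hot = sample_tone(440, 80, gain=1.5)   # recording level set too high

print("clipped samples at gain 0.5:", clipped_count(safe))
print("clipped samples at gain 1.5:", clipped_count(too_hot))
```

Checking the recording level before an interview starts is therefore more than a technicality: clipped samples are lost at the moment of capture, not during later processing.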

3D scanner

The 3D scanning workshop was an extremely informative introduction to the practices surrounding the creation of 3D objects on multiple scales (e.g. a single item, an entire architectural structure, etc.) using photogrammetry. It was helpful to discuss how experts create, manipulate, and describe physicality in a digital context with our colleague Marleen De Kramer. Our group took a series of photos of a small wooden figurine and translated this into a point cloud, and then a 3D object. We encountered a number of problems during the process, but through failure we were able to develop a better understanding of the pitfalls involved.

Defining Digital Source Criticism

(D. Genealogy of the Term)

In the final exercise, we first read a number of articles selected by Stefania to see how different authors used source criticism or digital source criticism in their writings. Second, as a multilingual group, we considered the different terms for “digital source criticism” in other languages, notably German, Dutch, Italian, and Japanese. We found that the term is translated quite literally to other languages. Finally, we traced the genealogy of terms such as “digital”, “source”, “source criticism”, and “digital source criticism”. While it is relatively easy to find the etymology of a single term, it is much harder to do so for combinations of terms. For example, we could not find who was the first to speak of “digital source” and how this was embedded in terminology known at that time. We looked on Google Scholar for the usage of “digital source criticism”, and the earliest occurrence seemed to be a chapter from 2001.⁸ We noticed that virtually nobody else cites this piece, and that other publications on “digital source criticism” tend to come up with their own interpretation rather than embed its use in existing literature. “Digital source criticism” is not an agreed upon method, but more importantly there is hardly any debate about this: articles envision what it could be, rather than try to create boundaries of what best practices should be. We therefore concluded that “digital source criticism” is not an existing process, or part of shared practices, but a denominator for a method that is envisioned for the future.

¹ Rosenzweig, Roy. “Scarcity or Abundance? Preserving the Past in a Digital Era.” The American Historical Review 108, no. 3 (June 2003): 735–62. p. 738.
² Haber, Peter. 2011. Digital Past: Geschichtswissenschaft im digitalen Zeitalter. Oldenbourg Wissenschaftsverlag. ISBN: 3486707043, 9783486707045. https://books.google.lu/books?isbn=3486707043.
³ Ramsay, Stephen. 2011. Reading Machines: Toward an Algorithmic Criticism. University of Illinois Press. ISBN: 0252093445, 9780252093449. https://books.google.lu/books?isbn=0252093445.
⁴ Lagoze, Carl. 2014. “Big Data, data integrity, and the fracturing of the control zone.” Big Data & Society. https://journals.sagepub.com/doi/10.1177/2053951714558281.
⁵ Quoted in Snickars, Pelle. 2012. “If Content Is King, Context Is Its Crown.” Journal of European History and Culture, volume 1. From the James MacTaggart Lecture of 2011 https://www.youtube.com/watch?v=hSzEFsfc9Ao. Full transcript at https://gigaom.com/2011/08/26/419-watch-live-here-eric-schmidts-edinburgh-keynote-and-twitter-reaction/.
⁶ Mussell, James. 2013. “Doing and making. History as digital practice.” In History in the Digital Age, ed. Toni Weller, Routledge, p. 1-20. https://books.google.lu/books?isbn=0415666961.
⁷ Schreiber, Catherina. 2012. “Genuine Internetdaten als historische Quellen – Entwurf einer korrealistischen Quellentheorie.” Zeitschrift für digitale Geschichtswissenschaften. Universität des Saarlandes, vol. 1, p. 1-15. https://orbilu.uni.lu/handle/10993/7981. Rheingold, Howard. 2012. Crap Detection Mini-Course. http://rheingold.com/2013/crap-detection-mini-course/
⁸ Reisinger, G. (2001). Digital Source Criticism: Net Art as a Methodological Case Study. Netpioneers, 1, 123-142.