Digital history often manifests in two forms: a single person with a mixed background in History and Computer Science, or a collaboration between a historian and a computer scientist.
The latter, in particular, brings a combination of challenges: finding a common vocabulary, negotiating new forms of knowledge, reconciling different writing styles and, in general, bridging two completely different perspectives. One example is their different takes on models.
A model is a simplification of the very complex real world. It preserves the features that we are interested in analysing and intentionally omits everything else. For example, a world map is a model of the surface of the earth. A political world map would show information about countries, borders, cities and roads, while a natural world map would only show mountains, rivers, oceans and so on. This, of course, comes with a trade-off. On the one hand, the more information the model preserves, the more accurate it is. On the other hand, the model becomes more complex and more difficult to analyse.
I have found that computer scientists and historians tend to gravitate towards opposite ends of this spectrum. Computer scientists tend to reduce the complexity of a model all too readily, while historians see every simplification as a loss.
In my work with Sytze Van Herck, “Mind the Gap: Gender and Computer Science Conferences” (2018), we analysed gender differences in Computer Science and how gender affects collaboration patterns. The dataset used is publicly available¹ and contains information about computer science articles, including authorship information. In this research, gender does not refer to one's own gender identity, but rather to what other people perceive. Someone's perception of a potential collaborator's gender, and the associated stereotypes, can affect the probability that these people will connect and publish papers together. Human interaction is complex, and this perception may be determined by several factors, such as physical appearance, gender identity or simply the full name on a paper.
The tool used to assign gender is the Genderize API, which predicts the gender of a person based on their first name. This tool simplifies gender by defining it solely on the basis of the first name and ignoring other factors. It also defines gender as being either male, female or undefined. Even so, it can represent the complex concept of gender, supply all the information relevant to our study and still be reasonably simple. This model may not be 100% accurate, but a 100% accurate model would be too complex, or may even be impossible to obtain. The gender predictions obtained are then used in the analysis as a proxy for the perceived gender of these researchers.
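To make the simplification concrete, here is a minimal Python sketch of how a full name might be reduced to one of the three categories. It is an illustration under stated assumptions, not the code used in the study: the helper names, the rule of taking the first token of the full name as the first name, and the 0.8 probability threshold are all hypothetical. The sketch assumes the public endpoint at https://api.genderize.io, which returns a JSON object with the fields name, gender, probability and count.

```python
import json
import urllib.parse
import urllib.request

GENDERIZE_URL = "https://api.genderize.io"  # public Genderize endpoint


def first_name(full_name):
    """Return the first whitespace-separated token of a full name.

    Assumption: the first token is the first name. This fails for
    initials ("J. Smith") and for family-name-first conventions.
    """
    parts = full_name.strip().split()
    return parts[0] if parts else ""


def fetch_prediction(name):
    """Ask Genderize for a prediction. Requires network access."""
    url = GENDERIZE_URL + "?" + urllib.parse.urlencode({"name": name})
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def perceived_gender(prediction, min_probability=0.8):
    """Reduce a Genderize response to 'male', 'female' or 'undefined'.

    The threshold is a hypothetical choice made for this sketch;
    predictions below it, or with no gender at all, become 'undefined'.
    """
    gender = prediction.get("gender")
    probability = prediction.get("probability") or 0.0
    if gender in ("male", "female") and probability >= min_probability:
        return gender
    return "undefined"
```

A call such as perceived_gender(fetch_prediction(first_name("Ada Lovelace"))) would then give the label used as the proxy. Each of the three steps discards information, which is exactly the kind of choice that should be written down alongside the results.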
Using the Genderize API, we were able to quantify gender bias in computer science research and examine complex patterns of collaboration. Tools like this are far from perfect; the key to digital history projects is therefore to document and justify every choice: the tools used, the assumptions made and the interpretation of the results.
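As a rough illustration of what quantifying collaboration patterns can look like once every author has a perceived-gender label, the Python sketch below counts co-author pairs by gender combination. It is not the analysis from the paper: the input format (one list of labels per paper) and both function names are assumptions made for this example. It also reports the share of "undefined" labels, so that the cost of the simplification stays visible next to the results.

```python
from collections import Counter
from itertools import combinations


def count_gender_pairs(papers):
    """Count co-author pairs by gender combination.

    `papers` is a list with one entry per paper; each entry is the list
    of perceived-gender labels of that paper's authors.
    """
    pairs = Counter()
    for genders in papers:
        for a, b in combinations(genders, 2):
            # Sort each pair so ("male", "female") and ("female", "male")
            # are counted as the same combination.
            pairs[tuple(sorted((a, b)))] += 1
    return pairs


def undefined_share(papers):
    """Fraction of author slots labelled 'undefined'.

    Counts author slots, not unique people: an author of three papers
    is counted three times.
    """
    genders = [g for paper in papers for g in paper]
    if not genders:
        return 0.0
    return genders.count("undefined") / len(genders)


# Toy data, invented for this example.
papers = [
    ["female", "male"],
    ["male", "male", "undefined"],
    ["female"],
]
pairs = count_gender_pairs(papers)
share = undefined_share(papers)
```

Single-author papers contribute no pairs, and pairs involving an "undefined" label are kept rather than silently dropped; either decision could reasonably go the other way, which is why it belongs in the project's documentation.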