Prof. Dr. Martin Theobald (University of Luxembourg), Dr. Robert C. Kahlert (KU Leuven)
What are (big) data? What are databases? What are database structures? What can we do with them? This skills training introduced different database systems and applications and showed how to work with them in historical research. The first day offered an introduction to hand-curated data and the various ways it can be stored: blog entries, text files, presentations, office documents, wikis, note-taking software, spreadsheets, and SQL databases. We discussed what data are, how to gather and encode them, how to link them back to their point of origin, how to normalise them, and what to do if you need more than the software supports. The second day approached database structures from the perspective of big data. It provided an overview of current trends in distributed data management. We looked at how different data formats (including text, XML and JSON) can be handled by open-source libraries and processed directly in a distributed environment using the Apache Spark platform.