I graduated in 2009 from Kamchatka State University, Russia, with a degree in linguistics and translation studies and then moved to Australia, where I continued studying, started a translation business, and worked for several major IT research centres as the coordinator of an educational project for schoolchildren. In this job I met and worked with many amazing researchers in machine learning, AI and computational linguistics, and they eventually inspired me to dive into information technology myself. I decided to combine my two interests, linguistics and computer science, learnt the foundations of programming through online courses, and in 2017 started a new career in Luxembourg as a PhD student at the Digital History and Hermeneutics Doctoral Training Unit. I am now in the third year of my doctoral training, working on a project that explores a corpus of Australian Aboriginal autobiographies using computational text analysis tools and methods.
Corpus-based study of Australian Aboriginal autobiographies with text mining methods
In the 1960s, almost 200 years after Australia was invaded by Europeans, Aboriginal people turned to life writing to share an alternative history in the oppressors’ language, English. Because Aboriginal authors inevitably see the world through the prism of their experiences of racism, oppression, exploitation and injustice, the genre of Aboriginal writing is defined by content and language use that challenge European perceptions, concepts, and values.
My project explores Indigenous Australian life writing as a literary genre and as a historical source in its social and political context by applying exploratory “distant reading” techniques to a corpus of autobiographical works published between the 1960s and 2020. I am particularly interested in Digital Humanities applications of word embedding models, which rest on the distributional hypothesis: words that occur in similar contexts tend to be semantically close. Distributional semantic models implement this hypothesis by extracting distributional information from a corpus and using it to compute semantic similarity between words. Applied as an unsupervised technique, these models make it possible to explore semantic spaces and discourses in large bodies of text.
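A minimal sketch of the distributional hypothesis, assuming the simplest count-based model: each word is represented by counts of the words appearing within a fixed window around it, and the similarity of two words is the cosine of the angle between their count vectors. The function names, the window size of 2 and the three toy sentences are illustrative assumptions invented for this example; they are not drawn from the project's corpus or pipeline. Word embedding models such as word2vec learn dense vectors rather than using raw counts, but they rest on the same idea.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()  # naive whitespace tokenisation
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the target word itself
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy sentences, invented for illustration only.
toy_corpus = [
    "the mission took the children away",
    "the government took the children away",
    "the river runs through the country",
]

vectors = cooccurrence_vectors(toy_corpus, window=2)
print(round(cosine(vectors["mission"], vectors["government"]), 3))
print(round(cosine(vectors["mission"], vectors["river"]), 3))
```

Here "mission" and "government" occur in identical contexts and score 1.0, while "river" shares only the neighbour "the" with "mission" and scores about 0.516. On real texts, frequent function words like "the" dominate raw counts, which is one reason practical models weight the counts (for example with pointwise mutual information) or learn dense embeddings instead.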
Supervisor: Prof. Dr. Christoph Schommer
Christof Schöch lectures on the use and abuse of word embedding
Text as Data at DH Benelux 2019
Australian Aboriginal Autobiographies Database
Word Embeddings in Humanities