EcoCor (A Programmable Corpus for Research in Ecocriticism)

What is the project about?

EcoCor is a programmable corpus (including an API) for computational research on literary texts that have repeatedly served as references for research in ecocriticism.

What is your research question?

We are prototyping a corpus hosting infrastructure and a research platform for literary texts (initially in German and English, with the potential to extend to other languages) that have repeatedly served as references for research in ecocriticism. The basis is a set of TEI-encoded corpora of such texts, together with a microservice that extracts plant and animal taxon names from literary texts. The results of the extraction routine can be queried via the API, which will make it possible to investigate further how animal and plant life is represented and reflected in the texts. With this we hope, for example, to quantify the diversity patterns underlying the distribution of such names across the texts in our corpora, following the example of Langer, L., Burghardt, M., Borgards, R., Böhning-Gaese, K., Seppelt, R., & Wirth, C. (2021). The rise and fall of biodiversity in literature: A comprehensive quantification of historical changes in the use of vernacular labels for biological taxa in Western creative literature. People and Nature, 3, 1093–1109. https://doi.org/10.1002/pan3.10256
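To illustrate what quantifying diversity patterns could mean in practice, here is a minimal sketch that computes two common diversity measures from per-text counts of taxon names: richness (the number of distinct names) and the Shannon index. The choice of these measures, the function names, and the sample counts are assumptions made for illustration; they are not taken from the EcoCor API or from the cited study.

```python
import math


def richness(counts):
    """Number of distinct taxon names that occur at least once."""
    return sum(1 for n in counts.values() if n > 0)


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over the relative frequencies p_i."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(
        (n / total) * math.log(n / total)
        for n in counts.values()
        if n > 0
    )


# Hypothetical counts of extracted names for a single text.
example_counts = {"oak": 2, "fox": 2}
h = shannon_diversity(example_counts)
```

With two names used equally often, the index equals ln 2; a text that mentions only a single name has an index of 0.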

How is the project and/or case situated?

We build on the idea of programmable corpora (cf. Fischer et al. 2019; Börner, Trilcke 2023). As for the research question that we want to implement in our API, a similar investigation (limited to an English-language corpus, and not attempting to build a reusable pipeline) is described in: Langer, L., Burghardt, M., Borgards, R., Böhning-Gaese, K., Seppelt, R., & Wirth, C. (2021). The rise and fall of biodiversity in literature: A comprehensive quantification of historical changes in the use of vernacular labels for biological taxa in Western creative literature. People and Nature, 3, 1093–1109. https://doi.org/10.1002/pan3.10256

What methods, data sets, and tools are used?

Our (draft) dataset consists of two corpora that include a first selection of literary texts (in German and English) that have repeatedly served as references for research in ecocriticism: https://github.com/dh-network/ecocor/tree/main/corpus/tei

We apply methods such as gazetteer-based entity extraction from texts and the analysis of word frequency distributions.
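A minimal sketch of gazetteer-based extraction followed by a frequency count might look as follows. The gazetteer entries, the function name `extract_taxon_names`, and the sample sentence are hypothetical and serve only to show the general technique; the actual EcoCor microservice may work differently.

```python
import re
from collections import Counter

# Hypothetical toy gazetteer: lowercase vernacular name -> illustrative genus label.
GAZETTEER = {
    "oak": "Quercus",
    "fox": "Vulpes",
    "nightingale": "Luscinia",
}


def extract_taxon_names(text, gazetteer):
    """Count every token of the text that exactly matches a gazetteer entry."""
    # Sequences of Unicode letters, so names with umlauts are tokenised intact.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(tok for tok in tokens if tok in gazetteer)


sample = "The fox slept beneath the oak; another fox watched the nightingale."
counts = extract_taxon_names(sample, GAZETTEER)

# Relative frequency of each name among all matched names.
total = sum(counts.values())
relative = {name: n / total for name, n in counts.items()}
```

This sketch matches surface forms only, so inflected or plural forms such as "foxes" would be missed; a fuller pipeline would probably lemmatise the tokens before the lookup.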

Who is part of the team?

Sören Barkey (University of Potsdam), Ingo Börner (University of Potsdam), Clara Funk (Europa-Universität Viadrina Frankfurt (Oder)), Viktor Illmer (Freie Universität Berlin), Carsten Milling (University of Potsdam), Mark Schwindt (Ruhr-Universität Bochum), Daniil Skorinkin (University of Potsdam), Henny Sluyter-Gäthje (University of Potsdam), Peer Trilcke (University of Potsdam)