Koder - A multi-register corpus for investigating register variation in contemporary German

Keywords: corpus design; Koder; register; German

Abstract

This paper introduces the design decisions behind the Koder corpus, a multi-register corpus of contemporary German. The corpus is intended to serve as a basis for investigating how German is used across registers. Constructing a representative corpus requires decisions on three essential points: the type and number of registers to include, the number of texts in each register, and the minimum text length. The paper describes the considerations that were central to these decisions, as well as the composition of the corpus and the text processing it required.
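Two of the design criteria named above, the number of texts per register and a minimum text length, can be checked mechanically once candidate texts have been collected. The following is a minimal Python sketch of such a check, not the procedure used for Koder: the thresholds `MIN_TOKENS` and `MIN_TEXTS_PER_REGISTER`, the helper names `token_count` and `check_composition`, the register labels and the naive whitespace tokenization are all hypothetical, chosen only for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical thresholds for illustration only; they are NOT the values
# used in building the Koder corpus.
MIN_TOKENS = 5               # minimum text length, in tokens
MIN_TEXTS_PER_REGISTER = 2   # target number of texts per register


def token_count(text: str) -> int:
    """Count tokens by naive whitespace splitting (a real pipeline would use a tokenizer)."""
    return len(text.split())


def check_composition(
    texts: Iterable[Tuple[str, str]],
    min_tokens: int = MIN_TOKENS,
    min_texts: int = MIN_TEXTS_PER_REGISTER,
) -> Tuple[Dict[str, int], List[str]]:
    """Return the number of sufficiently long texts per register, and the
    registers that fall short of the target number of texts."""
    counts = Counter()
    seen_registers = set()
    for register, text in texts:
        seen_registers.add(register)
        # Only texts that reach the minimum length count towards a register.
        if token_count(text) >= min_tokens:
            counts[register] += 1
    # Registers with too few qualifying texts (including registers with none).
    short = sorted(r for r in seen_registers if counts[r] < min_texts)
    return {r: counts[r] for r in sorted(seen_registers)}, short


# Usage with made-up sample texts labelled by (hypothetical) register.
sample = [
    ("chat", "hallo wie geht es dir heute"),
    ("chat", "gut danke"),
    ("speech", "Sehr geehrte Damen und Herren, ich begrüße Sie herzlich"),
    ("speech", "Meine Damen und Herren, vielen Dank für Ihre Aufmerksamkeit"),
]
counts, short = check_composition(sample)
print(counts)  # {'chat': 1, 'speech': 2}
print(short)   # ['chat']
```

Separating the length filter from the per-register count makes it easy to see how a stricter minimum length shrinks individual registers, which is exactly the trade-off between text length and number of texts that a corpus designer has to balance.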

Published: 2019-11-07
How to Cite
Costa, A. (2019). Koder - A multi-register corpus for investigating register variation in contemporary German. Research in Corpus Linguistics, 7, 69-83. https://doi.org/10.32714/ricl.07.04