Koder: A multi-register corpus for investigating register variation in contemporary German
DOI: https://doi.org/10.32714/ricl.07.04
Keywords: corpus design; Koder; register; German
Abstract
This paper presents the design decisions taken in building the Koder corpus, a multi-register corpus of contemporary German. The corpus is intended to serve as a basis for investigating the use of German across registers. Constructing a representative corpus requires settling three essential questions: the type and number of registers to include, the number of texts per register, and the minimum text length. The paper describes the aspects that were central in resolving these questions, as well as the corpus composition and the necessary text processing.
License
Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution (CC BY) License. This license allows others to share the work, provided they acknowledge its authorship and its initial publication in this journal.