Multimodal meaning making: The annotation of nonverbal elements in multimodal corpus transcription

Keywords: corpus annotation, corpus transcription, multimodality, nonverbal elements, spoken discourse, video-mediated communication, gestures


The article discusses how to integrate annotation for nonverbal elements (NVE) from multimodal raw data as part of a standardized corpus transcription. We argue that it is essential to include multimodal elements when investigating conversational data, and that in order to integrate these elements, a structured approach to complex multimodal data is needed. We discuss how to formulate a structured corpus-suitable standard syntax and taxonomy for nonverbal features such as gesture, facial expressions, and physical stance, and how to integrate it in a corpus. Using corpus examples, the article describes the development of a robust annotation system for spoken language in the corpus of Video-mediated English as a Lingua Franca Conversations (ViMELF 2018) and illustrates how the system can be used for the study of spoken discourse. The system takes into account previous research on multimodality, transcribes salient nonverbal features in a concise manner, and uses a standard syntax. While such an approach introduces a degree of subjectivity through the criteria of salience and conciseness, the system also offers considerable advantages: it is versatile and adaptable, flexible enough to work with a wide range of multimodal data, and it allows both quantitative and qualitative research on the pragmatics of interaction.




Adolphs, Svenja and Ronald Carter. 2013. Spoken Corpus Linguistics: From Monomodal to Multimodal. London: Routledge.

Allwood, Jens, Loredana Cerrato, Kristina Jokinen, Constanza Navarretta and Patrizia Paggio. 2007. The MUMIN coding scheme for the annotation of feedback, turn management and sequencing phenomena. Language Resources and Evaluation 41: 273–287.

Bezemer, Jeff and Carey Jewitt. 2010. Multimodal Analysis: Key Issues. London: Continuum.

Bressem, Jana, Silva H. Ladewig and Cornelia Müller. 2013. Linguistic Annotation System for Gestures (LASG). In Cornelia Müller, Alan Cienki, Ellen Fricke, Silva Ladewig, David McNeill and Sedinha Tessendorf eds. Body-Language-Communication: An International Handbook on Multimodality in Human Interaction. Berlin: Walter de Gruyter, 1098–1125.

Brunner, Marie-Louise. 2021. Understanding Intercultural Communication: Negotiating Meaning and Identities in English as a Lingua Franca Skype Conversations. Saarbrücken: Saarland University PhD dissertation.

Brunner, Marie-Louise, Stefan Diemer and Selina Schmidt. 2017. “... okay so good luck with that ((laughing))?” - Managing rich data in a corpus of Skype conversations. In Turo Hiltunen, Joe McVeigh and Tanja Säily eds. Big and Rich Data in English Corpus Linguistics: Methods and Explorations [Studies in Variation, Contacts and Change in English 19]. Helsinki: VARIENG. (01 May, 2021.)

Calbris, Geneviève. 2011. Elements of Meaning in Gesture. Amsterdam: John Benjamins.

Carletta, Jean, Simone Ashby, Sebastien Bourban, Mike Flynn, Mael Guillemot, Thomas Hain, Jaroslav Kadlec, Vasilis Karaiskos, Wessel Kraaij, Melissa Kronenthal, Guillaume Lathoud, Mike Lincoln, Agnes Lisowska, Iain McCowan, Wilfried Post, Dennis Reidsma and Pierre Wellner. 2006. The AMI meeting corpus: A pre-announcement. In Steve Renals and Samy Bengio eds. Machine Learning for Multimodal Interaction: Second International Workshop, MLMI 2005, Edinburgh, UK, July 11–13, 2005, Revised Selected Papers (Lecture Notes in Computer Science 3869). Berlin: Springer, 28–39.

Cassell, Justine. 1998. A framework for gesture generation and interpretation. In Roberto Cipolla and Alex Pentland eds. Computer Vision in Human-machine Interaction. Cambridge: Cambridge University Press, 191–215.

Dressler, Richard A. and Roger J. Kreuz. 2000. Transcribing oral discourse: A survey and a model system. Discourse Processes 29/1: 25–36.

Du Bois, John W. 1991. Transcription design principles for spoken discourse research. Pragmatics 1/1: 71–106.

Edwards, Jane A. 1993. Principles and contrasting systems of discourse transcription. In Jane A. Edwards and Martin D. Lampert eds. Talking Data: Transcription and Coding in Discourse Research. Hillsdale: Lawrence Erlbaum Associates, 3–31.

F4transkript. Dr. Dresing & Pehl GmbH. (07 May, 2021.)

Goodwin, Charles. 2000. Action and embodiment within situated human interaction. Journal of Pragmatics 32/10: 1489–1522.

Goodwin, Charles. 2007. Environmentally coupled gestures. In Charles Goodwin, Susan D. Duncan, Justine Cassell and Elena Levy eds. Gesture and the Dynamic Dimensions of Language. Amsterdam: John Benjamins, 195–212.

Goodwin, Marjorie H. and Charles Goodwin. 2000. Emotion within situated activity. In Nancy Budwig, Ina Č. Užgiris and James V. Wertsch eds. Communication: An Arena of Development. Stamford: Greenwood Publishing Group, 33–53.

Hepburn, Alexa and Galina Bolden. 2013. The conversation analytic approach to transcription. In Jack Sidnell and Tanya Stivers eds. The Handbook of Conversation Analysis. Hoboken: John Wiley & Sons, 57–76.

Jefferson, Gail. 1973. A case of precision timing in ordinary conversation: Overlapped tag-positioned address terms in closing sequences. Semiotica 9/1: 47–96.

Joo, Jungseock, Francis F. Steen and Mark Turner. 2017. Red Hen Lab: Dataset and tools for multimodal human communication research. KI-Künstliche Intelligenz 31/4: 357–361.

Kendon, Adam. 1980. Gesticulation and speech: Two aspects of the process of utterance. In Mary Ritchie Key ed. The Relationship of Verbal and Nonverbal Communication. Berlin: Mouton de Gruyter, 207–228.

Kendon, Adam. 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University Press.

Kerswill, Paul and Ann Williams. 2002. “Salience” as an explanatory factor in language change: Evidence from dialect levelling in urban England. In Mari C. Jones and Edith Esch eds. Language Change: The Interplay of Internal, External and Extra-Linguistic Factors. Berlin: Mouton de Gruyter, 81–110.

Kress, Gunther. 2011. Multimodal discourse analysis. In John P. Gee and Michael Handford eds. The Routledge Handbook of Discourse Analysis. London: Routledge, 35–50.

McNeill, David. 1992. Hand and Mind: What Gestures Reveal about Thought. Chicago: University of Chicago Press.

McNeill, David. 2008. Gesture and Thought. Chicago: University of Chicago Press.

McNeill, David. 2017. Brief Introduction to Annotation. (01 May, 2021.)

McNeill, David and Susan Duncan. 2000. Growth points in thinking-for-speaking. In David McNeill ed. Language and Gesture. Cambridge: Cambridge University Press, 141–161.

Mondada, Lorenza. 2014. Pointing, talk, and the bodies. In Mandana Seyfeddinipur and Marianne Gullberg eds. From Gesture in Conversation to Visible Action as Utterance: Essays in Honor of Adam Kendon. Amsterdam: John Benjamins, 95–124.

Norris, Sigrid. 2002. The implication of visual research for discourse analysis: Transcription beyond language. Visual Communication 1/1: 97–121.

Pápay, Kinga, Szilvia Szeghalmy and István Szekrényes. 2011. Hucomtech multimodal corpus annotation. Argumentum 7: 330–347.

Sacks, Harvey, Emanuel A. Schegloff and Gail Jefferson. 1978. A simplest systematics for the organization of turn taking for conversation. In Jim Schenkein ed. Studies in the Organization of Conversational Interaction. New York: Academic Press, 7–55.

Schiel, Florian, Silke Steininger and Ulrich Türk. 2002. The SmartKom Multimodal Corpus at BAS. München: Ludwig-Maximilians Universität München Press.

Scollon, Ron and Philip LeVine. 2004. Multimodal discourse analysis as the confluence of discourse and technology. In Philip LeVine and Ron Scollon eds. Discourse and Technology: Multimodal Discourse Analysis. Washington: Georgetown University Press, 1–6.

Streeck, Jürgen. 2009. Gesturecraft: The Manu-facture of Meaning. Amsterdam: John Benjamins.

ViMELF. 2017a. ViMELF Transcription Conventions. Birkenfeld: Trier University of Applied Sciences. (01 May, 2021.)

ViMELF. 2017b. Transcription of Nonverbal Elements. Birkenfeld: Trier University of Applied Sciences. (01 May, 2021.)

ViMELF. 2018. Corpus of Video-Mediated English as a Lingua Franca Conversations. Birkenfeld: Trier University of Applied Sciences. (01 May, 2021.)

How to Cite
Brunner, M.-L., & Diemer, S. (2021). Multimodal meaning making: The annotation of nonverbal elements in multimodal corpus transcription. Research in Corpus Linguistics, 9(1), 63–88.