Research in Corpus Linguistics https://ricl.aelinco.es/index.php/ricl <p style="text-align: justify;"><em><strong>Research in Corpus Linguistics</strong></em> (<em>RiCL</em>, ISSN 2243-4712) is a scholarly peer-reviewed international scientific journal aiming at the publication of contributions which contain empirical analyses of data from different languages and from different theoretical perspectives and frameworks, with the goal of improving our knowledge about the linguistic theoretical background of a language, a language family or any type of cross-linguistic phenomena/constructions/assumptions. <em>RiCL</em> invites original, previously unpublished research articles, reports on corpus development, and book reviews in the field of Corpus Linguistics. The journal also considers the publication of special issues on specific topics, whose edition can be offered to leading scholars in the field.</p> AELINCO (Spanish Association for Corpus Linguistics) en-US Research in Corpus Linguistics 2243-4712 <p><a href="https://ricl.aelinco.es/index.php/ricl/copyright-notice" target="_blank" rel="noopener">Copyright notice</a></p> Introduction: Innovation in spoken corpus linguistics https://ricl.aelinco.es/index.php/ricl/article/view/413 <p style="text-align: justify;">Over the decades, technological advancements have substantially improved the efficiency and scope of spoken corpus compilation, but there remain many challenges ––both practical and theoretical–– that constrain 1) the quality of spoken corpus data, 2) the scale to which spoken corpora can be compiled, and 3) the authenticity with which spoken language is represented in textual form. This special issue presents eight studies which address contemporary innovations in spoken corpus design, data collection, processing, and analysis, covering a range of speech contexts and varieties. 
The studies focus on registers including online workplace meetings, casual conversation, oral histories, oral proficiency interviews, and <em>YouTube</em> vlogs. Innovations include the integration of automated transcription tools, multimodal annotation schemes, creative participant recruitment methods, and developments in natural language processing (NLP). Three contributions offer critical reconceptualisations of traditional approaches to spoken corpus design, proposing strategies to improve the authenticity of spoken corpora.</p> Robbie Love Copyright (c) 2024 Research in Corpus Linguistics http://creativecommons.org/licenses/by/4.0 2024-09-17 2024-09-17 12 2 i–viii i–viii 10.32714/ricl.12.02.01 “We’ve lost you Ian”: Multi-modal corpus innovations in capturing, processing and analysing professional online spoken interactions https://ricl.aelinco.es/index.php/ricl/article/view/312 <p style="text-align: justify;">Online communication via video platforms has become a standard component of workplace interaction for many businesses and employees. The rapid uptake in the use of virtual meeting platforms due to COVID-19 restrictions meant that many people had to quickly adjust to communication via this medium without much (if any) training as to how workplace communication is successfully facilitated on these platforms. The <em>Interactional Variation Online</em> project aims to analyse a corpus of virtual meetings to gain a multi-modal understanding of this context of language use. This paper describes one component of the project, namely guidelines that can be replicated when constructing a corpus of multi-modal data derived from recordings of online meetings. A further aim is to determine typical features of virtual meetings in comparison to face-to-face meetings so as to inform good practice in virtual workplace interactions. 
By looking at how non-verbal behaviour (such as head movements, gaze, and posture) and spoken discourse interact in this medium, we both undertake a holistic analysis of interaction in virtual meetings and produce a template for the development of multi-modal corpora for future analysis.</p> Anne O'Keeffe Dawn Knight Geraldine Mark Christopher Fitzgerald Justin McNamara Svenja Adolphs Benjamin Cowan Tania Fahey Palma Fiona Farr Sandrine Peraldi Copyright (c) 2024 Research in Corpus Linguistics http://creativecommons.org/licenses/by/4.0 2024-02-20 2024-02-20 12 2 1–23 1–23 10.32714/ricl.12.02.02 Building LANA-CASE, a spoken corpus of American English conversation: Challenges and innovations in corpus compilation https://ricl.aelinco.es/index.php/ricl/article/view/305 <p style="text-align: justify;">The <em>Lancaster-Northern Arizona Corpus of Spoken American English</em> (LANA-CASE) is a collaborative project between Lancaster University and Northern Arizona University to create a publicly available, large-scale corpus of American English conversation. In this article, we describe the design of LANA-CASE in terms of the challenges that have arisen and how these have been addressed – including decisions related to operationalizing the domain, sampling the data, recruiting participants, and selecting instruments for data collection. In addressing these challenges, we were able to draw on and further develop strategies established in the creation of other spoken corpora (including the British English counterpart to LANA-CASE, the <em>Spoken British National Corpus 2014</em>) as well as to implement recent theoretical and technical innovations related to each step. 
We hope that this discussion can inform future projects focused on the design and construction of spoken corpora.</p> Elizabeth Hanks Tony McEnery Jesse Egbert Tove Larsson Douglas Biber Randi Reppen Paul Baker Vaclav Brezina Gavin Brookes Isobelle Clarke Raffaella Bottini Copyright (c) 2024 Research in Corpus Linguistics http://creativecommons.org/licenses/by/4.0 2024-02-29 2024-02-29 12 2 24–44 24–44 10.32714/ricl.12.02.03 Compiling a corpus of African American Language from oral histories https://ricl.aelinco.es/index.php/ricl/article/view/319 <p style="text-align: justify;">African American Language (AAL) is a marginalized variety of American English that has been understudied due to a lack of accessible data. This lack of data has made it difficult to research language in African American communities and has been shown to cause emerging technologies such as Automatic Speech Recognition (ASR) to perform worse for African American speakers. To address this gap, the <em>Joel Buchanan Archive of African American Oral History</em> (JBA) at the University of Florida is being compiled into a time-aligned and linguistically annotated corpus. Through Natural Language Processing (NLP) techniques, this project will automatically time-align spoken data with transcripts and automatically tag AAL features. Transcription and time-alignment challenges have arisen as we ensure accuracy in depicting AAL morphosyntactic and phonetic structure. 
Two linguistic studies illustrate how the <em>African American Corpus from Oral Histories</em> betters our understanding of this lesser-studied variety.</p> Sarah Moeller Alexis Davis Wilermine Previlon Michael Bottini Kevin Tang Copyright (c) 2024 Research in Corpus Linguistics http://creativecommons.org/licenses/by/4.0 2024-04-25 2024-04-25 12 2 45–79 45–79 10.32714/ricl.12.02.04 Addressing comparability and retrieval issues in conversation corpora: A case study on the Spoken British National Corpora (1994 and 2014), using the past perfect https://ricl.aelinco.es/index.php/ricl/article/view/321 <p style="text-align: justify;">This paper addresses issues in comparison and analysis of conversation corpora. We focus on the demographically-sampled spoken portions of the <em>British National Corpora</em> (BNC), representing British English in 1994 and 2014, for the purposes of studying recent language change and sociolinguistic variation. Issues of comparability and representativeness of the two BNCs have been raised before (see Love 2020), with several measures taken to ensure backwards compatibility of the Spoken BNC2014 with its 1994 counterpart. However, we believe further considerations and solutions merit attention, relating to sampling, transcription, annotation, and corpus querying. The BNClab subcorpus (Brezina <em>et al.</em> 2018a), a sociolinguistic judgment sample derived from the parent BNCs, provides a very promising basis for analysis, although arguably its mixed geographical representativeness affects cross-time comparability. To address this, we make some proposals for modifying the BNClab subcorpus to improve comparability. Then, we use the modified sample to address issues in retrieval and quantification of grammatical constructions in the spoken BNCs, namely a) determining an appropriate frequency metric, b) retrieving a comprehensive but manageable set of examples from ‘messy’ spoken data, and c) handling transcription inaccuracies. 
Finally, we discuss the case study findings and wider methodological implications for users of these corpora.</p> Nicholas Smith Cristiano Broccias Cathleen Waters Copyright (c) 2024 Research in Corpus Linguistics http://creativecommons.org/licenses/by/4.0 2024-05-13 2024-05-13 12 2 80–110 80–110 10.32714/ricl.12.02.05 Rethinking interviews as representations of spoken language in learner corpora https://ricl.aelinco.es/index.php/ricl/article/view/324 <p style="text-align: justify;">Following the call to examine the role of learner corpora in SLA research (Bell and Payant 2021), this paper discusses spoken learner corpora ––specifically those collected through interviews–– and considers the aspects of spoken learner language that they represent. The interview is both an elicitation technique and a complex genre. The overlapping of the two conceptualisations under the same term may give rise to problems of definition about the nature of the language collected and, as a consequence, to difficulties in interpretation when assessing the characteristics of spoken learner data. In this paper, we use original research to exemplify some of the areas that need some rethinking in terms of future reconceptualisation about how spoken data are collected and analysed. 
This research shows the potential impact of the degree of interviewer/interviewee engagement with the task, suggesting that not enough attention has been paid to the genre of interview in learner corpus research.</p> Pascual Pérez-Paredes Geraldine Mark Copyright (c) 2024 Research in Corpus Linguistics http://creativecommons.org/licenses/by/4.0 2024-06-10 2024-06-10 12 2 111–145 111–145 10.32714/ricl.12.02.06 Developing a coding scheme for annotating opinion statements in L2 interactive spoken English with application for language teaching and assessment https://ricl.aelinco.es/index.php/ricl/article/view/323 <p style="text-align: justify;">Evaluative meanings are known to be difficult to identify and quantify in corpus data (Hunston 2004). The research in this area has largely drawn on the annotation schemes offered by the frameworks of Appraisal (Read and Carroll 2012; Fuoli 2018) or stance (Simaki <em>et al.</em> 2019). However, these annotation schemes have been applied predominantly to written production and to first language use. This study, therefore, proposes an annotation scheme for identifying and classifying linguistic expressions of opinion with particular application for second language (L2) teaching and language assessment contexts. In addition, the coding scheme specifically deals with spoken interactive communication, with particular attention paid to aspects such as the co-construction of opinion statements (Horvath and Eggins 1995). 
The paper outlines the components of the coding scheme along with their theoretical underpinning, addresses some of the challenges in applying the codes and annotating real-life data, and discusses future possibilities and considerations related to the application of the coding scheme.</p> Yejin Jung Dana Gablasova Vaclav Brezina Hanna Schmück Copyright (c) 2024 Research in Corpus Linguistics http://creativecommons.org/licenses/by/4.0 2024-07-02 2024-07-02 12 2 146–173 146–173 10.32714/ricl.12.02.07 Corpus as a slice of life: Representing naturally occurring language and its speakers https://ricl.aelinco.es/index.php/ricl/article/view/332 <p style="text-align: justify;">Discourse is subject to numerous forces that shape its form. One force that is underestimated is the interactional dynamic among interlocutors. In devising the criteria that inform data selection for a corpus of spoken discourse, designers may end up prioritizing the collection of spontaneous discourse and overlook the fact that this type of discourse can still display artificial interactional dynamics. We propose an approach to spoken corpus compilation that aims at preserving naturally occurring interactional dynamics by choosing as focus of the corpus the representation of participants’ lives. Through the analysis of speech events collected in different projects, we demonstrate the advantages of sourcing naturally occurring discourse over spontaneous data. We then discuss a series of practices that the authors implemented in different contexts to ensure the collection of naturally occurring data. We argue that this framework yields the construction of corpora that are representative not only of a language, but also of the lives of its users.</p> Giorgia Troiani John W. 
Du Bois Andrey Filchenko Copyright (c) 2024 Research in Corpus Linguistics http://creativecommons.org/licenses/by/4.0 2024-06-26 2024-06-26 12 2 174–202 174–202 10.32714/ricl.12.02.08 Design and construction of a social media corpus: Influencers’ speech in vlogs https://ricl.aelinco.es/index.php/ricl/article/view/304 <p style="text-align: justify;">This article outlines the creation of a social media corpus of Turkish vlogs on <em>YouTube</em>, aimed at analyzing the translanguaging practices and multimodal communication of Turkish social media influencers. It firstly describes the process of constructing the corpus, including transcription conventions and <em>ad hoc</em> annotation. The article then analyzes the phenomenon of translanguaging, with an emphasis on its prevalent forms and modes. Given the challenges associated with compiling a multimodally rich social media corpus, this paper provides strategies for manually transcribing and annotating linguistic and semiotic features in <em>ELAN</em> software, as well as strategies for managing tier-based annotations for vlog datasets. Additionally, the study presents approaches for handling non-standard linguistic codes and marked occurrences in language contact zones, illustrated through examples drawn from the vlog corpus where Turkish serves as the standard code.</p> Hülya Mısır Copyright (c) 2024 Research in Corpus Linguistics http://creativecommons.org/licenses/by/4.0 2024-06-30 2024-06-30 12 2 203–219 203–219 10.32714/ricl.12.02.09 Review of Gillings, Mathew, Gerlinde Mautner and Paul Baker. 2023. Corpus-Assisted Discourse Studies. Cambridge: Cambridge University Press. https://ricl.aelinco.es/index.php/ricl/article/view/363 Tamsin Parnell Copyright (c) 2024 Research in Corpus Linguistics http://creativecommons.org/licenses/by/4.0 2024-06-27 2024-06-27 12 2 220–225 220–225 10.32714/ricl.12.02.10 Review of Brookes, Gavin and Luke C. Collins. 2023. 
Corpus Linguistics for Health Communication: A Guide for Research. London: Routledge. https://ricl.aelinco.es/index.php/ricl/article/view/379 Ovidia Martínez Sánchez Copyright (c) 2024 Research in Corpus Linguistics http://creativecommons.org/licenses/by/4.0 2024-06-27 2024-06-27 12 2 226–233 226–233 10.32714/ricl.12.02.11 Review of Pettersson-Traba, Daniela. 2022. The Development of the Concept of SMELL in American English. A Usage-Based View of Near-Synonymy. Berlin: De Gruyter Mouton. https://ricl.aelinco.es/index.php/ricl/article/view/373 Daniel Granados-Meroño Copyright (c) 2024 Research in Corpus Linguistics http://creativecommons.org/licenses/by/4.0 2024-07-12 2024-07-12 12 2 234–243 234–243 10.32714/ricl.12.02.12 Review of Izquierdo, Marlén and Zuriñe Sanz-Villar eds. 2023. Corpus Use in Cross-linguistic Research: Paving the Way for Teaching, Translation and Professional Communication. Amsterdam: John Benjamins. https://ricl.aelinco.es/index.php/ricl/article/view/406 Isabel Pizarro-Sánchez Copyright (c) 2024 Research in Corpus Linguistics http://creativecommons.org/licenses/by/4.0 2024-09-03 2024-09-03 12 2 244–252 244–252 10.32714/ricl.12.02.13 Review of Viana, Vander ed. 2023. Teaching English with Corpora: A Resource Book. London: Routledge. https://ricl.aelinco.es/index.php/ricl/article/view/423 Gaëtanelle Gilquin Copyright (c) 2024 Research in Corpus Linguistics http://creativecommons.org/licenses/by/4.0 2024-10-22 2024-10-22 12 2 253–259 253–259 10.32714/ricl.12.02.14