https://ricl.aelinco.es/index.php/ricl/issue/feed Research in Corpus Linguistics 2024-10-24T16:34:16+00:00 Research in Corpus Linguistics ojs@aelinco.es Open Journal Systems <p style="text-align: justify;"><em><strong>Research in Corpus Linguistics</strong></em> (<em>RiCL</em>, ISSN 2243-4712) is a scholarly peer-reviewed international scientific journal aiming at the publication of contributions which contain empirical analyses of data from different languages and from different theoretical perspectives and frameworks, with the goal of improving our knowledge about the linguistic theoretical background of a language, a language family or any type of cross-linguistic phenomena/constructions/assumptions. <em>RiCL</em> invites original, previously unpublished research articles, reports on corpus development, and book reviews in the field of Corpus Linguistics. The journal also considers the publication of special issues on specific topics, whose edition can be offered to leading scholars in the field.</p> https://ricl.aelinco.es/index.php/ricl/article/view/413 Introduction: Innovation in spoken corpus linguistics 2024-10-24T16:34:14+00:00 Robbie Love r.love@aston.ac.uk <p style="text-align: justify;">Over the decades, technological advancements have substantially improved the efficiency and scope of spoken corpus compilation, but there remain many challenges ––both practical and theoretical–– that constrain 1) the quality of spoken corpus data, 2) the scale to which spoken corpora can be compiled, and 3) the authenticity with which spoken language is represented in textual form. This special issue presents eight studies which address contemporary innovations in spoken corpus design, data collection, processing, and analysis, covering a range of speech contexts and varieties. The studies focus on registers including online workplace meetings, casual conversation, oral histories, oral proficiency interviews, and <em>YouTube</em> vlogs. 
Innovations include the integration of automated transcription tools, multimodal annotation schemes, creative participant recruitment methods, and developments in natural language processing (NLP). Three contributions offer critical reconceptualisations of traditional approaches to spoken corpus design, proposing strategies to improve the authenticity of spoken corpora.</p> 2024-09-17T15:23:02+00:00 Copyright (c) 2024 Research in Corpus Linguistics https://ricl.aelinco.es/index.php/ricl/article/view/312 “We’ve lost you Ian”: Multi-modal corpus innovations in capturing, processing and analysing professional online spoken interactions 2024-10-24T16:34:15+00:00 Anne O'Keeffe anne.okeeffe@mic.ul.ie Dawn Knight knightd5@cardiff.ac.uk Geraldine Mark markg2@cardiff.ac.uk Christopher Fitzgerald christopher.fitzgerald@mic.ul.ie Justin McNamara justin.mcnamara@mic.ul.ie Svenja Adolphs svenja.adolphs@nottingham.ac.uk Benjamin Cowan benjamin.cowan@ucd.ie Tania Fahey Palma t.faheypalma@abdn.ac.uk Fiona Farr fiona.farr@ul.ie Sandrine Peraldi sandrine.peraldi@ucd.ie <p style="text-align: justify;">Online communication via video platforms has become a standard component of workplace interaction for many businesses and employees. The rapid uptake in the use of virtual meeting platforms due to COVID-19 restrictions meant that many people had to quickly adjust to communication via this medium without much (if any) training as to how workplace communication is successfully facilitated on these platforms. The <em>Interactional Variation Online </em>project aims to analyse a corpus of virtual meetings to gain a multi-modal understanding of this context of language use. This paper describes one component of the project, namely guidelines that can be replicated when constructing a corpus of multi-modal data derived from recordings of online meetings. 
A further aim is to determine typical features of virtual meetings in comparison to face-to-face meetings so as to inform good practice in virtual workplace interactions. By looking at how non-verbal behaviour, such as head movements, gaze, posture, and spoken discourse interact in this medium, we both undertake a holistic analysis of interaction in virtual meetings and produce a template for the development of multi-modal corpora for future analysis.</p> 2024-02-20T00:00:00+00:00 Copyright (c) 2024 Research in Corpus Linguistics https://ricl.aelinco.es/index.php/ricl/article/view/305 Building LANA-CASE, a spoken corpus of American English conversation: Challenges and innovations in corpus compilation 2024-10-24T16:34:15+00:00 Elizabeth Hanks eah472@nau.edu Tony McEnery a.mcenery@lancaster.ac.uk Jesse Egbert jesse.egbert@nau.edu Tove Larsson tove.larsson@nau.edu Douglas Biber douglas.biber@nau.edu Randi Reppen randi.reppen@nau.edu Paul Baker j.p.baker@lancaster.ac.uk Vaclav Brezina v.brezina@lancaster.ac.uk Gavin Brookes g.brookes@lancaster.ac.uk Isobelle Clarke i.clarke@lancaster.ac.uk Raffaella Bottini r.bottini@lancaster.ac.uk <p style="text-align: justify;">The <em>Lancaster-Northern Arizona Corpus of Spoken American English </em>(LANA-CASE) is a collaborative project between Lancaster University and Northern Arizona University to create a publicly available, large-scale corpus of American English conversation. In this article, we describe the design of LANA-CASE in terms of the challenges that have arisen and how these have been addressed – including decisions related to operationalizing the domain, sampling the data, recruiting participants, and selecting instruments for data collection. 
In addressing these challenges, we were able to draw on and further develop strategies established in the creation of other spoken corpora (including the British English counterpart to LANA-CASE, the <em>Spoken British National Corpus 2014</em>) as well as to implement recent theoretical and technical innovations related to each step. We hope that this discussion can inform future projects focused on the design and construction of spoken corpora.</p> 2024-02-29T00:00:00+00:00 Copyright (c) 2024 Research in Corpus Linguistics https://ricl.aelinco.es/index.php/ricl/article/view/319 Compiling a corpus of African American Language from oral histories 2024-10-24T16:34:15+00:00 Sarah Moeller smoeller@ufl.edu Alexis Davis davisa2@ufl.edu Wilermine Previlon wprevilon@ufl.edu Michael Bottini chaelbottini@gmail.com Kevin Tang kevin.tang@hhu.de <p style="text-align: justify;">African American Language (AAL) is a marginalized variety of American English that has been understudied due to a lack of accessible data. This lack of data has made it difficult to research language in African American communities and has been shown to cause emerging technologies such as Automatic Speech Recognition (ASR) to perform worse for African American speakers. To address this gap, the <em>Joel Buchanan Archive of African American Oral History</em> (JBA) at the University of Florida is being compiled into a time-aligned and linguistically annotated corpus. Through Natural Language Processing (NLP) techniques, this project will automatically time-align spoken data with transcripts and automatically tag AAL features. Transcription and time-alignment challenges have arisen as we ensure accuracy in depicting AAL morphosyntactic and phonetic structure. 
Two linguistic studies illustrate how the <em>African American Corpus from Oral Histories </em>betters our understanding of this lesser-studied variety.</p> 2024-04-25T18:18:17+00:00 Copyright (c) 2024 Research in Corpus Linguistics https://ricl.aelinco.es/index.php/ricl/article/view/321 Addressing comparability and retrieval issues in conversation corpora: A case study on the Spoken British National Corpora (1994 and 2014), using the past perfect 2024-10-24T16:34:15+00:00 Nicholas Smith ns359@leicester.ac.uk Cristiano Broccias c.broccias@gmail.com Cathleen Waters dr.cathleen.linguist@gmail.com <p style="text-align: justify;">This paper addresses issues in comparison and analysis of conversation corpora. We focus on the demographically-sampled spoken portions of the <em>British National Corpora </em>(BNC), representing British English in 1994 and 2014, for the purposes of studying recent language change and sociolinguistic variation. Issues of comparability and representativeness of the two BNCs have been raised before (see Love 2020), with several measures taken to ensure backwards compatibility of the Spoken BNC2014 with its 1994 counterpart. However, we believe further considerations and solutions merit attention, relating to sampling, transcription, annotation, and corpus querying. The BNClab subcorpus (Brezina <em>et al.</em> 2018a), a sociolinguistic judgment sample derived from the parent BNCs, provides a very promising basis for analysis, although arguably its mixed geographical representativeness affects cross-time comparability. To address this, we make some proposals for modifying the BNClab subcorpus to improve comparability. Then, we use the modified sample to address issues in retrieval and quantification of grammatical constructions in the spoken BNCs, namely a) determining an appropriate frequency metric, b) retrieving a comprehensive but manageable set of examples from ‘messy’ spoken data, and c) handling transcription inaccuracies. 
Finally, we discuss the case study findings and wider methodological implications for users of these corpora.</p> 2024-05-13T20:12:49+00:00 Copyright (c) 2024 Research in Corpus Linguistics https://ricl.aelinco.es/index.php/ricl/article/view/324 Rethinking interviews as representations of spoken language in learner corpora 2024-10-24T16:34:15+00:00 Pascual Pérez-Paredes pascualf@um.es Geraldine Mark markg2@cardiff.ac.uk <p style="text-align: justify;">Following the call to examine the role of learner corpora in SLA research (Bell and Payant 2021), this paper discusses spoken learner corpora ––specifically those collected through interviews–– and considers the aspects of spoken learner language that they represent. The interview is both an elicitation technique and a complex genre. The overlapping of the two conceptualisations under the same term may give rise to problems of definition about the nature of the language collected and, as a consequence, to difficulties in interpretation when assessing the characteristics of spoken learner data. In this paper, we use original research to exemplify some of the areas that need some rethinking in terms of future reconceptualisation about how spoken data are collected and analysed. 
This research shows the potential impact of the degree of interviewer/interviewee engagement with the task, suggesting that not enough attention has been paid to the genre of interview in learner corpus research.</p> 2024-06-10T17:16:28+00:00 Copyright (c) 2024 Research in Corpus Linguistics https://ricl.aelinco.es/index.php/ricl/article/view/323 Developing a coding scheme for annotating opinion statements in L2 interactive spoken English with application for language teaching and assessment 2024-10-24T16:34:15+00:00 Yejin Jung y.jung@lancaster.ac.uk Dana Gablasova d.gablasova@lancaster.ac.uk Vaclav Brezina v.brezina@lancaster.ac.uk Hanna Schmück h.schmueck@lancaster.ac.uk <p style="text-align: justify;">Evaluative meanings are known to be difficult to identify and quantify in corpus data (Hunston 2004). The research in this area has largely drawn on the annotating schemes offered by the frameworks of Appraisal (Read and Carroll 2012; Fuoli 2018) or stance (Simaki <em>et al.</em> 2019). However, these annotation schemes have been applied predominantly to written production and to first language use. This study, therefore, proposes an annotation scheme for identifying and classifying linguistic expressions of opinion with particular application for second language (L2) language teaching and language assessment contexts. In addition, the coding scheme also specifically deals with spoken interactive communication, with particular attention paid to aspects such as the co-construction of opinion statements (Horvath and Eggins 1995). 
The paper outlines the components of the coding scheme along with their theoretical underpinning, addresses some of the challenges in applying the codes and annotating real-life data, and discusses future possibilities and considerations related to the application of the coding scheme.</p> 2024-07-02T07:00:59+00:00 Copyright (c) 2024 Research in Corpus Linguistics https://ricl.aelinco.es/index.php/ricl/article/view/332 Corpus as a slice of life: Representing naturally occurring language and its speakers 2024-10-24T16:34:16+00:00 Giorgia Troiani gtroiani@ucsb.edu John W. Du Bois dubois@linguistics.ucsb.edu Andrey Filchenko andrey.filchenko@nu.edu.kz <p style="text-align: justify;">Discourse is subject to numerous forces that shape its form. One force that is underestimated is the interactional dynamic among interlocutors. In devising the criteria that inform data selection for a corpus of spoken discourse, designers may end up prioritizing the collection of spontaneous discourse and overlook the fact that this type of discourse can still display artificial interactional dynamics. We propose an approach to spoken corpus compilation that aims at preserving naturally occurring interactional dynamics by choosing as focus of the corpus the representation of participants’ lives. Through the analysis of speech events collected in different projects, we demonstrate the advantages of sourcing naturally occurring discourse over spontaneous data. We then discuss a series of practices that the authors implemented in different contexts to ensure the collection of naturally occurring data. 
We argue that this framework yields the construction of corpora that are representative not only of a language, but also of the lives of its users.</p> 2024-06-26T17:52:04+00:00 Copyright (c) 2024 Research in Corpus Linguistics https://ricl.aelinco.es/index.php/ricl/article/view/304 Design and construction of a social media corpus: Influencers’ speech in vlogs 2024-10-24T16:34:16+00:00 Hülya Mısır hulyamsr@gmail.com <p style="text-align: justify;">This article outlines the creation of a social media corpus of Turkish vlogs on <em>YouTube</em>, aimed at analyzing the translanguaging practices and multimodal communication of Turkish social media influencers. It firstly describes the process of constructing the corpus, including transcription conventions and <em>ad hoc</em> annotation. The article then analyzes the phenomenon of translanguaging, with an emphasis on its prevalent forms and modes. Given the challenges associated with compiling a multimodally rich social media corpus, this paper provides strategies for manually transcribing and annotating linguistic and semiotic features in <em>ELAN</em> software, as well as strategies for managing tier-based annotations for vlog datasets. Additionally, the study presents approaches for handling non-standard linguistic codes and marked occurrences in language contact zones, illustrated through examples drawn from the vlog corpus where Turkish serves as the standard code.</p> 2024-06-30T22:15:07+00:00 Copyright (c) 2024 Research in Corpus Linguistics https://ricl.aelinco.es/index.php/ricl/article/view/363 Review of Gillings, Mathew, Gerlinde Mautner and Paul Baker. 2023. Corpus-Assisted Discourse Studies. Cambridge: Cambridge University Press. 2024-10-24T16:34:16+00:00 Tamsin Parnell tamsin.parnell2@nottingham.ac.uk 2024-06-27T00:00:00+00:00 Copyright (c) 2024 Research in Corpus Linguistics https://ricl.aelinco.es/index.php/ricl/article/view/379 Review of Brookes, Gavin and Luke C. Collins. 2023. 
Corpus Linguistics for Health Communication: A Guide for Research. London: Routledge. 2024-10-24T16:34:16+00:00 Ovidia Martínez Sánchez ovidia.martinez@ua.es 2024-06-27T00:00:00+00:00 Copyright (c) 2024 Research in Corpus Linguistics https://ricl.aelinco.es/index.php/ricl/article/view/373 Review of Pettersson-Traba, Daniela. 2022. The Development of the Concept of SMELL in American English. A Usage-Based View of Near-Synonymy. Berlin: De Gruyter Mouton. 2024-10-24T16:34:16+00:00 Daniel Granados-Meroño daniel.granadosm@um.es 2024-07-12T06:41:44+00:00 Copyright (c) 2024 Research in Corpus Linguistics https://ricl.aelinco.es/index.php/ricl/article/view/406 Review of Izquierdo, Marlén and Zuriñe Sanz-Villar eds. 2023. Corpus Use in Cross-linguistic Research: Paving the Way for Teaching, Translation and Professional Communication. Amsterdam: John Benjamins. 2024-10-24T16:34:16+00:00 Isabel Pizarro-Sánchez isabel.pizarro@uva.es 2024-09-03T09:55:56+00:00 Copyright (c) 2024 Research in Corpus Linguistics https://ricl.aelinco.es/index.php/ricl/article/view/423 Review of Viana, Vander ed. 2023. Teaching English with Corpora: A Resource Book. London: Routledge. 2024-10-24T16:34:16+00:00 Gaëtanelle Gilquin gaetanelle.gilquin@uclouvain.be 2024-10-22T05:56:19+00:00 Copyright (c) 2024 Research in Corpus Linguistics