Building LANA-CASE, a spoken corpus of American English conversation: Challenges and innovations in corpus compilation
DOI:
https://doi.org/10.32714/ricl.12.02.03
Keywords:
spoken corpora, conversation, corpus compilation, LANA-CASE
Abstract
The Lancaster-Northern Arizona Corpus of Spoken American English (LANA-CASE) is a collaborative project between Lancaster University and Northern Arizona University to create a publicly available, large-scale corpus of American English conversation. In this article, we describe the design of LANA-CASE in terms of the challenges that have arisen and how these have been addressed – including decisions related to operationalizing the domain, sampling the data, recruiting participants, and selecting instruments for data collection. In addressing these challenges, we were able to draw on and further develop strategies established in the creation of other spoken corpora (including the British English counterpart to LANA-CASE, the Spoken British National Corpus 2014) as well as to implement recent theoretical and technical innovations related to each step. We hope that this discussion can inform future projects focused on the design and construction of spoken corpora.
References
Aston, Guy and Lou Burnard. 1998. The BNC Handbook: Exploring the British National Corpus with SARA. Edinburgh: Edinburgh University Press.
Biber, Douglas. 1993. Representativeness in corpus design. Literary and Linguistic Computing 8/4: 243–257.
Biber, Douglas, Jesse Egbert, Daniel Keller and Stacey Wizner. 2021. Towards a taxonomy of conversational discourse types: An empirical corpus-based analysis. Journal of Pragmatics 171: 20–35.
Davies, Alan. 1991. The Native Speaker in Applied Linguistics. Edinburgh: Edinburgh University Press.
Dworkin, Jodi, Heather Hessel, Kate Gliske and Jessie H. Rudi. 2016. A comparison of three online recruitment strategies for engaging parents. Family Relations 65/4: 550–561.
Egbert, Jesse, Douglas Biber and Bethany Gray. 2022. Designing and Evaluating Language Corpora: A Practical Framework for Corpus Representativeness. Cambridge: Cambridge University Press.
Farrokhi, Farahman and Asgar Mahmoudi-Hamidabad. 2012. Rethinking convenience sampling: Defining quality criteria. Theory & Practice in Language Studies 2/4: 784–792.
Hanks, Elizabeth. (In preparation). Exploring the register of conversation: Uncovering linguists’ insights about its situational characteristics.
Knight, Dawn, Fernando Loizides, Steven Neale, Laurence Anthony and Irena Spasić. 2021. Developing computational infrastructure for the CorCenCC corpus: The National Corpus of Contemporary Welsh. Language Resources and Evaluation 55: 789–816.
Labov, William. 1997. Linguistics and sociolinguistics. In Nikolas Coupland and Adam Jaworski eds. Sociolinguistics: A Reader. London: Palgrave Macmillan, 23–24.
Leech, Geoffrey. 1993. 100 million words of English. English Today 9/1: 9–15.
Love, Robbie. 2020. Overcoming Challenges in Corpus Construction: The Spoken British National Corpus 2014. New York: Routledge.
Love, Robbie, Claire Dembry, Andrew Hardie, Vaclav Brezina and Tony McEnery. 2017. The Spoken BNC2014: Designing and building a spoken corpus of everyday conversations. International Journal of Corpus Linguistics 22/3: 319–344.
McCarthy, Michael J. 1998. Spoken Language and Applied Linguistics. Cambridge: Cambridge University Press.
McEnery, Tony and Andrew Wilson. 2001. Corpus Linguistics: An Introduction. Edinburgh: Edinburgh University Press.
McEnery, Tony and Gavin Brookes. 2022. Building a written corpus: What are the basics? In Anne O’Keeffe and Michael McCarthy eds. The Routledge Handbook of Corpus Linguistics. London: Routledge, 35–47.
Saha, Koustuv, Pranshu Gupta, Gloria Mark, Emre Kıcıman and Munmun De Choudhury. 2023. Observer effect in social media use. https://doi.org/10.21203/rs.3.rs-2492994/v1
Shirk, Jennifer, Heidi Ballard, Candie Wilderman, Tina Phillips, Andrea Wiggins, Rebecca Jordan, Ellan McCallie, Matthew Minarchek, Bruce Lewenstein, Marianne Krasny and Rick Bonney. 2012. Public participation in scientific research: A framework for deliberate design. Ecology and Society 17/2: 1–20.
Simpson-Vlach, Rita C. and Sheryl Leicher. 2006. The MICASE Handbook: A Resource for Users of the Michigan Corpus of Academic Spoken English. Ann Arbor: University of Michigan Press.
U.S. Census Bureau. n.d. Explore census data. https://data.census.gov/ (June 2022).
Versions
- 2025-02-02 (2)
- 2024-02-29 (1)
License
Submission of your paper to this journal implies that the paper is not under submission for publication elsewhere. Material which has been previously copyrighted, published, or accepted for publication will not be considered for publication in this journal. Submission of a manuscript is interpreted as a statement of certification that no part of the manuscript is copyrighted by any other publisher or under review by any other formal publication. By submitting your manuscript to us, you agree to these copyright guidelines. It is your responsibility to ensure that your manuscript does not cause copyright infringement, defamation, or other problems.
Submitted papers are assumed to contain no proprietary material unprotected by patent or patent application; responsibility for technical content and for protection of proprietary material rests solely with the author(s) and their organizations and is not the responsibility of the journal or its editorial staff. The main author is responsible for ensuring that the article has been seen and approved by all the other authors. It is the responsibility of the author to obtain all necessary copyright release permissions for the use of any copyrighted materials in the manuscript prior to the submission.
Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under the Creative Commons Attribution (CC BY) License, which allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
Article submission implies author agreement with this policy.