The Varieties of English for Specific Purposes dAtabase (VESPA): Towards a multi-L1 and multi-register learner corpus of disciplinary writing

Keywords: learner corpus, learner corpus research, English as a Foreign Language, academic writing, register variation, student writing


The first release of the Varieties of English for Specific Purposes dAtabase (VESPA) is the result of an international corpus compilation project that aims to address the lack of large-scale, open-access, multi-L1, multi-discipline and multi-register learner corpora. This corpus report describes VESPA in detail and illustrates how the corpus can be used to explore register in learner data. It first offers an overview of the makeup of the corpus and of the online interface that can be used to search and download it. It then presents an illustrative study in which multi-dimensional analysis was used to investigate the relative importance of register vis-à-vis other factors in learner academic writing. In the concluding remarks, we identify priorities for future developments in the VESPA project, including the addition of more L1 components, disciplines and registers, as well as the compilation of a comparable corpus of native student writing.




How to Cite
Paquot, M., Larsson, T., Hasselgård, H., Ebeling, S. O., De Meyere, D., Valentin, L., Laso, N. J., Verdaguer, I., & van Vuuren, S. (2022). The Varieties of English for Specific Purposes dAtabase (VESPA): Towards a multi-L1 and multi-register learner corpus of disciplinary writing. Research in Corpus Linguistics, 10(2), 1–15.