The compilation of a developmental spoken English corpus of Turkish EFL learners

Keywords: learner corpus, spoken corpus, corpus compilation, developmental corpus, EFL


Although compiling a spoken learner corpus is not a recent enterprise, developmental spoken learner corpora remain scarce in corpus linguistics. This report describes the compilation of the Yeditepe Spoken Corpus of Learner English (YESCOLE), a 119,787-word corpus of the spoken English of Turkish students at tertiary level. YESCOLE was compiled as a developmental corpus of spoken interlanguage by collecting samples from learners at different English proficiency levels at regular short intervals over seven months. To shed light on the laborious methodology of compiling a developmental spoken learner corpus, this paper describes the steps taken to build YESCOLE and discusses its potential benefits for research and instructional purposes.



How to Cite
Genç-Yöntem, E., & Eveyik-Aydın, E. (2021). The compilation of a developmental spoken English corpus of Turkish EFL learners. Research in Corpus Linguistics, 10(1), 45-62.