Lexical simplification in learner translation: A corpus-based approach

Keywords: lexical simplification, learner translation, corpus-based approach, students' translations


The advance of corpus-based methodology in translation studies has greatly enhanced our understanding of the nature of translational language. While most research has focused on identifying the distinctive features of translations carried out by professionals, comparatively few studies have investigated the linguistic features of student translations. In this corpus-based study, we examine whether learner translations produced by Hong Kong students exhibit lexical simplification features vis-à-vis comparable written texts. The study is based on two comparable corpora: the International Corpus of English in Hong Kong (ICE-HK) and the Parallel Learner Translation Corpus (PLTC) compiled at The Hong Kong Polytechnic University. Following Laviosa (1998), we compare four lexical features (lexical density, type-token ratio, core vocabulary coverage, and list head coverage) to investigate whether student translations show a simplification trend. The results indicate that Chinese-to-English learner translation is not lexically simpler than English as a Second Language (ESL) writing; moreover, it is lexically denser than ESL writing. The study offers new insights into learner translation as a form of constrained communication.
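As a rough illustration of three of the four measures named above, the following Python sketch computes type-token ratio, lexical density, and list head coverage for a tokenised text. It is not the procedure used in the study: the function names, the regular-expression tokeniser, the very short function-word list, and the default list head size of 100 are all assumptions made for illustration. Core vocabulary coverage is left out because it depends on a reference frequency list.

```python
import re
from collections import Counter

# Tiny illustrative function-word list (an assumption; a real analysis
# would use a full stop list or part-of-speech tagging instead).
FUNCTION_WORDS = frozenset({
    "the", "a", "an", "of", "in", "on", "to", "and", "or", "is", "are",
    "was", "were", "it", "that", "this", "by", "for", "with", "as", "at",
    "be", "not",
})


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def type_token_ratio(tokens):
    """Number of distinct word forms divided by number of running words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def lexical_density(tokens, function_words=FUNCTION_WORDS):
    """Share of running words that are not in the function-word list."""
    if not tokens:
        return 0.0
    lexical = sum(1 for t in tokens if t not in function_words)
    return lexical / len(tokens)


def list_head_coverage(tokens, head_size=100):
    """Share of running words covered by the head_size most frequent types."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    covered = sum(count for _, count in counts.most_common(head_size))
    return covered / len(tokens)


if __name__ == "__main__":
    sample = tokenize("The cat sat on the mat. The dog sat.")
    print(type_token_ratio(sample))
    print(lexical_density(sample))
    print(list_head_coverage(sample, head_size=2))
```

Raw type-token ratio is sensitive to text length, so comparisons across corpora of different sizes would normally use a standardised variant computed over fixed-size chunks.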



Baker, Mona. 1993. Corpus linguistics and translation studies: Implications and applications. In Mona Baker, Gill Francis and Elena Tognini-Bonelli eds. Text and Technology: In Honour of John Sinclair. Philadelphia: John Benjamins, 233–250.

Baker, Mona. 1995. Corpora in translation studies: An overview and some suggestions for future research. Target 7/2: 223–243.

Baker, Mona. 1996. Corpus-based translation studies: The challenges that lie ahead. In Harold Somers ed. Terminology, LSP and Translation: Studies in Language Engineering in Honour of Juan C. Sager. Philadelphia: John Benjamins, 175–186.

Blum-Kulka, Shoshana and Eddie A. Levenston. 1983. Universals of lexical simplification. In Claus Færch and Gabriele Kasper eds. Strategies in Interlanguage Communication. London: Longman, 119–139.

Bolt, Philip and Kingsley Bolton. 1996. The International Corpus of English in Hong Kong. In Sidney Greenbaum ed. Comparing English Worldwide: The International Corpus of English. Oxford: Clarendon Press, 197–214.

Bowker, Lynne and Peter Bennison. 2003. Student translation archive: Design, development and application. In Federico Zanettin, Silvia Bernardini and Dominic Stewart eds. Corpora in Translator Education. Manchester: St. Jerome Publishing, 103–117.

Bulté, Bram and Alex Housen. 2012. Defining and operationalising L2 complexity. In Alex Housen, Folkert Kuiken and Ineke Vedder eds. Dimensions of L2 Performance and Proficiency: Complexity, Accuracy and Fluency in SLA. Amsterdam: John Benjamins, 21–46.

Carl, Michael and Barbara Dragsted. 2012. Inside the monitor model: Processes of default and challenged translation production. Translation: Corpora, Computation, Cognition 2/1: 127–145.

Chesterman, Andrew. 2004. Hypotheses about translation universals. In Gyde Hansen, Kirsten Malmkjær and Daniel Gile eds. Claims, Changes and Challenges in Translation Studies. Amsterdam: John Benjamins, 1–13.

Crossley, Scott A. and Danielle S. McNamara. 2012. Predicting second language writing proficiency: The roles of cohesion and linguistic sophistication. Journal of Research in Reading 35/2: 115–135.

Duff, Alan. 1981. The Third Language: Recurrent Problems of Translation into English. Oxford: Pergamon Press.

Ferraresi, Adriano, Silvia Bernardini, Maja Petrović and Marie-Aude Lefer. 2018. Simplified or not simplified? The different guises of mediated English at the European parliament. Meta 63/3: 717–738.

Frawley, William. 1984. Translation: Literary, Linguistic, and Philosophical Perspectives. Newark: University of Delaware Press.

Gonzalez, Melanie. 2013. The Intricate Relationship between Measures of Vocabulary Size and Lexical Diversity as Evidenced in Non-native and Native Speaker Academic Compositions. Florida: University of Central Florida dissertation.

Granger, Sylviane and Marie-Aude Lefer. 2020. The Multilingual Student Translation Corpus: A resource for translation teaching and research. Language Resources and Evaluation 54/4: 1183–1199.

Greenbaum, Sidney. 1988. A proposal for an international computerized corpus of English. World Englishes 7/3: 315. https://doi.org/10.1111/j.1467-971X.1988.tb00241.x.

Grosjean, François. 2013. Bilingualism: A short introduction. In François Grosjean and Ping Li eds. The Psycholinguistics of Bilingualism. Oxford: Wiley-Blackwell, 5–25.

House, Juliane. 2015. Translation as Communication across Languages and Cultures. London: Routledge.

House, Juliane and Dániel Z. Kádár. 2021. Introduction. In Dániel Z. Kádár and Juliane House eds. Cross-Cultural Pragmatics. Cambridge: Cambridge University Press, 1–12.

Hu, Kaibao. 2016. Introducing Corpus-based Translation Studies. Heidelberg: Springer.

Hu, Shirong. 2007. A Corpus-based Study of the Translation Strategies Used in the Chinese Translations of Hamlet and Othello. Shanghai: Shanghai Jiao Tong University dissertation.

Jantunen, Jarmo Harri. 2004. Untypical patterns in translations: Issues on corpus methodology and synonymity. In Anna Mauranen and Pekka Kujamäki eds. Translation Universals: Do They Exist? Amsterdam: John Benjamins, 101–126.

Jarvis, Scott. 2002. Short texts, best-fitting curves and new measures of lexical diversity. Language Testing 19/1: 57–84.

Kortmann, Bernd and Benedikt Szmrecsanyi. 2009. World Englishes between simplification and complexification. In Thomas Hoffmann and Lucia Siebers eds. World Englishes – Problems, Properties and Prospects. Amsterdam: John Benjamins, 263–286.

Kruger, Haidee and Bertus van Rooy. 2016. Constrained language: A multidimensional analysis of translated English and a non-native indigenised variety of English. English World-Wide 37/1: 26–57.

Lanstyák, István and Pál Heltai. 2012. Universals in language contact and translation. Across Languages and Cultures 13/1: 99–121.

Laufer, Batia and Paul Nation. 1995. Vocabulary size and use: Lexical richness in L2 written production. Applied Linguistics 16/3: 307–322.

Laviosa, Sara. 1998. Core patterns of lexical use in a comparable corpus of English narrative prose. Meta 43/4: 557–570.

Leech, Geoffrey, Paul Rayson and Andrew Wilson. 2001. Word Frequencies in Written and Spoken English: Based on the British National Corpus. Harlow: Longman.

Li, Defeng. 2002. Translator training: What translation students have to say. Meta 47/4: 513–531.

Liu, Kanglong and Muhammad Afzaal. 2021. Syntactic complexity in translated and non-translated texts: A corpus-based study of simplification. PLOS ONE 16/6: e0253454. https://doi.org/10.1371/journal.pone.0253454.

Liu, Kanglong, Joyce Oiwun Cheung and Nan Zhao. 2022. Learner corpus research in Hong Kong: Past, present and future. Corpora 17/Supplement: 79–97.

Lu, Xiaofei. 2012. The relationship of lexical richness to the quality of ESL learners' oral narratives. The Modern Language Journal 96/2: 190–208.

Manning, Christopher D., Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing toolkit. In Kalina Bontcheva and Jingbo Zhu eds. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations. Baltimore: Association for Computational Linguistics, 55–60.

Mauranen, Anna. 2000. Strange strings in translated language: A study on corpora. In Maeve Olohan ed. Intercultural Faultlines. Research Models in Translation Studies 1: Textual and Cognitive Aspects. Manchester: St. Jerome Publishing, 119–141.

McWhorter, John H. 2011. Linguistic Simplicity and Complexity: Why Do Languages Undress? Berlin: Mouton De Gruyter.

Nasseri, Maryam and Paul Thompson. 2021. Lexical density and diversity in dissertation abstracts: Revisiting English L1 vs. L2 text differences. Assessing Writing 47: 100511. https://doi.org/10.1016/j.asw.2020.100511.

Nelson, Gerald. 2006. The ICE Hong Kong Corpus: User Manual. London: University College London.

Olohan, Maeve and Mona Baker. 2000. Reporting that in translated English: Evidence for subconscious processes of explicitation? Across Languages and Cultures 1/2: 141–158.

Pym, Anthony. 2008. On Toury’s laws of how translators translate. In Anthony Pym, Miriam Shlesinger and Daniel Simeoni eds. Beyond Descriptive Translation Studies: Investigations in Homage to Gideon Toury. Amsterdam: John Benjamins, 311–328.

Pym, Anthony. 2015. Translating as risk management. Journal of Pragmatics 85: 67–80.

Saldanha, Gabriela. 2011. Emphatic italics in English translations: Stylistic failure or motivated stylistic resources? Meta 56/2: 424–442.

Scott, Mike. 2021. WordSmith Tools Version 8.0. Stroud: Lexical Analysis Software.

Toury, Gideon. 2012. Descriptive Translation Studies – and Beyond. Amsterdam: John Benjamins.

Tymoczko, Maria. 1998. Computerized corpora and the future of translation studies. Meta 43/4: 652–660.

Wen, Tinghui. 2009. Simplification as a Recurrent Translation Feature: A Corpus-based Study of Modern Chinese Translated Mystery Fiction in Taiwan. Manchester: University of Manchester dissertation.

Xia, Yun. 2014. Normalization in Translation: Corpus-based Diachronic Research into Twentieth-century English-Chinese Fictional Translation. Newcastle upon Tyne: Cambridge Scholars Publishing.

Xu, Cui and Dechao Li. 2022. Exploring genre variation and simplification in interpreted language from comparable and intermodal perspectives. Babel 68/5: 742–770.

How to Cite
Kwok, H. L., Laviosa, S., & Liu, K. (2023). Lexical simplification in learner translation: A corpus-based approach. Research in Corpus Linguistics, 11(2), 103-124. https://doi.org/10.32714/ricl.11.02.06