Machine Learning Classification of Pronunciation Difficulty for Learners of English as a Foreign Language

  • Katsunori Kotani, Kansai Gaidai University
  • Takehiko Yoshimi, Ryukoku University
Keywords: phonetic learner corpus, English as a Foreign Language, pronunciation difficulty, machine learning classification

Abstract

This study compiled and assessed a learner corpus for measuring how difficult a sentence is to pronounce (henceforth, pronounceability). A method for measuring pronounceability is useful in computer-assisted language learning of English as a Foreign Language when online materials serve as a resource for pronunciation training. The advantage of such materials is that learners can select them according to their interests; the disadvantage is that learners do not know how pronounceable the materials are. If pronounceability is measured automatically, learners can independently access materials appropriate for their proficiency levels without teachers’ assistance. Pronounceability, measured by learners’ subjective judgments on a five-point Likert scale, demonstrated moderate reliability and partial validity. Given this reliability and validity, the study developed a measuring method that uses a machine learning algorithm to predict the pronounceability of a sentence automatically from the linguistic features of the sentence and from learner features (i.e., learners’ scores on an English proficiency test). The proposed method achieved higher classification accuracy (53.7 percent) than the majority class baseline (46.0 percent).
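
The comparison with a majority class baseline can be illustrated with a minimal sketch. It is not the authors' implementation: the function names and the toy Likert ratings are hypothetical, and the snippet only shows how such a baseline is conventionally computed, namely by always predicting the most frequent label in the training data.

```python
from collections import Counter


def majority_class_accuracy(train_labels, test_labels):
    """Accuracy obtained by always predicting the most frequent training label."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == majority_label)
    return hits / len(test_labels)


def accuracy(predicted, gold):
    """Share of predictions that match the gold labels."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Hypothetical five-point Likert ratings (1-5); not data from the study.
train_ratings = [3, 3, 2, 4, 3, 1, 5, 3, 2, 4]
test_ratings = [3, 2, 3, 4, 3]

# Hypothetical classifier output for the test sentences.
classifier_output = [3, 2, 3, 3, 3]

baseline = majority_class_accuracy(train_ratings, test_ratings)
model = accuracy(classifier_output, test_ratings)
print(f"baseline={baseline:.3f} model={model:.3f}")
```

A classifier is informative only to the extent that its accuracy exceeds this baseline, which is the sense in which the 53.7 percent figure is set against 46.0 percent.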

Published
2018-12-31
How to Cite
Kotani, K., & Yoshimi, T. (2018). Machine Learning Classification of Pronunciation Difficulty for Learners of English as a Foreign Language. Research in Corpus Linguistics, 6, 1-8. https://doi.org/10.32714/ricl.06.01
Section
Articles