Machine Learning Classification of Pronunciation Difficulty for Learners of English as a Foreign Language

  • Katsunori Kotani, Kansai Gaidai University
  • Takehiko Yoshimi, Ryukoku University


This study compiled and assessed a learner corpus for measuring the difficulty of pronouncing a sentence (henceforth, pronounceability). A method of measuring pronounceability is useful for computer-assisted language learning of English as a foreign language that uses online materials as a resource for pronunciation training. The advantage of such materials is that learners can select them according to their interests; the disadvantage is that their pronounceability is unknown to learners. If pronounceability can be measured automatically, learners can independently find materials appropriate for their proficiency levels without teachers’ assistance. In the corpus assessment, pronounceability measured by learners’ subjective judgments on a five-point Likert scale demonstrated moderate reliability and partial validity. Given this reliability and validity, the study developed a measuring method that uses a machine learning algorithm to automatically predict the pronounceability of a sentence from the linguistic features of the sentence and the learners’ features (i.e., learners’ scores on an English proficiency test). The proposed method achieved a higher classification accuracy (53.7 percent) than the majority class baseline (46.0 percent).
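For readers unfamiliar with the comparison reported above, the following minimal sketch shows how a majority class baseline is typically computed and compared with a classifier's accuracy. It is an illustration only, not the authors' implementation: the function names and the toy five-point ratings are hypothetical and are not data from the study.

```python
from collections import Counter


def majority_class_baseline(train_labels, test_labels):
    """Return the most frequent training label and the accuracy
    obtained by predicting that label for every test item."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    correct = sum(1 for label in test_labels if label == majority_label)
    return majority_label, correct / len(test_labels)


def accuracy(predicted, actual):
    """Fraction of items whose predicted label equals the actual label."""
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual)


# Hypothetical five-point ratings (assumption: invented for illustration).
train_ratings = [3, 3, 2, 4, 3, 1, 5, 3, 2, 3]
test_ratings = [3, 2, 3, 4, 3]

# Hypothetical classifier output for the same test items.
model_predictions = [3, 2, 3, 3, 3]

baseline_label, baseline_acc = majority_class_baseline(train_ratings, test_ratings)
model_acc = accuracy(model_predictions, test_ratings)

print(f"majority label: {baseline_label}")
print(f"baseline accuracy: {baseline_acc:.3f}")
print(f"model accuracy: {model_acc:.3f}")
```

A classifier is informative only to the extent that its accuracy exceeds this baseline, which is why the abstract reports the two figures side by side.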