The Construction Complexity Calculator (ConPlex): A tool for calculating Nelson’s (2024) construction-based complexity measure

Authors

DOI:

https://doi.org/10.32714/ricl.13.02.05

Keywords:

constructions, complexity, corpus tool creation, corpus tool validation, language development

Abstract

The current study aims to increase the accessibility of Nelson’s (2024) recently suggested construction-based complexity measure by providing a tool that can calculate the measure for single or multiple texts. To validate the tool, complexity scores for the International Corpus Network of Asian Learners of English corpus (ICNALE) were compared with Nelson’s (2024) results. In addition, complexity scores were calculated for a new dataset, the Common European Framework of Reference English Listening Corpus (CEFR), along with the MERLIN corpus, which includes learner writing samples from learners of Czech, German, and Italian. Complexity scores generally increased across CEFR levels in all of the datasets. However, the complexity scores in the current study tend to be higher than the original study due to differences in the sentence splitting approach. The sentence tokenisation method used is deemed to be more appropriate, and it may be concluded that the Construction Complexity Calculator (ConPlex) tool accurately calculates Nelson’s measure. It is hoped that the tool will allow researchers to calculate the complexity of constructions at the text level for a wide range of research purposes.

Downloads

Download data is not yet available.

References

Biber, Douglas, Bethany Gray, Tove Larsson and Shelley Staples. 2024. Grammatical analysis is required to describe grammatical (and “syntactic”) complexity: A commentary on “complexity and difficulty in second language acquisition: A theoretical and methodological overview.” Language Learning. https://doi.org/10.1111/lang.12683

Boyd, Adriane, Jirka Hana, Lionel Nicolas, Detmar Meurers, Katrin Wisniewski, Andrea Abel, Karin Schöne, Barbora Štindlová and Chiara Vettori. 2014. The MERLIN corpus: Learner language and the CEFR. In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk and Stelios Piperidis eds. Proceedings of the Ninth International Conference on Language Resources and Evaluation. Reykjavik: European Language Resources Association, 1281–1288.

Brezina, Vaclav and Gabriele Pallotti. 2019. Morphological complexity in written L2 texts. Second Language Research 35/1: 99–119.

Bulté, Bram, Alex Housen and Gabriele Pallotti. 2024. Complexity and difficulty in second language acquisition: A theoretical and methodological overview. Language Learning. https://doi.org/10.1111/lang.12669

Crossley, Scott, Aron Heintz, Joon Suh Choi, Jordan Batchelor, Mehrnoush Karimi and Agnes Malatinszky. 2023. A large-scaled corpus for assessing text readability. Behavior Research Methods 55/2: 491–507.

Ehret, Katharina, Aleksandrs Berdicevskis, Christian Bentz and Alice Blumenthal-Dramé. 2023. Measuring language complexity: Challenges and opportunities. Linguistics Vanguard 9/s1: 1–8.

Goldberg, Adele E. 2003. Constructions: A new theoretical approach to language. Trends in Cognitive Sciences 7/5: 219–224.

Hawkins, John A. 2015. A Comparative Typology of English and German: Unifying the Contrasts. Abingdon: Routledge.

Hledíková, Hana and Magda Ševčíková. 2024. Conversion in languages with different morphological structures: A semantic comparison of English and Czech. Morphology 34/1: 73–102.

Honnibal, Matthew, Ines Montani, Sofie Van Landeghem, Adriane Boyd and Henning Peters. 2023. Explosion/spaCy: v3.7.2: Fixes for APIs and requirements. Zenodo. https://doi.org/10.5281/ZENODO.1212303

Ishikawa, Shin’ichiro. 2023. The ICNALE Guide: An Introduction to a Learner Corpus Study on Asian Learners’ L2 English. Abingdon: Routledge.

Kettunen, Kimmo. 2014. Can type-token ratio be used to show morphological complexity of languages? Journal of Quantitative Linguistics 21/3: 223–245.

Kyle, Kristopher. 2016. Measuring Syntactic Development in L2 Writing: Fine Grained Indices of Syntactic Complexity and Usage-based Indices of Syntactic Sophistication. Atlanta, GA: Georgia State University dissertation.

Kyle, Kristopher, Scott A. Crossley and Scott Jarvis. 2021. Assessing the validity of lexical diversity indices using direct judgements. Language Assessment Quarterly 18/2: 154–170.

Larsen-Freeman, Diane. 1997. Chaos/complexity science and second language acquisition. Applied Linguistics 18/2: 141–165.

Larsen-Freeman, Diane. 2017. Complexity theory: The lessons continue. In Lourdes Ortega and ZhaoHong Han eds. Complexity Theory and Language Development: In Celebration of Diane Larsen-Freeman. Amsterdam: John Benjamins, 11–50.

Larsen-Freeman, Diane and Lynne Cameron. 2008. Research methodology on language development from a complex systems perspective. The Modern Language Journal 92/2: 200–213.

Lu, Xiaofei. 2017. Automated measurement of syntactic complexity in corpus-based L2 writing research and implications for writing assessment. Language Testing 34/4: 493–511.

Lu, Xiaofei. 2024. Towards greater conceptual clarity in complexity and difficulty: A commentary on “complexity and difficulty in second language acquisition: A theoretical and methodological overview.” Language Learning https://doi.org/10.1111/lang.12688

MacWhinney, Brian. 2000. The CHILDES Project: Tools for Analyzing Talk. Mahwah: Lawrence Erlbaum Associates.

McNamara, Danielle S., Arthur C. Graesser, Philip M. McCarthy and Zhiqiang Cai. 2014. Automated Evaluation of Text and Discourse with Coh-Metrix. New York: Cambridge University Press.

Mizumoto, Atsushi. 2024. Developing and disseminating data analysis tools for open science. In Luke Plonsky ed. Open Science in Applied Linguistics. Online: Applied Linguistics Press, 123–131. https://www.appliedlinguisticspress.org/home/catalog/plonsky_2024

Nelson, Robert. 2024. Using constructions to measure developmental language complexity. Cognitive Linguistics 35/4: 481–511.

Nogolová, Michaela, Radek Čech, Michaela Hanušková and Miroslav Kubát. 2023. The development of sentence and clause lengths in Czech L2 texts. Korpus - gramatika - axiologie 28: 22–37.

Qi, Peng, Yuhao Zhang, Yuhui Zhang, Jason Bolton and Christopher D. Manning. 2020. Stanza: A Python natural language processing toolkit for many human languages. In Asli Celikyilmaz and Tsung-Hsien Wen eds. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. Online: Association for Computational Linguistics, 101–108. https://aclanthology.org/2020.acl-demos.14.pdf

Shannon, Claude, E. 1948. A mathematical theory of communication. The Bell System Technical Journal 27/3: 379–423.

Skehan, Peter. 2009. Modelling second language performance: Integrating complexity, accuracy, fluency, and lexis. Applied Linguistics 30/4: 510–532.

Spina, Stefania. 2025. Complexity and accuracy of verbal morphology in written L2 Italian: The role of proficiency and contingency. International Journal of Learner Corpus Research 11/1: 114–144.

Uchida, Satoru, Yuki Arase and Tomoyuki Kajiwara. 2024. Profiling English sentences based on CEFR levels. ITL - International Journal of Applied Linguistics 175/1: 103–126.

Virtanen, Pauli, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, and Evgeni Burovski. 2020. SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nature Methods 17/3: 261–272.

Weiss, Zarah and Detmar Meurers. 2019. Broad linguistic modeling is beneficial for German L2 proficiency assessment. In Andrea Abel, Aivars Glaznieks, Verena Lyding and Lionel Nicolas eds. Widening the Scope of Learner Corpus Research: Selected Papers from the Fourth Learner Corpus Research Conference. Louvain-la-Neuve: Presses Universitaires de Louvain, 419–435.

Downloads

Published

2025-12-29

How to Cite

R. Cooper, C. (2025). The Construction Complexity Calculator (ConPlex): A tool for calculating Nelson’s (2024) construction-based complexity measure. Research in Corpus Linguistics, 13(2), 124–143. https://doi.org/10.32714/ricl.13.02.05