Research in Corpus Linguistics 2019-09-26T13:28:27+00:00

<p><em><strong>Research in Corpus Linguistics</strong></em> (<em>RiCL</em>, ISSN 2243-4712) is a peer-reviewed international scholarly journal published annually. It publishes contributions that offer empirical analyses of data from different languages and from different theoretical perspectives and frameworks, with the goal of improving our knowledge of the theoretical linguistic background of a language, a language family, or any type of cross-linguistic phenomenon, construction or assumption. <em>RiCL</em> invites original, previously unpublished research articles, reports on corpus development, and book reviews in the field of Corpus Linguistics. The journal also considers special issues on specific topics, which leading scholars in the field may be invited to edit.</p>

Machine Learning Classification of Pronunciation Difficulty for Learners of English as a Foreign Language 2019-09-26T08:59:33+00:00 Katsunori Kotani, Takehiko Yoshimi <p>This study compiled and assessed a learner corpus to measure the difficulty of pronouncing a sentence (henceforth, pronounceability). A method of measuring pronounceability is useful for computer-assisted learning of English as a Foreign Language that uses online materials as a resource for pronunciation training. An advantage of such materials is that learners can select them according to their interests; a disadvantage is that their pronounceability is unknown to learners. If pronounceability is measured automatically, learners can independently find materials appropriate to their proficiency level without a teacher’s assistance. The pronounceability assessment, based on learners’ subjective judgments on a five-point Likert scale, demonstrated moderate reliability and partial validity. 
Given this reliability and validity, the study developed a pronounceability measuring method that uses a machine learning algorithm to predict the pronounceability of a sentence automatically from the linguistic features of the sentence and from learner features (i.e. learners’ scores on an English proficiency test). The proposed method achieved a higher classification accuracy (53.7 percent) than the majority-class baseline (46.0 percent).</p> 2018-12-31T00:00:00+00:00

Opinion Corpus for Assessment of Study Abroad Program 2019-09-26T09:02:11+00:00 Katsunori Kotani, Takehiko Yoshimi, Mayumi Uchida <p>This study compiled an opinion corpus for developing a method of automatically evaluating a study-abroad program. Such an evaluation should cover not only academic experience at the host institution but also intercultural experience in the dormitory and interpersonal experience with local students, since this helps improve the program. The corpus contains 600 students’ opinions on their satisfaction with academic, intercultural and interpersonal experiences, amounting to 40,024 words in total. Each opinion was automatically annotated with its polarity by an existing sentiment classifier. When the automatically classified polarity was compared with manually determined polarity, the two distributions differed. Because the existing classifier had not been trained on a corpus dealing with students’ opinions about study-abroad programs, this result suggests the need for a corpus dedicated to study-abroad program evaluation. The opinion classifier developed in this study, trained on the opinion corpus, achieved a higher accuracy (83.5 percent) than the majority-class baseline (70.9 percent).</p> 2018-12-31T00:00:00+00:00

'Give it him and then I'll give you money for it.' 
The dative alternation in Contemporary British English 2019-09-26T09:04:07+00:00 Juan Lorente Sánchez <p>‘Dative alternation’ refers to a linguistic phenomenon involving ditransitive verbs, that is, verbs that take a subject and two objects, a theme and a recipient. In English, the phenomenon allows alternation between a prepositional object construction (PREP), where the recipient is encoded as a prepositional phrase (<em>give it to him</em>); a double object construction (DOC), where the recipient precedes the theme (<em>give him it</em>); and an alternative double object construction (altDOC), where the theme precedes the recipient (<em>give it him</em>), the last of which is restricted to dialectal usage. Although this alternation has been extensively addressed in the literature, few studies have considered language-external factors in determining the choice of encoding. This paper analyses the distribution of the competing ditransitive forms in contemporary British English from a twofold perspective: it examines how these variants are distributed across time, and it studies PREP, DOC and altDOC in relation to their sociolinguistic dimension. The corpus used as a source of evidence is the <em>British National Corpus</em>, a 100-million-word collection of written and spoken language from a wide range of sources.</p> 2018-12-31T00:00:00+00:00

Reflexive metadiscourse in a corpus of Spanish bachelor dissertations in EFL 2019-09-26T09:06:36+00:00 Noelia Navarro Gil <p>Academic English has often been described as a reader-oriented discourse, in which the structure, objectives and claims are made explicit and carefully framed. Metadiscourse markers help to build coherence and cohesion and allow writers to guide readers through their texts. Spanish EFL learners often transfer part of their L1 writing culture into their L2 texts. 
This is problematic because academic Spanish tends to show a slightly more reader-responsible style, and academic texts call for a high degree of disciplinarity: learners must be aware not only of the L2 conventions regarding metadiscourse but also of those of their own discipline. This article explores the use of reflexive metadiscourse in a learner corpus of bachelor dissertations written in English by Spanish undergraduates in medicine and linguistics, and compares the results with an expert corpus of research articles. The results show that, overall, both corpora contain similar frequencies of textual metadiscourse, although this similarity holds only when the results are examined by discipline. Despite this quantitative similarity, there are cases of overuse and underuse in the learner corpus that highlight features of the bachelor dissertation genre, on the one hand, and of Spanish EFL writing, on the other.</p> 2018-12-31T00:00:00+00:00

Uptake of corpus tools in the Spanish Higher Education context: a mixed-methods study 2019-09-26T09:09:21+00:00 Pascual Pérez-Paredes, Purificación Sánchez Hernández <p>This paper examines the introduction and use of corpus consultation in a training initiative sponsored by the Professional Training Unit of a medium-sized university in Spain. ‘Introducing Research Articles (RA) Writing’ was a 12-hour module that offered researchers the opportunity to gain insight into the nature of research articles across different disciplines. The researchers (n=25) and the instructors met three times in two-hour sessions over a two-month period. All participants completed two post-task questionnaires and a delayed questionnaire. Interviews were carried out two years after the end of the course. After task 2, 64 percent of the participants found corpus tools to be of great help when writing their research articles. 
No significant differences were found between the B1 and B2–C1 groups in their assessment of the writing tools provided. Increased familiarity with the corpus tools did not result in a better appraisal of these resources, and all participants seemed to favour the curated vocabulary list provided. The delayed questionnaire and subsequent delayed interviews (n=5) revealed that the use of corpora had had limited or no impact on these researchers’ writing practices. We argue that the use of corpora in professional writing contexts requires careful planning as well as continued institutional support.</p> 2018-12-31T00:00:00+00:00

Revisiting you know and I mean: some notes on the functions of the two pragmatic markers in contemporary spoken American English 2019-09-26T09:11:10+00:00 Daniela Pettersson-Traba <p>This article presents a corpus-based study of the pragmatic markers <em>you know</em> and <em>I mean</em> in contemporary spoken American English. Previous research indicates that <em>you know</em> and <em>I mean</em> are polysemous in their discourse roles, serving various functions in speech. Drawing on tokens extracted from the <em>Corpus of Contemporary American English</em>, the <em>Corpus of American Soap Operas</em> and the <em>Corpus of Spoken, Professional American English</em>, which include data from text types that differ in formality and spontaneity, the study aims 1) to compare the use of these two pragmatic markers and 2) to explore whether and how their behavior differs across three text types: TV and radio programs; soap operas; and White House press conferences and faculty/committee meetings. The results demonstrate that, despite overlapping in some of their functions, <em>you know</em> and <em>I mean</em> cannot be used interchangeably in discourse. 
Additionally, the functions of the two pragmatic markers vary significantly across the corpora, owing to the particular characteristics of the speech situations in which they are used.</p> 2018-12-31T00:00:00+00:00

Investigating the impact of structural factors upon that/zero complementizer alternation patterns in verbs of cognition: a diachronic corpus-based multifactorial analysis 2019-09-26T13:28:27+00:00 Christopher Shank, Koen Plevoets <p>This corpus-based study uses stepwise logistic regression to examine the diachronic development of the <em>that</em>/zero alternation with nine verbs of cognition, viz. <em>think</em>, <em>believe</em>, <em>feel</em>, <em>guess</em>, <em>imagine</em>, <em>know</em>, <em>realize</em>, <em>suppose</em> and <em>understand</em>. The data comprise 5,812 tokens of <em>think</em>, 3,056 of <em>believe</em>, 1,273 of <em>feel</em>, 1,885 of <em>guess</em>, 2,225 of <em>imagine</em>, 1,805 of <em>know</em>, 1,244 of <em>realize</em>, 2,836 of <em>suppose</em> and 3,395 of <em>understand</em>, drawn from both spoken and written corpora covering 1580–2012. Taking our cue from previous research suggesting a diachronic increase in the use of the zero complementizer from Late Middle / Early Modern English to Present-day English, we use a large set of parallel spoken and written diachronic data and a rigorous quantitative methodology to test this claim for the nine verbs. In addition, we investigate eleven structural features that have been claimed to predict the use of the zero complementizer, testing both their ‘panchronic’ effects (i.e. aggregated over all time periods) and their diachronic effects. 
The study examines (i) whether there is indeed a diachronic trend towards more zero use; (ii) whether the conditioning factors proposed in the literature indeed predict the zero form; (iii) to what extent these factors interact; and (iv) whether the predictive power of the conditioning factors becomes stronger or weaker over time. Contrary to the belief that the zero form has been increasing, the analysis shows a steady decrease in zero use, although the extent of this decrease differs across verbs. In addition, the interactions with verb type reveal differences between verbs in the predictive power of the conditioning factors. Further significant interactions emerged, notably with verb, mode (i.e. spoken or written data) and period. The interactions with period show that certain factors that are good predictors of the zero form overall lose predictive power over time.</p> 2018-12-31T00:00:00+00:00