How to build a corpus for a tool-based approach to determinologisation in the field of particle physics

Keywords: corpus-building; determinologisation; comparable corpora; tool-based approach; representativeness; textual terminology

Abstract

This paper discusses issues of corpus design and construction that arise when studying a complex, multidimensional linguistic phenomenon such as determinologisation. Representing this phenomenon in corpus data calls for fresh reflection on both the dimensions involved in the determinologisation process and some of the essential concepts of corpus building. In particular, the paper focuses on the need to represent the progressive aspects of determinologisation in the corpus, i.e. across levels of specialisation and over time, and on the practical issues this raises. It also shows that a corpus representative of determinologisation in a specific domain (in this case, particle physics) requires clear and objective criteria for selecting individual texts. Four principles are established to this end. The discussion leads to a proposed text selection procedure designed to ensure that the peculiarities of determinologisation in the domain of particle physics are reflected in the corpus.

Published
2019-11-10
How to Cite
Humbert-Droz, J., Picton, A., & Condamines, A. (2019). How to build a corpus for a tool-based approach to determinologisation in the field of particle physics. Research in Corpus Linguistics, 7, 1-17. https://doi.org/10.32714/ricl.07.01