Building the Great Recession News Corpus (GRNC): A contemporary diachronic corpus of economy news in English

Keywords: corpus linguistics, financial discourse, crisis studies, information retrieval, sentiment analysis

Abstract

This paper describes the development of the Great Recession News Corpus (GRNC), a specialized web corpus containing a wide range of written texts obtained from the Business sections of The Guardian and The New York Times between 2007 and 2015. The corpus was compiled as the main resource for a sentiment analysis project in the economic/financial domain. We describe its design, compilation criteria and methodological approach, together with the overall creation process. Although the corpus can be used for a variety of purposes, we include a sentiment analysis study of how the sentiment conveyed by the word credit evolved during the years of the Great Recession, which we believe helps validate the corpus.

Published
2020-07-10
How to Cite
Fernández-Cruz, J., & Moreno-Ortiz, A. (2020). Building the Great Recession News Corpus (GRNC): A contemporary diachronic corpus of economy news in English. Research in Corpus Linguistics, 8(2), 28-45. https://doi.org/10.32714/ricl.08.02.02