The TAGFACT annotator and editor: A versatile tool

Keywords: annotation tool, corpus creation, corpus edition, Spanish journalistic texts

Abstract

This paper presents a multifunctional tool developed within the TAGFACT project, which aims to automate the annotation of factuality (understood as the degree of commitment with which the writer presents situations) in Spanish journalistic texts. The tool supports both the compilation of texts and the manual annotation of predicates. The corpus created with it was compiled in groups of three news pieces covering the same event, each drawn from a newspaper with a different ideological orientation (left-wing, right-wing and centrist). It is made up of 176 different news pieces, containing 1,359 sentences and 46,947 words. So far, the tool has been used to manually annotate a section of the ‘Gold Standard’ (approximately 10,000 words). It has proved versatile in that it supports the creation and management of corpora as well as their annotation, with any set of tags the user chooses to suit the purpose of each corpus.

Published
2020-05-14
How to Cite
Fernández-Montraveta, A., Curell, H., Vázquez, G., & Castellón, I. (2020). The TAGFACT annotator and editor: A versatile tool. Research in Corpus Linguistics, 8(1), 131-146. https://doi.org/10.32714/ricl.08.01.08