Evaluating stance annotation of Twitter data

Keywords: stance-taking, social media discourse, corpus annotation, inter-coder reliability


Taking a stance towards a topic, event or idea is a common phenomenon on Twitter and on social media in general. Twitter users express their opinions about various matters and assess other people's opinions in a range of discursive ways. Identifying and analysing the linguistic means that people use to take different stances leads to a better understanding of language and user behaviour on Twitter. Stance is a multidimensional concept involving a broad range of related notions such as modality, evaluation and sentiment. In this study, we annotate Twitter data using six notional stance categories (contrariety, hypotheticality, necessity, prediction, source of knowledge and uncertainty), following a comprehensive annotation protocol that includes inter-coder reliability measurements. The relatively low agreement between annotators highlighted the challenges of the task and led us to question whether the inter-annotator agreement score is a reliable measure of annotation quality for notional categories. The nature of the data, the difficulty of the stance annotation task and the type of stance categories are discussed, and potential solutions are suggested.
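The abstract does not say which agreement coefficient was used, so the following is only a minimal sketch, assuming Cohen's kappa as a common two-coder example. It illustrates one general way in which an agreement score can understate annotation quality: when a stance category is rare, the agreement expected by chance is already high, so a few disagreements pull kappa down sharply. The function names and the toy labels (`"uncertainty"` vs `"none"`) are hypothetical and are not the study's data.

```python
from collections import Counter
from typing import Sequence


def observed_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Share of items on which the two coders assigned the same label."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two coders: (p_o - p_e) / (1 - p_e).

    p_e is the agreement expected by chance, computed from each coder's
    own label distribution (product of the two marginal proportions,
    summed over all labels either coder used).
    """
    p_o = observed_agreement(a, b)
    n = len(a)
    counts_a, counts_b = Counter(a), Counter(b)
    labels = set(counts_a) | set(counts_b)
    p_e = sum((counts_a[k] / n) * (counts_b[k] / n) for k in labels)
    if p_e == 1.0:
        # Both coders used one and the same label for every item.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy data: 100 tweets, each coded for whether the
# (invented) label "uncertainty" applies. The category is rare:
# each coder marks only 5 tweets, and they overlap on 2 of them.
N_TWEETS = 100
coder_a = ["uncertainty" if i < 5 else "none" for i in range(N_TWEETS)]
coder_b = ["uncertainty" if 3 <= i < 8 else "none" for i in range(N_TWEETS)]

print(f"observed agreement: {observed_agreement(coder_a, coder_b):.2f}")  # 0.94
print(f"Cohen's kappa:      {cohen_kappa(coder_a, coder_b):.2f}")         # 0.37
```

Here the coders agree on 94 of 100 tweets, but chance agreement from the skewed label distributions is already 0.905, so kappa is (0.94 - 0.905) / (1 - 0.905), roughly 0.37. Other coefficients (for example Krippendorff's alpha) correct for chance differently, so this toy case should be read as an illustration of the prevalence effect rather than as a reproduction of the study's measurements.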







How to Cite
Simaki, V., Seitanidi, E., & Paradis, C. (2022). Evaluating stance annotation of Twitter data. Research in Corpus Linguistics, 11(1), 53–80. https://doi.org/10.32714/ricl.11.01.03