Commenting on local politics: An analysis of YouTube video comments for local government videos

Authors

DOI:

https://doi.org/10.32714/ricl.13.01.02

Keywords:

YouTube, comments, ASR transcripts, local government, transformer models, topic modeling, sentiment analysis

Abstract

This study compares the content of transcripts of videos uploaded by local governments with the comments on those videos, using three transformer-model-based techniques: summarization of the discourse content of video transcripts, topic modeling of summarized transcripts, and sentiment analysis of transcripts and of comments. The analysis shows that some types of video content, for example those dealing with music or education, are more likely to attract positive comments than content related to policing or government meetings. In addition to the potential relevance of these findings for local government outreach, the study may represent a viable exploratory method for comparing online video content and written comments in the context of computational social science analyses of user interaction and commenting behavior.

Published

2024-09-10

How to Cite

Coats, S. (2024). Commenting on local politics: An analysis of YouTube video comments for local government videos. Research in Corpus Linguistics, 13(1), 1–25. https://doi.org/10.32714/ricl.13.01.02