Reddit comment analysis: sentiment prediction and topic modeling using VADER and BERTopic
DOI: https://doi.org/10.51359/2965-4661.2024.265074
Keywords: Sentiment Analysis, text mining, Exploratory Data Analysis, Reddit, topic modelling
Abstract
This work explores data analysis techniques applied to comments on the social media platform Reddit, with an emphasis on an Exploratory Data Analysis (EDA) that identifies trends and patterns of interaction among users. Sentiment in the comments is analyzed with VADER (Valence Aware Dictionary and sEntiment Reasoner), and topics are modeled with BERTopic, a topic modeling technique that builds on BERT (Bidirectional Encoder Representations from Transformers) embeddings. The goal is to compare the accuracy and effectiveness of the models in classifying the sentiments and themes expressed in the comments. This comparison shows which approach yields the most accurate results in the context of Reddit discussions and provides insights into user behavior and preferences.
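To make the sentiment step concrete, the sketch below reproduces only the last stage of VADER scoring. In practice the compound score would come from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` in the `vaderSentiment` package. Internally, VADER sums the rule-adjusted lexicon valences of a text and squashes that sum into [-1, 1] with x / sqrt(x² + α), where α = 15. It then commonly labels a comment positive when compound ≥ 0.05, negative when compound ≤ -0.05, and neutral otherwise. The lexicon lookup and heuristic rules are omitted here. The `accuracy` helper, the reference labels and the raw valence sums are hypothetical: the abstract does not say how accuracy was measured.

```python
import math

# Normalization constant used by the reference VADER implementation
# (vaderSentiment) to map summed valences into the range [-1, 1].
ALPHA = 15


def normalize(score: float, alpha: float = ALPHA) -> float:
    """Squash an unbounded sum of word valences into a compound score."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))


def label(compound: float, threshold: float = 0.05) -> str:
    """Conventional VADER cut-offs on the compound score."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def accuracy(predicted: list[str], reference: list[str]) -> float:
    """Hypothetical metric: share of predictions matching reference labels."""
    if len(predicted) != len(reference):
        raise ValueError("label lists must have the same length")
    if not reference:
        return 0.0
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)


if __name__ == "__main__":
    # Hypothetical summed valences for three comments (not data from the study).
    raw_sums = [2.9, 0.0, -1.8]
    predicted = [label(normalize(s)) for s in raw_sums]
    print(predicted)
    # Hypothetical hand-labeled reference for the same three comments.
    print(accuracy(predicted, ["positive", "neutral", "positive"]))
```

Separating the normalization from the labeling makes it easy to vary the ±0.05 threshold, which matters on Reddit, where many short comments score close to zero.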
References
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805.
Giachanou, A., & Crestani, F. (2016). Like it or not: A survey of Twitter sentiment analysis methods. ACM Computing Surveys, 49(2), 1-41.
Hutto, C. J., & Gilbert, E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. Proceedings of the International AAAI Conference on Web and Social Media, 8(1).
Kouloumpis, E., Wilson, T., & Moore, J. (2011). Twitter sentiment analysis: The good the bad and the OMG! Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media.
License
Copyright (c) 2024 Denilson de Oliveira Silva, Richard Matheus Avelino da Silva, Patrícia Virgínia de Santana Lima, Jéssica Cristina Pereira Batista, Sílvio Fernando Alves Xavier Júnior

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Authors who publish with Socioeconomic Analytics retain the copyright of their work and agree to license it under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license. This means that the work can be shared, copied, and redistributed in any medium or format, as long as it is not used for commercial purposes, and the original work is properly cited. The work cannot be changed in any way or used to create derivative works.