(A. Krouska, C. Troussas, M. Virvou) Comparative Evaluation of Algorithms for Sentiment Analysis over Social Networking Services

Abstract: Twitter is a highly popular social networking service and web-based communication platform on which millions of users exchange public messages (tweets) every day, expressing their opinions and feelings on a wide range of issues. As a result, Twitter is one of the largest and most dynamic data sources for data mining and sentiment analysis, and Twitter Sentiment Analysis is a prominent and active research area with significant applications in industry and academia. The purpose of this paper is to provide guidance on selecting the most suitable algorithms for sentiment analysis services. To this end, five well-known learning-based classifiers (Naive Bayes, Support Vector Machine, k-Nearest Neighbor, Logistic Regression and C4.5) and a lexicon-based approach (SentiStrength) are evaluated on the basis of their confusion matrices, using three different datasets (OMD, HCR and STS-Gold) and two test methods (percentage split and cross validation). The results demonstrate the superiority of Naive Bayes and Support Vector Machine regardless of dataset and test method.
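
As a rough illustration of the evaluation setup named in the abstract, the following Python sketch builds a confusion matrix from actual and predicted polarity labels and generates index sets for a percentage split and for cross validation. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the label names, the 70% training fraction, the default of 10 folds and the toy labels are all hypothetical choices made for this example.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels=("positive", "negative")):
    """Rows are actual labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in labels]
            for actual in labels]


def accuracy(matrix):
    """Share of items on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


def percentage_split(n_items, train_fraction=0.7):
    """Single split: the first part trains, the rest tests.
    The 0.7 fraction is an assumption, not a value from the paper."""
    cut = int(n_items * train_fraction)
    indices = list(range(n_items))
    return indices[:cut], indices[cut:]


def k_fold_indices(n_items, k=10):
    """Yield (train, test) index lists; each item is tested exactly once.
    The default k=10 is an assumption, not a value from the paper."""
    folds = [list(range(start, n_items, k)) for start in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Toy labels, invented purely for illustration.
y_true = ["positive", "positive", "negative", "negative", "positive"]
y_pred = ["positive", "negative", "negative", "positive", "positive"]
matrix = confusion_matrix(y_true, y_pred)
print(matrix, accuracy(matrix))
```

In the sketch, a percentage split yields one train/test partition, whereas cross validation rotates the test fold so that every item is used for testing once. In practice the indices would normally be shuffled before splitting.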

Keywords: Sentiment analysis, Twitter, learning machines, lexicon-based classification, polarity detection, social networking services

Categories: H.3, H.3.5, H.4.3, I.7, J.4, M.0