Combining Psycho-linguistic, Content-based and Chat-based Features to Detect Predation in Chatrooms
Javier Parapar (University of A Coruña, Spain)
David E. Losada (Universidade de Santiago de Compostela, Spain)
Álvaro Barreiro (University of A Coruña, Spain)
Abstract: The Digital Age has brought great benefits for the human race but also some drawbacks. Nowadays, people from opposite corners of the world can communicate online via instant messaging services. Unfortunately, this has introduced new kinds of crime. Sexual predators have adapted their predatory strategies to these platforms and, usually, the target victims are children. The authorities cannot manually track all threats because massive amounts of online conversations take place on a daily basis. Automatic methods for alerting about these crimes need to be designed. This is the main motivation of this paper, where we present a Machine Learning approach to identify suspicious subjects in chatrooms. We propose novel types of features for representing the chatters and we evaluate different classifiers against the largest benchmark available. This empirical validation shows that our approach is promising for the identification of predatory behaviour. Furthermore, we carefully analyse the characteristics of the learnt classifiers. This preliminary analysis is a first step towards profiling the behaviour of sexual predators when chatting on the Internet.
Keywords: cybercrime, machine learning, psycho-linguistic analysis, sexual predation, support vector machines, text mining
Categories: H.3.0, H.4