Using Conjunctions and Adverbs for Author Verification
Daniel Pavelec (Pontifícia Universidade Católica do Paraná, Brazil)
Luiz S. Oliveira (Pontifícia Universidade Católica do Paraná, Brazil)
Edson Justino (Pontifícia Universidade Católica do Paraná, Brazil)
Leonardo V. Batista (Federal University of Paraíba, Brazil)
Abstract: Linguistics and stylistics have been investigated for author identification for quite a while, but recently we have witnessed an impressive growth in the frequency with which lawyers and courts call upon the expertise of linguists in cases of disputed authorship. This motivates computer science researchers to look at the problem of author identification from a different perspective. In this work, we propose a stylometric feature set based on conjunctions and adverbs of the Portuguese language to address the problem of author identification. Two different classification approaches were considered. The first, called writer-independent, reduces the pattern recognition problem to a single model and two classes, and hence makes it possible to build a robust system even when few genuine samples per writer are available. The second, called the personal model or writer-dependent, very often performs better but needs a larger number of samples per writer. Experiments on a database composed of short articles from 30 different authors, with a Support Vector Machine (SVM) as the classifier, demonstrate that the proposed strategy can produce results comparable to those reported in the literature.
Keywords: author verification, pattern recognition
Categories: H.3.7, H.5.4
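
As an illustration of the two ideas summarized in the abstract, the following minimal sketch computes relative frequencies of a few Portuguese conjunctions and adverbs and then forms a dissimilarity vector between two texts. The word list FUNCTION_WORDS, the function names, and the use of an absolute difference between feature vectors are assumptions made for this example; they are not taken from the paper, whose actual feature set and comparison measure may differ.

```python
import re
from collections import Counter

# Hypothetical, very small word list for illustration only;
# it is NOT the feature set proposed in the paper.
FUNCTION_WORDS = ["e", "mas", "ou", "porque", "quando", "também", "já", "sempre"]


def feature_vector(text, words=FUNCTION_WORDS):
    """Relative frequency of each listed word in the text."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [counts[w] / total for w in words]


def dissimilarity(text_a, text_b):
    """Element-wise absolute difference of the two feature vectors.

    Assumption: this is one plausible way to turn a pair of texts into a
    single sample for a two-class (same author / different author) model.
    """
    fa = feature_vector(text_a)
    fb = feature_vector(text_b)
    return [abs(x - y) for x, y in zip(fa, fb)]
```

Under this assumption, each pair of texts yields one vector labeled "same author" or "different author", so a single two-class classifier such as an SVM could be trained for all writers. A writer-dependent variant would instead train one model per author directly on that author's feature vectors.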