|  | Content-based Information Retrieval by Named Entity Recognition and Verb Semantic Role Labelling
               Betina Antony J (Anna University, India)
 
               G. Suryanarayanan Mahalakshmi (Anna University, India)
 
               Abstract: Tamil Siddha medicine, an ancient medicinal system, has yielded a wide range of untapped information about traditional medicines. In this paper, we explore the various Natural Language Processing techniques that can be applied to this syntactically rich corpus. Because domain information is concentrated mostly in the central concepts, we start our work by identifying the Named Entities and categorizing them. An integrated NER classifier is built, comprising an SVM and a Decision Tree classifier, with an accuracy as high as 95%. These entities play different roles in different contexts. Hence their roles are labelled along with the predicates surrounding them. These roles and predicates give rise to a rule-based sentence tagging system, trained by an MEM model, to tag different contents in this otherwise unstructured text. These two techniques are then exploited to develop our Information Retrieval System, which combines category tagging done by Named Entity Recognition with content tagging done by Semantic Role Labelling. The system takes full advantage of the rich features of the language and hence can be extended to other domains.
             
              Keywords: Tamil Siddha medicine, information retrieval, named entity recognition, semantic role labelling 
             Categories: H.3.1, H.3.3, I.2.7  |