Content-based Information Retrieval by Named Entity Recognition and Verb Semantic Role Labelling
Betina Antony J (Anna University, India)
G. Suryanarayanan Mahalakshmi (Anna University, India)
Abstract: Tamil Siddha medicine, an ancient medicinal system, holds a wide range of untapped information about traditional medicines. In this paper, we explore the Natural Language Processing techniques that can be applied to this syntactically rich corpus. As domain information is mostly concentrated in the central concepts, we begin by identifying the Named Entities and categorizing them. An integrated NER classifier is built, comprising an SVM and a Decision Tree classifier, with an accuracy as high as 95%. These entities play different roles in different contexts; hence their roles are labelled along with the predicates surrounding them. These roles and predicates give rise to a rule-based sentence tagging system, trained by an MEM model, to tag the different contents in this otherwise unstructured text. These two techniques are then combined in our Information Retrieval System, which unites the category tagging done by Named Entity Recognition with the content tagging done by Semantic Role Labelling. The system takes full advantage of the rich features of the language and hence can be extended to other domains.
Keywords: Tamil Siddha medicine, information retrieval, named entity recognition, semantic role labelling
Categories: H.3.1, H.3.3, I.2.7