(C. Wu, M. Marchese, J. Jiang, A. Ivanyukovich, Y. Liang) Machine Learning-Based Keywords Extraction forScientific Literature

Abstract: With the currently growing interest in the Semantic Web, keywords/metadata extraction is coming to play an increasingly important role. Keywords extraction from documents is a complex task in natural languages processing. Ideally this task concerns sophisticated semantic analysis. However, the complexity of the problem makes current semantic analysis techniques insufficient. Machine learning methods can support the initial phases of keywords extraction and can thus improve the input to further semantic analysis phases. In this paper we propose a machine learning-based keywords extraction for given documents domain, namely scientific literature. More specifically, the least square support vector machine is used as a machine learning method. The proposed method takes the advantages of machine learning techniques and moves the complexity of the task to the process of learning from appropriate samples obtained within a domain. Preliminary experiments show that the proposed method is capable to extract keywords from the domain of scientific literature with promising results.

Keywords: keywords extraction, machine learning, metadata extraction, support vector machine

Categories: H.3.7, H.5.4, M.0, M.7, M.9