Volume 13 / Issue 10

DOI:   10.3217/jucs-013-10-1471


Machine Learning-Based Keywords Extraction for Scientific Literature

Chunguo Wu (Jilin University and Beijing Jiaotong University, China)

Maurizio Marchese (University of Trento, Italy)

Jingqing Jiang (Jilin University, China)

Alexander Ivanyukovich (University of Trento, Italy)

Yanchun Liang (Jilin University, China)

Abstract: With growing interest in the Semantic Web, keyword and metadata extraction plays an increasingly important role. Keyword extraction from documents is a complex task in natural language processing. Ideally, this task calls for sophisticated semantic analysis. However, the complexity of the problem makes current semantic analysis techniques insufficient. Machine learning methods can support the initial phases of keyword extraction and can thus improve the input to subsequent semantic analysis phases. In this paper we propose a machine learning-based keyword extraction method for a given document domain, namely scientific literature. More specifically, the least squares support vector machine is used as the machine learning method. The proposed method takes advantage of machine learning techniques and shifts the complexity of the task to the process of learning from appropriate samples obtained within the domain. Preliminary experiments show that the proposed method is capable of extracting keywords from the domain of scientific literature with promising results.

Keywords: keywords extraction, machine learning, metadata extraction, support vector machine

Categories: H.3.7, H.5.4, M.0, M.7, M.9
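
The abstract names the least squares support vector machine as the learning method. As a rough illustration of the general technique (not the authors' implementation), the sketch below trains a standard LS-SVM classifier by solving its single linear system and then labels candidate terms as keyword (+1) or non-keyword (-1). The two features (normalized term frequency and relative first-occurrence position), the toy data, the RBF kernel, and the values of `gamma` and `sigma` are all assumptions made for illustration; the abstract does not specify any of them.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dist = (np.sum(A**2, axis=1)[:, None]
               + np.sum(B**2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return np.exp(-sq_dist / (2.0 * sigma**2))

def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    """Solve the LS-SVM classifier system
        [ 0   y^T             ] [ b     ]   [ 0 ]
        [ y   Omega + I/gamma ] [ alpha ] = [ 1 ]
    with Omega_ij = y_i * y_j * K(x_i, x_j)."""
    n = X.shape[0]
    omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    solution = np.linalg.solve(A, rhs)
    return solution[0], solution[1:]  # bias b, multipliers alpha

def lssvm_predict(X_train, y_train, b, alpha, X_new, sigma=1.0):
    """Decision rule: sign(sum_i alpha_i * y_i * K(x, x_i) + b)."""
    K = rbf_kernel(X_new, X_train, sigma)
    return np.sign(K @ (alpha * y_train) + b)

# Hypothetical toy data: each row is a candidate term described by
# (normalized term frequency, relative position of first occurrence).
X_train = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.1],
                    [0.1, 0.8], [0.2, 0.9], [0.1, 0.7]])
y_train = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])  # +1 = keyword

b, alpha = lssvm_train(X_train, y_train)
X_new = np.array([[0.85, 0.15], [0.15, 0.85]])
labels = lssvm_predict(X_train, y_train, b, alpha, X_new)
```

Unlike a standard SVM, which requires solving a quadratic program, the LS-SVM replaces inequality constraints with equalities, so training reduces to one call to a linear solver. The trade-off is that the solution is generally not sparse: every training sample receives a nonzero multiplier.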