Weaving Scholarly Legacy Data into Web of Data
Atif Latif (Leibniz Information Center for Economics, Germany)
Muhammad Tanvir Afzal (Mohammad Ali Jinnah University, Pakistan)
Hermann Maurer (Graz University of Technology, Austria)
Abstract: The Linked Open Data project provides a new publishing paradigm for creating machine-readable, structured data on the Web. The significant presence of datasets describing scholarly publications in the Linked Data cloud underscores the importance of Linked Data for the scientific community and for the open access movement. However, these semantically rich datasets still need to be exploited and linked with real-time applications. In the project reported here, we exploited numerous scholarly datasets and created semantic links to papers in an online journal, the Journal of Universal Computer Science (J.UCS). J.UCS plays an important part in the computer science publishing community and provides a number of innovative features and datasets to its web users. However, the legacy HTML format in which these features are made available makes them difficult for machines to understand and query. Given the benefits of the Linked Open Data project, this paper presents an approach that converts J.UCS legacy HTML data into a machine-understandable format (RDF) and interlinks this data with other important Linked Data resources. The approach has successfully disambiguated and interlinked the J.UCS author and publication datasets with DBpedia, DBLP, CiteULike and Faceted DBLP. Additionally, the triplified and interlinked datasets are made available to the scientific and semantic web community for download and for SPARQL querying. This semantically linked dataset can further be used by researchers and semantic agents to identify semantic associations, to build inferencing systems, and to extract useful knowledge.
Categories: H.3.3, L.1.4, M.0
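The triplification and interlinking step summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the title and author names have already been extracted from the legacy HTML, and all `example.org` URIs, the function names, and the choice of Dublin Core, FOAF and `owl:sameAs` as vocabulary are assumptions made for illustration only. The sketch emits N-Triples statements for one paper and links an author to an external resource when a match is known.

```python
# Vocabulary terms (real, widely used predicates; their use here is an assumption).
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def literal(text):
    """Quote a string as an N-Triples literal, escaping special characters."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n")
                   .replace("\r", "\\r"))
    return '"' + escaped + '"'


def triple(subject, predicate, obj, obj_is_uri=True):
    """Format one N-Triples statement; obj is a URI unless obj_is_uri is False."""
    rendered = "<" + obj + ">" if obj_is_uri else literal(obj)
    return "<" + subject + "> <" + predicate + "> " + rendered + " ."


def paper_to_ntriples(paper_uri, title, authors):
    """Describe one paper and its authors as a list of N-Triples lines.

    authors is a list of (author_uri, name, same_as_uri_or_None) tuples;
    same_as_uri stands for an already disambiguated external resource.
    """
    lines = [triple(paper_uri, DC_TITLE, title, obj_is_uri=False)]
    for author_uri, name, same_as in authors:
        lines.append(triple(paper_uri, DC_CREATOR, author_uri))
        lines.append(triple(author_uri, FOAF_NAME, name, obj_is_uri=False))
        if same_as:
            # Interlink the local author with the matching external resource.
            lines.append(triple(author_uri, OWL_SAME_AS, same_as))
    return lines


if __name__ == "__main__":
    # Hypothetical URIs; real identifiers would come from the published dataset.
    for line in paper_to_ntriples(
        "http://example.org/jucs/paper/1",
        "Weaving Scholarly Legacy Data into Web of Data",
        [("http://example.org/jucs/author/1", "Hermann Maurer",
          "http://example.org/external/author/1")],
    ):
        print(line)
```

The resulting lines can be loaded into any triple store that accepts N-Triples, after which the data becomes queryable with SPARQL. Author disambiguation, which the abstract names as part of the approach, is deliberately left out of the sketch and represented only by the precomputed `same_as` value.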