Semantic Preprocessing of Web Request Streams for Web Usage Mining
Jason J. Jung (School of Computer and Information Engineering,
Inha University, Korea)
Abstract: Discovering the underlying knowledge in complicated Web usage data requires efficient data preparation. In this paper, we focus on two main tasks: semantic outlier detection in online Web request streams, and segmentation (or sessionization) of those streams. To this end, we exploit semantic technologies to infer the relationships among Web requests. Web ontologies such as taxonomies and directories are used to label each Web request with all of its corresponding hierarchical topic paths. Our algorithm consists of two steps. The first step repeatedly applies nested top-down partitioning to establish a set of candidate session boundaries; the second step evaluates these candidates through bottom-up merging to reconstruct the segmented sequences. In addition, we propose a hybrid approach that combines this method with existing heuristics. Using a synthesized dataset and a real-world dataset drawn from the access log files of IRCache, we conducted experiments and showed that the semantic preprocessing method improves the performance of rule discovery algorithms. This means that we can conceptually track the behavior of users who tend to change their intentions and interests easily, or who search for various kinds of information on the Web simultaneously.
Keywords: Web usage mining, browsing patterns, semantic analysis
Categories: H.3.3, I.5.3