|  | Linking User Online Behavior across Domains with Internet Traffic
               Yuanyuan Qiao (Beijing University of Posts and Telecommunications, China)
 
               Yan Wu (Beijing University of Posts and Telecommunications, China)
 
               Yaobin He (China Electronics Technology Group Corp., China)
 
               Libo Hao (Beijing University of Posts and Telecommunications, China)
 
               Wenhui Lin (Aisino Corporation, China)
 
               Jie Yang (Beijing University of Posts and Telecommunications, China)
 
              Abstract: We are facing an era of Online With Offline (OWO) in the smart city: almost everyone uses various online services to connect with friends, watch videos, listen to music, download resources, and so on. Our online behaviors are separated by different domains, which can cause serious problems for cross-domain recommendation, advertising, and criminal tracking in the online and offline worlds, since linking user online behaviors that belong to the same natural person is a very challenging task. Existing methods usually tackle the user online behavior linkage problem by estimating the profile content similarity between two different online services. However, the profile contents in heterogeneous online services are unreliable or misaligned, and the proposed methods are limited to several services in a specific domain. In order to link an individual's online behavior across domains, in this paper we propose user Online Behavior Linkage across Domains (OBLD), a novel hybrid model that links user online behavior across domains using Internet traffic. It derives several significant attributes from users' online behaviors, such as user digital identity, various fingerprints of terminals and browsers, and the spatio-temporal behavior of users, and leverages a supervised classification method to discover the relationship between users' online behaviors. The proposed model also has an unsupervised setting for datasets with no or few labeled data, provided that a certain percentage of user digital identities can be extracted from the original dataset. Using real-world network traffic collected from two large provinces in China, we evaluate the OBLD model; the linkage precision reaches 89% and 97.9% on the two datasets, respectively. Notably, the inputs of OBLD, i.e., network traffic flows, cover all online behavior of users who connect to the Internet through the monitored networks, which makes it possible to link users' online behaviors across the whole online world.
             
              Keywords: across domains, internet traffic, online behavior linkage, user digital identity, user identity linkage 
             Categories: L.7.0  |