          
            Text Analysis for Monitoring Personal Information Leakage on Twitter
            
            
               Dongjin Choi (Chosun University, Republic of Korea)  
              
             
            
            
               Jeongin Kim (Chosun University, Republic of Korea)  
              
             
            
            
               Xeufeng Piao (School of Computer Science and Technology, China)  
              
             
            
            
               Pankoo Kim (Chosun University, Republic of Korea)  
              
             
                    
            
              Abstract: Social networking services (SNSs) such as Twitter and Facebook can be considered new forms of media. Information spreads much faster through social media than through traditional news media because people can upload information without constraints of time or location. For this reason, people have embraced SNSs and allowed them to become an integral part of their everyday lives. People express their emotional state to let others know how they feel about certain information or events. However, while sharing information with others, they may also unintentionally expose personal information such as their place of residence, phone number, and date of birth. If such information reaches users with malicious intentions, there may be serious consequences such as online and offline stalking. To prevent information leakage and detect spam, many researchers have monitored e-mail systems and web blogs. This paper considers text messages on Twitter, one of the most popular SNSs in the world, and reveals various hidden patterns by using several coefficient approaches. It focuses on users who exchange Tweets and examines the types of information they exchange in reply to others' Tweets by monitoring a sample of 50 million Tweets collected by Stanford University in November 2009. We selected active Twitter users on the basis of a "happy birthday" rule, detected information related to their place of residence and personal names by using the proposed coefficient method, and compared this method with other coefficient approaches. From this research, we conclude that the proposed coefficient method is able to detect non-standard words and recommend the corresponding standard English words under a few conditions.
Ultimately, we detected 88,882 (24.287%) more Tweets containing names and 14,054 (3.84%) more location-related Tweets than a method that uses only standard word matching.
             
            
              Keywords: Personally Identifiable Information, Twitter, personal information leakage, social networking services, text analysis
             
            Categories: H.3.1, H.3.2, H.3.3, H.3.7, H.5.1  