Go home now Header Background Image
Search
Submission Procedure
share: |
 
Follow us
 
 
 
 
Volume 14 / Issue 4 / Abstract

available in:   PDF (149 kB) PS (147 kB)
 
get:  
Similar Docs BibTeX   Write a comment
  
get:  
Links into Future

Success Factors in a Weblog Community


Christian Safran
(Institute for Information Systems and Computer Media
Graz University of Technology, Austria
csafran@tugraz.at)

Frank Kappe
Institute for Information Systems and Computer Media
raz University of Technology, Austria
frank.kappe@iicm.edu)

Abstract: User generated content published via weblogs (also known as blogs) has gained importance in the last years, and the number of globally available weblogs increases. However, a large fraction of these show low publishing activity and are rarely read. This paper is a quantitative analysis of success factors in a community of over 15.000 weblogs, hosted by a local Austrian newspaper. We looked at publishing activity by content type, community activity and writing style. Also, the interconnectedness of the community was analyzed.

Keywords: User Generated Content, Weblog Analysis, E-Communities, Blogging

Categories: H.3.5, H.4.3, H.5.1, M.6

1 Introduction

In the age of the so-called Web 2.0, user generated content is credited increasing importance, as participation is one of the key characteristics of this concept [O'Reilly 05]. In the context of journalism the generation of news content by users, also called citizen journalism or grassroots journalism [Gillmor 04], is gradually complementing and, in some cases, even contesting information provided in a classic, publishing environment. Especially in the news domain, weblogs (also known as blogs) have become an important source of information beside newspaper websites. This paper will focus on such user generated content published via weblogs hosted by an Austrian newspaper.

A weblog is a “log of the web”, a term coined by Barger in 1997 [Paquet 03]. The entries in such a weblog, the so called posts are presented in reverse chronological order. Weblog software usually provides the possibility for comments and trackbacks. A trackback is a technology which allows weblogs to be automatically notified when other weblogs link to individual blogs and create backward links [SixApart 04]. In such a way a distributed, collective and interlinked blogosphere is created [Kumar 04].

The key question we asked ourselves in this context is: What makes a weblog successful? In the context of this research, success is measured by the number of visits to a weblog. For the use in this analysis, visits were counted as unique IP addresses, counting a new visit after 20 minutes of inactivity. So, the question can be reformulated as: what are the factors that lead to a high number of visits of the content created by the users?

Page 546

Another interesting aspect of user generated content is the community of the involved users. Questions in this context concern the structure of this community. Is the community divided into smaller communities or is there a central group of active users?

2 Related Work

Quite a lot of research on weblogs has been published in the recent years. Interesting in this context are publications on ranking weblogs like [Kritikopoulos 06]. The authors present a modification of the PageRank algorithm [Page 98] designed to take into account the links between the weblogs and the similarity of the users, as well as links to non-weblog URLs. The target is to rank weblogs by importance.

[Du 06] tries to answer the question for success factors of weblogs from a technology perspective. The study analyzed the impact of technology used on the success of 126 blogs taken from the top 100 listings of the Technorati1 website. In the case of Technorati, success is measured by the number of inbound links to a weblog.

[Choi 07] describes a model for evaluating similarities among a larger number of weblogs. A similarity matrix is computed by using Jaccard’s coefficient, an adjacency matrix, the inverse shortest path algorithm or the sum of inverse distances. This similarity matrix is used to categorize weblogs and find clusters of topically related weblogs.

As far as the more general research on weblogs and communities therein is concerned, [Zhoe 07] defines weblog communities based on social network theories as groups of vertices that have high density edges within them, with low density edges between the groups. Moreover a specialised web crawler is proposed to prepare data for community detection.

As far as the analysis of weblogs communities is concerned, [Cohen 06] counts connections by hyperlinks and relation by type or topic as possible relations forming communities of weblogs. This point of view focuses on the relations provided in the content of the weblog. A complementing approach, as presented in [Li 07] is to take into account the information available from the guest comments to the entries.

To the best of our knowledge, our work is the largest quantitative analysis of success factors for weblogs to date.

3 The Analysis

The “Meine Kleine” Weblogs2 of the local Austrian “Kleine Zeitung” newspaper offer a promising possibility to take a closer look at a large number of weblogs in a relatively closed environment. Users of this environment have the ability to publish text and images (photos) to their own weblogs. In addition, comments as guestbook entries can be written to the weblogs of other users.


1http://www.technorati.com
2http://www.meinekleine.at

Page 547

By November 21st 2006, the blogspace consisted of 15702 active blogs, ranging from topical weblogs by the newspaper’s editors to private diaries of individual readers. In our research, we had access to the servers log files as well as the database holding the weblog entries. The log files and entries from Oct 7th to Nov 21st 2006 (about 6 weeks) were analyzed in the course of this project.

3.1 Basic Statistics

The community of Kleine Zeitung readers consist of a large number of more than 118,000 registered users. Of those, 15702 have activated the weblog for their account and had at least one visit in the analyzed 6-week period. 560 of those bloggers were active publishers in this period, meaning they added content to their weblog in these 6 weeks. This, together with the fact that only the top 1730 weblogs had five or more visits in this period, resulted in our decision to take only the 2000 most visited weblogs into account for all further examinations (see Figure 1).

Figure 1: Distribution of visits of the top 2000 weblogs

Figure 2: Publishing Activity of the 20 most-visited weblog users

Page 548

The publishing activity of the examined users was quite incoherent. The number of (text) entries published ranged from 0 to 45 for the individual blogs. The most active poster of images published 469 images in the examined period. As far as the activity in the community is concerned, the most active users posted 80 comments and 137 guestbook entries in other users’ weblogs. Figure 2 shows the summary of publishing activity for the users with the 20 most-visited weblogs.

3.2 Activity of Authors vs. Readers by Time of Day

Beside the basic statistics of the weblogs, an investigation of the user activity in the course of a day has been carried out. This focussed on the access of readers to the weblogs as well as on authoring activity by the users.

Figure 3: Creation of Content and Comments in the course of a day

Figure 4: Views of different content types in the course of a day

Page 549

On the one hand the activity of the content authors, those users creating text entries or uploading photos, shows an almost even distribution throughout the day, with one peak at noon and one in the early evening. This is in contrast to our assumption that most users would be contributing content to their blogs in the evening and in a home environment. This assumption is true, however, for the posting of comments, which mainly occurs between 19:00 and 22:00.

Figure 3 shows the significantly different graphs for publishing own content versus activity in the community (i.e. writing comments and guestbook entries in other users’ weblogs).

On the other hand, the activity of the visitors of the blogs in the course of a day was also investigated and viewed separately for the different types of content. As with the authoring activity, reading is almost evenly distributed from 7:00 to 23:00. Peak usage is from 19:00 to 22:00, which corresponds to the peak in commenting. The visits of guestbook entries vary from this general observation, as they have a peak in the early afternoon and none in the evening (see Figure 4).

4 Influences on Popularity

The main focus of the research on the “Kleine Zeitung Digital” blogspace was to find and verify factors for the success of weblogs. We decided to analyze nine possible criteria, arranged into three groups. These criteria were chosen based on the available data from the server logs and after intense discussion with those editors responsible for the blogspace of “Kleine Zeitung Digital” concerning their experience and assumptions. Together with these editors nine hypotheses were formulated. These hypotheses were tested by calculating the Pearson’s correlation coefficients for the variables concerned and verifying the significance with a Student’s t-test at a confidence level of 0.99.
 
H1 Authors who write more posts attract more visitors.
H2 Authors who provide more images attract more visitors.
H3 Authors who provide new content more often attract more visitors.
H4 Authors who actively comment other weblogs attract more visitors. 
H5 More frequently visited authors receive more comments. 
H6 Authors who actively post in other guestbooks attract more visitors.
H7 More frequently visited authors receive more guestbook entries.
H8 Authors whose postings are similar to the writing style of newspaper editors attract more visitors. 
H9 The posts of frequently visited authors are similar in writing style.

Table 1: Hypotheses

4.1 Influence of Content Types provided

In a first step the different types of content composing the weblog were analyzed on their impact on the popularity. The visits of the individual blogs were compared to the number of textual entries (H1), images (H2) and the days the user was active (H3) in the period investigated.

Page 550

Textual entries are all posts in the weblog, regardless of the fact whether images were also included or not. Images are access to all image files and photos in the weblog. This results in the fact that an entry with photos is counted as textual entry as well as image. Figure 5 depicts the activity for the 20 top-scoring weblogs in these categories.

The value of active days is meant to give an overview of the activity of the author within the six week period of data collection. It encompasses all days the user has actively contributed to his weblog.

The highest correlation of content type to popularity could be found with the user’s active days (H3), being 0.68. The correlation to the entries (H1) and images (H2) were lower, at 0.60 and 0.53, respectively.

Figure 5: Number of posts and images provided by 20 most visited bloggers

4.2 Influence of Community Activity

Secondly, the influence of community activity was investigated. We decided to analyze the correlation of comment and guestbook activity, outbound as well as inbound. Trackbacks were not considered due to the fact that this technology is not available for “Meine Kleine” weblogs. Figure 6 depicts the activity for the 20 top-scoring weblogs in these categories.

The highest value was found for the obvious correlation of received comments and visits to the weblog with 0.77 (H5). The number of guestbook entries received correlates with 0.69 (H7). For own comments (H4) and guestbook entries in other blogs (H6) the correlations are 0.70 and 0.68, respectively (see Table 2).

Page 551

Figure 6: Comments and guestbook entries given and received by 20 top visited bloggers

It should be noted that these correlations coefficients are higher than those of publishing content. In other words, in order to have a highly visible weblog, it is even more important to be active in the community than to publish own content regularly! This is true for the individual correlations as well as for the summary of content provided respectively own community activity. There is a total correlation of 0.61 of content provided to the number of visits, while the correlation of community activity to number of visits is 0.71.

4.3 Influences of Writing Style

A further aspect of the research investigated upon the contents of the Kleine Zeitung blogspace is the influence of the author’s writing style. For this purpose, the similarity computation of the Autonomy Search Engine, which is based on Bayesian Inference [Autonomy 07], was used.

Individual weblog entries of the top 2000 blogs were compared to the editorial blogs of the Kleine Zeitung (written by professional journalists) and to the top 5 blogs. This first approach led to no significant results.

In a next step we compared the similarity of a summary document containing all weblog entries of one user with the top scoring editorial weblog and the top scoring user weblog. With these conditions we found a correlation of 0.38 for the similarity to the editorial weblog (H8) and a correlation of 0.25 for the similarity to the top scoring user weblog (H9). Both correlations are lower than those for the content type and user activity. Figure 7 shows the similarity to the top scoring editorial weblog and the top scoring user weblog for the 20 most visited weblogs.

Figure 7: Similarity of 20 Top-Ranked Weblogs to Editorial and Top-Ranked Weblog

Page 552

4.4 Conclusions

All of the success factors investigated have a positive correlation with the number of visits, in each case statistically significant with a degree of freedom of 1998 (2000 samples) and a confidence interval of 0.99. This means all nine hypotheses can be confirmed. The highest correlations can be found for the activity in the blogging community. Table 2 illustrates the correlations of the individual factors.
 
 
number of visits correlated to hypotheses correlation t-test confidence interval
active days H1 0.68 41.46 0.99
# of posts written H2 0.60 33.52 0.99
# of images uploaded H3 0.53 27.94 0.99
# of comments given H4 0.70 43.81 0.99
# of comments received H5 0.77 53.94 0.99
# of guestbook entries given H6 0.68 41.46 0.99
# of guestbook entries received H7 0.69 42.61 0.99
similarity to top rated user H8 0.25 11.54 0.99
similarity to editorial blogs H9 0.38 18.36 0.99

Table 2: Summary of correlations 5 Communities in “Meine Kleine” Weblogs

As the previous chapter showed, the activity within the community is a crucial factor for the success of a weblog. In a final step of our research, we tried to visualize the communication between the community members by comments and guestbook entries. We used the prefuse3 framework to visualize a GraphML representation [Brandes 02] of users and their connections.

In the resulting graphs, the members of the community are depicted as nodes and links resulting from comments or guestbook entries as edges. A Fruchterman-Reingold force directed placement algorithm [Fruchterman 91] was chosen for the graph for a comprehensive visualisation of the interconnectedness of the community. An initial version of the community graph containing all communications turned out to be too complex.


3http://prefuse.org/

Page 553

Figure 8: Force Directed Placement of the 20 most-visited Users and their Contacts4

In order to produce clearly arranged graphs, the number of community members taken into account was reduced. As the twenty users with the highest numbers of visits were responsible for 91% of the community activity, only these twenty and the corresponding conversational partners were taken into account.

In the resulting Figure 8 those nodes that represent the 20 most-visited weblogs are presented in darker shade than the others. Four of these top-twenty users have did not give any comments or create guestbook entries in the analyzed period and were thus removed from the graph. The remaining 16 top-scorers form a tight network with several communities unique to individual users.

As a next step to clarify the social network of the Kleine Zeitung blogspace, only those edges were taken into account that represented three or more communication activities. The resulting graph shows a tight network of eleven of the top score users, while the remaining 9 have no connections to the graph (Figure 9). Four of these eleven users also build their own sub graphs of community activities.


4For privacy reasons the user names have been blackened in the community graphs.

Page 554

Figure 9: The Community of the most-visited users

6 Conclusions

The analysis on the blogspace of the Kleine Zeitung online community was conducted to find key factors for success of weblogs. The factors activity, number of textual entries, number of images, comments given, comments received, guestbook entries given and guestbook entries received were analyzed in this context. The comparison of the influence of these nine factors showed that the most important of these factors are the community activities of the authors, i.e. writing comments and guestbook entries in other blogs.

Acknowledgements

Financial support of this research by Styria Media AG, in conjunction with the Endowment Professorship for Innovative Media Technology at Graz University of Technology, is gratefully acknowledged.

References

[Autonomy 07] Autonomy Inc., Autonomy Technical Overview, 2007, http://www.autonomy.com/content/Technology/index.en.html.

Page 555

[Brandes 02] Brandes, U., Eiglsperger, M., Herman, I., Himsolt, M. and Marshall, M.S., GraphML Progress Report: Structural Layer Proposal. Proc. 9th Intl. Symp. Graph Drawing (GD '01), LNCS 2265,  Springer-Verlag, 2002, 501-51.

[Choi 07] Choi, H. and  Krishnamoorthy, M., Categorization of Blogs through Similarity Analysis, In: Intelligence and Security Informatics, IEEE, 2007, 160-165.

[Cohen 06] Cohen, E. and Krishnamurthy, B., A short walk in the Blogistan. Comput. Networks 50, 5, Apr. 2006, 615-630.

[Du 2006] Du, H. S. and Wagner, C., Weblog success: Exploring the role of technology. Int. J. Hum.-Comput. Stud. 64, 9, Sep. 2006, 789-798.

[Fruchterman 91] Fruchterman, T. M. and Reingold, E. M. 1991. Graph drawing by force-directed placement. Softw. Pract. Exper. 21, 11, Nov. 1991, 1129-1164.

[Gillmor 04] Gillmor, D., We the Media. Grassroots Journalism by the People, for the People, O'Reilly Media, ISBN 0596007337, 2004.

[Kritikopoulos 06] Kritikopoulos, A., Sideri, M., and Varlamis, I., BlogRank: ranking weblogs based on connectivity and similarity features. In Proceedings of the 2nd international Workshop on Advanced Architectures and Algorithms For internet Delivery and Applications (Pisa, Italy, October 10 - 10, 2006).

[Kumar 04] Kumar, R., Novak, J., Raghavan, P., and Tomkins, A., Structure and evolution of blogspace. Commun. ACM 47, 12, Dec. 2004, 35-39.

[Li 07] Li, B., Xu, S., and Zhang, J. 2007. Enhancing clustering weblog documents by utilizing author/reader comments. In Proceedings of the 45th Annual Southeast Regional Conference ,Winston-Salem, North Carolina, March 23 - 24, 2007.

[O'Reilly 05] O'Reilly, T., What Is Web 2.0 - Design Patterns and Business Models for the Next Generation of Software, 30.09.2005, http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html.

[Page 98] Page, L., Brin, S., Motwani, R. & Winograd, T., The pagerank citation ranking: Bringing order to the web. Technical report, Stanford, USA. 1998.

[Paquet 03] Paquet, S., Personal knowledge publishing and its uses in research. Knowledge Board, 10., 2003, http://www.knowledgeboard.com/item/253.

[SixApart 04] Six Apart, TrackBack Technical Specification. Developer Documentation, Version 1.2, 2004, http://www.sixapart.com/pronet/docs/trackback_spec.

[Zhou 07] Zhou, Y. and Davis, J., Discovering Web Communities in the Blogspace, In Proceedings of the 40th Annual Hawaii international Conference on System Sciences (January 03 - 06, 2007). HICSS. IEEE Computer Society, 2007.

Page 556