Why Wikipedia articles vary in quality.

"Most of the existing research on Wikipedia is at the aggregate level, looking at total number of edits for an article, for example, or how many unique contributors participated in its creation," said Ram, who is a McClelland Professor of MIS in the Eller College.

"What was missing was an explanation for why some articles are of high quality and others are not," she said. "We investigated the relationship between collaboration and data quality."

Wikipedia has an internal quality rating system for entries, with featured articles at the top, followed by A, B, and C-level entries. Ram and Liu randomly collected 400 articles at each quality level and applied a data provenance model they developed in an earlier paper.

"We used data mining techniques and identified various patterns of collaboration based on the provenance or, more specifically, who does what to Wikipedia articles," Ram says. "These collaboration patterns either help increase quality or are detrimental to data quality."

Ram and Liu identified seven specific roles that Wikipedia contributors play.

Starters, for example, create sentences but seldom engage in other actions. Content justifiers create sentences and justify them with resources and links. Copy editors contribute primarily through modifying existing sentences. Some users – the all-round contributors – perform many different functions.

"We then clustered the articles based on these roles and examined the collaboration patterns within each cluster to see what kind of quality resulted," Ram said. "We found that all-round contributors dominated the best-quality entries. In the entries with the lowest quality, starters and casual contributors dominated."

To generate the best-quality entries, she says, people in many different roles must collaborate. Ram and Liu suggest that the results of this study should spark the design of software tools that can help improve quality.
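
To make the idea of role-based analysis concrete, here is a minimal Python sketch. It is not the method Ram and Liu used: they derived roles and collaboration patterns with data mining and clustering, whereas this sketch substitutes simple hand-written rules and a majority count. The action labels (`create`, `justify`, `modify`, `delete`), the thresholds, and the function names `assign_role` and `dominant_role` are all assumptions made for illustration; only the role names come from the study as described above.

```python
from collections import Counter


def assign_role(actions):
    """Map one contributor's action counts on one article to a role name.

    `actions` is a dict such as {"create": 3, "modify": 1}. The labels and
    the rules below are hypothetical, chosen only to illustrate the idea.
    """
    total = sum(actions.values())
    if total == 0:
        return "casual contributor"

    kinds = [name for name, count in actions.items() if count > 0]
    create = actions.get("create", 0)
    justify = actions.get("justify", 0)
    modify = actions.get("modify", 0)

    # Assumed rule: three or more distinct kinds of action means "all-round".
    if len(kinds) >= 3:
        return "all-round contributor"
    # Creates sentences and backs them with resources or links.
    if create > 0 and justify > 0:
        return "content justifier"
    # Mostly modifies existing sentences.
    if modify / total > 0.5:
        return "copy editor"
    # Mostly creates sentences and does little else.
    if create / total > 0.5:
        return "starter"
    return "casual contributor"


def dominant_role(contributors):
    """Return the most common role among an article's contributors.

    `contributors` maps a contributor name to that person's action counts.
    This majority count stands in for the clustering step of the study.
    """
    roles = Counter(assign_role(a) for a in contributors.values())
    return roles.most_common(1)[0][0]


# Made-up example data for a single article.
article = {
    "alice": {"create": 5, "justify": 4, "modify": 3},
    "bob": {"create": 2, "justify": 1, "modify": 6, "delete": 1},
    "carol": {"modify": 4},
}
print(dominant_role(article))
```

With real data, the dominant role of each article could then be compared against its Wikipedia quality rating to look for the kind of relationship the study reports.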

Abstract (p.175):

Data quality in the Wikipedia is debatable. On the one hand, existing research indicates that not only are people willing to contribute articles but the quality of those articles is close to that found in conventional encyclopedias. On the other hand, the public has never stopped criticizing the quality of Wikipedia articles, and critics never have trouble finding low quality Wikipedia articles. Why do Wikipedia articles vary widely in quality? We investigate the relationship between collaboration and data quality. We show that the quality of Wikipedia articles is not only dependent on the different types of contributors but also on how they collaborate. Based on an empirical study, we classify contributors based on their roles in editing individual Wikipedia articles. We identify various patterns of collaboration based on the provenance or, more specifically, who does what to Wikipedia articles. Our research helps identify collaboration patterns that are preferable or detrimental for data quality, thus providing insights for improving data quality in Wikipedia.