Synopsis

Because Wikipedia is a process, not a product, it replaces guarantees offered by institutions with probabilities supported by process: if enough people care enough about an article to read it, then enough people will care enough to improve it, and over time this will lead to a large enough body of good enough work to begin to take both availability and quality of articles for granted, and to integrate Wikipedia into daily use by millions.

Clay Shirky (2008), Here Comes Everybody


This study intends to contribute to a better understanding of the wiki phenomenon as a knowledge management system that aggregates private knowledge, and to check to what extent information generated through anonymous, freely bestowed mass collaboration is reliable compared with the traditional approach. To achieve that goal, we developed a comparative study between the Wikipedia and Britannica encyclopedias in order to compare the quality of the knowledge repositories they produce. That will allow us to reach a conclusion about the efficacy of the business models behind them.
We therefore intend to find out which of the scenarios outlined above describes mass collaboration more accurately: the infinite monkey theorem[1] invoked by Keen in The Cult of the Amateur, or the ode to the “power of the masses” of Tapscott & Williams in Wikinomics.
We used a representative random sample[2] composed of the articles present in both encyclopedias[3]. Each pair of articles was first reformatted to hide its source and then graded by an expert in its subject area using a five-point scale. We asked the experts to concentrate only on some[4] intrinsic aspects of the articles’ quality, namely accuracy and objectivity, and to disregard contextual, representational and accessibility aspects. Whenever possible, the experts invited to participate in the study were university teachers, because they are used to grading students’ work without relying on the reputation of the source.
The articles were divided into four main categories: Arts & Entertainment, History & Society, Science & Technology and Travel & Geography[5]. Each main category was further subdivided in order to find the most suitable expert to evaluate it. The average results obtained are presented below:
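Note [2] reports the sampling parameters without the underlying calculation; samples of this kind are typically sized with the standard formula for estimating a proportion, sketched below. The proportion p actually assumed by the study is not stated in this synopsis, so the expression is only a reference point, not the study's own derivation:

n = \frac{z_{\alpha/2}^{2}\, p\,(1 - p)}{d^{2}}, \qquad z_{\alpha/2} \approx 1.96 \text{ for a } 95\% \text{ confidence level}, \quad d = 0.05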

To assess the impact of having only one evaluator per article, a constraint imposed by the sample size and the articles’ length, we collected a small[6] convenience sample consisting only of Management articles. Each pair of articles was graded by several[7] experts in order to determine the uncertainty associated with having diverse gradings of the same article. The uncertainty indicators obtained were α = 0.9 and MAD = 0.6.
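As a minimal sketch of how such dispersion indicators can be computed, assuming they correspond to the average and median absolute deviations of the grades given by the several experts to the same article, the Python fragment below uses invented grades rather than the study's data:

import statistics

def average_absolute_deviation(grades):
    # Mean of the absolute deviations from the mean grade.
    m = statistics.mean(grades)
    return statistics.mean(abs(g - m) for g in grades)

def median_absolute_deviation(grades):
    # Median of the absolute deviations from the median grade.
    med = statistics.median(grades)
    return statistics.median(abs(g - med) for g in grades)

# Hypothetical grades given by 12 experts to the same article (1-5 scale).
grades = [4, 5, 3, 4, 4, 5, 3, 4, 4, 5, 4, 3]
print(average_absolute_deviation(grades))  # dispersion of the invented grades
print(median_absolute_deviation(grades))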
In order to further normalize the results, we transformed the {1, 2, 3, 4, 5} scale used by each evaluator into a nine-point scale {-4, -3, -2, -1, 0, 1, 2, 3, 4} by taking, for each pair of articles, the difference between the Wikipedia grade and the Britannica grade. This step allows us to concentrate on the difference in quality and mitigates the noise induced by possible differences in the evaluators’ interpretation of the absolute scale.
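A minimal sketch of this transformation, with invented grades; positive values favour Wikipedia and negative values favour Britannica:

# Hypothetical pairs of grades: (Wikipedia grade, Britannica grade), both on a 1-5 scale.
pairs = [(5, 3), (4, 4), (2, 3), (5, 1)]

# Reduce each pair to a single difference on the {-4, ..., 4} scale.
differences = [wiki - brit for wiki, brit in pairs]
print(differences)  # [2, 0, -1, 4]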
To deal with the lack of Geography grades, we found a significant nonlinear correlation[8] between the average difference in the number of words per article (grouped by grade difference) and the difference in grades, shown in the following chart:
Using that relation, we estimated that the global average grade difference of the Geography articles would be 1.7. The function cannot be used to predict the difference for a particular pair of articles, but in terms of the global result, and despite all the assumptions and weaknesses of this approach, it is an indication that a future assessment of these articles may lead to a global value of the same order of magnitude as those obtained for the assessed areas.
In global terms, and setting aside the Geography articles, the conclusion was that the average quality of the Wikipedia articles analyzed was superior to that of their Britannica counterparts, and that this difference was statistically significant. The graphic below shows that 90% of the Wikipedia articles were considered of equivalent or better quality than their Britannica counterparts.
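The synopsis does not state the functional form of the fitted relation, so the sketch below only illustrates the general approach with an assumed quadratic fit and invented data: model the grade difference as a function of the average word-count difference, then evaluate the model at the Geography value.

import numpy as np

# Hypothetical averages: word-count difference (Wikipedia minus Britannica)
# for each observed grade-difference group in the assessed categories.
word_diff = np.array([-400.0, -100.0, 150.0, 600.0, 1200.0, 2100.0])
grade_diff = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])

# Assumed quadratic relation; the study's actual functional form is not given here.
model = np.poly1d(np.polyfit(word_diff, grade_diff, deg=2))

# Hypothetical average word-count difference of the Geography pairs.
geography_word_diff = 1500.0
print(model(geography_word_diff))  # estimated global grade difference for Geography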
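The synopsis does not name the significance test applied; one common choice for paired ordinal grades is the Wilcoxon signed-rank test on the per-pair differences, sketched here with invented values:

from scipy.stats import wilcoxon

# Hypothetical per-pair grade differences (Wikipedia minus Britannica) on the -4..4 scale.
differences = [2, 1, 0, 3, 1, -1, 2, 0, 1, 2, 1, 3, -1, 1, 2]

# Drop zero differences; a small p-value suggests the median difference is not zero,
# i.e. one encyclopedia is systematically graded higher than the other.
statistic, p_value = wilcoxon([d for d in differences if d != 0])
print(statistic, p_value)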

The difference among the pairs of articles assessed has a global average of 1.4 ± 0.9 (average absolute deviation) and a median of 1.0 ± 0.6 (median absolute deviation). These uncertainty values were calculated using the uncertainty of the grades of the Management article pairs, which were assessed by multiple evaluators as mentioned above.
In parallel with this study, a survey[9] answered by university professors was used to characterize the universe of evaluators. It showed that traditional information sources were used by only a minority (25%) as the first approach when seeking information. Nevertheless, the survey also made clear that reliance on these sources was considerably greater[10] than reliance on information obtained through Wikipedia or other nontraditional sources.
This perception of quality, together with the diametrically opposed results of the blind-test evaluation, reinforces the impartiality of the evaluating panel. The following graph shows the reliance on information obtained through alternative sources compared with classical ones:
However representative the chosen sample may be of the universe under study, the results depend on the evaluators’ personal opinions and chosen criteria. This means that the reproducibility of this study’s conclusions with a different grading panel cannot be guaranteed. Nevertheless, this is not sufficient reason to reject results obtained through more than five hundred evaluations.
One explanation for the success of Wikipedia can be found in the altruism of individuals who come together around themes they are passionate about and on which they can be expected to have remarkable knowledge. In other words, in the case studied, mass collaboration appears to be self-organized, leading to self-assessment and self-correction among peers, which produces impressive results against all odds.


 What if information skills become a mass amateur activity? 




[1] A monkey hitting keys at random on a typewriter, for an infinite amount of time, will almost surely type the complete works of Shakespeare.
[2] 245 pairs (490 articles); confidence level (α) 95%, margin of error (d) 5%, proportion (p)
[3] Because Britannica is much smaller (≈25 times smaller), 6,382 articles were randomly drawn from Wikipedia in order to obtain 245 pairs of articles present in both encyclopedias.
[4] Believability and reputation are obviously not available in a blind test.
[5] Unfortunately, it was impossible to find experts available to grade this category (only 5% of its articles were graded), so the conclusions cover all fields of knowledge except Geography.
[6] 4 pairs (8 articles)
[7] 12 experts per pair of articles.
[8] (ρ=0.94)
[9] 63 answers (10% response rate).
[10] 95% of the respondents were suspicious of, or indifferent to, the quality of alternative sources of information.
