Because Wikipedia is a process, not a product, it replaces guarantees offered by institutions with probabilities supported by process: if enough people care enough about an article to read it, then enough people will care enough to improve it, and over time this will lead to a large enough body of good enough work to begin to take both availability and quality of articles for granted, and to integrate Wikipedia into daily use by millions.
Clay Shirky (2008), Here Comes Everybody
This study aims to contribute to a better understanding of the wiki phenomenon as a knowledge management system that aggregates private knowledge, and to check to what extent information generated through anonymous, freely given mass collaboration is reliable compared with the traditional approach. To achieve that goal, we developed a comparative study between the Wikipedia and Britannica encyclopedias, confronting the quality of the knowledge repositories they produce. This allows us to draw conclusions about the efficacy of the business models behind them.
We used a representative random sample[2] composed of articles that appear in both encyclopedias[3]. Each pair of articles was first reformatted to hide its source and then graded by an expert in its subject area on a five-point scale. We asked the experts to concentrate only on some[4] intrinsic aspects of article quality, namely accuracy and objectivity, and to disregard contextual, representational, and accessibility aspects. Whenever possible, the experts invited to participate in the study were university teachers, because they are used to grading students' work without relying on the reputation of the source.
The articles were divided into four main categories: Arts & Entertainment, History & Society, Science & Technology, and Travel & Geography[5]. Each main category was further subdivided in order to find the most suitable expert to evaluate each article. The average results obtained are presented below:
To gauge the impact of having only one evaluator per article - a constraint imposed by the sample size and the articles' length - we collected a small[6] convenience sample consisting only of Management articles. Each pair of articles was graded by several[7] experts in order to determine the uncertainty associated with having diverse gradings of the same article. The uncertainty indicators obtained were α = 0.9 and MAD = 0.6.
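As a rough illustration of how such an uncertainty indicator can be computed, the sketch below derives the mean absolute deviation (MAD) of the grade differences given by multiple evaluators to a single article pair. All grades here are invented for illustration; they are not the study's data.

```python
# Hypothetical grades (1-5 scale) given by twelve experts to the same
# article pair; the study's actual grades are not reproduced here.
wiki_grades = [4, 5, 4, 4, 5, 4, 5, 4, 4, 5, 4, 4]
brit_grades = [3, 3, 4, 3, 3, 2, 3, 3, 4, 3, 3, 3]

# Per-expert grade difference (Wikipedia minus Britannica)
diffs = [w - b for w, b in zip(wiki_grades, brit_grades)]

mean_diff = sum(diffs) / len(diffs)

# Mean absolute deviation of the differences around their mean: an
# indicator of how much a single evaluator's grade could deviate
mad = sum(abs(d - mean_diff) for d in diffs) / len(diffs)
print(f"mean difference = {mean_diff:.2f}, MAD = {mad:.2f}")
```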
To further normalize the results, we transformed the {1, 2, 3, 4, 5} scale used by each evaluator into a nine-point scale {-4, -3, -2, -1, 0, 1, 2, 3, 4} by subtracting, for each pair, the Britannica grade from the corresponding Wikipedia grade. This step lets us concentrate on the difference in quality and mitigates the noise introduced by possible differences in how the evaluators interpret the absolute scale.
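The transformation above can be sketched in a few lines; the topics and grades below are hypothetical placeholders, not pairs from the study.

```python
# Illustrative article pairs graded on the 1-5 scale (invented data)
pairs = [
    {"topic": "Renaissance art", "wikipedia": 4, "britannica": 3},
    {"topic": "Thermodynamics",  "wikipedia": 5, "britannica": 5},
    {"topic": "World War I",     "wikipedia": 3, "britannica": 4},
]

# Subtracting the Britannica grade from the Wikipedia grade maps each
# pair onto the difference scale {-4, ..., +4}: positive values favour
# Wikipedia, negative values favour Britannica, and 0 means a tie.
differences = [p["wikipedia"] - p["britannica"] for p in pairs]
print(differences)  # → [1, 0, -1]
```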
To deal with the lack of Geography grades, we found a significant nonlinear correlation[8] between the average difference in the number of words per article (grouped by grade difference) and the difference in grades, shown in the following chart:
Using that relation, we predicted that the global average grade difference of the Geography articles would be 1.7. The function cannot be used to predict the difference for a particular pair of articles; but in terms of the global result, and despite all the assumptions and weaknesses of this approach, it is one indicator that a future assessment of these articles may yield a global value of the same order of magnitude as those obtained for the assessed areas.
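The study does not specify the functional form of the correlation, so purely as an illustrative sketch the snippet below fits a quadratic to invented per-bin aggregates and then inverts it numerically to recover a global grade difference from an observed word-count difference. The model choice and every number are assumptions, not the study's data.

```python
import numpy as np

# Hypothetical aggregates: average word-count difference per
# grade-difference bin (the study's real figures are not reproduced)
grade_diff = np.array([-2, -1, 0, 1, 2, 3])
word_diff = np.array([-800, -200, 100, 600, 1500, 2900])

# Fit a quadratic as one simple nonlinear model of the relation
coeffs = np.polyfit(grade_diff, word_diff, deg=2)
model = np.poly1d(coeffs)

# Invert numerically: find the grade difference whose predicted
# word-count difference best matches an observed aggregate value
observed = 1000.0
grid = np.linspace(-4, 4, 801)
predicted_grade = float(grid[np.argmin(np.abs(model(grid) - observed))])
print(round(predicted_grade, 1))
```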
In global terms, and setting aside the Geography articles, the conclusion is that the average quality of the Wikipedia articles analyzed was superior to that of their Britannica peers, and that this difference is statistically significant. The graphic below shows that 90% of the Wikipedia articles were judged to be of equivalent or better quality than their Britannica counterparts.
The difference across the pairs of articles assessed has a global average of 1.4 ± 0.9 (average absolute deviation) and a median of 1.0 ± 0.6 (median absolute deviation). These uncertainty values were calculated using the uncertainty of the Management article pairs' grades, assessed by multiple evaluators as mentioned above.
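The two spread measures quoted above can be computed with the standard library; the per-pair differences below are invented stand-ins for the study's 245-pair data set.

```python
import statistics

# Hypothetical per-pair grade differences on the {-4, ..., 4} scale
diffs = [2, 1, 0, 3, 1, -1, 2, 1, 0, 1]

mean = statistics.mean(diffs)
median = statistics.median(diffs)

# Average absolute deviation around the mean, and median absolute
# deviation around the median - the two spread measures quoted above
aad = statistics.mean(abs(d - mean) for d in diffs)
mad = statistics.median(abs(d - median) for d in diffs)
print(f"{mean:.1f} ± {aad:.1f} (mean ± AAD), {median:.1f} ± {mad:.1f} (median ± MAD)")
```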
In parallel with this study, a survey[9] answered by university professors was used to characterize the universe of evaluators. It showed that traditional information sources were the first approach to seeking information for only a few respondents (25%). Nevertheless, the survey also made clear that reliance on these sources was considerably greater[10] than reliance on information obtained through Wikipedia or other non-traditional sources.
This quality perception, diametrically opposed to the results of the blind-test assessment, reinforces the impartiality of the evaluating panel. The following graph shows the reliance on information obtained through alternative sources compared with classical ones:
However representative the chosen sample may be of the universe under study, the results depend on the evaluators' personal opinions and chosen criteria. This means that the reproducibility of this study's conclusions with a different grading panel cannot be guaranteed. Nevertheless, that is not sufficient reason to reject results obtained through more than five hundred evaluations.
One explanation for the success of Wikipedia can be found in the altruism of individuals who come together around themes they are passionate about and on which they can be expected to have remarkable knowledge. In other words, in the case studied, mass collaboration appears to be self-organized, leading to self-assessment and self-correction among peers that produces impressive results, against all odds.
What if information skills become a mass amateur
activity?
[1] A monkey hitting keys at random on
a typewriter, for an infinite amount of time, will almost surely type the
complete works of Shakespeare.
[3] Because Britannica is much smaller (≈25 times smaller), 6,382 articles were randomly drawn from Wikipedia in order to obtain 245 pairs of articles present in both encyclopedias.
[4] Believability and reputation are obviously not
available in a blind test.
[5] Unfortunately, it was impossible to find experts available to grade this category (only 5% of its articles were graded), so the conclusions cover all fields of knowledge except Geography.
[6] 4 pairs (8 articles)
[7] 12 Experts per pair of articles.
[9] 63 answers (10% response rate)
[10] 95% of the respondents are suspicious of or indifferent to the quality of alternative sources of information.