Wikipedia pageviews: still in decline

post by VipulNaik · 2017-09-26T23:03:27.902Z · LW · GW · 19 comments

In March 2015, I wrote about a decline in Wikipedia desktop pageviews over the last few years (and posted a short version to LessWrong). With a lot of help from Issa Rice over the last year, and a lot more quality data, I've revisited the claims of that post.

This post provides a high-level summary of my takeaways. If enough people express interest in the comments, I intend to write up in more detail the aspects that people are most interested in. If I do a more detailed writeup, it will probably be in the latter half of 2018, giving enough additional data to evaluate how well the decline hypothesis holds up.

Here are the top-level conclusions.

  1. Have English desktop Wikipedia pageviews (i.e., pageviews of Wikipedia pages from desktop devices) actually declined?
    Short answer: Yes, they have declined by over 50% since the peak between late 2012 and late 2013. Some supposedly timeless page types have declined by up to 75-80%. The effect of the per-page decline is partly cancelled by an increase in the number of pages.
    If I do a longer post, I'll compare the time periods September to November 2012 against September to November 2017, and April to June 2013 against April to June 2018. Both are three-month periods, with equal representation of all days of week, the same time of the year, and with a separation of five years.

  2. Why have English desktop pageviews declined?
    Short answer: Substitution to mobile could explain between 10 and 40 percentage points of the desktop decline. I personally gravitate to the lower end of the estimate range.
    Inclusion/exclusion of non-human traffic could explain between 5 and 20 percentage points of the decline.
    The switch to HTTPS and the blocking of Wikipedia in China explain a sharp mid-2015 decline, but use of Chinese Wikipedia (which should have been most affected) has recovered, and I expect the long-term effect to be close to zero. At most, it is 5 percentage points.
    The residual decline is between 0 and 20 percentage points, which, after rebasing, is between 0 and 40% for desktop. Two leading candidates to explain the residual are increased reliance on social media and search engine algorithm changes.

  3. Have total (desktop + mobile) human English Wikipedia pageviews declined? Why?
    Short answer: Total (desktop + mobile) human pageviews likely peaked around late 2013, and have declined by about 20% since then. Per-page pageviews have gone down significantly more for the page types that saw the biggest desktop declines. The effect of the per-page decline is partly cancelled by an increase in the number of pages.
    Candidate explanations are the same as for (2): increased reliance on social media and search engine algorithm changes.

  4. Is there a compensating increase in other language Wikipedias?
    Short answer: No. In fact, other top language Wikipedias (German, Russian, Spanish, Japanese, French) show a decline trend broadly similar to that of the English Wikipedia, both overall and per-page.
    Some minor language Wikipedias saw a huge proportional increase but not enough to compensate for the English Wikipedia decline. For instance, monthly Hindi Wikipedia mobile web pageviews exploded from about 1 million in early 2013 to over 30 million in 2017, which is peanuts compared to 3-4 billion monthly English desktop and English mobile web Wikipedia pageviews.
    The lowest-traffic language Wikipedias saw a huge proportional decline in desktop and mobile web traffic in 2015, which is explained by bot filtering being activated.

  5. Do people subjectively feel they are using Wikipedia less? How do we square their subjective impressions with the statistics?
    People generally perceive either no change in use or say they don't use Wikipedia at all.
    But in a head-to-head comparison of "use more now" versus "use less now", the former wins.
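
The claim in item 1 that the comparison windows (September to November, and April to June) give equal representation to all days of the week can be checked directly. The following is a minimal sketch; the helper name `weekday_counts` is made up for illustration and is not part of any analysis code referenced in this post.

```python
from collections import Counter
from datetime import date, timedelta


def weekday_counts(start: date, end: date) -> Counter:
    """Count how often each weekday (0=Monday .. 6=Sunday) occurs in [start, end]."""
    num_days = (end - start).days + 1
    return Counter((start + timedelta(days=i)).weekday() for i in range(num_days))


# The four three-month windows named in item 1.
windows = {
    "Sep-Nov 2012": (date(2012, 9, 1), date(2012, 11, 30)),
    "Sep-Nov 2017": (date(2017, 9, 1), date(2017, 11, 30)),
    "Apr-Jun 2013": (date(2013, 4, 1), date(2013, 6, 30)),
    "Apr-Jun 2018": (date(2018, 4, 1), date(2018, 6, 30)),
}

for label, (start, end) in windows.items():
    counts = weekday_counts(start, end)
    # Each window is 91 days, i.e. exactly 13 of every weekday.
    print(label, sum(counts.values()), sorted(counts.values()))
```

Each window spans 30 + 31 + 30 = 91 days, which is exactly 13 weeks, so weekday effects should not bias a comparison between the paired windows.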

Why might this be an interesting thing to study?

Wikipedia pageview data is one of the most comprehensive and granular open datasets covering a wide variety of areas of interest, so it provides a useful way to understand both people's relative interest in different topics, and the trends in individual topics as well as the Internet as a whole. Specifically:

  1. If you're interested in how interest in specific topics has evolved over time, or if you're interested in how people's Internet use has changed over time, Wikipedia pageviews are a useful part of your toolkit, just like Google Trends. Having a good sense of the general trends in Wikipedia pageviews allows you to better "normalize" for these trends and give more context to the numbers you see.

  2. If you're interested in the overall growth (or decline!) of the Internet, Wikipedia, as one of the top sites on the Internet, and one that does not engage in a lot of advertising and view optimization, offers some insight.

  3. One of the hypotheses that might explain part of the decline, namely increased reliance on social media, is of particular interest to rationalists and LessWrong. LessWrong pageviews also peaked at roughly the same time as Wikipedia pageviews, and social media (particularly Facebook) has been implicated in the decline of LessWrong (see the comments here).
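
One simple way to "normalize" for the general trend, as mentioned in item 1, is to express a page's views as a share of total views in the same month. The sketch below uses made-up numbers and a hypothetical function name; it is only one possible normalization, not necessarily the one a detailed analysis would use.

```python
def share_of_total(page_views, total_views):
    """Return each month's page views as a fraction of that month's total views."""
    if len(page_views) != len(total_views):
        raise ValueError("both series must cover the same months")
    return [page / total for page, total in zip(page_views, total_views)]


# Hypothetical monthly figures, for illustration only.
page_views = [1000, 800]              # one page's views in two months
total_views = [4_000_000, 3_200_000]  # site-wide views in the same months

shares = share_of_total(page_views, total_views)
print(shares)
```

In this toy case the page's raw views fall by 20%, but its share of total views is unchanged, which suggests the drop reflects the site-wide trend rather than declining interest in the topic.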

So, what do you think? How interesting do you find this topic? What parts are you skeptical of? What parts are you most interested in seeing explored or justified more rigorously?

PS: If you're curious what a more detailed report might look like, check out the draft Issa and I worked on last year. All responsibility for errors, both in the draft and in this teaser post, is mine. You can also check out the timeline of Wikimedia analytics to understand changes relevant to interpreting analytics.

19 comments

Comments sorted by top scores.

comment by AndHisHorse · 2017-09-26T23:40:46.368Z · LW(p) · GW(p)

I am someone who has found that I'm using Wikipedia less, and I find that I'm relying more on Google than I used to, for what I used to use Wikipedia for. In particular, Featured Snippets in Search (which will often pull an excerpt from a Wikipedia article!) are a fantastic substitute for quick questions that I would, in past years, have asked Wikipedia, although it isn't a substitute for a deeper exploration.

Replies from: DragonGod, None
comment by DragonGod · 2017-09-27T18:06:25.805Z · LW(p) · GW(p)

This sounds very plausible, and my singular data point confirms.

comment by [deleted] · 2017-09-27T00:11:46.096Z · LW(p) · GW(p)

I agree that Featured Snippets is probably the cause of my decreased reliance as well.

comment by John_Maxwell (John_Maxwell_IV) · 2017-09-27T21:40:48.803Z · LW(p) · GW(p)

Another possibility is that Wikipedia is facing increased competition from other info providers such as content marketers?

Edit: I suppose you might measure this effect by trying to see if Wikipedia's position in search engine rankings has dropped. Or alternatively, it might be interesting to compare Wikipedia traffic for a particular concept to Google Trends for that concept. If it's a concept that doesn't get discussed much on social media, and Google Trends is increasing while Wikipedia is declining, that seems like evidence against the social media displacement hypothesis.

Replies from: VipulNaik
comment by VipulNaik · 2017-09-28T05:03:43.539Z · LW(p) · GW(p)

Great points. As I noted in the post, search and social media are the two most likely proximal mechanisms of causation for the part of the decline that's real. But neither may represent the "ultimate" cause: the growth of alternate content sources, or better marketing by them, or changes in user habits, might be what's driving the changes in social media and search traffic patterns (in the sense that the reason Google's showing different results, or Facebook is making some content easier to share, is itself driven by some combination of what's out there and what users want).

The main challenge with search engine ranking data is that (a) the APIs forbid downloading the data en masse across many search terms, and (b) getting historical data is difficult. Some SEO companies offer historical data, but based on research Issa and I did last year, we'd have to pay a decent amount to even be able to see if the data they have is helpful to us, and it may very well not be.

The problem with Google Trends is that (a) it does a lot of normalization (it normalizes search volume relative to total search volume at the time), which makes it tricky to interpret data over time, and (b) it's hard to download data en masse. Also, a lot of Google Trends results are just amusingly weird, e.g. https://trends.google.com/trends/explore?date=all&q=Facebook (see https://www.facebook.com/vipulnaik.r/posts/10208985033078964 for more discussion). Are we really to believe that interest in Facebook spiked in October 2012, and that it has returned in 2017 (after a 5-year decline) to what it used to be back in 2009? Google Trends is just yet another messy data series whose nuances I would have to acquire expertise in, not a reliable beacon of truth against which Wikipedia data can be compared.
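
To illustrate why normalization relative to total search volume makes trends hard to read, here is a toy sketch. It assumes the index is the query's share of all searches, rescaled so the peak is 100; the exact procedure Google uses is not spelled out here, and the function name and numbers are hypothetical.

```python
def trends_style_index(query_counts, total_counts):
    """Scale each period's share of total searches so that the peak share equals 100."""
    shares = [query / total for query, total in zip(query_counts, total_counts)]
    peak = max(shares)
    return [round(100 * share / peak) for share in shares]


# Hypothetical counts: absolute searches for the term keep rising...
query_counts = [100, 150, 200]
# ...but total search volume rises faster.
total_counts = [1000, 2000, 4000]

index = trends_style_index(query_counts, total_counts)
print(index)  # → [100, 75, 50]
```

Under this assumption, a term whose absolute search count doubles can still show a halving of its index, so a falling curve does not by itself mean fewer people are searching for the term.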

The one external data source I have been able to collect with reasonable reliability is Facebook share counts. At the end of each month, I record Facebook share counts for a number of Wikipedia pages by hitting the Facebook API (a process that takes several days because of Facebook's rate limiting). Based on this I now have decent time series of cumulative Facebook share counts (for example, https://wikipediaviews.org/displayviewsformultiplemonths.php?tag=Colors&allmonths=allmonths-api&language=en&drilldown=cumulative-facebook-shares). If I do a more detailed analysis, this data will be important for evaluating the social media hypothesis.
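
A cumulative series like this can be turned into per-month new shares by differencing consecutive snapshots. The sketch below is a minimal illustration with invented numbers and a hypothetical function name; it does not call the Facebook API or reflect the actual collection scripts.

```python
def monthly_increments(cumulative):
    """Difference consecutive end-of-month snapshots to get new shares per month."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


# Hypothetical end-of-month cumulative share counts for one page.
cumulative_shares = [120, 150, 155, 190]

new_shares = monthly_increments(cumulative_shares)
print(new_shares)  # → [30, 5, 35]
```

Comparing these monthly increments against monthly pageviews for the same page is one way the social media hypothesis could be examined; a negative increment would signal a measurement change rather than real activity.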

How interested are you in seeing an exploration of the search engine ranking and increased use of social media hypotheses?

Replies from: John_Maxwell_IV
comment by John_Maxwell (John_Maxwell_IV) · 2017-10-02T03:34:37.314Z · LW(p) · GW(p)

"are we really to believe that interest in Facebook spiked in October 2012, and that it has returned in 2017 (after a 5-year decline) to what it used to be back in 2009?"

That seems very plausible to me; this kind of cyclical interest seems pretty common for social sites. This would also explain Facebook's eagerness to acquire up-and-comers like Instagram and Snapchat.

"How interested are you in seeing an exploration of the search engine ranking and increased use of social media hypotheses?"

Somewhat interested, although I'm also not super clear on what relevance we think Wikipedia traffic has in the grand scheme of things.

comment by ChristianKl · 2017-09-27T11:43:17.317Z · LW(p) · GW(p)

It's interesting how this concern is completely absent from the present Wikimedia strategy planning exercise.

Replies from: VipulNaik
comment by VipulNaik · 2017-09-27T15:10:27.060Z · LW(p) · GW(p)

The Wikimedia Foundation has not ignored the decline. For instance, they discuss the overall trends in detail in their quarterly readership metrics reports (the latest is at https://commons.wikimedia.org/wiki/File:Wikimedia_Foundation_Readers_metrics_Q4_2016-17_(Apr-Jun_2017).pdf). The main differences between what they cover and what I intend to cover are: (a) they only cover overall rather than per-page pageviews, (b) they focus more on year-over-year comparisons than long-run trends, and (c) related to (b), they don't discuss the long-run causes. However, these reports are a great way of catching up on incremental overall traffic level updates as well as any analytics or measurement discrepancies that might be driving weird numbers.

The challenge of raising more funds with declining traffic has also been noted in fundraiser discussions, such as at https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-10-14/News_and_notes which has the quote:

Better performing banners are required to raise a higher budget with declining traffic. We’ll continue testing new banners into the next quarter and sharing highlights as we go.

comment by Ben Pace (Benito) · 2017-09-26T23:31:17.988Z · LW(p) · GW(p)

Added to the frontpage. (Also, deleted your two empty comments.)

Replies from: VipulNaik
comment by VipulNaik · 2017-09-27T01:43:49.377Z · LW(p) · GW(p)

They still show up in the total comment count :).

comment by lmn · 2017-09-29T01:45:28.581Z · LW(p) · GW(p)

To mention the elephant in the living room, I wonder if the increasingly broken Wikipedia mod culture has something to do with this.

Replies from: VipulNaik
comment by VipulNaik · 2017-09-29T04:51:46.927Z · LW(p) · GW(p)

Great point. As somebody who has been in the crosshairs of Wikipedia mods (see ANI), my bias would push me to agree :). However, despite what I see as problems with Wikipedia mod culture, it remains true that Wikipedia has grown quite a bit, both in number of articles and length of already existing articles, over the time period when pageviews declined. I suspect the culture is probably a factor in that it represents an opportunity cost: a better culture might have led to an (even) better Wikipedia that would not have declined in pageviews so much, but I don't think the mod culture led to a quality decline per se. In other words, I don't think the mechanism:

counterproductive mod culture -> quality decline -> pageview decline

is feasible.

Replies from: lmn
comment by lmn · 2017-09-30T03:59:51.953Z · LW(p) · GW(p)

You seem to be conflating quantity and quality.

Replies from: VipulNaik
comment by VipulNaik · 2017-09-30T04:05:04.345Z · LW(p) · GW(p)

In the case of Wikipedia, I think the aspects of quality that correlate most with explaining pageviews are readily proxied by quantity. Specifically, the main quality factors in people reading a Wikipedia page are (a) the existence of the page (!), and (b) whether the page has the stuff they were looking for. I proxied the first by number of pages, and the second by length of the pages that already existed. Admittedly, there are a lot more subtleties to quality measurement (which I can go into in depth at some other point), some of which can have indirect, long-term effects on pageviews, but on most of these dimensions Wikipedia hasn't declined in the last few years (though I think it has grown more slowly than it would with a less dysfunctional mod culture, and arguably too slowly to keep pace with the competition).

Replies from: lmn
comment by lmn · 2017-09-30T04:14:03.619Z · LW(p) · GW(p)

"Specifically, the main quality factors in people reading a Wikipedia page are (a) the existence of the page (!), (b) whether the page has the stuff they were looking for."

(c) whether the information on the page is accurate.

"I proxied the first by number of pages, and the second by length of the pages that already existed."

Except not all topics and not all information are of equal interest to people.

Replies from: VipulNaik, VipulNaik
comment by VipulNaik · 2017-09-30T14:50:26.209Z · LW(p) · GW(p)

FWIW, my impression is that data on Wikipedia has gotten somewhat more accurate over time, due to the push for more citations, though I think much of this effect occurred before the decline started. I think the push for accuracy has traded off a lot against growth of content (both growth in number of pages and growth in amount of data on each page). These are crude impressions (I've read some relevant research but don't have strong reason to believe that should be decisive in this evaluation) but I'm curious to hear what specific impressions you have that are contrary to this.

comment by VipulNaik · 2017-09-30T14:46:35.195Z · LW(p) · GW(p)

If you have more fine-grained data at your disposal on different topics and how much each has grown or shrunk in terms of number of pages, data available on each page, and accuracy, please share :).

comment by panickedapricott · 2017-09-27T04:34:17.796Z · LW(p) · GW(p)

I've never been a big user of Wikipedia.

comment by VipulNaik · 2017-09-26T23:04:47.956Z · LW(p) · GW(p)