
Tidy analysis of Wikipedia in R

What’s in a name?

wiki: There are many wikis, but one dominates the Wikiverse. Wikipedia is the largest repository of facts ever assembled by human hands. Scholars the world over are turning to Wikipedia to understand how twenty-first-century society understands itself.

quiddity: The ‘whatness’ of a thing. The kind of thing it is. What is Wikipedia? Is it merely another encyclopaedia? Is it news presented as history? Is it the consensus of a global village, or the battleground of an ideological war?

tidy: The best kind of data. R programmers are lucky to have access to the tidyverse, a collection of packages that make it easy to analyse, visualise and publish data. This package embodies tidy data principles by returning results from Wikipedia’s APIs as tibbles or simple vectors, and by providing a number of vectorised analysis functions that can be applied reliably and without fuss to the data you retrieve.

Thus wikkitidy’s aim: to help you work out what Wikipedia is with minimal data wrangling and cleaning.

Getting to 1.0

Version Feature Done?
0.1 Basic request objects
0.2 Calls and response objects for Core and Wikimedia REST APIs
0.3 Calls and response objects for MediaWiki Action API Query Modules
0.4 Interface to Wikipedia XML dumps
0.5 Implementation of Wikiblame
0.6 Calls and response objects for the XTools and Wikimedia APIs


You can install wikkitidy from CRAN with:
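A minimal sketch of that step, using base R's `install.packages()`. The `requireNamespace()` guard and the explicit `repos` mirror (`https://cloud.r-project.org`) are illustrative additions, not part of the package's own instructions: the guard skips the download when wikkitidy is already installed, and naming a mirror avoids R's mirror prompt (or error) in non-interactive sessions. Any CRAN mirror should work.

```r
# Install the released version of wikkitidy from CRAN,
# skipping the download if it is already installed.
# The mirror URL is an assumption; any CRAN mirror will do.
if (!requireNamespace("wikkitidy", quietly = TRUE)) {
  install.packages("wikkitidy", repos = "https://cloud.r-project.org")
}

# Attach the package for the current session
library(wikkitidy)
```

In an interactive session with a mirror already configured, plain `install.packages("wikkitidy")` is enough.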


You can install the development version from GitHub with:


Code of Conduct

Please note that the wikkitidy project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.