Query Wikipedia using the MediaWiki Action API
Source: R/wiki-action-request.R
wiki_action_request.Rd
Wikipedia exposes its content through the MediaWiki Action API. To build up a query, you first call wiki_action_request() to create the basic request object, then use the helper functions query_page_properties(), query_list_pages() and query_generate_pages() to modify the request, before calling next_batch() or retrieve_all() to perform the query and download results from the server.
Arguments
- ...
<dynamic-dots> Parameters for the request
- action
The action to perform, typically 'query'
- language
The language edition of Wikipedia to request, e.g. 'en' or 'fr'
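As a rough illustration of how these arguments could map onto an HTTP request, the sketch below builds an Action API URL by hand in base R. The helper sketch_action_url() is hypothetical and is not part of wikkitidy. The endpoint pattern https://<language>.wikipedia.org/w/api.php is the usual one for Wikipedia, but the format = "json" parameter and the way wikkitidy assembles its request internally are assumptions made for the example.

```r
# Hypothetical helper, not part of wikkitidy: shows how `language`, `action`
# and the extra parameters passed via `...` could end up in an Action API URL.
sketch_action_url <- function(..., action = "query", language = "en") {
  # Assumption: the request asks for JSON output
  params <- c(list(action = action, format = "json"), list(...))
  # Percent-encode each value, then join the pairs as key=value&key=value
  values <- vapply(
    params,
    function(x) utils::URLencode(as.character(x), reserved = TRUE),
    character(1)
  )
  query <- paste0(names(params), "=", values, collapse = "&")
  sprintf("https://%s.wikipedia.org/w/api.php?%s", language, query)
}

# Request the French edition; extra parameters are passed through `...`
sketch_action_url(list = "categorymembers", cmlimit = 10, language = "fr")
```

Because action and language come after ... in the signature, they have to be named in full when supplied, which leaves every other named argument free to be treated as a request parameter.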
Value
An action_api object, an S3 list that subclasses httr2::request.
The dependencies between different aspects of the Action API are complex.
At the time of writing, there are five major subclasses of action_api/httr2_request:
- generator/action_api/httr2_request, returned (sometimes) by query_generate_pages
- list/action_api/httr2_request, returned by query_list_pages
- titles, pageids and revids/action_api/httr2_request, returned by the various query_by_ functions

You can use query_page_properties to modify any kind of query except for list queries: indeed, the central limitation of list queries is that you cannot choose what properties to return for the pages that meet the given criterion. The concept of a generator is complex. If the generator is based on a property module, then it must be combined with a query_by_ function to produce a valid query. If the generator is based on a list module, then it cannot be combined with a query_by_ query.
Details
wikkitidy provides an ergonomic API for the Action API's Query modules. These modules are most
useful for researchers, because they allow you to explore the structure of
Wikipedia and its back pages. You can obtain a list of available modules in
your R console using list_all_property_modules(), list_all_list_modules() and list_all_generators().
Examples
# List the first 10 pages in the category 'Australian historians'
historians <- wiki_action_request() %>%
  query_list_pages(
    "categorymembers",
    cmtitle = "Category:Australian_historians",
    cmlimit = 10
  ) %>%
  gracefully(next_batch)
historians
#> <complete/query_tbl>
#> ℹ There are more results on the server. Retrieve them with `next_batch()` or `retrieve_all()`
#> ✔ Data complete for all records
#> # A tibble: 10 × 3
#>      pageid    ns title
#>       <int> <int> <chr>
#>  1 72000612     0 Michelle Arrow
#>  2 74445832     0 Alan Atkinson (historian)
#>  3 46828642     0 Craig Benjamin
#>  4 53702558     0 Frank Murcott Bladen
#>  5 59403076     0 Frank Bongiorno
#>  6 31593145     0 R. J. B. Bosworth
#>  7 23093698     0 Tim Bowden
#>  8 22906640     0 Phillip Bradley
#>  9 33224431     0 Richard Broome
#> 10 68287945     0 David Brophy (historian)