
These functions provide access to the CategoryMembers endpoint of the Action API.

query_category_members() builds a generator query to return the members of a given category.

build_category_tree() finds all the pages and subcategories directly beneath the passed category, then repeats the search for each subcategory it finds, until no more subcategories remain.
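The recursive walk described above can be sketched in base R against a small in-memory lookup. This is only an illustration of the idea, not the package's implementation: `mock_members`, `walk_category_tree()` and the `visited` guard against category cycles are all hypothetical names and assumptions, and no request is sent to Wikipedia.

```r
# Hypothetical data: each category maps to its direct members.
# "Category:C" points back to "Category:A" to show why a cycle guard helps.
mock_members <- list(
  "Category:A" = c("Page 1", "Category:B"),
  "Category:B" = c("Page 2", "Category:C"),
  "Category:C" = c("Page 3", "Category:A")
)

# Breadth-first walk: expand each category once, recording parent -> member edges.
walk_category_tree <- function(root, members) {
  visited <- character(0) # categories already expanded
  queue <- root           # categories waiting to be expanded
  edges <- data.frame(
    source = character(0), target = character(0),
    stringsAsFactors = FALSE
  )
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    if (current %in% visited) next
    visited <- c(visited, current)
    children <- members[[current]]
    if (length(children) == 0) next
    edges <- rbind(
      edges,
      data.frame(source = current, target = children, stringsAsFactors = FALSE)
    )
    # Only subcategories are expanded further; pages are leaves.
    subcats <- children[startsWith(children, "Category:")]
    queue <- c(queue, setdiff(subcats, visited))
  }
  edges
}

tree_edges <- walk_category_tree("Category:A", mock_members)
print(tree_edges)
```

The walk stops once the queue is empty, which corresponds to finding no more subcategories. Whether the real function guards against cycles in the same way is not stated here.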

query_category_members(
  .req,
  category,
  namespace = NULL,
  type = c("file", "page", "subcat"),
  limit = 10,
  sort = c("sortkey", "timestamp"),
  dir = c("ascending", "descending", "newer", "older"),
  start = NULL,
  end = NULL,
  language = "en"
)

build_category_tree(category, language = "en")


.req
A query request object.

category
The category to start from. query_category_members() accepts either a numeric pageid or the page title. build_category_tree() accepts a vector of page titles.

namespace
Only return category members from the provided namespace.

type
Alternative to namespace: the type of category member to return. Multiple types can be requested using a character vector. Defaults to all.

limit
The number of category members to return in each batch. Maximum 500.

sort
How to sort the returned category members. 'timestamp' sorts them by the date they were added to the category; 'sortkey' sorts them by the category member's unique hexadecimal code.

dir
The direction in which to sort them.

start
If sort == 'timestamp', only return category members added to the category after this date. The argument is parsed by lubridate::as_date().

end
If sort == 'timestamp', only return category members added to the category before this date. The argument is parsed by lubridate::as_date().

language
The language edition of Wikipedia to query.
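As a rough illustration of how the arguments above combine, the following sketch builds a request for subcategories of 'Category:Physics' added since the start of 2020. The category and date are arbitrary choices, and the native pipe `|>` (R 4.1 or later) is used in place of `%>%` so that no extra package is needed. The call has not been checked against the live API.

```r
library(wikkitidy)

# Build (but do not yet send) a request for subcategories only,
# ordered by the time they were added to the category.
recent_subcats <- wiki_action_request() |>
  query_category_members(
    "Physics",
    type = "subcat",      # restrict to subcategories
    sort = "timestamp",   # needed for start/end to take effect
    dir = "newer",        # oldest additions first
    start = "2020-01-01", # parsed by lubridate::as_date()
    limit = 50
  )
```

The result is still a request object, so it would need to be passed to next_batch() or retrieve_all() to fetch any data.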


query_category_members(): A request object of type generator/query/action_api/httr2_request, which can be passed to next_batch() or retrieve_all(). You can specify which properties to retrieve for each page using query_page_properties().

build_category_tree(): A list containing two dataframes. nodes lists all the subcategories and pages found underneath the passed categories. edges records the connections between them. The source column gives the pageid of the parent category, while the target column gives the pageid of any categories, pages or files contained within the source category. The timestamp records the moment when the target page or subcategory was included in the source category. The two dataframes in the list can be passed to igraph::graph_from_data_frame for network analysis.
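Because edges stores only pageids, it can help to join it back to nodes when reading the tree by eye. The sketch below does this in base R on a hand-built subset with the column layout described above. The data frames are typed in by hand for illustration rather than fetched from the API, and `labelled` is a hypothetical name.

```r
# Hand-built subset mimicking the nodes/edges layout (not fetched from the API).
nodes <- data.frame(
  pageid = c(41181643, 30333352, 41148700, 43770872),
  title = c(
    "Category:Custard_(band)_albums",
    "Loverama",
    "Category:Custard (band) compilation albums",
    "Goodbye Cruel World (Custard album)"
  ),
  type = c("root", "page", "subcat", "page"),
  stringsAsFactors = FALSE
)
edges <- data.frame(
  source = c(41181643, 41181643, 41148700),
  target = c(30333352, 41148700, 43770872),
  stringsAsFactors = FALSE
)

# Attach the parent's title, then the member's title, to each edge.
titles <- nodes[, c("pageid", "title")]
labelled <- merge(edges, titles, by.x = "source", by.y = "pageid")
names(labelled)[names(labelled) == "title"] <- "source_title"
labelled <- merge(labelled, titles, by.x = "target", by.y = "pageid")
names(labelled)[names(labelled) == "title"] <- "target_title"

# Number of direct members recorded for each parent category.
members_per_category <- table(labelled$source_title)
print(labelled[, c("source_title", "target_title")])
print(members_per_category)
```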



# Get the first 10 pages in 'Category:Physics' on English Wikipedia
physics_members <- wiki_action_request() %>%
  query_category_members("Physics") %>%
  next_batch()
physics_members
#> <complete/query_tbl>
#> # A tibble: 10 × 3
#>      pageid    ns title                      
#>       <int> <int> <chr>                      
#>  1    22939     0 Physics                    
#>  2   168907     0 Naïve physics              
#>  3   844186     0 Modern physics             
#>  4  1653925   100 Portal:Physics             
#>  5 74985603     0 Edge states                
#>  6 78053369     0 Bijel                      
#>  7 78147827     0 Electrostatic solitary wave
#>  8 78245824     0 Nottingham effect          
#>  9 78554064     0 History of the LED         
#> 10 78751748     0 Missile lofting            
#>  There are more results on the server. Retrieve them with `next_batch()` or `retrieve_all()`
#>  Data complete for all records

# Build the tree of all albums for the Brisbane band Custard
tree <- build_category_tree("Category:Custard_(band)_albums")
#> ⠙ Walking subcategories: 1 done (556/s) | 2ms
#> ⠹ Walking subcategories: 2 done (14/s) | 148ms
tree
#> $nodes
#> # A tibble: 12 × 4
#>      pageid    ns title                                      type  
#>       <int> <int> <chr>                                      <chr> 
#>  1 41181643    14 Category:Custard_(band)_albums             root  
#>  2 47888836     0 Come Back, All Is Forgiven                 page  
#>  3 59271122     0 The Common Touch (album)                   page  
#>  4 30333352     0 Loverama                                   page  
#>  5 63691299     0 Respect All Lifeforms                      page  
#>  6 77627299     0 Suburban Curtains                          page  
#>  7 43770191     0 Wahooti Fandango                           page  
#>  8 30333401     0 We Have the Technology                     page  
#>  9 43769837     0 Wisenheimer                                page  
#> 10 41148700    14 Category:Custard (band) compilation albums subcat
#> 11 43770688     0 Brisbane 1990–1993                         page  
#> 12 43770872     0 Goodbye Cruel World (Custard album)        page  
#> $edges
#> # A tibble: 11 × 3
#>      source   target timestamp           
#>       <int>    <int> <chr>               
#>  1 41181643 47888836 2015-09-21T10:58:43Z
#>  2 41181643 59271122 2019-01-06T17:20:32Z
#>  3 41181643 30333352 2013-11-24T21:09:05Z
#>  4 41181643 63691299 2020-04-18T06:08:40Z
#>  5 41181643 77627299 2024-10-08T21:14:08Z
#>  6 41181643 43770191 2014-09-08T08:02:46Z
#>  7 41181643 30333401 2013-11-24T21:09:09Z
#>  8 41181643 43769837 2014-09-08T06:31:49Z
#>  9 41181643 41148700 2013-11-21T14:38:43Z
#> 10 41148700 43770688 2015-05-20T06:12:07Z
#> 11 41148700 43770872 2015-04-26T23:42:41Z

# For network analysis and visualisation, you can pass the category tree
# to igraph
tree_graph <- igraph::graph_from_data_frame(tree$edges, vertices = tree$nodes)
tree_graph
#> IGRAPH 8ac7ebd DN-B 12 11 -- 
#> + attr: name (v/c), ns (v/n), title (v/c), type (v/c), timestamp (e/c)
#> + edges from 8ac7ebd (vertex names):
#>  [1] 41181643->47888836 41181643->59271122 41181643->30333352 41181643->63691299
#>  [5] 41181643->77627299 41181643->43770191 41181643->30333401 41181643->43769837
#>  [9] 41181643->41148700 41148700->43770688 41148700->43770872