
Any two revisions of a Wikipedia page can be compared using the 'diff' tool. The tool compares the 'from' revision to the 'to' revision, looking for insertions, deletions or relocations of text. The two revisions can be given in either order, and they need not be adjacent in the page history.

Usage

get_diff(from, to, language = "en", simplify = TRUE)

Arguments

from

Vector of revision ids

to

Vector of revision ids

language

Vector of two-letter language codes (will be recycled if length==1)

simplify

Logical: should R simplify the result? (See Value.)
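
As a rough illustration of what recycling means for the language argument, the sketch below pairs two revision ids with a single language code using base R. It only mimics the behaviour with rep_len() and is an assumption for illustration, not the package's internal code. The revision ids are the ones shown in the Main Page example output in the Examples section.

```r
# Revision ids taken from the Main Page example output on this page
from <- c(1225257602, 1223300368)
to <- c(1225315602, 1225257602)

# A single language code: length 1, so it is reused for every pair
language <- "en"

# Illustration only: base R recycling via rep_len(),
# not necessarily how get_diff() does it internally
pairs <- data.frame(
  from = from,
  to = to,
  language = rep_len(language, length(from)),
  stringsAsFactors = FALSE
)

print(pairs)
```

Each row of pairs corresponds to one comparison, which is why from and to are expected to have the same length while language may have length 1.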

Value

The return value depends on the simplify parameter.

  • If simplify == TRUE: A list of tibble::tbl_df objects, one per pair of revisions, so the list has the same length as from and to. Most of the response data is stripped away, leaving just the textual differences between the revisions, their location and type, plus 'highlightRanges' if the textual differences are complicated.

  • If simplify == FALSE: A list the same length as from and to containing the full wikidiff2 response for each pair of revisions. This response includes additional data for displaying diffs onscreen.

Examples

# Compare revision 847170467 to 851733941 on English Wikipedia
get_diff(847170467, 851733941)
#> # A tibble: 2 × 5
#>    type lineNumber text                                    offset_from offset_to
#>   <int>      <int> <chr>                                         <int>     <int>
#> 1     1         97 ""                                               NA     15633
#> 2     1         98 "In 2016, a new species of [[Cecidomyi…          NA     15634

# The function is vectorised, so you can compare multiple pairs of revisions
# in a single call
# See diffs for the last two revisions of the Main Page
revisions <- wiki_action_request() %>%
  query_by_title("Main Page") %>%
  query_page_properties(
    "revisions",
    rvlimit = 2, rvprop = "ids", rvdir = "older"
  ) %>%
  gracefully(next_result)

if (tibble::is_tibble(revisions)) {
  revisions <- revisions %>%
    tidyr::unnest(cols = c(revisions)) %>%
    dplyr::mutate(diffs = get_diff(from = parentid, to = revid))

  print(revisions)
}
#> # A tibble: 2 × 6
#>     pageid    ns title          revid   parentid diffs           
#>      <int> <int> <chr>          <int>      <int> <list>          
#> 1 15580374     0 Main Page 1225315602 1225257602 <tibble [2 × 6]>
#> 2 15580374     0 Main Page 1225257602 1223300368 <tibble [2 × 6]>