These functions provide an unbiased alternative to the corresponding
base functions.
sample(x, size, replace = FALSE, prob = NULL)
sample.int(n, size = n, replace = FALSE, prob = NULL)
| Argument | Description |
|---|---|
| x | either a vector of one or more elements from which to choose, or a positive integer. |
| size | a non-negative integer giving the number of items to choose. |
| replace | should sampling be with replacement? |
| prob | a vector of probability weights for obtaining the elements of the vector being sampled. |
| n | a positive number, the number of items to choose from. |
Currently there is no support for weighted sampling or for long vectors.
In such cases, the functions fall back to the equivalent functions
in base.
The algorithm used needs a random 32-bit unsigned integer as input. R does
not provide an interface for such a random number. Instead, unif_rand()
returns a random double in \((0, 1)\). Internally, the result of unif_rand()
is multiplied by \(2^{32}\) to produce a 32-bit unsigned integer. This
works correctly for the default generator Mersenne-Twister, since that produces
a 32-bit unsigned integer which is then divided by \(2^{32}\). However, other
generators in R do not follow this pattern, so this procedure might introduce
a new bias.
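The following is a minimal sketch of the idea described above, written in plain R for illustration; it is not the package's internal code. The helper names rand32() and bounded_lemire() are hypothetical, runif() stands in for the C-level unif_rand(), and the bound s is restricted to at most \(2^{21}\) so that the product of a 32-bit integer and s stays exactly representable in a double. The rejection step follows the method from Lemire (2018).

```r
# Hypothetical helper: turn a uniform double into a 32-bit unsigned integer,
# mirroring the "multiply by 2^32" step described above.
rand32 <- function() floor(runif(1) * 2^32)  # value in [0, 2^32)

# Hypothetical helper: unbiased integer in [0, s) via Lemire's rejection method.
# Assumption: s <= 2^21, so x * s < 2^53 and all arithmetic is exact in doubles.
bounded_lemire <- function(s) {
  stopifnot(s >= 1, s <= 2^21, s == floor(s))
  two32 <- 2^32
  m <- rand32() * s          # "64-bit" product
  low <- m %% two32          # lower 32 bits of the product
  if (low < s) {
    threshold <- (two32 - s) %% s   # equals 2^32 mod s
    while (low < threshold) {       # reject draws that would cause bias
      m <- rand32() * s
      low <- m %% two32
    }
  }
  m %/% two32                # upper 32 bits: the result in [0, s)
}

set.seed(1)
draws <- vapply(seq_len(2000), function(i) bounded_lemire(6), numeric(1))
table(draws)
```

Adding 1 to the result would give a 1-based index as returned by sample.int. The sketch assumes the default Mersenne-Twister generator; as noted above, other generators may not map cleanly onto 32-bit integers.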
Daniel Lemire (2018), Fast Random Integer Generation in an Interval, https://arxiv.org/abs/1805.10941.
sample and sample.int
# base::sample produces very different amounts of odd and even numbers
m <- 2/5 * 2^32
x <- sample(m, 1000000, replace = TRUE)
table(x %% 2)
#>
#>      0      1
#> 500434 499566
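
To see where such a bias can come from, the sketch below applies the naive "multiply a uniform double by m and round down" approach directly. This is an illustration under assumptions, not a reproduction of any particular base R version: it relies on the default Mersenne-Twister generator, and the variable name naive is introduced here only for the example. With m equal to 2/5 of \(2^{32}\), the underlying 32-bit integers map to odd results noticeably more often than to even ones.

```r
# Illustration only: naive multiply-and-floor sampling from 1..m.
# Assumes the default Mersenne-Twister generator, whose uniform doubles
# are 32-bit integers divided by 2^32.
set.seed(42)
m <- 2/5 * 2^32
naive <- floor(runif(1e5) * m) + 1

# Share of odd and even results; an unbiased method would give about 0.5 each.
prop.table(table(naive %% 2))
```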