These functions provide an unbiased alternative to the corresponding base functions.
sample(x, size, replace = FALSE, prob = NULL)
sample.int(n, size = n, replace = FALSE, prob = NULL)
Argument | Description |
---|---|
x | either a vector of one or more elements from which to choose, or a positive integer. |
size | a non-negative integer giving the number of items to choose. |
replace | should sampling be with replacement? |
prob | a vector of probability weights for obtaining the elements of the vector being sampled. |
n | a positive number, the number of items to choose from. |
Currently there is no support for weighted sampling or for long vectors.
If such situations are encountered, the functions fall back to the equivalent functions in base.
The algorithm used needs a random 32-bit unsigned integer as input. R does
not provide an interface for such a random number. Instead, unif_rand()
returns a random double in \((0, 1)\). Internally, the result of unif_rand()
is multiplied by \(2^{32}\) to produce a 32-bit unsigned integer. This
works correctly for the default generator, Mersenne-Twister, since it produces
a 32-bit unsigned integer which is then divided by \(2^{32}\). However, other
generators in R do not follow this pattern, so this procedure might introduce
a new bias.
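The procedure described above can be sketched in plain R. This is only an illustration under stated assumptions, not the package's actual implementation: the helper names rand_u32 and bounded_rand are hypothetical, runif() stands in for the C-level unif_rand(), and the range is restricted to s <= 2^21 so that the product of a 32-bit integer and s stays exactly representable in a double. The rejection step follows the method described in the Lemire (2018) reference.

```r
# Hypothetical helper: derive one 32-bit unsigned integer (stored as a double)
# from R's uniform generator by multiplying with 2^32.
rand_u32 <- function() floor(runif(1) * 2^32)

# Sketch of Lemire's method: an unbiased integer in [0, s).
# Assumption: s <= 2^21, so that x * s < 2^53 is exact in double arithmetic.
bounded_rand <- function(s) {
  stopifnot(s >= 1, s <= 2^21, s == floor(s))
  two32 <- 2^32
  m <- rand_u32() * s          # "64-bit" product
  l <- m %% two32              # low 32 bits of the product
  if (l < s) {
    t <- two32 %% s            # threshold below which a draw is rejected
    while (l < t) {
      m <- rand_u32() * s
      l <- m %% two32
    }
  }
  floor(m / two32)             # high 32 bits: the result in [0, s)
}

set.seed(42)
draws <- vapply(seq_len(10000), function(i) bounded_rand(6), numeric(1))
table(draws)
```

The rejection loop is entered only when the low 32 bits of the product fall below s, so for small ranges almost every call needs a single random number.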
Daniel Lemire (2018), Fast Random Integer Generation in an Interval, https://arxiv.org/abs/1805.10941.
sample and sample.int
# base::sample produces very different amounts of odd and even numbers
m <- 2/5 * 2^32
x <- sample(m, 1000000, replace = TRUE)
table(x %% 2)
#>
#>      0      1
#> 500434 499566
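The imbalance mentioned in the comment above can be reproduced deterministically with a toy model. It assumes a sampler that maps a 32-bit integer k to floor(m * k / 2^32) + 1, which for m = 2/5 * 2^32 simplifies to floor(2/5 * k) + 1. The sketch enumerates a contiguous block of k values instead of drawing random numbers; the variable names k, x_biased and parity are illustrative only.

```r
# Toy model (assumption): a sampler computing floor(m * u) + 1 with u = k / 2^32.
# For m = 2/5 * 2^32 this is floor(2/5 * k) + 1.
k <- 0:99999                      # a block of consecutive generator outputs
x_biased <- floor(2/5 * k) + 1    # multiply-and-floor mapping
parity <- table(x_biased %% 2)
parity
# Every 5 consecutive k values produce 3 odd and 2 even results,
# so odd numbers are over-represented by a factor of 3:2.
```

Under this model each odd result is hit by three generator outputs and each even result by two, which is why a plain multiply-and-floor mapping cannot be uniform for this m.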