When you want to use random number generators (RNG) for parallel computations, you need to make sure that the sequences of random numbers used by the different processes do not overlap. There are two main approaches to this problem:1
The RNGs included in dqrng
offer at least one of these
methods for parallel RNG usage. When using the R or C++ interface
independent streams can be accessed using the two argument
dqset.seed(seed, stream)
or
dqset_seed(seed, stream)
functions.
The Threefry engine uses internally a counter with \(2^{256}\) possible states, which can be
split into different substreams. When used from R or C++ with the two
argument dqset.seed
or dqset_seed
this counter
space is split into \(2^{64}\) streams
with \(2^{192}\) possible states each.
This is equivalent to \(2^{64}\)
streams with a period of \(2^{194}\)
each.
In the following example a matrix with random numbers is generated in parallel using the parallel package. The resulting correlation matrix should be close to the identity matrix if the different streams are independent:
library(parallel)
cl <- parallel::makeCluster(2)
res <- clusterApply(cl, 1:8, function(stream, seed, N) {
library(dqrng)
dqRNGkind("Threefry")
dqset.seed(seed, stream)
dqrnorm(N)
}, 42, 1e6)
stopCluster(cl)
res <- matrix(unlist(res), ncol = 8)
symnum(x = cor(res), cutpoints = c(0.001, 0.003, 0.999),
symbols = c(" ", "?", "!", "1"),
abbr.colnames = FALSE, corr = TRUE)
Correlation matrix:
[1,] 1
[2,] 1
[3,] ? 1
[4,] ? ? 1
[5,] ? ? 1
[6,] ? 1
[7,] ? 1
[8,] ? 1
attr(,"legend")
[1] 0 ‘ ’ 0.001 ‘?’ 0.003 ‘!’ 0.999 ‘1’ 1
As expected the correlation matrix for the different columns is almost equal to the identity matrix.
The Xoshiro256+/++/** generators has a period of \(2^{256} -1\) and offers \(2^{128}\) sub-sequences that are \(2^{128}\) random draws apart as well as
\(2^{64}\) streams that are \(2^{192}\) random draws apart. The
Xoroshiro128+/++/** generators has a period of \(2^{128} -1\) and offers \(2^{64}\) sub-sequences that are \(2^{64}\) random draws apart as well as
\(2^{32}\) streams that are \(2^{98}\) random draws apart. You can go
from one sub-sequence to the next using the jump()
or
long_jump()
method and the convenience wrapper
jump(int n)
or long_jump(int n)
, which
advances to the n
th sub-sequence. When used from R or C++
with the two argument dqset.seed
and
dqset_seed
you get \(2^{64}\) streams that are \(2^{192}\) and \(2^{64}\) random draws apart for
Xoshiro256+/++/** and Xoroshiro128+/++/**, respectively.
As an example using C++ we draw and sum a large number of uniformly
distributed numbers. This is done several times sequentially as well as
using OpenMP for parallelisation. Care has been taken to keep the global
RNG rng
usable outside of the parallel block.
#include <Rcpp.h>
// [[Rcpp::depends(dqrng, BH)]]
#include <xoshiro.h>
#include <dqrng_distribution.h>
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::plugins(openmp)]]
#include <omp.h>
// [[Rcpp::export]]
std::vector<double> random_sum(int n, int m) {
::uniform_distribution dist(0.0, 1.0); // Uniform distribution [0,1)
dqrng::xoshiro256plus rng(42); // properly seeded rng
dqrngstd::vector<double> res(m);
for (int i = 0; i < m; ++i) {
double lres(0);
for (int j = 0; j < n; ++j) {
+= dist(rng);
lres }
[i] = lres / n;
res}
return res;
}
// [[Rcpp::export]]
std::vector<double> parallel_random_sum(int n, int m, int ncores) {
::uniform_distribution dist(0.0, 1.0); // Uniform distribution [0,1)
dqrng::xoshiro256plusplus rng(42); // properly seeded rng
dqrngstd::vector<double> res(m);
// ok to use rng here
#pragma omp parallel num_threads(ncores)
{
::xoshiro256plusplus lrng(rng); // make thread local copy of rng
dqrng.long_jump(omp_get_thread_num() + 1); // advance rng by 1 ... ncores jumps
lrng
#pragma omp for
for (int i = 0; i < m; ++i) {
double lres(0);
for (int j = 0; j < n; ++j) {
+= dist(lrng);
lres }
[i] = lres / n;
res}
}
// ok to use rng here
return res;
}
/*** R
bm <- bench::mark(
parallel_random_sum(1e7, 8, 4),
random_sum(1e7, 8),
check = FALSE
)
bm[,1:4]
*/
Result:
expression min median `itr/sec`
<bch:expr> <bch:tm> <bch:tm> <dbl>
1 parallel_random_sum(1e+07, 8, 4) 98.3ms 99ms 10.1
2 random_sum(1e+07, 8) 270.2ms 271ms 3.68
From the PCG family we will look at pcg64, a 64-bit generator with a
period of \(2^{128}\). It offers the
function advance(int n)
,
which is equivalent to n
random draws but scales as \(O(ln(n))\) instead of \(O(n)\). In addition, it offers \(2^{127}\) separate streams that can be
enabled via the function set_stream(int n)
or the two
argument constructor with seed
and stream
.
When used from R or C++ with the two argument dqset.seed
and dqset_seed
you get \(2^{64}\) streams out of the possible \(2^{127}\) separate streams.
In the following example a matrix with random numbers is generated in
parallel using RcppParallel. Instead of using the more traditional
approach of generating the random numbers from a certain distribution,
we are using the fast sampling methods from dqrng_sample.h
.
As a consequence, we cannot use pcg64
directly but have to
wrap it as dqrng::generator
. The resulting correlation
matrix should be close to the identity matrix if the different streams
are independent:
#include <Rcpp.h>
// [[Rcpp::depends(dqrng, BH)]]
#include <pcg_random.hpp>
#include <dqrng_sample.h>
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::depends(RcppParallel)]]
#include <RcppParallel.h>
struct RandomFill : public RcppParallel::Worker {
::RMatrix<int> output;
RcppParalleluint64_t seed;
(Rcpp::IntegerMatrix output, const uint64_t seed) : output(output), seed(seed) {};
RandomFill
void operator()(std::size_t begin, std::size_t end) {
auto rng = dqrng::generator<pcg64>(seed, end);
for (std::size_t col = begin; col < end; ++col) {
auto sampled = dqrng::sample::sample<std::vector<int>, uint32_t>(*rng, 100000, output.nrow(), true);
::RMatrix<int>::Column column = output.column(col);
RcppParallelstd::copy(sampled.begin(), sampled.end(), column.begin());
}
}
};
// [[Rcpp::export]]
::IntegerMatrix parallel_random_matrix(const int n, const int m, const int ncores) {
Rcpp::IntegerMatrix res(n, m);
Rcpp(res, 42);
RandomFill randomFill::parallelFor(0, m, randomFill, m/ncores + 1);
RcppParallelreturn res;
}
/*** R
res <- parallel_random_matrix(1e6, 8, 4)
head(res)
symnum(x = cor(res), cutpoints = c(0.001, 0.003, 0.999),
symbols = c(" ", "?", "!", "1"),
abbr.colnames = FALSE, corr = TRUE)
*/
Head of the random matrix:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 67984 16279 69262 7126 21441 37720 51107 51045
[2,] 69310 21713 82885 81157 54051 5261 91165 17833
[3,] 76742 31232 78953 4626 94939 29416 85652 78296
[4,] 76349 47427 1770 37957 33888 59134 94591 65793
[5,] 85008 89224 43493 7925 60866 2464 14080 10763
[6,] 38017 88509 51195 73086 1883 68193 75259 62216
Correlation matrix:
[1,] 1
[2,] 1
[3,] ? 1
[4,] ? 1
[5,] 1
[6,] ? ? ? 1
[7,] ? 1
[8,] ? 1
attr(,"legend")
[1] 0 ‘ ’ 0.001 ‘?’ 0.003 ‘!’ 0.999 ‘1’ 1
So as expected the correlation matrix is almost equal to the identity matrix.
See for example https://www.pcg-random.org/posts/critiquing-pcg-streams.html.↩︎