Lack of reproducibility in the glm module

Kodi Arfer sent a script that reveals a bug in the glm module. If you set the random number generator seed in the initial values, the output from JAGS should be identical every time you run the model. Unfortunately, Kodi's script shows that this is not the case. In the interests of transparency I am putting a note here on the blog, but I currently have no idea how to fix this.

The script, and some sample output, is below:

library(rjags)
load.module("glm")

set.seed(55)

ilogit = function (v)  1 / (1 + exp(-v))

N = 100
x1 = (1:N - N/2)/100
x2 = rnorm(N, 0, 3)
z = .1 + .3 * x1 - .7 * x2
y = rbinom(N, 1, ilogit(z))

model = 'model
   {for (i in 1:N)
       {y[i] ~ dbern(ilogit(b0 + b1 * x1[i] + b2 * x2[i]))}
    b0 ~ dnorm(0, 100^(-2))
    b1 ~ dnorm(0, 100^(-2))
    b2 ~ dnorm(0, 100^(-2))}'

go = function()
   {inits = list(b0 = 0, b1 = 0, b2 = 0)
    jags = jags.model(textConnection(model),
        data = list(N = N, y = y, x1 = x1, x2 = x2),
        inits = c(inits, list(
            .RNG.name = "base::Wichmann-Hill",
            .RNG.seed = 5
            )),
        n.chains = 1, n.adapt = 200, quiet = T)
    samp = coda.samples(jags,
        variable.names = names(inits),
        n.iter = 500, quiet = T)
    list(jags = jags, samp = samp)}

print(replicate(10,
   {x = go();
    as.numeric(x$samp[[1]][1,1])}))

The script prints one of the sampled values after 500 iterations. This value should be the same across all 10 replications, which is what we see without the glm module:

 [1] 0.1390914 0.1390914 0.1390914 0.1390914 0.1390914 0.1390914
 [7] 0.1390914 0.1390914 0.1390914 0.1390914

But when the glm module is loaded, the sampled values are not identical across runs:

 [1] -0.20043641  0.38309648  0.38309648  0.38309648  0.38309648
 [6]  0.38309648  0.38309648  0.38309648  0.38309648 -0.08749869

Of course, the statistical properties of the sampled output are still correct. This does not mean you should avoid using the glm module, just that you will not necessarily be able to reproduce the simulations exactly in a later run.


2 thoughts on “Lack of reproducibility in the glm module”

  1. Perhaps this is related to the use of srand / rand in src/modules/glm/CSparse/cs_randperm.c.
    rand() is neither re-entrant nor thread-safe, and if any of that is going on, the results will be unpredictable. Maybe srand48_r / drand48_r would be better.
    This is the result of looking for very low-hanging fruit, so the real problem might very well be something else.
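    To illustrate the commenter's point: a re-entrant generator such as POSIX rand_r() keeps its state in a caller-supplied variable, so two identically seeded streams cannot perturb each other the way a shared global rand() state can. The sketch below is hypothetical; the Fisher-Yates shuffle stands in for the permutation logic in cs_randperm.c and is not the actual CSparse code.

    ```c
    #define _POSIX_C_SOURCE 200112L  /* declare rand_r() under strict modes */
    #include <assert.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Fisher-Yates shuffle using a re-entrant PRNG. The state lives in a
     * caller-supplied unsigned int rather than a hidden global, so each
     * call site owns its own deterministic stream. */
    static void shuffle(int *p, int n, unsigned int *state)
    {
        for (int i = n - 1; i > 0; --i) {
            int j = rand_r(state) % (i + 1); /* re-entrant draw */
            int tmp = p[i]; p[i] = p[j]; p[j] = tmp;
        }
    }

    int main(void)
    {
        int a[5] = {0, 1, 2, 3, 4};
        int b[5] = {0, 1, 2, 3, 4};
        unsigned int s1 = 5, s2 = 5; /* identical seeds, independent state */

        shuffle(a, 5, &s1);
        shuffle(b, 5, &s2);
        for (int i = 0; i < 5; ++i)
            assert(a[i] == b[i]);    /* same seed => same permutation */
        puts("ok");
        return 0;
    }
    ```

    With a shared global state, any other code calling rand() between the two shuffles would desynchronize them; with per-caller state that interference is impossible, which is the reproducibility property the bug report is after.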
