… but sometimes you just need a good example to show you how. Saana Isojunno from the University of Saint Andrews wrote to me with an example that stopped working when she upgraded from JAGS 3.4.0 to JAGS 4.2.0. Saana’s model showed the classic signs of a memory leak. Compilation was very slow and memory consumption increased until all available RAM was used and the process was killed by the operating system.

Saana sent me a reproducible example, including data and a script file to run the model. I cannot over-emphasize how important this is. Using the valgrind memory profiler I was able to track down and fix the leak, which was due to the compiler creating millions of redundant constant nodes.

I plugged the memory leak, but solving one problem uncovered another. It took an incredible *16 minutes* to compile the model.

This problem was caused by the way the JAGS compiler scans model files. I tend to write models in the BUGS language starting from the data and then working backwards to the parameters, e.g.

    for (i in 1:N) {
        Y[i] ~ dnorm(mu, tau)
    }
    mu ~ dnorm(0, 1.0E-3)
    tau ~ dgamma(1.0E-3, 1.0E-3)

In this example, the parameters `mu` and `tau` are used in the definition of `Y[i]` before their prior distributions are defined. In JAGS 4.0.0 and above the compiler starts at the bottom of the model file and works upwards, so `tau` is defined first, then `mu`, and both parameters already exist when the compiler creates `Y[1] ... Y[N]`.

Saana’s model was written in the opposite order, with many complex deterministic relations defined in forward-sampling order (i.e. a variable is defined on the left before being used on the right). It turns out that JAGS is really bad at compiling models written in this order. By reversing the order of the relations in Saana’s model file, I reduced the compilation time from 16 minutes to 8.3 seconds.

I had no idea that the order of relations could affect compilation efficiency so strongly. Happily I can report that I have fixed this bug. The current version of JAGS in the Sourceforge repository alternates forward and backward sweeps through the model file, which means compilation is now much faster and no longer strongly dependent on the order of relations. The two versions of Saana’s model with different ordering now compile in 6.8 seconds and 8.8 seconds.
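To see why ordering can matter so much, here is a toy sketch in Python. It is not the JAGS compiler (which is written in C++) and it ignores everything except the sweep direction. It assumes a hypothetical compiler that repeatedly sweeps the list of relations and can only create a node once all of its parents exist. The function `count_sweeps` and the chain of variables `a0, a1, ...` are made up for this illustration.

```python
def count_sweeps(relations, known, alternate=False):
    """Count the sweeps a toy compiler needs to resolve every relation.

    relations: list of (name, parents) pairs in file order, top to bottom.
    known:     names that exist before compilation starts (e.g. data).
    alternate: if False, every sweep runs bottom-up; if True, sweeps
               alternate between bottom-up and top-down.
    """
    resolved = set(known)
    pending = list(relations)
    sweeps = 0
    bottom_up = True  # the first sweep always starts at the bottom of the file

    while pending:
        sweeps += 1
        order = reversed(pending) if bottom_up else pending
        still_pending = []
        progress = False
        for name, parents in order:
            if all(p in resolved for p in parents):
                resolved.add(name)  # all parents exist, so the node can be built
                progress = True
            else:
                still_pending.append((name, parents))
        if not progress:
            raise ValueError("some relations can never be resolved")
        if bottom_up:
            still_pending.reverse()  # restore top-to-bottom file order
        pending = still_pending
        if alternate:
            bottom_up = not bottom_up
    return sweeps


# A chain of deterministic relations a1 <- f(a0), a2 <- f(a1), ..., with a0 as data.
n = 100
forward_order = [(f"a{i}", [f"a{i - 1}"]) for i in range(1, n + 1)]
reverse_order = list(reversed(forward_order))

print("bottom-up only, forward-sampling order:", count_sweeps(forward_order, {"a0"}))
print("bottom-up only, reversed order:        ", count_sweeps(reverse_order, {"a0"}))
print("alternating, forward-sampling order:   ", count_sweeps(forward_order, {"a0"}, alternate=True))
print("alternating, reversed order:           ", count_sweeps(reverse_order, {"a0"}, alternate=True))
```

In this toy model, a bottom-up-only compiler needs one sweep per link of the chain when the file is in forward-sampling order, but a single sweep when the file is reversed. With alternating sweeps the count stays small in either order, which is the qualitative behaviour described above. The real timings depend on much more than the number of sweeps.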

Now for the strange part. Today in the forums John Siryj reported a model that was using excessive memory during compilation (8 GB). John’s model also had many deterministic relations defined in forward-sampling order. Here is an extract:

    for (t in 1:n){
        pseudo_x[t] <- x[t]*pow(xisigma2[t],0.5)
        pseudo_y[t] <- y[t]*pow(yisigma2[t],0.5)
        NZD_AUD_t_cop[t] <- log( (1/(2*3.14159*sqrt((1-pow(rho,2)))))
            * (1/(dt(pseudo_x[t],0,1,nu)*dt(pseudo_y[t],0,1,nu)))
            * pow(1 + (pow(pseudo_x[t],2) + pow(pseudo_y[t],2) + (2*rho*pseudo_x[t]*pseudo_y[t])/(nu*(1-pow(rho,2)))),-(nu + 2)/2))
        loglik[t] <- NZD_AUD_t_cop[t]
        copulat[t] <- -loglik[t] + C
        zeros[t] ~ dpois(copulat[t])
    }

Having just diagnosed and fixed the underlying bug I was able to advise John about how to work around the issue by reversing the order of the relations. But this issue has been around at least since the release of JAGS 4.2.0 in February, and probably longer. Why did I get two bug reports pointing to the same problem in the space of a few days?

This is similar to the so-called “Baader-Meinhof phenomenon”, a form of cognitive bias that makes you think you hear a new word or phrase with unusually high frequency after hearing it for the first time. Most likely, I had been seeing similar reports but not recognizing the underlying problem. Saana’s example caused a crash and was a clear deterioration between JAGS versions. So that is how I knew it was a bug.

This is great news because I’ve been suffering from what sounds like the same problem for months! However I’m confused about how to get the current version which has been fixed to sweep back and forth. The latest version I could find on sourceforge is 4.2.0 released 2016-06-22, and wouldn’t have the bug fix you made in Sep 2016, right?

This bug is serious enough that it merits a patch release, but that will take a little time. I’m a little tied up at the moment (see the next post) but will try to push 4.3.0 with the fix as soon as possible.

I can’t thank you enough for this info. I reversed the order of statements in my model description, and it made a huge improvement in my run time and ability to stay within available RAM. One model ran for 13 hours before it ate up 16GB of RAM and crashed. After I reversed the model description, then the job successfully ran and finished in only 1/2 an hour. 🙂