This is a direct (though minor) answer to Daniel’s blogpost Power Analysis for default Bayesian t-tests, which I found very interesting, as I have been trying to get my head around Bayesian statistics for quite a while now. However, one thing that bugs me, is the time needed for the simulation. On my machine it took around 22 minutes. Depending on the task, 22 minutes for a signle test can be way too long (especially if the tests are done in a business environment where many tests are needed – yesterday) and a simple t-test might be more appealing only because it takes a shorter computing time. Here is my solution to speed-up the code using snowfall‘s load-balancing parallel structures to reduce the time to 8.5 minutes.
The initial code (which you can find in the original post) uses a for loop, snowfall, however, uses a function that we need to export, which is simply called simFun. The function takes the sample size n, the true effect size D, as well as the effect size of the alternative hypothesis rscaleBF as arguments (note, the names are equivalent to Daniel’s, as well as the values of the variables (i.e., n = 50 etc.)).
simFun <- function(n, D, rscaleBF){ library(BayesFactor) x <- rnorm(n = n, mean = 0, sd = 1) y <- rnorm(n = n, mean = D, sd = 1) return(exp((ttestBF(x, y, rscale = rscaleBF))@bayesFactor$bf)) }
To load, initiate, and finally execute the snowfall-code, we can use the following:
# load the library library(snowfall) n <- 50; D <- 0.0; nSim <- 100000; rscaleBF <- sqrt(2)/2; threshold <- 3 # for time-keeping t0 <- Sys.time() # initiate a parallel cluster with 4 cpus sfInit(parallel = T, cpus = 4) # export the function to the clusters sfExport("simFun", "n", "D", "rscaleBF") # execute the code bf <- sfClusterApplyLB(1:nSim, function(i) simFun(n = n, D = D, rscaleBF = rscaleBF)) # stop the clusters sfStop() # print the time it took for the calculation Sys.time() - t0 # Time difference of 8.490639 mins # and finally the result sum(bf < (1/threshold))/nSim #[1] 0.68578
With Daniel’s code I got supportH0 = 0.6904, the snowfall-code almost reproduces the same number 0.68578, but reduces the time substantially.
I hope you found this post interesting/helpful, should you have any questions, you are more then welcome to ask them here, otherwise send me an e-mail.
Lastly, thanks to Daniel, who was able to find some bugs in an earlier version of the code.
Note, the earlier version stated that the code was improved by 60x (without giving a result), the numbers are now corrected.
That looks really cool, but if I try to run the code on my computer, it doesn’t work 😦 What am I missing?
LikeLike
You are right, there was a bug in the code. Thanks for pointing it out! I corrected it and it should run again.
LikeLike
Thank you for the code example. Since a long time I try to use parallel packages on R to speed up my computational time. However, I am really unsure if this really works and I do not know what I am doing wrong.
I tried your code with a “new” i7-4790CPU @3.60 GHz, 64bit system and 8 CPUs (16GB ram). However, with 4 CPUs it took 8.316368 mins and using 8 CPUs it took 5.159408 mins. Still better than 22min, but I thought I would easily beat your 1.24 min. example, but I didn´t…Why? 🙂
please tell me that your hardware is muuuuch better than mine (whatever that means). Do you have any idea, why my machine is so slow?
Oh…also, the object “bf” was not created. So in the end, there was not even a result.
cheers
LikeLike