Parallel Processing in R

Published: Wednesday, June 5, 2024
Modified: Friday, February 28, 2025

Parallel processing speeds up computations by spreading work across multiple CPU cores at the same time. In R, the parallel package that ships with the base distribution provides several functions for this, which helps with large datasets and repeated, expensive calculations. This blog post introduces three of them, mclapply, parLapply, and parSapply, and demonstrates how to use them in your R scripts.

library(parallel)
library(lme4)
Loading required package: Matrix
### Check the number of cores
detectCores()
[1] 8
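
The core count above is usually the starting point for choosing how many workers to launch. A minimal sketch, assuming you want to leave one core free for the rest of the system; `n_detected` and `n_workers` are hypothetical names, and the guard is there because detectCores() can return NA when the count cannot be determined.

```r
library(parallel)

# detectCores() may return NA on platforms where the count is unknown,
# so fall back to a single worker in that case.
n_detected <- detectCores()

# Leave one core free, but never go below one worker.
n_workers <- if (is.na(n_detected)) 1L else max(1L, n_detected - 1L)
n_workers
```

On the 8-core machine used in this post, this would give 7 workers.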

mclapply

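### mclapply forks the R process, so it runs in parallel only on Unix-like systems (Linux, macOS);
### on Windows it falls back to lapply. Unless mc.cores is set, it uses 2 cores by default.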
f <- function(i) {
  lmer(Petal.Width ~ . - Species + (1 | Species), data = iris)
}

system.time(save1 <- lapply(1:100, f))
   user  system elapsed 
  0.618   0.005   0.628 
system.time(save2 <- mclapply(1:100, f))
   user  system elapsed 
  0.033   0.031   0.457 
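
The call above relies on the default number of cores. The following sketch sets mc.cores explicitly and checks that the parallel result matches the serial one. `slow_square` is a hypothetical toy task standing in for the model fit, and the choice of 2 cores is an assumption, not a recommendation.

```r
library(parallel)

# Hypothetical toy task: sleeps briefly to imitate real work, then squares i.
slow_square <- function(i) {
  Sys.sleep(0.01)
  i^2
}

# Forking is not available on Windows, where mc.cores must stay at 1.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

res_serial   <- lapply(1:20, slow_square)
res_parallel <- mclapply(1:20, slow_square, mc.cores = n_cores)

# mclapply returns a list in the same order as its input, like lapply.
same_result <- identical(res_serial, res_parallel)
same_result
```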

parLapply

### parLapply and parSapply also work on Windows, but starting a cluster adds overhead, so they are often slower than mclapply
numCores <- detectCores()

### Starting a cluster
cl <- makeCluster(numCores)
parSapply(cl, Orange, mean, na.rm = TRUE)
         Tree           age circumference 
           NA      922.1429      115.8571 
### Close the cluster when you are done (best practice)
stopCluster(cl)
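
The NA for Tree in the output above arises because Tree is a factor, and mean() is not defined for factors. A minimal sketch that keeps only the numeric columns before calling parSapply; `numeric_cols` and `col_means` are hypothetical names, and the 2-worker cluster is an arbitrary choice.

```r
library(parallel)

# Keep only the numeric columns (age and circumference); Tree is a factor.
numeric_cols <- Filter(is.numeric, as.list(Orange))

cl <- makeCluster(2)
col_means <- parSapply(cl, numeric_cols, mean, na.rm = TRUE)
stopCluster(cl)

col_means
```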
### lapply
system.time({save1 <- lapply(1:100, f)})
   user  system elapsed 
  0.645   0.011   0.691 
### mclapply
system.time({save2 <- mclapply(1:100, f)})
   user  system elapsed 
  0.031   0.033   0.459 
### parLapply (the timing includes starting and stopping the cluster)
system.time(
    {
        cl <- makeCluster(detectCores())
        clusterEvalQ(cl, library(lme4))
        save3 <- parLapply(cl, 1:100, f)
        stopCluster(cl)
    }
)
   user  system elapsed 
  0.115   0.017   1.215 
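
Much of the parLapply time above is spent creating and tearing down the cluster. One way to amortize that cost is to create the cluster once and reuse it for several calls. The sketch below also shows clusterExport, since workers start as fresh R sessions and do not see objects from your global environment. `offset` and `add_offset` are hypothetical names used only for illustration.

```r
library(parallel)

# Hypothetical global constant and helper that depends on it.
offset <- 10
add_offset <- function(i) i + offset

# Start the cluster once; 2 workers is an arbitrary choice.
cl <- makeCluster(2)

# Copy `offset` into each worker's global environment.
clusterExport(cl, "offset")

# Reuse the same cluster for several calls.
res1 <- parLapply(cl, 1:5, add_offset)   # returns a list
res2 <- parSapply(cl, 6:10, add_offset)  # returns a simplified vector

stopCluster(cl)
```

Packages are handled the same way in the timing example: clusterEvalQ(cl, library(lme4)) loads lme4 on every worker before the function is run.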
