Opening image files in R in parallel

J S

What I'm trying to do: open a stack of images using EBImage, process them, and save the processed image to a new file. I am attempting this in parallel using the packages doParallel and foreach.

The problem: Any time I use more than one processor core for the task, R returns the error:

Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted

I do not know how to get any more information on this error. If I run the same script with only one processor core, it runs without problems.
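One way to get more information: by default a PSOCK cluster sends each worker's stdout/stderr to /dev/null, so anything a worker prints before it dies is lost. An `error reading from connection` from `unserialize()` typically means the master lost contact with a worker process. Passing `outfile = ""` to `makeCluster()` turns off that redirection, so worker messages reach the terminal that launched the script. A minimal sketch using only base `parallel`, assuming the script is run from a terminal with `Rscript` (GUIs may not show worker output); `log_step` is a hypothetical task function, and the same cluster object could be passed to `registerDoParallel()`.

```r
library(parallel)

# outfile = "" disables redirection of worker stdout/stderr, so anything
# a worker prints (e.g. via message()) reaches the terminal that launched R.
cl = makeCluster(2, outfile = "")

# Hypothetical task function: report which worker handles which task
log_step = function(i) {
  message(sprintf("worker %d starting task %d", Sys.getpid(), i))
  i^2
}

res = parLapply(cl, 1:4, log_step)
stopCluster(cl)

print(unlist(res))
```

With per-task messages like these, the last line printed before the crash can hint at which task or worker failed.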

Sample script:

library(EBImage)
library(foreach)
library(doParallel)

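nCores = 1  # works with 1; fails with any value > 1
cl = makeCluster(nCores)
registerDoParallel(cl)

img_stack_ids = c("A", "B", "C", "D")
foreach(i = 1:384, .packages = c("EBImage")) %dopar% {
  # read the 4-image stack for index i; fall back to an empty array on error
  imgs = tryCatch(readImage(sprintf("/INPUT_IMGS/%s_%s", i, img_stack_ids)),
                  error = function(e) array(0, dim = c(0,0,0)))

  # processingFunction() is my own processing code (not shown here)
  img_processed = processingFunction(imgs)
  writeImage(img_processed, sprintf("/OUTPUT_IMGS/%s", i))
}

stopCluster(cl)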

The code works when nCores = 1; it fails when nCores is anything greater than 1, up to the maximum number of cores available.
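To narrow down whether the multi-core failure comes from the cluster itself or from the image work, one quick check is to start a cluster with more than one worker, confirm that every worker answers, and confirm that every worker can load EBImage, all before any files are touched. A minimal sketch using base `parallel`, assuming a two-worker cluster is enough to reproduce the problem; `requireNamespace()` returns `FALSE` instead of raising an error when a package is missing, so the check reports rather than aborts.

```r
library(parallel)

nCores = 2  # any value > 1 matches the failing configuration
cl = makeCluster(nCores)

# 1. Does each worker start and answer at all? (one PID per worker)
pids = unlist(clusterEvalQ(cl, Sys.getpid()))

# 2. Can each worker load EBImage? (FALSE means it is not available there)
ebimage_ok = unlist(clusterEvalQ(cl, requireNamespace("EBImage", quietly = TRUE)))

stopCluster(cl)

print(pids)
print(ebimage_ok)
```

If step 1 already fails with the same error, the cause is more likely the cluster setup on that machine than anything EBImage does.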

The system I want this to run on is a virtual machine with 36 cores running CentOS 7.

The individual workers SHOULD be accessing unique files based on the file ID, so I can't imagine that it's an issue with file locking or simultaneous reading, unless Linux has issues with simultaneous reading and writing in the same directory as well.
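The assumption that each task touches its own files can be checked directly by generating every path the loop will use and looking for duplicates. A sketch reusing the path patterns from the sample script (the directory names are the placeholders from the question); it only builds strings and does not touch the file system.

```r
img_stack_ids = c("A", "B", "C", "D")
ids = 1:384

# Input paths: the 4 stack files each task passes to readImage()
input_paths = unlist(lapply(ids, function(i)
  sprintf("/INPUT_IMGS/%s_%s", i, img_stack_ids)))

# Output paths: the single file each task passes to writeImage()
output_paths = sprintf("/OUTPUT_IMGS/%s", ids)

# anyDuplicated() returns 0 when every element is unique
stopifnot(anyDuplicated(input_paths) == 0, anyDuplicated(output_paths) == 0)

length(input_paths)  # 1536 = 384 tasks x 4 stack files
```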

I'd honestly be happy with a workaround as well as a solution.
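One possible workaround on Linux, sketched under assumptions: use forked workers instead of the socket (PSOCK) workers that `makeCluster()` starts, and wrap each task so that an R-level error for one stack is recorded instead of stopping the whole run. `parallel::mclapply()` forks the running R session (it cannot use `mc.cores > 1` on Windows), and `mc.preschedule = FALSE` forks one process per task. `process_one` is a hypothetical stand-in for the read/process/write body of the loop. This only removes the socket layer from the picture and is not guaranteed to avoid the error.

```r
library(parallel)

# Hypothetical stand-in for the loop body; in the real script this would call
# readImage(), processingFunction() and writeImage() for stack i.
process_one = function(i) {
  if (i %% 5 == 0) stop("simulated failure for stack ", i)
  invisible(NULL)
}

# Run one task, returning "ok" or the error message instead of aborting
safe_task = function(i) {
  tryCatch({ process_one(i); "ok" },
           error = function(e) conditionMessage(e))
}

status = unlist(mclapply(1:20, safe_task, mc.cores = 2, mc.preschedule = FALSE))
failed = which(status != "ok")
print(failed)
```

With the original `foreach` code, the smallest change to try is probably `registerDoParallel(cores = nCores)` instead of `registerDoParallel(makeCluster(nCores))`, which on Unix-like systems makes doParallel use its fork-based (multicore) backend rather than a socket cluster.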

Thank you!


My session info:

R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.6 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] doParallel_1.0.10  iterators_1.0.8    foreach_1.4.3      ZProjection_0.99.0 EBImage_4.16.0

loaded via a namespace (and not attached):
 [1] locfit_1.5-9.1      lattice_0.20-34     codetools_0.2-15    png_0.1-7           fftwtools_0.9-7     tiff_0.1-5          grid_3.3.1          tools_3.3.1         jpeg_0.1-8          abind_1.4-5
[11] rsconnect_0.5       BiocGenerics_0.20.0

Answers

answered 1 year ago by aoles

Below is a reproducible example based on your original code. I was able to run it successfully in parallel on both Red Hat-family Linux distributions (Fedora and CentOS 6.5) and OS X Yosemite (10.10.5). This suggests that your issue might be system- or configuration-specific.

library(EBImage)
library(foreach)
library(doParallel)

nCores = detectCores()
registerDoParallel(makeCluster(nCores))

input_dir = "input_imgs"
output_dir = "output_imgs"

dir.create(input_dir)
dir.create(output_dir)

no_images = 384
img_stack_ids = LETTERS[1:4]

## create sample images
n = 8 # image dimensions

for (i in 1:no_images)
  for (id in img_stack_ids)
    writeImage(Image(runif(n^2), c(n, n)),
               sprintf("%s/%s_%s.png", input_dir, i, id))

## do the actual work
foreach(i = 1:no_images, .packages = c("EBImage")) %dopar% {
  imgs = tryCatch(
    readImage(sprintf("%s/%s_%s.png", input_dir, i, img_stack_ids)), 
    error = function(e) array(0, dim = c(0,0,0))
  )

  ## do the processing

  writeImage(imgs, sprintf("%s/%s.tif", output_dir, i))
}
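After a parallel run it can help to confirm which tasks actually produced output, since a worker that fails part-way may leave gaps. A small sketch, where the function name `missing_outputs` is hypothetical, that compares the expected `<id>.tif` files, named as in the code above, with what exists on disk. The demo runs in a temporary directory so it is self-contained.

```r
# Return the task IDs whose expected output file is not on disk
missing_outputs = function(output_dir, ids, ext = "tif") {
  expected = file.path(output_dir, sprintf("%s.%s", ids, ext))
  ids[!file.exists(expected)]
}

# Demo in a temporary directory: pretend only tasks 1-3 succeeded
demo_dir = file.path(tempdir(), "output_imgs_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, sprintf("%s.tif", 1:3))))

print(missing_outputs(demo_dir, 1:5))  # prints [1] 4 5
```

In the setup above this would be called as `missing_outputs(output_dir, 1:no_images)`.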
