I spent the past few days hunting what I thought was a memory leak in an R program I wrote. It turned out to be caused by R features I don't fully grasp; my hunch is that it has to do with promises and lazy evaluation. Here is an example that reproduces the problem:
M <- matrix(rnorm(1E7), 1000)
format(object.size(M), "Mb") ## An ~80 Mb matrix
gc() ## Memory usage should be around 80 Mb
LF <- apply(M, 1, function(X) {sdX <- sd(X); function(X) X / sdX})
format(object.size(LF), "Mb") ## 2.9 Mb (isn't that a lot for a list of functions? but that's not really the point)
gc() ## Memory usage is at 158 Mb even though our workspace only contains two objects of 80 Mb and 2.9 Mb
rm(M)
gc() ## Back to around 80 Mb, but M is gone
rm(LF)
gc() ## Back to normal
You can see that memory usage will grow out of hand if we repeat this operation too often.
It seems that R needs to store the entire matrix to be able to call the functions in LF. Any insight into what happens when we create the functions in LF? Is there a workaround?
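For what it's worth, each closure's enclosing environment seems to hold a full row of M (a quick check in the same session; sizes are approximate):

ls(envir = environment(LF[[1]])) ## lists both sdX and X, the captured row
format(object.size(environment(LF[[1]])$X), "Kb") ## ~78 Kb per closure, so ~78 Mb across 1000 closures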
Take a look at ls(envir = environment(LF[[1]])). R has to associate these objects with each function, otherwise they wouldn't work if you removed M from the global environment. My advice: don't do this. If you described why you are doing this, we could suggest alternatives.

I understand R has to store sdX somewhere, but why does it store the entire row? It could just store the value of sdX, which is small.
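One possible workaround, sketched under the assumption that the goal is just a list of per-row scaling functions (LF2 is a hypothetical name): force sd(X), then drop the captured row, so each enclosing environment keeps only the small sdX.

LF2 <- apply(M, 1, function(X) {
  sdX <- sd(X)          ## sd(X) is evaluated here, so X is no longer needed
  rm(X)                 ## remove the 10,000-element row from the enclosing environment
  function(Y) Y / sdX   ## the closure now captures only sdX
})
gc() ## Memory usage should stay near the size of M alone

If the closures themselves aren't essential, storing the standard deviations in a plain numeric vector with sds <- apply(M, 1, sd) and dividing with sweep(M, 1, sds, "/") avoids creating the environments entirely.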