I spent the past few days hunting what I thought was a memory leak in an R program I wrote. It turned out to be caused by R features I don't fully grasp; my hunch is that it has to do with promises and lazy evaluation. Here is an example that reproduces the problem:
M <- matrix(rnorm(1E7), 1000)
format(object.size(M), "Mb") ## An ~80 Mb matrix
gc() ## Memory usage should be around 80 Mb
LF <- apply(M, 1, function(X) {sdX <- sd(X); function(X) X / sdX})
format(object.size(LF), "Mb") ## 2.9 Mb (isn't that a lot for a list of functions? but that's not really the point)
gc() ## Memory usage is at 158 Mb even though our workspace only contains two objects of 80 Mb and 2.9 Mb
rm(M)
gc() ## Back to around 80 Mb, but M is gone
rm(LF)
gc() ## Back to normal
You can see that memory usage will grow out of hand if we repeat this operation too often.
It seems that R needs to store the entire matrix to be able to call the functions in LF. Any insight into what happens when we create the functions in LF? Is there a workaround?
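For what it's worth, each closure's enclosing environment seems to hold a full row of M (a quick check in the same session; sizes are approximate):

ls(envir = environment(LF[[1]])) ## lists both sdX and X, the captured row
format(object.size(environment(LF[[1]])$X), "Kb") ## ~78 Kb per closure, so ~78 Mb across 1000 closures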
Take a look at ls(envir = environment(LF[[1]])). R has to associate these objects with each function, otherwise they wouldn't work if you removed M from the global environment. My advice: don't do this. If you described why you are doing this, we could suggest alternatives.

I understand R has to store sdX somewhere, but why does it store the entire row? It could just store the value of sdX, which is small.
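One possible workaround, sketched under the assumption that the goal is just a list of per-row scaling functions (LF2 is a hypothetical name): force sd(X), then drop the captured row, so each enclosing environment keeps only the small sdX.

LF2 <- apply(M, 1, function(X) {
  sdX <- sd(X)          ## sd(X) is evaluated here, so X is no longer needed
  rm(X)                 ## remove the 10,000-element row from the enclosing environment
  function(Y) Y / sdX   ## the closure now captures only sdX
})
gc() ## Memory usage should stay near the size of M alone

If the closures themselves aren't essential, storing the standard deviations in a plain numeric vector with sds <- apply(M, 1, sd) and dividing with sweep(M, 1, sds, "/") avoids creating the environments entirely.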