I spent the past few days hunting what I thought was a memory leak somewhere in an R program I wrote. It turned out to be caused by some R features I don't really grasp. My hunch is that it has to do with promises and lazy evaluation. Here is an example that reproduces the problem:

M <- matrix(rnorm(1E7), 1000)
format(object.size(M), "Mb") ## An 80 Mb matrix
gc() ## Memory usage should be around 80 Mb
LF <- apply(M, 1, function(X) {sdX <- sd(X); function(X) X / sdX})
format(object.size(LF), "Mb") ## 2.9 Mb (isn't it a lot for a few functions? but it's not really the point)
gc() ## Memory usage is at 158 Mb even though our workspace only contains two objects of 80 and 2.9 Mb
rm(M)
gc() ## Back to around 80 Mb but M is gone
rm(LF)
gc() ## Back to normal

You can see that memory usage will get out of hand if we repeat the operation too often. It seems that R needs to store the entire matrix to be able to call the functions in LF. Any insights into what happens when we create the functions in LF? Is there a workaround?

  • You are creating closures there. Check out ls(envir = environment(LF[[1]])). R has to associate these objects with each function, otherwise they wouldn't work if you removed M from the global environment. My advice: Don't do this. If you described why you are doing this, we could suggest alternatives. Commented Feb 24, 2018 at 16:09
  • The goal is to create functions that transform a set of variables. Each column of the matrix is a different variable in the example. Note that this is a simplified version of what I do in my program. I understand that R has to save sdX somewhere, but why does it store the entire column? It could just store the value of sdX, which is small. Commented Feb 24, 2018 at 17:52

1 Answer

The enclosing environment of the function you return is the local environment of the function passed to apply. The arguments of that function (here X, an entire row of M) are stored in this environment. Usually the environment is discarded after the call, but you preserve it because you return a closure that refers to it. You can delete the objects you don't need:

LF <- apply(M, 1, function(X) {sdX <- sd(X); rm("X"); function(X) X / sdX})
ls(envir = environment(LF[[1]]))
#[1] "sdX"
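
A quick way to check this is to compare both versions on a toy matrix rather than the 80 Mb one. This is only a sketch: the names `keep` and `drop` are made up for illustration. The first version leaves the row in each closure's environment, the second leaves only `sdX`, and both closures compute the same result.

```r
set.seed(1)
M <- matrix(rnorm(20), 4)  # toy 4 x 5 matrix instead of the large one

# Closure that keeps the row X alive in its enclosing environment
keep <- apply(M, 1, function(X) {sdX <- sd(X); function(X) X / sdX})

# Same closure, but X is removed before the inner function is returned
drop <- apply(M, 1, function(X) {sdX <- sd(X); rm("X"); function(X) X / sdX})

ls(envir = environment(keep[[1]]))  # lists both sdX and X
ls(envir = environment(drop[[1]]))  # lists only sdX

# Both versions scale their input identically
same_result <- all.equal(keep[[1]](M[1, ]), drop[[1]](M[1, ]))
```
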

However, I still don't see a reason for using closures and recommend redesigning your whole approach. E.g., in this specific example I'd return the standard deviations and pass them as a parameter to the transforming function.
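
A minimal sketch of that redesign, assuming the transformation really is just a division by the row's standard deviation. The helper name `scale_by_sd` is hypothetical, and a toy matrix stands in for the real data.

```r
set.seed(1)
M <- matrix(rnorm(20), 4)  # toy 4 x 5 matrix; each row is treated as one variable

# Store one standard deviation per row: a plain numeric vector, no closures
sds <- apply(M, 1, sd)

# Hypothetical helper: scale a vector by a standard deviation passed explicitly
scale_by_sd <- function(x, sdx) x / sdx

# Transform the data for row 2 using its stored standard deviation
scaled <- scale_by_sd(M[2, ], sds[2])

# Or scale every row at once with base R's sweep()
all_scaled <- sweep(M, 1, sds, "/")
```

With this layout the only state that survives is `sds`, a numeric vector with one element per row, so no environment holding the original data is kept alive.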
