1

I have blood biochemistry markers measured in 28 subjects, where samples were collected every month.

I can run a simple modG with gamm and get the acf and pacf plots.

  modG <- gamm(Met1 ~ s(Timepoint, k=7), data=df, method="REML")

I run into problems when trying to run a gamm for a mod GI (gam works fine). I would like to check acf and pacf plots for this and correct for potentially AR or MA greater than order one (bam only allows AR1 correction).

 modGI <- gamm(Met1 ~ s(Timepoint, k=7) + s(Timepoint, by=subject, k=7),
         random(list(subject=~1)), data=df, method="REML")

Error in MEestimate(lmeSt, grps) :
Singularity in backsolve at level 0, block 1

What does this error mean, and how can I overcome it? A similar issue was observed here https://stats.stackexchange.com/questions/197726/different-interactions-specifications-with-gamm, but its not clear how to resolve it.

Edit: Output from draw(modGI) using the bs="sz". The different colours correspond to the different subjects (ID_new).

enter image description here

enter image description here

1 Answer 1

1

A singular matrix is one with a determinant of 0 and is not invertible. The operation that raised the error is likely involved with doing that inversion in a numerically accurate way (via a Cholesky decomposition).

In practical terms, what this likely means is that you have a model matrix (or some statistical transformation downstream of that matrix created during model fitting) that is rank deficient or numerically close to being rank deficient that as far as the computer is concerned it is.

This likely stems from that fact that you have multiple smooths of Timepoint:

  1. the s(Timepoint, k=7), and
  2. the set of smooths s(Timepoint, by=subject, k=7)

As there is nothing to stop the smooth from (1) looking like some or all of the smooths from (2), you are likely including concurved terms in the model. Concurvity is that non-linear equivalent of colinearity - effectively you are including the same "effect" in the model multiple times. We call this an "identifiability" issue, but it can also show up as problems elsewhere as rank deficiency.

What we suggest in the HGAM paper from where you are taking the model names of G and GI is to make the model identifiable by changing the focus of the penalty on the factor by smooths to penalise any deviation from a flat, constant function rather than from a linear function. We do that with the thin plate splines by setting the penalty to be order 1 with m = 1:

modGI <- gamm(Met1 ~ s(Timepoint, k=7) +
    s(Timepoint, by=subject, k=7, m = 1),
  random(list(subject=~1)), data=df, method="REML")

As 28 subjects isn't that large, if this the model you want to fit, I would actually suggest you moving back to the gam() function and make use of a new basis in {mgcv} that wasn't available at the time we wrote the HGAM paper, namely the "sz" basis:

modGI <- gam(Met1 ~ s(Timepoint, k = 7) +
    s(Timepoint, subject, bs = "sz", k = 7),
  data = df, method = "REML")

where now the factor-smooth interaction is, by it's construction, orthogonal (uncorrelated) with the main effect s(Timepoint).

Sign up to request clarification or add additional context in comments.

7 Comments

Thank you very much for this – very helpful! I tried the modGI with the m=1, and found that the graphs for each subject were very piecewise linear (compared to when not including m=1), which I think might be due to the too few time points perhaps? When using the bs=”sz” instead, I didn’t realise that all subjects would be plotted on one plot (like with bs=”fs”). From the summary and the plots, I can see that the common trend s(Timepoint) isn’t that non-linear, whilst s(Timepoint, ID_new) is significant so from my understanding some subjects do have notable differences from the common trend....
But it’s hard to tell which ones, unless I devise plots that show each subject for this model, but this will only be visual evidence. As a last question, as long as I’m checking concurvity, would a model with bam(Met1 ~ s(Timepoint, k=7) + s(Timepoint, by=subject, k=7) + s(subject, bs="fs"), data=df, rho=r1, AR.start = df$start.event, method=”fREML”) be acceptable to use? Guidance for rho and AR.start from cran.r-project.org/web/packages/itsadug/vignettes/acf.html
FYI, I had a typo in the last model using the sz basis. You should fit it using gam(). I haven't looked to see if this new basis works with gamm().
As for the piecewise linear fits, yes, that's to be expected: they'll be smoother once you combine them with global smooth. If you don't like the sz plot, it is best to visualize the fits for each group. To do this create a data_slice over Timepoint and subject, use fitted_values() and then plot as you want.
An fs smooth needs two covariates; i) a continuous variable, and ii) a factor. If you fix that (i think you meant bs = "re"), that model suffers same problem of lack of identifiability because you don't have m=1 on the by smooths
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.