I'm trying to convert a SAS program into an R one, and I'm stuck on the for() loop and array part. R keeps reporting this error:

"Error in for (. in i) seq_len(NBR_LIGNES_MAX) : 4 arguments passed to 'for' which requires 3".

Can someone show me what I'm missing? My first thought is that I'm writing my R code too much like SAS code. As for the data, I have 11 variables: 1 ID, 5 statuses, and 5 dates. Each date is linked to its status.

Before creating TR_HISTO_STATUT, the data frame looked like this:

ID      status  date
125521  1       2020-01-01
125521  5       2022-05-06
125521  4       2025-10-07
999125  1       2020-02-02
999125  4       2021-11-02
888525  1       2021-03-30

I transposed the data to get:

ID      status1  status2  status3  date1       date2       date3
125521  1        5        4        2020-01-01  2022-05-06  2025-10-07
999125  1        4        .        2020-02-02  2021-11-02  .
888525  1        .        .        2021-03-30  .           .

Overall, I want the last status and the one before it, with their corresponding dates (date1, ..., date5), by comparing those dates with DT_DEB and or DT_FIN.

Example: for ID = 125521, DT_DEB is after date2 (2025-04-30 vs 2022-05-06), so I would get the following result for that person:

ID      status1  status2  status3  date1       date2       date3       DERN_STATUT  DERN_DT_EFF  AV_DERN_STATUT  AV_DERN_DT_EFF
125521  1        5        4        2020-01-01  2022-05-06  2025-10-07  2022-05-06   2020-01-01   5               1

and so on for the rest of my data.

Here's an example of the data before transposing: STATUT

Here's an example of the first lines of my data: TR_HISTO_STATUT

Here's my R code:

NBR_LIGNES_MAX <- 5
DT_DEB <- as.Date("2025-04-01")
DT_FIN <- as.Date("2025-04-30")

TR_HISTO_STATUT3 <- TR_HISTO_STATUT %>%

rowwise() %>%    

    for (i in seq_len(NBR_LIGNES_MAX)) {
      for (j in seq((i + 1), (NBR_LIGNES_MAX - 1))) {
        if (!is.na(DT_EFF_vec[i]) && DT_EFF_vec[i] < DT_DEB && is.na(DT_EFF_vec[j])) {
          DERN_STATUT <- TR_HISTO_STATUT[,i]
          DERN_DT_EFF <- TR_HISTO_STATUT[,i + NBR_LIGNES_MAX]
          if (i > 1) {
            AV_DERN_STATUT <- TR_HISTO_STATUT[, i - 1]
            AV_DERN_DT_EFF <- TR_HISTO_STATUT[, i - 1 + NBR_LIGNES_MAX]
            break
          }
          code <- 1
        } 
      }
    }
    
    row$code <- code
    row$DERN_STATUT <- DERN_STATUT
    row$DERN_DT_EFF <- DERN_DT_EFF
    row$AV_DERN_STATUT <- AV_DERN_STATUT
    row$AV_DERN_DT_EFF <- AV_DERN_DT_EFF
    
    row
  }) %>%
  ungroup()

vs the equivalent SAS code:

data TR_HISTO_STATUT3;
 format DERN_DT_EFF AV_DERN_DT_EFF YYMMDD10.;
 retain CODE DERN_STATUT DERN_DT_EFF;
 set TR_HISTO_STATUT;
 
 array tvar_statut[&NBR_LIGNES_MAX]     statut1-statut&NBR_LIGNES_MAX;
 array tvar_DT_EFF[&NBR_LIGNES_MAX]     DT_EFF1-DT_EFF&NBR_LIGNES_MAX; 
 
 code = 0;
 
 do i = 1 to &NBR_LIGNES_MAX;
    do j = i+1 to &NBR_LIGNES_MAX-1;
 
        if tvar_DT_EFF[i] NE . and tvar_DT_EFF[i] < &DT_DEB and tvar_DT_EFF[j] = . then do;
           DERN_STATUT = tvar_statut[i];
           DERN_DT_EFF = tvar_DT_EFF[i];
           if i > 1 then AV_DERN_STATUT = tvar_statut[i-1];
           if i > 1 then AV_DERN_DT_EFF = tvar_DT_EFF[i-1];
           code = 1;
        end;
        
      end;
 end;
run;
  • 3
    You cannot pipe to for, that's where the extra argument comes from. In a pipe, the first argument defaults to what you are passing from the previous instruction on to the next one. Also, what is row? If you are counting on rowwise to create an object named row, well, it does not. Commented Oct 22 at 6:12
  • 3
    And images are not a good way for posting data (or code). See this Meta and a relevant xkcd. Can you post sample data in dput format? Please edit the question with the code you tried and with the output of dput(TR_HISTO_STATUT). Or, if the data set is too big with the output of dput(head(TR_HISTO_STATUT, 20)). And the same with STATUT. Commented Oct 22 at 6:17
  • 7
    Regardless of the error, that R code looks terribly inefficient. I think you should scrap your approach and use appropriate join operations (I'd use package data.table) with the original long-format data. Commented Oct 22 at 6:27
  • 4
    I simply don't understand what you're trying to do. (1) "... comparing those dates with DT_DEB and or DT_FIN". That's ambiguous. (a) Is it "and" or "or"? (b) What is the comparison? (2) "I want the last status and the status before that" implies you want two status/date pairs per subject. But your desired output has three dates and additional columns you haven't described. And probably more... Commented Oct 22 at 7:17
  • 1
    As the other comments note, I think you are a little confused. You seem to be combining a tidyverse approach (using the verbs ungroup() and rowwise() ) with direct row manipulation. Generally speaking we want to avoid this kind of row by row operation that is very common in SAS, but if you want to do it you can just use the loop without needing the pipes. Commented Oct 22 at 9:41

1 Answer

First, let me demonstrate how to "transpose" your data as you've shown.

I'll start with some reproducible data.

quux <- structure(list(individual = c(14418L, 14418L, 14418L, 14419L, 14419L, 14420L, 14420L, 14421L, 14421L, 14422L, 14422L, 14423L, 14423L, 14424L, 14424L, 14425L, 14425L, 14426L, 125521L, 125521L, 125521L, 999125L, 999125L, 888525L), status = c(4L, 3L, 20L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 1L, 12L, 1L, 5L, 4L, 1L, 4L, 1L), effective_on = structure(c(9093, 13215, 18628, 12324, 13525, 12324, 13525, 12324, 13525, 12324, 13525, 12324, 13525, 12324, 13525, 12324, 12704, 12324,  18262, 19118, 20368, 18294, 18933, 18716), class = "Date")), class = "data.frame", row.names = c(NA, -24L))

quux
#    individual status effective_on
# 1       14418      4   1994-11-24
# 2       14418      3   2006-03-08
# 3       14418     20   2021-01-01
# 4       14419     12   2003-09-29
# 5       14419     12   2007-01-12
# 6       14420     12   2003-09-29
# 7       14420     12   2007-01-12
# 8       14421     12   2003-09-29
# 9       14421     12   2007-01-12
# 10      14422     12   2003-09-29
# 11      14422     12   2007-01-12
# 12      14423     12   2003-09-29
# 13      14423     12   2007-01-12
# 14      14424     12   2003-09-29
# 15      14424     12   2007-01-12
# 16      14425     12   2003-09-29
# 17      14425      1   2004-10-13
# 18      14426     12   2003-09-29
# 19     125521      1   2020-01-01
# 20     125521      5   2022-05-06
# 21     125521      4   2025-10-07
# 22     999125      1   2020-02-02
# 23     999125      4   2021-11-02
# 24     888525      1   2021-03-30

Assumptions:

  • dplyr is loaded, and we'll add tidyr for pivoting/transposing
  • all effective_on "dates" are already sorted within each individual; if they are not, it's trivial to fix by prepending arrange(effective_on) before the step that assigns rn=

library(dplyr)
library(tidyr)
quux |>
  mutate(.by = individual, rn = row_number()) |>
  pivot_wider(id_cols=individual, names_from = rn, values_from = c(status, effective_on))
# # A tibble: 12 × 7
#    individual status_1 status_2 status_3 effective_on_1 effective_on_2 effective_on_3
#         <int>    <int>    <int>    <int> <date>         <date>         <date>        
#  1      14418        4        3       20 1994-11-24     2006-03-08     2021-01-01    
#  2      14419       12       12       NA 2003-09-29     2007-01-12     NA            
#  3      14420       12       12       NA 2003-09-29     2007-01-12     NA            
#  4      14421       12       12       NA 2003-09-29     2007-01-12     NA            
#  5      14422       12       12       NA 2003-09-29     2007-01-12     NA            
#  6      14423       12       12       NA 2003-09-29     2007-01-12     NA            
#  7      14424       12       12       NA 2003-09-29     2007-01-12     NA            
#  8      14425       12        1       NA 2003-09-29     2004-10-13     NA            
#  9      14426       12       NA       NA 2003-09-29     NA             NA            
# 10     125521        1        5        4 2020-01-01     2022-05-06     2025-10-07    
# 11     999125        1        4       NA 2020-02-02     2021-11-02     NA            
# 12     888525        1       NA       NA 2021-03-30     NA             NA            
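
If the sorting assumption above does not hold for your data, the same pivot can be run with an explicit sort first. A minimal sketch on made-up data (the `unsorted` data frame is hypothetical and only illustrates the ordering step):

```r
library(dplyr)
library(tidyr)

# Hypothetical input whose rows are NOT in date order within individual 1
unsorted <- data.frame(
  individual   = c(1L, 1L, 2L),
  status       = c(5L, 1L, 4L),
  effective_on = as.Date(c("2022-05-06", "2020-01-01", "2021-11-02"))
)

wide <- unsorted |>
  arrange(individual, effective_on) |>            # sort first so rn follows date order
  mutate(.by = individual, rn = row_number()) |>  # position within each individual
  pivot_wider(id_cols = individual, names_from = rn,
              values_from = c(status, effective_on))
wide
```

With the sort in place, status_1 / effective_on_1 always hold the earliest record for each individual, regardless of the input row order.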

In order to force this out to 5-wide, I'll add a fake individual with 5 rows of data, filter all other individuals to be no more than 5 observations, and then remove the fake individual after the pivot. I'm using -1 for the fake individual; you will need to make sure whatever you use here is guaranteed not to be present in the real data.

quux1 <- quux |>
  dplyr::rows_append(tibble(individual=-1L, effective_on=Sys.Date()+1:5)) |>
  mutate(.by = individual, rn = row_number()) |>
  filter(rn <= 5) |>
  pivot_wider(id_cols=individual, names_from = rn, values_from = c(status, effective_on)) |>
  filter(individual != -1L)
quux1
# # A tibble: 12 × 11
#    individual status_1 status_2 status_3 status_4 status_5 effective_on_1 effective_on_2 effective_on_3 effective_on_4 effective_on_5
#         <int>    <int>    <int>    <int>    <int>    <int> <date>         <date>         <date>         <date>         <date>        
#  1      14418        4        3       20       NA       NA 1994-11-24     2006-03-08     2021-01-01     NA             NA            
#  2      14419       12       12       NA       NA       NA 2003-09-29     2007-01-12     NA             NA             NA            
#  3      14420       12       12       NA       NA       NA 2003-09-29     2007-01-12     NA             NA             NA            
#  4      14421       12       12       NA       NA       NA 2003-09-29     2007-01-12     NA             NA             NA            
#  5      14422       12       12       NA       NA       NA 2003-09-29     2007-01-12     NA             NA             NA            
#  6      14423       12       12       NA       NA       NA 2003-09-29     2007-01-12     NA             NA             NA            
#  7      14424       12       12       NA       NA       NA 2003-09-29     2007-01-12     NA             NA             NA            
#  8      14425       12        1       NA       NA       NA 2003-09-29     2004-10-13     NA             NA             NA            
#  9      14426       12       NA       NA       NA       NA 2003-09-29     NA             NA             NA             NA            
# 10     125521        1        5        4       NA       NA 2020-01-01     2022-05-06     2025-10-07     NA             NA            
# 11     999125        1        4       NA       NA       NA 2020-02-02     2021-11-02     NA             NA             NA            
# 12     888525        1       NA       NA       NA       NA 2021-03-30     NA             NA             NA             NA            
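
As a possible alternative to the fake individual, pivot_wider() can be asked to create a column for every factor level, including unused ones. This is only a sketch: it assumes a tidyr version that has the names_expand argument (1.2.0 or later, as far as I know), and `statut_long` is made-up illustrative data rather than your real table.

```r
library(dplyr)
library(tidyr)

# Hypothetical input with at most 3 observations per individual
statut_long <- data.frame(
  individual   = c(1L, 1L, 1L, 2L),
  status       = c(1L, 5L, 4L, 1L),
  effective_on = as.Date(c("2020-01-01", "2022-05-06", "2025-10-07", "2021-03-30"))
)

wide5 <- statut_long |>
  mutate(.by = individual, rn = row_number()) |>
  filter(rn <= 5) |>
  # declare all five positions as factor levels, even those absent from the data
  mutate(rn = factor(rn, levels = 1:5)) |>
  pivot_wider(id_cols = individual, names_from = rn,
              values_from = c(status, effective_on),
              names_expand = TRUE)  # also create columns for unused levels
names(wide5)
```

The unused positions (4 and 5 here) come out as all-NA columns, so no sentinel ID has to be reserved.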

You mentioned:

comparing those dates with DT_DEB and or DT_FIN

I don't know how DT_FIN factors into this, but I'll add your logic for DT_DEB by first determining the two relevant dates/statuses, and then joining that on to the above.

quux2 <- quux |>
  filter(effective_on < DT_DEB) |>
  slice_max(by = individual, order_by = effective_on, n = 2) |>
  arrange(desc(effective_on)) |>
  reframe(.by = individual, DERN_STATUT = effective_on[1], DERN_DT_EFF = effective_on[2], AV_DERN_STATUT = status[1], AV_DERN_DT_EFF = status[2])
quux2
#    individual DERN_STATUT DERN_DT_EFF AV_DERN_STATUT AV_DERN_DT_EFF
# 1      125521  2022-05-06  2020-01-01              5              1
# 2      999125  2021-11-02  2020-02-02              4              1
# 3      888525  2021-03-30        <NA>              1             NA
# 4       14418  2021-01-01  2006-03-08             20              3
# 5       14419  2007-01-12  2003-09-29             12             12
# 6       14420  2007-01-12  2003-09-29             12             12
# 7       14421  2007-01-12  2003-09-29             12             12
# 8       14422  2007-01-12  2003-09-29             12             12
# 9       14423  2007-01-12  2003-09-29             12             12
# 10      14424  2007-01-12  2003-09-29             12             12
# 11      14425  2004-10-13  2003-09-29              1             12
# 12      14426  2003-09-29        <NA>             12             NA

with the final join revealing:

left_join(quux1, quux2, join_by(individual))
# # A tibble: 12 × 15
#    individual status_1 status_2 status_3 status_4 status_5 effective_on_1 effective_on_2 effective_on_3 effective_on_4 effective_on_5 DERN_STATUT DERN_DT_EFF AV_DERN_STATUT AV_DERN_DT_EFF
#         <int>    <int>    <int>    <int>    <int>    <int> <date>         <date>         <date>         <date>         <date>         <date>      <date>               <int>          <int>
#  1      14418        4        3       20       NA       NA 1994-11-24     2006-03-08     2021-01-01     NA             NA             2021-01-01  2006-03-08              20              3
#  2      14419       12       12       NA       NA       NA 2003-09-29     2007-01-12     NA             NA             NA             2007-01-12  2003-09-29              12             12
#  3      14420       12       12       NA       NA       NA 2003-09-29     2007-01-12     NA             NA             NA             2007-01-12  2003-09-29              12             12
#  4      14421       12       12       NA       NA       NA 2003-09-29     2007-01-12     NA             NA             NA             2007-01-12  2003-09-29              12             12
#  5      14422       12       12       NA       NA       NA 2003-09-29     2007-01-12     NA             NA             NA             2007-01-12  2003-09-29              12             12
#  6      14423       12       12       NA       NA       NA 2003-09-29     2007-01-12     NA             NA             NA             2007-01-12  2003-09-29              12             12
#  7      14424       12       12       NA       NA       NA 2003-09-29     2007-01-12     NA             NA             NA             2007-01-12  2003-09-29              12             12
#  8      14425       12        1       NA       NA       NA 2003-09-29     2004-10-13     NA             NA             NA             2004-10-13  2003-09-29               1             12
#  9      14426       12       NA       NA       NA       NA 2003-09-29     NA             NA             NA             NA             2003-09-29  NA                      12             NA
# 10     125521        1        5        4       NA       NA 2020-01-01     2022-05-06     2025-10-07     NA             NA             2022-05-06  2020-01-01               5              1
# 11     999125        1        4       NA       NA       NA 2020-02-02     2021-11-02     NA             NA             NA             2021-11-02  2020-02-02               4              1
# 12     888525        1       NA       NA       NA       NA 2021-03-30     NA             NA             NA             NA             2021-03-30  NA                       1             NA
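
Note that the four new column names mirror the example output in the question, where both dates land under DERN_STATUT / DERN_DT_EFF and both statuses under the AV_* columns. The SAS step, by contrast, stores the status in the *_STATUT columns and the date in the *_DT_EFF columns. If that pairing is what you want, here is a minimal sketch on made-up data (`statut_long` is hypothetical; DT_DEB uses the value from the question):

```r
library(dplyr)

DT_DEB <- as.Date("2025-04-01")  # value taken from the question

# Small made-up history with the same shape as quux (illustrative only)
statut_long <- data.frame(
  individual   = c(125521L, 125521L, 125521L, 888525L),
  status       = c(1L, 5L, 4L, 1L),
  effective_on = as.Date(c("2020-01-01", "2022-05-06", "2025-10-07", "2021-03-30"))
)

last_two <- statut_long |>
  filter(effective_on < DT_DEB) |>
  # with_ties = FALSE guarantees at most two rows per individual
  slice_max(by = individual, order_by = effective_on, n = 2, with_ties = FALSE) |>
  arrange(individual, desc(effective_on)) |>
  reframe(
    .by = individual,
    DERN_STATUT    = status[1],        # latest status before DT_DEB
    DERN_DT_EFF    = effective_on[1],  # its effective date
    AV_DERN_STATUT = status[2],        # previous status (NA if there is none)
    AV_DERN_DT_EFF = effective_on[2]   # previous date (NA if there is none)
  )
last_two
```

The result can be joined onto the wide table with left_join() exactly as before.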

BTW, I'm using |> as a pipe; you are free to use %>% instead, and it requires no changes here. (For a discussion of the differences, see What are the differences between R's native pipe `|>` and the magrittr pipe `%>%`?)
