I'm trying to convert a SAS program into an R one, and I'm stuck on the for() loop and array part. The log keeps saying:
"Error in for (. in i) seq_len(NBR_LIGNES_MAX) : 4 arguments passed to 'for' which requires 3".
Can someone show me what I'm missing? My first thought is that I'm writing my R code too much as if it were SAS. As for the data, I have 11 variables: 1 ID, 5 statuses, and 5 dates. Each date is linked to the status with the same number.
Before creating TR_HISTO_STATUT, the data frame looked like this:
ID status date
125521 1 2020-01-01
125521 5 2022-05-06
125521 4 2025-10-07
999125 1 2020-02-02
999125 4 2021-11-02
888525 1 2021-03-30
I transposed the data to get:
ID status1 status2 status3 date1 date2 date3
125521 1 5 4 2020-01-01 2022-05-06 2025-10-07
999125 1 4 . 2020-02-02 2021-11-02 .
888525 1 . . 2021-03-30 . .
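For reference, a long-to-wide transposition like this one can be sketched with dplyr and tidyr. This is only an assumption about how the step could be done, not the code actually used: the sample rows are typed in by hand, the helper column k is hypothetical, and the rows are assumed to be already sorted by date within each ID.

```r
library(dplyr)
library(tidyr)

# Sample rows copied by hand from the long table above
STATUT <- data.frame(
  ID     = c(125521, 125521, 125521, 999125, 999125, 888525),
  status = c(1, 5, 4, 1, 4, 1),
  date   = as.Date(c("2020-01-01", "2022-05-06", "2025-10-07",
                     "2020-02-02", "2021-11-02", "2021-03-30"))
)

TR_HISTO_STATUT <- STATUT %>%
  group_by(ID) %>%
  mutate(k = row_number()) %>%   # hypothetical helper: rank of the record within its ID
  ungroup() %>%
  # one column per (variable, rank) pair: status1..status3, date1..date3
  pivot_wider(names_from = k, values_from = c(status, date), names_sep = "")
```

IDs with fewer records than the longest history get NA in the unused status and date columns, which corresponds to the dots shown above.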
Overall, I want the last status and the status before it, with the corresponding dates, found by comparing the dates (date1, ..., date5) with DT_DEB and/or DT_FIN.
Example: for ID = 125521, DT_DEB (2025-04-01) is after date2 (2022-05-06) and before date3 (2025-10-07), so I would have the following result for that person:
ID status1 status2 status3 date1 date2 date3 DERN_STATUT DERN_DT_EFF AV_DERN_STATUT AV_DERN_DT_EFF
125521 1 5 4 2020-01-01 2022-05-06 2025-10-07 5 2022-05-06 1 2020-01-01
and so on for the rest of my data.
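Read this way, the rule for one ID is: among the dates that are not missing and fall before DT_DEB, take the latest position, then step back one position for the previous status. Below is a minimal sketch of that rule for a single row. It is an interpretation of the example above rather than a line-by-line port of the SAS step. The function name last_status_before and its arguments are hypothetical, and the dates are assumed to be in ascending order.

```r
# Hypothetical helper: last status effective before `cutoff`, plus the one before it.
# Assumes `status` and `dates` have the same length and `dates` is ascending.
last_status_before <- function(status, dates, cutoff) {
  eligible <- which(!is.na(dates) & dates < cutoff)
  i    <- if (length(eligible) > 0) max(eligible) else NA_integer_
  prev <- if (!is.na(i) && i > 1) i - 1L else NA_integer_
  data.frame(
    DERN_STATUT    = status[i],     # indexing with NA yields NA
    DERN_DT_EFF    = dates[i],
    AV_DERN_STATUT = status[prev],
    AV_DERN_DT_EFF = dates[prev]
  )
}

res <- last_status_before(
  status = c(1, 5, 4),
  dates  = as.Date(c("2020-01-01", "2022-05-06", "2025-10-07")),
  cutoff = as.Date("2025-04-01")
)
# res: DERN_STATUT 5, DERN_DT_EFF 2022-05-06, AV_DERN_STATUT 1, AV_DERN_DT_EFF 2020-01-01
```

Keeping the rule in an ordinary function means it can be checked on one ID before it is applied to every row.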
Here's an example of the data before transposing : STATUT
Here's an example of the first lines of my data : TR_HISTO_STATUT
Here's my R code:
NBR_LIGNES_MAX <- 5
DT_DEB <- as.Date("2025-04-01")
DT_FIN <- as.Date("2025-04-30")
TR_HISTO_STATUT3 <- TR_HISTO_STATUT %>%
  rowwise() %>%
  for (i in seq_len(NBR_LIGNES_MAX)) {
    for (j in seq((i + 1), (NBR_LIGNES_MAX - 1))) {
      if (!is.na(DT_EFF_vec[i]) && DT_EFF_vec[i] < DT_DEB && is.na(DT_EFF_vec[j])) {
        DERN_STATUT <- TR_HISTO_STATUT[,i]
        DERN_DT_EFF <- TR_HISTO_STATUT[,i + NBR_LIGNES_MAX]
        if (i > 1) {
          AV_DERN_STATUT <- TR_HISTO_STATUT[, i - 1]
          AV_DERN_DT_EFF <- TR_HISTO_STATUT[, i - 1 + NBR_LIGNES_MAX]
          break
        }
        code <- 1
      }
    }
  }
  row$code <- code
  row$DERN_STATUT <- DERN_STATUT
  row$DERN_DT_EFF <- DERN_DT_EFF
  row$AV_DERN_STATUT <- AV_DERN_STATUT
  row$AV_DERN_DT_EFF <- AV_DERN_DT_EFF
  row
  }) %>%
  ungroup()
vs the equivalent SAS code:
data TR_HISTO_STATUT3;
  format DERN_DT_EFF AV_DERN_DT_EFF YYMMDD10.;
  retain CODE DERN_STATUT DERN_DT_EFF;
  set TR_HISTO_STATUT;
  array tvar_statut[&NBR_LIGNES_MAX] statut1-statut&NBR_LIGNES_MAX;
  array tvar_DT_EFF[&NBR_LIGNES_MAX] DT_EFF1-DT_EFF&NBR_LIGNES_MAX;
  code = 0;
  do i = 1 to &NBR_LIGNES_MAX;
    do j = i+1 to &NBR_LIGNES_MAX-1;
      if tvar_DT_EFF[i] NE . and tvar_DT_EFF[i] < &DT_DEB and tvar_DT_EFF[j] = . then do;
        DERN_STATUT = tvar_statut[i];
        DERN_DT_EFF = tvar_DT_EFF[i];
        if i > 1 then AV_DERN_STATUT = tvar_statut[i-1];
        if i > 1 then AV_DERN_DT_EFF = tvar_DT_EFF[i-1];
        code = 1;
      end;
    end;
  end;
run;
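One difference worth keeping in mind when porting the inner loop: a SAS "do j = i+1 to n-1" runs zero times when i+1 is greater than n-1, whereas R's seq(from, to) counts downward in that case, so the loop body still runs. The sketch below shows the difference. The helper seq_up is hypothetical, written here only to mimic the SAS behaviour.

```r
# Hypothetical helper: ascending sequence that is empty when from > to,
# mimicking how a SAS "do j = from to to;" loop is skipped
seq_up <- function(from, to) {
  if (from > to) integer(0) else seq.int(from, to)
}

NBR_LIGNES_MAX <- 5
i <- 4

down <- seq(i + 1, NBR_LIGNES_MAX - 1)      # 5 4: R counts down
none <- seq_up(i + 1, NBR_LIGNES_MAX - 1)   # empty: a loop over it never runs

visited <- integer(0)
for (j in none) visited <- c(visited, j)    # body is skipped, visited stays empty
```

With the downward sequence, j would take the values 5 and 4, so the date at position 4 would end up compared with itself.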
You are piping into for, that's where the extra argument comes from. In a pipe, the first argument defaults to what you are passing from the previous instruction on to the next one. Also, what is row? If you are counting on rowwise to create an object named row, well, it does not.
Can you post sample data in dput format? Please edit the question with the code you tried and with the output of dput(TR_HISTO_STATUT). Or, if the data set is too big, with the output of dput(head(TR_HISTO_STATUT, 20)). And the same with STATUT.
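To illustrate the point about the pipe: lhs %>% f(arg) is evaluated as f(lhs, arg). Internally, for is itself a function of three arguments (loop variable, sequence, body), so piping a data frame into it supplies a fourth, which matches the error message. A loop can still be used with a pipe if it is wrapped in an ordinary function. The sketch below assumes magrittr is available, and the function name count_before is hypothetical.

```r
library(magrittr)

# The left-hand side becomes the first argument of the next call
total <- c(1, 4, 9) %>% sqrt() %>% sum()   # same as sum(sqrt(c(1, 4, 9)))

# Hypothetical function: the loop lives inside it, so it is a valid pipe target
count_before <- function(dates, cutoff) {
  n <- 0L
  for (i in seq_along(dates)) {
    if (!is.na(dates[i]) && dates[i] < cutoff) n <- n + 1L
  }
  n
}

dates <- as.Date(c("2020-01-01", "2022-05-06", "2025-10-07", NA))
n_before <- dates %>% count_before(as.Date("2025-04-01"))   # count_before(dates, cutoff)
```

Here total is 6 and n_before is 2. The same pattern applies to row-by-row logic: put the loop in a function that takes the row's values as arguments and call that function, rather than placing for directly after %>%.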