I am looping over all the genes in my datasets and building a matrix for each gene. Each matrix has 49 columns (one per tissue), and the number of rows depends on the number of rsids for that gene. I want to create one big matrix that stores the information for all genes. I am trying to use rbind in R, but the result contains only the last gene's values.

for (k in 1:3) {  ## for each gene
    z <- 0
    for (i in 1:49) {  # i = tissue
        y <- list.dfs[[i]]$gene %in% genes_union[k]  ## y stores the gene
        x <- list.dfs[[i]][y, ]$rsid  ## store the SNPs for the gene for a particular tissue
        ## keep only unique SNPs for that gene and remove duplicates
        ## (a SNP can be shared among tissues and would create duplicate values)
        z <<- unique(append(z, x))
    }
    z <- z[2:length(z)]
    models_df_gene1 <- matrix(, nrow = length(z), ncol = 49)
    colnames(models_df_gene1) <- col_names
    rownames(models_df_gene1) <- z
    for (t in 1:49) {
        u <- which(list.dfs[[t]]$gene %in% genes_union[k])
        a <- list.dfs[[t]][u, ]
        for (j in 1:length(z)) {
            b <- which(z[1:length(z)] %in% a$rsid[j])
            models_df_gene1[b, t] <- a$weight[j]
            models_df_gene1 <- replace(models_df_gene1, is.na(models_df_gene1), 0)
        }
    }
    file_chr22 <- rbind(models_df_gene1)
    #write.table(models_df_gene1,file=paste0(genes_union[k],".txt"),sep="\t",row.names=T,col.names=T)
}


Input data:
genes_union:
str(genes_union)
 chr [1:21650] "ENSG00000261456.5" "ENSG00000151240.16" ...
list.dfs[[1]]:
str(list.dfs[[1]])
'data.frame':   249965 obs. of  6 variables:
 $ gene      : chr  "ENSG00000261456.5" "ENSG00000261456.5" "ENSG00000261456.5" "ENSG00000261456.5" ...
 $ rsid      : chr  "rs11252127" "rs11252546" "rs11591988" "rs4495823" 
 $ varID     : chr  "chr10_52147_C_T_b38" "chr10_58487_T_C_b38" "chr10_80130_C_T_b38" "chr10_97603_G_A_b38" 
 $ ref_allele: chr  "C" "T" "C" "G" 
 $ eff_allele: chr  "T" "C" "T" "A" 
 $ weight    : num  0.0523 -0.0335 0.0143 -0.0308 0.013 

Does anyone know how to keep adding rows to the big matrix when the number of columns stays the same?

  • Adding rows one at a time in a loop is one of the worst things you can do for performance. If you share a little bit of sample input and show the desired output - just a few rows to demonstrate - we can probably help you find a better way. But it's quite hard to debug code that we can't run because we don't have sample input. Commented Mar 8, 2023 at 20:34
  • I added the input data. I cannot use dput since my data is really big. Commented Mar 8, 2023 at 20:47
  • file_chr22 <- rbind(models_df_gene1): this line just replaces the variable file_chr22 on every iteration. If you are going to loop like this (bad idea), you need to pass rbind both the current data and the new rows. Commented Mar 8, 2023 at 20:54
  • Is there any other method, instead of rbind in a loop, to add the values? Commented Mar 8, 2023 at 21:08
  • The str() output is informative but not useful. Of course you can't dput() your whole data set, please dput() a small sample that illustrates the problem, for example dput(genes_union[1:5]) for the first 5 entries of that string, and dput(list.dfs[[1]][1:5, ]) for the first 5 rows of the first data frame. If you can't provide us with a small sample of something that we can work with it will be very very hard to help. Commented Mar 9, 2023 at 13:51
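
To make the suggestions in the comments concrete, here is a minimal sketch of one way to avoid growing the matrix inside the loop: build one matrix per gene, keep the matrices in a list, and call rbind once at the end. The toy genes_union, col_names and list.dfs below are made-up stand-ins (2 tissues instead of 49), and build_gene_matrix is a hypothetical helper name, not something from the original post. It also assumes that an rsid appears at most once per gene within a tissue.

```r
# Toy stand-ins for the question's objects (hypothetical values, 2 tissues).
genes_union <- c("geneA", "geneB")
col_names   <- c("tissue1", "tissue2")
list.dfs <- list(
  data.frame(gene = c("geneA", "geneA", "geneB"),
             rsid = c("rs1", "rs2", "rs3"),
             weight = c(0.5, -0.2, 0.1),
             stringsAsFactors = FALSE),
  data.frame(gene = c("geneA", "geneB", "geneB"),
             rsid = c("rs2", "rs3", "rs4"),
             weight = c(0.3, 0.7, -0.4),
             stringsAsFactors = FALSE)
)

# Hypothetical helper: SNP-by-tissue weight matrix for a single gene.
build_gene_matrix <- function(gene, dfs, tissue_names) {
  # Rows of each tissue's data frame that belong to this gene.
  per_tissue <- lapply(dfs, function(d) d[d$gene %in% gene, , drop = FALSE])
  # Union of the SNPs seen for this gene in any tissue.
  snps <- unique(unlist(lapply(per_tissue, function(d) d$rsid)))
  # Start from zeros, so SNPs missing in a tissue keep weight 0.
  m <- matrix(0, nrow = length(snps), ncol = length(dfs),
              dimnames = list(snps, tissue_names))
  for (t in seq_along(per_tissue)) {
    d <- per_tissue[[t]]
    # match() gives the row of each SNP, so no inner loop over SNPs is needed.
    m[match(d$rsid, snps), t] <- d$weight
  }
  m
}

# One matrix per gene, collected in a list, then bound together once.
per_gene <- lapply(genes_union, build_gene_matrix,
                   dfs = list.dfs, tissue_names = col_names)
file_chr22 <- do.call(rbind, per_gene)
print(file_chr22)
```

If you would rather keep the original loop, the smallest change is probably to set file_chr22 <- NULL before the loop and use file_chr22 <- rbind(file_chr22, models_df_gene1) inside it, since rbind ignores a NULL argument. That still copies the growing matrix on every iteration, which is why the list plus do.call(rbind, ...) version tends to scale better with thousands of genes. Note that the row names carry only the rsid, so a SNP assigned to more than one gene will appear as a repeated row name.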
