I am looping over all the genes in my datasets and building one matrix per gene. Each matrix has 49 columns (one per tissue), and its number of rows depends on how many rsids that gene has. I want to combine them into one big matrix that stores the information for all genes. I am trying to use `rbind` in R, but it gives me only the last gene's matrix.
```r
for (k in 1:3) {  ## k = gene
  z <- character(0)
  for (i in 1:49) {  ## i = tissue
    y <- list.dfs[[i]]$gene %in% genes_union[k]  ## rows of tissue i that belong to gene k
    x <- list.dfs[[i]][y, ]$rsid                 ## SNPs for that gene in this tissue
    z <- unique(append(z, x))  ## keep unique SNPs only (a SNP can be shared among tissues)
  }
  models_df_gene1 <- matrix(NA, nrow = length(z), ncol = 49)
  colnames(models_df_gene1) <- col_names
  rownames(models_df_gene1) <- z
  for (t in 1:49) {
    u <- which(list.dfs[[t]]$gene %in% genes_union[k])
    a <- list.dfs[[t]][u, ]
    for (j in seq_len(nrow(a))) {       ## one iteration per SNP of this gene in tissue t
      b <- which(z %in% a$rsid[j])      ## row of that SNP in the gene matrix
      models_df_gene1[b, t] <- a$weight[j]
    }
  }
  models_df_gene1 <- replace(models_df_gene1, is.na(models_df_gene1), 0)  ## SNPs absent from a tissue get weight 0
  file_chr22 <- rbind(models_df_gene1)  ## meant to collect all genes in one matrix
  #write.table(models_df_gene1, file = paste0(genes_union[k], ".txt"), sep = "\t", row.names = TRUE, col.names = TRUE)
}
```
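For comparison, a minimal sketch of the per-gene step without the inner `j` loop: `match()` returns, for every rsid of tissue `t`, its row position in `z`, so a whole column can be filled in a single assignment, and starting from a zero matrix removes the `replace()` step. It assumes each element of `list.dfs` is one tissue's data frame with `gene`, `rsid` and `weight` columns, as in the `str()` output below; the function name `build_gene_matrix` and the two-tissue toy data are hypothetical.

```r
## Hypothetical helper: one row per unique rsid of `gene`, one column per tissue.
build_gene_matrix <- function(gene, list.dfs, col_names) {
  ## rows of each tissue that belong to this gene
  per_tissue <- lapply(list.dfs, function(df) df[df$gene %in% gene, , drop = FALSE])
  ## unique SNPs across all tissues (a SNP can be shared among tissues)
  z <- unique(unlist(lapply(per_tissue, `[[`, "rsid")))
  ## start from 0 so SNPs missing from a tissue keep weight 0
  m <- matrix(0, nrow = length(z), ncol = length(list.dfs),
              dimnames = list(z, col_names))
  for (t in seq_along(per_tissue)) {
    a <- per_tissue[[t]]
    ## match() gives the row of each rsid in z; fill the column in one step
    m[match(a$rsid, z), t] <- a$weight
  }
  m
}

## Hypothetical toy input: two tissues, two genes
list.dfs <- list(
  data.frame(gene   = c("ENSG00000261456.5", "ENSG00000261456.5", "ENSG00000151240.16"),
             rsid   = c("rs11252127", "rs11252546", "rs11591988"),
             weight = c(0.0523, -0.0335, 0.0143),
             stringsAsFactors = FALSE),
  data.frame(gene   = c("ENSG00000261456.5", "ENSG00000151240.16"),
             rsid   = c("rs11252546", "rs4495823"),
             weight = c(-0.0308, 0.013),
             stringsAsFactors = FALSE)
)
m <- build_gene_matrix("ENSG00000261456.5", list.dfs, c("tissue1", "tissue2"))
print(m)
```

With the real data, the call would pass `genes_union[k]`, the full `list.dfs` and the existing `col_names`.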
Input data:

`genes_union`:

```
str(genes_union)
 chr [1:21650] "ENSG00000261456.5" "ENSG00000151240.16" ...
```

`list.dfs[[1]]`:

```
str(list.dfs[[1]])
'data.frame': 249965 obs. of 6 variables:
 $ gene      : chr "ENSG00000261456.5" "ENSG00000261456.5" "ENSG00000261456.5" "ENSG00000261456.5" ...
 $ rsid      : chr "rs11252127" "rs11252546" "rs11591988" "rs4495823"
 $ varID     : chr "chr10_52147_C_T_b38" "chr10_58487_T_C_b38" "chr10_80130_C_T_b38" "chr10_97603_G_A_b38"
 $ ref_allele: chr "C" "T" "C" "G"
 $ eff_allele: chr "T" "C" "T" "A"
 $ weight    : num 0.0523 -0.0335 0.0143 -0.0308 0.013
```
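Because `str()` lists the first values of every column, the first four rows of `list.dfs[[1]]` can be rebuilt from the output above and turned into a reproducible sample. This is only a sketch: `str()` rounds numbers to 3 significant digits, so the real weights have more digits, and the name `sample_df` is hypothetical. `dput()` prints R code that recreates the object, which is what can be pasted into a question.

```r
## First four rows of list.dfs[[1]], rebuilt from the values shown by str()
## (weights are rounded to 3 significant digits there)
sample_df <- data.frame(
  gene       = rep("ENSG00000261456.5", 4),
  rsid       = c("rs11252127", "rs11252546", "rs11591988", "rs4495823"),
  varID      = c("chr10_52147_C_T_b38", "chr10_58487_T_C_b38",
                 "chr10_80130_C_T_b38", "chr10_97603_G_A_b38"),
  ref_allele = c("C", "T", "C", "G"),
  eff_allele = c("T", "C", "T", "A"),
  weight     = c(0.0523, -0.0335, 0.0143, -0.0308),
  stringsAsFactors = FALSE
)
str(sample_df)

## dput() prints code that recreates the object;
## on the real data this would be dput(list.dfs[[1]][1:5, ])
dput(sample_df)
```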
Does anyone know how to keep appending each gene's rows to the matrix, given that every gene's matrix has the same number of columns?
`file_chr22 <- rbind(models_df_gene1)`: this line just replaces the variable `file_chr22` on each iteration. If you are going to loop like this (a bad idea), you need to pass `rbind` both the current data and the new rows.

The `str()` output is informative but not useful for reproducing the problem. Of course you can't `dput()` your whole data set, but please `dput()` a small sample that illustrates the problem, for example `dput(genes_union[1:5])` for the first 5 entries of that character vector, and `dput(list.dfs[[1]][1:5, ])` for the first 5 rows of the first data frame. If you can't provide a small sample that we can work with, it will be very, very hard to help.
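A minimal sketch of both ways to accumulate the rows, using hypothetical toy matrices in place of `models_df_gene1`. Inside the loop, `rbind()` has to receive the matrix collected so far as well as the new rows, starting from `NULL` (zero-length arguments are ignored by `rbind`). The more idiomatic alternative keeps each gene's matrix in a list and binds once at the end with `do.call(rbind, ...)`, which avoids copying the growing matrix on every iteration. `make_gene_matrix()` is a hypothetical placeholder for the per-gene code in the question.

```r
## Hypothetical stand-in for the per-gene block in the question:
## returns one row per SNP and the same columns for every gene.
make_gene_matrix <- function(k, n_tissues = 3) {
  n_snps <- k + 1  # toy: gene k has k + 1 SNPs
  matrix(k, nrow = n_snps, ncol = n_tissues,
         dimnames = list(paste0("gene", k, "_snp", seq_len(n_snps)),
                         paste0("tissue", seq_len(n_tissues))))
}

## Fix 1: pass rbind() the data collected so far *and* the new rows.
file_chr22 <- NULL
for (k in 1:3) {
  models_df_gene1 <- make_gene_matrix(k)
  file_chr22 <- rbind(file_chr22, models_df_gene1)
}

## Fix 2 (preferred): store one matrix per gene, bind once at the end.
gene_mats <- vector("list", 3)
for (k in 1:3) {
  gene_mats[[k]] <- make_gene_matrix(k)
}
file_chr22_b <- do.call(rbind, gene_mats)

dim(file_chr22_b)  ## 9 rows (2 + 3 + 4 SNPs), 3 columns
```

Since the same rsid can belong to more than one gene, the combined row names may repeat; `rbind()` allows that for matrices, but prefixing the row names with the gene ID would keep the rows distinguishable.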