I have many sequencing experiments, each with multiple results for each of a few hundred genes. When the data is output from another programme, it isn't in a useful format for me: every experiment and result is listed along the top, and there is one row for each gene. I have written an example data set below, along with how I am currently solving the problem, but I would like a more efficient method as my data sets are very large.
col1 <- c("", "", "gene1", "gene2", "gene3", "gene4")
col2 <- c("Experiment1", "Part 1", "a", "b", "c", "d")
col3 <- c("Experiment1", "Part 2", "e", "f", "g", "h")
col4 <- c("Experiment2", "Part 1", "i", "j", "k", "l")
col5 <- c("Experiment2", "Part 2", "m", "n", "o", "p")
pp <- data.frame(col1, col2, col3, col4, col5)  # row 1 = experiment, row 2 = part, rows 3-6 = genes
one <- data.frame(pp$col1, pp$col2)     # Experiment1, Part 1
onetwo <- data.frame(pp$col1, pp$col3)  # Experiment1, Part 2
two <- data.frame(pp$col1, pp$col4)     # Experiment2, Part 1
twotwo <- data.frame(pp$col1, pp$col5)  # Experiment2, Part 2
one$V3[3:6] <- as.character(one[2, 2])  # copy the part label next to each gene
one <- one[-2, ]                        # drop the part header row
one <- one[-1, ]                        # drop the experiment header row
colnames(one) <- c("gene", "Experiment 1", "part")
onetwo$V3[3:6] <- as.character(onetwo[2, 2])
onetwo <- onetwo[-2, ]
onetwo <- onetwo[-1, ]
colnames(onetwo) <- c("gene", "Experiment 1", "part")
x1 <- rbind(one, onetwo)                # all parts of Experiment 1
two$V3[3:6] <- as.character(two[2, 2])
two <- two[-2, ]
two <- two[-1, ]
colnames(two) <- c("gene", "Experiment 2", "part")
twotwo$V3[3:6] <- as.character(twotwo[2, 2])
twotwo <- twotwo[-2, ]
twotwo <- twotwo[-1, ]
colnames(twotwo) <- c("gene", "Experiment 2", "part")
x2 <- rbind(two, twotwo)                # all parts of Experiment 2
x3 <- merge(x1, x2)                     # joins on the shared columns "gene" and "part"
I apologise for the large amount of code, but I am unable to verbalise this operation specifically. pp is the example data frame and x3 is the format I require: one row per gene and part, with one column per experiment. Is there a better way to do this?
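To make the target shape concrete, here is a minimal base R sketch of one possible general approach. It is not a definitive solution. It assumes the first two rows always hold the experiment and part labels and the first column holds the gene names. The names experiment, part, genes, values, long and wide are introduced here for illustration, and the result has columns value.Experiment1 and value.Experiment2 rather than the "Experiment 1" and "Experiment 2" names used in x3.

```r
# Example input as above: two header rows (experiment, part), then one row per gene.
pp <- data.frame(
  col1 = c("", "", "gene1", "gene2", "gene3", "gene4"),
  col2 = c("Experiment1", "Part 1", "a", "b", "c", "d"),
  col3 = c("Experiment1", "Part 2", "e", "f", "g", "h"),
  col4 = c("Experiment2", "Part 1", "i", "j", "k", "l"),
  col5 = c("Experiment2", "Part 2", "m", "n", "o", "p"),
  stringsAsFactors = FALSE
)

# Assumption: rows 1 and 2 are the header rows, column 1 is the gene name.
experiment <- unlist(pp[1, -1], use.names = FALSE)
part       <- unlist(pp[2, -1], use.names = FALSE)
genes      <- pp[-(1:2), 1]
values     <- as.matrix(pp[-(1:2), -1])  # genes x result-columns character matrix

# Long format: one row per (gene, experiment, part).
# as.vector() unrolls the matrix column by column, so each header label
# is repeated once per gene to stay aligned with the values.
long <- data.frame(
  gene       = rep(genes, times = ncol(values)),
  experiment = rep(experiment, each = nrow(values)),
  part       = rep(part, each = nrow(values)),
  value      = as.vector(values),
  stringsAsFactors = FALSE
)

# Wide format: one row per (gene, part), one value column per experiment.
wide <- reshape(long,
                idvar     = c("gene", "part"),
                timevar   = "experiment",
                direction = "wide")
```

Because nothing here refers to a specific column by name, the same steps should apply unchanged however many experiments and parts there are. The intermediate long data frame may also be useful on its own for plotting or grouping.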