Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


dataframe - R - Speeding up combination between for loop and paste/paste0

I am working with a data frame 'df' that has millions of rows and four columns (Chromosome, Position, Allele1, Allele2). I want to concatenate the values in these columns into a single character vector 'cc'. This is my first attempt:

myfunc = function(CHR) {
    chr = subset(df, df$Chromosome == CHR)
    cc = data.frame(No=seq.int(nrow(chr)), pos_al1_al2=NA)
    for (i in seq_len(nrow(chr))) {
        cc$pos_al1_al2[i] = paste(CHR, chr$Position[i], ".", chr$Allele1[i], chr$Allele2[i])
    }
    cc = cc[, -1] # remove the column 'No' (once, after the loop)
    cc            # return the resulting character vector
}

# Run my code 
myfunc(7)

where CHR is the chromosome number I pass to the function (e.g., 1, 2, 3, ..., or 22). CHR must be between 1 and 22, matching the values in the Chromosome column of 'df'.

My idea is this: I first create a placeholder data frame called cc with the same number of rows as the subset of 'df' for the chosen chromosome.

Then I fill a new column of cc, called pos_al1_al2, so that each row holds the concatenated string shown in the function.

The computation is very slow. I guess the for loop is the cause, but I have no idea how to optimize my function.

Any help is appreciated! Thanks in advance.



1 Answer


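Is there any reason why you can't use paste() in vectorized mode? It accepts whole columns, so a single call builds the string for every row and the for loop is not needed: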

myfunc <- function(CHR) {
    chr <- subset(df, df$Chromosome == CHR)
    # paste() is vectorized, so one call builds the string for every row
    cc <- paste(CHR, chr$Position, ".", chr$Allele1, chr$Allele2)
    cc  # return the character vector
}
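
To see the vectorized call on something small, here is a minimal sketch. It assumes a toy 'df' with made-up values. The helper name make_labels and its 'data' and 'sep' arguments are hypothetical and not part of the original question. Passing the data frame as an argument avoids relying on a global 'df', and 'sep' shows how to control the spaces that paste() inserts by default.

```r
# Toy stand-in for the real 'df' (made-up values, for illustration only)
df <- data.frame(
    Chromosome = c(7, 7, 8),
    Position   = c(101, 202, 303),
    Allele1    = c("A", "C", "G"),
    Allele2    = c("T", "G", "A"),
    stringsAsFactors = FALSE
)

# Hypothetical variant of myfunc that takes the data frame as an argument
make_labels <- function(data, CHR, sep = " ") {
    chr <- subset(data, data$Chromosome == CHR)
    # One vectorized call; 'sep' is the string placed between the pieces
    paste(CHR, chr$Position, ".", chr$Allele1, chr$Allele2, sep = sep)
}

labels_spaced  <- make_labels(df, 7)            # default paste() spacing
labels_compact <- make_labels(df, 7, sep = "")  # same result as paste0()

print(labels_spaced)
print(labels_compact)
```

With sep = "" the pieces are joined without spaces, which is what paste0() does. Which form is wanted depends on how the identifiers should look downstream.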
