I'm having trouble with the performance of this code. I have to loop over 2225 columns and calculate a Betti number for each one, and I'm wondering if there is some way to speed this up. I'd like a list of Betti numbers at the end so I can write it to an Excel file.
library(nonlinearTseries)
library(TDAstats)
library(tidyverse)
library(dplyr)
library(readxl)
library(tm) #NLP
betti_num_list <- list()
for (i in 1:ncol(wordvec_df)) {
  # Drop the trailing NAs only (sides = "right"); the default trims both ends
  vec <- zoo::na.trim(wordvec_df[, i], sides = "right")
  text_vector_matrix <- data.matrix(vec)
  # Takens embedding: embedding dimension 2, time lag 3
  tak <- buildTakens(text_vector_matrix, embedding.dim = 2, time.lag = 3)
  hom <- calculate_homology(tak, return_df = TRUE)
  hom <- hom %>%
    mutate(persistence = death - birth,
           persistent  = ifelse(persistence > 0.1, 1, 0))
  # Count persistent 1-dimensional features (loops)
  betti_num <- sum(hom$persistent == 1 & hom$dimension == 1)
  betti_num_list <- append(betti_num_list, betti_num)
}
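Most of the runtime is presumably in calculate_homology, so one thing I've been considering is wrapping the per-column work in a function and running the columns in parallel. Here's a rough, untested sketch of what I had in mind, using the base parallel package (it assumes each column can be processed independently; mc.cores = 4 is just a guess, and mclapply won't fork on Windows):

library(parallel)

# One column in, one Betti number out (same steps as the loop above,
# with the dplyr pipeline replaced by a plain vectorized comparison)
betti_for_column <- function(col) {
  vec <- zoo::na.trim(col, sides = "right")
  tak <- buildTakens(data.matrix(vec), embedding.dim = 2, time.lag = 3)
  hom <- calculate_homology(tak, return_df = TRUE)
  sum(hom$death - hom$birth > 0.1 & hom$dimension == 1)
}

# A data frame is a list of columns, so mclapply iterates over columns
betti_nums <- unlist(mclapply(wordvec_df, betti_for_column, mc.cores = 4))

This also avoids growing betti_num_list with append() on every iteration, though I'd expect the homology calculation, not the list growth, to dominate.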
Here's some dummy data to try it with:
wordvec_df <- data.frame(
  V1 = c(1, 0, 0, 0, 0, 1, 2, NA),
  V2 = c(2, 1, 1, 0, 0, 1, NA, NA),
  V3 = c(1, 2, 1, 0, 0, NA, NA, NA)
)
There are varying numbers of NAs at the bottom of each column because the columns aren't all the same length.
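For the Excel step, I'm assuming something like the writexl package would work once I have the results (openxlsx would presumably do the job too):

library(writexl)

# Flatten the list to a one-column data frame and write it out
write_xlsx(data.frame(betti_number = unlist(betti_num_list)), "betti_numbers.xlsx")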