I'm having trouble with the performance of this code. I have to loop over 2225 columns and calculate a Betti number for each one, and I'm wondering if there is some way to speed this up. I'd like a list of Betti numbers at the end so I can write it to an Excel file.
library(nonlinearTseries)
library(TDAstats)
library(tidyverse)
library(dplyr)
library(readxl)
library(tm) #NLP
betti_num_list <- list()
for (i in 1:ncol(wordvec_df)) {
  # Drop the trailing NAs only (sides = "right"); the default trims both ends
  vec <- zoo::na.trim(wordvec_df[, i], sides = "right")
  text_vector_matrix <- data.matrix(vec)
  # Takens embedding: embedding dimension 2, time lag 3
  tak <- buildTakens(text_vector_matrix, embedding.dim = 2, time.lag = 3)
  hom <- calculate_homology(tak, return_df = TRUE)
  hom <- hom %>%
    mutate(persistence = death - birth,
           persistent  = ifelse(persistence > 0.1, 1, 0))
  # Count persistent 1-dimensional features (loops)
  betti_num <- sum(hom$persistent == 1 & hom$dimension == 1)
  betti_num_list <- append(betti_num_list, betti_num)
}
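Most of the runtime is presumably in calculate_homology, so one thing I've been considering is wrapping the per-column work in a function and running the columns in parallel. Here's a rough, untested sketch of what I had in mind, using the base parallel package (it assumes each column can be processed independently; mc.cores = 4 is just a guess, and mclapply won't fork on Windows):

library(parallel)

# One column in, one Betti number out (same steps as the loop above,
# with the dplyr pipeline replaced by a plain vectorized comparison)
betti_for_column <- function(col) {
  vec <- zoo::na.trim(col, sides = "right")
  tak <- buildTakens(data.matrix(vec), embedding.dim = 2, time.lag = 3)
  hom <- calculate_homology(tak, return_df = TRUE)
  sum(hom$death - hom$birth > 0.1 & hom$dimension == 1)
}

# A data frame is a list of columns, so mclapply iterates over columns
betti_nums <- unlist(mclapply(wordvec_df, betti_for_column, mc.cores = 4))

This also avoids growing betti_num_list with append() on every iteration, though I'd expect the homology calculation, not the list growth, to dominate.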
Here's some dummy data to try it with:
wordvec_df <- data.frame(
  V1 = c(1, 0, 0, 0, 0, 1, 2, NA),
  V2 = c(2, 1, 1, 0, 0, 1, NA, NA),
  V3 = c(1, 2, 1, 0, 0, NA, NA, NA)
)
There are varying numbers of NAs at the bottom of each column because the columns aren't all the same length.
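For the Excel step, I'm assuming something like the writexl package would work once I have the results (openxlsx would presumably do the job too):

library(writexl)

# Flatten the list to a one-column data frame and write it out
write_xlsx(data.frame(betti_number = unlist(betti_num_list)), "betti_numbers.xlsx")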