I have several data frames that I want to merge. I'm looking for a scalable solution, and I found a nice one based on purrr::reduce(). So I do:
library(purrr)
library(dplyr)
df_a <- data.frame(id = 1:8)
df_b <- data.frame(id = 5:10)
df_c <- data.frame(id = 2:6)
df_d <- data.frame(id = 3:6)
dfs_to_merge <- list(df_a, df_b, df_c, df_d)
dfs_to_merge %>%
  reduce(left_join, by = "id")
#> id
#> 1 1
#> 2 2
#> 3 3
#> 4 4
#> 5 5
#> 6 6
#> 7 7
#> 8 8
Created on 2021-01-25 by the reprex package (v0.3.0)
But what if, for example, I wanted to make the inclusion of df_c in dfs_to_merge depend on the value of a variable my_condition_df_c?
Example — Not a scalable solution
If my_condition_df_c > 5, then include df_c in dfs_to_merge:
my_condition_df_c <- sample(1:10, 1)
if (my_condition_df_c > 5) {
  dfs_to_merge <- list(df_a, df_b, df_c, df_d)
} else {
  dfs_to_merge <- list(df_a, df_b, df_d)
}
dfs_to_merge %>%
  reduce(left_join, by = "id")
My Problem
Consider that I may have several dataframes to merge, and that each one of them may have its own condition that determines whether it should be passed forward for merging.
my_condition_df_a <- sample(1:100, 1) ## include df_a if my_condition_df_a > 65
my_condition_df_b <- sample(c("foo", "blah"), 1) ## include df_b if my_condition_df_b == "foo"
my_condition_df_d <- sample(c(NA, 1, 2, 3, NA, 19), 1) ## include df_d if my_condition_df_d is not NA
How could I elegantly control which data frames get in and which do not? Using if-else blocks as I did above does not scale, as the code quickly becomes messy and unreadable.
UPDATE — I made some progress
What I do now is build a character vector with the names of the data frames to be included in the list later. Each name is added to the vector only if the condition for that data frame holds.
dfs_to_merge_names <- c()
if (my_condition_df_a > 65) {
  dfs_to_merge_names <- c(dfs_to_merge_names, "df_a")
}
if (my_condition_df_b == "foo") {
  dfs_to_merge_names <- c(dfs_to_merge_names, "df_b")
}
if (my_condition_df_c > 5) {
  dfs_to_merge_names <- c(dfs_to_merge_names, "df_c")
}
if (!is.na(my_condition_df_d)) {
  dfs_to_merge_names <- c(dfs_to_merge_names, "df_d")
}
mget(dfs_to_merge_names) %>% ## https://stackoverflow.com/a/45963957/6105259
  reduce(left_join, by = "id")
I would still be happy to hear ideas on whether this code could be made shorter and more concise.
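One direction that might shorten this, as a rough sketch rather than a definitive answer: keep all candidate data frames in a named list, evaluate every condition into one logical vector in the same order, and subset the list with it. The names dfs_all and include are made up for illustration, and the condition values are fixed here instead of sampled so that the result is reproducible.

```r
library(purrr)
library(dplyr)

df_a <- data.frame(id = 1:8)
df_b <- data.frame(id = 5:10)
df_c <- data.frame(id = 2:6)
df_d <- data.frame(id = 3:6)

# Fixed values (instead of sample()) so the sketch is reproducible
my_condition_df_a <- 70
my_condition_df_b <- "blah"
my_condition_df_c <- 8
my_condition_df_d <- NA

# All candidates in one named list (hypothetical name: dfs_all)
dfs_all <- list(df_a = df_a, df_b = df_b, df_c = df_c, df_d = df_d)

# One logical flag per data frame, in the same order as dfs_all
# (hypothetical name: include)
include <- c(
  df_a = my_condition_df_a > 65,
  df_b = my_condition_df_b == "foo",
  df_c = my_condition_df_c > 5,
  df_d = !is.na(my_condition_df_d)
)

# Keep only the data frames whose condition is TRUE
dfs_to_merge <- dfs_all[include]

merged <- dfs_to_merge %>%
  reduce(left_join, by = "id")
```

With these values only df_a and df_c are kept. Adding another data frame then means one new entry in dfs_all and one in include, with no extra if block. This assumes every condition evaluates to TRUE or FALSE; a condition that evaluates to NA would need handling (for example with isTRUE()) before subsetting.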
question from:
https://stackoverflow.com/questions/65886231/how-to-decide-which-dataframes-will-be-bundled-into-a-list-based-on-dataframe-s