Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

dplyr - R function with multiple input data frames and colnames as function arguments for updating a data frame

I would like to write a function that updates one data frame based on another. When an id is present in updated_data, I want to update the product column in created_data; when it is not, I want to keep the existing product value from created_data. This is just a made-up example: in reality I need to update multiple columns, not only product, which is why I pass the column name as an argument to my function.

However, with this function-based approach I am struggling to access the columns.


library(dplyr)

# some fictional data
created_data <- data.frame(id = c("ab01", "ab02", "ab03", "ab04", "ab05", "ab06", "ab07",
                                  "ab08", "ab09", "ab10", "ab11", "ab12", "ab13", "ab14"),
                           rank = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14),
                           colour = c("blue", "blue", "red", "purple", "yellow", "black",
                                      "green", "magenta", "black", "orange", "white",
                                      "orange", "lightblue", "magenta"),
                           product = c("shoes", "socks", "socks", "shirt", "jacket",
                                       "shoes", "socks", "socks", "shirt", "jacket",
                                       "shoes", "socks", "socks", "shirt"),
                           candy = c("mars", "twix", "kitkat", "bounty", "mars",
                                     "cookie", "cookie", "mars", "twix", "bounty",
                                     "twix", "twix", "twix", "twix"))

# some update data
updated_data <- data.frame(id = c("ab03", "ab07", "ab08"),
                           product = c("shirt", "trousers", "trousers"))


# one possible solution to solve the task without using a function
created_data$id <- as.character(created_data$id)
updated_data$id <- as.character(updated_data$id)

updated_data1 <- updated_data %>% 
  rename(product_new = product)

results_without_function <- created_data %>% 
  left_join(updated_data1, by = "id") %>% 
  mutate(product = ifelse(is.na(product_new), product, product_new)) %>% 
  select(-product_new)

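# one trial for my function
# (does not work: inside mutate(), column_to_update is taken as a literal column
#  name, and upd_df$column_to_update is NULL because upd_df has no such column)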
update_fun <- function(orig_df, upd_df, column_to_update){
  
  orig_df1 <- orig_df %>% 
    mutate(column_to_update = ifelse(id %in% upd_df$id, upd_df$column_to_update, column_to_update))
  
  return(orig_df1)
}

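# another trial
# (also fails: on the right-hand side, !! column_to_update injects the string
#  "product" rather than the column, and upd_df$column_to_update is still NULL)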
update_fun <- function(orig_df, upd_df, column_to_update){
  
  orig_df1 <- orig_df %>% 
    mutate(!! column_to_update := ifelse(id %in% upd_df$id, upd_df$column_to_update, !! column_to_update))
  
  return(orig_df1)
}

# results
result <- update_fun(orig_df = created_data, upd_df = updated_data, column_to_update = "product")
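Both trials trip over the same thing: upd_df$column_to_update looks for a column literally named column_to_update, which does not exist, so it returns NULL. A minimal base R illustration, using a hypothetical two-row data frame, of how [[ looks up a column by the name stored in a variable instead:

```r
# Hypothetical toy data, only to show column lookup by a name held in a variable
df  <- data.frame(id = c("a", "b"), product = c("shoes", "socks"),
                  stringsAsFactors = FALSE)
col <- "product"

df$col     # NULL: `$` looks for a column literally named "col"
df[[col]]  # "shoes" "socks": `[[` uses the value stored in col
```

Inside dplyr verbs the corresponding tool is the .data pronoun, e.g. .data[[col]].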

EDIT:

Sorry for not being explicit enough. I know how to solve this problem without a function; the code above has been adapted to show that. My question is how to turn this solution into a function where created_data and updated_data, as well as the id and product columns, are passed as input parameters.

Question from: https://stackoverflow.com/questions/65942015/r-function-with-multiple-input-data-frames-and-colnames-as-function-arguments-fo

He who fights dragons for too long becomes a dragon himself; gaze into the abyss for too long, and the abyss gazes back…

1 Answer

This sounds like a merge operation.

Base R

merged <- merge(created_data, updated_data, by = "id", all.x = TRUE)
merged
#      id rank    colour product.x  candy product.y
# 1  ab01    1      blue     shoes   mars      <NA>
# 2  ab02    2      blue     socks   twix      <NA>
# 3  ab03    3       red     socks kitkat     shirt
# 4  ab04    4    purple     shirt bounty      <NA>
# 5  ab05    5    yellow    jacket   mars      <NA>
# 6  ab06    6     black     shoes cookie      <NA>
# 7  ab07    7     green     socks cookie  trousers
# 8  ab08    8   magenta     socks   mars  trousers
# 9  ab09    9     black     shirt   twix      <NA>
# 10 ab10   10    orange    jacket bounty      <NA>
# 11 ab11   11     white     shoes   twix      <NA>
# 12 ab12   12    orange     socks   twix      <NA>
# 13 ab13   13 lightblue     socks   twix      <NA>
# 14 ab14   14   magenta     shirt   twix      <NA>

At this point, product.x holds the original product values from created_data and product.y the new ones from updated_data; where an id has no match, product.y is NA. We can use this to choose which of the two values ends up in product. (Because both data frames have a product column, merge renamed them to product.x and product.y, so product itself has to be recreated.)
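As a variant, merge()'s suffixes argument can keep the original name for the clashing column from x, so only the incoming column needs a new name. A sketch on a trimmed copy of the question's data (only id and product kept); the name merged2 and the ".new" suffix are arbitrary choices:

```r
# Trimmed copy of the question's data (only id and product are kept)
created_data <- data.frame(
  id      = sprintf("ab%02d", 1:14),
  product = rep(c("shoes", "socks", "socks", "shirt", "jacket"), length.out = 14),
  stringsAsFactors = FALSE
)
updated_data <- data.frame(id = c("ab03", "ab07", "ab08"),
                           product = c("shirt", "trousers", "trousers"),
                           stringsAsFactors = FALSE)

# suffixes = c("", ".new"): the column from x keeps the name "product",
# the one from y becomes "product.new"
merged2 <- merge(created_data, updated_data, by = "id", all.x = TRUE,
                 suffixes = c("", ".new"))
merged2$product <- ifelse(is.na(merged2$product.new),
                          merged2$product, merged2$product.new)
merged2$product.new <- NULL
```

Keep in mind that merge() sorts the result by the key columns by default (sort = TRUE); here the ids are already in order, so nothing moves.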

merged <- within(merged, { product = ifelse(is.na(product.y), product.x, product.y); })
merged
#      id rank    colour product.x  candy product.y  product
# 1  ab01    1      blue     shoes   mars      <NA>    shoes
# 2  ab02    2      blue     socks   twix      <NA>    socks
# 3  ab03    3       red     socks kitkat     shirt    shirt
# 4  ab04    4    purple     shirt bounty      <NA>    shirt
# 5  ab05    5    yellow    jacket   mars      <NA>   jacket
# 6  ab06    6     black     shoes cookie      <NA>    shoes
# 7  ab07    7     green     socks cookie  trousers trousers
# 8  ab08    8   magenta     socks   mars  trousers trousers
# 9  ab09    9     black     shirt   twix      <NA>    shirt
# 10 ab10   10    orange    jacket bounty      <NA>   jacket
# 11 ab11   11     white     shoes   twix      <NA>    shoes
# 12 ab12   12    orange     socks   twix      <NA>    socks
# 13 ab13   13 lightblue     socks   twix      <NA>    socks
# 14 ab14   14   magenta     shirt   twix      <NA>    shirt

Then drop the two helper columns:

merged$product.x <- merged$product.y <- NULL
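Since the question asks for a function taking the data frames and column names as arguments, here is one possible base R sketch. update_columns() is a hypothetical helper, and it swaps merge() for a match() index lookup, which leaves the row and column order of orig_df untouched:

```r
# update_columns() is a hypothetical helper, not part of base R or any package.
# cols and by are column names given as strings.
update_columns <- function(orig_df, upd_df, cols, by = "id") {
  stopifnot(all(cols %in% names(orig_df)), all(c(by, cols) %in% names(upd_df)))
  idx <- match(orig_df[[by]], upd_df[[by]])  # matching upd_df row (first match), or NA
  hit <- !is.na(idx)                         # rows of orig_df that have an update
  for (col in cols) {
    orig_df[[col]][hit] <- upd_df[[col]][idx[hit]]
  }
  orig_df
}

# Trimmed copy of the question's data (only id and product are kept)
created_data <- data.frame(
  id      = sprintf("ab%02d", 1:14),
  product = rep(c("shoes", "socks", "socks", "shirt", "jacket"), length.out = 14),
  stringsAsFactors = FALSE
)
updated_data <- data.frame(id = c("ab03", "ab07", "ab08"),
                           product = c("shirt", "trousers", "trousers"),
                           stringsAsFactors = FALSE)

result <- update_columns(created_data, updated_data, cols = "product")
```

For several columns, pass a vector such as cols = c("product", "candy"), provided updated_data contains those columns. If updated_data can contain duplicate ids, only the first match is used.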

dplyr

library(dplyr)
created_data %>%
  left_join(updated_data, by = "id") %>%
  mutate(product = coalesce(product.y, product.x)) %>%
  select(-product.x, -product.y)
#      id rank    colour  candy  product
# 1  ab01    1      blue   mars    shoes
# 2  ab02    2      blue   twix    socks
# 3  ab03    3       red kitkat    shirt
# 4  ab04    4    purple bounty    shirt
# 5  ab05    5    yellow   mars   jacket
# 6  ab06    6     black cookie    shoes
# 7  ab07    7     green cookie trousers
# 8  ab08    8   magenta   mars trousers
# 9  ab09    9     black   twix    shirt
# 10 ab10   10    orange bounty   jacket
# 11 ab11   11     white   twix    shoes
# 12 ab12   12    orange   twix    socks
# 13 ab13   13 lightblue   twix    socks
# 14 ab14   14   magenta   twix    shirt
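The same pipeline can be wrapped in a function with string arguments using dplyr's tidy-evaluation tools: !!col := names the output column from a string, and .data[[...]] reads a column whose name is stored in a variable. A sketch, where update_columns_dplyr() and the "_new" suffix are hypothetical choices:

```r
library(dplyr)

# update_columns_dplyr() is a hypothetical helper, not a dplyr function.
# cols and by are character vectors of column names.
update_columns_dplyr <- function(orig_df, upd_df, cols, by = "id") {
  upd_df <- upd_df %>%
    select(all_of(c(by, cols))) %>%
    rename_with(~ paste0(.x, "_new"), all_of(cols))  # product -> product_new
  out <- left_join(orig_df, upd_df, by = by)
  for (col in cols) {
    new <- paste0(col, "_new")
    out <- out %>%
      mutate(!!col := coalesce(.data[[new]], .data[[col]])) %>%  # new value if present
      select(-all_of(new))
  }
  out
}

# Trimmed copy of the question's data (only id and product are kept)
created_data <- data.frame(
  id      = sprintf("ab%02d", 1:14),
  product = rep(c("shoes", "socks", "socks", "shirt", "jacket"), length.out = 14)
)
updated_data <- data.frame(id = c("ab03", "ab07", "ab08"),
                           product = c("shirt", "trousers", "trousers"))

result <- update_columns_dplyr(created_data, updated_data, cols = "product")
```

With dplyr 1.0.0 or later, rows_update(created_data, updated_data, by = "id") may also be worth a look for this exact task; it expects every column of updated_data to exist in created_data and every id in updated_data to be present in created_data.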

data.table

(I don't recommend shifting to data.table solely for this implementation: while its performance and speed are impressive, it also has a bit of a learning curve. Certainly try it, but the base R above might be best if you're just starting with merges.)

library(data.table)
setDT(created_data)
setDT(updated_data)

updated_data[created_data, on = .(id) ][
  , product := fcoalesce(product, i.product) ][
  , i.product := NULL][]
#         id  product  rank    colour  candy
#     <char>   <char> <num>    <char> <char>
#  1:   ab01    shoes     1      blue   mars
#  2:   ab02    socks     2      blue   twix
#  3:   ab03    shirt     3       red kitkat
#  4:   ab04    shirt     4    purple bounty
#  5:   ab05   jacket     5    yellow   mars
#  6:   ab06    shoes     6     black cookie
#  7:   ab07 trousers     7     green cookie
#  8:   ab08 trousers     8   magenta   mars
#  9:   ab09    shirt     9     black   twix
# 10:   ab10   jacket    10    orange bounty
# 11:   ab11    shoes    11     white   twix
# 12:   ab12    socks    12    orange   twix
# 13:   ab13    socks    13 lightblue   twix
# 14:   ab14    shirt    14   magenta   twix
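In data.table the function version can be an update join, where the incoming values are reached through the i. prefix and mget() collects them for a vector of column names. A sketch, with update_columns_dt() as a hypothetical helper:

```r
library(data.table)

# update_columns_dt() is a hypothetical helper, not part of data.table.
# Rows of dt whose key matches a row of upd get upd's values for cols,
# assigned by reference; unmatched rows are left untouched.
update_columns_dt <- function(dt, upd, cols, by = "id") {
  dt[upd, on = by, (cols) := mget(paste0("i.", cols))]
  invisible(dt)
}

# Trimmed copy of the question's data (only id and product are kept)
created_data <- data.table(
  id      = sprintf("ab%02d", 1:14),
  product = rep(c("shoes", "socks", "socks", "shirt", "jacket"), length.out = 14)
)
updated_data <- data.table(id = c("ab03", "ab07", "ab08"),
                           product = c("shirt", "trousers", "trousers"))

update_columns_dt(created_data, updated_data, cols = "product")
created_data[]
```

Because := works by reference, created_data itself is modified; wrap it in copy() first if the original must be kept. Unlike the join-then-coalesce chain above, this also keeps the column order of created_data.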

...