r - Mutate all columns matching a pattern each time based on the previous columns

Question

Welcome To Ask or Share your Answers For Others

r - Mutate all columns matching a pattern each time based on the previous columns

asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)

r - Mutate all columns matching a pattern each time based on the previous columns

How can I mutate all columns containing a pattern (mutate_at I guess) using each time the previous column using dplyr? --> Here for example all column continaing foo in their name should be mutated using the column just before (i.e., a for column fooa, b for foob and so on).


set.seed(13)
dfrows = 5
df = data.frame(a = rnorm(dfrows),
                fooa = runif(dfrows),
                b = rnorm(dfrows, mean=50, sd=5),
                foob = runif(dfrows, min=0, max=5),
                c = rnorm(dfrows, mean=100, sd=10),
                fooc = runif(dfrows, min=0, max=10))
df
#            a      fooa        b      foob         c     fooc
# 1  0.5543269 0.6611216 48.26791 3.0999527  98.06053 6.035485
# 2 -0.2802719 0.8783709 51.15647 0.1586242 113.96432 2.299504
# 3  1.7751634 0.8905590 52.34582 2.3070636 101.00663 9.668332
# 4  0.1873201 0.5662805 50.58978 1.6501046  98.85561 6.045547
# 5  1.1425261 0.5935473 50.35224 3.1676038 107.02225 6.396047

library(dplyr)
df %>% mutate(fooa = fooa/100 * a,
              foob = foob/100 * b,
              fooc = fooc/100 * c)
#            a         fooa        b       foob         c     fooc
# 1  0.5543269  0.003664775 48.26791 1.49628246  98.06053 5.918428
# 2 -0.2802719 -0.002461827 51.15647 0.08114656 113.96432 2.620614
# 3  1.7751634  0.015808878 52.34582 1.20765132 101.00663 9.765657
# 4  0.1873201  0.001060757 50.58978 0.83478430  98.85561 5.976363
# 5  1.1425261  0.006781434 50.35224 1.59495949 107.02225 6.845194

# Equivalently, in base R:
for (i in c(2, 4, 6)) {
  df[,i] = df[,i]/100 * df[, i-1]
}

So I am looking for something like this I guess:

# What should <PREVIOUS_COLUMN> be?
df %>% mutate_at(vars(contains('foo')), funs(./100 * <PREVIOUS_COLUMN>)) 

# OR, even better (more generic but in my case it will always be the previous column):
df %>% mutate_at(vars(contains('foo')), funs(./100 * <COLUMN_NAME_WITH_'foo'_PATTERN_REMOVED>))

EDIT: I should have mentioned that the original data.frame could contain more columns, possibly with another pattern than X then fooX, so that the ideal solution should be able to localize them properly (but I'll leave it as such as all answers provide nice solutions and features).

A better example would have been:

set.seed(13)
dfrows = 5
df = data.frame(a = rnorm(dfrows),
                fooa = runif(dfrows),
                b = rnorm(dfrows, mean=50, sd=5),
                foob = runif(dfrows, min=0, max=5),
                bla = 5,
                c = rnorm(dfrows, mean=100, sd=10),
                fooc = runif(dfrows, min=0, max=10),
                blo = 8)
df
#            a      fooa        b      foob bla         c     fooc blo
# 1  0.5543269 0.6611216 48.26791 3.0999527   5  98.06053 6.035485   8
# 2 -0.2802719 0.8783709 51.15647 0.1586242   5 113.96432 2.299504   8
# 3  1.7751634 0.8905590 52.34582 2.3070636   5 101.00663 9.668332   8
# 4  0.1873201 0.5662805 50.58978 1.6501046   5  98.85561 6.045547   8
# 5  1.1425261 0.5935473 50.35224 3.1676038   5 107.02225 6.396047   8

question from:https://stackoverflow.com/questions/65915885/mutate-all-columns-matching-a-pattern-each-time-based-on-the-previous-columns

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-06T19:09:35+0000

Here is another approach using across() and cur_column(). I personally would not recommend to make calculations based on the position of columns, and would instead recommend to work with the columns names, since this seems safer.

In the example below we loop over the columns a,b and c with across and access the values of each corresponding foo column by using get() and cur_column.

set.seed(13)
dfrows = 5
df = data.frame(a = rnorm(dfrows),
                fooa = runif(dfrows),
                b = rnorm(dfrows, mean=50, sd=5),
                foob = runif(dfrows, min=0, max=5),
                c = rnorm(dfrows, mean=100, sd=10),
                fooc = runif(dfrows, min=0, max=10))

library(dplyr)

df %>% 
  mutate(across(matches("^[a-z]$"),
                ~ get(paste0("foo", cur_column())) / 100 * .x,
                .names = "foo{col}"))
#>            a         fooa        b       foob         c     fooc
#> 1  0.5543269  0.003664775 48.26791 1.49628246  98.06053 5.918428
#> 2 -0.2802719 -0.002461827 51.15647 0.08114656 113.96432 2.620614
#> 3  1.7751634  0.015808878 52.34582 1.20765132 101.00663 9.765657
#> 4  0.1873201  0.001060757 50.58978 0.83478430  98.85561 5.976363
#> 5  1.1425261  0.006781434 50.35224 1.59495949 107.02225 6.845194

^{Created on 2021-01-27 by the reprex package (v0.3.0)}

Categories

r - Mutate all columns matching a pattern each time based on the previous columns

r - Mutate all columns matching a pattern each time based on the previous columns

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags