Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
262 views
in Technique[技术] by (71.8m points)

r - Mutate all columns matching a pattern each time based on the previous columns

How can I mutate all columns containing a pattern (mutate_at I guess) using each time the previous column using dplyr? --> Here for example all column continaing foo in their name should be mutated using the column just before (i.e., a for column fooa, b for foob and so on).


set.seed(13)
dfrows = 5
df = data.frame(a = rnorm(dfrows),
                fooa = runif(dfrows),
                b = rnorm(dfrows, mean=50, sd=5),
                foob = runif(dfrows, min=0, max=5),
                c = rnorm(dfrows, mean=100, sd=10),
                fooc = runif(dfrows, min=0, max=10))
df
#            a      fooa        b      foob         c     fooc
# 1  0.5543269 0.6611216 48.26791 3.0999527  98.06053 6.035485
# 2 -0.2802719 0.8783709 51.15647 0.1586242 113.96432 2.299504
# 3  1.7751634 0.8905590 52.34582 2.3070636 101.00663 9.668332
# 4  0.1873201 0.5662805 50.58978 1.6501046  98.85561 6.045547
# 5  1.1425261 0.5935473 50.35224 3.1676038 107.02225 6.396047

library(dplyr)
df %>% mutate(fooa = fooa/100 * a,
              foob = foob/100 * b,
              fooc = fooc/100 * c)
#            a         fooa        b       foob         c     fooc
# 1  0.5543269  0.003664775 48.26791 1.49628246  98.06053 5.918428
# 2 -0.2802719 -0.002461827 51.15647 0.08114656 113.96432 2.620614
# 3  1.7751634  0.015808878 52.34582 1.20765132 101.00663 9.765657
# 4  0.1873201  0.001060757 50.58978 0.83478430  98.85561 5.976363
# 5  1.1425261  0.006781434 50.35224 1.59495949 107.02225 6.845194

# Equivalently, in base R:
for (i in c(2, 4, 6)) {
  df[,i] = df[,i]/100 * df[, i-1]
}

So I am looking for something like this I guess:

# What should <PREVIOUS_COLUMN> be?
df %>% mutate_at(vars(contains('foo')), funs(./100 * <PREVIOUS_COLUMN>)) 

# OR, even better (more generic but in my case it will always be the previous column):
df %>% mutate_at(vars(contains('foo')), funs(./100 * <COLUMN_NAME_WITH_'foo'_PATTERN_REMOVED>)) 

EDIT: I should have mentioned that the original data.frame could contain more columns, possibly with another pattern than X then fooX, so that the ideal solution should be able to localize them properly (but I'll leave it as such as all answers provide nice solutions and features).

A better example would have been:

set.seed(13)
dfrows = 5
df = data.frame(a = rnorm(dfrows),
                fooa = runif(dfrows),
                b = rnorm(dfrows, mean=50, sd=5),
                foob = runif(dfrows, min=0, max=5),
                bla = 5,
                c = rnorm(dfrows, mean=100, sd=10),
                fooc = runif(dfrows, min=0, max=10),
                blo = 8)
df
#            a      fooa        b      foob bla         c     fooc blo
# 1  0.5543269 0.6611216 48.26791 3.0999527   5  98.06053 6.035485   8
# 2 -0.2802719 0.8783709 51.15647 0.1586242   5 113.96432 2.299504   8
# 3  1.7751634 0.8905590 52.34582 2.3070636   5 101.00663 9.668332   8
# 4  0.1873201 0.5662805 50.58978 1.6501046   5  98.85561 6.045547   8
# 5  1.1425261 0.5935473 50.35224 3.1676038   5 107.02225 6.396047   8
question from:https://stackoverflow.com/questions/65915885/mutate-all-columns-matching-a-pattern-each-time-based-on-the-previous-columns

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Here is another approach using across() and cur_column(). I personally would not recommend to make calculations based on the position of columns, and would instead recommend to work with the columns names, since this seems safer.

In the example below we loop over the columns a,b and c with across and access the values of each corresponding foo column by using get() and cur_column.

set.seed(13)
dfrows = 5
df = data.frame(a = rnorm(dfrows),
                fooa = runif(dfrows),
                b = rnorm(dfrows, mean=50, sd=5),
                foob = runif(dfrows, min=0, max=5),
                c = rnorm(dfrows, mean=100, sd=10),
                fooc = runif(dfrows, min=0, max=10))

library(dplyr)

df %>% 
  mutate(across(matches("^[a-z]$"),
                ~ get(paste0("foo", cur_column())) / 100 * .x,
                .names = "foo{col}"))
#>            a         fooa        b       foob         c     fooc
#> 1  0.5543269  0.003664775 48.26791 1.49628246  98.06053 5.918428
#> 2 -0.2802719 -0.002461827 51.15647 0.08114656 113.96432 2.620614
#> 3  1.7751634  0.015808878 52.34582 1.20765132 101.00663 9.765657
#> 4  0.1873201  0.001060757 50.58978 0.83478430  98.85561 5.976363
#> 5  1.1425261  0.006781434 50.35224 1.59495949 107.02225 6.845194

Created on 2021-01-27 by the reprex package (v0.3.0)


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...