How can I mutate all columns containing a pattern (mutate_at
I guess) using each time the previous column using dplyr
?
--> Here for example all column continaing foo
in their name should be mutated using the column just before (i.e., a
for column fooa
, b
for foob
and so on).
set.seed(13)
dfrows = 5
df = data.frame(a = rnorm(dfrows),
fooa = runif(dfrows),
b = rnorm(dfrows, mean=50, sd=5),
foob = runif(dfrows, min=0, max=5),
c = rnorm(dfrows, mean=100, sd=10),
fooc = runif(dfrows, min=0, max=10))
df
# a fooa b foob c fooc
# 1 0.5543269 0.6611216 48.26791 3.0999527 98.06053 6.035485
# 2 -0.2802719 0.8783709 51.15647 0.1586242 113.96432 2.299504
# 3 1.7751634 0.8905590 52.34582 2.3070636 101.00663 9.668332
# 4 0.1873201 0.5662805 50.58978 1.6501046 98.85561 6.045547
# 5 1.1425261 0.5935473 50.35224 3.1676038 107.02225 6.396047
library(dplyr)
df %>% mutate(fooa = fooa/100 * a,
foob = foob/100 * b,
fooc = fooc/100 * c)
# a fooa b foob c fooc
# 1 0.5543269 0.003664775 48.26791 1.49628246 98.06053 5.918428
# 2 -0.2802719 -0.002461827 51.15647 0.08114656 113.96432 2.620614
# 3 1.7751634 0.015808878 52.34582 1.20765132 101.00663 9.765657
# 4 0.1873201 0.001060757 50.58978 0.83478430 98.85561 5.976363
# 5 1.1425261 0.006781434 50.35224 1.59495949 107.02225 6.845194
# Equivalently, in base R:
for (i in c(2, 4, 6)) {
df[,i] = df[,i]/100 * df[, i-1]
}
So I am looking for something like this I guess:
# What should <PREVIOUS_COLUMN> be?
df %>% mutate_at(vars(contains('foo')), funs(./100 * <PREVIOUS_COLUMN>))
# OR, even better (more generic but in my case it will always be the previous column):
df %>% mutate_at(vars(contains('foo')), funs(./100 * <COLUMN_NAME_WITH_'foo'_PATTERN_REMOVED>))
EDIT: I should have mentioned that the original data.frame
could contain more columns, possibly with another pattern than X then fooX
, so that the ideal solution should be able to localize them properly (but I'll leave it as such as all answers provide nice solutions and features).
A better example would have been:
set.seed(13)
dfrows = 5
df = data.frame(a = rnorm(dfrows),
fooa = runif(dfrows),
b = rnorm(dfrows, mean=50, sd=5),
foob = runif(dfrows, min=0, max=5),
bla = 5,
c = rnorm(dfrows, mean=100, sd=10),
fooc = runif(dfrows, min=0, max=10),
blo = 8)
df
# a fooa b foob bla c fooc blo
# 1 0.5543269 0.6611216 48.26791 3.0999527 5 98.06053 6.035485 8
# 2 -0.2802719 0.8783709 51.15647 0.1586242 5 113.96432 2.299504 8
# 3 1.7751634 0.8905590 52.34582 2.3070636 5 101.00663 9.668332 8
# 4 0.1873201 0.5662805 50.58978 1.6501046 5 98.85561 6.045547 8
# 5 1.1425261 0.5935473 50.35224 3.1676038 5 107.02225 6.396047 8
question from:
https://stackoverflow.com/questions/65915885/mutate-all-columns-matching-a-pattern-each-time-based-on-the-previous-columns