Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

ggplot2 - R - optimal way to truncate with dplyr

I'm using R and ggplot2 to visualise variable distributions. Most of the time, because of a few extreme values, I have to truncate the variable to get a readable plot. For instance:

library(tidyverse)

data.frame(x = c(runif(500, min = 0, max = 1), 1e3)) %>%
  ggplot() + geom_density(aes(x = x))


I use the base functions quantile() and ifelse() to truncate and get a better visualisation. But this doesn't feel optimal: quantile() is repeated, so it is calculated twice. Does someone know a better way (without saving the quantile in a previous step)?

data.frame(x = c(runif(500, min = 0, max = 1), 1e3)) %>%
  mutate_at(vars(x), list(~ ifelse(. > quantile(., .99), quantile(., .99), .))) %>% 
  ggplot() + geom_density(aes(x = x))


question from:https://stackoverflow.com/questions/66065975/r-optimal-way-to-truncate-with-dplyr


1 Answer

data.frame(x = c(runif(500, min = 0, max = 1), 1e3)) %>%
  mutate_at(vars(x), list(~ pmin(., quantile(., .99)))) %>% 
  ggplot() + geom_density(aes(x = x))

pmin() takes element-wise (parallel) minima, for example:

x <- sample(10)
x
#  [1] 10  9  6  4  5  3  2  1  7  8
pmin(x, 5)
#  [1] 5 5 5 4 5 3 2 1 5 5

And you only calculate the quantile once.
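
As a rough check that pmin() gives the same result as the ifelse() form in the question, here is a minimal base-R sketch. The helper name truncate_upper() and its default p = 0.99 are made up for illustration and are not part of dplyr or ggplot2; names = FALSE is used only so the cutoff is a plain unnamed number.

```r
# Base R only. truncate_upper() is a hypothetical helper, not a library function.
truncate_upper <- function(x, p = 0.99) {
  # Compute the cutoff once, then cap every element at it.
  cutoff <- quantile(x, p, names = FALSE, na.rm = TRUE)
  pmin(x, cutoff)
}

set.seed(1)
x <- c(runif(500), 1e3)

capped <- truncate_upper(x)

# The ifelse() form from the question, with the cutoff stored once for comparison.
cutoff <- quantile(x, 0.99, names = FALSE)
capped_ifelse <- ifelse(x > cutoff, cutoff, x)

# Both approaches should agree, and the largest value should now be the cutoff.
all.equal(capped, capped_ifelse)
max(capped) == cutoff
```

A helper like this could also be passed to across(), e.g. mutate(across(x, truncate_upper)), if the same truncation is needed in several places.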

FYI, mutate_at() has been superseded by across().

data.frame(x = c(runif(500, min = 0, max = 1), 1e3)) %>%
  mutate(across(x, ~ pmin(., quantile(., .99)))) %>% 
  ggplot() + geom_density(aes(x = x))

Note that wrapping the function in a list, as in list(~ pmin(., quantile(., .99))), is still supported by across(), but when a list is given the output columns are named differently. Compare:

set.seed(42)
x <- data.frame(x = c(runif(500, min = 0, max = 1), 1e3))
x %>%
  mutate(across(x, list(~ pmin(., quantile(., .99))))) %>%
  head(.)
#           x       x_1
# 1 0.9148060 0.9148060
# 2 0.9370754 0.9370754
# 3 0.2861395 0.2861395
# 4 0.8304476 0.8304476
# 5 0.6417455 0.6417455
# 6 0.5190959 0.5190959
x %>%
  mutate(across(x, ~ pmin(., quantile(., .99)))) %>%
  head(.)
#           x
# 1 0.9148060
# 2 0.9370754
# 3 0.2861395
# 4 0.8304476
# 5 0.6417455
# 6 0.5190959

(where the list method produces a new column named x_1, but ggplot2 is still looking at the untruncated x).
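
If the list form is wanted anyway, the output name can be controlled. The sketch below assumes a recent dplyr (1.0.0 or later, where across() exists); the data frame names df, named and overwritten and the list name capped are chosen here for illustration only.

```r
library(dplyr)

set.seed(42)
df <- data.frame(x = c(runif(500, min = 0, max = 1), 1e3))

# Named list: the new column takes its suffix from the list name ("x_capped").
named <- df %>%
  mutate(across(x, list(capped = ~ pmin(., quantile(., .99)))))

# .names = "{.col}" reuses the original column name, so x itself is overwritten.
overwritten <- df %>%
  mutate(across(x, list(~ pmin(., quantile(., .99))), .names = "{.col}"))

names(named)
names(overwritten)
```

With the second form, a following ggplot() call that maps x would see the truncated values.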

