Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - Filter by ranges supplied by two vectors, without a join operation

I want to do exactly what is described in "Take dates from one dataframe and filter data in another dataframe - R", except without joining: I am afraid that the joined data, before the filter is applied, will be too big to fit in memory.

Here is sample data:

tmp_df <- data.frame(a = 1:10)

I wish to do an operation that looks like this:

library(dplyr)

lower_bound <- c(2, 4)
upper_bound <- c(2, 5)
tmp_df %>%
    filter(a >= lower_bound & a <= upper_bound) # does not work: the bounds are recycled element-wise against a, not tested as intervals
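To see why the comparison above misses rows, it may help to evaluate the same condition in base R. This is only an illustrative sketch using the sample values from the question; the name `recycled` is made up for the example. R recycles the two-element bound vectors along `a`, so each value is compared against a single bound pair chosen by its position rather than against every interval.

```r
a <- 1:10
lower_bound <- c(2, 4)
upper_bound <- c(2, 5)

# The bounds are recycled to 2, 4, 2, 4, ... and 2, 5, 2, 5, ...,
# so a[1] is tested against [2, 2], a[2] against [4, 5], a[3] against [2, 2], etc.
recycled <- a >= lower_bound & a <= upper_bound

# Only position 4 passes: a[4] = 4 happens to line up with the interval [4, 5].
print(which(recycled))
```

The values 2 and 5 are dropped because they are paired with the wrong interval, which is why a per-interval test is needed.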

and my desired result is:

> tmp_df[(tmp_df$a >= 2 & tmp_df$a <= 2) | (tmp_df$a >= 4 & tmp_df$a <= 5), , drop = FALSE]
# one way to get the indices for subsetting the data frame; impractical for long bound vectors
  a
2 2
4 4
5 5

The memory problem with the linked join solution arises when tmp_df has many more rows and the lower_bound and upper_bound vectors have many more entries. A dplyr solution, or one that can be used as part of a pipe, is preferred.


1 Answer


Maybe you could borrow the inrange function from data.table, which "checks whether each value in x is in between any of the intervals provided in lower,upper."

Usage:

inrange(x, lower, upper, incbounds=TRUE)

library(dplyr); library(data.table)

tmp_df %>% filter(inrange(a, c(2,4), c(2,5)))
#  a
#1 2
#2 4
#3 5
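If loading data.table is not an option, the same interval test can be sketched in base R. This is an assumption-laden alternative rather than the answer's method: `in_any_range` is a hypothetical helper name, bounds are treated as inclusive, and the loop holds only one logical vector the length of `x`, so no joined table is built. It does one pass per interval, so it is likely slower than inrange when there are many intervals.

```r
# Hypothetical helper: TRUE where x falls inside any [lower[i], upper[i]] interval.
in_any_range <- function(x, lower, upper) {
  stopifnot(length(lower) == length(upper))
  keep <- logical(length(x))            # starts as all FALSE
  for (i in seq_along(lower)) {
    # OR in the matches for interval i; memory stays proportional to length(x)
    keep <- keep | (x >= lower[i] & x <= upper[i])
  }
  keep
}

tmp_df <- data.frame(a = 1:10)
result <- tmp_df[in_any_range(tmp_df$a, c(2, 4), c(2, 5)), , drop = FALSE]
print(result)
```

With dplyr loaded, the helper should also work inside a pipe, for example `tmp_df %>% filter(in_any_range(a, lower_bound, upper_bound))`.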
