Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
427 views
in Technique[技术] by (71.8m points)

Select Rows from Different Table where String from 1st table column is present in R

I am trying to match tables if a string is fully present in the other tables' column. However, I have managed to join it partially and then I am applying Levenstein distance to get close matches. This approach has limited use and accuracy. Approach:

checkg <- check %>% 
  fuzzy_inner_join(LOCATIONS, by = c("STRING" = "STRING"), match_fun = str_detect) %>%
  rowwise() %>%
  mutate(DIST = adist(x=STRING, y=LOCATION, ignore.case = TRUE)) 

is there any way to map it in the following way? The STATUS column in the output table is just given to make it clear that partial string matching is not the objective. It is not required in the output. Thanks

TABLE 1

**STRING** 
BATANGAS
QINGDAO

TABLE2

**STRING**
BATNAGAS LUZON
QINGDAO PT

OUTPUT TABLE checkg

TABLE1.STRING   TABLE2.STRING    STATUS
BATANGAS        BATNAGAS LUZON   Accept
QINGDAO         QINGDAO PT       Accept
BATANGAS        TANGA            Reject
question from:https://stackoverflow.com/questions/65901438/select-rows-from-different-table-where-string-from-1st-table-column-is-present-i

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

You can reverse the syntax to avoid partial matching from LOCATIONS table.

library(fuzzyjoin)

check <- data.frame(STRING = c("BATANGAS", "QINGDAO"))
LOCATIONS <- data.frame(STRING = c("BATANGAS LUZON", "QINGDAO PT", "TANGA"))

LOCATIONS %>% 
  fuzzy_right_join(check, by = c("STRING" = "STRING"), match_fun = str_detect)

        STRING.x STRING.y
1 BATANGAS LUZON BATANGAS
2     QINGDAO PT  QINGDAO

To check further for full words only, you can do this..

check <- structure(list(To_check = c("BATANGAS", "QINGDAO", "ABC", "DEF"
), id = 1:4), class = "data.frame", row.names = c(NA, -4L))

check
> check
  To_check id
1 BATANGAS  1
2  QINGDAO  2
3      ABC  3
4      DEF  4

> LOCATIONS
          STRING
1 BATANGAS LUZON
2     QINGDAO PT
3          TANGA
4           ABCD

LOCATIONS %>% 
  fuzzy_right_join(check %>% mutate(dummy = paste0('\b', To_check, '\b')), 
                   by = c("STRING" = "dummy"), match_fun = str_detect) %>%
  select(-dummy)

          STRING To_check id
1 BATANGAS LUZON BATANGAS  1
2     QINGDAO PT  QINGDAO  2
3           <NA>      ABC  3
4           <NA>      DEF  4

needless to say you can use fuzzy_inner_join for having matched results only


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

2.1m questions

2.1m answers

60 comments

56.9k users

...