Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Merging dataframes in R - resulting dataframe is too large

I am trying to merge two dataframes in R, joining them on the one column they share, "INC_KEY". (Screenshots of the two dataframes were attached to the original post.)

This is the code I have written to merge the two dataframes:

dp <- inner_join(d, p, by = "INC_KEY")

d has 177156 observations, and p has 1641137 observations, but the final merged dataframe has 8416113 observations, which does not make sense to me. I have also tried changing the inner_join function above to the merge function, but I still get the same result. I am wondering how to fix this code so that the merged dataframe has a realistic number of observations - thanks so much for any help!

Question from: https://stackoverflow.com/questions/65661137/merging-dataframes-in-r-resulting-dataframe-is-too-large


1 Answer


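You most likely have duplicate INC_KEY values in d, in p, or in both. A join returns one row for every matching pair of rows, so a key that appears m times in d and n times in p contributes m × n rows to the result, which is how the output can end up larger than either input. Try keeping only one row for each unique INC_KEY value before joining. Note that distinct() keeps the first row it finds for each key and drops the rest.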

library(dplyr)

# Keep the first row for each INC_KEY in both tables, then join on that key.
dp <- inner_join(d %>% distinct(INC_KEY, .keep_all = TRUE),
                 p %>% distinct(INC_KEY, .keep_all = TRUE),
                 by = "INC_KEY")
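
Before dropping rows, it may help to confirm that duplicate keys really are the cause. The sketch below is one possible check, not part of the original answer: it counts rows per INC_KEY in each table and uses those counts to predict the size of the inner join. The data frames d_toy and p_toy are made-up stand-ins for d and p, and the column names d_val, p_val, n_d and n_p are hypothetical.

```r
library(dplyr)

# Hypothetical toy data standing in for d and p (not the asker's data).
d_toy <- data.frame(INC_KEY = c(1, 1, 2, 3),
                    d_val   = c("a", "b", "c", "d"))
p_toy <- data.frame(INC_KEY = c(1, 1, 1, 2, 4),
                    p_val   = 1:5)

# Number of rows per key in each table.
d_counts <- count(d_toy, INC_KEY, name = "n_d")
p_counts <- count(p_toy, INC_KEY, name = "n_p")

# Keys that occur more than once; these are the ones that multiply rows.
dup_d <- filter(d_counts, n_d > 1)
dup_p <- filter(p_counts, n_p > 1)

# Predicted inner-join size: for each shared key, n_d * n_p rows.
predicted_rows <- inner_join(d_counts, p_counts, by = "INC_KEY") %>%
  summarise(rows = sum(n_d * n_p)) %>%
  pull(rows)

# The actual join on the toy data, for comparison.
joined <- inner_join(d_toy, p_toy, by = "INC_KEY")

predicted_rows  # 7: key 1 gives 2 * 3 rows, key 2 gives 1 * 1
nrow(joined)    # 7
```

Running the same counts on the real d and p should show whether the predicted size matches 8416113, and which keys are repeated. If the repeated rows carry different information, dropping them with distinct() loses data, so aggregating them first may be the better choice. Recent dplyr versions may also emit a many-to-many warning for a join like the toy one above.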
