conditional - R, conditionally remove duplicate rows

asked Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I have a data frame in R containing the columns ID.A, ID.B and DISTANCE, where DISTANCE is the distance between ID.A and ID.B. A given value of ID.A (1 to n) may appear in several rows, each with a different ID.B and DISTANCE (e.g. there may be several rows where ID.A is 4, each with its own ID.B and distance).

I would like to remove the rows where ID.A is duplicated, but conditionally on DISTANCE, so that for each ID.A I am left only with the row that has the smallest distance.

Hopefully that makes sense?

Many thanks in advance

EDIT

Hopefully an example will be more useful than my description. Here I would like to remove the second and third rows where ID.A = 3 (DISTANCE 1 and 5), keeping the row with DISTANCE 0.4:

myDF <- read.table(text="ID.A ID.B DISTANCE
  1 3 1
  2 6 8
  3 2 0.4
  3 3 1
  3 8 5
  4 8  7
  5 2 11", header = TRUE)
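To state the goal precisely: for each ID.A, keep the row whose DISTANCE equals the smallest DISTANCE in that ID.A group. A minimal sketch of that filter, assuming base R's ave() is used to compute each group's minimum (one possible approach, not the only one), could look like this:

```r
# Example data from the question
myDF <- read.table(text = "ID.A ID.B DISTANCE
  1 3 1
  2 6 8
  3 2 0.4
  3 3 1
  3 8 5
  4 8 7
  5 2 11", header = TRUE)

# Minimum DISTANCE within each ID.A group, repeated along that group's rows
group_min <- ave(myDF$DISTANCE, myDF$ID.A, FUN = min)

# Keep only rows whose DISTANCE equals their group's minimum
# (note: if two rows tie for the minimum, both are kept)
result <- myDF[myDF$DISTANCE == group_min, ]
print(result)
```

For myDF this keeps rows 1, 2, 3, 6 and 7; if two rows within one ID.A shared the same minimum DISTANCE, both would survive.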

1 Answer

深蓝 · Answer 1 · 2021-10-23T17:52:31+0000

You can also do this easily in base R. If dat is your data frame (myDF in the question):

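# Split dat by ID.A, sort each group by DISTANCE,
# keep the first (smallest-distance) row of each group,
# then stack the one-row results into a single data frame.
do.call(rbind,
        by(dat, INDICES = list(dat$ID.A),
           FUN = function(x) head(x[order(x$DISTANCE), ], 1)))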
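A vectorised alternative, also in base R, is sketched below with the question's example data read into dat: sort by ID.A and then DISTANCE with order(), and drop repeated ID.A values with duplicated(). Like the by() version, it keeps exactly one row per ID.A; on ties it keeps the row that comes first in the original data, because order() leaves tied values in their original order.

```r
# Example data from the question, read into dat
dat <- read.table(text = "ID.A ID.B DISTANCE
  1 3 1
  2 6 8
  3 2 0.4
  3 3 1
  3 8 5
  4 8 7
  5 2 11", header = TRUE)

# Sort by ID.A, then by DISTANCE, so each group's smallest distance comes first
sorted <- dat[order(dat$ID.A, dat$DISTANCE), ]

# duplicated() flags every repeat of an ID.A after its first occurrence;
# negating it keeps one row (the smallest DISTANCE) per ID.A
result <- sorted[!duplicated(sorted$ID.A), ]
print(result)
```

On the example data this returns the rows with ID.A 1 to 5 and DISTANCE 1, 8, 0.4, 7 and 11, the same rows the by() call selects, without building one data frame per group.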
