r - How can I remove all duplicates so that NONE are left in a data frame?

Question

Welcome To Ask or Share your Answers For Others

r - How can I remove all duplicates so that NONE are left in a data frame?

asked Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

r - How can I remove all duplicates so that NONE are left in a data frame?

There is a similar question for PHP, but I'm working with R and am unable to translate the solution to my problem.

I have this data frame with 10 rows and 50 columns, where some of the rows are absolutely identical. If I use unique on it, I get one row per - let's say - "type", but what I actually want is to get only those rows which only appear once. Does anyone know how I can achieve this?

I can have a look at clusters and heatmaps to sort it out manually, but I have bigger data frames than the one mentioned above (with up to 100 rows) where this gets a bit tricky.

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-16T21:19:01+0000

This will extract the rows which appear only once (assuming your data frame is named df):

df[!(duplicated(df) | duplicated(df, fromLast = TRUE)), ]

How it works: The function duplicated tests whether a line appears at least for the second time starting at line one. If the argument fromLast = TRUE is used, the function starts at the last line.

Boths boolean results are combined with | (logical 'or') into a new vector which indicates all lines appearing more than once. The result of this is negated using ! thereby creating a boolean vector indicating lines appearing only once.

Categories

r - How can I remove all duplicates so that NONE are left in a data frame?

r - How can I remove all duplicates so that NONE are left in a data frame?

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags