Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - Comparing column values with rowSums

I'm trying to use rowSums, but with a condition that compares values within each row.

Here is an example of my data frame, which is based on surveys. Rows are participants, and each column is the date of birth of one of their children.

  b3_01 b3_02 b3_03 b3_04 b3_05 b3_06
1  1360  1360  1266  1228  1181  1158    
2  1362  1342  1301  1264  1245  1191 
3  1379    NA    NA    NA    NA    NA  
4  1355  1330  1293  1293  1227  1208  
5  1391  1371  1358  1334  1311  1311

Here, identical dates indicate twins. I would like to create a new column that tells me, for each row, how many times values across those columns are identical. That would give me something like:

  b3_01 b3_02 b3_03 b3_04 b3_05 b3_06 twins
1  1360  1360  1266  1228  1181  1158     1
2  1362  1342  1301  1264  1245  1191     0
3  1379    NA    NA    NA    NA    NA     0
4  1355  1330  1293  1293  1227  1208     1
5  1391  1371  1358  1334  1311  1311     1

EDIT: Sorry, I forgot to say that if a number appears 3 or more times, it should not be counted as twins. The end goal is to have 4 columns: one for singletons (when every number appears only once), one for twins, one for triplets (if a number appears three times), and one for quadruplets.

I'm working with dplyr. As the data frame is very large, I need to specify the range of columns the comparison should be done on. I have tried the following code, along with variants:

twins <- df %>%
  mutate(twins = rowSums(select(., starts_with("b3_")) == select(., starts_with("b3_")),
                         na.rm = TRUE))

This does not work. I have tried other functions too, but could not figure out a solution.
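For reference, here is what the attempted expression actually computes, sketched in base R on the first two sample rows (only three columns, values copied from the question). Comparing a selection with itself is a cell-wise `x == x`, which is TRUE for every non-NA cell and NA otherwise, so `rowSums(..., na.rm = TRUE)` ends up counting non-NA cells rather than repeated dates. `b3` and `same` are helper names introduced only for this sketch.

```r
# First two sample rows, first three columns (values copied from the question)
df <- data.frame(
  b3_01 = c(1360, 1379),
  b3_02 = c(1360, NA),
  b3_03 = c(1266, NA)
)

b3 <- df[, startsWith(names(df), "b3_")]

# Cell-wise comparison of the selection with itself:
# TRUE for every non-NA cell, NA for every NA cell
same <- b3 == b3
print(same)

# Summing with na.rm = TRUE therefore counts the non-NA cells in each row
print(rowSums(same, na.rm = TRUE))
```

Row 1 yields 3 even though it holds only one pair of identical dates, which is why this approach cannot detect twins.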

Do you have any idea how to achieve this? I feel like the solution is simple, but I am an absolute beginner in R.

Question from: https://stackoverflow.com/questions/66066306/comparing-column-values-with-rowsums

He who fights dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss gazes back…

1 Answer

Referring to my comment, and assuming that n identical values in a row count as n-1 twins, define

countTwins <- function(row) {
  row <- row[!is.na(row)]  # drop missing births, otherwise repeated NAs are counted as twins
  length(row) - length(unique(row))
}

and get the column twins as

twinCol <- apply(df[, startsWith(names(df), "b3_")], 1, countTwins)
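A runnable check of this row-wise approach on the sample data from the question. It also shows why missing values should be removed first: `unique()` treats NA as a value, so row 3 (one date followed by five NAs) would report 6 - 2 = 4 twins. `countTwinsNA`, `b3` and `naive` are hypothetical names introduced for this sketch, and selecting the columns with `startsWith()` is one assumption about how to restrict the comparison to the `b3_` columns.

```r
# Sample data copied from the question
df <- data.frame(
  b3_01 = c(1360, 1362, 1379, 1355, 1391),
  b3_02 = c(1360, 1342,   NA, 1330, 1371),
  b3_03 = c(1266, 1301,   NA, 1293, 1358),
  b3_04 = c(1228, 1264,   NA, 1293, 1334),
  b3_05 = c(1181, 1245,   NA, 1227, 1311),
  b3_06 = c(1158, 1191,   NA, 1208, 1311)
)

# Restrict the comparison to the b3_ columns
b3 <- df[, startsWith(names(df), "b3_")]

# Without NA handling, unique() keeps NA as one distinct value,
# so row 3 reports 6 - 2 = 4 "twins"
row3 <- unlist(b3[3, ])
naive <- length(row3) - length(unique(row3))
print(naive)

# NA-safe variant (hypothetical name): drop missing values before counting
countTwinsNA <- function(row) {
  row <- row[!is.na(row)]
  length(row) - length(unique(row))
}

df$twins <- apply(b3, 1, countTwinsNA)
print(df$twins)
```

With the NAs removed, the result is 1, 0, 0, 1, 1, matching the expected `twins` column in the question.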

If you want n identical values to count as a single twin, use this function instead:

countTwins2 <- function(row) {
  sum(table(unname(unlist(row)))>1)
}
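A quick comparison of the two counting rules on a single hypothetical row that contains a triplet. Neither function excludes triplets, as the edited question requires; `countTwinsExact` is a hypothetical helper that counts only dates occurring exactly twice, which is the same idea the update below builds on.

```r
# Hypothetical row: one date repeated three times plus one single birth
row <- c(1300, 1300, 1300, 1250)

countTwins  <- function(row) length(row) - length(unique(row))    # n identical -> n - 1
countTwins2 <- function(row) sum(table(unname(unlist(row))) > 1)  # n identical -> 1

# Hypothetical helper: only dates occurring exactly twice count as twins,
# so three or more identical dates are not reported as twins
countTwinsExact <- function(row) sum(table(row) == 2)

print(c(countTwins(row), countTwins2(row), countTwinsExact(row)))  # 2 1 0
```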

Update according to my comment:

countSinglesTwinsAndTriplets <- function(row) {
  tt <- table(unname(unlist(row)))  # frequency of each date; table() ignores NAs by default
  c(sum(tt == 1), sum(tt == 2), sum(tt == 3))  # number of singletons, twins, triplets
}

addCols <- setNames(
  data.frame(t(apply(df[, startsWith(names(df), "b3_")], 1, countSinglesTwinsAndTriplets))),
  c("singletons", "twins", "triplets")
)
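The edited question also asks for a quadruplets column, which the function above stops short of. Below is a minimal base-R sketch (not dplyr) under the same counting rule: element k of the result is the number of dates that occur exactly k times in the row. `countMultiples`, `maxSize`, `b3` and `counts` are hypothetical names, and the sample data is copied from the question.

```r
# Sample data copied from the question
df <- data.frame(
  b3_01 = c(1360, 1362, 1379, 1355, 1391),
  b3_02 = c(1360, 1342,   NA, 1330, 1371),
  b3_03 = c(1266, 1301,   NA, 1293, 1358),
  b3_04 = c(1228, 1264,   NA, 1293, 1334),
  b3_05 = c(1181, 1245,   NA, 1227, 1311),
  b3_06 = c(1158, 1191,   NA, 1208, 1311)
)

# Hypothetical generalisation of countSinglesTwinsAndTriplets:
# element k is the number of dates occurring exactly k times
countMultiples <- function(row, maxSize = 4) {
  tt <- table(row)  # table() drops NAs by default
  sapply(seq_len(maxSize), function(k) sum(tt == k))
}

# Only the b3_ columns take part in the comparison
b3 <- df[, startsWith(names(df), "b3_")]

# apply() returns one column per participant, so transpose back to one row each
counts <- t(apply(b3, 1, countMultiples))
colnames(counts) <- c("singletons", "twins", "triplets", "quadruplets")

df <- cbind(df, counts)
print(df)
```

Raising `maxSize` would add columns for larger multiple births without changing the rest of the code.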
