I'm trying to use rowSums with a condition that compares values across columns within the same row.
Here is an example of my data frame, which is based on surveys. Each row is a participant, and each column holds the date of birth of one of that participant's children.
  b3_01 b3_02 b3_03 b3_04 b3_05 b3_06
1  1360  1360  1266  1228  1181  1158
2  1362  1342  1301  1264  1245  1191
3  1379    NA    NA    NA    NA    NA
4  1355  1330  1293  1293  1227  1208
5  1391  1371  1358  1334  1311  1311
Here, two identical dates in the same row indicate twins. What I would like to do is create a new column that tells me, for each row, how many times a value is repeated across those columns. That would give me something like:
  b3_01 b3_02 b3_03 b3_04 b3_05 b3_06 twins
1  1360  1360  1266  1228  1181  1158     1
2  1362  1342  1301  1264  1245  1191     0
3  1379    NA    NA    NA    NA    NA     0
4  1355  1330  1293  1293  1227  1208     1
5  1391  1371  1358  1334  1311  1311     1
EDIT: Sorry, I forgot to say that if any number appears 3 or more times, it should not be counted as a twin. The end goal is to have 4 columns: one for singletons (numbers that appear only once), one for twins, one for triplets (numbers that appear three times), and one for quadruplets.
I'm working with dplyr. As the data.frame is very large, I need to specify the range of columns I want the comparison to be done on. I have tried the following code, along with variants:
twins <- df %>%
  mutate(twins = rowSums(select(., starts_with("b3_")) == select(., starts_with("b3_")), na.rm = TRUE))
This does not work. I have tried other functions too but could not figure out a solution.
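A small sketch of why the attempt above probably fails: comparing a block of columns with itself is TRUE wherever a value is present and NA wherever it is missing, so the row sum only counts non-missing values rather than repeats. The toy data frame below is an assumption that copies the first three rows and columns of the example, and it uses base R's startsWith() instead of dplyr's starts_with() so it runs without extra packages.

```r
# Toy data: first three rows and columns of the example in the question
df <- data.frame(
  b3_01 = c(1360, 1362, 1379),
  b3_02 = c(1360, 1342, NA),
  b3_03 = c(1266, 1301, NA)
)

# Keep only the columns whose names start with "b3_"
cols <- df[startsWith(names(df), "b3_")]

# Each cell is compared with itself: TRUE if present, NA if missing
self_equal <- cols == cols

# With na.rm = TRUE this is just the number of non-missing values per row
n_non_missing <- rowSums(self_equal, na.rm = TRUE)
print(n_non_missing)
```

Row 1 contains a repeated date and row 2 does not, yet both get the same count, which suggests the comparison has to be made between different columns of a row, not between a column and itself.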
Do you have any idea how to achieve this? I feel like the solution is simple, but I am an absolute beginner in R.
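One possible approach, sketched here under assumptions rather than offered as the definitive answer: tabulate the dates within each row and count how many distinct dates occur exactly once, twice, three times, or four times. The helper name count_groups and the output column names are hypothetical, and the data frame reproduces the example above. The sketch uses base R only; with dplyr, the same columns could be picked with select(df, starts_with("b3_")).

```r
# Example data from the question
df <- data.frame(
  b3_01 = c(1360, 1362, 1379, 1355, 1391),
  b3_02 = c(1360, 1342, NA,   1330, 1371),
  b3_03 = c(1266, 1301, NA,   1293, 1358),
  b3_04 = c(1228, 1264, NA,   1293, 1334),
  b3_05 = c(1181, 1245, NA,   1227, 1311),
  b3_06 = c(1158, 1191, NA,   1208, 1311)
)

# Restrict the comparison to the birth-date columns
birth_cols   <- startsWith(names(df), "b3_")
birth_matrix <- as.matrix(df[birth_cols])

# Hypothetical helper: for one row, count the distinct dates
# that occur exactly `size` times (table() ignores NA by default)
count_groups <- function(dates, size) {
  sum(table(dates) == size)
}

# Apply the helper row by row, once per group size
df$singletons  <- apply(birth_matrix, 1, count_groups, size = 1)
df$twins       <- apply(birth_matrix, 1, count_groups, size = 2)
df$triplets    <- apply(birth_matrix, 1, count_groups, size = 3)
df$quadruplets <- apply(birth_matrix, 1, count_groups, size = 4)

print(df)
```

Because each date is matched to exactly one group size, a date that appears three times is counted as a triplet and never as a twin. Note that apply() works row by row, so on a very large data frame this may be slow and a vectorised or reshaped approach might be preferable.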
question from:
https://stackoverflow.com/questions/66066306/comparing-column-values-with-rowsums