Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - Union between two dataframe columns

I have two data frames:

df1 has these columns: participantid, formid, c1, c2, c3, c4

df2 has these columns: participantid, c5, c6, c7, c8

I want the union of all participantids from the first data frame where formid = 'some value' and all participantids from the second data frame. I only need a list of participantids; I am not interested in any of the other columns (c1, c2, c3, c4, ...).

I have tried:

union(df1[df1$formid == "some value", "participantid"], df2["participantid"])
union(df1[df1$formid == "some value", "participantid"], df2[["participantid"]])
union(df1[df1$formid == "some value", "participantid"], df2$participantid)

None of these worked.

Any pointers?

Thank you in advance!

Edit: I have tried the following code and it works:

df1 <- data.frame(participantid = c("A1", "A2", "A3", "A4"),
                 formid = c("F1","F1","F1","F2"),
                 c1 = c(0,0,0,0))

df2 <- data.frame(participantid = c("B1", "B2", "B3", "B4"),
                  c2 = c(0,0,0,0))

union(df1[df1$formid == "F1", "participantid"], df2$participantid)

When I run class(df2$participantid) or class(df1[df1$formid == "F1", "participantid"]), both return [1] "factor".

My real data comes from CSV files. On the real data, class(df1[df1$formid == "F1", "participantid"]) returns [1] "tbl_df" "tbl" "data.frame", while class(df2$participantid) returns [1] "character". Do you guys know why that is?
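
One likely explanation, assuming the CSV files were read with readr::read_csv() (an assumption; the question does not say which reader was used): that function returns a tibble, and subsetting a tibble with [rows, "column"] keeps a one-column table instead of dropping to a vector. The sketch below mimics that behavior in base R with drop = FALSE, so it runs without extra packages. The names as_vector and as_frame are illustrative only.

```r
# Dummy data from the question; stringsAsFactors = FALSE keeps IDs as character.
df1 <- data.frame(participantid = c("A1", "A2", "A3", "A4"),
                  formid = c("F1", "F1", "F1", "F2"),
                  c1 = c(0, 0, 0, 0),
                  stringsAsFactors = FALSE)

# Plain data.frame: selecting a single column drops to a vector by default.
as_vector <- df1[df1$formid == "F1", "participantid"]
class(as_vector)

# drop = FALSE keeps a one-column data frame, which is what a tibble
# does by default when indexed the same way.
as_frame <- df1[df1$formid == "F1", "participantid", drop = FALSE]
class(as_frame)
```

The first call gives a character vector; the second gives a data frame, which union() cannot combine element by element with a character vector.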

Edit #2: I was able to reproduce my predicament using dummy CSV files:

df1 CSV file:

participantid,formid,c1
A1,F1,0
A2,F1,0
A3,F1,0
A4,F2,0

df2 CSV file:

participantid,c2
B1,0
B2,0
B3,0
B4,0

When I run the union command above I get this:

[[1]]
[1] "A1" "A2" "A3"

[[2]]
[1] "B1"

[[3]]
[1] "B2"

[[4]]
[1] "B3"

[[5]]
[1] "B4"

with a length() of 5, when it should have been a length of 7. Does this make sense?

I was expecting the output to be either

"A1" "A2" "A3" "B1" "B2" "B3" "B4"

or

"A1" 
"A2" 
"A3" 
"B1" 
"B2" 
"B3" 
"B4"
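
The length of 5 is consistent with the first argument being a one-column table rather than a vector. A data frame is a list of columns, so combining it with a character vector yields a list: one element holding the whole column, plus one element per ID from df2. Here is a minimal sketch of that effect using c() directly, with drop = FALSE standing in for the tibble behavior (an assumption about how the real data was loaded):

```r
df1 <- data.frame(participantid = c("A1", "A2", "A3", "A4"),
                  formid = c("F1", "F1", "F1", "F2"),
                  c1 = c(0, 0, 0, 0),
                  stringsAsFactors = FALSE)
df2 <- data.frame(participantid = c("B1", "B2", "B3", "B4"),
                  c2 = c(0, 0, 0, 0),
                  stringsAsFactors = FALSE)

# One-column data frame, as a tibble would return for this subset.
ids_frame <- df1[df1$formid == "F1", "participantid", drop = FALSE]

# Concatenating a data frame (a list) with a vector produces a list:
# 1 element for the column + 4 elements for the IDs in df2.
combined <- c(ids_frame, df2$participantid)
length(combined)

# Extracting the column as a vector first gives the expected 7 IDs.
ids_vector <- ids_frame[["participantid"]]
result <- union(ids_vector, df2$participantid)
length(result)
```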

Edit #3: I am going to answer my own question. This worked for me in the end:

union(df1[df1$formid == "F1",]$participantid, df2$participantid)
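
This works because $ always extracts the column as a vector, for plain data frames and tibbles alike. Two equivalent spellings, sketched on the dummy data from the question (the variable names via_dollar and via_brackets are illustrative only):

```r
df1 <- data.frame(participantid = c("A1", "A2", "A3", "A4"),
                  formid = c("F1", "F1", "F1", "F2"),
                  c1 = c(0, 0, 0, 0),
                  stringsAsFactors = FALSE)
df2 <- data.frame(participantid = c("B1", "B2", "B3", "B4"),
                  c2 = c(0, 0, 0, 0),
                  stringsAsFactors = FALSE)

# Extract the column first, then filter the resulting vector.
via_dollar <- union(df1$participantid[df1$formid == "F1"],
                    df2$participantid)

# [[ ]] also returns the column itself rather than a one-column table.
via_brackets <- union(df1[["participantid"]][df1$formid == "F1"],
                      df2[["participantid"]])

identical(via_dollar, via_brackets)
```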
Question from: https://stackoverflow.com/questions/65928488/union-between-two-dataframe-columns


1 Answer

union(unique(df1[df1$formid == "some value", 'participantid']), unique(df2$participantid))

I used unique because I assume you don't need duplicated values.
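
Note that union() already discards duplicated values, so the unique() calls are harmless but not required. Also, if df1 is a tibble, the [rows, 'participantid'] form in the first argument would presumably still return a one-column table rather than a vector. A quick check of the deduplication on made-up vectors (the values and names are illustrative only):

```r
# Two vectors with repeated IDs, both within and across the vectors.
a <- c("A1", "A1", "A2")
b <- c("A2", "B1", "B1")

with_unique <- union(unique(a), unique(b))
without_unique <- union(a, b)

# Both forms give the same deduplicated result.
identical(with_unique, without_unique)
```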

