Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - dplyr / tidyr - Summarise data with conditions

Problem

I am trying to use dplyr & tidyr to produce an output table (something like a contingency table, I think) that summarises this data as frequencies, i.e. counts of the titles, descriptions and bodies whose scores are negative, neutral (zero) and positive. I have tried a number of different methods, and the closest example I can find is Using Tidyr/Dplyr to summarise counts of groups of strings, but it doesn't quite fit.

Example Data

The data looks something like this:

df <- data.frame( "story_title"=c(0.0,0.0,0.0,-1.0,1.0),
                  "story_description"=c(-0.3,-0.3,-0.3,0.5,0.3),
                  "story_body"=c(-0.3,0.2,0.4,0.2,0))

Desired Output

The output would ideally look something like this, showing the frequencies for each story part:

                  Negative  Neutral  Positive
story_title              1        3         1
story_description        3        0         2
story_body               1        1         3
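To double-check these expected totals, the counts can also be computed straight from the wide data in base R, with no reshaping at all: `sapply()` applies the same three comparisons to every column. This is a minimal sketch assuming the example `df` above; `counts` is just an illustrative name.

```r
# Example data from the question (wide format: one column per story part)
df <- data.frame(story_title       = c(0.0, 0.0, 0.0, -1.0, 1.0),
                 story_description = c(-0.3, -0.3, -0.3, 0.5, 0.3),
                 story_body        = c(-0.3, 0.2, 0.4, 0.2, 0))

# For each column, count scores below, equal to and above zero.
# sapply() returns one column per story part; t() turns them into rows.
counts <- t(sapply(df, function(score) c(Negative = sum(score < 0),
                                         Neutral  = sum(score == 0),
                                         Positive = sum(score > 0))))
counts
```

Summing a logical vector counts its `TRUE` values, which is why `sum(score < 0)` gives the number of negative scores.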

(edited totals for story_body - Thanks Akrun)

Attempted Approach

If I'm right, the first step is to reshape the data into long format using gather:

library(tidyr)  # provides gather(); also re-exports the %>% pipe and starts_with()
df <- df %>% gather(type, score, starts_with("story"))

> df
                type score
1        story_title   0.0
2        story_title   0.0
3        story_title   0.0
4        story_title  -1.0
5        story_title   1.0
6  story_description  -0.3
7  story_description  -0.3
8  story_description  -0.3
9  story_description   0.5
10 story_description   0.3
11        story_body  -0.3
12        story_body   0.2
13        story_body   0.4
14        story_body   0.2
15        story_body   0.0
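For comparison, base R's `stack()` produces the same long layout without tidyr. A minimal sketch, assuming the original wide `df`; the renaming to `type`/`score` is only there to match the gather() call above, and `long` is an illustrative name.

```r
# Original wide data from the question
df <- data.frame(story_title       = c(0.0, 0.0, 0.0, -1.0, 1.0),
                 story_description = c(-0.3, -0.3, -0.3, 0.5, 0.3),
                 story_body        = c(-0.3, 0.2, 0.4, 0.2, 0))

# stack() stacks every column: `values` holds the scores, `ind` the column name
long <- stack(df)
names(long) <- c("score", "type")   # rename to match the gather() output
long <- long[, c("type", "score")]  # same column order as gather()
long
```

One difference: `type` comes back as a factor here, whereas gather() returns a character column by default.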

From here I think it's a combination of group_by and summarise, and I've tried:

df %>% group_by(sentiment) %>%
          summarise(Negative = count("sentiment_title"<0),
                    Neutral  = count("sentiment_title"=0),
                    Positive  = count("sentiment_title">0)
                   )

Obviously this hasn't worked.
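The attempt fails for several separate reasons: there is no `sentiment` column (after gather the columns are `type` and `score`); `"sentiment_title"` in quotes is a string literal, so the comparison never looks at the data; inside a function call `=` names an argument rather than testing equality (`==`); and dplyr's `count()` counts rows of a data frame rather than `TRUE` values. Summing a logical vector does count its `TRUE` values, so a corrected version might look like the sketch below. It assumes dplyr and tidyr are installed; `long` and `summary_tbl` are illustrative names.

```r
library(dplyr)
library(tidyr)

# Original wide data from the question
df <- data.frame(story_title       = c(0.0, 0.0, 0.0, -1.0, 1.0),
                 story_description = c(-0.3, -0.3, -0.3, 0.5, 0.3),
                 story_body        = c(-0.3, 0.2, 0.4, 0.2, 0))

# Same reshape as in the question: columns `type` and `score`
long <- gather(df, type, score, starts_with("story"))

summary_tbl <- long %>%
  group_by(type) %>%                    # one group per story part
  summarise(Negative = sum(score < 0),  # sum() of a logical counts TRUEs
            Neutral  = sum(score == 0), # `==` tests equality
            Positive = sum(score > 0))
summary_tbl
```

Rows come out in alphabetical order of `type` (story_body, story_description, story_title) because the groups are sorted.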

Can anyone help with a dplyr/tidyr solution (a base table answer would also be useful as an example)?


He who battles a dragon too long becomes a dragon himself; gaze too long into the abyss, and the abyss will gaze back…

1 Answer


Try:

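library(dplyr)
library(tidyr)

# `df` must be the original wide data frame (before the question's gather() step)
gather(df) %>%                                  # long format: columns `key` and `value`
      group_by(key, value = sign(value)) %>%    # sign() maps each score to -1, 0 or 1
      tally() %>%                               # count rows per key/sign pair (column `n`)
      mutate(ind = factor(value, levels = c(-1, 0, 1),
                          labels = c('Negative', 'Neutral', 'Positive'))) %>%
      select(-value) %>%
      spread(ind, n, fill = 0)                  # one column per label; absent pairs become 0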
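Since the question also asks for a base R table answer, here is a minimal sketch using `stack()` and `table()`, assuming the original wide `df`; `long`, `sentiment` and `tab` are illustrative names. Fixing the factor levels keeps all three columns even if a category never occurs.

```r
# Original wide data from the question
df <- data.frame(story_title       = c(0.0, 0.0, 0.0, -1.0, 1.0),
                 story_description = c(-0.3, -0.3, -0.3, 0.5, 0.3),
                 story_body        = c(-0.3, 0.2, 0.4, 0.2, 0))

long <- stack(df)  # columns: values (the score) and ind (the story part)

# Classify each score by its sign, with all three levels always present
sentiment <- factor(sign(long$values), levels = c(-1, 0, 1),
                    labels = c("Negative", "Neutral", "Positive"))

# Cross-tabulate story part against sentiment (a contingency table)
tab <- table(type = long$ind, sentiment = sentiment)
tab
```

The result is a `table` object; `as.data.frame.matrix(tab)` converts it to a plain data frame if one is needed.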


...