Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.3k views
in Technique[技术] by (71.8m points)

graph - Creating a Pareto Chart with ggplot2 and R

I have been struggling with how to make a Pareto Chart in R using the ggplot2 package. In many cases when making a bar chart or histogram we want items sorted by the X axis. In a Pareto Chart we want the items ordered descending by the value in the Y axis. Is there a way to get ggplot to plot items ordered by the value in the Y axis? I tried sorting the data frame first but it seems ggplot reorders them.

Example:

val <- read.csv("http://www.cerebralmastication.com/wp-content/uploads/2009/11/val.txt")
val<-with(val, val[order(-Value), ])
p <- ggplot(val)
p + geom_bar(aes(State, Value, fill=variable), stat = "identity", position="dodge") + scale_fill_brewer(palette = "Set1")

the data frame val is sorted but the output looks like this:

alt text
(source: cerebralmastication.com)

Hadley correctly pointed out that this produces a much better graphic for showing actuals vs. predicted:

ggplot(val, aes(State, Value)) + geom_bar(stat = "identity", subset = .(variable == "estimate"), fill = "grey70") + geom_crossbar(aes(ymin = Value, ymax = Value), subset = .(variable == "actual"))

which returns:

alt text
(source: cerebralmastication.com)

But it's still not a Pareto Chart. Any tips?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Subsetting and sorting your data;

valact <- subset(val, variable=='actual')
valsort <- valact[ order(-valact[,"Value"]),]

From there it's just a standard boxplot() with a very manual cumulative function on top:

op <- par(mar=c(3,3,3,3)) 
bp <- barplot(valsort [ , "Value"], ylab="", xlab="", ylim=c(0,1),    
              names.arg=as.character(valsort[,"State"]), main="How's that?") 
lines(bp, cumsum(valsort[,"Value"])/sum(valsort[,"Value"]), 
      ylim=c(0,1.05), col='red') 
axis(4)
box() 
par(op)

which should look like this

alt text
(source: eddelbuettel.com)

and it doesn't even need the overplotting trick as lines() happily annotates the initial plot.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

2.1m questions

2.1m answers

60 comments

57.0k users

...