Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
130 views
in Technique[技术] by (71.8m points)

r - Expand ranges defined by "from" and "to" columns

I have a data frame containing "name" of U.S. Presidents, the years when they start and end in office, ("from" and "to" columns). Here is a sample:

name           from  to
Bill Clinton   1993 2001
George W. Bush 2001 2009
Barack Obama   2009 2012

...and the output from dput:

dput(tail(presidents, 3))
structure(list(name = c("Bill Clinton", "George W. Bush", "Barack Obama"
), from = c(1993, 2001, 2009), to = c(2001, 2009, 2012)), .Names = c("name", 
"from", "to"), row.names = 42:44, class = "data.frame")

I want to create data frame with two columns ("name" and "year"), with a row for each year that a president was in office. Thus, I need to create a regular sequence with each year from "from", to "to". Here's my expected out:

name           year
Bill Clinton   1993
Bill Clinton   1994
...
Bill Clinton   2000
Bill Clinton   2001
George W. Bush 2001
George W. Bush 2002
... 
George W. Bush 2008
George W. Bush 2009
Barack Obama   2009
Barack Obama   2010
Barack Obama   2011
Barack Obama   2012

I know that I can use data.frame(name = "Bill Clinton", year = seq(1993, 2001)) to expand things for a single president, but I can't figure out how to iterate for each president.

How do I do this? I feel that I should know this, but I'm drawing a blank.

Update 1

OK, I've tried both solutions, and I'm getting an error:

foo<-structure(list(name = c("Grover Cleveland", "Benjamin Harrison", "Grover Cleveland"), from = c(1885, 1889, 1893), to = c(1889, 1893, 1897)), .Names = c("name", "from", "to"), row.names = 22:24, class = "data.frame")
ddply(foo, "name", summarise, year = seq(from, to))
Error in seq.default(from, to) : 'from' must be of length 1
Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

You can use the plyr package:

library(plyr)
ddply(presidents, "name", summarise, year = seq(from, to))
#              name year
# 1    Barack Obama 2009
# 2    Barack Obama 2010
# 3    Barack Obama 2011
# 4    Barack Obama 2012
# 5    Bill Clinton 1993
# 6    Bill Clinton 1994
# [...]

and if it is important that the data be sorted by year, you can use the arrange function:

df <- ddply(presidents, "name", summarise, year = seq(from, to))
arrange(df, df$year)
#              name year
# 1    Bill Clinton 1993
# 2    Bill Clinton 1994
# 3    Bill Clinton 1995
# [...]
# 21   Barack Obama 2011
# 22   Barack Obama 2012

Edit 1: Following's @edgester's "Update 1", a more appropriate approach is to use adply to account for presidents with non-consecutive terms:

adply(foo, 1, summarise, year = seq(from, to))[c("name", "year")]

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...