We'll use the XML package
library(XML)
and create a closure that contains a function to handle the 'SCHOOL' node, as well as two helper functions to retrieve results when done. The SCHOOL function is invoked on each SCHOOL node. If it finds a hockey team, it uses /SCHOOL/NAME/text() as a 'key', and /SCHOOL/TEAMS/HOCKEY/text() and //STUDENT/text() (or /SCHOOL/GRADES/STUDENT/text()) as values. A message is printed for every 1000 (by default) schools with hockey teams, so that there's some indication of progress. After parsing, the 'getres' function returns the raw results environment, and the 'get' function returns the results as a data.frame.
teams <- function(progress=1000) {
    res <- new.env(parent=emptyenv())   # for results
    it <- 0L                            # counter -- schools with hockey teams
    list(SCHOOL=function(elt) {
        ## handle 'SCHOOL' nodes
        if (getNodeSet(elt, "not(/SCHOOL/TEAMS/HOCKEY)"))
            ## early exit -- no hockey team
            return(NULL)
        it <<- it + 1L
        if (it %% progress == 0L)
            message(it)
        school <- getNodeSet(elt, "string(/SCHOOL/NAME/text())") # 'key'
        res[[school]] <-
            list(team=getNodeSet(elt,
                   "normalize-space(/SCHOOL/TEAMS/HOCKEY/text())"),
                 students=xpathSApply(elt, "//STUDENT", xmlValue))
    }, getres = function() {
        ## retrieve the 'res' environment when done
        res
    }, get=function() {
        ## retrieve 'res' environment as data.frame
        school <- ls(res)
        ## look values up by key: ls() sorts, eapply() does not
        vals <- mget(school, envir=res)
        team <- unlist(lapply(vals, "[[", "team"), use.names=FALSE)
        student <- lapply(vals, "[[", "students")
        len <- sapply(student, length)
        data.frame(school=rep(school, len), team=rep(team, len),
                   student=unlist(student, use.names=FALSE))
    })
}
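The closure pattern used by teams() can be tried out without any XML at all. The following is only a sketch in base R: make_collector, add and count are hypothetical names invented for illustration, and the progress default of 2 is chosen just to make the message visible. It shows how the returned functions share a private counter (updated with <<-) and a results environment.

```r
# Hypothetical stand-in for teams(): a closure that keeps private state
make_collector <- function(progress = 2L) {
    res <- new.env(parent = emptyenv())  # results, keyed by name
    n <- 0L                              # number of records stored so far
    list(
        add = function(key, value) {
            n <<- n + 1L                 # update the counter in the enclosing scope
            if (n %% progress == 0L)
                message(n)               # progress indicator
            res[[key]] <- value          # environments are modified in place
            invisible(NULL)
        },
        count = function() n,
        getres = function() res
    )
}

collector <- make_collector()
collector$add("School A", list(team = "Hawks"))
collector$add("School B", list(team = "Owls"))   # emits the message "2"
collector$count()
ls(collector$getres())
```

Each call to make_collector() gets its own res and n, so separate parses would not share results.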
We use the function as
branches <- teams()
xmlEventParse("event.xml", handlers=NULL, branches=branches)
branches$get()
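To see what the data.frame returned by get() looks like, here is a minimal base R sketch of the reshaping step on made-up data. The school, team and student names are hypothetical, and the entries are looked up by key with mget() so that schools, teams and students stay aligned.

```r
# Hypothetical results, shaped like the entries stored by the SCHOOL handler
res <- new.env(parent = emptyenv())
res[["School A"]] <- list(team = "Hawks", students = c("Ann", "Bob"))
res[["School B"]] <- list(team = "Owls", students = c("Cy", "Di", "Ed"))

school <- ls(res)                       # sorted keys
vals <- mget(school, envir = res)       # values in the same order as the keys
team <- vapply(vals, function(v) v$team, character(1), USE.NAMES = FALSE)
student <- lapply(vals, function(v) v$students)
len <- lengths(student)                 # number of students per school

# one row per student, with school and team repeated to match
df <- data.frame(school = rep(school, len), team = rep(team, len),
                 student = unlist(student, use.names = FALSE))
df
```

The result is in 'long' form: one row per student, with the school and team repeated for each of that school's students.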