I am trying to extract information from an XML file from ClinicalTrials.gov. The file is organized in the following way:
<clinical_study>
...
<brief_title>
...
<location>
<facility>
<name>
<address>
<city>
<state>
<zip>
<country>
</facility>
<status>
<contact>
<last_name>
<phone>
<email>
</contact>
</location>
<location>
...
</location>
...
</clinical_study>
Using the R XML package from CRAN, I can extract all location nodes from the XML file with the following code:
library(XML)
clinicalTrialUrl <- "http://clinicaltrials.gov/ct2/show/NCT01480479?resultsxml=true"
xmlDoc <- xmlParse(clinicalTrialUrl, useInternalNodes=TRUE)
locations <- xmlToDataFrame(getNodeSet(xmlDoc,"//location"))
This works reasonably well. However, if you look at the resulting data frame, you will notice that the xmlToDataFrame function lumped together everything under <facility>
into a single concatenated string. One solution would be to write code that builds the data frame column by column; for example, you could generate
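
A minimal sketch of the column-by-column idea, offered as one possible approach rather than a definitive answer. It runs on a small inline document instead of the live URL, so the sample XML and its values are made up for illustration. The helper firstValue and the columns vector are hypothetical names, and the XPath expressions assume that city, state, zip and country sit somewhere beneath <facility>:

```r
library(XML)

# Hypothetical miniature document that mimics the layout described above.
sampleXml <- '<clinical_study>
  <brief_title>Example study</brief_title>
  <location>
    <facility>
      <name>Site A</name>
      <address>
        <city>Boston</city>
        <state>Massachusetts</state>
        <zip>02115</zip>
        <country>United States</country>
      </address>
    </facility>
    <status>Recruiting</status>
  </location>
  <location>
    <facility>
      <name>Site B</name>
      <address>
        <city>Toronto</city>
        <country>Canada</country>
      </address>
    </facility>
    <status>Recruiting</status>
  </location>
</clinical_study>'

doc <- xmlParse(sampleXml, asText = TRUE)
locationNodes <- getNodeSet(doc, "//location")

# Hypothetical helper: text of the first node matching `path` below `node`,
# or NA when the location has no such element.
firstValue <- function(node, path) {
  hits <- xpathSApply(node, path, xmlValue)
  if (length(hits) == 0) NA_character_ else hits[[1]]
}

# One relative XPath per output column (assumed layout, adjust as needed).
columns <- c(
  name    = "./facility/name",
  city    = "./facility//city",
  state   = "./facility//state",
  zip     = "./facility//zip",
  country = "./facility//country",
  status  = "./status"
)

# Build each column by visiting every <location> node, then bind them together.
locations <- as.data.frame(
  lapply(columns, function(path) {
    vapply(locationNodes, firstValue, character(1), path = path)
  }),
  stringsAsFactors = FALSE
)

print(locations)
```

Because every column is extracted per location node, a location that lacks an element (the second site has no state or zip here) gets NA in that cell instead of shifting the remaining values. To try this on the real file, replace the xmlParse call with the one that reads clinicalTrialUrl.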