Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.4k views
in Technique[技术] by (71.8m points)

ruby - trying to get content inside cdata tags in xml file using nokogiri

I have seen several things on this, but nothing has seemed to work so far. I am parsing an xml via a url using nokogiri on rails 3 ruby 1.9.2.

A snippet of the xml looks like this:

<NewsLineText>
  <![CDATA[
  Anna Kendrick is ''obsessed'' with 'Game of Thrones' and loves to cook, particularly     creme brulee.
  ]]>
</NewsLineText>

I am trying to parse this out to get the text associated with the NewsLineText

r = node.at_xpath('.//newslinetext') if node.at_xpath('.//newslinetext')
s = node.at_xpath('.//newslinetext').text if node.at_xpath('.//newslinetext')
t = node.at_xpath('.//newslinetext').content if node.at_xpath('.//newslinetext')
puts r
puts s ? if s.blank? 'NOTHING' : s
puts t ? if t.blank? 'NOTHING' : t

What I get in return is

<newslinetext></newslinetext>
NOTHING
NOTHING

So I know my tags are named/spelled correctly to get at the newslinetext data, but the cdata text never shows up.

What do I need to do with nokogiri to get this text?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

You're trying to parse XML using Nokogiri's HMTL parser. If node as from the XML parser then r would be nil since XML is case sensitive; your r is not nil so you're using the HTML parser which is case insensitive.

Use Nokogiri's XML parser and you will get things like this:

>> r = doc.at_xpath('.//NewsLineText')
=> #<Nokogiri::XML::Element:0x8066ad34 name="NewsLineText" children=[#<Nokogiri::XML::Text:0x8066aac8 "
  ">, #<Nokogiri::XML::CDATA:0x8066a9c4 "
  Anna Kendrick is ''obsessed'' with 'Game of Thrones' and loves to cook, particularly     creme brulee.
  ">, #<Nokogiri::XML::Text:0x8066a8d4 "
">]>
>> r.text
=> "
  
  Anna Kendrick is ''obsessed'' with 'Game of Thrones' and loves to cook, particularly     creme brulee.
  
"

and you'll be able to get at the CDATA through r.text or r.children.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...