I am trying to parse HTML with HtmlAgilityPack, but I am running into problems.
Sample HTML document:
<tr>
<td class="css_lokalita" colspan="4">
<select id="region" name="region">
<option value="0" selected>Všetky regiony</option>
<optgroup>Banskobystricky kraj</optgroup>
<option value="k_1" style="color: #000000; font-weight:bold;">Banskobystricky kraj</option>
<option value="1"> Banská Bystrica</option>
.
.
.
<option value="174"> CZ - ústecky kraj</option>
<option value="175"> CZ - Zlínsky kraj</option>
</select>
</td>
</tr>
<tr>
<td class="css_sfotkou" colspan="4">
<input type="checkbox" name="foto" value="1" id="foto" />
<label for="foto">Iba používatelia s fotkou</label>
</td>
</tr>
<tr>
<td class="css_miestnost" colspan="4">
<select name="akt-miest" id="onoffaci">
<option value="a_0">Všetci</option>
.
.
.
<optgroup label="Záľuby a záujmy">
<option value="m_1419307"> Bez Lásky</option>
.
.
.
<option value="m_1108016"> Drum N Bass</option>
</optgroup>
</select>
</td>
</tr>
I need to parse the option values from <select name="akt-miest" id="onoffaci">.
For example, from:
<option value="a_0">Všetci</option>
I need to get the value **a_0** and the text **Všetci**.
So first I try to access the select by its id:
var selectNode = htmlDoc.GetElementbyId("onoffaci");
Then I select all option nodes with XPath:
var nodes = selectNode.SelectNodes("//option");
And read the values:
foreach (var node in nodes)
{
string roomName = node.NextSibling.InnerText;
string roomId = node.Attributes["value"].Value;
rooms.Add(new Room { RoomId = roomId, RoomName = roomName });
}
But I also get values from another select (<select id="region" name="region">), which is at the top of the HTML code.
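Note on the XPath scope: in XPath, an expression that starts with `//` is evaluated from the document root even when it is called on a specific node, while `.//` stays below the context node. The following is a minimal sketch of that difference, assuming HtmlAgilityPack is referenced; the class `XPathScopeDemo` and the helper `SelectValues` are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class XPathScopeDemo
{
    // Hypothetical helper: evaluates `xpath` with the element whose id is
    // `selectId` as the context node and returns the matched value attributes.
    public static List<string> SelectValues(string html, string selectId, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var values = new List<string>();
        HtmlNode select = doc.GetElementbyId(selectId);
        if (select == null)
        {
            return values; // no element with that id
        }

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = select.SelectNodes(xpath);
        if (nodes == null)
        {
            return values;
        }

        foreach (HtmlNode node in nodes)
        {
            values.Add(node.GetAttributeValue("value", ""));
        }
        return values;
    }
}
```

Called as `SelectValues(html, "onoffaci", "//option")`, this should also return the options of the `region` select, whereas `".//option"` should return only the options under `onoffaci`, including those nested inside `optgroup`.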
EDITED:
I applied Darin Dimitrov's advice and tried this:
HtmlNode selectNode = htmlDoc.GetElementbyId("onoffaci");
var nodes = selectNode.SelectNodes("option");
foreach (var node in nodes)
{
string roomName = node.NextSibling.InnerText;
string roomId = node.Attributes["value"].Value;
rooms.Add(new Room { RoomId = roomId, RoomName = roomName });
}
return rooms;
Now only the first three option elements are parsed. I think the problem is that the select
contains optgroup tags:
<select name="akt-miest" id="onoffaci">
<option value="a_0">Všetci</option>
<option value="a_1">Iba prihlásení</option>
<option value="a_5" selected="selected">Teraz na Pokeci</option>
<optgroup label="Hlavné miestnosti">
<option value="m_13"> Bez záväzkov</option>
<option value="m_9"> Do pohody</option>
<option value="m_39"> Dámsky klub</option>
</optgroup>
.
.
.
I tried to select all the following nodes with this:
var nodes = selectNode.SelectNodes("option::*");
But I get this error: xpath has an invalid token.
I would like to access all children of selectNode:
HtmlNode selectNode = htmlDoc.GetElementbyId("onoffaci");
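Note on the invalid token: XPath axis syntax is `axis::nodetest`, and `option` is not an axis name, so `option::*` is rejected. `descendant::option` (or the equivalent `.//option`) is a valid way to reach options at any depth, including those inside `optgroup`. Below is a minimal sketch, assuming HtmlAgilityPack is referenced; `OptionReader` and `ReadOptions` are hypothetical names. The fallback to `NextSibling` is an assumption that covers HtmlAgilityPack builds that treat `option` as an empty element, so that its text ends up in the following text node.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class OptionReader
{
    // Hypothetical helper: returns (value, text) pairs for every option below
    // the element with the given id, at any nesting depth.
    public static List<KeyValuePair<string, string>> ReadOptions(string html, string selectId)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<KeyValuePair<string, string>>();
        HtmlNode select = doc.GetElementbyId(selectId);
        if (select == null)
        {
            return result;
        }

        // descendant:: is relative to the context node, unlike a leading "//".
        HtmlNodeCollection options = select.SelectNodes("descendant::option");
        if (options == null)
        {
            return result; // SelectNodes returns null when nothing matches
        }

        foreach (HtmlNode option in options)
        {
            string value = option.GetAttributeValue("value", "");
            string text = option.InnerText;

            // Assumption: if the parser left the option element empty, the
            // label is in the text node that immediately follows it.
            if (string.IsNullOrEmpty(text)
                && option.NextSibling != null
                && option.NextSibling.NodeType == HtmlNodeType.Text)
            {
                text = option.NextSibling.InnerText;
            }

            result.Add(new KeyValuePair<string, string>(
                value, HtmlEntity.DeEntitize(text).Trim()));
        }
        return result;
    }
}
```

Each pair could then be mapped to the `Room` type used above, for example `new Room { RoomId = pair.Key, RoomName = pair.Value }`.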
EDIT #2:
Here is the complete HTML file from which I need to parse the option tags:
http://hotfile.com/dl/98442053/577b556/source.html