I'm trying to parse tabular content out of some HTML elements and arrange it in a customized manner so that I can write it to a CSV file later.
The table looks almost exactly like this.
The HTML elements look like this (truncated):
<tr>
<td align="center" colspan="4" class="header">ATLANTIC</td>
</tr>
<tr>
<td class="black10bold">Facility</td>
<td class="black10bold">Type</td>
<td class="black10bold">Funding</td>
</tr>
<tr>
<td style="width: 55%">
<a href="fsFacilityDetails.aspx?item=NJ60104"> Complete Care at Linwood, LLC </a>
</td>
</tr>
<tr>
<td style="width: 55%">
<a href="fsFacilityDetails.aspx?item=NJ60102">The Health Center At Galloway</a>
</td>
</tr>
<tr>
<td align="center" colspan="4" class="header">BERGEN</td>
</tr>
<tr>
<td class="black10bold">Facility</td>
<td class="black10bold">Type</td>
<td class="black10bold">Funding</td>
</tr>
<tr>
<td style="width: 55%">
<a href="fsFacilityDetails.aspx?item=30201">The Actors Fund Homes</a>
</td>
</tr>
<tr>
<td style="width: 55%">
<a href="fsFacilityDetails.aspx?item=NJAL02007"> Actors Fund Home, The </a>
</td>
</tr>
What I've tried so far:
for item in soup.select("tr"):
    try:
        header = item.select_one("td.header").text
    except AttributeError:
        header = ""
    try:
        item_name = item.select_one("td > a").text
    except AttributeError:
        item_name = ""
    print(item_name, header)
Output it produces:
ATLANTIC
Complete Care at Linwood, LLC
The Health Center At Galloway
BERGEN
The Actors' Fund Homes
Actors Fund Home, The
Output I would like to have:
Complete Care at Linwood, LLC ATLANTIC
The Health Center At Galloway ATLANTIC
The Actors' Fund Homes BERGEN
Actors Fund Home, The BERGEN
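One possible way to get there, as a rough sketch rather than a definitive fix: the loop above already yields one (item_name, header) pair per row with one side empty, so the missing step is to remember the most recent non-empty header and attach it to each facility row that follows. The helper name `carry_header_forward` and the hard-coded `pairs` list are assumptions made for illustration; the list stands in for what the loop would collect from the sample HTML if it appended the pairs instead of printing them.

```python
def carry_header_forward(pairs):
    """Attach the last seen header to every following facility name.

    `pairs` is an iterable of (item_name, header) tuples, shaped like the
    values printed by the loop in the question (one side is usually empty).
    """
    current_header = ""
    result = []
    for item_name, header in pairs:
        name = item_name.strip()
        header = header.strip()
        if header:
            # A county row: remember it for the rows that follow.
            current_header = header
        elif name:
            # A facility row: pair it with the county seen most recently.
            result.append((name, current_header))
        # Rows with neither (e.g. Facility/Type/Funding) are skipped.
    return result


# Stand-in for the pairs the original loop would collect from the sample HTML.
pairs = [
    ("", "ATLANTIC"),
    ("", ""),
    (" Complete Care at Linwood, LLC ", ""),
    ("The Health Center At Galloway", ""),
    ("", "BERGEN"),
    ("", ""),
    ("The Actors Fund Homes", ""),
    (" Actors Fund Home, The ", ""),
]

for name, county in carry_header_forward(pairs):
    print(name, county)
```

The resulting list of tuples could then be handed to `csv.writer(...).writerows(...)` from the standard library when it is time to write the file.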
question from:
https://stackoverflow.com/questions/65833506/cant-scrape-some-data-out-of-a-table-in-a-customized-way