python - Can't scrape some data out of a table in a customized way

asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)

I'm trying to parse tabular content out of some HTML elements and arrange it in a customized manner so that I can write it to a CSV file later.

The table looks almost exactly like this.

The HTML elements look like this (truncated):

<tr>
    <td align="center" colspan="4" class="header">ATLANTIC</td>
</tr>
<tr>
    <td class="black10bold">Facility</td>
    <td class="black10bold">Type</td>
    <td class="black10bold">Funding</td>
</tr>
<tr>
    <td style="width: 55%">
        <a href="fsFacilityDetails.aspx?item=NJ60104"> Complete Care at Linwood, LLC </a>
    </td>
</tr>
<tr>
    <td style="width: 55%">
        <a href="fsFacilityDetails.aspx?item=NJ60102">The Health Center At Galloway</a>
    </td>
</tr>

<tr>
    <td align="center" colspan="4" class="header">BERGEN</td>
</tr>

<tr>
    <td class="black10bold">Facility</td>
    <td class="black10bold">Type</td>
    <td class="black10bold">Funding</td>
</tr>

<tr>
    <td style="width: 55%">
        <a href="fsFacilityDetails.aspx?item=30201">The Actors Fund Homes</a>
    </td>
</tr>
<tr>
    <td style="width: 55%">
        <a href="fsFacilityDetails.aspx?item=NJAL02007"> Actors Fund Home, The </a>
    </td>
</tr>

Here is what I've tried so far:

for item in soup.select("tr"):
    try:
        header = item.select_one("td.header").text
    except AttributeError:
        header = ""
    try:
        item_name = item.select_one("td > a").text
    except AttributeError:
        item_name = ""
    print(item_name,header)

Output it produces:

ATLANTIC
 
Complete Care at Linwood, LLC  
The Health Center At Galloway 

 BERGEN
 
The Actors' Fund Homes
Actors Fund Home, The

Output I would like to have:

Complete Care at Linwood, LLC  ATLANTIC
The Health Center At Galloway  ATLANTIC
The Actors' Fund Homes         BERGEN
Actors Fund Home, The          BERGEN

question from:https://stackoverflow.com/questions/65833506/cant-scrape-some-data-out-of-a-table-in-a-customized-way


1 Answer

深蓝 · Answer 1 · 2021-10-06T19:36:44+0000

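This should produce the output you want. Each header appears only in its own row, so remember the most recent header and pair it with every facility row that follows it.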

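header = ""  # most recent section header seen so far
for item in soup.select("tr"):
    header_cell = item.select_one("td.header")
    link = item.select_one("td > a")
    if header_cell:
        # Header row: remember it for the facility rows that follow.
        header = header_cell.get_text(strip=True)
    elif link:
        # Facility row: pair its name with the remembered header.
        print(link.get_text(strip=True), header)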
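
Since the question mentions writing the result to a CSV file later, here is a minimal end-to-end sketch of the same idea. It assumes the markup from the question is wrapped in a `<table>` and parsed with Python's built-in `html.parser`; the function name `extract_rows` and the column names `facility` and `header` are made up for illustration and are not part of the original page or question.

```python
import csv
import io

from bs4 import BeautifulSoup

# Sample markup trimmed from the question (wrapped in <table> as an assumption).
html = """
<table>
<tr><td align="center" colspan="4" class="header">ATLANTIC</td></tr>
<tr><td class="black10bold">Facility</td><td class="black10bold">Type</td><td class="black10bold">Funding</td></tr>
<tr><td style="width: 55%"><a href="fsFacilityDetails.aspx?item=NJ60104"> Complete Care at Linwood, LLC </a></td></tr>
<tr><td style="width: 55%"><a href="fsFacilityDetails.aspx?item=NJ60102">The Health Center At Galloway</a></td></tr>
<tr><td align="center" colspan="4" class="header">BERGEN</td></tr>
<tr><td class="black10bold">Facility</td><td class="black10bold">Type</td><td class="black10bold">Funding</td></tr>
<tr><td style="width: 55%"><a href="fsFacilityDetails.aspx?item=30201">The Actors Fund Homes</a></td></tr>
<tr><td style="width: 55%"><a href="fsFacilityDetails.aspx?item=NJAL02007"> Actors Fund Home, The </a></td></tr>
</table>
"""


def extract_rows(markup):
    """Return (facility_name, header) pairs, carrying each header forward."""
    soup = BeautifulSoup(markup, "html.parser")
    rows = []
    header = ""  # most recent td.header text seen so far
    for tr in soup.select("tr"):
        header_cell = tr.select_one("td.header")
        link = tr.select_one("td > a")
        if header_cell:
            header = header_cell.get_text(strip=True)
        elif link:
            rows.append((link.get_text(strip=True), header))
    return rows


rows = extract_rows(html)

# Write to an in-memory buffer; swap in open("out.csv", "w", newline="")
# to write a real file (the file name is hypothetical).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["facility", "header"])  # assumed column names
writer.writerows(rows)
print(buffer.getvalue())
```

Collecting tuples first, rather than printing inside the loop, keeps the parsing step separate from the output step, and `get_text(strip=True)` drops the stray whitespace around the link text seen in the original output.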
