Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Can't scrape some data out of a table in a customized way

I'm trying to parse tabular content out of some HTML elements and arrange it in a customized manner so that I can write it to a CSV file later.

The table looks almost exactly like this.

The HTML elements look like this (truncated):

<tr>
    <td align="center" colspan="4" class="header">ATLANTIC</td>
</tr>
<tr>
    <td class="black10bold">Facility</td>
    <td class="black10bold">Type</td>
    <td class="black10bold">Funding</td>
</tr>
<tr>
    <td style="width: 55%">
        <a href="fsFacilityDetails.aspx?item=NJ60104"> Complete Care at Linwood, LLC </a>
    </td>
</tr>
<tr>
    <td style="width: 55%">
        <a href="fsFacilityDetails.aspx?item=NJ60102">The Health Center At Galloway</a>
    </td>
</tr>

<tr>
    <td align="center" colspan="4" class="header">BERGEN</td>
</tr>

<tr>
    <td class="black10bold">Facility</td>
    <td class="black10bold">Type</td>
    <td class="black10bold">Funding</td>
</tr>

<tr>
    <td style="width: 55%">
        <a href="fsFacilityDetails.aspx?item=30201">The Actors' Fund Homes</a>
    </td>
</tr>
<tr>
    <td style="width: 55%">
        <a href="fsFacilityDetails.aspx?item=NJAL02007"> Actors Fund Home, The </a>
    </td>
</tr>

This is what I've tried so far:

for item in soup.select("tr"):
    try:
        header = item.select_one("td.header").text
    except AttributeError:
        header = ""
    try:
        item_name = item.select_one("td > a").text
    except AttributeError:
        item_name = ""
    print(item_name,header)

The output it produces:

ATLANTIC
 
Complete Care at Linwood, LLC  
The Health Center At Galloway 

 BERGEN
 
The Actors' Fund Homes
Actors Fund Home, The 

The output I would like to have:

Complete Care at Linwood, LLC  ATLANTIC
The Health Center At Galloway  ATLANTIC
The Actors' Fund Homes         BERGEN
Actors Fund Home, The          BERGEN
question from:https://stackoverflow.com/questions/65833506/cant-scrape-some-data-out-of-a-table-in-a-customized-way

1 Answer

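This should produce the output in the format you want. The idea is to remember the most recent header row and print it next to every facility link that follows, instead of resetting the header to an empty string on each row.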

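header = ""  # most recent section header seen so far
for item in soup.select("tr"):
    header_cell = item.select_one("td.header")
    link = item.select_one("td > a")
    if header_cell:
        # Header row: remember it for the facility rows that follow.
        header = header_cell.get_text(strip=True)
    elif link:
        # Facility row: pair the name with the current header.
        item_name = link.get_text(strip=True)
        print(item_name, header)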
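
Since the question mentions writing the result to a CSV file later, here is a minimal sketch of how the same header-carrying loop could feed the standard csv module. It assumes BeautifulSoup with the built-in "html.parser" backend. The names SAMPLE_HTML, collect_rows and rows_to_csv, as well as the CSV column names, are made up for illustration, and the sample markup is a trimmed copy of the rows shown in the question.

```python
import csv
import io

from bs4 import BeautifulSoup

# Trimmed-down sample modeled on the markup in the question (hypothetical wrapper table).
SAMPLE_HTML = """
<table>
<tr><td align="center" colspan="4" class="header">ATLANTIC</td></tr>
<tr><td class="black10bold">Facility</td><td class="black10bold">Type</td></tr>
<tr><td style="width: 55%"><a href="fsFacilityDetails.aspx?item=NJ60104"> Complete Care at Linwood, LLC </a></td></tr>
<tr><td style="width: 55%"><a href="fsFacilityDetails.aspx?item=NJ60102">The Health Center At Galloway</a></td></tr>
<tr><td align="center" colspan="4" class="header">BERGEN</td></tr>
<tr><td style="width: 55%"><a href="fsFacilityDetails.aspx?item=NJAL02007"> Actors Fund Home, The </a></td></tr>
</table>
"""


def collect_rows(html):
    """Return (facility_name, header) pairs, carrying each header forward."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    header = ""  # most recent header seen; empty until the first one appears
    for tr in soup.select("tr"):
        header_cell = tr.select_one("td.header")
        link = tr.select_one("td > a")
        if header_cell:
            header = header_cell.get_text(strip=True)
        elif link:
            rows.append((link.get_text(strip=True), header))
    return rows


def rows_to_csv(rows):
    """Serialize the pairs as CSV text; the column names are illustrative."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["facility", "header"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = collect_rows(SAMPLE_HTML)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Collecting the pairs in a list first keeps the parsing separate from the output step. To write a real file, the same csv.writer calls could target a file opened with newline="" instead of the in-memory buffer. Names containing commas, such as "Complete Care at Linwood, LLC", are quoted by the csv module automatically.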
