Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Get results into a DataFrame when using a function

I am trying to use BeautifulSoup to scrape a table from which I only want one column. I have put the code in a function so that I can apply it to multiple pages more easily. Each call to the function produces one list, but when I convert the resulting list of lists into a DataFrame, each list ends up as a row instead of a column.

import pandas as pd
import requests
from bs4 import BeautifulSoup

total_points = []

def getTotalpoints(tag):
    url = f'https://www.procyclingstats.com/team/{tag}/analysis/start'
    html_content = requests.get(url).text
    soup = BeautifulSoup(html_content, "lxml")

    team_riders = soup.find_all("table", attrs={"class": "basic"})

    table = soup.findAll('table')[0]
    rows = table.findAll('tr')
    heading = table.find('tr')

    headings = []
    for item in heading.find_all("th"): # loop through all th elements
        # convert the th elements to text and strip "\n"
        item = (item.text).rstrip("\n")
        # append the clean column name to headings
        headings.append(item)
    headings_true = headings[4]
    # print(headings)

  
    points = []
    for row in rows[1:]:
        points.append(row.findAll('td')[4].text)

    total_points.append(points)
    
    return

getTotalpoints('astana-pro-team-2010')
getTotalpoints('astana-pro-team-2013')
getTotalpoints('astana-pro-team-2016')

print(total_points)

[['1372', '1076', '581', '579', '334', '288', '282', '222', '183', '146', '116', '106', '106', '102', '78', '77', '68', '54', '43', '41', '40', '38', '25', '11', '10', '5', '5'], ['2225', '838', '682', '538', '457', '456', '411', '410', '329', '286', '284', '237', '205', '196', '150', '114', '110', '109', '104', '72', '68', '67', '56', '46', '45', '28', '16', '10', '10'], ['1178', '849', '772', '701', '663', '572', '548', '530', '355', '267', '249', '247', '239', '200', '188', '175', '160', '133', '113', '109', '96', '75', '74', '68', '50', '40', '38', '37', '31', '5', '', '']]


df = pd.DataFrame(total_points)

print(df)

 0     1    2    3    4    5    6    7    8    9   ...  22  23  24  25  
0  1372  1076  581  579  334  288  282  222  183  146  ...  25  11  10   5   
1  2225   838  682  538  457  456  411  410  329  286  ...  56  46  45  28   
2  1178   849  772  701  663  572  548  530  355  267  ...  74  68  50  40   

   26    27    28    29    30    31  
0   5  None  None  None  None  None  
1  16    10    10  None  None  None  
2  38    37    31     5       

  

How can I make every list its own column, with all of its values in the rows beneath it? I would like the result to look like this:

column 1 column 2 column 3
row 1    row 1      row 1
row 2    row 2      row 2
row 3    row 3      row 3
row 4    row 4      row 4
etc      etc        etc

So every list should be in its own column instead of in its own row.

Thanks for your answers!

question from:https://stackoverflow.com/questions/65938382/get-results-in-to-df-when-using-function


1 Answer


If you know the column names and there are as many of them as there are inner lists, you can do the following.

import pandas as pd

total_points = [
    [1, 2, 3, 4, 5],
    [4, 5, 6, 7, 8],
    [5, 6, 7, 8, 9],
]

col_names = ['col1', 'col2', 'col3']

df = pd.DataFrame(zip(*total_points), columns=col_names)
print(df)

Output

   col1  col2  col3
0     1     4     5
1     2     5     6
2     3     6     7
3     4     7     8
4     5     8     9

Here zip transposes the data, so the DataFrame constructor treats each inner list as a column of the resulting DataFrame rather than as a row.
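
One caveat for the scraped data in the question: zip stops at the shortest inner list, and the three lists there have 27, 29 and 32 entries, so the extra rows would be dropped. A minimal sketch of one way around that, assuming it is acceptable to pad the shorter columns with missing values, uses itertools.zip_longest. The column names and the shortened sample lists below are made up for illustration.

```python
from itertools import zip_longest

import pandas as pd

# Inner lists of different lengths, standing in for the scraped team lists.
total_points = [
    ['1372', '1076', '581'],
    ['2225', '838', '682', '538'],
    ['1178', '849'],
]

# Hypothetical column labels; the team tags from the question would work too.
col_names = ['team_2010', 'team_2013', 'team_2016']

# zip_longest pads the shorter lists with None instead of truncating them.
rows = list(zip_longest(*total_points))
df = pd.DataFrame(rows, columns=col_names)

# The scraped values are strings; convert each column to numbers.
# Anything that cannot be parsed, such as the None padding, becomes NaN.
df = df.apply(pd.to_numeric, errors='coerce')
print(df)
```

If the lists are known to have equal lengths, plain zip as shown above is enough; pd.DataFrame(total_points).T is another option that also keeps the longer lists, but it leaves the default integer column labels in place.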

