Welcome to the OStack Knowledge Sharing Community for programmers and developers - Open, Learning and Share


0 votes
331 views
in Technique by (71.8m points)

python - Web scraping understats.com "filter"

After following this Python script, I tried several ways to save the data just for home games and just for away games, like the filter shown on https://understat.com/league/EPL [enter image description here][1]

Basically I want to scrape the data in the blue box (the home and away views); this script already handles the data in the red box.

pic: [1]: https://i.stack.imgur.com/G9K4F.png


code:


Based on the structure of the webpage, I found that the data lives in a JSON variable inside the <script> tags:

import json

import requests
import pandas as pd
from bs4 import BeautifulSoup

# Download the league page and parse the HTML
url = 'https://understat.com/league/EPL'
res = requests.get(url)
soup = BeautifulSoup(res.content, 'html.parser')

scripts = soup.find_all('script')

string_with_json_obj = ''

# Find the <script> tag that contains the teamsData JSON
for el in scripts:
  if 'teamsData' in el.text:
    string_with_json_obj = el.text.strip()

# print(string_with_json_obj)

# strip unnecessary symbols and get only JSON data
ind_start = string_with_json_obj.index("('")+2
ind_end = string_with_json_obj.index("')")
json_data = string_with_json_obj[ind_start:ind_end]
json_data = json_data.encode('utf8').decode('unicode_escape')


# convert JSON data into Python dictionary
data = json.loads(json_data)

# Get teams and their relevant ids and put them into separate dictionary
teams = {}
for id in data.keys():
  teams[id] = data[id]['title']

# EDA to get a feeling of how the JSON is structured
# Column names are all the same, so we just use first element
columns = []
# Check the sample of values per each column
values = []
for id in data.keys():
  columns = list(data[id]['history'][0].keys())
  values = list(data[id]['history'][0].values())

  break

# Getting data for all teams
dataframes = {}
for id, team in teams.items():
  teams_data = []
  for row in data[id]['history']:
    teams_data.append(list(row.values()))

  df = pd.DataFrame(teams_data, columns=columns)
  dataframes[team] = df

This script only scrapes the data for "all" games; I would like to get the same data but only for away games and only for home games, and export that data to CSV.
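For what it is worth, here is one way the split could look. This is only a minimal sketch: it assumes that every row in data[id]['history'] carries an 'h_a' field set to 'h' for home matches and 'a' for away matches (check the columns list printed by the script above to confirm this on your data), and it reuses the data, teams and columns variables defined earlier. The CSV file names are just illustrative.

# Split each team's match history into home and away rows, using the
# (assumed) 'h_a' flag on every row: 'h' = home match, 'a' = away match
home_dataframes = {}
away_dataframes = {}

for id, team in teams.items():
  home_rows = [list(row.values()) for row in data[id]['history'] if row.get('h_a') == 'h']
  away_rows = [list(row.values()) for row in data[id]['history'] if row.get('h_a') == 'a']

  home_dataframes[team] = pd.DataFrame(home_rows, columns=columns)
  away_dataframes[team] = pd.DataFrame(away_rows, columns=columns)

# Export one CSV per team and venue (file names are only an example)
for team, df in home_dataframes.items():
  df.to_csv(f'{team}_home.csv', index=False)
for team, df in away_dataframes.items():
  df.to_csv(f'{team}_away.csv', index=False)

If you want a single league-wide table per filter instead (like the Home/Away tabs on the site), you could sum the numeric columns of each team's frame and concatenate the per-team results before writing the CSV.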

credits: https://towardsdatascience.com/web-scraping-advanced-football-statistics-11cace1d863a


asked by JoseM117, translated from Stack Overflow


1 Answer

0 votes
by (71.8m points)
Waiting for an expert to reply.

