Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


web scraping - Web scraping website search bars with Python

I am trying to write some code for a personal project where I can scrape data from a site while also using that site's query box.

The website I am trying to use is https://www.latlong.net/convert-address-to-lat-long.html, and I want a portion of my program where you input your address.

The request then goes to the page's address search bar and performs the query, and the program extracts the lat/lon elements from the site and stores them in a dataframe.

I know I will need to use BeautifulSoup and, from what I've read, possibly mechanize and Selenium, but I got a little lost trying to read up on mechanize.

Question from: https://stackoverflow.com/questions/65894749/webscraping-website-search-bars-with-python


1 Answer


You might want to call the site's backend endpoint directly (the one the search box posts to) instead of driving the search bar itself.

For example:

import pandas as pd
import requests
from urllib.parse import urlencode

search_query = "Berlin, Germany"

payload = {
    "c1": search_query,
    "action": "gpcm",
    "cp": "",
}

headers = {
    "content-type": "application/x-www-form-urlencoded",
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36",
    "referer": "https://www.latlong.net/convert-address-to-lat-long.html",
    "x-requested-with": "XMLHttpRequest",
    # Reuse the cookies set by the landing page; cookie pairs are separated by "; ".
    "cookie": "; ".join(
        f"{k}={v}" for k, v
        in requests.get("https://www.latlong.net").cookies.get_dict().items()
    ),
}

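# The endpoint replies with plain text of the form "latitude,longitude".
response = requests.post(
    "https://www.latlong.net/_spm4.php",
    data=urlencode(payload),
    headers=headers,
).text

# One row: city and country come from the query, coordinates from the reply.
df = pd.DataFrame(
    [[*search_query.split(", "), *response.split(",")]],
    columns=["City", "Country", "Latitude", "Longitude"],
)
print(df)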

Output:

     City  Country   Latitude  Longitude
0  Berlin  Germany  52.520008  13.404954
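
The snippet above splits the reply on "," and trusts that two values come back. If the endpoint returns an empty string or an error message instead, the DataFrame constructor will fail on a column mismatch. Here is a minimal sketch of a more defensive parser, assuming the reply is plain "latitude,longitude" text (inferred from the output above, not from any documentation of the site). The name parse_coordinates is made up for illustration.

```python
from typing import Optional, Tuple


def parse_coordinates(text: str) -> Optional[Tuple[float, float]]:
    """Parse a 'latitude,longitude' string; return None if it is malformed."""
    parts = [p.strip() for p in text.split(",")]
    if len(parts) != 2:
        return None
    try:
        lat, lon = float(parts[0]), float(parts[1])
    except ValueError:
        return None
    # Reject values outside the valid geographic ranges (this also drops NaN).
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return None
    return lat, lon


print(parse_coordinates("52.520008,13.404954"))  # (52.520008, 13.404954)
print(parse_coordinates(""))                     # None
```

You could call this on the response text and only build the DataFrame row when the result is not None.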

PS. Don't overuse this, as they're going to throttle your requests. Alternatively, use a VPN to keep querying.
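
One way to stay under a throttle when looking up several addresses is to pause between calls. The sketch below is only an illustration: geocode_all and its lookup parameter are hypothetical names, and the 2-second default delay is an arbitrary guess, not a limit published by the site. In practice lookup would wrap the requests.post call from the answer.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def geocode_all(
    addresses: Iterable[str],
    lookup: Callable[[str], T],
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[T]:
    """Call lookup(address) for each address, pausing between calls."""
    results: List[T] = []
    for i, address in enumerate(addresses):
        if i > 0:
            sleep(delay_seconds)  # pause before every call except the first
        results.append(lookup(address))
    return results


# Stand-in lookup so the example runs without touching the network.
demo = geocode_all(["Berlin, Germany", "Paris, France"], lookup=len, delay_seconds=0.0)
print(demo)  # [15, 13]
```

Passing sleep as a parameter keeps the pause easy to replace in tests, so the loop can be checked without actually waiting.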

