python - Using Beautiful Soup to get the full URL in source code

Question



asked Oct 24, 2021 in Technique by 深蓝 (71.8m points)


So I was looking at a page's source code and came across this bit of code:

<img src="/gallery/2012-winners-finalists/HM_Watching%20birds2_Shane%20Conklin_MA_2012.jpg">

Now, in the page source the link is blue, and clicking it takes me to the full URL where that picture is located. I know how to get what is shown in the source code with Beautiful Soup in Python, but how do I get the full URL that you reach when you click the link in the source view?

EDIT: If I am given <a href="/folder/big/a.jpg">, how do I figure out the starting part of that URL with Python or Beautiful Soup?



1 Answer

深蓝 · Answer 1 · 2021-10-23T17:55:57+0000

<a href="/folder/big/a.jpg">

That's an absolute path on the current host (a root-relative URL). So if the HTML file is at http://example.com/foo/bar.html, resolving /folder/big/a.jpg against that page's URL gives:

http://example.com/folder/big/a.jpg

That is, keep the scheme and host name (plus the port, if any) of the page's URL and replace its path with the new one.
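To make "keep the host, swap the path" concrete, here is a minimal sketch using only urllib.parse. The helper name resolve_root_relative is hypothetical, and it deliberately handles only paths that start with a single "/"; for anything else, urllib.parse.urljoin is the general tool.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def resolve_root_relative(page_url: str, path: str) -> str:
    """Resolve a root-relative path such as '/folder/big/a.jpg' (hypothetical helper).

    Keeps the scheme and network location (host and port) of page_url and
    replaces everything after them with the new path, query and fragment.
    """
    # '//host/...' is scheme-relative, not root-relative, so reject it too.
    if not path.startswith("/") or path.startswith("//"):
        raise ValueError("expected a root-relative path like '/folder/a.jpg'")
    base = urlsplit(page_url)
    target = urlsplit(path)  # separates any ?query or #fragment from the path
    return urlunsplit((base.scheme, base.netloc, target.path, target.query, target.fragment))


page = "http://example.com/foo/bar.html"
print(resolve_root_relative(page, "/folder/big/a.jpg"))  # http://example.com/folder/big/a.jpg
# For root-relative paths this agrees with the standard library's urljoin.
assert resolve_root_relative(page, "/folder/big/a.jpg") == urljoin(page, "/folder/big/a.jpg")
```

Writing it by hand shows what happens, but it breaks as soon as an href is relative ("big/a.jpg") or scheme-relative ("//cdn..."), which is why the standard-library function is preferable in real code.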

Python's standard library has the urljoin function to perform this operation for you:

>>> from urllib.parse import urljoin
>>> base = 'http://example.com/foo/bar.html'
>>> href = '/folder/big/a.jpg'
>>> urljoin(base, href)
'http://example.com/folder/big/a.jpg'
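The same call covers the other forms a src or href can take, which is why it is safer than gluing strings together. A small check against the example base URL above; apart from the two paths taken from the question, the sample hrefs are made up for illustration:

```python
from urllib.parse import urljoin

base = "http://example.com/foo/bar.html"

# (href, expected absolute URL) pairs; the hrefs are illustrative samples.
cases = [
    # root-relative: new path on the same scheme and host
    ("/folder/big/a.jpg", "http://example.com/folder/big/a.jpg"),
    # percent-escapes such as %20 are kept as they are
    ("/gallery/2012-winners-finalists/HM_Watching%20birds2_Shane%20Conklin_MA_2012.jpg",
     "http://example.com/gallery/2012-winners-finalists/HM_Watching%20birds2_Shane%20Conklin_MA_2012.jpg"),
    # relative: resolved against the page's directory (/foo/)
    ("big/a.jpg", "http://example.com/foo/big/a.jpg"),
    # '..' climbs one directory up from /foo/
    ("../a.jpg", "http://example.com/a.jpg"),
    # scheme-relative: only the page's scheme (http:) is reused
    ("//cdn.example.net/a.jpg", "http://cdn.example.net/a.jpg"),
    # already absolute: returned unchanged
    ("https://other.example.org/a.jpg", "https://other.example.org/a.jpg"),
]

for href, expected in cases:
    result = urljoin(base, href)
    print(f"{href} -> {result}")
    assert result == expected
```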

For Python 2, the function is within the urlparse module.
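Putting this together with Beautiful Soup for the original question: the "starting part" is the URL of the page you downloaded (the final one, after any redirects), so pass it to urljoin along with each src or href that Beautiful Soup returns. Below is a minimal sketch assuming Python 3 and the third-party bs4 package; the helper full_urls and the page URL are hypothetical. It also honours a <base href> element, since HTML lets that replace the page URL as the base for relative links.

```python
from __future__ import annotations

from urllib.parse import urljoin

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4


def full_urls(html: str, page_url: str, tag: str = "img", attr: str = "src") -> list[str]:
    """Return absolute URLs for every <tag attr="..."> in html (hypothetical helper).

    page_url is the address the HTML was downloaded from.
    """
    soup = BeautifulSoup(html, "html.parser")

    # A <base href="..."> element, if present, replaces the page URL as the
    # base for relative links; it may itself be relative, so resolve it first.
    base_tag = soup.find("base", href=True)
    base_url = urljoin(page_url, base_tag["href"]) if base_tag else page_url

    # attrs={attr: True} matches only tags that actually carry the attribute.
    return [urljoin(base_url, el[attr]) for el in soup.find_all(tag, attrs={attr: True})]


if __name__ == "__main__":
    page_url = "http://example.com/foo/bar.html"  # hypothetical page address
    html = """
    <img src="/gallery/2012-winners-finalists/HM_Watching%20birds2_Shane%20Conklin_MA_2012.jpg">
    <a href="/folder/big/a.jpg">big picture</a>
    """
    print(full_urls(html, page_url))
    print(full_urls(html, page_url, tag="a", attr="href"))
```

Usage: full_urls(html, page_url) lists image URLs, and full_urls(html, page_url, tag="a", attr="href") does the same for links.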
