I need a way to extract a domain name without the subdomain from a URL using Python's urlparse.
For example, I would like to extract "google.com" from a full URL like "http://www.google.com".
The closest I can seem to come with urlparse is the netloc attribute, but that includes the subdomain, which in this example would be www.google.com.
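A minimal sketch of what I am seeing, assuming Python 3, where the urlparse function lives in urllib.parse (in Python 2 it was the urlparse module):

```python
# Python 3 import; in Python 2 this was `from urlparse import urlparse`.
from urllib.parse import urlparse

url = "http://www.google.com"
parsed = urlparse(url)

# netloc is the full network location, subdomain included.
print(parsed.netloc)  # www.google.com
```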
I know that it is possible to write some custom string manipulation to turn www.google.com into google.com, but I want to avoid hand-written string transforms or regexes for this task. (The reason is that I am not familiar enough with URL formation rules to feel confident that a custom parsing function would cover every edge case.)
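As an illustration of the kind of edge case that worries me, here is a rough sketch of a naive approach that keeps only the last two dot-separated labels. The helper name naive_domain is hypothetical, and the second hostname is just an example of a multi-part suffix:

```python
def naive_domain(hostname):
    """Hypothetical helper: keep only the last two dot-separated labels."""
    return ".".join(hostname.split(".")[-2:])

# Works for a simple .com host...
print(naive_domain("www.google.com"))  # google.com

# ...but not when the public suffix itself has two labels.
print(naive_domain("www.bbc.co.uk"))   # co.uk
```

The second result is a suffix rather than a usable domain, which is why I would rather not hand-roll this.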
Or, if urlparse can't do what I need, does anyone know of any other Python URL-parsing libraries that would?