Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
312 views
in Technique[技术] by (71.8m points)

java - What is the fastest way to get the domain/host name from a URL?

I need to go through a large list of string url's and extract the domain name from them.

For example:

http://www.stackoverflow.com/questions would extract www.stackoverflow.com

I originally was using new URL(theUrlString).getHost() but the URL object initialization adds a lot of time to the process and seems unneeded.

Is there a faster method to extract the host name that would be as reliable?

Thanks

Edit: My mistake, yes the www. would be included in domain name example above. Also, these urls may be http or https

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

If you want to handle https etc, I suggest you do something like this:

int slashslash = url.indexOf("//") + 2;
domain = url.substring(slashslash, url.indexOf('/', slashslash));

Note that this is includes the www part (just as URL.getHost() would do) which is actually part of the domain name.

Edit Requested via comments

Here are two methods that might be helpful:

/**
 * Will take a url such as http://www.stackoverflow.com and return www.stackoverflow.com
 * 
 * @param url
 * @return
 */
public static String getHost(String url){
    if(url == null || url.length() == 0)
        return "";

    int doubleslash = url.indexOf("//");
    if(doubleslash == -1)
        doubleslash = 0;
    else
        doubleslash += 2;

    int end = url.indexOf('/', doubleslash);
    end = end >= 0 ? end : url.length();

    int port = url.indexOf(':', doubleslash);
    end = (port > 0 && port < end) ? port : end;

    return url.substring(doubleslash, end);
}


/**  Based on : http://grepcode.com/file/repository.grepcode.com/java/ext/com.google.android/android/2.3.3_r1/android/webkit/CookieManager.java#CookieManager.getBaseDomain%28java.lang.String%29
 * Get the base domain for a given host or url. E.g. mail.google.com will return google.com
 * @param host 
 * @return 
 */
public static String getBaseDomain(String url) {
    String host = getHost(url);

    int startIndex = 0;
    int nextIndex = host.indexOf('.');
    int lastIndex = host.lastIndexOf('.');
    while (nextIndex < lastIndex) {
        startIndex = nextIndex + 1;
        nextIndex = host.indexOf('.', startIndex);
    }
    if (startIndex > 0) {
        return host.substring(startIndex);
    } else {
        return host;
    }
}

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...