I didn't find a standard solution that works with Jsoup, so here's my simple cookie handling based on a HashMap keyed by host. It's probably missing a lot of functionality, but I hope it's good enough for a basic crawler:
import java.net.URL;
import java.util.HashMap;
import java.util.Map.Entry;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// One cookie map per host, shared across all requests.
private static HashMap<String, HashMap<String, String>> host2cookies = new HashMap<String, HashMap<String, String>>();

public static String[] DownloadPage(URL url) throws Exception {
    Connection con = Jsoup.connect(url.toString()).timeout(600000);
    loadCookiesByHost(url, con);
    Document doc = con.get();
    // Use the final URL (after any redirects) when storing the response cookies.
    url = con.request().url();
    storeCookiesByHost(url, con);
    return new String[]{url.toString(), doc.html()};
}

private static void loadCookiesByHost(URL url, Connection con) {
    try {
        String host = url.getHost();
        if (host2cookies.containsKey(host)) {
            HashMap<String, String> cookies = host2cookies.get(host);
            // Attach every stored cookie for this host to the outgoing request.
            for (Entry<String, String> cookie : cookies.entrySet()) {
                con.cookie(cookie.getKey(), cookie.getValue());
            }
        }
    } catch (Throwable t) {
        // TODO: move to a proper logger
        System.err.println(t.toString() + ":: Error loading cookies to: " + url);
    }
}

private static void storeCookiesByHost(URL url, Connection con) {
    try {
        String host = url.getHost();
        HashMap<String, String> cookies = host2cookies.get(host);
        if (cookies == null) {
            cookies = new HashMap<String, String>();
            host2cookies.put(host, cookies);
        }
        // Merge the cookies from the response into the per-host map.
        cookies.putAll(con.response().cookies());
    } catch (Throwable t) {
        // TODO: move to a proper logger
        System.err.println(t.toString() + ":: Error saving cookies from: " + url);
    }
}
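For completeness, here is a minimal usage sketch, assuming the methods above live in a class named Crawler (that class name is just for illustration, as are the URLs). Two calls to the same host should carry the cookies stored from the first response into the second request:

import java.net.URL;

public class CrawlerDemo {
    public static void main(String[] args) throws Exception {
        // First request: any cookies set by the server are stored under the host.
        String[] first = Crawler.DownloadPage(new URL("https://example.com/login"));
        System.out.println("Resolved URL: " + first[0]);

        // Second request to the same host: the stored cookies are sent back.
        String[] second = Crawler.DownloadPage(new URL("https://example.com/account"));
        System.out.println("Page length: " + second[1].length());
    }
}

Note that this keys cookies only by host name, so it ignores cookie paths, domains, and expiry; that's fine for a simple crawler but not a general replacement for a real cookie store.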