Java UrlUtils Class Code Examples

This article collects typical usage examples of the Java class us.codecraft.webmagic.utils.UrlUtils. If you are wondering what the UrlUtils class is for or how it is used in practice, the selected examples below may help.



The UrlUtils class belongs to the us.codecraft.webmagic.utils package. The 15 code examples shown below are sorted by popularity by default.

Example 1: getContent

import us.codecraft.webmagic.utils.UrlUtils; // import the required package/class
protected String getContent(String charset, HttpResponse response) throws IOException {
    if (charset == null) {
        long contentLength = response.getEntity().getContentLength();
        if (response.getFirstHeader("Content-Type") != null
                && !response.getFirstHeader("Content-Type").getValue().toLowerCase().contains("text/html")) {
            throw new IllegalArgumentException("Link does not point to HTML content, skipping download; content type: " + response.getFirstHeader("Content-Type"));
        } else if (contentLength > value.getMaxDownloadLength()) {
            throw new IllegalArgumentException("Page content exceeds the maximum length; maximum allowed: " + value.getMaxDownloadLength() + ", actual: " + contentLength);
        }
        byte[] contentBytes = IOUtils.toByteArray(response.getEntity().getContent());
        // Try to detect the charset from the entity's Content-Type value.
        String htmlCharset = UrlUtils.getCharset(response.getEntity().getContentType().getValue());
        if (htmlCharset != null) {
            return new String(contentBytes, htmlCharset);
        } else {
            LOG.warn("Charset auto-detection failed, using {} as the charset. Please specify the charset with Site.setCharset()", Charset.defaultCharset());
            return new String(contentBytes);
        }
    } else {
        return IOUtils.toString(response.getEntity().getContent(), charset);
    }
}
 
Developer: TransientBuckwheat, Project: nest-spider, Lines of code: 20, Source file: ContentDownloader.java
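The getContent example above relies on UrlUtils.getCharset to pull a charset name out of a Content-Type value. As a rough illustration of that idea only, the sketch below does the same with a plain regular expression. The class name ContentTypeCharsetSketch and the method charsetOf are hypothetical, and this is not claimed to be how webmagic implements getCharset.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of webmagic: extracts the charset parameter
// from a Content-Type header value such as "text/html; charset=UTF-8".
final class ContentTypeCharsetSketch {
    // Matches "charset=<name>", allowing optional quotes and any letter case.
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*[\"']?([^\\s;\"']+)", Pattern.CASE_INSENSITIVE);

    /** Returns the declared charset name, or null when none is declared. */
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        Matcher m = CHARSET.matcher(contentType);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=UTF-8")); // UTF-8
        System.out.println(charsetOf("text/html"));                // null
    }
}
```

Returning null when no charset is declared mirrors how the example treats a null result: it logs a warning and falls back to the platform default charset.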


Example 2: handleResponse

import us.codecraft.webmagic.utils.UrlUtils; // import the required package/class
protected Page handleResponse(Request request, String charset, HttpResponse httpResponse, Task task)
		throws IOException {
	String content = IOUtils.toString(httpResponse.getEntity().getContent(), charset);
	Page page = new Page();
	page.setHtml(new Html(UrlUtils.fixAllRelativeHrefs(content, request.getUrl())));
	page.setUrl(new PlainText(request.getUrl()));
	page.setRequest(request);

	// set http response value
	page.putHttpResponse(Constant.STATUS_CODE, httpResponse.getStatusLine().getStatusCode() + "");
	Header[] headers = httpResponse.getAllHeaders();
	for (Header header : headers) {
		page.putHttpResponse(header.getName(), header.getValue());
	}

	return page;
}
 
Developer: yuany, Project: en-webmagic, Lines of code: 18, Source file: HttpClientDownloader.java


Example 3: getAll

import us.codecraft.webmagic.utils.UrlUtils; // import the required package/class
/**
 * Download urls synchronizing.
 *
 * @param urls urls
 * @param <T> type of process result
 * @return list downloaded
 */
public <T> List<T> getAll(Collection<String> urls) {
    destroyWhenExit = false;
    spawnUrl = false;
    if (startRequests!=null){
        startRequests.clear();
    }
    for (Request request : UrlUtils.convertToRequests(urls)) {
        addRequest(request);
    }
    CollectorPipeline collectorPipeline = getCollectorPipeline();
    pipelines.add(collectorPipeline);
    run();
    spawnUrl = true;
    destroyWhenExit = true;
    return collectorPipeline.getCollected();
}
 
Developer: code4craft, Project: webmagic, Lines of code: 24, Source file: Spider.java


Example 4: convertHttpClientContext

import us.codecraft.webmagic.utils.UrlUtils; // import the required package/class
private HttpClientContext convertHttpClientContext(Request request, Site site, Proxy proxy) {
    HttpClientContext httpContext = new HttpClientContext();
    if (proxy != null && proxy.getUsername() != null) {
        AuthState authState = new AuthState();
        authState.update(new BasicScheme(ChallengeState.PROXY), new UsernamePasswordCredentials(proxy.getUsername(), proxy.getPassword()));
        httpContext.setAttribute(HttpClientContext.PROXY_AUTH_STATE, authState);
    }
    if (request.getCookies() != null && !request.getCookies().isEmpty()) {
        CookieStore cookieStore = new BasicCookieStore();
        for (Map.Entry<String, String> cookieEntry : request.getCookies().entrySet()) {
            BasicClientCookie cookie1 = new BasicClientCookie(cookieEntry.getKey(), cookieEntry.getValue());
            cookie1.setDomain(UrlUtils.removePort(UrlUtils.getDomain(request.getUrl())));
            cookieStore.addCookie(cookie1);
        }
        httpContext.setCookieStore(cookieStore);
    }
    return httpContext;
}
 
Developer: code4craft, Project: webmagic, Lines of code: 19, Source file: HttpUriRequestConverter.java
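The convertHttpClientContext example above chains UrlUtils.getDomain and UrlUtils.removePort to get a cookie domain without a port. The sketch below shows the same two steps with java.net.URI and a regular expression. The names DomainSketch, authorityOf and stripPort are hypothetical, and the behavior is an assumption about what such helpers typically do, not a description of webmagic's code.

```java
import java.net.URI;

// Hypothetical helpers, not part of webmagic: derive "host[:port]" from an
// absolute URL, then drop the port so the result can serve as a cookie domain.
final class DomainSketch {
    /** Returns the authority ("host" or "host:port") of an absolute URL. */
    static String authorityOf(String url) {
        return URI.create(url).getAuthority();
    }

    /** Removes a trailing ":<digits>" port; other input is returned unchanged. */
    static String stripPort(String hostAndPort) {
        return hostAndPort.replaceFirst(":\\d+$", "");
    }

    public static void main(String[] args) {
        String authority = authorityOf("http://example.com:8080/a/b");
        System.out.println(authority);            // example.com:8080
        System.out.println(stripPort(authority)); // example.com
    }
}
```

Only a trailing run of digits after a colon is stripped, so a host that carries no port passes through untouched.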


Example 5: getAll

import us.codecraft.webmagic.utils.UrlUtils; // import the required package/class
/**
 * Download urls synchronizing.
 *
 * @param urls urls
 * @return list downloaded
 */
public <T> List<T> getAll(Collection<String> urls) {
    destroyWhenExit = false;
    spawnUrl = false;
    startRequests.clear();
    for (Request request : UrlUtils.convertToRequests(urls)) {
        addRequest(request);
    }
    CollectorPipeline collectorPipeline = getCollectorPipeline();
    pipelines.add(collectorPipeline);
    run();
    spawnUrl = true;
    destroyWhenExit = true;
    return collectorPipeline.getCollected();
}
 
Developer: hexiaohong-code, Project: LoginCrawler, Lines of code: 21, Source file: SpiderLogin.java


Example 6: addTargetRequests

import us.codecraft.webmagic.utils.UrlUtils; // import the required package/class
/**
 * Add links to be crawled.
 *
 * @param requests links to be crawled
 */
public void addTargetRequests(List<String> requests) {
    synchronized (targetRequests) {
        for (String s : requests) {
            if (StringUtils.isBlank(s) || s.equals("#") || s.startsWith("javascript:")) {
                // Skip only this link; a break here would drop every remaining link.
                continue;
            }
            s = UrlUtils.canonicalizeUrl(s, url.toString());
            targetRequests.add(new Request(s));
        }
    }
}
 
Developer: yuany, Project: en-webmagic, Lines of code: 17, Source file: Page.java


Example 7: addTargetRequest

import us.codecraft.webmagic.utils.UrlUtils; // import the required package/class
/**
 * Add a link to be crawled.
 *
 * @param requestString link to be crawled
 */
public void addTargetRequest(String requestString) {
    if (StringUtils.isBlank(requestString) || requestString.equals("#")) {
        return;
    }
    synchronized (targetRequests) {
        requestString = UrlUtils.canonicalizeUrl(requestString, url.toString());
        targetRequests.add(new Request(requestString));
    }
}
 
Developer: yuany, Project: en-webmagic, Lines of code: 15, Source file: Page.java


Example 8: getDomain

import us.codecraft.webmagic.utils.UrlUtils; // import the required package/class
/**
 * Get the configured domain.
 *
 * @return the configured domain
 */
public String getDomain() {
    if (domain == null) {
        if (startUrls.size() > 0) {
            domain = UrlUtils.getDomain(startUrls.get(0));
        }
    }
    return domain;
}
 
Developer: yuany, Project: en-webmagic, Lines of code: 14, Source file: Site.java


Example 9: SimplePageProcessor

import us.codecraft.webmagic.utils.UrlUtils; // import the required package/class
public SimplePageProcessor(String startUrl, String urlPattern) {
    this.site = Site.me().addStartUrl(startUrl).
            setDomain(UrlUtils.getDomain(startUrl)).setUserAgent(UA);
    //compile "*" expression to regex
    this.urlPattern = "("+urlPattern.replace(".","\\.").replace("*","[^\"'#]*")+")";

}
 
Developer: yuany, Project: en-webmagic, Lines of code: 8, Source file: SimplePageProcessor.java
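The SimplePageProcessor constructor above compiles a "*" wildcard expression into a regular expression. To make the effect of that one-line conversion concrete, the sketch below isolates it in a hypothetical helper (WildcardPatternSketch.toRegex, a name introduced here for illustration) and checks a few URLs against the result. The replace chain is copied from the example; everything else is an assumption.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the conversion used in the example above:
// "." becomes a literal dot and "*" matches any run of characters other
// than quotes and '#'.
final class WildcardPatternSketch {
    static String toRegex(String urlPattern) {
        return "(" + urlPattern.replace(".", "\\.").replace("*", "[^\"'#]*") + ")";
    }

    public static void main(String[] args) {
        String regex = toRegex("http://example.com/post/*");
        System.out.println(regex);
        System.out.println(Pattern.matches(regex, "http://example.com/post/42"));     // true
        System.out.println(Pattern.matches(regex, "http://example.com/post/42#top")); // false
    }
}
```

Excluding quotes and '#' from the wildcard plausibly keeps a match from running past the end of an attribute value or into a fragment when the pattern is applied to raw HTML.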


Example 10: addTargetRequests

import us.codecraft.webmagic.utils.UrlUtils; // import the required package/class
/**
 * add urls to fetch
 *
 * @param requests requests
 */
public void addTargetRequests(List<String> requests) {
    for (String s : requests) {
        if (StringUtils.isBlank(s) || s.equals("#") || s.startsWith("javascript:")) {
            continue;
        }
        s = UrlUtils.canonicalizeUrl(s, url.toString());
        targetRequests.add(new Request(s));
    }
}
 
Developer: code4craft, Project: webmagic, Lines of code: 15, Source file: Page.java
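Several examples above pass a link and the current page URL to UrlUtils.canonicalizeUrl before queuing the link. The general idea, resolving a possibly relative link against the page it was found on, can be sketched with java.net.URI alone. ResolveLinkSketch and resolve are hypothetical names, and webmagic's own method may handle more edge cases than this does.

```java
import java.net.URI;

// Hypothetical helper, not part of webmagic: turns a link found on a page
// into an absolute URL by resolving it against the page's own URL.
final class ResolveLinkSketch {
    static String resolve(String link, String pageUrl) {
        // URI.create throws IllegalArgumentException for malformed input,
        // for example a link that contains a raw space.
        return URI.create(pageUrl).resolve(link).toString();
    }

    public static void main(String[] args) {
        String page = "http://example.com/a/b/index.html";
        System.out.println(resolve("../c.html", page));          // http://example.com/a/c.html
        System.out.println(resolve("/x", page));                 // http://example.com/x
        System.out.println(resolve("http://other.org/y", page)); // http://other.org/y
    }
}
```

Links that are already absolute come back unchanged, so a caller can run every extracted link through the same function.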


Example 11: addTargetRequest

import us.codecraft.webmagic.utils.UrlUtils; // import the required package/class
/**
 * add url to fetch
 *
 * @param requestString requestString
 */
public void addTargetRequest(String requestString) {
    if (StringUtils.isBlank(requestString) || requestString.equals("#")) {
        return;
    }
    requestString = UrlUtils.canonicalizeUrl(requestString, url.toString());
    targetRequests.add(new Request(requestString));
}
 
Developer: code4craft, Project: webmagic, Lines of code: 13, Source file: Page.java


Example 12: convertHttpUriRequest

import us.codecraft.webmagic.utils.UrlUtils; // import the required package/class
private HttpUriRequest convertHttpUriRequest(Request request, Site site, Proxy proxy) {
    RequestBuilder requestBuilder = selectRequestMethod(request).setUri(UrlUtils.fixIllegalCharacterInUrl(request.getUrl()));
    // Guard against a null site here as well, matching the null check further down.
    if (site != null && site.getHeaders() != null) {
        for (Map.Entry<String, String> headerEntry : site.getHeaders().entrySet()) {
            requestBuilder.addHeader(headerEntry.getKey(), headerEntry.getValue());
        }
    }

    RequestConfig.Builder requestConfigBuilder = RequestConfig.custom();
    if (site != null) {
        requestConfigBuilder.setConnectionRequestTimeout(site.getTimeOut())
                .setSocketTimeout(site.getTimeOut())
                .setConnectTimeout(site.getTimeOut())
                .setCookieSpec(CookieSpecs.STANDARD);
    }

    if (proxy != null) {
        requestConfigBuilder.setProxy(new HttpHost(proxy.getHost(), proxy.getPort()));
    }
    requestBuilder.setConfig(requestConfigBuilder.build());
    HttpUriRequest httpUriRequest = requestBuilder.build();
    if (request.getHeaders() != null && !request.getHeaders().isEmpty()) {
        for (Map.Entry<String, String> header : request.getHeaders().entrySet()) {
            httpUriRequest.addHeader(header.getKey(), header.getValue());
        }
    }
    return httpUriRequest;
}
 
Developer: code4craft, Project: webmagic, Lines of code: 29, Source file: HttpUriRequestConverter.java


Example 13: addRequest

import us.codecraft.webmagic.utils.UrlUtils; // import the required package/class
private void addRequest(Request request) {
    if (site.getDomain() == null && request != null && request.getUrl() != null) {
        site.setDomain(UrlUtils.getDomain(request.getUrl()));
    }
    scheduler.push(request, this);
}
 
Developer: hexiaohong-code, Project: LoginCrawler, Lines of code: 7, Source file: SpiderLogin.java


Example 14: test_illegal_uri_correct

import us.codecraft.webmagic.utils.UrlUtils; // import the required package/class
@Test
public void test_illegal_uri_correct() throws Exception {
    HttpUriRequestConverter httpUriRequestConverter = new HttpUriRequestConverter();
    HttpClientRequestContext requestContext = httpUriRequestConverter.convert(new Request(UrlUtils.fixIllegalCharacterInUrl("http://bj.zhongkao.com/beikao/yimo/##")), Site.me(), null);
    assertThat(requestContext.getHttpUriRequest().getURI()).isEqualTo(new URI("http://bj.zhongkao.com/beikao/yimo/#"));
}
 
Developer: code4craft, Project: webmagic, Lines of code: 7, Source file: HttpUriRequestConverterTest.java
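The test above expects a URL ending in "##" to come out ending in a single "#" after UrlUtils.fixIllegalCharacterInUrl. One way to get that result is sketched below, assuming the fix amounts to collapsing repeated '#' characters and percent-encoding spaces. That rule is inferred from the test and is not confirmed against webmagic's source. FixUrlSketch and fix are hypothetical names.

```java
// Hypothetical helper, not part of webmagic: a guess at the kind of cleanup
// the test above exercises. Repeated '#' characters are collapsed into one
// and raw spaces are percent-encoded.
final class FixUrlSketch {
    static String fix(String url) {
        return url.replace(" ", "%20").replaceAll("#+", "#");
    }

    public static void main(String[] args) {
        System.out.println(fix("http://bj.zhongkao.com/beikao/yimo/##")); // http://bj.zhongkao.com/beikao/yimo/#
        System.out.println(fix("http://example.com/a b"));                // http://example.com/a%20b
    }
}
```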


Example 15: registerMBean

import us.codecraft.webmagic.utils.UrlUtils; // import the required package/class
protected void registerMBean(SpiderStatusMXBean spiderStatus) throws MalformedObjectNameException, InstanceAlreadyExistsException, MBeanRegistrationException, NotCompliantMBeanException {
    // Strip the port from the spider name before building the JMX ObjectName
    // (presumably because ':' is not allowed in an unquoted ObjectName value).
    ObjectName objName = new ObjectName(jmxServerName + ":name=" + UrlUtils.removePort(spiderStatus.getName()));
    mbeanServer.registerMBean(spiderStatus, objName);
}
 
Developer: code4craft, Project: webmagic, Lines of code: 6, Source file: SpiderMonitor.java



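Note: The us.codecraft.webmagic.utils.UrlUtils examples in this article were collected from GitHub, MSDocs and other source-code and documentation hosting platforms. The snippets were selected from open-source projects contributed by their respective developers, and copyright in the source code belongs to the original authors. For distribution and use, refer to the license of the corresponding project. Do not reproduce without permission.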

