This article collects typical usage examples of the crawlercommons.sitemaps.AbstractSiteMap class in Java. If you are wondering what AbstractSiteMap is for, how to use it, or where to find examples of it in practice, the curated class code examples below may help.
The AbstractSiteMap class belongs to the crawlercommons.sitemaps package. Five code examples of the class are shown below, sorted by popularity by default. You can upvote the examples you like or find useful; your feedback helps the system recommend better Java code examples.
Example 1: characters
import crawlercommons.sitemaps.AbstractSiteMap; // import the required package/class
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
String localName = super.currentElement();
String value = String.valueOf(ch, start, length);
if ("pubDate".equals(localName)) {
lastMod = AbstractSiteMap.normalizeRSSTimestamp(value);
if ("channel".equals(super.currentElementParent())) {
sitemap.setLastModified(lastMod);
}
} else if ("link".equals(localName)) {
String href = value;
LOG.debug("href = {}", href);
try {
loc = new URL(href);
valid = urlIsValid(sitemap.getBaseUrl(), href);
} catch (MalformedURLException e) {
LOG.trace("Can't create an entry with a bad URL", e);
LOG.debug("Bad url: [{}]", href);
}
}
}
Developer ID: crawler-commons, Project: crawler-commons, Lines of code: 22, Source file: RSSHandler.java
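To put the handler above in context, the following is a minimal, self-contained sketch of how an RSS document can be run through crawler-commons' SiteMapParser, which routes RSS/Atom input to a handler such as RSSHandler. The feed content and URLs are made up purely for illustration.

import java.net.URL;
import java.nio.charset.StandardCharsets;

import crawlercommons.sitemaps.AbstractSiteMap;
import crawlercommons.sitemaps.SiteMap;
import crawlercommons.sitemaps.SiteMapParser;

public class RssSitemapExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical feed content and URL, only for illustration.
        String rss = "<?xml version=\"1.0\"?>"
                + "<rss version=\"2.0\"><channel>"
                + "<pubDate>Mon, 01 Jan 2018 00:00:00 GMT</pubDate>"
                + "<item><link>http://www.example.com/page.html</link></item>"
                + "</channel></rss>";
        URL feedUrl = new URL("http://www.example.com/feed.xml");

        // The parser inspects the XML root element and delegates RSS content
        // to the RSS handler, returning a populated AbstractSiteMap.
        SiteMapParser parser = new SiteMapParser();
        AbstractSiteMap asm = parser.parseSiteMap("text/xml",
                rss.getBytes(StandardCharsets.UTF_8), feedUrl);

        if (asm instanceof SiteMap) {
            SiteMap sm = (SiteMap) asm;
            System.out.println("lastModified: " + sm.getLastModified());
            sm.getSiteMapUrls().forEach(u -> System.out.println(u.getUrl()));
        }
    }
}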
Example 2: getSitemapsForUrl
import crawlercommons.sitemaps.AbstractSiteMap; // import the required package/class
public List<AbstractSiteMap> getSitemapsForUrl(String sitemapUrl) {
List<AbstractSiteMap> sitemaps = new ArrayList<>();
SiteMapParser siteMapParser = new SiteMapParser();
try {
Uri uri = Uri.create(sitemapUrl);
Blob blob = Requesters.of(uri.getScheme()).get().get(uri);
String contentType = blob.getMetadata().getContentMetadata().contentType() != null ? blob.getMetadata().getContentMetadata().contentType()
: "text/xml";
AbstractSiteMap sitemap = siteMapParser.parseSiteMap(contentType, IOUtils.toByteArray(blob.getPayload().openStream()), new URL(sitemapUrl));
if (sitemap.isIndex()) {
sitemaps.addAll(((SiteMapIndex) sitemap).getSitemaps());
} else {
sitemaps.add(sitemap);
}
} catch (Exception e) {
log.debug("", e);
}
return sitemaps;
}
Developer ID: Treydone, Project: mandrel, Lines of code: 23, Source file: AnalysisService.java
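Note that getSitemapsForUrl expands only one level of a sitemap index: the entries returned by SiteMapIndex.getSitemaps() are unparsed AbstractSiteMap stubs whose URLs still have to be fetched and parsed individually. A rough recursive variant might look like the sketch below; it uses a plain java.net.URL fetch instead of the project's Requesters helper and, for brevity, omits the cycle and depth protection a production crawler would need.

import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.io.IOUtils;

import crawlercommons.sitemaps.AbstractSiteMap;
import crawlercommons.sitemaps.SiteMapIndex;
import crawlercommons.sitemaps.SiteMapParser;

public class SitemapIndexExpander {

    private final SiteMapParser parser = new SiteMapParser();

    // Recursively expands sitemap indexes so that only concrete sitemaps
    // are returned.
    public List<AbstractSiteMap> expand(URL sitemapUrl) {
        List<AbstractSiteMap> result = new ArrayList<>();
        try {
            byte[] content = IOUtils.toByteArray(sitemapUrl.openStream());
            AbstractSiteMap sitemap = parser.parseSiteMap(content, sitemapUrl);
            if (sitemap.isIndex()) {
                for (AbstractSiteMap child : ((SiteMapIndex) sitemap).getSitemaps()) {
                    // Index entries only carry a URL; each child still has
                    // to be fetched and parsed on its own.
                    result.addAll(expand(child.getUrl()));
                }
            } else {
                result.add(sitemap);
            }
        } catch (Exception e) {
            // Swallow and return whatever was collected, mirroring the
            // defensive style of getSitemapsForUrl above.
        }
        return result;
    }
}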
Example 3: getSiteMap
import crawlercommons.sitemaps.AbstractSiteMap; // import the required package/class
public AbstractSiteMap getSiteMap() {
return sitemap;
}
Developer ID: crawler-commons, Project: crawler-commons, Lines of code: 4, Source file: RSSHandler.java
Example 4: getSiteMap
import crawlercommons.sitemaps.AbstractSiteMap; // import the required package/class
public AbstractSiteMap getSiteMap() {
if (delegate == null)
return null;
return delegate.getSiteMap();
}
Developer ID: crawler-commons, Project: crawler-commons, Lines of code: 6, Source file: DelegatorHandler.java
Example 5: buildReport
import crawlercommons.sitemaps.AbstractSiteMap; // import the required package/class
protected Analysis buildReport(Job job, Blob blob) {
Analysis report;
if (blob.getMetadata().getUri().getScheme().startsWith("http")) {
HttpAnalysis temp = new HttpAnalysis();
// Robots.txt
Uri pageURL = blob.getMetadata().getUri();
String robotsTxtUrl = pageURL.getScheme() + "://" + pageURL.getHost() + ":" + pageURL.getPort() + "/robots.txt";
ExtendedRobotRules robotRules = RobotsTxtUtils.getRobotRules(robotsTxtUrl);
temp.robotRules(robotRules);
// Sitemaps
if (robotRules != null && robotRules.getSitemaps() != null) {
Map<String, List<AbstractSiteMap>> sitemaps = new HashMap<>();
robotRules.getSitemaps().forEach(url -> {
List<AbstractSiteMap> results = getSitemapsForUrl(url);
sitemaps.put(url, results);
});
temp.sitemaps(sitemaps);
}
report = temp;
} else {
report = new Analysis();
}
if (job.getDefinition().getExtractors() != null) {
Map<String, Instance<?>> cachedSelectors = new HashMap<>();
// Page extraction
if (job.getDefinition().getExtractors().getData() != null) {
Map<String, List<Document>> documentsByExtractor = job.getDefinition().getExtractors().getData().stream()
.map(ex -> Pair.of(ex.getName(), extractorService.extractThenFormat(cachedSelectors, blob, ex)))
.filter(pair -> pair != null && pair.getKey() != null && pair.getValue() != null)
.collect(Collectors.toMap(key -> key.getLeft(), value -> value.getRight()));
report.documents(documentsByExtractor);
}
// Link extraction
if (job.getDefinition().getExtractors().getOutlinks() != null) {
Map<String, Pair<Set<Link>, Set<Link>>> outlinksByExtractor = job.getDefinition().getExtractors().getOutlinks().stream().map(ol -> {
return Pair.of(ol.getName(), extractorService.extractAndFilterOutlinks(job, blob.getMetadata().getUri(), cachedSelectors, blob, ol));
}).collect(Collectors.toMap(key -> key.getLeft(), value -> value.getRight()));
report.outlinks(Maps.transformEntries(outlinksByExtractor, (key, entries) -> entries.getLeft()));
report.filteredOutlinks(Maps.transformEntries(outlinksByExtractor, (key, entries) -> entries.getRight()));
}
}
report.metadata(blob.getMetadata());
return report;
}
Developer ID: Treydone, Project: mandrel, Lines of code: 53, Source file: AnalysisService.java
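One detail worth flagging in buildReport is the robots.txt URL concatenated from scheme, host and port: if the project's Uri type behaves like java.net.URI and reports -1 when no explicit port is present, the string would end up as host:-1/robots.txt. Whether mandrel's Uri normalizes the port is not visible from this snippet, so the following is only a defensive sketch, using the standard java.net.URI, of how the port could be handled.

import java.net.URI;

public final class RobotsTxtUrls {

    private RobotsTxtUrls() {
    }

    // Builds the robots.txt URL for a page, keeping an explicit port only
    // when the page URI actually carries one.
    public static String robotsTxtUrlFor(URI pageUri) {
        StringBuilder sb = new StringBuilder();
        sb.append(pageUri.getScheme()).append("://").append(pageUri.getHost());
        if (pageUri.getPort() != -1) {
            // -1 means "no explicit port"; omit it so the scheme default applies.
            sb.append(':').append(pageUri.getPort());
        }
        sb.append("/robots.txt");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(robotsTxtUrlFor(URI.create("http://www.example.com/a/b.html")));
        // http://www.example.com/robots.txt
        System.out.println(robotsTxtUrlFor(URI.create("http://www.example.com:8080/a/b.html")));
        // http://www.example.com:8080/robots.txt
    }
}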
Note: The crawlercommons.sitemaps.AbstractSiteMap class examples in this article were compiled from source code and documentation platforms such as GitHub and MSDocs. The code snippets were selected from open-source projects contributed by various developers; copyright remains with the original authors. Please refer to the corresponding project's license before redistributing or reusing the code, and do not republish without permission.