
Java SimpleNodeIterator Class Code Examples


This article collects typical usage examples of org.htmlparser.util.SimpleNodeIterator in Java. If you are wondering how the SimpleNodeIterator class is used in practice, the selected code examples below may help.

The SimpleNodeIterator class belongs to the org.htmlparser.util package. The 17 code examples below are sorted by popularity by default.

Example 1: processNodeList

import org.htmlparser.util.SimpleNodeIterator; // import the required package/class
private static void processNodeList(NodeList list, String keyword) {
	// Start iterating over the list
	SimpleNodeIterator iterator = list.elements();
	while (iterator.hasMoreNodes()) {
		Node node = iterator.nextNode();
		// Get this node's list of child nodes
		NodeList childList = node.getChildren();
		if (null == childList) {
			// No children: treat the node as a leaf (value) node and read its text
			String result = node.toPlainTextString();
			// Print the text if it contains the keyword
			if (result.contains(keyword))
				System.out.println(result);
		} else {
			// The node has children: recurse into them
			processNodeList(childList, keyword);
		}
	} // end while
}
 
Developer ID: YufangWoo, Project: news-crawler, Lines of code: 22, Source: HtmlParserTest.java
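
The recursive traversal in Example 1 can be tried without the htmlparser jar. The sketch below is only an illustration under stated assumptions: SketchNode, SketchIterator and KeywordSearchSketch are hypothetical stand-in types, not classes from org.htmlparser. They copy only the hasMoreNodes()/nextNode() shape and the "null children means leaf" convention seen above, and the method returns its matches instead of printing them so the result is easy to check.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a parsed node: NOT an org.htmlparser class (hypothetical).
class SketchNode {
    final String text;               // plain text of a leaf node
    final List<SketchNode> children; // null for a leaf, as in Example 1

    SketchNode(String text, List<SketchNode> children) {
        this.text = text;
        this.children = children;
    }
}

// Stand-in iterator that mimics the hasMoreNodes()/nextNode() pair (hypothetical).
class SketchIterator {
    private final List<SketchNode> nodes;
    private int cursor = 0;

    SketchIterator(List<SketchNode> nodes) {
        this.nodes = nodes;
    }

    boolean hasMoreNodes() {
        return cursor < nodes.size();
    }

    SketchNode nextNode() {
        return nodes.get(cursor++);
    }
}

class KeywordSearchSketch {
    // Returns the text of every leaf whose text contains the keyword, in document order.
    static List<String> find(List<SketchNode> list, String keyword) {
        List<String> hits = new ArrayList<>();
        collect(list, keyword, hits);
        return hits;
    }

    private static void collect(List<SketchNode> list, String keyword, List<String> hits) {
        SketchIterator it = new SketchIterator(list);
        while (it.hasMoreNodes()) {
            SketchNode node = it.nextNode();
            if (node.children == null) {
                // Leaf node: test its text against the keyword
                if (node.text.contains(keyword)) {
                    hits.add(node.text);
                }
            } else {
                // Inner node: descend into its children
                collect(node.children, keyword, hits);
            }
        }
    }

    public static void main(String[] args) {
        List<SketchNode> inner = new ArrayList<>();
        inner.add(new SketchNode("breaking news", null));
        inner.add(new SketchNode("weather", null));

        List<SketchNode> root = new ArrayList<>();
        root.add(new SketchNode(null, inner));
        root.add(new SketchNode("old news", null));

        System.out.println(find(root, "news")); // prints [breaking news, old news]
    }
}
```

Collecting into a list rather than calling System.out.println inside the loop is a design choice of this sketch, not something the original project does.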


Example 2: getGangliaAttribute

import org.htmlparser.util.SimpleNodeIterator; // import the required package/class
public List<String> getGangliaAttribute(String clusterName)
		throws ParserException, MalformedURLException, IOException {
	String url = gangliaMetricUrl.replaceAll(clusterPattern, clusterName);
	Parser parser = new Parser(new URL(url).openConnection());
	NodeFilter nodeFilter = new AndFilter(new TagNameFilter("select"),
			new HasAttributeFilter("id", "metrics-picker"));
	NodeList nodeList = parser.extractAllNodesThatMatch(nodeFilter);
	SimpleNodeIterator iterator = nodeList.elements();
	List<String> metricList = new ArrayList<String>();
	while (iterator.hasMoreNodes()) {
		Node node = iterator.nextNode();

		SimpleNodeIterator childIterator = node.getChildren().elements();
		while (childIterator.hasMoreNodes()) {
			OptionTag children = (OptionTag) childIterator.nextNode();
			metricList.add(children.getOptionText());
		}
	}

	return metricList;

}
 
Developer ID: Ctrip-DI, Project: Hue-Ctrip-DI, Lines of code: 23, Source: GangliaHttpParser.java


Example 3: main

import org.htmlparser.util.SimpleNodeIterator; // import the required package/class
public static void main(String[] args) throws Exception {
	Parser parser = new Parser(new URL("http://10.8.75.3/ganglia/?r=hour&cs=&ce=&s=by+name&c=Zookeeper_Cluster&tab=m&vn=&hide-hf=false").openConnection());
	NodeFilter nodeFilter = new AndFilter(new TagNameFilter("select"),
			new HasAttributeFilter("id", "metrics-picker"));
	NodeList nodeList = parser.extractAllNodesThatMatch(nodeFilter);
	SimpleNodeIterator iterator = nodeList.elements();
	while (iterator.hasMoreNodes()) {
		Node node = iterator.nextNode();

		SimpleNodeIterator childIterator = node.getChildren().elements();
		while (childIterator.hasMoreNodes()) {
			OptionTag children = (OptionTag) childIterator.nextNode();
			System.out.println(children.getOptionText());
		}
	}

}
 
Developer ID: Ctrip-DI, Project: Hue-Ctrip-DI, Lines of code: 18, Source: TestGangliaHttpParser.java


Example 4: run

import org.htmlparser.util.SimpleNodeIterator; // import the required package/class
@Override
public void run() {
	try {
		parser = new Parser(content);
		logger.info(currentThread().getName() + " started parsing the HTML of the POST response and storing it in HBase");
		NodeIterator rootList = parser.elements();
		rootList.nextNode();
		NodeList nodeList = rootList.nextNode().getChildren();
		// System.out.println("===================="+nodeList.size());
		/*
		 * Check whether the HTML response has any real content. This takes effect
		 * on errors or once all data has been read: endFlag is set so that no new
		 * threads are started and the current task ends.
		 */
		if (nodeList.size() <= 4) {
			program.endFlag = true;
		}
		/*
		 * Locate the tag records and parse them
		 */
		nodeList.remove(0);
		nodeList.remove(0);
		SimpleNodeIterator childList = nodeList.elements();
		while (childList.hasMoreNodes()) {
			Node node = childList.nextNode();
			if (node.getChildren() != null) {
				toObject(node);
			}
		}
	} catch (Exception e) {
		logger.error(currentThread().getName() + " failed to parse the HTML file\n" + e.getMessage() + "\n");
	} finally {
		logger.info(currentThread().getName() + " finished parsing the HTML file");
		store.close();
	}
}
 
Developer ID: husky00, Project: worm, Lines of code: 36, Source: PostRequestHtmlParser.java


Example 5: listarCidades

import org.htmlparser.util.SimpleNodeIterator; // import the required package/class
@ApiMethod(name = "listarCidades")
public ListaEstadosCidades listarCidades(@Named("state") String state) throws Exception{
    inicializaMapaEstados();
    if(mapaCidades== null){
        mapaCidades = new HashMap<String,Map<String,String>>();
    }
    if(!mapaCidades.containsKey(state)) {
        Map<String,String> mapa = new HashMap<String, String>();
        mapaCidades.put(state,mapa);
        String responseBody = recuperarDados(mapaEstados.get(state), null);
        NodeList nodeList = filterSelectNode(responseBody);
        Node cidadeNode = nodeList.elementAt(2);
        SimpleNodeIterator iteratorEstado = cidadeNode.getChildren().elements();
        while (iteratorEstado.hasMoreNodes()) {
            OptionTag node = (OptionTag) iteratorEstado.nextNode();
            String cidadeId = node.getValue();
            String cidadeNome = node.getChildren().elements().nextNode().getText();
            if(!cidadeNome.contains("Selecione")) {
                //System.out.println(cidadeId+","+cidadeNome+","+mapaEstados.get(state));
                mapa.put(cidadeNome, cidadeId);
            }
        }
    }
    ListaEstadosCidades listaEstados = new ListaEstadosCidades();
    listaEstados.setLista(new ArrayList<String>(mapaCidades.get(state).keySet()));
    return listaEstados;

}
 
Developer ID: emivaljr, Project: hojenaoapp, Lines of code: 29, Source: MyEndpoint.java


Example 6: preencheMapaEstados

import org.htmlparser.util.SimpleNodeIterator; // import the required package/class
private void preencheMapaEstados() throws IOException, ParserException {
    String responseBody = recuperarDados(null, null);
    NodeList nodeList = filterSelectNode(responseBody);
    Node estadoNode = nodeList.elementAt(1);
    SimpleNodeIterator iteratorEstado = estadoNode.getChildren().elements();
    while (iteratorEstado.hasMoreNodes()) {
        OptionTag node = (OptionTag) iteratorEstado.nextNode();
        String estadoId = node.getValue();
        String estadoNome = node.getChildren().elements().nextNode().getText();
        //System.out.println(estadoId+","+estadoNome);
        mapaEstados.put(estadoNome,estadoId);
    }

}
 
Developer ID: emivaljr, Project: hojenaoapp, Lines of code: 15, Source: MyEndpoint.java


Example 7: main

import org.htmlparser.util.SimpleNodeIterator; // import the required package/class
public static void main(String[] args) {
	try {
		URL url = new URL(pro.getProperty("mlink"));
		SocketAddress address = new InetSocketAddress(pro.getProperty("host"), Integer.parseInt(pro.getProperty("port")));
		Proxy proxy = new Proxy(Proxy.Type.HTTP, address);
		URLConnection conn = url.openConnection(proxy);
		Authenticator.setDefault(new MyAuthenticator(pro.getProperty("username"), pro.getProperty("password")));
		
		conn.setConnectTimeout(Integer.parseInt(pro.getProperty("timeout")));
		Parser parser = new Parser(conn);
		
		NodeList nodeList = parser.parse(new TagNameFilter("A")); 
		System.out.println(nodeList.size());
		
		for (SimpleNodeIterator it = nodeList.elements(); it.hasMoreNodes(); ) {
			TagNode node = (TagNode) it.nextNode();
			String href = node.getAttribute("href");
			if (href == null) {
				continue; // skip anchors without an href attribute
			}
			String dhref = URLDecoder.decode(href, "UTF-8");
			if (CommonHelper.checkIsAlink(dhref)) {
				System.out.println(dhref);	
			}
			
		}
		
	} catch (Exception e) {
		e.printStackTrace();
	}
}
 
Developer ID: toulezu, Project: play, Lines of code: 29, Source: TestParser.java


Example 8: processResponse

import org.htmlparser.util.SimpleNodeIterator; // import the required package/class
private boolean processResponse(HttpResponse resp, Document doc, Element root) {
	if(resp.getStatusLine().getStatusCode() == HttpStatus.SC_OK) {
		System.out.println("[INFO] HTTP Status OK.");
		System.out.println("[INFO] Extracting html page...");
		String html = extractHtml(resp);
		if(html == null) return false;
		System.out.println("[INFO] " + html.length() + "B html page extracted.");
		if(html.length() < 500) {
			System.out.println("[INFO] EOF reached, task completed.");
			return false;
		} else {
			System.out.println("[INFO] Parsing html page...");
			try {
				Parser parser = new Parser(html);
				NodeList weibo_list = parser.extractAllNodesThatMatch(
						new HasAttributeFilter("action-type", "feed_list_item"));
				System.out.println("[INFO] " + weibo_list.size() + " entries detected.");
				SimpleNodeIterator iter = weibo_list.elements();
				while(iter.hasMoreNodes()) {
					System.out.println("[INFO] processing entry #" + (++total) + "...");
					Element elem = extractContent(iter.nextNode(), doc);
					if(elem == null) {
						System.out.println("[ERROR] Data extraction failed.");
						return false;
					}
					root.appendChild(elem);
				}
				if(weibo_list.size() != 15) return false;
			} catch (ParserException e) {
				System.out.println("[ERROR] Parser failed.");
				e.printStackTrace();
				return false;
			}
		}
	} else {
		return false;
	}
	return true;
}
 
Developer ID: w1ndy, Project: weibo-fetcher, Lines of code: 40, Source: Spider.java


Example 9: preencheMapaFeriadosEstaduais

import org.htmlparser.util.SimpleNodeIterator; // import the required package/class
private void preencheMapaFeriadosEstaduais() throws IOException, ParserException,ParseException {
    String estadosPage =  recuperarDadosEstado();
    StringBuilder stringBuilder = new StringBuilder(estadosPage);
    stringBuilder.delete(0,estadosPage.indexOf("<h3"));
    NodeList nodeEstadoList = filterTable(stringBuilder.toString());
    String todosMeses[] = {"janeiro", "fevereiro", "março", "abril", "maio", "junho", "julho", "agosto", "setembro", "outubro", "novembro", "dezembro"};
    Map<String,String> mapaMeses = new HashMap<String,String>();
    int i = 1;
    for (String mes:todosMeses){
        String valor = String.valueOf(i++);
        if(valor.length()< 2){
            valor ="0"+valor;
        }
        mapaMeses.put(mes,valor);
    }

    String estado = null;
    for (Node node:nodeEstadoList.toNodeArray()){
        if(node instanceof TableTag){
            NodeList lista = ((TableTag) node).searchFor(TableColumn.class, true);
            SimpleNodeIterator iterator  = lista.elements();
            while (iterator.hasMoreNodes()){
                Feriado feriado = new Feriado();
                Node data = iterator.nextNode();
                String[] dataExtenso = data.toPlainTextString().split(" de ");
                feriado.setData(dataExtenso[0] + "/" + mapaMeses.get(dataExtenso[1]) + "/2015");
                Node nome = iterator.nextNode();
                feriado.setNome(nome.toPlainTextString());
                Node lei = iterator.nextNode(); // third column; read only to advance the iterator to the next row
                if(dataExtenso[0].length()==1){
                    dataExtenso[0] = "0"+dataExtenso[0];
                }
                System.out.println(dataExtenso[0] + "/" + mapaMeses.get(dataExtenso[1]) + "/2015,"+nome.toPlainTextString()+","+mapaEstados.get(estado));
                mapaFeriadosEstado.get(estado).add(feriado);
            }

        }
        if(node instanceof HeadingTag){
            estado =  node.getChildren().toHtml().trim();
            if(node.getChildren().elementAt(0).getChildren() != null){
                estado =  node.getChildren().elementAt(0).getChildren().toHtml().trim();
            }
            mapaFeriadosEstado.put(estado,new ArrayList<Feriado>());
        }

    }
}
 
Developer ID: emivaljr, Project: hojenaoapp, Lines of code: 48, Source: MyEndpoint.java
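
The date handling in Example 9 (a month-name map plus manual zero-padding) can be isolated from the HTML parsing. The following is a minimal sketch using only the JDK; the class name FeriadoDateSketch and the method toNumericDate are hypothetical, and the year is passed in as a parameter here rather than hard-coded as "2015". It assumes input of the form "7 de setembro", which is what the split(" de ") call in the example implies.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class FeriadoDateSketch {
    // Portuguese month names; "mar\u00e7o" is "março" written with a Unicode escape
    // so the file compiles regardless of the source encoding.
    private static final String[] MESES = {
        "janeiro", "fevereiro", "mar\u00e7o", "abril", "maio", "junho",
        "julho", "agosto", "setembro", "outubro", "novembro", "dezembro"
    };

    // Builds month name -> two-digit month number, e.g. "setembro" -> "09".
    static Map<String, String> buildMonthMap() {
        Map<String, String> map = new HashMap<>();
        for (int i = 0; i < MESES.length; i++) {
            map.put(MESES[i], String.format(Locale.ROOT, "%02d", i + 1));
        }
        return map;
    }

    // Converts "7 de setembro" to "07/09/<year>"; returns null when the input does not fit.
    static String toNumericDate(String dataExtenso, int year) {
        String[] parts = dataExtenso.trim().split(" de ");
        if (parts.length != 2) {
            return null;
        }
        String month = buildMonthMap().get(parts[1].trim().toLowerCase(Locale.ROOT));
        if (month == null) {
            return null;
        }
        int day;
        try {
            day = Integer.parseInt(parts[0].trim());
        } catch (NumberFormatException e) {
            return null;
        }
        // %02d pads the day, replacing the manual "0" + value concatenation
        return String.format(Locale.ROOT, "%02d/%s/%d", day, month, year);
    }

    public static void main(String[] args) {
        System.out.println(toNumericDate("7 de setembro", 2015)); // prints 07/09/2015
    }
}
```

Unlike the original method, this sketch pads the day before building the date string and returns null for text it cannot interpret, which may or may not match what the original project intends.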


Example 10: list

import org.htmlparser.util.SimpleNodeIterator; // import the required package/class
@SuppressWarnings({ "rawtypes", "unchecked" })
@Action(value = "sdlist", results = { @Result(type = "json", params = {
		"root", "list" }) })
public String list() {
	Cache c = CacheManager.getInstance().getCache("News");
	String ckey =domain+listid + page;
	Element ele = c.get(ckey);
	if (!CommonUtil.isEmpty(ele)) {
		list = (List) ele.getObjectValue();

	} else {
		StringBuffer retstr = fetch(domain+"/"+listid+"/list"
				+ page+".htm");
		Parser p = Parser.createParser(retstr.toString(), "utf-8");
		list = new ArrayList<News>();
		try {
			NodeList ls = p
					.extractAllNodesThatMatch(new AttributeRegexFilter(
							"href", ".*/page\\.htm"));
			SimpleNodeIterator i = ls.elements();
			while (i.hasMoreNodes()) {
				Node n = i.nextNode();
				if (n instanceof TagNode) {
					TagNode tn = (TagNode) n;
					News news = new News();
					String href = tn.getAttribute("href");						
					news.setId(href);
					news.setTitle(tn.getAttribute("alt"));
					Node tmp=tn.getParent().getNextSibling();
					while(tmp!=null &&!(tmp instanceof TableColumn))
						tmp=tmp.getNextSibling();
					if(tmp!=null)
						news.setPubdate(tmp.toPlainTextString());
					list.add(news);
				}
			}
			c.put(new Element(ckey, list));
		} catch (ParserException e) {

			e.printStackTrace();
		}
	}
	jsonp(list);
	return NONE;
}
 
Developer ID: BaixiangLiu, Project: fudanweixin, Lines of code: 46, Source: SudyPageAction.java


Example 11: list

import org.htmlparser.util.SimpleNodeIterator; // import the required package/class
@SuppressWarnings({ "rawtypes", "unchecked" })
@Action(value = "newslist", results = { @Result(type = "json", params = {
		"root", "list" }) })
public String list() {
	Cache c = CacheManager.getInstance().getCache("News");
	String ckey = "newslist"+listid + page;
	Element ele = c.get(ckey);
	if (!CommonUtil.isEmpty(ele)) {
		list = (List) ele.getObjectValue();

	} else {
		StringBuffer retstr = fetch(RD+"/news/"+listid+"/"+page+".html");
		Parser p = Parser.createParser(retstr.toString(), "utf-8");
		list = new ArrayList<News>();
		try {
			NodeList ls = p
					.extractAllNodesThatMatch(new HasAttributeFilter("class","date"));
			SimpleNodeIterator i = ls.elements();
			while (i.hasMoreNodes()) {
				Node n = i.nextNode();
				if (n instanceof TagNode) {
					TagNode tn = (TagNode) n;
					News news = new News();
					news.setPubdate(tn.toPlainTextString());
					Node tmp=tn.getNextSibling();
					while(tmp!=null &&!(tmp instanceof LinkTag))
						tmp=tmp.getNextSibling();
					if(tmp!=null)
					{
						LinkTag link=(LinkTag)tmp;
						news.setId(link.getAttribute("href"));
						news.setTitle(link.getAttribute("title"));
					}
					list.add(news);
				}
			}
			c.put(new Element(ckey, list));
		} catch (ParserException e) {

			e.printStackTrace();
		}
	}

	return SUCCESS;
}
 
Developer ID: BaixiangLiu, Project: fudanweixin, Lines of code: 46, Source: CampusNewsAction.java


Example 12: list

import org.htmlparser.util.SimpleNodeIterator; // import the required package/class
@SuppressWarnings("rawtypes")
@Action(value = "eventlist")
public String list() throws IOException {
	Cache c = CacheManager.getInstance().getCache("News");
	String ckey = "eventlist"+page ;
	Element ele = c.get(ckey);
	if (!CommonUtil.isEmpty(ele)) {
		list = (List) ele.getObjectValue();

	} else {
		StringBuffer retstr = fetch(RD+"/calendar/?a=list&&m=recent&range=30&_="+System.currentTimeMillis()+"&type=0&place=0&type="+page	);
		Parser p = Parser.createParser(retstr.toString(), "utf-8");
		list = new ArrayList<News>();
		try {
			NodeList ls = p
					.extractAllNodesThatMatch(new HasAttributeFilter("class","clear"));
			if(ls.size()==2)
			{
				int tk1=ls.elementAt(0).getEndPosition();
				int tk2=ls.elementAt(1).getStartPosition();
				ServletActionContext.getResponse().setCharacterEncoding("utf-8");
				p=Parser.createParser(retstr.substring(tk1+6, tk2), "utf-8");
				NodeList nl=p.parse(null);
				NodeList links=nl.extractAllNodesThatMatch(new NodeClassFilter(LinkTag.class),true);
				SimpleNodeIterator i=links.elements();
				while(i.hasMoreNodes())
				{
					LinkTag lt=(LinkTag)i.nextNode();
					NodeList ll=new NodeList();
					ll.add(new TextNode(lt.getAttribute("title")));
					lt.setChildren(ll);
					lt.removeAttribute("title");
				}
				
				
				ServletActionContext.getResponse().getWriter().print(nl.toHtml());
			}
		} catch (ParserException e) {
			e.printStackTrace();
		}
	}

	return NONE;
}
 
Developer ID: BaixiangLiu, Project: fudanweixin, Lines of code: 45, Source: CampusEventAction.java


Example 13: content

import org.htmlparser.util.SimpleNodeIterator; // import the required package/class
@Action(value = "eventcontent", results = { @Result(type = "json", params = {
		"root", "en" }) })
public String content() {
	Cache c = CacheManager.getInstance().getCache("News");
	String ckey = "eventcontent" + newsid;
	Element ele = c.get(ckey);
	if (!CommonUtil.isEmpty(ele)) {
		en = (News) ele.getObjectValue();
	} else {
		StringBuffer retstr = fetch(RD+"/calendar/?a=one&evid="
				+ newsid+"&_="+System.currentTimeMillis());
		Parser p = Parser.createParser(retstr.toString(), "utf-8");
		try {
			NodeList nl = p.extractAllNodesThatMatch(new OrFilter(
					new TagNameFilter("h1"), new TagNameFilter("table")));
			SimpleNodeIterator i = nl.elements();
			en = new News();
			en.setId(newsid);
			while (i.hasMoreNodes()) {
				Node n = i.nextNode();
				if (n instanceof TagNode) {
					TagNode tn = (TagNode) n;
					if (tn.getTagName().equalsIgnoreCase("h1"))
						en.setTitle(tn.toPlainTextString());
					if (tn.getTagName().equalsIgnoreCase("table")) {
						en.setContent(tn.toHtml());
					}
				}
			}
			// Search the trimmed string itself so the indices match the substring call below
			String str = retstr.toString().trim();
			int tk = str.indexOf("imageurl");
			if (tk > 0) {
				tk = str.indexOf("'", tk);
				int tk1 = str.indexOf("'", tk + 1);

				String imgurl = RD + str.substring(tk + 1, tk1);
				String imgid = EncodeHelper.digest(imgurl, "MD5");
				BasicDBObject obj = new BasicDBObject("id", imgid);
				DBCollection col = MongoUtil.getInstance().getDB()
						.getCollection("CrawlerImages");
				DBObject dbo = col.findOne(obj);
				if (dbo == null)
					col.save(obj.append("url", imgurl));
				en.setPubdate(imgid);
			}
		} catch (ParserException e) {

			e.printStackTrace();
		}
		if (!CommonUtil.isEmpty(en) && !CommonUtil.isEmpty(en.getContent()))
			c.put(new Element(ckey, en));
	}
	return SUCCESS;
}
 
Developer ID: BaixiangLiu, Project: fudanweixin, Lines of code: 61, Source: CampusEventAction.java


Example 14: list

import org.htmlparser.util.SimpleNodeIterator; // import the required package/class
@SuppressWarnings({ "rawtypes", "unchecked" })
@Action(value = "calist", results = { @Result(type = "json", params = {
		"root", "list" }) })
public String list() {
	Cache c = CacheManager.getInstance().getCache("News");
	String ckey = "calist" + page;
	Element ele = c.get(ckey);
	if (!CommonUtil.isEmpty(ele)) {
		list = (List) ele.getObjectValue();

	} else {
		StringBuffer retstr = fetch(RD+"/announce/announce_list.php?page="
				+ page);
		Parser p = Parser.createParser(retstr.toString(), "utf-8");
		list = new ArrayList<News>();
		try {
			NodeList ls = p
					.extractAllNodesThatMatch(new AttributeRegexFilter(
							"href", "announce/\\?announceid=\\d+"));
			SimpleNodeIterator i = ls.elements();
			while (i.hasMoreNodes()) {
				Node n = i.nextNode();
				if (n instanceof TagNode) {
					TagNode tn = (TagNode) n;
					News news = new News();
					String href = tn.getAttribute("href");
					int tk = href.indexOf("=");
					if (tk > 0)
						news.setId(href.substring(tk + 1));
					news.setTitle(tn.toPlainTextString());
					list.add(news);
				}
			}
			c.put(new Element(ckey, list));
		} catch (ParserException e) {

			e.printStackTrace();
		}
	}

	return SUCCESS;
}
 
Developer ID: BaixiangLiu, Project: fudanweixin, Lines of code: 43, Source: CampusAnnouceAction.java


Example 15: list

import org.htmlparser.util.SimpleNodeIterator; // import the required package/class
@SuppressWarnings({ "rawtypes", "unchecked" })
@Action(value = "dstlist", results = { @Result(type = "json", params = {
		"root", "list" }) })
public String list() {
	Cache c = CacheManager.getInstance().getCache("News");
	String ckey = "dstlist" + page;
	Element ele = c.get(ckey);
	if (!CommonUtil.isEmpty(ele)) {
		list = (List) ele.getObjectValue();

	} else {
		try {
		StringBuffer retstr = CommonUtil.postWebRequest(RD+"/news.aspx?info_lb=822", ("__EVENTTARGET=_ctl0$ContentPlaceHolder1$Pager22&__EVENTARGUMENT="+page).getBytes("utf-8"), "application/x-www-form-urlencoded");
		Parser p = Parser.createParser(retstr.toString(), "utf-8");
		list = new ArrayList<News>();
		
			NodeList ls = p
					.extractAllNodesThatMatch(new AttributeRegexFilter(
							"href", "show\\.aspx\\?.+"));
			SimpleNodeIterator i = ls.elements();
			while (i.hasMoreNodes()) {
				Node n = i.nextNode();
				if (n instanceof TagNode) {
					TagNode tn = (TagNode) n;
					News news = new News();
					String href = tn.getAttribute("href");
					news.setId(href);
					news.setTitle(tn.toPlainTextString().trim());
					Node tmp=tn.getParent().getNextSibling();
					while(tmp!=null &&!(tmp instanceof Span))
						tmp=tmp.getNextSibling();
					if(tmp!=null){
						String dtstr=tmp.toPlainTextString();
						if(dtstr!=null &&dtstr.length()>2)
						news.setPubdate(dtstr.substring(1,dtstr.length()-1));
					}
					list.add(news);
				}
			}
			c.put(new Element(ckey, list));
		} catch (Exception e) {

			e.printStackTrace();
		}
	}

	return SUCCESS;
}
 
Developer ID: BaixiangLiu, Project: fudanweixin, Lines of code: 49, Source: DstAnnouceAction.java


Example 16: list

import org.htmlparser.util.SimpleNodeIterator; // import the required package/class
@SuppressWarnings({ "unchecked", "rawtypes" })
@Action(value = "liblist", results = { @Result(type = "json", params = {
		"root", "list" }) })
public String list() {
	Cache c = CacheManager.getInstance().getCache("News");
	String ckey = "liblist" + page;
	Element ele = c.get(ckey);
	if (!CommonUtil.isEmpty(ele)) {
		list = (List) ele.getObjectValue();

	} else {
		StringBuffer retstr = fetch(RD
				+ "/ddlib/getPublishInfoList.shtml?tid=1012&k=&p="
				+ (page - 1));
		Parser p = Parser.createParser(retstr.toString(), "utf-8");
		list = new ArrayList<News>();
		try {
			NodeList ls = p
					.extractAllNodesThatMatch(new AttributeRegexFilter(
							"href", "publishInfo\\.shtml\\?.+"));
			SimpleNodeIterator i = ls.elements();
			while (i.hasMoreNodes()) {
				Node n = i.nextNode();
				if (n instanceof TagNode) {
					TagNode tn = (TagNode) n;
					News news = new News();
					String href = tn.getAttribute("href");
					news.setId(href);
					news.setTitle(tn.toPlainTextString());
					Node tmp = tn.getNextSibling();
					if (tmp != null && tmp instanceof TextNode) {
						if (tmp.getText() != null)
							news.setPubdate(tmp.getText().replaceAll(
									"&nbsp;", ""));
					}
					list.add(news);
				}
			}
			c.put(new Element(ckey, list));
		} catch (ParserException e) {

			e.printStackTrace();
		}
	}

	return SUCCESS;
}
 
Developer ID: BaixiangLiu, Project: fudanweixin, Lines of code: 48, Source: LibAnnouceAction.java


Example 17: fetchComment

import org.htmlparser.util.SimpleNodeIterator; // import the required package/class
private void fetchComment(String mid, Document doc, Element parent) {
	int page = 0;
	while(++page > 0) {
		System.out.println("[INFO] Fetching comment of W" + mid + " page " + page + "...");
		String url = String.format(CommentUrl, mid, page);
		HttpResponse resp = connect(url);
		if(resp == null) return ;
		BufferedReader reader;
		try {
			reader = new BufferedReader(new InputStreamReader(
					resp.getEntity().getContent()));
			String raw = "", line;
			while((line = reader.readLine()) != null) {
				raw += line;
			}
			
			JSONParser parser = new JSONParser();
			JSONObject json = (JSONObject)parser.parse(raw);
			
			Parser htmlparser = new Parser((String)((JSONObject)json.get("data")).get("html"));
			NodeList list = htmlparser.extractAllNodesThatMatch(new HasAttributeFilter("class", "S_txt2"));
			SimpleNodeIterator iter = list.elements();
			
			while(iter.hasMoreNodes()) {
				Node n = iter.nextNode();
				Node p = n.getPreviousSibling(), s = n;
				// Assumption: the delimiter literal was garbled in the source; a full-width colon is assumed here
				while(p != null && !s.toPlainTextString().startsWith(":")) {
					s = p;
					p = p.getPreviousSibling();
				}
				String comment = "";
				while(s != n) {
					comment += s.toPlainTextString();
					s = s.getNextSibling();
				}
				Node name = n.getParent().getFirstChild().getNextSibling();
				
				Element cmt = doc.createElement("comment");
				cmt.setAttribute("by", name.getChildren().asString());
				cmt.setAttribute("on", n.getChildren().asString());
				cmt.setTextContent(comment.substring(1));
				parent.appendChild(cmt);
			}
			if(list.size() < 20) return ;
		} catch (IllegalStateException | IOException | ParseException | ParserException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
	}
}
 
Developer ID: w1ndy, Project: weibo-fetcher, Lines of code: 51, Source: Spider.java



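Note: The org.htmlparser.util.SimpleNodeIterator examples in this article were collected from GitHub, MSDocs, and other source-code and documentation hosting platforms. The snippets were selected from open-source projects, and copyright in the source code remains with the original authors. For distribution and use, refer to the license of the corresponding project. Do not reproduce without permission.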

