I know this is an old question, but I ran into it while looking for something more comprehensive: a check that detects HTML entities as well as tags, and that ignores other uses of the < and > symbols. I came up with the following class, which works well for me.
You can play with it live at http://ideone.com/HakdHo
I also uploaded this to GitHub with a bunch of JUnit tests.
package org.github;

import java.util.regex.Pattern;

/**
 * Detects HTML markup in a string.
 * This will detect tags or entities.
 *
 * @author [email protected] - David H. Bennett
 */
public class DetectHtml
{
    // adapted from a post by Phil Haack and modified to match better
    public final static String tagStart=
        "\\<\\w+((\\s+\\w+(\\s*\\=\\s*(?:\".*?\"|'.*?'|[^'\"\\>\\s]+))?)+\\s*|\\s*)\\>";
    public final static String tagEnd=
        "\\</\\w+\\>";
    public final static String tagSelfClosing=
        "\\<\\w+((\\s+\\w+(\\s*\\=\\s*(?:\".*?\"|'.*?'|[^'\"\\>\\s]+))?)+\\s*|\\s*)/\\>";
    public final static String htmlEntity=
        "&[a-zA-Z][a-zA-Z0-9]+;";
    // matches a start tag followed somewhere by an end tag, a self-closing tag, or a named entity
    public final static Pattern htmlPattern=Pattern.compile(
        "("+tagStart+".*"+tagEnd+")|("+tagSelfClosing+")|("+htmlEntity+")",
        Pattern.DOTALL
    );

    /**
     * Will return true if s contains HTML markup tags or entities.
     *
     * @param s String to test
     * @return true if string contains HTML
     */
    public static boolean isHtml(String s) {
        boolean ret=false;
        if (s != null) {
            ret=htmlPattern.matcher(s).find();
        }
        return ret;
    }
}
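
To give a feel for what the pattern accepts and rejects, here is a small stand-alone sketch. The class name DetectHtmlDemo and the sample strings are my own illustration and are not part of the class above; the expressions are copied into the demo (with the shared attribute part factored into a hypothetical TAG_ATTRS constant) only so that the file compiles on its own.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: inlines the same expressions as DetectHtml
// so that this file can be compiled and run by itself.
public class DetectHtmlDemo {
    // Attribute list shared by start tags and self-closing tags.
    static final String TAG_ATTRS =
        "((\\s+\\w+(\\s*\\=\\s*(?:\".*?\"|'.*?'|[^'\"\\>\\s]+))?)+\\s*|\\s*)";
    static final String TAG_START = "\\<\\w+" + TAG_ATTRS + "\\>";
    static final String TAG_END = "\\</\\w+\\>";
    static final String TAG_SELF_CLOSING = "\\<\\w+" + TAG_ATTRS + "/\\>";
    static final String HTML_ENTITY = "&[a-zA-Z][a-zA-Z0-9]+;";

    static final Pattern HTML_PATTERN = Pattern.compile(
        "(" + TAG_START + ".*" + TAG_END + ")|(" + TAG_SELF_CLOSING + ")|(" + HTML_ENTITY + ")",
        Pattern.DOTALL);

    // Same logic as DetectHtml.isHtml: null-safe search for any match.
    public static boolean isHtml(String s) {
        return s != null && HTML_PATTERN.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "<b>bold</b>",                           // start tag ... end tag
            "<a href=\"http://example.com\">x</a>",  // tag with a quoted attribute
            "<br/>",                                 // self-closing tag
            "Tom &amp; Jerry",                       // named entity
            "a < b and c > d",                       // comparison operators only
            "<br>",                                  // lone start tag, no end tag
            "it&#39;s"                               // numeric entity
        };
        for (String sample : samples) {
            System.out.println(isHtml(sample) + "  <-  " + sample);
        }
    }
}
```

The first four samples are reported as HTML and the last three are not. The last two are worth knowing about: a lone start tag such as <br> with no matching end tag, and numeric entities such as &#39;, both fall outside what the pattern looks for, so extend the expressions if you need those cases.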