ASP.NET regular expression examples for learning

These patterns come up often in ASP.NET when extracting content from the HTML document of a fetched web page.

1. Getting the tags inside a div with a given class

Get the tags inside <div class="imgList2">****</div>

Method 1:

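    // Requires: using System.Text; using System.Text.RegularExpressions;
    // strResult holds the HTML of the fetched page.
    // [^>]*? keeps the match inside a single opening <div ...> tag.
    string g = "<div[^>]*?class=\"imgList2\">(?<html>[\\s\\S]*?)</div>";
    Regex reg = new Regex(g, RegexOptions.None);
    MatchCollection mc = reg.Matches(strResult);
    StringBuilder v = new StringBuilder();
    foreach (Match m in mc)
    {
        // m.Value is the whole <div>...</div>; the "html" group holds only its inner content.
        // The lazy [\s\S]*? stops at the first </div>, so a nested div cuts the result short.
        v.Append(m.Groups["html"].Value).Append("\r\n");
    }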

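Here is a self-contained sketch for trying method 1 outside a web page. It is written as a .NET 6+ console program with top-level statements. The sample HTML and the local function `GetImgListInnerHtml` are hypothetical and exist only for illustration. The pattern is a variant of the one above: it keeps `[^>]*?` inside the opening tag and returns only the inner `html` group.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sample page; in the article the HTML comes from strResult.
string html =
    "<div class=\"header\">logo</div>\n" +
    "<div id=\"a\" class=\"imgList2\"><img src=\"1.jpg\" /><img src=\"2.jpg\" /></div>\n" +
    "<div class=\"imgList2\">\n<a href=\"/more\">more</a>\n</div>";

// Hypothetical helper: returns the inner HTML of every <div> whose class attribute is exactly "imgList2".
List<string> GetImgListInnerHtml(string source)
{
    // [^>]*? cannot cross '>', so other attributes are allowed but the match stays in one tag.
    // [\s\S]*? crosses newlines and stops at the first </div>.
    var reg = new Regex("<div[^>]*?class=\"imgList2\">(?<html>[\\s\\S]*?)</div>");
    var result = new List<string>();
    foreach (Match m in reg.Matches(source))
    {
        result.Add(m.Groups["html"].Value);
    }
    return result;
}

List<string> inner = GetImgListInnerHtml(html);
foreach (string s in inner)
{
    Console.WriteLine(s);
}
```

The first div is skipped because its class is "header". Nested markup is a known limitation. For input such as `<div class="imgList2"><div>x</div>y</div>`, the lazy `[\s\S]*?` stops at the first `</div>` and returns only `<div>x`.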

Method 2 (a general-purpose helper that returns whatever lies between a given start string and end string):

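    string list_a_group_str = GetValue(strResult.Trim(), "<div class=\"imgList2\">", "</div>");

    // Returns the text between the first occurrence of start and the next occurrence of end,
    // excluding both delimiters, or "" when there is no match.
    public static string GetValue(string str, string start, string end)
    {
        // Regex.Escape makes start/end match as literal text; [\s\S]*? also crosses newlines,
        // so no Singleline/Multiline option is needed.
        Regex regex = new Regex("(?<=" + Regex.Escape(start) + ")[\\s\\S]*?(?=" + Regex.Escape(end) + ")");
        return regex.Match(str).Value;
    }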

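Here is a runnable sketch of a `GetValue`-style helper. It is again a .NET 6+ console program with top-level statements, with `GetValue` written as a local function instead of a static method. The input strings are hypothetical. The sketch shows three things:

- Only the first block between the delimiters is returned.
- An empty string comes back when nothing matches.
- Passing the delimiters through `Regex.Escape` matters, because a start string such as `(USD): ` contains regex metacharacters.

```csharp
using System;
using System.Text.RegularExpressions;

// Returns the text between the first start delimiter and the next end delimiter
// (delimiters excluded), or "" if nothing matches. start/end are escaped, so they are
// matched as literal text even if they contain characters such as ( ) . ? [
string GetValue(string str, string start, string end)
{
    var regex = new Regex("(?<=" + Regex.Escape(start) + ")[\\s\\S]*?(?=" + Regex.Escape(end) + ")");
    return regex.Match(str).Value;
}

// Hypothetical input for illustration.
string page = "<p>intro</p><div class=\"imgList2\"><img src=\"1.jpg\" /></div><div class=\"imgList2\">second</div>";

string first = GetValue(page, "<div class=\"imgList2\">", "</div>");
Console.WriteLine(first);   // prints <img src="1.jpg" /> (only the first block)

// A delimiter containing regex metacharacters still works because it is escaped.
string price = GetValue("total (USD): 42.50;", "(USD): ", ";");
Console.WriteLine(price);   // prints 42.50

string missing = GetValue(page, "<span>", "</span>");
Console.WriteLine(missing.Length == 0 ? "(no match)" : missing);
```

Without `Regex.Escape`, `(USD): ` would be read as a capture group matching `USD: `. Because the input contains `USD): ` rather than `USD: `, the call would return an empty string.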

2. Getting the href and text of every a tag

Get the href and text of every a tag inside <div class="page both"></div>

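    // Cut out the pager block first, then extract every link from it.
    string list_page_group_str = GetValue(strResult.Trim(), "<div class=\"page both\">", "</div>");
    // (?is): case-insensitive, and . also matches newlines.
    // url: the href value, quoted or not (\1 requires the same closing quote as the opening one).
    // text: everything before </a> that does not start another <a> or </a>.
    Regex reg = new Regex(@"(?is)<a\b[^>]*?href=(['""]?)(?<url>[^""\s>]*)\1[^>]*>(?<text>(?:(?!</?a\b).)*)</a>");
    MatchCollection mc = reg.Matches(list_page_group_str);
    foreach (Match m in mc)
    {
        string url = m.Groups["url"].Value;
        string text = m.Groups["text"].Value;
        // Use url and text here, e.g. add them to a list.
    }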
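The link pattern can be checked in isolation with the self-contained sketch below, a .NET 6+ console program with top-level statements. The pager markup is hypothetical. It covers the three href forms the pattern accepts: double quotes, single quotes, and no quotes. The pattern is the one from this section, with `<a\b[^>]*?` so the match begins inside a real `<a ...>` tag.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical pager markup; in the article it comes from GetValue(strResult, "<div class=\"page both\">", "</div>").
string pager =
    "<a href=\"/list/1.html\">1</a>\n" +
    "<a class='cur' href='/list/2.html'>2</a>\n" +
    "<a href=/list/3.html title=\"next\">Next <b>&raquo;</b></a>";

// (?is): ignore case, and let . match newlines.
// url: the href value; (['"]?) captures an optional opening quote and \1 requires the same closing quote.
// text: everything up to </a> that does not start another <a or </a tag.
var reg = new Regex(@"(?is)<a\b[^>]*?href=(['""]?)(?<url>[^""\s>]*)\1[^>]*>(?<text>(?:(?!</?a\b).)*)</a>");

var links = new List<(string Url, string Text)>();
foreach (Match m in reg.Matches(pager))
{
    links.Add((m.Groups["url"].Value, m.Groups["text"].Value));
}

foreach (var (url, text) in links)
{
    Console.WriteLine(url + " -> " + text);
}
```

The text group keeps inner markup such as `<b>&raquo;</b>`. Strip that markup separately if only the visible text is needed.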