Regex Groups (Group) in C#

Source: http://www.cnblogs.com/kiant71/archive/2010/08/14/1799799.html

When a single regular expression has to extract several distinct parts of a match (subexpressions), you use groups.

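In C#, Regex and the objects it returns are related as follows. Group is the class that represents one group:

Regex –> MatchCollection (all matches)

          –> Match (a single match)

                –> GroupCollection (the groups, i.e. parenthesized subexpressions, within one match)

                      –> Group (the content of one group)

                            –> CaptureCollection (every capture the group made; a quantified group such as (\d)+ can capture more than once)

                                  –> Capture (a single captured substring)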
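The CaptureCollection level is easiest to see with a quantified group. The console sketch below applies (\d)+ to "123", so the one group matches three times. Group.Value keeps only the last capture, while Group.Captures lists all three. Console.WriteLine stands in for the article's Response.Write, and the class and helper names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // Hypothetical helper: every substring captured by group 1, left to right
    public static string[] CapturesOfGroup1(string input, string pattern)
    {
        Group group = Regex.Match(input, pattern).Groups[1];
        string[] result = new string[group.Captures.Count];
        for (int i = 0; i < group.Captures.Count; i++)
        {
            result[i] = group.Captures[i].Value;
        }
        return result;
    }

    public static void Main()
    {
        // (\d)+ repeats the group, so it captures once per digit
        Match match = Regex.Match("123", @"(\d)+");
        Group digit = match.Groups[1];

        // Group.Value holds only the LAST capture ("3")
        Console.WriteLine("Group 1 value: " + digit.Value);

        // CaptureCollection holds every capture, in order
        foreach (Capture capture in digit.Captures)
        {
            Console.WriteLine("Capture {0} at index {1}", capture.Value, capture.Index);
        }
    }
}
```

In .NET, Match derives from Group and Group derives from Capture. This is why Value, Index and Length are available at every level of the hierarchy.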

A group can be accessed in two ways:

1. By index

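The expression ((\d+)([a-z]))\s+ contains four groups in total, counting group 0 for the whole match. Groups are numbered from left to right by the position of their opening parenthesis:

Groups[0]    the match itself, i.e. the text matched by the entire expression ((\d+)([a-z]))\s+

Groups[1]    the subexpression ((\d+)([a-z]))

Groups[2]    the subexpression (\d+)

Groups[3]    the subexpression ([a-z])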

// Runs inside an ASP.NET page (Response.Write); requires using System.Text.RegularExpressions;
string text = "1A 2B 3C 4D 5E 6F 7G 8H 9I 10J 11Q 12J 13K 14L 15M 16N ffee80 #800080";
Response.Write(text + "<br/>");

string strPattern = @"((\d+)([a-z]))\s+";
Regex rex = new Regex(strPattern, RegexOptions.IgnoreCase);
MatchCollection matches = rex.Matches(text);

// Walk through every match
foreach (Match match in matches)
{
    GroupCollection groups = match.Groups;
    Response.Write(string.Format("<br/>{0} has {1} groups: {2}<br/>",
                                 match.Value, groups.Count, strPattern));

    // Print every group inside the current match
    for (int i = 0; i < groups.Count; i++)
    {
        Response.Write(
            string.Format("Group {0} is {1}, index {2}, length {3}<br/>",
                          i,
                          groups[i].Value,
                          groups[i].Index,
                          groups[i].Length));
    }
}

/*
 * Output:
 1A 2B 3C 4D 5E 6F 7G 8H 9I 10J 11Q 12J 13K 14L 15M 16N ffee80 #800080

 1A has 4 groups: ((\d+)([a-z]))\s+
 Group 0 is 1A , index 0, length 3
 Group 1 is 1A, index 0, length 2
 Group 2 is 1, index 0, length 1
 Group 3 is A, index 1, length 1

 ....
 */
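Group indexes come from the pattern, not from what happened to match. The console sketch below uses a hypothetical helper named Describe and the pattern (\d+)([a-z])? on "10". The optional group 2 takes no part in the match, yet Groups.Count is still 3. That group reports Success = false and an empty Value.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OptionalGroupDemo
{
    // Hypothetical helper: one descriptive line per group of the first match
    public static string[] Describe(string input, string pattern)
    {
        Match match = Regex.Match(input, pattern);
        // Groups has one entry per group in the PATTERN (plus group 0),
        // whether or not that group participated in this match
        string[] lines = new string[match.Groups.Count];
        for (int i = 0; i < match.Groups.Count; i++)
        {
            Group g = match.Groups[i];
            // Success is false for a group that captured nothing; its Value is ""
            lines[i] = string.Format("Group {0}: Success={1}, Value=\"{2}\"", i, g.Success, g.Value);
        }
        return lines;
    }

    public static void Main()
    {
        // ([a-z])? is optional, so on "10" group 2 does not participate
        foreach (string line in Describe("10", @"(\d+)([a-z])?"))
        {
            Console.WriteLine(line);
        }
    }
}
```

Checking Success before reading Value distinguishes "the group matched an empty string" from "the group did not take part".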

2. By name

Give a group a name with (?<name>subexpression); its content can then be read with Groups["name"].

// Runs inside an ASP.NET page (Response.Write); requires using System.Text.RegularExpressions;
string text = "I've found this amazing URL at http://www.sohu.com, and then find ftp://ftp.sohu.comisbetter.";
Response.Write(text + "<br/>");

string pattern = @"\b(?<protocol>\S+)://(?<address>\S+)\b";
// HTML-encode < and > so the browser shows the pattern instead of parsing it as tags
Response.Write(pattern.Replace("<", "&lt;").Replace(">", "&gt;") + "<br/><br/>");

MatchCollection matches = Regex.Matches(text, pattern);
foreach (Match match in matches)
{
    GroupCollection groups = match.Groups;
    Response.Write(string.Format(
                    "URL: {0}; Protocol: {1}; Address: {2} <br/>",
                    match.Value,
                    groups["protocol"].Value,
                    groups["address"].Value));
}

/*
 * Output:
 I've found this amazing URL at http://www.sohu.com, and then find ftp://ftp.sohu.comisbetter.
    \b(?<protocol>\S+)://(?<address>\S+)\b

    URL: http://www.sohu.com; Protocol: http; Address: www.sohu.com
    URL: ftp://ftp.sohu.comisbetter; Protocol: ftp; Address: ftp.sohu.comisbetter
 */
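Named groups still have numbers, so Groups["name"] and Groups[n] reach the same group. In .NET, named groups are numbered after all unnamed groups, regardless of where they appear in the pattern. The console sketch below checks this with Regex.GroupNumberFromName. The class name, the pattern and the sample URL ftp://ftp.example.com are illustrative assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupNumberingDemo
{
    // "protocol" is written first, but named groups are numbered after
    // all unnamed groups, so the unnamed (\S+) becomes group 1
    public static readonly Regex UrlRegex = new Regex(@"(?<protocol>\w+)://(\S+)");

    public static void Main()
    {
        Match match = UrlRegex.Match("ftp://ftp.example.com");

        int protocolNumber = UrlRegex.GroupNumberFromName("protocol");
        Console.WriteLine("protocol is group " + protocolNumber);
        Console.WriteLine("Groups[1] = " + match.Groups[1].Value);
        Console.WriteLine("Groups[\"protocol\"] = " + match.Groups["protocol"].Value);

        // The name and the number refer to the same group, so the values agree
        Console.WriteLine(match.Groups["protocol"].Value == match.Groups[protocolNumber].Value);
    }
}
```

Mixing named and unnamed groups therefore shifts the numeric indexes of the named ones. Reading them by name avoids depending on that ordering.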