Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

regex - What is a non-capturing group in regular expressions?

How are non-capturing groups, i.e. (?:), used in regular expressions, and what are they good for?

  Asked by never_had_a_name (translated from Stack Overflow)

1 Answer

Let me try to explain this with an example.

Consider the following text:

http://stackoverflow.com/
https://stackoverflow.com/questions/tagged/regex

Now, if I apply the regex below over it...

(https?|ftp)://([^/\r\n]+)(/[^\r\n]*)?

... I would get the following result:

Match "http://stackoverflow.com/"
     Group 1: "http"
     Group 2: "stackoverflow.com"
     Group 3: "/"

Match "https://stackoverflow.com/questions/tagged/regex"
     Group 1: "https"
     Group 2: "stackoverflow.com"
     Group 3: "/questions/tagged/regex"

But I don't care about the protocol -- I just want the host and path of the URL.

So, I change the regex to include the non-capturing group (?:).

(?:https?|ftp)://([^/\r\n]+)(/[^\r\n]*)?

Now, my result looks like this:

Match "http://stackoverflow.com/"
     Group 1: "stackoverflow.com"
     Group 2: "/"

Match "https://stackoverflow.com/questions/tagged/regex"
     Group 1: "stackoverflow.com"
     Group 2: "/questions/tagged/regex"

See? The first group has not been captured. The parser uses it to match the text, but ignores it later, in the final result.
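
To check this yourself, here is a minimal sketch using Python's re module. The choice of Python is an assumption (the answer does not name a regex engine), and the helper name groups_for is made up for illustration. It runs both patterns over the two sample URLs and collects the captured groups of each match.

```python
import re

# The two patterns from the answer, written as Python raw strings.
CAPTURING = re.compile(r"(https?|ftp)://([^/\r\n]+)(/[^\r\n]*)?")
NON_CAPTURING = re.compile(r"(?:https?|ftp)://([^/\r\n]+)(/[^\r\n]*)?")

# The sample text from the answer, one URL per line.
TEXT = "http://stackoverflow.com/\nhttps://stackoverflow.com/questions/tagged/regex\n"


def groups_for(pattern, text):
    """Return the tuple of captured groups for every match (hypothetical helper)."""
    return [m.groups() for m in pattern.finditer(text)]


# With the capturing pattern the protocol shows up as group 1.
print(groups_for(CAPTURING, TEXT))
# With (?:...) the protocol is still matched but no longer reported.
print(groups_for(NON_CAPTURING, TEXT))
```

Both patterns match exactly the same text; only the numbering and the number of reported groups change.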


EDIT:

As requested, let me try to explain groups too.

Well, groups serve many purposes.

They can help you extract exact information from a bigger match (and the groups can also be named), they let you rematch a previously matched group, and they can be used for substitutions.

Let's try some examples, shall we?

Ok, imagine you have some kind of XML or HTML (be aware that regex may not be the best tool for the job, but it is nice as an example).

You want to parse the tags, so you could do something like this (I have added spaces to make it easier to understand):

   <(?<TAG>.+?)> [^<]*? </\k<TAG>>
or
   <(.+?)> [^<]*? </\1>

The first regex has a named group (TAG), while the second one uses an ordinary numbered group.

Both regexes do the same thing: they use the value from the first group (the name of the tag) to match the closing tag.

The difference is that the first one refers to the group by name, while the second one refers to it by index (group indexes start at 1).
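
Here is a rough sketch of both kinds of backreference in Python, with the explanatory spaces removed from the patterns. Note that Python's re module uses its own spelling for named groups, (?P<TAG>...) to define one and (?P=TAG) to refer back to it, so the named pattern is adapted rather than copied. The sample markup is made up for illustration.

```python
import re

# Named group and named backreference, in Python's spelling.
NAMED = re.compile(r"<(?P<TAG>.+?)>[^<]*?</(?P=TAG)>")
# Numbered group and numbered backreference (indexes start at 1).
NUMBERED = re.compile(r"<(.+?)>[^<]*?</\1>")

# Made-up sample: the middle element has a closing tag that does not match.
SAMPLE = "<b>bold</b> <i>mismatched</b> <em>ok</em>"

named_tags = [m.group("TAG") for m in NAMED.finditer(SAMPLE)]
numbered_tags = [m.group(1) for m in NUMBERED.finditer(SAMPLE)]

# Only elements whose closing tag repeats the opening tag name are matched.
print(named_tags)
print(numbered_tags)
```

The mismatched `<i>...</b>` element is skipped by both patterns, because the backreference requires the closing tag to repeat whatever the first group captured.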

Let's try some substitutions now.

Consider the following text:

Lorem ipsum dolor sit amet consectetuer feugiat fames malesuada pretium egestas.

Now, let's use this dumb regex over it:

(\S)(\S)(\S)(\S*)

This regex matches words with at least 3 characters, and uses groups to separate the first three letters.

The result is this:

Match "Lorem"
     Group 1: "L"
     Group 2: "o"
     Group 3: "r"
     Group 4: "em"
Match "ipsum"
     Group 1: "i"
     Group 2: "p"
     Group 3: "s"
     Group 4: "um"
...

Match "consectetuer"
     Group 1: "c"
     Group 2: "o"
     Group 3: "n"
     Group 4: "sectetuer"
...
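
The listing above can be reproduced with a short sketch like the following, again assuming Python's re module. The pattern uses \S (any non-whitespace character) for each of the first three letters and \S* for the rest of the word.

```python
import re

TEXT = "Lorem ipsum dolor sit amet consectetuer feugiat fames malesuada pretium egestas."
WORDS = re.compile(r"(\S)(\S)(\S)(\S*)")

# Pair each full match with the tuple of its four groups.
matches = [(m.group(0), m.groups()) for m in WORDS.finditer(TEXT)]

for whole, parts in matches:
    print(whole, parts)
```

For a three-letter word such as "sit", the fourth group still takes part in the match but captures an empty string.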

So, if we apply the substitution string:

$1_$3$2_$4

... over it, we are trying to use the first group, add an underscore, use the third group, then the second group, add another underscore, and then the fourth group.

The resulting string would be like the one below.

L_ro_em i_sp_um d_lo_or s_ti_ a_em_t c_no_sectetuer f_ue_giat f_ma_es m_la_esuada p_er_tium e_eg_stas.
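
As a sketch of the same substitution in Python (an assumed engine, not one the answer names): Python's re.sub writes group references in the replacement string as \1, \2 and so on, so $1_$3$2_$4 becomes \1_\3\2_\4.

```python
import re

TEXT = "Lorem ipsum dolor sit amet consectetuer feugiat fames malesuada pretium egestas."

# First letter, underscore, third letter, second letter, underscore, rest of the word.
result = re.sub(r"(\S)(\S)(\S)(\S*)", r"\1_\3\2_\4", TEXT)
print(result)
```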

You can use named groups for substitutions too, using ${name}.
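
The exact spelling of a named reference depends on the engine. As a minimal sketch in Python, where the equivalent of ${name} is \g<name>, here is the same letter-swapping substitution with named groups. The group names first, second, third and rest are made up for illustration.

```python
import re

# Same pattern as before, but each group has a (hypothetical) name.
NAMED_WORDS = re.compile(r"(?P<first>\S)(?P<second>\S)(?P<third>\S)(?P<rest>\S*)")

# In Python's replacement syntax, \g<name> inserts the text of the named group.
renamed = NAMED_WORDS.sub(r"\g<first>_\g<third>\g<second>_\g<rest>", "Lorem ipsum dolor")
print(renamed)
```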

To play around with regexes, I recommend http://regex101.com/, which offers a good amount of detail on how the regex works; it also offers a few regex engines to choose from.

