
It's good to be efficient but it's better to be explicit

Recently, I've started using ExplicitCapture a lot more in my patterns. Explicit capture can be turned on via the RegexOptions.ExplicitCapture option or via the "n" flag of the (?imnsx-imnsx) inline option syntax, i.e. (?n). One of the benefits of doing this is that "raw" (unnamed) parentheses are not captured!

As an example, consider the following pattern, which uses raw parentheses for alternation:

source: "hello world.  how are you"
pattern: "(?'sentence'hello (jane|bob|world)\.  how are you)"

The resulting Match would contain 3 groups: Group(0) contains the value of the full match, Group(1) is a raw group with the value "world", and Group(2) is the named group (sentence) with the same value as the full match. Remember that named groups are always numbered after raw groups!
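To see this numbering for yourself, here is a minimal C# sketch, assuming .NET 5 or later so that top-level statements compile. It writes "how" in lowercase so that the pattern actually matches the sample input without RegexOptions.IgnoreCase.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input from the post; "how" is lowercase in the pattern so it matches.
string source = "hello world.  how are you";
string pattern = @"(?'sentence'hello (jane|bob|world)\.  how are you)";

Match m = Regex.Match(source, pattern);

// .NET numbers unnamed groups first, then named groups:
// Groups[0] = whole match, Groups[1] = (jane|bob|world), Groups[2] = sentence.
Console.WriteLine(m.Groups.Count);      // 3
Console.WriteLine(m.Groups[1].Value);   // world
Console.WriteLine(m.Groups[2].Value);   // hello world.  how are you
Console.WriteLine(m.Groups["sentence"].Value == m.Groups[0].Value); // True
```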

For performance reasons, Group(1) can be discarded: its value is not needed for a backreference, and the parentheses are only there to delimit the alternation. In the next sample, I've turned off capturing in that group by using the grouping-only (non-capturing) syntax (?:...):

source: "hello world.  how are you"
pattern: "(?'sentence'hello (?:jane|bob|world)\.  how are you)"
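A minimal C# sketch of the grouping-only version (again assuming .NET 5 or later for top-level statements, and a lowercase "how" so the pattern matches the sample input) confirms that the named group moves up to number 1 once the raw group stops capturing:

```csharp
using System;
using System.Text.RegularExpressions;

string source = "hello world.  how are you";

// (?:...) groups the alternation without creating a capture group.
var regex = new Regex(@"(?'sentence'hello (?:jane|bob|world)\.  how are you)");
Match m = regex.Match(source);

Console.WriteLine(m.Groups.Count);               // 2
Console.WriteLine(m.Groups[1].Value);            // hello world.  how are you
Console.WriteLine(regex.GroupNameFromNumber(1)); // sentence
```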

This has the desired effect, as there are now only 2 groups: Group(0) and Group(1), the named group. My only concern with this is that, in a moderately complex pattern, all the additional ?: characters embedded within the string can reduce readability. The following snippet uses the (?n) modifier to turn on ExplicitCapture at the beginning of the pattern. ExplicitCapture ensures that only named groups are captured, as it turns off capturing for raw parentheses:

source: "hello world.  how are you"
pattern: "(?n)(?'sentence'hello (jane|bob|world)\.  how are you)"

This has the same effect as the previous pattern. Setting the option via RegexOptions.ExplicitCapture instead removes the need for the (?n) prefix altogether!
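Here is a minimal C# sketch comparing the two ways of enabling explicit capture, assuming .NET 5 or later for top-level statements and a lowercase "how" so the pattern matches the sample input. The only difference between the two Regex instances is where the option is specified.

```csharp
using System;
using System.Text.RegularExpressions;

string source = "hello world.  how are you";

// Inline form: (?n) turns on explicit capture for the rest of the pattern.
var inline = new Regex(@"(?n)(?'sentence'hello (jane|bob|world)\.  how are you)");

// Option form: the same pattern without the (?n) prefix.
var option = new Regex(@"(?'sentence'hello (jane|bob|world)\.  how are you)",
                       RegexOptions.ExplicitCapture);

foreach (var regex in new[] { inline, option })
{
    Match m = regex.Match(source);
    // Only Groups[0] and the named group remain; (jane|bob|world) is not captured.
    Console.WriteLine($"{m.Groups.Count} {m.Groups["sentence"].Value}");
}
// Both iterations print: 2 hello world.  how are you
```

Because the raw parentheses are left untouched in both patterns, the alternation still reads naturally, which is the readability gain over sprinkling ?: throughout the pattern.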
