Named Groups, Unnamed Groups and Captures
I see this question come up a fair bit with regex, so I thought I'd blog about it. It has to do with 2 things: named groups and captures. First, an example...
I have a set of attributes ascribed to a value, and I want to match each of them and then write them out. This is similar to matching the attributes of an XML or HTML element:
Example text:
Attributes=(Animal=cat; Human=paul; Car=ford; Color=green;)
Sample pattern:
Attributes=\(((?'type'\w+)(=)(?'value'\w+)\;\s?)+\)
Problem 1: Named and Unnamed Groups
This pattern uses 2 named groups, "type" and "value", to store the two halves of each attribute. It also has 2 unnamed groups: the outer group, which matches one complete "type=value;" attribute and is repeated by the trailing +, and the group that matches the "=" sign between type and value.
Looking at that pattern, you know that there are going to be 5 groups (those 4, plus group 0, which is always the entire match) and, using logic, you would probably expect them to be numbered in the order their opening parentheses appear:
- Group 0 : The entire match
- Group 1 : The unnamed outer attribute group
- Group 2 : The named "type" group
- Group 3 : The unnamed "=" group
- Group 4 : The named "value" group
Unnamed Groups always come first
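The first important rule of .NET regexes is that unnamed groups are always numbered before named groups: the unnamed groups are numbered left to right first, and only then are the named groups numbered, also left to right. That is also the order you get when you enumerate a Groups collection. So, the order of our groups will be:
- Group 0 : The entire match
- Group 1 : The unnamed outer attribute group
- Group 2 : The unnamed "=" group
- Group 3 : The named "type" group
- Group 4 : The named "value" group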
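To check the numbering yourself, here is a minimal sketch (C# 9 or later, using top-level statements) that asks the Regex object for its group numbers and names, then prints each group's value. GetGroupNumbers, GroupNameFromNumber and GroupNumberFromName are standard System.Text.RegularExpressions.Regex methods; the output format is only an illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern and input taken from the example above.
const string pattern = @"Attributes=\(((?'type'\w+)(=)(?'value'\w+)\;\s?)+\)";
const string input = "Attributes=(Animal=cat; Human=paul; Car=ford; Color=green;)";

var regex = new Regex(pattern);
Match m = regex.Match(input);

// Walk the groups in ascending group-number order.
foreach (int number in regex.GetGroupNumbers())
{
    // Unnamed groups have no real name, so GroupNameFromNumber
    // returns the group number as a string instead.
    string name = regex.GroupNameFromNumber(number);

    // Value is only the group's last capture (see Problem 2).
    Console.WriteLine("Group {0} ({1}): {2}", number, name, m.Groups[number].Value);
}
```

On the example input, "type" and "value" should come out as groups 3 and 4, after both unnamed groups, even though "type" is written before the "=" group in the pattern.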
Problem 2: Groups and Captures
Another gotcha with this example arises when a user attempts to write all of the results out to the screen. As you can see, there will be:
- 1 Match : The entire string
- 5 Groups : As we've already seen
- 4 Captures in each repeated group : One per attribute
The question is, how do you get each of those 4 attributes? The answer is that each Group has a Captures collection that stores every "capture" the group made; the Group's own Value only holds the last one. So, the idea is to get the count of captures for a group and then display the value at each index from 0 up to, but not including, that count.
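These counts can be checked directly. The sketch below (C# 9 or later, top-level statements, reusing the pattern and input from this post) prints the number of matches, the number of groups, how many captures the "type" group made, and the "type" group's own Value, which holds only the last attribute name.

```csharp
using System;
using System.Text.RegularExpressions;

const string pattern = @"Attributes=\(((?'type'\w+)(=)(?'value'\w+)\;\s?)+\)";
const string input = "Attributes=(Animal=cat; Human=paul; Car=ford; Color=green;)";

Match m = Regex.Match(input, pattern);

// The pattern consumes the whole string, so there is only one match.
Console.WriteLine("Matches: {0}", Regex.Matches(input, pattern).Count); // 1

// Groups 0 through 4.
Console.WriteLine("Groups: {0}", m.Groups.Count); // 5

// The repeated "type" group captured once per attribute...
Console.WriteLine("type captures: {0}", m.Groups["type"].Captures.Count); // 4

// ...but the Group's own Value is just its last capture.
Console.WriteLine("type value: {0}", m.Groups["type"].Value); // Color
```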
Here's some sample code which demonstrates how you'd do that for the example shown above:
using System;
using System.Text.RegularExpressions;

string pattern = @"Attributes=\(((?'type'\w+)(=)(?'value'\w+)\;\s?)+\)";
string input = @"Attributes=(Animal=cat; Human=paul; Car=ford; Color=green;)";

Match m = Regex.Match(input, pattern);
if (m.Groups["type"].Success)
{
    // this will tell us how many captures we have...
    int matchedItems = m.Groups["type"].Captures.Count;

    // now, enumerate the Captures and render the type/value pair for each Capture...
    for (int i = 0; i < matchedItems; i++)
    {
        string name = m.Groups["type"].Captures[i].Value;
        string val = m.Groups["value"].Captures[i].Value;
        Console.WriteLine("{0} = {1}", name, val);
    }
}
Console.ReadLine();
And here's the output generated by the above example...
Animal = cat
Human = paul
Car = ford
Color = green
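If you need the pairs as data rather than console output, the same loop can be wrapped in a helper. ReadAttributes below is a hypothetical name, not part of .NET. This is a minimal sketch (C# 9 or later, top-level statements) that assumes the pattern from this post, where "type" and "value" are captured together on every repetition, so their Captures collections line up index for index.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the type/value pairs in the order they appear.
// Assumes "type" and "value" capture together on each repetition of the outer group.
List<KeyValuePair<string, string>> ReadAttributes(string text)
{
    const string pattern = @"Attributes=\(((?'type'\w+)(=)(?'value'\w+)\;\s?)+\)";
    var pairs = new List<KeyValuePair<string, string>>();

    Match m = Regex.Match(text, pattern);
    if (!m.Success)
        return pairs; // no attribute list found

    CaptureCollection types = m.Groups["type"].Captures;
    CaptureCollection values = m.Groups["value"].Captures;

    // Capture i of "type" belongs with capture i of "value".
    for (int i = 0; i < types.Count; i++)
        pairs.Add(new KeyValuePair<string, string>(types[i].Value, values[i].Value));

    return pairs;
}

foreach (var pair in ReadAttributes("Attributes=(Animal=cat; Human=paul; Car=ford; Color=green;)"))
    Console.WriteLine("{0} = {1}", pair.Key, pair.Value);
```

A List of pairs is used rather than a Dictionary so the original order is kept, and a repeated attribute name neither throws (as Dictionary.Add would) nor silently overwrites an earlier value.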