Match opening tags improvements

I've made a couple of alterations to the opening tag regex.

I've changed the non-capturing group that previously grouped the attributes into an atomic group, (?>...). Once that group has matched, the regex engine discards the backtracking positions saved inside it, which cuts out unnecessary backtracking. I've also added a negative lookahead assertion at the beginning of that group. If the end of the tag (>, /> or ?>) comes next, the whole group fails and the engine can step out of it immediately. This also lets me replace the lazy (...)*? quantifier at the end of the group with a greedy (...)*.
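To see the two constructs in isolation, here is a small sketch in Java. Java is used only because it is easy to run; the post itself targets .NET, whose regex engine accepts the same (?>...) and (?!...) syntax. The class and method names are hypothetical.

The first two patterns contrast an ordinary group with an atomic one. In the ordinary group, \w+ can give back a character so that a following \w can match. The atomic group throws those saved states away. The third pattern is the lookahead from the start of the attribute group, which succeeds only when the tag does not end at the current position.

```java
import java.util.regex.Pattern;

public class AtomicGroupDemo {
    // Ordinary (backtracking) group: \w+ may give back characters to the trailing \w.
    static final Pattern PLAIN = Pattern.compile("(?:\\w+)\\w");
    // Atomic group: once \w+ has matched, its saved backtracking positions are discarded.
    static final Pattern ATOMIC = Pattern.compile("(?>\\w+)\\w");
    // Negative lookahead: succeeds (matching nothing) unless >, /> or ?> comes next.
    static final Pattern NOT_AT_TAG_END = Pattern.compile("(?![\\/\\?]?>)");

    static boolean plainMatches(String s) {
        return PLAIN.matcher(s).matches();
    }

    static boolean atomicMatches(String s) {
        return ATOMIC.matcher(s).matches();
    }

    // lookingAt() tests the lookahead only at the start of the remaining text.
    static boolean canContinue(String rest) {
        return NOT_AT_TAG_END.matcher(rest).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println("plain:  " + plainMatches("abc"));
        System.out.println("atomic: " + atomicMatches("abc"));
        for (String rest : new String[] {" href='x'>", ">", "/>", "?>"}) {
            System.out.println("'" + rest + "' -> " + canContinue(rest));
        }
    }
}
```

With "abc", the plain pattern matches because \w+ backs off to "ab". The atomic pattern fails because \w+ keeps all three characters and cannot be made to return one.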

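The entire pattern now looks like this. It is laid out over several lines for readability, so it has to be compiled with RegexOptions.IgnorePatternWhitespace: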

(?'openTag'<)
    \s*?
    (?'tagName'\??\w+)
    (?>
        (?![\/\?]?>)
        \s*?
        (?'attribName'\w+)
        (?:\s*(?'attribSign'=)\s*)
        (?'attribValue'
            (?:\'[^\']*\'|\"[^\"]*\"|\w+)
        )
    )*
    \s*?
    (?'closeTag'[\/\?]?>)

Or, rendered as a single string (here as a C# verbatim string literal, in which each double quote is written twice):

@"(?'openTag'<)\s*?(?'tagName'\??\w+)(?>(?![\/\?]?>)\s*?(?'attribName'\w+)(?:\s*(?'attribSign'=)\s*)(?'attribValue'(?:\'[^\']*\'|\""[^\""]*\""|\w+)))*\s*?(?'closeTag'[\/\?]?>)"
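For readers who want to try the pattern outside .NET, here is a minimal sketch in Java. It assumes a direct translation of the pattern:

* Java writes named groups as (?<name>...) instead of (?'name'...).
* The lookahead is written as a plain negative lookahead, (?![\/\?]?>).
* The quotes use ordinary Java string escapes.

One difference in behaviour matters. .NET records every capture of a repeated group (in Group.Captures), but Java keeps only the last one. The hypothetical attributes helper therefore rescans the attribute part of each tag with a separate attribute-only pattern. OpeningTagDemo, tagNames and attributes are illustrative names, not part of any library.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OpeningTagDemo {
    // The post's opening-tag pattern, translated to Java named-group syntax.
    static final Pattern OPEN_TAG = Pattern.compile(
        "(?<openTag><)"
        + "\\s*?"
        + "(?<tagName>\\??\\w+)"
        + "(?>"                              // atomic group: one attribute per iteration
        + "(?![\\/\\?]?>)"                   // leave the group as soon as the tag ends
        + "\\s*?"
        + "(?<attribName>\\w+)"
        + "(?:\\s*(?<attribSign>=)\\s*)"
        + "(?<attribValue>(?:'[^']*'|\"[^\"]*\"|\\w+))"
        + ")*"
        + "\\s*?"
        + "(?<closeTag>[\\/\\?]?>)");

    // A single name=value pair, used to rescan a matched tag for all its attributes.
    static final Pattern ATTRIBUTE = Pattern.compile(
        "(?<attribName>\\w+)\\s*=\\s*(?<attribValue>'[^']*'|\"[^\"]*\"|\\w+)");

    /** Returns the tag name of every opening tag the pattern finds in the input. */
    static List<String> tagNames(String input) {
        List<String> names = new ArrayList<>();
        Matcher m = OPEN_TAG.matcher(input);
        while (m.find()) {
            names.add(m.group("tagName"));
        }
        return names;
    }

    /** Returns name -> raw value for each attribute of the first opening tag found. */
    static Map<String, String> attributes(String input) {
        Map<String, String> attrs = new LinkedHashMap<>();
        Matcher m = OPEN_TAG.matcher(input);
        if (!m.find()) {
            return attrs;
        }
        // Java keeps only the last capture of a repeated group, so scan the text
        // between the end of the tag name and the end of the match attribute by attribute.
        Matcher a = ATTRIBUTE.matcher(input).region(m.end("tagName"), m.end());
        while (a.find()) {
            attrs.put(a.group("attribName"), a.group("attribValue"));
        }
        return attrs;
    }

    public static void main(String[] args) {
        String html = "<?xml version=\"1.0\"?><a href=\"x.html\" class=nav><br/></a>";
        System.out.println(tagNames(html));
        System.out.println(attributes("<a href=\"x.html\" class=nav>"));
    }
}
```

The closing tag </a> is not reported, because \??\w+ cannot match the slash that follows the <.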
