Attention: We are retiring the ASP.NET Community Blogs. Learn more >

Fred : Lazy or just plain Greedy?

The question of greediness and backtracking has come up around here quite a bit of late - as they are the essence of what regex is all about - and, while I don't plan on ranting about either just now, it's worth a reminder as to what can cause backtracking can occur.

Given the following source and pattern:

    source = "Friendship"
    pattern = "F.*r"

...it's worth, first of all noting that a Match will definitely occur and that a result of "Fr" will be returned. The important part is really to do with "how" it matches, and for that you must understand that * is Greedy, and because of that, roughly 19 decisions needed to be made - as opposed to 2!

Greedy characters
*, +, and {min,max}
Greedy metacharacters will ALWAYS attempt the optional part first.

Greedy characters made Lazy
*?, +?, and {min,max}?
Greedy metacharacters will ALWAYS skip the optional part first.

My favourite diagram of how Greediness and Laziness work comes from classic text by Jeffrey Friedl:

Understanding Regular Expressions

So, while I won't write about the semantics of the governing ? metacharacter - as it's all well and truly covered in the above article - I will mention that my rule of thumb for a match such as the one above is that, when I know which character(s) I can match to, I will repeat a negated character class, like so:

    source = "Friendship"
    pattern = "F[^r]*r"

...or, in a more abstract scenario such as where I need to match up to a block of text (as with a closing Html tag), I'll use laziness to govern the behaviour of the dot-star combination:

    source = "<b>Hello</b>"
    pattern = "<b>(?'text'.*?)<\/b>"

No Comments