Fred : Lazy or just plain Greedy?
The question of greediness and backtracking has come up around here quite a bit of late - as they are the essence of what regex is all about - and, while I don't plan on ranting about either just now, it's worth a reminder as to what can cause backtracking can occur.
Given the following source and pattern:
source = "Friendship" pattern = "F.*r"
...it's worth, first of all noting that a Match will definitely occur and that a result of "Fr" will be returned. The important part is really to do with "how" it matches, and for that you must understand that * is Greedy, and because of that, roughly 19 decisions needed to be made - as opposed to 2!
Greedy characters
*, +, and {min,max}
Greedy metacharacters will ALWAYS attempt the optional part first.
Greedy characters made Lazy
*?, +?, and {min,max}?
Greedy metacharacters will ALWAYS skip the optional part first.
My favourite diagram of how Greediness and Laziness work comes from classic text by Jeffrey Friedl:
So, while I won't write about the semantics of the governing ? metacharacter - as it's all well and truly covered in the above article - I will mention that my rule of thumb for a match such as the one above is that, when I know which character(s) I can match to, I will repeat a negated character class, like so:
source = "Friendship" pattern = "F[^r]*r"
...or, in a more abstract scenario such as where I need to match up to a block of text (as with a closing Html tag), I'll use laziness to govern the behaviour of the dot-star combination:
source = "<b>Hello</b>" pattern = "<b>(?'text'.*?)<\/b>"