Why not an Expression Query Language?
Regular Expressions are extremely powerful and ugly as all hell.[1] Even with comments and a good RegEx IDE like the Regulator, they're total gibberish. Why not a RegEx 2006 with a more readable syntax?
For instance, take a look at this code snippet from Eric Gunnerson's recent RegEx 101 article[2]:
\d{3} # three digits
- # literal '-'
\d{2} # two digits
- # literal '-'
\d{4} # four digits
$ # end of string
The comments are helpful, but why couldn't those comments be the regular expression? They describe exactly the pattern we're matching, so there's no real reason the parser couldn't compile them directly, or at least convert them to the regex behind the scenes.
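To be fair, some engines already accept that commented layout as-is. Here's a minimal sketch using Python's re.VERBOSE flag (the names SSN_TAIL and matches are mine, just for illustration). Note that the comments are still thrown away by the engine; the cryptic part is still what actually runs, which is the complaint.

```python
import re

# In verbose mode, whitespace in the pattern is ignored and '#' starts a comment.
SSN_TAIL = re.compile(r"""
    \d{3}   # three digits
    -       # literal '-'
    \d{2}   # two digits
    -       # literal '-'
    \d{4}   # four digits
    $       # end of string
""", re.VERBOSE)

def matches(text):
    # The pattern has no start anchor, so search() is used rather than fullmatch().
    return SSN_TAIL.search(text) is not None
```

Since the pattern is only anchored at the end, it accepts any string that finishes with the three-two-four digit group.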
The Regulator has a cool Regex Analyzer feature that does something similar; here's what it does with the same pattern:
Any digit
Exactly 3 times
-
Any digit
Exactly 2 times
-
Any digit
Exactly 4 times
$ (anchor to end of string)
This, again, shows exactly what we want to match, but in a more human-readable form. There's no reason this couldn't be the expression itself. Now, of course, it's easier to include a one-line regex inline with your code, but I don't think that's worth the tradeoff. A more verbose Expression Query Language could be included inline, and would be much more readable. If needed, it could be a separate file - we've got piles of xml, xsd, config, resx, etc. files now, and a regex file or two that were actually readable would be much simpler than including cryptic strings in our code. Why don't we treat these things like small stored procedures?
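As a rough sketch of how little machinery that would take, here's a toy compiler in Python that turns analyzer-style text into a regex. The vocabulary and the names (ATOMS, REPEAT, compile_description) are made up for this example; it only handles the handful of phrases shown above and is not how the Regulator works.

```python
import re

# Hypothetical vocabulary, modeled on the analyzer-style lines quoted above.
ATOMS = {
    "any digit": r"\d",
    "$ (anchor to end of string)": "$",
}
REPEAT = re.compile(r"exactly (\d+) times", re.IGNORECASE)

def compile_description(text):
    """Translate one-phrase-per-line text into a regex source string."""
    parts = []
    for raw in text.strip().splitlines():
        line = raw.strip()
        if not line:
            continue
        repeat = REPEAT.fullmatch(line)
        if repeat:
            if not parts:
                raise ValueError("repetition with nothing to repeat")
            # A repetition applies to the element on the previous line.
            parts[-1] += "{%s}" % repeat.group(1)
        elif line.lower() in ATOMS:
            parts.append(ATOMS[line.lower()])
        else:
            # Anything unrecognized is a literal; group it so a following
            # repetition covers the whole literal, not just its last character.
            parts.append("(?:%s)" % re.escape(line))
    return "".join(parts)

DESCRIPTION = """
Any digit
Exactly 3 times
-
Any digit
Exactly 2 times
-
Any digit
Exactly 4 times
$ (anchor to end of string)
"""

pattern = compile_description(DESCRIPTION)
```

The readable text could live in its own file and be compiled at load time, which is roughly the "small stored procedure" idea.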
I found a thread on the Python newsgroups discussing an improved RegEx syntax. One interesting idea is RegEx Builder (RXB) - it lets you build regexes using verbose language:
digit + some(whitespace) + exactly('example')
which would generate \d\s+example.
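To show the idea is cheap to build, here's a minimal Python sketch of that builder style. The names digit, whitespace, some and exactly come from the example above; the Pattern class and everything inside it are my own assumptions, not RXB's actual implementation.

```python
import re

class Pattern:
    """A regex fragment that can be chained with '+' (hypothetical helper)."""

    def __init__(self, source):
        self.source = source

    def __add__(self, other):
        # Concatenating two fragments concatenates their regex source.
        return Pattern(self.source + other.source)

    def compile(self, flags=0):
        return re.compile(self.source, flags)

digit = Pattern(r"\d")
whitespace = Pattern(r"\s")

def some(pattern):
    """One or more repetitions of the given fragment."""
    return Pattern("(?:%s)+" % pattern.source)

def exactly(text):
    """Match the given text literally, escaping any regex metacharacters."""
    return Pattern(re.escape(text))

expr = digit + some(whitespace) + exactly("example")
```

This sketch produces \d(?:\s)+example rather than \d\s+example; the extra non-capturing group is a simplification so that some() is also safe on multi-character fragments, and the two patterns match the same strings.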
Wrappers, utility classes, and copious comments are a step in the right direction, but magic strings like
"\w?<\s?\/?[^\s>]+(\s+[^"'=]+(=("[^"]*")|('[^\']*')|([^\s"'>]*))?)*\s*\/?>"
shouldn't be anywhere near professional development languages circa 2005, especially when compilers are capable of doing things like LINQ. We need an Expression Query Language. How about Language Integrated Expressions (LINE)?
[1] Yes, Jeff, that's an intentional GoogleBomb.
[2] That's a simple RegEx for the purpose of illustration. Read Jeff's post on RegEx Abuse if you don't see the problem. I've written my share of complex regexes and I bet you have, too, if you've read this far. Sure, we can write code in assembly language, but it's not productive or maintainable.