Making RegEx more readable
Compare the following code statements defining the same regular expression in .NET:
static readonly Regex ParameterReference = new Regex(@"(?\<\>)|\<(?[^\<\>]+)\>|(?\<[^\<\>]*(?!\>))",
RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);
static readonly Regex ParameterReference = new Regex(@"
# Matches invalid empty brackets #
(?\<\>)|
# Matches a valid parameter reference #
\<(?[^\<\>]+)\>|
# Matches opened brackets that are not properly closed #
(?\<[^\<\>]*(?!\>))",
RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);
While the former is still understandable to a fairly regex-aware developer, the latter is far more explicit about the purpose of each part. Placing comments inside the expression is enabled by RegexOptions.IgnorePatternWhitespace, an option that developers do not use often enough. For an expression this simple the comments may seem unnecessary, but imagine a regex-based parser that processes (CodeSmith-like) template files:
static Regex CodeExpression = new Regex(@"
# First match the full directives #
<\#\s*@\s+(?\w*)(?.*?)\#\/>(?:\W*\n)?|
# Match open tag #
(?<\#)|
# Match close tag #
(?\#\/>)|
# This is a simple expression that is output as-is to output.Write(
It's pretty obvious that leaving such complex expressions uncommented makes them almost unreadable to anyone but the person who wrote them (and, after some time, even to that person!). Bottom line: ALWAYS comment your expressions in-line!
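To make the idea concrete, here is a minimal, self-contained sketch of a commented expression in the spirit of the parameter-reference example. The group names (empty, name, open), the class name, and the Classify and IssueId helpers are assumptions made up for this illustration, not the names used in the original code. It also shows a side effect of RegexOptions.IgnorePatternWhitespace worth remembering: unescaped whitespace is ignored and an unescaped # starts a comment, so a literal space or # in the pattern has to be escaped.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ParameterReferenceDemo
{
    // Group names "empty", "name" and "open" are hypothetical, chosen for this sketch.
    static readonly Regex ParameterReference = new Regex(@"
        # Matches invalid empty brackets
        (?<empty>\<\>)|
        # Matches a valid parameter reference
        \<(?<name>[^\<\>]+)\>|
        # Matches opened brackets that are not properly closed
        (?<open>\<[^\<\>]*(?!\>))",
        RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);

    // With IgnorePatternWhitespace, a literal '#' or space must be escaped.
    static readonly Regex IssueTag = new Regex(@"
        \#\ (?<id>\d+)   # a literal hash, one literal space, then the digits",
        RegexOptions.IgnorePatternWhitespace);

    // Reports which alternative of ParameterReference matched first in the input.
    public static string Classify(string input)
    {
        Match m = ParameterReference.Match(input);
        if (!m.Success) return "none";
        if (m.Groups["empty"].Success) return "empty";
        if (m.Groups["name"].Success) return "name:" + m.Groups["name"].Value;
        return "open";
    }

    // Returns the digits following "# " in the input, or an empty string if absent.
    public static string IssueId(string input)
    {
        return IssueTag.Match(input).Groups["id"].Value;
    }

    public static void Main()
    {
        Console.WriteLine(Classify("Hello <user>")); // name:user
        Console.WriteLine(Classify("Hello <>"));     // empty
        Console.WriteLine(Classify("Hello <user"));  // open
        Console.WriteLine(Classify("Hello user"));   // none
        Console.WriteLine(IssueId("see # 42"));      // 42
    }
}
```

Naming the groups lets the calling code read almost like the comments in the pattern: each branch of Classify corresponds to one commented alternative.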