Regex Reminders [1] - Replacement Operations
A common application for regular expressions is to find and replace strings within a body of text. These operations range from something as simple as finding some text and removing it to locating and rewriting HTML tags. Let's take a quick look at the signatures of the .NET Regular Expression Replace overloads.
[VB]
Instance
--------
Replace( ByVal input As String, ByVal replacement As String ) As String
Replace( ByVal input As String, ByVal replacement As String, ByVal count As Integer ) As String
Replace( ByVal input As String, ByVal replacement As String, ByVal count As Integer, ByVal startat As Integer ) As String
Replace( ByVal input As String, ByVal evaluator As MatchEvaluator ) As String
Replace( ByVal input As String, ByVal evaluator As MatchEvaluator, ByVal count As Integer ) As String
Replace( ByVal input As String, ByVal evaluator As MatchEvaluator, ByVal count As Integer, ByVal startat As Integer ) As String
Shared
------
Replace( ByVal input As String, ByVal pattern As String, ByVal replacement As String ) As String
Replace( ByVal input As String, ByVal pattern As String, ByVal replacement As String, ByVal options As RegexOptions ) As String
Replace( ByVal input As String, ByVal pattern As String, ByVal evaluator As MatchEvaluator ) As String
Replace( ByVal input As String, ByVal pattern As String, ByVal evaluator As MatchEvaluator, ByVal options As RegexOptions ) As String
[C#]
instance
--------
string Replace( string input, string replacement )
string Replace( string input, string replacement, int count )
string Replace( string input, string replacement, int count, int startat )
string Replace( string input, MatchEvaluator evaluator )
string Replace( string input, MatchEvaluator evaluator, int count )
string Replace( string input, MatchEvaluator evaluator, int count, int startat )
static
------
string Replace( string input, string pattern, string replacement )
string Replace( string input, string pattern, string replacement, RegexOptions options )
string Replace( string input, string pattern, MatchEvaluator evaluator )
string Replace( string input, string pattern, MatchEvaluator evaluator, RegexOptions options )
See the MSDN documentation for Regex.Replace.
So, as you can see, there are instance and static (Shared) methods available for replacement operations. The most significant difference between the overloads is that the replacement argument can be either a plain string or a MatchEvaluator. You would use a MatchEvaluator when you need to perform additional operations on the matched item, which we'll look at in a moment.
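The count overloads in the list above are easy to overlook, so here is a minimal sketch of one. The class name CountDemo, the method name and the input string are made up for illustration; only Regex.Replace itself comes from the framework.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CountDemo
{
    // Replaces at most `count` matches of "blah", scanning left to right.
    // (Hypothetical helper, not part of the original samples.)
    public static string ReplaceFirst( string input, int count )
    {
        Regex re = new Regex( "blah" ) ;
        return re.Replace( input, "foo", count ) ;
    }

    public static void Main()
    {
        // Only the first two occurrences are replaced
        Console.WriteLine( ReplaceFirst( "blah blah blah", 2 ) ) ; // foo foo blah
    }
}
```

Matches beyond the count are left exactly as they were in the input.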
Simple Replacement - straight replace
A straight replacement is useful when you want to look for patterns of text and simply replace them with another string (perhaps a zero-length string). For example, you might want to find specific words and CAPITALIZE them, replace them with another word, or just plain-old remove them. Here are some examples of simple, straight replacements:
Find a word and Capitalize it (using an instance method):
[VB]
Dim source As String = "This is a body of text"
Dim find As String = "body"
Dim re As New Regex( find, RegexOptions.IgnoreCase )
Dim result As String = "Result: " & re.Replace( source, find.ToUpper() )
[C#]
string source = @"This is a body of text" ;
string find = "body" ;
Regex re = new Regex( find, RegexOptions.IgnoreCase ) ;
string result = "Result: " + re.Replace( source, find.ToUpper() ) ;
Trim leading and trailing whitespace (using a static method):
[VB]
Dim source As String = " foo "
Dim find As String = "(^\s+)|(\s+$)"
Dim result As String = "Result: " & Regex.Replace( source, find, "", RegexOptions.IgnoreCase )
[C#]
string source = " foo " ;
string find = @"(^\s+)|(\s+$)" ;
string result = "Result: " + Regex.Replace( source, find, "", RegexOptions.IgnoreCase ) ;
Simple Replacement - referring to captured items
Another common use of Replace involves referring to the captured groups using the special $n or ${groupName} notation, for example:
Swap names around (using ordinal notation):
[VB]
Dim source As String = String.Format("Mr. Darren Neimke{0}Mr. John Doe{0}Ms. Jane Doe", Environment.NewLine )
Dim find As String = "(\w+\.)\s+(\b\w+\b)\s+(\b\w+\b)"
Dim result As String = Regex.Replace( source, find, "$3, $1 $2" )
[C#]
string source = String.Format("Mr. Darren Neimke{0}Mr. John Doe{0}Ms. Jane Doe", Environment.NewLine ) ;
string find = @"(\w+\.)\s+(\b\w+\b)\s+(\b\w+\b)" ;
string result = "Result: " + Regex.Replace( source, find, @"$3, $1 $2" ) ;
Format .com urls into hyperlinks (using named groups notation):
[VB]
Dim source As String = "http://www.regexlib.com"
Dim find As String = "(?'Url'http\:\/\/www\.\w+\.com)"
Dim result As String = "Result: " & Regex.Replace( source, find, "<a href=""${Url}"">${Url}</a>" )
[C#]
string source = @"http://www.regexlib.com" ;
string find = @"(?'Url'http\:\/\/www\.\w+\.com)" ;
string result = "Result: " + Regex.Replace( source, find, @"<a href=""${Url}"">${Url}</a>" ) ;
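One thing to watch with the ${groupName} notation: group names are case-sensitive, and a name that does not exist in the pattern is not an error, it is simply copied into the output as literal text. A small sketch of that behaviour follows; the class name GroupNameDemo, the helper Link and the bracketed replacement strings are assumptions made up for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupNameDemo
{
    // The group is named 'Url' with a capital U
    const string Pattern = @"(?'Url'http://www\.\w+\.com)" ;

    // Hypothetical helper: applies a replacement string to the pattern above
    public static string Link( string input, string replacement )
    {
        return Regex.Replace( input, Pattern, replacement ) ;
    }

    public static void Main()
    {
        string source = "http://www.regexlib.com" ;
        // Name matches the group exactly, so the capture is substituted
        Console.WriteLine( Link( source, "[${Url}]" ) ) ; // [http://www.regexlib.com]
        // No group called 'url', so the token is emitted literally
        Console.WriteLine( Link( source, "[${url}]" ) ) ; // [${url}]
    }
}
```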
Simple Replacement - limiting the scope of replacement
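You can use the overloaded instance method that takes a startat argument to make the regex skip the start of the input and match only from a given position onwards. These overloads also take a count, so the examples pass a number far larger than the expected number of matches: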
Emboldens the word "blah" but only when found in the body of the document:
[VB]
Dim source As String = "<html><head><title>blah</title></head><body>blah</body></html>"
Dim find As String = "(?'theWord'blah)"
Dim re As New Regex( find, RegexOptions.IgnoreCase )
Dim result As String = "Result: " & re.Replace( source, "<b>${theWord}</b>", 9999, source.IndexOf("<body>") )
[C#]
string source = @" <html>
<head>
<title>blah</title>
</head>
<body>
blah
</body>
</html>" ;
string find = @"(?'theWord'blah)" ;
Regex re = new Regex( find, RegexOptions.IgnoreCase ) ;
string result = "Result: " + re.Replace( source, @"<b>${theWord}</b>", 9999, source.IndexOf("<body>") ) ;
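The examples above assume the input always contains a body tag. If it does not, IndexOf returns -1 and Replace throws an ArgumentOutOfRangeException for startat. A hedged sketch of a guarded version follows; the class and method names are hypothetical, and it passes -1 for count (meaning "replace every match") instead of 9999.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StartAtDemo
{
    // Hypothetical helper: emboldens "blah" only at or after the <body> tag
    public static string BoldInBody( string html )
    {
        int start = html.IndexOf( "<body>", StringComparison.OrdinalIgnoreCase ) ;
        if( start < 0 )
            return html ; // no body tag: leave the input untouched

        Regex re = new Regex( "blah", RegexOptions.IgnoreCase ) ;
        // count = -1 replaces every match found at or after startat;
        // $0 stands for the whole match
        return re.Replace( html, "<b>$0</b>", -1, start ) ;
    }

    public static void Main()
    {
        string source = "<html><head><title>blah</title></head><body>blah</body></html>" ;
        Console.WriteLine( BoldInBody( source ) ) ;
    }
}
```

The text before startat is still returned as part of the result; it is just never searched for matches.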
MatchEvaluator - using a delegate as the replacement argument
By using an overload that takes a MatchEvaluator for the replacement argument you can point to a delegate method that builds the replacement string dynamically. This is especially useful if you need to perform more complicated logic on the matched text before inserting its replacement. Each match in the replace operation is handed off to the method that you define for your MatchEvaluator, and that method can do whatever it likes with the match as long as it returns a string, which is then used as the replacement value. Here's an example using a MatchEvaluator that emboldens the word "foo" but only when NOT found as part of ANCHORS:
Match "foo" not in anchors using MatchEvaluator
[VB]
Dim source As String = "<a >foo</a>foo"
Dim find As String = "(?'Url'<a [^>]*>.*?</a>)|(?'theWord'foo)"
Dim re As New Regex( find, RegexOptions.IgnoreCase )
' use a MatchEvaluator with a pointer to the delegate method
Dim result As String = re.Replace( source, New MatchEvaluator( AddressOf FormatLinkBits ) )
' Delegate method
Private Function FormatLinkBits( ByVal m As Match ) As String
If m.Groups("theWord").Success Then
Dim theWord As String = m.Groups("theWord").Value
Return "<b>" & theWord & "</b>"
Else
Return m.Value
End If
End Function
[C#]
string source = "<a >foo</a>foo" ;
Regex re = new Regex( @"(?'Url'<a [^>]*>.*?</a>)|(?'theWord'foo)", RegexOptions.IgnoreCase ) ;
// use a MatchEvaluator with a pointer to the delegate method
string result = re.Replace( source, new MatchEvaluator( FormatLinkBits ) ) ;
// delegate method
private string FormatLinkBits( Match m )
{
if( m.Groups["theWord"].Success )
{
string theWord = m.Groups["theWord"].Value ;
return "<b>" + theWord + "</b>" ;
}
else
return m.Value ;
}
Here's another example that filters bad words and replaces them with censored text:
[VB]
Dim source As String = "poo and browner are swearing expressions!"
Dim find As String = "(shit|poo|crap|turd|browner)"
Dim re As New Regex( find, RegexOptions.IgnoreCase )
' use a MatchEvaluator with a pointer to the delegate method
Dim result As String = re.Replace( source, New MatchEvaluator( AddressOf FormatSwearWords ) )
' delegate method
Private Function FormatSwearWords( ByVal m As Match ) As String
Dim l As Integer = m.Value.Length
Return m.Value.Substring(0, 2) & new String("#"c, l - 2)
End Function
[C#]
string source = "poo and browner are swearing expressions!" ;
Regex re = new Regex( "shit|poo|crap|turd|browner", RegexOptions.IgnoreCase ) ;
// use a MatchEvaluator with a pointer to the delegate method
string result = re.Replace( source, new MatchEvaluator( FormatSwearWords ) ) ;
private string FormatSwearWords( Match m )
{
int len = m.Value.Length ;
return m.Value.Substring(0, 2) + new String('#', len - 2) ;
}
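On C# 3.0 or later the evaluator does not need to be a separate named method; a lambda converts to MatchEvaluator directly. The following is only a sketch of the censoring example rewritten that way, not a replacement for the version above. The class name CensorDemo and the shortened word list are assumptions, and two changes are swapped in: \b word boundaries so that words merely containing a listed word are left alone, and a Math.Min guard so a match shorter than two characters could not make Substring throw.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CensorDemo
{
    // Hypothetical helper: keeps the first two letters of each listed word
    // and masks the remainder with '#'
    public static string Censor( string input )
    {
        return Regex.Replace(
            input,
            @"\b(poo|browner)\b",
            m =>
            {
                string word = m.Value ;
                int keep = Math.Min( 2, word.Length ) ; // never read past the end
                return word.Substring( 0, keep ) + new string( '#', word.Length - keep ) ;
            },
            RegexOptions.IgnoreCase ) ;
    }

    public static void Main()
    {
        Console.WriteLine( Censor( "poo and browner are swearing expressions!" ) ) ;
        // po# and br##### are swearing expressions!
    }
}
```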