ÜberUtils - Part 3 : Strings
ÜberUtils Series posts so far :
- Part 1 : Cryptography - Hashing
- Part 2 : Cryptography (Continued) - Encryption
- Part 3 : Strings
- Part 4 : Collections
So every developer has (or should have) a utilities class for strings. It seems the built-in string class never has enough (well, for me in any case). So I hereby introduce my string utils class. It actually comprises 3 files:
- Strings.cs (the actual string utils)
- SafeConvert.cs (a class for doing common conversions)
- Extensions/Strings.cs (extension methods using the string utils)
Here is the class diagram of the Strings class :
As you can see, it has a nested class Regex which is also static. More on this later. Let's cover the string utility methods first (in 'logical' order):
- IsEmpty - returns true if the object passed in is either null or has a length of zero (exactly like string.IsNullOrEmpty but can take an object as input)
- IsNumeric - returns true if we are dealing with a numeric value. Uses the regular expression : @"^\-?\(?([0-9]{0,3}(\,?[0-9]{3})*(\.?[0-9]*))\)?$". This matches a positive or negative value with any precision and scale (whole number or decimal). It also allows for left-padded zeros, commas as group separators, or parentheses to indicate a negative number
- IsEmail - returns true if the string is an email address. Uses the regular expression : @"([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)"
- Trim - exactly like "abc".Trim() but adds checking for nulls
- CutWhitespace - cuts excess whitespace from a string as well as trimming it
- eg. Strings.CutWhitespace(" 12 34 5 6 7 ") == "12 34 5 6 7"
- CutEnd - chops the last n chars off the end of a string
- eg. Strings.CutEnd("1234567890", 3) == "1234567"
- CutStart - chops the first n chars off the beginning of a string
- eg. Strings.CutStart("1234567890", 3) == "4567890"
- Start - returns the first n chars of a string
- eg. Strings.Start("1234567890", 3) == "123"
- End - returns the last n chars of a string
- eg. Strings.End("1234567890", 3) == "890"
- GetOccurences - returns an array of strings that are found within another string based on a regular expression
- eg. Strings.GetOccurences("say day bay toy", "[sdbt]ay") == new string[] {"say" , "day" , "bay"}
- eg. Strings.GetOccurences("123asdasd 1sk 555 sdkfjsdfkl999", "\\d+") == new string [] {"123" , "1" , "555" , "999"}
- OccurenceCount - returns the count of strings found within another string based on a regular expression
- eg. Strings.OccurenceCount("the cat sat on the mat", "at") == 3
- eg. Strings.OccurenceCount("abcabc", "a") == 2
- Combine - combines a string array by a delimiter (or not) (DEPRECATED - read update and comments)
- eg. Strings.Combine(Strings.GetOccurences("123asdasd 1sk 555 sdkfjsdfkl999", "\\d+"), ",") == "123,1,555,999"
- eg. Strings.Combine(new string[] { "a", "b", "c", "d" }, ";") == "a;b;c;d"
- ToPaddedNumber - returns a zero padded number (DEPRECATED - read update and comments)
- eg. Strings.ToPaddedNumber("123", 5) == "00123"
- XOR - performs a binary XOR operation on each char in the input string based on a key. Very simple form of encryption where XOR(XOR(input)) == input
- eg. Strings.XOR(Strings.XOR("test", "key"), "key") == "test"
- ToTitleCase - returns the title case of a string
- eg. Strings.ToTitleCase("this is a title") == "This Is A Title"
- ToFriendlyName - returns what I call a "friendly" version of a string. I use this mainly for converting a database field name into a user friendly name
- eg. Strings.ToFriendlyName("IAmNotFriendly") == "I Am Not Friendly"
- eg. Strings.ToFriendlyName("SomePrimaryKeyId") == "Some Primary Key"
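A few of the helpers above could be written roughly as follows. This is only a sketch and not the downloadable source: the class name StringsSketch, the null handling, the clamping of n, and the empty-string guard in IsNumeric are all assumptions made for illustration.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Illustrative sketch only; names and edge-case behaviour are assumptions.
public static class StringsSketch
{
    // The IsNumeric pattern quoted in the post.
    private const string NumericPattern =
        @"^\-?\(?([0-9]{0,3}(\,?[0-9]{3})*(\.?[0-9]*))\)?$";

    // True when the object is null or its string form has zero length.
    public static bool IsEmpty(object value)
    {
        return value == null || string.IsNullOrEmpty(value.ToString());
    }

    // The pattern on its own also matches "", so empty input is rejected
    // first (an assumption about the intended behaviour).
    public static bool IsNumeric(string value)
    {
        return !IsEmpty(value) && Regex.IsMatch(value, NumericPattern);
    }

    // Keeps count within 0..Length so Substring never throws (assumed).
    private static int Clamp(string value, int count)
    {
        return Math.Max(0, Math.Min(count, value.Length));
    }

    // First n chars.
    public static string Start(string value, int count)
    {
        return value == null ? null : value.Substring(0, Clamp(value, count));
    }

    // Last n chars.
    public static string End(string value, int count)
    {
        return value == null ? null : value.Substring(value.Length - Clamp(value, count));
    }

    // Everything except the first n chars.
    public static string CutStart(string value, int count)
    {
        return value == null ? null : value.Substring(Clamp(value, count));
    }

    // Everything except the last n chars.
    public static string CutEnd(string value, int count)
    {
        return value == null ? null : value.Substring(0, value.Length - Clamp(value, count));
    }

    // XORs each char with the key char at the same position (key repeats).
    // Applying it twice with the same key restores the input.
    public static string XOR(string input, string key)
    {
        if (string.IsNullOrEmpty(input) || string.IsNullOrEmpty(key)) return input;
        var result = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            result.Append((char)(input[i] ^ key[i % key.Length]));
        }
        return result.ToString();
    }
}
```

Note that the XOR output can contain control characters, so it is best treated as opaque data rather than displayable text.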
Now onto the Regex class. The static Regex class just wraps regular expression functionality and contains a few commonly used expressions as constants. Here is the rundown :
- IsExactMatch - returns true if a string is an exact match for a pattern
- eg. Strings.Regex.IsExactMatch("test@google.com", Strings.Regex.REGEX_EMAIL) == true
- Contains - returns true if a string contains a pattern
- eg. Strings.Regex.Contains("here is my email : test@google.com", Strings.Regex.REGEX_EMAIL) == true
- Replace - returns a string with a pattern replaced by another string
- eg. Strings.Regex.Replace("1 23 a 456", @"\d+", "!") == "! ! a !"
- GetMatch - returns the first match of pattern within a string
- eg. Strings.Regex.GetMatch("Subject: Test Subject\r\n", @"Subject\s*\:\s*(?<SubjectReturn>.*)\r\n", "SubjectReturn") == "Test Subject"
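To make the four wrappers concrete, here is a minimal sketch of how they might sit on top of System.Text.RegularExpressions. The class name RegexSketch is hypothetical, and so are the choices to anchor the pattern for an exact match and to return null when GetMatch finds nothing; the real source may do this differently.

```csharp
using System.Text.RegularExpressions;

// Illustrative sketch only; not the downloadable source.
public static class RegexSketch
{
    // True only when the whole input matches the pattern. The pattern is
    // wrapped in a non-capturing group and anchored at both ends.
    public static bool IsExactMatch(string input, string pattern)
    {
        return input != null && Regex.IsMatch(input, @"\A(?:" + pattern + @")\z");
    }

    // True when the pattern matches anywhere in the input.
    public static bool Contains(string input, string pattern)
    {
        return input != null && Regex.IsMatch(input, pattern);
    }

    // Replaces every match of the pattern with the replacement text.
    public static string Replace(string input, string pattern, string replacement)
    {
        return input == null ? null : Regex.Replace(input, pattern, replacement);
    }

    // Returns the named group from the first match, or null if no match
    // (the null result is an assumption).
    public static string GetMatch(string input, string pattern, string groupName)
    {
        if (input == null) return null;
        Match match = Regex.Match(input, pattern);
        return match.Success ? match.Groups[groupName].Value : null;
    }
}
```

Anchoring inside IsExactMatch means the same pattern constant can be reused for both Contains and IsExactMatch without keeping two versions of it.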
Now onto the SafeConvert class. It contains the following methods :
- ToBoolean - returns a boolean value from an object
- ToInt - returns an integer value from an object
- ToDecimal - returns a decimal value from an object
- ToDouble - returns a double value from an object
- ToHexString - returns a hexadecimal string representation of a byte array. This is used from Extensions\ByteArray.cs
- ToStream - returns a System.IO.MemoryStream from a string
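The post does not say what these conversions do when the input cannot be converted, so the sketch below makes some assumptions: it returns a default value (0 or false) instead of throwing, parses with the invariant culture, emits upper-case hex, and uses UTF-8 for ToStream. The class name SafeConvertSketch is hypothetical.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

// Illustrative sketch only; fallback values and encodings are assumptions.
public static class SafeConvertSketch
{
    // Null becomes "", everything else is formatted culture-independently.
    private static string AsText(object value)
    {
        return Convert.ToString(value, CultureInfo.InvariantCulture) ?? string.Empty;
    }

    public static bool ToBoolean(object value)
    {
        bool result;
        return bool.TryParse(AsText(value), out result) && result;
    }

    public static int ToInt(object value)
    {
        int result;
        return int.TryParse(AsText(value), NumberStyles.Integer,
            CultureInfo.InvariantCulture, out result) ? result : 0;
    }

    public static decimal ToDecimal(object value)
    {
        decimal result;
        return decimal.TryParse(AsText(value), NumberStyles.Number,
            CultureInfo.InvariantCulture, out result) ? result : 0m;
    }

    public static double ToDouble(object value)
    {
        double result;
        return double.TryParse(AsText(value), NumberStyles.Float | NumberStyles.AllowThousands,
            CultureInfo.InvariantCulture, out result) ? result : 0d;
    }

    // BitConverter yields "0F-A0"; removing the dashes gives "0FA0".
    public static string ToHexString(byte[] bytes)
    {
        return bytes == null ? string.Empty : BitConverter.ToString(bytes).Replace("-", "");
    }

    // Wraps the UTF-8 bytes of the string in a MemoryStream.
    public static MemoryStream ToStream(string value)
    {
        return new MemoryStream(Encoding.UTF8.GetBytes(value ?? string.Empty));
    }
}
```

With this approach a value such as "abc" passed to ToInt quietly becomes 0, which is convenient for display code but can hide bad input, so it is worth deciding per call site whether a silent default is acceptable.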
So that's version 1 of the strings utilities. I say version 1 because I will no doubt add to this over the next couple of posts.
Oh yes, and again we have a whole bunch of new extension methods :
- Start
- End
- CutStart
- CutEnd
- OccurenceCount
- GetOccurences
- ReplaceAll - similar to Replace, but uses a regular expression to do the replacement
- Split - similar to Split(char c) but takes a string pattern to split using regular expressions
- Combine (DEPRECATED - read update and comments)
- Join - an extension method for string arrays wrapping the string.Join method
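For readers who have not written extension methods before, the sketch below shows the general shape a few of these could take. It is not the downloadable source: the class name StringExtensionsSketch is hypothetical and the bodies are the simplest thing that reproduces the examples given earlier in the post.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative sketch only; not the downloadable source.
public static class StringExtensionsSketch
{
    // First n chars (n is clamped to the string length, an assumption).
    public static string Start(this string value, int count)
    {
        return value.Substring(0, Math.Max(0, Math.Min(count, value.Length)));
    }

    // Last n chars (n is clamped to the string length, an assumption).
    public static string End(this string value, int count)
    {
        int n = Math.Max(0, Math.Min(count, value.Length));
        return value.Substring(value.Length - n);
    }

    // Like string.Replace, but the first argument is a regular expression.
    public static string ReplaceAll(this string value, string pattern, string replacement)
    {
        return Regex.Replace(value, pattern, replacement);
    }

    // Number of non-overlapping matches of the pattern.
    public static int OccurenceCount(this string value, string pattern)
    {
        return Regex.Matches(value, pattern).Count;
    }

    // The text of every non-overlapping match, in order.
    public static string[] GetOccurences(this string value, string pattern)
    {
        MatchCollection matches = Regex.Matches(value, pattern);
        var result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            result[i] = matches[i].Value;
        }
        return result;
    }
}
```

With the class in scope, calls read like `"1234567890".Start(3)` or `"the cat sat on the mat".OccurenceCount("at")`. The regex-based Split is left out of the sketch on purpose: newer .NET versions already have an instance string.Split(string) overload, and an instance method always wins over an extension method with the same name.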
Now I know some people might argue that this is extension method abuse, but look at how much more power my strings have :
... and anything that helps me code quicker and smarter is not abuse in my book - it's smart coding!
Download the source code and unit tests here
UPDATE - thanks to Dan's comments we found a bug in the email regular expression whereby it would not allow the domain ".museum", so I changed the regex to
@"([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,8}|[0-9]{1,8})(\]?)" (the changes are the two quantifiers near the end: {2,4} became {2,8} and {1,3} became {1,8})
Please note that email validation seems to be a touchy point for many developers, as can be seen over at haacked.com . I would suggest not using ANY email validation like this for restricting comments or purchases online, as you would be limiting your site's reach. Source code and unit tests have been updated.
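The effect of the change can be checked with a small comparison of the two patterns. The class name EmailPatternSketch, the IsExactMatch helper, and the address curator@example.museum are made up for this illustration; only the two pattern strings come from the post.

```csharp
using System.Text.RegularExpressions;

// Illustrative sketch only; helper and class names are hypothetical.
public static class EmailPatternSketch
{
    // Original pattern: top-level domain limited to 2-4 letters.
    public const string OldPattern =
        @"([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)";

    // Updated pattern: top-level domain may be 2-8 letters.
    public const string NewPattern =
        @"([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,8}|[0-9]{1,8})(\]?)";

    // Whole-string match, done by anchoring the pattern at both ends.
    public static bool IsExactMatch(string input, string pattern)
    {
        return input != null && Regex.IsMatch(input, @"\A(?:" + pattern + @")\z");
    }
}
```

Under these assumptions, IsExactMatch("curator@example.museum", OldPattern) is false because "museum" is six letters long, while the same call with NewPattern is true. Any top-level domain longer than eight letters would still be rejected, which is one more reason to treat this kind of check as a hint and not a gate.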
UPDATE - thanks to Scott Hanselman for pointing out that ToPaddedNumber is redundant as the string class has a PadLeft (as well as a PadRight) method - DOH! Source code and unit tests have been updated.
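For reference, the built-in replacement looks like this. The wrapper name ZeroPad and the class PaddingSketch are hypothetical and exist only so the call has somewhere to live; string.PadLeft itself is the standard framework method.

```csharp
// Illustrative sketch only; ZeroPad is a made-up wrapper name.
public static class PaddingSketch
{
    // Left-pads with '0' up to the given total width. Input that is
    // already at least that wide is returned unchanged.
    public static string ZeroPad(string number, int width)
    {
        return number.PadLeft(width, '0');
    }
}
```

So the old Strings.ToPaddedNumber("123", 5) example becomes simply "123".PadLeft(5, '0').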
UPDATE - thanks to Don and John for pointing out the fact that my Combine method is redundant as the string.Join method does the exact same thing. - oops ;)
I then renamed my extension method Combine to Join and changed it to wrap the string.Join functionality. Again, source and tests have been updated.
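A wrapper of that kind can be very small. The sketch below is an assumption about its shape, not the updated source, and the class name StringArrayExtensionsSketch is hypothetical.

```csharp
// Illustrative sketch only; not the downloadable source.
public static class StringArrayExtensionsSketch
{
    // Lets a string array be joined with array.Join(";"), which reads
    // left to right, while string.Join does the actual work.
    public static string Join(this string[] values, string delimiter)
    {
        return string.Join(delimiter, values);
    }
}
```

With this in scope, new string[] { "a", "b", "c", "d" }.Join(";") gives the same "a;b;c;d" as the old Combine example.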
NOTE - I renamed the static extension classes so that you could include both the Utils and Utils.Extensions namespaces without getting the build error : 'Strings' is an ambiguous reference between 'Utils.Strings' and 'Utils.Extensions.Strings'. Please get the latest source.
Thanks for all the comments and feedback, and please keep it coming. Collaboration and a LOT of testing is the only way to produce robust, useful code!