Removing diacritics (accents) from strings
It's often useful to remove diacritic marks (often called accent marks) from characters. You know: tilde, cédille, umlaut and friends. This means 'é' becomes 'e', 'ü' becomes 'u' or 'à' becomes 'a'. This could be used for indexing or to build simple URLs, for example.
Doing so is not so easy if you don't know the trick. You can play with String.Replace or regular expressions... But do you know .NET 2 has all that is required to make this easier?
You should use this kind of code for example:
public static String RemoveDiacritics(String s)
{
String normalizedString = s.Normalize(NormalizationForm.FormD);
StringBuilder stringBuilder = new StringBuilder();
for (int i = 0; i < normalizedString.Length; i++)
{
Char c = normalizedString[i];
if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
stringBuilder.Append(c);
}
return stringBuilder.ToString();
}
This piece of code comes from Michael Kaplan's blog. I've been using it to clean my URL keys. It works perfectly and I suspect it's more efficient than other solutions.