Cleaning invalid characters from SharePoint
I stumbled onto one of those "gotchas" you get with SharePoint. We were creating new document libraries based on user names in a domain. A change came in and we had to support multiple domains, so a document library name would need a domain identifier (since the same user name can exist in two different domains). During acceptance testing we found that when a document library was created with a dash in its name (we were using a [domain]-[username] pattern), SharePoint would strip the dash out (without telling you, of course). This caused a bit of a headache because the email we send out includes a link to the library, and the URL in that link was invalid.
I remember this from a million years ago (as I'm replacing a few SharePoint brain cells with Ruby ones lately) so after a bit of Googling I found a great article by Eric Legault here on the matter.
Here's a small method with a unit test class to handle this cleansing of names.
public static string CleanInvalidCharacters(string name)
{
    string cleanName = name;

    // remove invalid characters
    cleanName = cleanName.Replace(@"#", string.Empty);
    cleanName = cleanName.Replace(@"%", string.Empty);
    cleanName = cleanName.Replace(@"&", string.Empty);
    cleanName = cleanName.Replace(@"*", string.Empty);
    cleanName = cleanName.Replace(@":", string.Empty);
    cleanName = cleanName.Replace(@"<", string.Empty);
    cleanName = cleanName.Replace(@">", string.Empty);
    cleanName = cleanName.Replace(@"?", string.Empty);
    cleanName = cleanName.Replace(@"\", string.Empty);
    cleanName = cleanName.Replace(@"/", string.Empty);
    cleanName = cleanName.Replace(@"{", string.Empty);
    cleanName = cleanName.Replace(@"}", string.Empty);
    cleanName = cleanName.Replace(@"|", string.Empty);
    cleanName = cleanName.Replace(@"~", string.Empty);
    cleanName = cleanName.Replace(@"+", string.Empty);
    cleanName = cleanName.Replace(@"-", string.Empty);
    cleanName = cleanName.Replace(@",", string.Empty);
    cleanName = cleanName.Replace(@"(", string.Empty);
    cleanName = cleanName.Replace(@")", string.Empty);

    // remove periods
    while (cleanName.Contains("."))
    {
        cleanName = cleanName.Remove(cleanName.IndexOf("."), 1);
    }

    // remove invalid start character
    if (cleanName.StartsWith("_"))
    {
        cleanName = cleanName.Substring(1);
    }

    // trim length: keep the first 50 characters (Substring is zero-based)
    if (cleanName.Length > 50)
    {
        cleanName = cleanName.Substring(0, 50);
    }

    // Remove leading and trailing spaces
    cleanName = cleanName.Trim();

    // Replace spaces with %20 (this runs after the length check, so an encoded name can exceed 50 characters)
    cleanName = cleanName.Replace(" ", "%20");

    return cleanName;
}
[TestFixture]
public class When_composing_a_document_library_name
{
    [Test]
    public void Spaces_should_be_converted_to_a_canonicalized_string()
    {
        string invalidName = "Cookie Monster";
        Assert.AreEqual("Cookie%20Monster", SharePointHelper.CleanInvalidCharacters(invalidName));
    }

    [Test]
    public void Remove_invalid_characters()
    {
        string invalidName = @"#%&*:<>?\/{|}~+-,().";
        Assert.AreEqual(string.Empty, SharePointHelper.CleanInvalidCharacters(invalidName));
    }

    [Test]
    public void Remove_invalid_underscore_start_character()
    {
        string invalidName = "_CookieMonster";
        Assert.AreEqual("CookieMonster", SharePointHelper.CleanInvalidCharacters(invalidName));
    }

    [Test]
    public void Remove_any_number_of_periods()
    {
        string invalidName = ".Co..okie...Mon....st.er.";
        Assert.AreEqual("CookieMonster", SharePointHelper.CleanInvalidCharacters(invalidName));
    }

    [Test]
    public void Names_cannot_be_longer_than_50_characters()
    {
        string invalidName = "CookieMonster".PadRight(51, 'C');
        Assert.AreEqual(50, SharePointHelper.CleanInvalidCharacters(invalidName).Length);
    }

    [Test]
    public void Leading_and_trailing_spaces_should_be_removed()
    {
        string invalidName = " CookieMonster ";
        Assert.AreEqual("CookieMonster", SharePointHelper.CleanInvalidCharacters(invalidName));
    }
}
I'm not 100% happy with the method, as that whole "remove invalid characters" block is repetitive and I know it's creating a new string object with each call. I started to look at how to do this with a regular expression, but frankly RegEx just frightens me. I cannot for the life of me figure out the gobbledygook syntax, and if I do need it, I'll Google for an example and then cry and curl up into a fetal position. I even tried firing up Roy's Regulazy but that didn't help me. I'm just stumbling in the dark on this. If some kind soul wants to convert this into a regular expression for me, I'll buy you a beer or small marsupial for your effort.
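For what it's worth, here is a rough sketch of how the regular expression version might look. Treat it as one possible approach rather than the definitive answer: the class name SharePointNameCleaner and the InvalidCharacters field are names made up for illustration, and the character class simply mirrors the list of characters stripped above (periods included). Inside the square brackets only the backslash and the dash need escaping.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical class name; the post's own method lives in SharePointHelper.
public static class SharePointNameCleaner
{
    // One character class covering the same characters as the Replace chain,
    // plus the period. Only the backslash (\\) and the dash (\-) are escaped.
    private static readonly Regex InvalidCharacters =
        new Regex(@"[#%&*:<>?\\/{}|~+\-,().]", RegexOptions.Compiled);

    public static string CleanInvalidCharacters(string name)
    {
        // Strip every invalid character in a single pass.
        string cleanName = InvalidCharacters.Replace(name, string.Empty);

        // A name cannot start with an underscore.
        if (cleanName.StartsWith("_", StringComparison.Ordinal))
        {
            cleanName = cleanName.Substring(1);
        }

        // Keep at most the first 50 characters.
        if (cleanName.Length > 50)
        {
            cleanName = cleanName.Substring(0, 50);
        }

        // Trim, then encode the remaining spaces for use in a URL.
        return cleanName.Trim().Replace(" ", "%20");
    }
}
```

The rest of the method (underscore, length, trim, %20) follows the same steps as the original, so the unit tests above should apply to it unchanged apart from the class name.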
BTW, this would make for a nice 3.5 string extension method (string.ToSharePointName), but alas I'm stuck in 2.0 land for this project.
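If you do have the C# 3.0 compiler available, the extension method could look something like the sketch below. The class name SharePointStringExtensions is hypothetical, and to keep the example self-contained the body is a stand-in that filters characters in one pass with a StringBuilder instead of calling the method above. In a real project you would probably just delegate to the existing helper.

```csharp
using System;
using System.Text;

// Hypothetical class name, for illustration only.
public static class SharePointStringExtensions
{
    // Same characters the original method strips, periods included.
    private const string InvalidCharacters = @"#%&*:<>?\/{}|~+-,().";

    public static string ToSharePointName(this string name)
    {
        // Copy across only the characters that are not in the invalid list.
        StringBuilder builder = new StringBuilder(name.Length);
        foreach (char c in name)
        {
            if (InvalidCharacters.IndexOf(c) < 0)
            {
                builder.Append(c);
            }
        }
        string cleanName = builder.ToString();

        // A name cannot start with an underscore.
        if (cleanName.StartsWith("_", StringComparison.Ordinal))
        {
            cleanName = cleanName.Substring(1);
        }

        // Keep at most the first 50 characters.
        if (cleanName.Length > 50)
        {
            cleanName = cleanName.Substring(0, 50);
        }

        // Trim, then encode the remaining spaces for use in a URL.
        return cleanName.Trim().Replace(" ", "%20");
    }
}
```

With that in place, a call reads as "Cookie Monster".ToSharePointName() and gives back Cookie%20Monster.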
Enjoy!