Character filtering or validation for XML documents

I was having a hard time filtering form input from an ASP.NET page. The page accepts text as input and sends it to the database in XML format.
All is fine as long as the text does not include any of the following characters. It goes for a toss as soon as you use one of them:

 

Character Name            Entity Reference   Character Reference   Numeric Reference
Ampersand                 &amp;              &                     &#38;
Left angle bracket        &lt;               <                     &#60;
Right angle bracket       &gt;               >                     &#62;
Straight quotation mark   &quot;             "                     &#34;
Apostrophe                &apos;             '                     &#39;
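To see why these characters cause trouble, here is a minimal sketch that drops raw text between two tags and asks XmlDocument whether the result still parses. The class name RawCharacterDemo, the method ParsesAsXml and the <note> element are made up for illustration; they are not from my page.

```csharp
using System;
using System.Xml;

public static class RawCharacterDemo
{
    // Hypothetical helper: returns true if the text can be placed between
    // <note> tags without escaping and still parse as XML.
    public static bool ParsesAsXml(string text)
    {
        XmlDocument doc = new XmlDocument();
        try
        {
            doc.LoadXml("<note>" + text + "</note>");
            return true;
        }
        catch (XmlException)
        {
            // The parser rejected the document, e.g. because of a raw & or <.
            return false;
        }
    }
}
```

With this sketch, "Tom and Jerry" parses, "Tom & Jerry" and "a < b" do not, and "Tom &amp; Jerry" parses again. In element content it is the ampersand and the left angle bracket that break the parser; the quotation marks matter once the text ends up inside an attribute value.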

Microsoft has one support article, but it does not give sufficient help for this problem:
How to locate and replace special characters in an XML file with Visual C# .NET

I tried different ways to work around this, and any of the following could be used.
1. You simply filter the text using the C# or VB.NET Replace method, like this:

// & has to be replaced first, otherwise the ampersands of the
// entities inserted below get escaped a second time
newString = newString.Replace("&","&amp;");

// >
newString = newString.Replace(">","&gt;");

// <
newString = newString.Replace("<","&lt;");

// Double quote ". My first attempts did not work in C#:

// newString = newString.Replace(ControlChars.Quote,"&quot;");
// ControlChars is a Visual Basic class; C# does not have it.

// newString = newString.Replace(CHR(32),"&quot;");
// Chr is also VB only, and 32 is the space character (the double quote is 34).
// Besides, Replace accepts either two characters or two strings, not one
// character and one string. Escaping the quote inside a string literal works:
newString = newString.Replace("\"","&quot;");

// Single quote '
newString = newString.Replace("'","&apos;");

2. You can use a regular expression to filter all your special characters. The input might be something like this: @"a1/}{]yryr23dsdhds%$#yytr^&uut887611oiuif():><?jfhgg";

3. You can write stored procedure to replace all these special characters.
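The first option above can be wrapped into one helper so the order of the replacements cannot be mixed up. This is only a sketch: the class name XmlTextFilter and both method names are made up for illustration. The second method shows the framework's own SecurityElement.Escape, which as far as I know escapes the same five characters and may save you from writing the replacements by hand.

```csharp
using System;
using System.Security;

public static class XmlTextFilter
{
    // Hypothetical helper: escapes the five predefined XML characters
    // with plain string replacements.
    public static string EscapeWithReplace(string text)
    {
        if (text == null) return null;
        return text.Replace("&", "&amp;")    // must run first to avoid double escaping
                   .Replace("<", "&lt;")
                   .Replace(">", "&gt;")
                   .Replace("\"", "&quot;")
                   .Replace("'", "&apos;");
    }

    // Alternative that relies on the framework instead of hand-written replacements.
    public static string EscapeWithFramework(string text)
    {
        return SecurityElement.Escape(text);
    }
}
```

For example, XmlTextFilter.EscapeWithReplace("a < b & \"c\"") gives a &lt; b &amp; &quot;c&quot;.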

A regular expression was a good choice, but building the expression was eating up a lot of time. I am not that good at regular expressions, and I did not want to spend much time on such a simple problem. If somebody could write the expression, it would be great and helpful for others.
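One possible expression is a simple character class that matches the five characters, combined with a MatchEvaluator that picks the entity for each match. Treat it as a sketch rather than the definitive answer; the class name XmlRegexFilter and its method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class XmlRegexFilter
{
    // Character class matching the five characters that need escaping.
    private static readonly Regex Special = new Regex("[&<>\"']");

    // Hypothetical helper: replaces every special character in one pass,
    // so an already inserted entity is never escaped again.
    public static string Escape(string text)
    {
        return Special.Replace(text, new MatchEvaluator(ToEntity));
    }

    // Maps a matched character to its entity reference.
    private static string ToEntity(Match m)
    {
        switch (m.Value)
        {
            case "&": return "&amp;";
            case "<": return "&lt;";
            case ">": return "&gt;";
            case "\"": return "&quot;";
            default: return "&apos;"; // the only remaining match is the apostrophe
        }
    }
}
```

Because the input is scanned only once, the order problem of the chained Replace calls does not come up here. For the sample string from option 2, only the &, > and < are changed and the rest of the text is left alone.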

How To Locate and Replace Special Characters in an XML Document with Visual Basic

 

Some extra reading
XML Syntax and Parsing Concepts
Manipulating Strings in C#

 

Happy Coding

 

Suresh Behera
