LINQ to XML for Better Maintainability
Today I was trying to solve a simple technical problem. Given a specific XML document, I needed to clean it up by removing all elements of a particular type.
<Attachment>
  <Name>file1.pdf</Name>
  <Id>1</Id>
</Attachment>
<Attachment>
  <Name>file2.pdf</Name>
  <Id>2</Id>
</Attachment>
The result had to be the same XML without the Id elements:
<Attachment>
  <Name>file1.pdf</Name>
</Attachment>
<Attachment>
  <Name>file2.pdf</Name>
</Attachment>
A few choices for implementation:
- Regex
- XmlDocument
- LINQ to XML
- XSLT (as suggested in comments)
The Regex option is probably the most efficient, but not the most maintainable. When I look back at some of my own Regex solutions, I sometimes ask myself “what the heck did I try to do here?” So much for “code doesn’t lie”.
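To make the maintainability concern concrete, here is a minimal sketch of what a Regex version might look like. The helper name `RemoveIdsWithRegex` and the pattern are my own assumptions, not code from the original solution. It is written as a top-level program, and it only copes with the simplest case: an `Id` element with no attributes, no namespace prefix, and no self-closing form.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (an assumption for illustration): strips <Id>...</Id> with a regex.
// Only handles plain <Id> tags; attributes, prefixes, or <Id/> would slip through.
static string RemoveIdsWithRegex(string xml)
{
    // \s* also swallows the whitespace in front of the element, so no blank line is left.
    // Singleline lets '.' match newlines in case the element content spans lines.
    return Regex.Replace(xml, @"\s*<Id>.*?</Id>", string.Empty, RegexOptions.Singleline);
}

var regexSample = "<Attachment><Name>file1.pdf</Name><Id>1</Id></Attachment>";
Console.WriteLine(RemoveIdsWithRegex(regexSample));
```

The pattern is short, but every limitation listed above is invisible to whoever reads it later, which is exactly the maintainability problem.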
XmlDocument is more expressive than the Regex option, but way too chatty.
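As a rough illustration of the chattiness, this is how the same cleanup could be sketched with XmlDocument. The helper name `RemoveIdsWithXmlDocument` and the `Attachments` wrapper element are assumptions of mine: the sample in this post is a fragment, and XmlDocument needs a single root element to load it.

```csharp
using System;
using System.Linq;
using System.Xml;

// Hypothetical helper (an assumption for illustration): strips <Id> elements via the DOM API.
static string RemoveIdsWithXmlDocument(string xml)
{
    var doc = new XmlDocument();
    doc.LoadXml(xml);

    // GetElementsByTagName returns a live list, so take a snapshot
    // before removing nodes from the document.
    var idNodes = doc.GetElementsByTagName("Id").Cast<XmlNode>().ToList();
    foreach (var node in idNodes)
    {
        // A node is removed through its parent.
        node.ParentNode?.RemoveChild(node);
    }

    return doc.OuterXml;
}

// The <Attachments> root is assumed here so that the input is a well-formed document.
var domSample = "<Attachments><Attachment><Name>file1.pdf</Name><Id>1</Id></Attachment></Attachments>";
Console.WriteLine(RemoveIdsWithXmlDocument(domSample));
```

It works and the intent is readable, but loading, querying, snapshotting, and removing each take their own statement.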
LINQ to XML is as expressive as XmlDocument, and also very clear and fluent. I picked this option not for performance, but for the sake of maintainability. I know it will take less developer time to understand and/or modify the code when it’s time to change it. It also documents itself very well, with no need to write any comments.
var xdoc = XDocument.Load(new StringReader(received_content));
xdoc.Descendants().Where(element => element.Name == "Id").Remove();
return xdoc.ToString();
Note: this is a very specific case, and it does not mean LINQ to XML is the solution to every kind of problem. Regex and XmlDocument are valid tools for all sorts of other problems.