LINQ to XML for Better Maintainability

Today I was trying to solve a simple technical problem. Given a specific XML document, I needed to clean it up by removing all elements of a particular type.

    <Attachment>
      <Name>file1.pdf</Name>
      <Id>1</Id>
    </Attachment>
    <Attachment>
      <Name>file2.pdf</Name>
      <Id>2</Id>
    </Attachment>

The result had to be the same XML without the Id elements:

    <Attachment>
      <Name>file1.pdf</Name>
    </Attachment>
    <Attachment>
      <Name>file2.pdf</Name>
    </Attachment>

A few choices for implementation:

  1. Regex
  2. XmlDocument
  3. LINQ to XML
  4. XSLT (as suggested in comments)

The Regex option is probably the most efficient, but not the most maintainable. Looking back at my own Regex-based solutions, I sometimes ask myself “what the heck was I trying to do here?” So much for “code doesn’t lie”.
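To make the trade-off concrete, here is a rough sketch of what the Regex option might look like. The class name, method name, and pattern are illustrative assumptions, not the code I actually considered. The pattern only copes with plain `<Id>...</Id>` elements: an attribute on the tag, a self-closing `<Id/>`, or a nested Id would slip through or break it, which is a large part of why this route is hard to maintain.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexCleaner
{
    // Hypothetical helper: strips simple <Id>...</Id> elements and any whitespace before them.
    // Assumes Id carries no attributes, is never nested, and is never self-closing.
    public static string RemoveIds(string xml)
    {
        return Regex.Replace(xml, @"\s*<Id>.*?</Id>", string.Empty, RegexOptions.Singleline);
    }

    public static void Main()
    {
        var input = "<Attachment><Name>file1.pdf</Name><Id>1</Id></Attachment>";
        Console.WriteLine(RemoveIds(input));
    }
}
```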

XmlDocument is more expressive than the Regex option, but way too chatty.
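For comparison, a minimal sketch of how the XmlDocument version could look. The names are hypothetical, and the input is assumed to have a single root element (called Attachments here purely for illustration). The matching nodes are copied into a list before removal so the document is not modified while the query result is still being enumerated.

```csharp
using System;
using System.Linq;
using System.Xml;

public static class XmlDocumentCleaner
{
    // Hypothetical helper: removes every Id element using the XmlDocument DOM API.
    public static string RemoveIds(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // Snapshot the matches first, then detach each one from its parent.
        var idNodes = doc.SelectNodes("//Id").Cast<XmlNode>().ToList();
        foreach (var node in idNodes)
        {
            node.ParentNode.RemoveChild(node);
        }

        return doc.OuterXml;
    }

    public static void Main()
    {
        // "Attachments" is an assumed wrapper element, not part of the original sample.
        var input = "<Attachments><Attachment><Name>file1.pdf</Name><Id>1</Id></Attachment></Attachments>";
        Console.WriteLine(RemoveIds(input));
    }
}
```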

LINQ to XML is as expressive as XmlDocument, and also very clear and fluent. I picked this option not for performance, but for maintainability's sake. I know it will take less developer time to understand and/or modify the code when it’s time to change it. And it documents itself very well, with no need to write any comments.

    var xdoc = XDocument.Load(new StringReader(received_content));
    xdoc.Descendants().Where(element => element.Name == "Id").Remove();
    return xdoc.ToString();
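For anyone who wants to run this, here is a self-contained sketch built around the same idea. The class and method names are made up for the example. Since XDocument requires a single root element, the sample attachments are assumed to sit inside a hypothetical Attachments wrapper. The sketch also uses `XDocument.Parse` and the `Descendants("Id")` overload, which should be equivalent to the `Load` and `Where` calls above but a little shorter.

```csharp
using System;
using System.Xml.Linq;

public static class AttachmentCleaner
{
    // Hypothetical wrapper: parses the XML, removes every Id element, returns the result.
    public static string RemoveIds(string receivedContent)
    {
        var xdoc = XDocument.Parse(receivedContent);

        // Descendants(name) filters by element name; Remove() detaches all matches.
        xdoc.Descendants("Id").Remove();

        // DisableFormatting keeps the output on one line, which is easier to compare.
        return xdoc.ToString(SaveOptions.DisableFormatting);
    }

    public static void Main()
    {
        // "Attachments" is an assumed root element, not part of the original sample.
        var input =
            "<Attachments>" +
            "<Attachment><Name>file1.pdf</Name><Id>1</Id></Attachment>" +
            "<Attachment><Name>file2.pdf</Name><Id>2</Id></Attachment>" +
            "</Attachments>";

        Console.WriteLine(RemoveIds(input));
    }
}
```

Changing which element gets removed means touching a single string, which is the kind of maintainability I was after.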

Note: this is a very specific case, and it doesn’t mean LINQ to XML is the solution to all kinds of problems. Regex and XmlDocument are valid tools for plenty of other problems.

2 Comments

  • Why not use an XSLT stylesheet for this?

    // Load the style sheet.
    XslCompiledTransform xslt = new XslCompiledTransform();
    xslt.Load("output.xsl");

    // Execute the transform and output the results to a file.
    xslt.Transform("input.xml", "output.xml");

    and with the following xslt (from top of my head, check for errors)





    using an xslt shields your code better from changes in xml formats (and thus makes it more maintainable, even for non-developer types :-)).



  • I think you made the right decision with LINQ to XML. I would seldom even consider using a regular expression to parse XML in a production application. Regexes are a great solution to a particular class of problems; parsing XML/HTML is not in that class. Using a DOM parser as you did will work in every case, even edge cases (nested tags, entities, etc.).

    And I've gone down the XSLT rabbit-hole before. While it has its uses, it left me with a complex, difficult to read stylesheet that no one else wanted to maintain.
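The stylesheet from the first comment is not shown on this page, so the following is only a guess at the general shape of the XSLT option, not the commenter's actual stylesheet. It assumes the usual identity-transform pattern plus an empty template for Id, embeds the stylesheet as a string instead of loading `output.xsl`, and uses hypothetical class and method names.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XsltCleaner
{
    // Assumed stylesheet: copy everything, but emit nothing for Id elements.
    private const string Stylesheet = @"<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:output omit-xml-declaration=""yes""/>
  <!-- Identity template: copy every node and attribute as-is -->
  <xsl:template match=""@*|node()"">
    <xsl:copy><xsl:apply-templates select=""@*|node()""/></xsl:copy>
  </xsl:template>
  <!-- Empty template: Id elements produce no output -->
  <xsl:template match=""Id""/>
</xsl:stylesheet>";

    public static string RemoveIds(string xml)
    {
        var xslt = new XslCompiledTransform();
        using (var stylesheetReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(stylesheetReader);
        }

        var output = new StringWriter();
        using (var input = XmlReader.Create(new StringReader(xml)))
        using (var writer = XmlWriter.Create(output, xslt.OutputSettings))
        {
            xslt.Transform(input, writer);
        }

        return output.ToString();
    }

    public static void Main()
    {
        // "Attachments" is an assumed root element, not part of the original sample.
        var input = "<Attachments><Attachment><Name>file1.pdf</Name><Id>1</Id></Attachment></Attachments>";
        Console.WriteLine(RemoveIds(input));
    }
}
```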
