Cool StringBuilder Tip (Rob McLaws)

Rob McLaws offers a few thoughts about StringBuilder.  I've posted it here because I had a couple of things to add, and Rob's pointing out one of those classic "issues" with .NET.  The Framework often does so many useful things for us that we don't necessarily consider exactly how they work.

By default, StringBuilder (like ArrayList) doubles its capacity when its capacity is reached.  For example, if it has room for 32 characters in its internal array, adding character 33 doubles the storage to 64, then 128, then 256, and so on.  Doubling usually improves performance because every resize means allocating a new array and memcpy-ing the old contents across under the covers; growing geometrically keeps the number of those copies small.
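If you want to watch the growth happen, here is a minimal sketch that appends one character at a time and records every distinct value of the Capacity property. The names GrowthDemo and ObserveCapacities are made up for illustration. The exact sequence is an implementation detail and can differ between runtime versions, so treat whatever it prints as an observation, not a guarantee.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of the Framework.
public static class GrowthDemo
{
    // Appends one character at a time and records each distinct
    // Capacity value the builder reports along the way.
    public static List<int> ObserveCapacities(int totalChars)
    {
        var sb = new StringBuilder();              // default initial capacity
        var seen = new List<int> { sb.Capacity };

        for (int i = 0; i < totalChars; i++)
        {
            sb.Append('x');
            if (sb.Capacity != seen[seen.Count - 1])
            {
                seen.Add(sb.Capacity);             // the builder just grew
            }
        }
        return seen;
    }

    public static void Main()
    {
        // Prints the capacities observed while appending 1,000 characters.
        Console.WriteLine(string.Join(", ", ObserveCapacities(1000)));
    }
}
```

Each new entry in the list corresponds to one resize, so the length of the list is a rough count of how many times the builder had to grow.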

Now imagine you're storing a large XML file in a StringBuilder.  Say the XML is 1MB in size and the builder's buffer is exactly full.  Add one more character and you're now using 2MB of storage, nearly half of it empty.  In effect, you could end up being extremely wasteful of space.  Rob's technique of guesstimating how much capacity the builder will need is useful for a) reducing the number of operations needed to grow the array and b) decreasing the memory footprint of the application.
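To see the footprint difference for yourself, the sketch below builds the same string twice, once with a default builder and once with a builder whose capacity is passed to the constructor up front, then reports the final Capacity of each. PresizeDemo and FinalCapacity are hypothetical names. Capacity is measured in characters, not bytes, and how much slack the unsized builder ends up with depends on the runtime's growth policy.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the Framework.
public static class PresizeDemo
{
    // Appends `length` characters and returns the builder's final Capacity.
    // When `presize` is true the builder is created with room for exactly
    // `length` characters, so it never needs to grow.
    public static int FinalCapacity(int length, bool presize)
    {
        var sb = presize ? new StringBuilder(length) : new StringBuilder();
        for (int i = 0; i < length; i++)
        {
            sb.Append('x');
        }
        return sb.Capacity;
    }

    public static void Main()
    {
        const int length = 100000;
        Console.WriteLine("Characters stored:  " + length);
        Console.WriteLine("Default builder:    " + FinalCapacity(length, false));
        Console.WriteLine("Pre-sized builder:  " + FinalCapacity(length, true));
    }
}
```

The gap between the stored length and the reported capacity is the space the builder has reserved but not used.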

-----------------

One of the great things about the StringBuilder is its ability to dynamically resize itself when it is dealing with large strings. It was very helpful in building GenX.NET, especially since I wasn't always going to be writing to the file system anymore. Back in the 2.0 days, each time I loaded up a new line, I wrote it to the file system, so performance wasn't really a factor. Now it is. The problem, however, is that this dynamic resizing can come at a performance cost if you append to your string more than about 1,000 times.

Enter StringBuilder.EnsureCapacity(Integer)

For GenX.NET, it is quite conceivable that the StringBuilder may append new data over a million times, and I want it to resize itself as few times as possible. So, before I output any data, I cycle through the Tables, Rows, and Columns collections of the DataSet and get a total cell count. Then I multiply that number by 30 (a guess at the average number of characters per cell), which gives me a rough approximation of how much data the StringBuilder will be holding when finished. Passing that figure to EnsureCapacity saves me from most of the resizing that would otherwise take place on large DataSets.
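The estimate described above could be sketched roughly like this. It is not GenX.NET's actual code: ExportSketch, CountCells, Export, the comma-separated output format, and the MaxPresizeChars safety cap are all assumptions made for illustration. Only the cell count, the multiplier of 30, and the call to EnsureCapacity come from the description.

```csharp
using System;
using System.Data;
using System.Text;

// Hypothetical sketch, not GenX.NET's implementation.
public static class ExportSketch
{
    // The rough per-cell guess from the post.
    private const int EstimatedCharsPerCell = 30;

    // Assumed safety cap so a huge DataSet does not request an absurd buffer.
    private const long MaxPresizeChars = 64L * 1024 * 1024;

    // Total number of cells across every table in the DataSet.
    public static long CountCells(DataSet data)
    {
        long cells = 0;
        foreach (DataTable table in data.Tables)
        {
            cells += (long)table.Rows.Count * table.Columns.Count;
        }
        return cells;
    }

    // Writes every row as a comma-separated line (assumed output format).
    public static string Export(DataSet data)
    {
        long estimate = CountCells(data) * EstimatedCharsPerCell;

        var sb = new StringBuilder();
        // Reserve the estimated space once, before any data is appended.
        sb.EnsureCapacity((int)Math.Min(estimate, MaxPresizeChars));

        foreach (DataTable table in data.Tables)
        {
            foreach (DataRow row in table.Rows)
            {
                for (int c = 0; c < table.Columns.Count; c++)
                {
                    if (c > 0) sb.Append(',');
                    sb.Append(row[c]);
                }
                sb.Append('\n');
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        var table = new DataTable("People");
        table.Columns.Add("Name", typeof(string));
        table.Columns.Add("Age", typeof(int));
        table.Rows.Add("Ann", 31);
        table.Rows.Add("Bob", 45);

        var data = new DataSet();
        data.Tables.Add(table);

        Console.WriteLine("Cells: " + CountCells(data));
        Console.Write(Export(data));
    }
}
```

If the estimate is too low the builder simply grows as usual for the remainder, so an under-guess costs a few extra resizes and an over-guess costs unused memory.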

What did it do for performance? Well, small files are returned almost instantaneously. Larger files are generated in almost half the time it took before I added that simple algorithm. I expect it will have an effect when the server is under a heavier load, too.

So there you have it. When you're appending a lot of data, StringBuilders run faster if you call .EnsureCapacity with a reasonable estimate first.

(Original post)
