O/R Mappers: Maximum Performance
I now have the performance of my WilsonORMapper (v1.0.0.3) very comparable to that of DataSets. In some cases I am actually beating DataSets, with or without private field reflection.
My tests compared DataReaders, DataSets, and my ObjectSets. The ObjectSets were tested both with my new optional IObjectHelper interface for direct access and with private field reflection. Each run consisted of a number of repetitions of loading the appropriate structure from a database table and then looping through the resulting structure to access the values. The database table held 10,000 records of random data that I generated, and its fields consisted of 2 ints, 3 strings, 1 float, 1 datetime, and 1 bit. Every number reported represents loading 10,000 records in total, but the cases varied from 1 record repeated 10,000 times, to 100 records repeated 100 times, and finally 10,000 records loaded only once. The tests were run many different times, and the numbers were always consistently similar. I also tested a 100,000-record table, and the numbers were similar, just 10 times bigger.
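The shape of this test can be sketched as a small harness. This is only an illustration of the procedure, not the code that produced the numbers below: it assumes Python's built-in `sqlite3` module as a stand-in for the real database, and the `TestData` table, its column names, and the `build_db` and `run` helpers are all hypothetical.

```python
import random
import sqlite3
import time
from datetime import datetime, timedelta

# Hypothetical table with the same shape as the test table: 2 ints, 3 strings,
# 1 float, 1 datetime and 1 bit (SQLite stores the datetime as text, the bit as 0/1).
SCHEMA = """CREATE TABLE TestData (
    Id INTEGER PRIMARY KEY, Quantity INTEGER,
    Name TEXT, Code TEXT, Notes TEXT,
    Amount REAL, Created TEXT, Active INTEGER)"""


def build_db(row_count, seed=0):
    """Create an in-memory table filled with `row_count` rows of random data."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    start = datetime(2004, 1, 1)
    rows = [
        (i, rng.randint(0, 1000),
         f"name{rng.randint(0, 9999)}", f"c{rng.randint(0, 99)}", "x" * rng.randint(1, 20),
         rng.uniform(0, 100), (start + timedelta(minutes=rng.randint(0, 10**6))).isoformat(),
         rng.randint(0, 1))
        for i in range(1, row_count + 1)
    ]
    conn.executemany("INSERT INTO TestData VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    return conn


def run(conn, records, repetitions):
    """Load `records` rows `repetitions` times, reading every value once; return rows read."""
    rows_read = 0
    for _ in range(repetitions):
        cursor = conn.execute("SELECT * FROM TestData ORDER BY Id LIMIT ?", (records,))
        for row in cursor:
            for value in row:  # touch each value, like looping over a DataReader
                pass
            rows_read += 1
    return rows_read


if __name__ == "__main__":
    conn = build_db(10_000)
    # Each case reads 10,000 rows in total, matching the three result columns.
    for records, repetitions in [(1, 10_000), (100, 100), (10_000, 1)]:
        started = time.perf_counter()
        run(conn, records, repetitions)
        print(f"{records:>6} x {repetitions:>6}: {time.perf_counter() - started:.3f}s")
```

Keeping the total number of rows fixed while varying the batch size is what separates the per-query cost (dominant in the 1 × 10,000 case) from the per-row cost (dominant in the 10,000 × 1 case).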
Notice first that hitting the database many times for one record is noticeably slower. Next, note that DataSets are roughly twice as slow as a DataReader (between 1.5 and 1.9 times as long in the results below). If you want to load a single record, then my WilsonORMapper beats a DataSet hands down, and this remains true even when I continued to use private field reflection. On the other hand, with direct access my O/R mapper was about 50% slower than the DataSet when loading 100 records, and about 75% slower when loading 10,000 records. Allowing private field reflection made those numbers roughly another 2 times slower. So performance varies with the number of records, but keep in mind that my WilsonORMapper supports paging, which makes large numbers of records easily manageable. I also added a new GetDataSet method that returns DataSets and performs just as well.
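Paging helps because only one page of objects (and its state) is materialized at a time. The idea can be sketched with `LIMIT`/`OFFSET` in SQLite; this is not WilsonORMapper's paging API, and `load_page`, `page_count` and the `TestData` table are hypothetical names used only for illustration.

```python
import sqlite3


def load_page(conn, page_index, page_size):
    """Return one page of rows (page_index is zero-based), ordered by primary key."""
    if page_index < 0 or page_size <= 0:
        raise ValueError("page_index must be >= 0 and page_size > 0")
    return conn.execute(
        "SELECT Id, Name FROM TestData ORDER BY Id LIMIT ? OFFSET ?",
        (page_size, page_index * page_size),
    ).fetchall()


def page_count(conn, page_size):
    """Number of pages needed to cover every row."""
    (total,) = conn.execute("SELECT COUNT(*) FROM TestData").fetchone()
    return (total + page_size - 1) // page_size


# Usage: 25 rows shown 10 at a time gives pages of 10, 10 and 5 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TestData (Id INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany("INSERT INTO TestData VALUES (?, ?)", [(i, f"name{i}") for i in range(1, 26)])
for page in range(page_count(conn, 10)):
    print(page, len(load_page(conn, page, 10)))
```

With a stable `ORDER BY`, each page is a fixed slice of the table, so the cost of building objects and their state copies is bounded by the page size rather than by the table size.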
Why does my O/R mapper still perform a little slower than DataSets with many records? No matter what I did, almost every millisecond could be attributed to the fact that my mapping framework stores a second copy of each value in the manager for its state. This state allows you to check whether the values of any entity object have been updated, and it also gives you the ability to cancel all the changes made to an object. I may also someday use these extra state values for more specific dynamic SQL updates. On the other hand, large DataSets load faster initially since they don't store a second copy of each value, but they may also have larger overhead in the long run as they track all later changes. DataSets also produce a considerably larger serialization when they are remoted, so you should also consider this additional overhead in distributed environments.
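The trade-off described above (a second copy of every value in exchange for change detection and cancel) can be sketched as follows. This is a hypothetical illustration, not WilsonORMapper's manager: the `ObjectManager` and `Contact` classes, their method names, and the "type name plus primary key" string key format are all assumptions.

```python
from dataclasses import dataclass, fields


@dataclass
class Contact:
    """Hypothetical entity; `id` is its primary key."""
    id: int
    name: str
    email: str


class ObjectManager:
    """Keeps a second copy of every loaded value so changes can be detected or undone."""

    def __init__(self):
        self._originals = {}  # string key -> tuple of the values as loaded

    @staticmethod
    def _key(entity):
        # Hypothetical key format: entity type name plus primary key.
        return f"{type(entity).__name__}:{entity.id}"

    @staticmethod
    def _current(entity):
        return tuple(getattr(entity, f.name) for f in fields(entity))

    def start_tracking(self, entity):
        """Store the second copy of each value, as it was when loaded."""
        self._originals[self._key(entity)] = self._current(entity)

    def changed_fields(self, entity):
        """Names of fields whose values differ from the stored copy."""
        original = self._originals[self._key(entity)]
        return [f.name for f, old, new in zip(fields(entity), original, self._current(entity))
                if old != new]

    def is_updated(self, entity):
        return bool(self.changed_fields(entity))

    def cancel_changes(self, entity):
        """Restore every field to its stored value."""
        original = self._originals[self._key(entity)]
        for f, value in zip(fields(entity), original):
            setattr(entity, f.name, value)


manager = ObjectManager()
contact = Contact(1, "Ann", "ann@example.com")
manager.start_tracking(contact)
contact.name = "Anne"
print(manager.changed_fields(contact))  # ['name']
manager.cancel_changes(contact)
print(contact.name, manager.is_updated(contact))  # Ann False
```

A list like the one `changed_fields` returns is the kind of information a more specific dynamic SQL update would need; the price is the extra tuple stored for every loaded object.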
What did I do for performance, other than implementing the IObjectHelper interface? The biggest performance change I made, far bigger than reflection, was replacing all of my foreach loops over hashtables with regular for loops over typed arrays. The next biggest gain came from changing a hashtable key from a struct to a string; the key could not simply be a regular object, since each object instance was always different. Next came accessing each datareader value only once, even though I had to store it twice; this still had a slightly bigger impact than private reflection. I also now cache the reflected FieldInfo for the cases where reflection is still used, which made a small but measurable difference, contrary to Steve Eichert's report. And of course, you can now implement the IObjectHelper interface to avoid reflection altogether.
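The structure of these changes (per-type mapping data built once, an index-based loop, each column read once and kept for the state copy, and a direct-access path that bypasses reflection) can be sketched in Python. Everything here is a hypothetical analog: `ObjectHelper` only loosely mirrors the idea of the optional IObjectHelper interface, `EntityMap` plays the role of cached FieldInfo objects, and `setattr` stands in for field reflection. Python has no truly private fields (the leading underscore is a convention), and an index loop is not faster than iteration in Python, so this shows the shape of the design, not its .NET performance.

```python
from abc import ABC, abstractmethod


class ObjectHelper(ABC):
    """Hypothetical stand-in for an IObjectHelper-style interface: the entity
    gets and sets its own members by name, so no reflection is needed."""

    @abstractmethod
    def __getitem__(self, member): ...

    @abstractmethod
    def __setitem__(self, member, value): ...


class Customer(ObjectHelper):
    def __init__(self):
        self._id = 0
        self._name = ""

    def __getitem__(self, member):
        if member == "_id":
            return self._id
        if member == "_name":
            return self._name
        raise KeyError(member)

    def __setitem__(self, member, value):
        if member == "_id":
            self._id = value
        elif member == "_name":
            self._name = value
        else:
            raise KeyError(member)


class PlainCustomer:
    """Same shape without the helper; loaded through reflection-style setattr."""

    def __init__(self):
        self._id = 0
        self._name = ""


class EntityMap:
    """Per-type mapping built once and reused, like caching FieldInfo objects."""

    def __init__(self, entity_type, members):
        self.entity_type = entity_type
        self.members = tuple(members)  # indexed by column position, not looked up in a dict
        self.direct = issubclass(entity_type, ObjectHelper)


def load(entity_map, row):
    """Build one entity from a row; return the entity and the values for the state copy."""
    entity = entity_map.entity_type()
    values = [None] * len(entity_map.members)
    for i in range(len(entity_map.members)):
        value = row[i]  # read each column exactly once...
        values[i] = value  # ...and keep it for the manager's second copy
        if entity_map.direct:
            entity[entity_map.members[i]] = value
        else:
            setattr(entity, entity_map.members[i], value)
    return entity, values


customer, state = load(EntityMap(Customer, ["_id", "_name"]), (7, "Ann"))
print(customer["_name"], state)  # Ann [7, 'Ann']
```

The `EntityMap` is meant to be created once per entity type and reused for every row, which is where the savings from caching come from.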
I also made a few other observations in the course of all this performance testing. Most surprising to me was that I could not find any significant difference between accessing datareader fields by index or by name; the difference was barely measurable at all. I also confirmed that there was no measurable difference between SQL and stored procedures. Next, while Steve Maine noted the huge differences that private reflection can make, it is still a relatively small part of the bigger picture, which includes data access. This agrees with several comments I received that there is a whole lot more going on in an O/R mapping framework than just how the values are set. Also note that public and protected field reflection was hardly noticeable in the tests. But overall, the little things like foreach and boxing were the worst offenders.
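The index-versus-name comparison can be reproduced in miniature with `sqlite3.Row`, which supports access both by position and by column name. This is a Python stand-in for a .NET DataReader, so its timings say nothing about ADO.NET; `make_row` and `compare` are hypothetical helpers. The look-up-once pattern at the end corresponds to calling `IDataRecord.GetOrdinal` once and then reading by index in ADO.NET.

```python
import sqlite3
import timeit


def make_row():
    """Return one sqlite3.Row, which supports access by position and by column name."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute("CREATE TABLE TestData (Id INTEGER, Name TEXT)")
    conn.execute("INSERT INTO TestData VALUES (1, 'Ann')")
    return conn.execute("SELECT Id, Name FROM TestData").fetchone()


def compare(row, number=100_000):
    """Time `number` reads of the Name column by index and by name."""
    by_index = timeit.timeit(lambda: row[1], number=number)
    by_name = timeit.timeit(lambda: row["Name"], number=number)
    return by_index, by_name


row = make_row()
ordinal = row.keys().index("Name")  # look the position up once, then read by index
print(row[ordinal], row["Name"])  # Ann Ann
print("index: %.4fs  name: %.4fs" % compare(row))
```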
So if you were initially concerned that my WilsonORMapper didn't have adequate performance, think again and download the demo for yourself (the demo only runs in the debugger).
Results (each column loads 10,000 records in total):

| # Records | 1 | 100 | 10,000 |
|---|---|---|---|
| # Repetitions | 10,000 | 100 | 1 |
| DataReader | 1.91 | 0.14 | 0.11 |
| DataSet | 3.69 | 0.21 | 0.21 |
| OR Direct | 2.29 | 0.31 | 0.37 |
| OR Reflect | 2.78 | 0.75 | 0.81 |
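The relative figures quoted in the text can be recomputed from this table. The short script below only divides one row by another, column by column, using the times copied verbatim from the results.

```python
# Timings copied from the results table; columns are 1 x 10,000, 100 x 100, 10,000 x 1.
TIMINGS = {
    "DataReader": (1.91, 0.14, 0.11),
    "DataSet": (3.69, 0.21, 0.21),
    "OR Direct": (2.29, 0.31, 0.37),
    "OR Reflect": (2.78, 0.75, 0.81),
}


def ratios(numerator, denominator):
    """Per-column ratio of two rows of the table, rounded to two decimals."""
    return tuple(round(a / b, 2) for a, b in zip(TIMINGS[numerator], TIMINGS[denominator]))


print(ratios("DataSet", "DataReader"))    # (1.93, 1.5, 1.91)
print(ratios("OR Direct", "DataSet"))     # (0.62, 1.48, 1.76)
print(ratios("OR Reflect", "OR Direct"))  # (1.21, 2.42, 2.19)
```

Read this way, DataSets take 1.5 to 1.9 times as long as DataReaders; with direct access the mapper needs about 0.62 of the DataSet time for single records but about 1.48 and 1.76 times the DataSet time for 100 and 10,000 records; and reflection multiplies the direct-access time by about 2.2 to 2.4 for the larger loads.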