O/R Mappers: Base Class or Not?
One of the main criticisms I've seen of most O/R mappers is that they typically require your entity objects to inherit from a base class provided by the mapping framework. This can be a headache if you have your own class framework, and it can also contribute to other problems if you aren't careful. I originally set out to require a base class too, since it does make things easier for the mapping framework, but I noticed that Microsoft does not impose this restriction in ObjectSpaces.
Why does a base class make things easier? It lets you keep any state required by the mapping framework in the entity objects themselves. What type of state do you need to track? At a minimum, you need to track whether the object is new or existing so you can decide between an insert and an update. You also need to track whether the object has been marked for deletion if you don't process that immediately. Why would you not want to process a deletion immediately? Mainly, think transactions and rollbacks. Finally, you need to track the original values of the entity object's members if you are going to do anything fancy. So far I only use the original values to check whether an object has changed and to enable cancelling changes (which seems to be missing from ObjectSpaces). Since I've tracked the original values, I can also easily change my implementation in the future to generate more specific updates or to introduce optimistic concurrency.
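To make the list of tracked state concrete, here is a minimal sketch of the base class approach in Python. It is not my mapper's code and not anything from ObjectSpaces; the names EntityBase, mark_persisted, mark_deleted, cancel_changes, and pending_action are all made up for illustration.

```python
import copy


class EntityBase:
    """Hypothetical framework base class that keeps mapper state in the entity."""

    def __init__(self):
        self._is_new = True        # new vs. existing: decides insert vs. update
        self._is_deleted = False   # deletion is only marked, not processed yet
        self._original = {}        # snapshot of member values as last persisted

    def _fields(self):
        # Entity members only; the tracking fields start with an underscore.
        return {k: v for k, v in vars(self).items() if not k.startswith("_")}

    def mark_persisted(self):
        # Called (by assumption) after a successful load, insert, or update.
        self._is_new = False
        self._original = copy.deepcopy(self._fields())

    def mark_deleted(self):
        self._is_deleted = True

    @property
    def is_dirty(self):
        return self._fields() != self._original

    def cancel_changes(self):
        # Restore the snapshot and forget a pending delete.
        for name, value in copy.deepcopy(self._original).items():
            setattr(self, name, value)
        self._is_deleted = False

    def pending_action(self):
        # What a save would have to do for this object, if anything.
        if self._is_deleted:
            return None if self._is_new else "delete"
        if self._is_new:
            return "insert"
        return "update" if self.is_dirty else None


class Customer(EntityBase):
    def __init__(self, name):
        super().__init__()
        self.name = name
```

A new Customer reports "insert"; after mark_persisted() it reports nothing until a member changes, at which point it reports "update", and cancel_changes() puts the original values back.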
So what can go wrong with a base class? First, some customers do have their own class framework, like it or not, and is it really the O/R mapping framework's business to dictate a base class if it's avoidable? Next, if you agree that the Manager Design Pattern is the approach to take, and you don't plan on actually having Save and Delete methods in your base class, then you also have to question whether the state required by the mapping framework belongs in the entity objects or in the manager. I'm not saying it's wrong to put the state in the objects here, just that it's a fair design question to be asking at this point. OK, but I asked whether anything can actually go "wrong" with the base class approach. Well, how many copies of a single instance of an object do you think are valid? A base class approach lets you retrieve multiple copies of the same instance of an object, each with its own state, adding overhead and possibly causing concurrency issues. On the other hand, relying on the manager to track state, without a base class, means that there can only be one copy of each instance of an object that is actually being tracked. Another possible problem with the base class approach is that it may (or may not) make your entity objects non-serializable and/or non-remotable. I suppose that problem goes away if the base class inherits from ContextBoundObject, but that adds its own rather heavy overhead, so one has to ask whether that's really the best answer.
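The "one tracked copy per instance" point can be sketched as follows. This is only an illustration under my own assumptions: the Manager class, its get method, and the load_row callable standing in for a database query are all hypothetical, and the sketch deliberately ignores when tracked entries should be released.

```python
class Customer:
    """Plain entity with no framework base class."""

    def __init__(self, customer_id, name):
        self.customer_id = customer_id
        self.name = name


class Manager:
    """Hypothetical manager that hands out one tracked object per (type, key)."""

    def __init__(self, load_row):
        # load_row is an assumed callable that returns a dict of column values.
        self._load_row = load_row
        self._tracked = {}  # (entity type, key) -> the single tracked instance

    def get(self, entity_type, key):
        identity = (entity_type, key)
        if identity not in self._tracked:
            row = self._load_row(entity_type, key)
            self._tracked[identity] = entity_type(**row)
        # A repeated request returns the same object, not a second copy.
        return self._tracked[identity]
```

Asking the manager twice for the same customer yields the very same object, so a change made through one reference is visible through the other and there is only one set of state to reconcile.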
So what can go wrong without a base class? Now you have a central manager object that is tracking the state of all your entity objects, but how does it know when to release all the state it's tracking? This was the biggest problem I found as I started down my implementation without a base class, and I think I got it right. First, you need to know whether an object still exists that needs its state to be tracked, without holding a reference to the object itself. This is exactly what the WeakReference type is for in the .NET framework. It lets you hold a "reference" to an object while still allowing the garbage collector to ignore your "reference" to it. I certainly can't begin to understand half the code actually used in ObjectSpaces when I look at it, but I did confirm that this was also the solution that Microsoft was using, which makes me feel a little more comfortable. Next, that doesn't actually remove the state that the manager is tracking; it simply makes it possible to determine that the state is no longer needed. So we now have to introduce a thread that will periodically check these WeakReferences and remove the state that is no longer necessary. I created a simple timer for this, and I made the timer's interval configurable, something that I don't see in ObjectSpaces, although it may be there somewhere. One issue that remains is what to do about server based systems (web, web services, remoting), where the manager will never be able to keep WeakReferences. My solution is to also track the time of last access and only remove state that is older than a configurable session time, which defaults to zero for normal in-process desktop applications. I have no clue whether Microsoft has also thought about this in ObjectSpaces, although I certainly hope they have if they are going to have ObjectSpaces work in distributed systems.
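The cleanup scheme above can be sketched in Python, whose weakref.ref plays the role of .NET's WeakReference. This is a rough sketch under my own assumptions, not my actual implementation and certainly not ObjectSpaces: StateManager, TrackedState, track, purge, start_timer, the string keys, and the injectable clock are all invented for the example.

```python
import threading
import time
import weakref


class Order:
    """Plain entity, no framework base class (hypothetical example type)."""

    def __init__(self, total):
        self.total = total


class TrackedState:
    def __init__(self, entity, now):
        self.ref = weakref.ref(entity)        # does not keep the entity alive
        self.original = dict(vars(entity))    # original member values
        self.is_new = False
        self.is_deleted = False
        self.last_access = now


class StateManager:
    def __init__(self, session_seconds=0.0, clock=time.monotonic):
        self._states = {}                     # key -> TrackedState
        self._session_seconds = session_seconds
        self._clock = clock
        self._lock = threading.Lock()         # purge may run on a timer thread

    def track(self, key, entity):
        with self._lock:
            self._states[key] = TrackedState(entity, self._clock())

    def purge(self):
        """Drop state whose entity is gone and whose session time has passed."""
        now = self._clock()
        with self._lock:
            stale = [key for key, state in self._states.items()
                     if state.ref() is None
                     and now - state.last_access >= self._session_seconds]
            for key in stale:
                del self._states[key]
        return len(stale)

    def start_timer(self, interval_seconds):
        """Run purge every interval_seconds on a background timer."""
        def tick():
            self.purge()
            self.start_timer(interval_seconds)
        timer = threading.Timer(interval_seconds, tick)
        timer.daemon = True
        timer.start()
        return timer

    def __len__(self):
        with self._lock:
            return len(self._states)
```

With session_seconds left at zero, state disappears on the first purge after the entity has been collected, which suits an in-process desktop application. With a non-zero session time, the state outlives the object until it has gone unused for that long, which is the behavior I want on a server.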
So, I chose the no base class approach for my O/R mapper, since it's what many people want, it's what ObjectSpaces will be delivering, and I think it's the more sound approach overall. I'm still pretty new at this, so maybe I've missed something, but I think I'm on to something since Microsoft is also taking this approach, although I'm sure everyone can agree they've been known to miss the boat at times. What do you think?