Solving the Data Access problem: to O/R map or not to O/R map
On the www.asp.net forums (the architecture section), someone asked in the 'Your favorite O/R mapper' thread why you would use a 3rd party component for data access, why that component would be an O/R mapper, and if so, which one. I've tried to answer these questions in that thread, but because I think the answers can benefit more people than just the readers of that long forum thread, I've reworked the text into the article you'll find below. Keep in mind I've tried to keep things simple to understand, so I may have left out a detail here and there; however, I don't think these details matter much for the overall conclusions and descriptions. As I've addressed a couple of questions which I think are related to each other, I've rewritten the forum response as a Q & A.
Q: Why would I go out and buy a 3rd party component / library?
A: With every task you have to perform during a software development project, you have to make a calculation: if I perform this task myself, how much time will that take, and given my hourly fee, how much money is involved, minus the value of the knowledge and insight I gain from doing it myself. The number resulting from that calculation is compared to what a 3rd party component costs, plus the time it takes to get used to that component, plus the time to figure out which component is good, plus some risk margin (because a 3rd party component can still turn out to be a bad choice after a month or so).
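To make that concrete with some made-up numbers: say writing your own data-access code takes 80 hours at a $75 hourly fee, so $6,000, minus whatever the gained insight is worth to you. A $500 component, plus say 15 hours ($1,125) to evaluate and learn it, plus a risk margin, comes to roughly $2,000. In that hypothetical scenario buying wins; for a small task, or a component with a steep learning curve, the numbers can easily flip the other way.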
This sounds awkward, but it's common sense. It's not always more efficient to go out and buy a component to do things for you, just like it's not always more efficient to do things yourself. However, without making a simple calculation, it's hard to tell which situation you're in. Software projects are hard to manage, and without tight cost control, or better: cost insight, it's hard to run a project efficiently and profitably.
So even if it's tempting to go out and buy a component or use an open source one, is it really more efficient to do so? Often it is, don't get me wrong on that, but don't forget the costs of using a 3rd party component, especially when it's a freebie without any documentation and just a raw example program without a lot of comments.
Q: Why should I implement an O/R mapper in my projects?
A: O/R mapping is in theory very simple: you have a table field and you have an entity field, you define a connection between them, and you use that connection in your logic to provide functionality like loading an entity's data, saving it, and so on. However, using solely the terms 'O/R mapping' and 'O/R mapper' only makes things more complicated. The problem description is:
"I have to make a connection between my business logic and my persistent storage, how do I do that?".
The answer: "use an O/R mapper" is not helpful, as it would require knowledge about what an O/R mapper is. If you don't know what it is, how can you judge if an O/R mapper is helpful and if that answer holds some truth? You can't.
The right answer is a question: "how do I see my data?". It's the cornerstone of the answer leading to the solution of the dreaded Data Access problem. There are a couple of different views on 'data', and they result in different ways of solving the Data Access problem. You have:
1) table approach
2) entity (Chen/Yourdon) approach
3) domain model (Fowler/Evans) approach
(These are the top 3. There are others, but most of them fall into one of these 3 categories.) 1) and 2) look the same, but aren't. Let's discuss these 3 views in more detail.
1) Table approach
The table approach is the plain 'I use tables and query them' approach. No theory is used: just a set of tables, not based on any abstract model, created right there in DDL. The developer uses tables and expects to work with tables in memory as well, so a plain DataSet/DataTable approach with stored procedures or VS.NET-generated SQL statements is appealing. Typically, developers using this approach use terms like 'rows' and 'customer record'. It might sound odd, but this is a very widely used approach on .NET. The reasons for that are that Microsoft preaches it through the VS.NET designers and examples, and that in the pre-.NET period, ADO with recordset objects was the way to go.
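A minimal sketch of this style, assuming a Northwind-like Customers table (the connection string and column names are just examples):

using System.Data;
using System.Data.SqlClient;

public class CustomerData
{
    // Pure table thinking: query the table, get rows back, work with rows.
    public static DataTable GetCustomerRows(string connectionString)
    {
        DataTable customers = new DataTable("Customers");
        using(SqlDataAdapter adapter = new SqlDataAdapter(
            "SELECT CustomerID, CompanyName, Country FROM Customers",
            connectionString))
        {
            adapter.Fill(customers);
        }
        return customers;
    }
}

// usage, reading a value from a 'customer record':
// string name = (string)CustomerData.GetCustomerRows(connectionString).Rows[0]["CompanyName"];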
2) Entity (Chen / Yourdon) approach
The entity approach is different. The relational model is built from an abstract model and is based on theory. This means people speak of entities (or, if you want to go really deep into theory, relations) and attributes. An approach with solely DataTables/DataSets is often not appealing, as the relational model speaks of a Customer entity and not of a Customer record. Developers using this approach want to use these types of elements in their code as well. As they use a relational model as the base of their thinking, the entities by definition contain no behavior/rules, or just low-level behavior/rules, like the check constraints/unique constraints and other constraints defined like 'shippingdate >= orderdate' or 'id >= 0'.
Also important is the way these developers want to utilize the relational model. They understand that the data in the database is just data, and that an entity is just a relation based on attributes, which can also be constructed dynamically with a select statement. This is important for lists of combined attributes from different entities and for reporting functionality. The entity approach therefore uses a combination of O/R mapping for the entity data and generic data functionality like DataSets/DataTables for the dynamic data retrieval requests. The entity approach is also widely used; you see it more in larger applications, as these often require a system architect and a data analyst. It's proven technology which has existed since the late '70s of the past century.
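In code, an entity in this approach is mostly data plus such low-level attribute rules; the real business logic lives elsewhere. A minimal sketch (hypothetical names):

using System;

// Entity view: attributes plus low-level constraint rules, no business behavior.
// The business logic lives in manager/functionality classes, not in here.
public class OrderEntity
{
    private DateTime _orderDate;
    private DateTime _shippingDate;

    public DateTime OrderDate
    {
        get { return _orderDate; }
        set { _orderDate = value; }
    }

    public DateTime ShippingDate
    {
        get { return _shippingDate; }
        set
        {
            // the code equivalent of the check constraint 'shippingdate >= orderdate'
            if(value < _orderDate)
            {
                throw new ArgumentOutOfRangeException("value",
                    "ShippingDate must be on or after OrderDate.");
            }
            _shippingDate = value;
        }
    }
}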
3) Domain model (Fowler / Evans) approach
The domain model approach is the most used approach for solving the Data Access problem in the Java world, but, interestingly enough, it's rather rare in the Microsoft world. This is not that surprising, as in the Microsoft world it was simply unknown: Microsoft never talked about it, and the techniques and tools developers mostly used didn't support it very well, so running into it was not that common, except perhaps when you talked about data access with Java developers. Another reason it is not that widely used is that it requires an OO approach, which wasn't often possible with COM objects and/or VB5/6.
The domain model focuses on domains, like the Customer domain or the Order domain. It starts with classes, like a Customer class, which contains the data for a customer but also all behavior for the customer, so all business rules for the customer are stored there. (This is somewhat simplistically said, there are a couple of variants of course, but for the sake of the argument, let's keep it at this description.) Through inheritance you can create a compact model of classes and store behavior in the class it belongs in, using polymorphism to override/modify behavior through the hierarchy. The class hierarchy is then stored in a persistent storage, typically a database.
This is a fundamental difference with 2): with the domain model, the relational model follows the classes; the classes don't follow the relational model. Typically, behavior in 2) (and also in 1)) is stored in functionality objects like CustomerManager, which embeds the customer functionality and is applied to behaviorless entity objects. In 3) you have the behavior in the class itself: no manager classes. 3) requires an O/R mapper to work with the data in the persistent storage, or better: the O/R mapper is required to (re-)instantiate entity objects from their persistent state in the persistent storage. Because the system's focus on data is through objects, working with data as in 1) and 2) is not available; you work with objects.
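A rough sketch of what that looks like, with hypothetical classes: the behavior lives in the hierarchy itself and is modified through polymorphism:

using System.Collections.Generic;

// Domain model view: data and behavior live together in the class hierarchy.
public class Customer
{
    public List<Order> Orders = new List<Order>();

    // A business rule inside the domain class itself...
    public virtual decimal GetDiscountPercentage()
    {
        return 0m;
    }
}

public class GoldCustomer : Customer
{
    // ...overridden further down the hierarchy.
    public override decimal GetDiscountPercentage()
    {
        return 5m;
    }
}

public class Order
{
    public decimal TotalAmount;
}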
What's the best approach?
Hard to say. 25 years of 2) in millions of software systems around the world can't be wrong, but millions of software systems in Java using approach 3) can't be wrong either. I think it largely depends on what you find more logical, on how you want to work with data. I'm in camp 2), and our product LLBLGen Pro is a tool which tries to help with 2) by offering both O/R mapping and flexible relational model data access power. It's therefore not a pure O/R mapper, as it doesn't fit that well in 3); it offers more functionality to help with 2) than with 3). Paul Wilson's WilsonORMapper is also more of a category 2) than a category 3) application. More pure O/R mappers, like EntityBroker, DataObjects.NET, NHibernate and others, focus on 3) (most of the time).
Don't think lightly about this; the differences are fundamental and will influence how your system structure is designed to a great degree. So it's important to pick the approach which fits your way of thinking. To test how you think about data, ask yourself: "A customer gets the Gold status when the customer has bought at least $25,000 worth of goods in one month. Where is that logic placed? In which class/classes?". Inside the Customer object, which reads its own order data to test the rule? Or in a CustomerManager which executes rules and consumes customer and order objects?
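In code, the two answers look roughly like this (a hypothetical sketch; what matters is where the rule lives, not the details):

using System;
using System.Collections.Generic;

public class Order
{
    public DateTime OrderDate;
    public decimal TotalAmount;
}

// Camp 3): the rule lives inside the Customer object itself.
public class Customer
{
    public List<Order> Orders = new List<Order>();
    public bool IsGold;

    public void UpdateGoldStatus(int year, int month)
    {
        decimal total = 0m;
        foreach(Order order in Orders)
        {
            if((order.OrderDate.Year == year) && (order.OrderDate.Month == month))
            {
                total += order.TotalAmount;
            }
        }
        IsGold = (total >= 25000m);
    }
}

// Camp 2): the rule lives in a manager, which executes it against
// behaviorless customer and order entities (Customer would then have
// no UpdateGoldStatus method of its own).
public class CustomerManager
{
    public void UpdateGoldStatus(Customer customer, List<Order> ordersOfThatMonth)
    {
        decimal total = 0m;
        foreach(Order order in ordersOfThatMonth)
        {
            total += order.TotalAmount;
        }
        customer.IsGold = (total >= 25000m);
    }
}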
Also, don't let your decision be influenced by "but this example proves x is better than y!": at the end of the day, data is data and not information. Information is data placed into context, and it requires interpretation to give it any value/meaning. How you do that is not important, as long as you meet requirements like maintainability, scalability, and efficiency in development and deployment, and perhaps (but not necessarily) performance.
So if your way of writing software is clearly in the Fowler/Evans camp, 3), don't use datasets and don't use a Data Access solution targeting 2), because it will be a struggle: the way of thinking doesn't fit the tool used. You'd be driving in a nail with a screwdriver; you should either swap the nail for a screw or the screwdriver for a hammer. So if you're in camp 3), use a pure O/R mapper; it will fit like a glove.
If your way of thinking is clearly in the 2) camp, using a pure O/R mapper can give you headaches: when you want to write a lot of reports, or use a lot of lists combining attributes from multiple entities, you need functionality which allows you to perform such dynamic and scalar queries, and an approach which allows you to think from the relational model, i.e. an application tailored to starting with the relational model.
Update: Paul Wilson explained that his mapper is more of a category 2) than category 3) application. I've changed that in the article.