Thoughts on how to bind to databases, or: Databases as services
There´s much talk about O/R mapping tools lately in the .NET world. ObjectSpaces are slowly materializing and more and more 3rd party vendors are offering their solutions. But when looking at the different solutions I´m still feeling kind of uneasy. But today I have been able to lay my finger at least on part of the reasons for my uneasyness: I´m not yet satisfied with the way we bind our programs to databases.
We are used to binding components to other components. And this has become even easier and more robust with the .NET Fx. We simply reference a component in our project (within VS.NET) and immediately the types defined therein become visible/usable, as if they were defined within our project.
Now the same works even for Web Services: We point to some metadata file (WSDL) - and presto, we get a proxy class generated automatically and from then on it seems, as if the services was part of our project.
So my question is: Why are we not able to do right the same with a database?
Why can´t I reference a database in my project - and presto, I get classes generated automatically? Or why can´t I generate some kind of assembly from a DB schema which I can then link into my project?
Why not have something like WSDL for databases? Let´s call it DBDL. Or maybe we have it already and it´s RSD? Ok, that would be fine for me (for now). So why then is there no tool generating classes from RSD? (No, Typed Datasets don´t do the job.)
Let´s see, what do we have for Web Services:
-WSDL describes methods exposed by a service.
-WSDL describes the data passed to/from those methods.
-A proxy class is generated upon referencing a WSDL document.
-SOAP defines the wire format for messages passed to/from services.
-System.Net is the API behind a Web Service proxy class.
So why not deal with databases in a similar way? (At least in most cases.)
-DBDL could describe the "methods" exposed by a database, i.e. stored procs.
-DBDL could describe the data passed to/from those methods.
-DBDL could describe the data accessible by standard methods (SQL select, insert, delete, update), i.e. tables/views, relationships.
-Persistent classes could be generated from the DBDL, and their implementation could differ depending on the datasource or on the persistence technology.
-SQL would be the wire format for messages to the database.
-ADO.NET would be the API behind the persistent classes.
For the same db schema, different DBDL documents could be written, depending on what a DBDL consumer needs to do with the database.
With Web Services nobody wants to link an arbitrary existing class to a Web Service. Instead Web Service proxy classes have just this single purpose of communicating with the remote service. Nobody really cares about their implementation or which class they derive from.
Why should it be different for database access? Why would I want to persist an arbitrary existing class? (Of course there are scenarios where you want/need to do it. But many, many developers will never encounter them.) So I think, generating persistent classes from DBDL would be perfectly fine.
Of course in the end DBDL would not be enough. There still is the question of the persistence API (e.g. what method to call on which object to persist an object?). But I think, that´s easy to solve.
So what I´m looking for is simplicity. I´m a friend of Occam´s Razor. So less entities is better than more. If there is an established programming model to bind to functionality, then why not use it for databases too? Why not view databases as a kind of service? Since we´re not talking about scenarios where cursors are needed, but disconnected data handling, a table definition or a SQL select statement is nothing more than a definition of a collection of objects of a certain type, e.g.
select customerid, companyname from customers ...
can be expressed as
class customers
public customerid as string
public companyname as string
end class
class customersCollection
implements IList
...
public property item(index as integer) as customers
...
end property
...
end class
Quite a few O/R mapping tools are of course doing a great job - but (as far as I know) they still make database access something very special. Of course, there needs to be a special API, persistence is not transparent, but getting to the point when I can start programming in a strongly typed manner against a certain database still takes very long.
So my bottom line is: Not all our tools need to change. But our view of how to work with databases.
Maybe it´s not a large step, but still, I think, some step(s) need to be taken.
PS: By the way, one of my premisses is, that there are several kinds of database access needs/strategies. Some things can only be done with serverside cursors, for example (e.g. reporting on huge amounts of data). Some things, on the other hand, can be done in a disconnected manner by caching data (e.g. with DataSet). I´m concerned with the latter, which makes up a large part of many business applications. The former does not need an object oriented access to data.
Another of premise of mine is, that O/R mapping should not try to accomodate any and all existing database schemas. O/R mapping should limit itself to a number of structures/patterns. If, then, for some reason (e.g. performance) a database needs to go beyond them, it cannot be handled/mapped to objects. Or to say it the other way round: databases to be O/R mapped to objects should adhere to some rules (e.g. every single table has an explicit surrogate primary key). Since rules for database modelling are not new (we all know normalization rules), a couple of more rules should not appall db modellers.