Developing Linq to LLBLGen Pro, Day 4
(This is part of an on-going series of articles, started here)
I hear you thinking... "In the Netherlands, days apparently have 168 hours". Well... no (really?). Today is officially the 4th day I've spent working on Linq support. In the days since the previous post I've been working on Derived Table support in LLBLGen Pro, so Linq queries can be created without excessive reformulating of the expression tree (for what this is all about, see the post on day 2).
Did that take 7 days? Yes, more or less. Of course, writing the final design and the final code didn't take a whole week, perhaps a couple of hours. The main thing is that adding a feature which potentially affects a great deal of the inner workings of an O/R mapper can be challenging. As I discussed in the post on Day 3, LLBLGen Pro's query API has some nice traits, like being able to figure a lot of things out for you, so you don't have to specify them and you avoid potential bugs. The downside is that it can sometimes be a challenge for the inner code if the query API has to figure out something which isn't specified. Adding Derived Table support brought one problem to the table: the select list fields have to be able to use a source alias to make a projection in SQL from the Derived Table data. LLBLGen Pro's API didn't allow you to specify this alias, as it was unnecessary to do so before, till now.
Changing an O/R mapper's core isn't about firing up VS.NET and hammering out unit tests and altering code here and there. It's about defining a couple of things before even looking at the code:
- What the feature is all about. This means: which changes are to be expected in the public API. This can be tested up-front with mocks and unit tests. This is the easy part and is often over before you know it.
- Which parts of the design are involved in the API elements which will be changed? This requires the developer to look at the design of the whole O/R mapper pipeline to see where things happen, why they happen there, and which elements will be affected if the API change is applied to the codebase.
- Which steps to take to migrate the codebase to the version with the new feature/change. This step involves careful planning of what to change and why, and it takes the main chunk of the time. The main issue is that an API change in an O/R mapper often affects multiple elements in the system, and if you're unlucky, elements on all levels of the pipeline. To be successful in this, you have to know up front what you're going to change. The reason is simple: if you don't know what you're going to change, chances are you'll change a lot more than you initially expected, and it's not always possible to oversee, right there in the editor, the scope of a code change you've made, as the change might affect a lot of call chains.
The Agile troopers might call me a lame duck now, because they'll say they have the unit tests to prove they made the right changes. However, that's pretty meaningless in a system with long call chains, like an O/R mapper, which has several sub-systems all involved in the complete process. Because of these call chains, the number of different call chains you can have is practically unlimited. This means there are never enough unit tests to test everything: even with 100% code coverage, you can never be sure the system works, simply because the number of scopes the system can be used in is unlimited (think of all the possible different queries one can write in SQL, for example). You need different methods to be sure your code works: proof that your code is correct. Sure, unit tests are necessary, and we have a truckload of them. They might spot a change you shouldn't have made. But they also might not: a unit test only tests what is in the unit test. Testing a routine is OK, but testing a whole call chain of routines for correctness requires a different test for every use case it might appear in. Can you imagine how many tests that would be? That's right, an unlimited number of tests.
It's key that the changes to apply are known when you apply them. That makes the whole migration process deterministic, which is preferable, because you then know what the end result will be. So, we thought we had a solution to our problem, we defined the steps to take and which code to alter, and altered the code. Tests were run. They succeeded. All is well, you might think? Well.... not quite. As discussed previously, our inner code needed to find the alias to use for a field in the select list, as that alias wasn't specified. However, there are queries thinkable where this is impossible because of ambiguity: which column to pick from the total set of columns in the FROM clause of the SELECT statement if the column appears twice? So, after 5 days or so of work, which looked like they had delivered the right code, we had to roll back our changes and start over. Our unit tests proved our code was OK, but as I described above: there are so many different queries thinkable, each affecting its own order of events inside the O/R mapper, that it's undoable to write a couple of unit tests for every one of them. So you write the tests for the more common ones and prove that the code is correct for the rest. Trying to prove the code was correct made me run into the problem of having the same column multiple times in the FROM clause.
Is proving that the code is correct hard? No. It's sometimes tedious work, but it's not hard. It comes down to two things: a) reading the code written and checking if that code is indeed correctly implemented and b) proving that the algorithms used are the right ones for the job.
This last part might be hard, but it's not that hard. The trick is to think in terms of 'what will make this algorithm do the wrong thing'. When you think about O/R mapper query pipelines, it's obviously along the lines of 'what will make this query pipeline deliver the wrong results'. Developers are often solely focused on 'what will make this algorithm/code do the right thing', but that's often much harder to do, as 'the right thing' might be a solution space with an unlimited number of elements. So I looked for queries which could break what we had designed. It took a while, but we found at least one (in this case it's not as obvious as it might seem). As soon as you find a way to break your design, go back to the design and reason about why it breaks. One of the biggest mistakes you can make is to 'patch' the design for the one situation you found that breaks it, because what if there's another way to break it as well? If you reason about why the design breaks, you will often realize what the real flaw is and come up with the right solution. It's like scientific proof: scientific proof isn't focused on proving that something is correct, it's focused on proving that something can't be wrong.
It turned out what we wanted wasn't possible: the query we found was proof that our solution to the alias problem was broken. So we had to alter our design so that the developer has to specify the alias for the select. Let's go back to our migration plan step. Because this migration plan was thought out, we knew what we had to roll back and what we had to alter instead to make the code reflect our design. The fun part is: this design is provably correct in all situations. All that's left after that is manual labor: typing code in an editor, making the code changes and writing the tests to see if the code changes were made correctly. Mind you: the tests aren't meant to prove the design works as expected; as said before, that's impossible in code because there is an unlimited number of different call chains. The tests are meant to prove that the code works OK, i.e. that we typed it in correctly. Confusing? Perhaps I didn't write it down very clearly, but what I wanted to explain is that the code isn't the design, the code is the result of the design. If the design has flaws, you can test the code to be correct, and it will be correct with respect to the design, but still wrong with respect to reality.
Let me show the final code for how Derived Tables will work in LLBLGen Pro v2.6 (the release with Linq support).
// C#
// first specify the elements in the derived table select (which is a dyn. list)
// these are: fields list, filter, relations, group by, sort expression, limiter and
// the alias for the table. No paging.
ResultsetFields dtFields = new ResultsetFields(2);
dtFields.DefineField(OrderDetailFields.OrderId, 0);
dtFields.DefineField(new EntityField2("Total",
    (OrderDetailFields.Quantity * OrderDetailFields.UnitPrice), AggregateFunction.Sum), 1);
GroupByCollection dtGroupBy = new GroupByCollection(dtFields[0]);
DerivedTableDefinition dtDefinition = new DerivedTableDefinition(dtFields, "OrderDetailTotals", null, dtGroupBy);

// then specify the relation between entity and derived table, in code.
DynamicRelation relation = new DynamicRelation(dtDefinition, JoinHint.Inner, EntityType.OrderEntity, "O",
    (new EntityField2(OrderDetailFieldIndex.OrderId.ToString(), "OrderDetailTotals") == OrderFields.OrderId.SetObjectAlias("O")));

// then specify the rest of the query elements
RelationPredicateBucket filter = new RelationPredicateBucket();
filter.Relations.Add(relation);
// alias for the fields which will be used for the select list to fetch orders.
filter.SelectListAlias = "O";
filter.PredicateExpression.Add(new EntityField2("Total", "OrderDetailTotals") > 5000);

// then fetch the data
EntityCollection<OrderEntity> orders = new EntityCollection<OrderEntity>();
using(DataAccessAdapter adapter = new DataAccessAdapter())
{
    adapter.FetchEntityCollection(orders, filter);
}
Assert.AreEqual(38, orders.Count);
Now, this might look a little cumbersome, with the relation defined in code. However, it's a way to define a relation between unrelated elements in a typed, compile-time-checked way (except for the derived table alias, of course). The key was filter.SelectListAlias. I had to make just one tiny breaking change to make this all work, a change hardly anyone will run into. Defining the relation in code isn't normally necessary: LLBLGen Pro generates the relations for you, so it's just picking a property on a static class; however, with derived tables joined to entities, these aren't generated up front of course, so the relation has to be written out in code. The upside is that since it's an object structure, everything is very flexible and there are no limits on what the derived table looks like, how many relations are in the join list, etc.
Linq providers and the Visitor Pattern
So, now that these Derived Table fellows are on board, we can again start focusing on Linq code! After setting up the projects properly, I was ready to dig deeper into Linq territory. It turned out that, no surprise there, the key object to focus on is the class implementing IQueryProvider. Some sites have already reported that the pattern to use is the Visitor pattern. The Visitor pattern is normally waved away with the words "I understand all the GoF patterns and their usage, except this Visitor pattern; I've never seen a use-case scenario for it". The main reason for this is of course that the scenario it applies to is uncommon for most developers out there: traversing trees with nodes of various types, where both the node types and the visitor (the object traversing the tree) are under the control of the developer. Linq's Expression trees, however, seem like the perfect candidate for the Visitor pattern.
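Before digging into the visitor part: to make the IQueryProvider remark a bit more concrete, here's a bare-bones sketch of the plumbing every Linq provider starts with. The interfaces (IQueryProvider, IOrderedQueryable<T>) are the real .NET ones, but the class names QuerySketch/ProviderSketch are placeholders I made up for this post, not the actual LLBLGen Pro classes.

// C#
// minimal provider plumbing: an IQueryable<T> which just carries the expression tree,
// and the IQueryProvider which has to translate and execute that tree.
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public class QuerySketch<T> : IOrderedQueryable<T>
{
    private readonly ProviderSketch _provider;
    private readonly Expression _expression;

    public QuerySketch(ProviderSketch provider)
    {
        _provider = provider;
        _expression = Expression.Constant(this);    // root of the tree: the query source itself
    }

    public QuerySketch(ProviderSketch provider, Expression expression)
    {
        _provider = provider;
        _expression = expression;
    }

    public Type ElementType { get { return typeof(T); } }
    public Expression Expression { get { return _expression; } }
    public IQueryProvider Provider { get { return _provider; } }

    public IEnumerator<T> GetEnumerator()
    {
        // execution is deferred: only when the query is enumerated is the tree
        // handed to the provider for translation and execution.
        return ((IEnumerable<T>)_provider.Execute(_expression)).GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

public class ProviderSketch : IQueryProvider
{
    public IQueryable<TElement> CreateQuery<TElement>(Expression expression)
    {
        // every Queryable operator (Where, Select, OrderBy, ...) ends up here,
        // wrapping the tree it has grown so far in a new queryable.
        return new QuerySketch<TElement>(this, expression);
    }

    public IQueryable CreateQuery(Expression expression)
    {
        // non-generic path omitted in this sketch.
        throw new NotImplementedException();
    }

    public TResult Execute<TResult>(Expression expression)
    {
        return (TResult)Execute(expression);
    }

    public object Execute(Expression expression)
    {
        // this is where the expression tree has to be traversed and translated into an
        // LLBLGen Pro query (and ultimately SQL). Not implemented in this sketch.
        throw new NotImplementedException();
    }
}

The interesting part is obviously that Execute method: that's where the expression tree has to be traversed and translated, which brings us back to the Visitor pattern.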
Looking at the pattern as defined in the GoF book, you'll quickly see that what should be used with Expression trees isn't really the Visitor pattern: the visitor isn't passed to the node; the dispatching of which handler to call based on the type of the node happens outside the tree, not inside the node. This seems odd at first, and one could quickly dive into the seemingly infinite "Microsoft again didn't understand a pattern and cooked up its own hocus-pocus pattern instead" way of thinking. However, that's not really fair: Microsoft couldn't do anything else here. The thing is: if they add a new node type, the main visitor class also has to be adjusted, but that class isn't Microsoft's, it's the class of the provider writer. To avoid this problem, one could argue that Microsoft should have provided a generic Visitor base class, so it would be their class. Apparently they decided not to do that, so the only other way is to force the provider writer to have a dispatch routine outside the nodes.
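To illustrate what I mean by "the visitor isn't passed to the node", here's a minimal sketch of the classic GoF shape, with made-up node types (these are not Linq's Expression classes): dispatch lives inside the node via an Accept() method, which is exactly why adding a node type forces a change to the visitor interface as well.

// C#
// classic GoF double dispatch, for contrast.
public interface INodeVisitor
{
    void VisitBinary(BinaryNode node);
    void VisitConstant(ConstantNode node);
    // adding a new node type means adding a method here as well, i.e. changing the
    // visitor interface. With Linq, that interface would be owned by the provider
    // writer, not by Microsoft, so Microsoft couldn't extend it with new node types.
}

public abstract class Node
{
    // the node picks the handler to call: double dispatch.
    public abstract void Accept(INodeVisitor visitor);
}

public class BinaryNode : Node
{
    public Node Left;
    public Node Right;
    public override void Accept(INodeVisitor visitor) { visitor.VisitBinary(this); }
}

public class ConstantNode : Node
{
    public object Value;
    public override void Accept(INodeVisitor visitor) { visitor.VisitConstant(this); }
}

Expression nodes have no such Accept method, so the type check has to happen on the outside.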
The outside dispatcher has to check which Expression type it deals with and call the right handler. Now, MS has created several Expression classes, so you could use polymorphism here. However, there are more expression tree node types than there are expression class types: BinaryExpression, for example, covers a lot of node types, all of which are BinaryExpression instances. So what should be done instead is to have a big switch statement on the node type which calls the right handler routine for the expression class. Big switch statements are often a red flag, a code smell.
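This is roughly the shape of it. A sketch only: just a handful of the node types, placeholder handler names, and handlers that merely recurse; in a real provider the handlers do the actual translation work. This is not the actual LLBLGen Pro code.

// C#
using System.Linq.Expressions;

public class ExpressionHandlerSketch
{
    public void Handle(Expression expression)
    {
        if(expression == null)
        {
            return;
        }
        switch(expression.NodeType)
        {
            // many different node types all map onto the same expression class...
            case ExpressionType.Add:
            case ExpressionType.AndAlso:
            case ExpressionType.Equal:
            case ExpressionType.GreaterThan:
                HandleBinaryExpression((BinaryExpression)expression);
                break;
            case ExpressionType.Call:
                HandleMethodCallExpression((MethodCallExpression)expression);
                break;
            case ExpressionType.Constant:
                HandleConstantExpression((ConstantExpression)expression);
                break;
            case ExpressionType.Lambda:
                HandleLambdaExpression((LambdaExpression)expression);
                break;
            case ExpressionType.MemberAccess:
                HandleMemberExpression((MemberExpression)expression);
                break;
            case ExpressionType.Quote:
                HandleUnaryExpression((UnaryExpression)expression);
                break;
            // ... and so on for the remaining ExpressionType values.
        }
    }

    private void HandleBinaryExpression(BinaryExpression expression)
    {
        // recurse into the operands; a real handler translates the operator as well.
        Handle(expression.Left);
        Handle(expression.Right);
    }

    private void HandleMethodCallExpression(MethodCallExpression expression)
    {
        Handle(expression.Object);
        foreach(Expression argument in expression.Arguments)
        {
            Handle(argument);
        }
    }

    private void HandleConstantExpression(ConstantExpression expression)
    {
        // leaf node: nothing to recurse into.
    }

    private void HandleLambdaExpression(LambdaExpression expression)
    {
        Handle(expression.Body);
    }

    private void HandleMemberExpression(MemberExpression expression)
    {
        Handle(expression.Expression);
    }

    private void HandleUnaryExpression(UnaryExpression expression)
    {
        Handle(expression.Operand);
    }
}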
The reason this is a smell here is that the expression node type is an enum. This means that if you want to invent your own Expression classes, you can't extend the mechanism used here, as you can't inherit from an enum to extend it. I don't quite understand why they went the enum route instead of creating subclasses of the main expression classes for the various node types, as that would have made the system more extensible. However, how many people will write their own Expression classes? Not that many, I think (only provider writers). So one could argue it doesn't really matter much.
So I'm currently looking at a nice switch statement which calls handler routines; all that's left is filling in those handlers. The fun can finally begin!