Handling Updates in Service Oriented Architectures
The title should be ‘updates in disconnected distributed applications’, but it will probably get more hits with ‘service oriented’ in the title ;).
I had an interesting conversation with one of the guys who coded Shadowfax about how to define messages in a SOA. It was clear to him that the right way was to write simple .NET classes that map to the messages. That was the ‘clean’ solution.
I agreed with his point of view, but I wondered what the right way to handle updates was if you wanted to work with simple messages and types. I know that using DataSets (diffgrams, really) works well, but I see the problems with diffgrams from an interop point of view (yes, it’s just XML, but it is not very friendly to non-.NET platforms).
From a purist point of view, the order message should be as simple as possible, but to be able to handle updates correctly I need two kinds of ‘out of band’ information. First, if I want to use optimistic concurrency, I need the previous values or a timestamp (depending on the case one could be better than the other). Second, as the Order can have multiple lines, I need a way to know which lines were added, modified or deleted. Without this information I cannot have a generic way to handle updates.
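To make the first kind of ‘out of band’ information concrete, here is a minimal sketch of an optimistic concurrency check driven by a version number carried in the message. The `orders` table, the `note` and `version` columns, and the `update_order_note` function are made up for illustration; an in-memory SQLite database stands in for the real one.

```python
import sqlite3

class ConcurrencyError(Exception):
    """Raised when the row changed after the caller read it."""

def update_order_note(conn, order_id, new_note, version_read):
    # The UPDATE only matches if the row still has the version the caller read.
    cur = conn.execute(
        "UPDATE orders SET note = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_note, order_id, version_read),
    )
    if cur.rowcount == 0:
        raise ConcurrencyError(f"order {order_id} was changed by someone else")

# Hypothetical schema and data, just enough to exercise the check.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, note TEXT, version INTEGER)")
conn.execute("INSERT INTO orders VALUES (1, 'original', 1)")

# The caller read version 1, so this update goes through and bumps the version.
update_order_note(conn, 1, "first edit", version_read=1)
```

A second caller that still holds version 1 would get a `ConcurrencyError`. The point is that `version_read` has to travel in the message, which is exactly what makes the message less ‘pure’.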
On the other hand, getting optimistic concurrency exceptions is really a bad thing. If the service call is synchronous and the exception goes up to the end user, then he needs to deal with that message, and he really does not have a good way to handle it (he’ll probably retry). If the call is asynchronous, then you need to redirect the failed message somewhere so it gets reviewed by someone/something.
Let’s see what kind of optimistic concurrency errors we can get. Some interesting examples are discussed here.
There are some cases where the most reasonable thing to do is to take a ‘last win’ approach. If two users change the name of the same user, then ‘last win’ seems reasonable. If one of them gets a concurrency exception, he’ll probably confirm his changes anyway, so the last will win.
When the fields are involved in business rules (like the Item price) or when they are fields that have aggregated values (like an Item Inventory) then things are different, but we probably don’t need the values read from the database to perform the right action.
For example, if you want to update the Inventory for an item that is sold frequently, it is very likely that you’ll get an optimistic concurrency exception, because the inventory in the database is probably different from the one you read. The fact is you really don’t care whether the Inventory is different from the one you read. What you need to ensure is that there’s enough inventory to sell the item. In this case, we don’t need the old value of the ItemInventory to handle it.
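A minimal sketch of the inventory case, assuming a hypothetical `items` table with an `inventory` column and using an in-memory SQLite database. The update does not compare against the value that was read; it only requires that enough stock is left.

```python
import sqlite3

def sell_item(conn, item_id, quantity):
    """Decrement the inventory only if enough stock remains; True on success."""
    cur = conn.execute(
        "UPDATE items SET inventory = inventory - ? "
        "WHERE id = ? AND inventory >= ?",
        (quantity, item_id, quantity),
    )
    # No row matches when the item does not exist or the stock is too low.
    return cur.rowcount == 1

# Hypothetical schema and data for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, inventory INTEGER)")
conn.execute("INSERT INTO items VALUES (1, 10)")

first = sell_item(conn, 1, 7)   # 10 in stock, so the sale is accepted
second = sell_item(conn, 1, 7)  # only 3 left, so the sale is refused
```

Neither call needs the inventory value the client originally read, so nothing extra has to be added to the message for this field.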
Another example could be if the Item price is different than the one I read when I save the invoice. If the price is less than the original, then there’s probably no reason to throw an exception. If it’s greater, then I should. But in this case I still don’t need the two copies of the ItemPrice. As the Invoice cannot change the Item Price, getting the value that was read is enough to check this.
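The price rule can be sketched in the same spirit. The function name `price_conflict` is made up for illustration, and the sketch only decides whether to raise a conflict; which price is finally billed when the price went down is left open, since that is a business decision.

```python
from decimal import Decimal

def price_conflict(price_read: Decimal, current_price: Decimal) -> bool:
    """Return True only when the item got more expensive after it was read."""
    return current_price > price_read

# The price that was read travels in the message; the current one comes from the database.
cheaper = price_conflict(Decimal("10.00"), Decimal("9.50"))
same = price_conflict(Decimal("10.00"), Decimal("10.00"))
dearer = price_conflict(Decimal("10.00"), Decimal("10.50"))
```

Only the last case would be reported as a conflict. The check needs just one copy of the price in the message, the one that was read.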
Intuitively I think that in most cases (I’d say in every case but I don’t dare ;), using a ‘last-win’ approach for ‘descriptive’ attributes and applying logic for attributes involved in business logic should work.
If this is true, the good news is that we don’t need the previous value or a timestamp field in the message to be able to save the field consistently, and we’ll greatly reduce the number of concurrency exceptions. The problem is that it requires thinking harder about the concurrency scenarios that could be present in the application. If you skip that thinking, you’ll probably get inconsistent data. It’s a dangerous approach. Having the old values or using a timestamp is a safer approach, and it works in most cases (unless you are in a high-concurrency case).
The other problem we have with pure messages is that in hierarchical messages we don’t know what was added/changed/deleted. I don’t see a good way to handle this. Thinking out loud:
• I could delete all the child records and add the new ones. This has performance problems and, more importantly, deleting a row can trigger a cascading delete in a table that is not included in the message, and those rows cannot be recreated.
• I can try to infer the state for each row:
o Rows with an invalid primary key value are new rows (e.g., rows with negative IDs)
o Rows that exist in the database need to be updated (this requires checking whether every line in the message exists in the database).
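The inference idea can be sketched as follows. The function `infer_line_states`, the dictionary-shaped lines, and the `existing_ids` set (the line IDs currently stored for the order) are all assumptions for illustration. The `missing` bucket is an addition of the sketch: it shows that lines absent from the message stay undecided, because they may have been deleted or simply not sent.

```python
def infer_line_states(message_lines, existing_ids):
    """Classify the lines of an order message by comparing them with the database."""
    inserts, updates, unknown = [], [], []
    for line in message_lines:
        if line["id"] < 0:
            inserts.append(line)      # invalid (negative) key: treat as a new row
        elif line["id"] in existing_ids:
            updates.append(line)      # key found in the database: treat as an update
        else:
            unknown.append(line)      # valid-looking key that the database does not have
    seen = {line["id"] for line in message_lines}
    missing = sorted(existing_ids - seen)  # stored lines the message does not mention
    return inserts, updates, unknown, missing

# Hypothetical order: lines 1, 2 and 3 are stored; the message carries 1, 2 and a new one.
lines = [
    {"id": 1, "qty": 5},
    {"id": 2, "qty": 1},
    {"id": -1, "qty": 3},
]
inserts, updates, unknown, missing = infer_line_states(lines, existing_ids={1, 2, 3})
```

Here line 3 ends up in `missing`, and nothing in the message says whether it should be deleted.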
So, I still don’t have an acceptable solution for this problem if I want to stick with ‘pure’ messages.
Of course this won’t be a problem if we don’t update the database. This seems to be the approach used by some of the SOA gurus. Working with a database with only inserts solves this problem, but it adds significant complexity to the application (instead of joining by foreign key values you need to join by foreign key + date), and it only works if the database was designed that way. It is not easy to apply this approach to existing databases and applications.
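To show what ‘join by foreign key + date’ means in practice, here is a small sketch with an in-memory SQLite database. The `item_prices` and `invoice_lines` tables and their columns are made up for illustration. A price change is a new row rather than an update, so every query has to pick the row that was valid on the invoice date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_prices (item_id INTEGER, valid_from TEXT, price REAL);
CREATE TABLE invoice_lines (invoice_id INTEGER, item_id INTEGER, invoice_date TEXT);
INSERT INTO item_prices VALUES (1, '2004-01-01', 10.0);
INSERT INTO item_prices VALUES (1, '2004-06-01', 12.0);  -- price change = new row
INSERT INTO invoice_lines VALUES (100, 1, '2004-03-15');
INSERT INTO invoice_lines VALUES (101, 1, '2004-07-01');
""")

# Join on the foreign key AND on the latest price row not newer than the invoice date.
rows = conn.execute("""
SELECT l.invoice_id, p.price
FROM invoice_lines l
JOIN item_prices p
  ON p.item_id = l.item_id
 AND p.valid_from = (SELECT MAX(valid_from)
                     FROM item_prices
                     WHERE item_id = l.item_id
                       AND valid_from <= l.invoice_date)
ORDER BY l.invoice_id
""").fetchall()
```

Invoice 100 picks up the price of 10.0 and invoice 101 the price of 12.0. A plain join on `item_id` alone would return both price rows for each invoice line, which is the extra complexity mentioned above.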
As a summary, if we stick with simple classes for the messages, we’ll need to:
• Add logic to handle potential concurrency problems that won’t be handled by optimistic concurrency
• Find a way to know what operation to apply to each row, or have a flag in each row indicating what happened to that row (and we don’t have simple classes anymore).
• Only insert in our database, adding a lot of work at the application level.
Another solution could be to build a(nother) WS-* standard for serializing diffgrams and use them. Even if there are going to be scenarios where other solutions would be better, the diffgram way makes everything easier. And that’s what we need.