Software Transactional Memory VII - Automatic retry of failed transactions

Sunday, August 5, 2007

I concluded my previous posting on Software Transactional Memory (STM) with the remark that NSTM was not finished. How true! Here is the next release of NSTM with a couple of improvements. You can download it from Google's project hosting site. Here's what's new:

Validation matrix

As mentioned in an earlier posting, I was not quite satisfied with the validation strategy of NSTM. Even to me it was not entirely clear when a transactional object (txo) would be validated. I improved on this situation in the latest release of NSTM (rel. 1.0.0.222) by implementing this validation matrix:

When                            Condition (isolation level + clone mode)  What (read mode of txo)
Validate on read and on commit  serializable + cloneOnWrite               ReadOnly (on read), ReadOnly+ReadWrite (on commit)
Validate on commit only         serializable + cloneOnRead                ReadOnly+ReadWrite
Validate on commit              readCommitted + cloneOnRead               ReadOnly
No validation                   readCommitted + cloneOnWrite              -

Now you can tweak a transaction's independence of others very clearly: You can make it "subordinate" and fail as early as possible, i.e. as soon as it detects that another transaction has changed a value it has read. Or you can make it very "dominant" by not caring about changes made by other transactions and even rigorously overwriting them.
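To make the matrix easier to read, here is a minimal sketch that encodes it as two lookup functions. All type and member names (IsolationLevel, CloneMode, ReadModes, ValidationMatrix) are made up for this illustration and are not NSTM's actual API. The sketch also assumes that the "Validate on commit" row means validation happens on commit only.

```csharp
using System;

// Hypothetical names for illustration only; not NSTM's real types.
public enum IsolationLevel { Serializable, ReadCommitted }
public enum CloneMode { CloneOnWrite, CloneOnRead }

[Flags]
public enum ReadModes { None = 0, ReadOnly = 1, ReadWrite = 2 }

public static class ValidationMatrix
{
    // Which txos (by read mode) are validated at the moment a txo is read.
    public static ReadModes OnRead(IsolationLevel isolation, CloneMode clone)
    {
        return isolation == IsolationLevel.Serializable && clone == CloneMode.CloneOnWrite
            ? ReadModes.ReadOnly
            : ReadModes.None;
    }

    // Which txos (by read mode) are validated when the transaction commits.
    public static ReadModes OnCommit(IsolationLevel isolation, CloneMode clone)
    {
        if (isolation == IsolationLevel.Serializable)
            return ReadModes.ReadOnly | ReadModes.ReadWrite; // both clone modes
        return clone == CloneMode.CloneOnRead
            ? ReadModes.ReadOnly   // readCommitted + cloneOnRead
            : ReadModes.None;      // readCommitted + cloneOnWrite: no validation
    }
}
```

In this reading, serializable + cloneOnWrite is the most "subordinate" combination and readCommitted + cloneOnWrite the most "dominant" one.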

Automatic retry of failed transactions

With databases, collisions of transactions are pretty rare. It's unlikely that two transactions change data in the intersection of their working sets. There's usually so much data that the intersection is very small or even nonexistent. When you commit a database transaction you can be quite sure it will succeed. That's why optimistic locking has become the only "locking strategy" left in ADO.NET.

In-memory transactions, though, are different. The total amount of data on which concurrent transactions work is much smaller than with databases. In addition, in-memory data structures like a queue or stack or list hinge on just a few data items (e.g. references to the first/last element) which are under heavy pressure if several transactions concurrently add/remove elements.

If, for example, two transactions concurrently add elements to a queue, thereby updating the same data item (the reference to the head of the queue), it's likely that one of them fails due to a validation error. However, had this failed transaction been run just a couple of milliseconds later, it would have succeeded, since it would have read its values after the other transaction's commit.

Since collisions, and thus invalid transactions, seem to be more likely with NSTM than with databases, and because the remedy is easy (just repeat the failed transaction), NSTM should provide that remedy itself: automatically running a transaction again when it fails due to invalidity, i.e. automatically retrying it.

That's why I added ExecuteAtomically() to NstmMemory. In its simplest form it just executes a delegate synchronously within a transaction (the delegate does not take any parameters and is of type System.Threading.ThreadStart):

    NstmMemory.ExecuteAtomically(
        delegate
        {
            ...
        }
        );

It's the same as if you wrote

    using (INstmTransaction tx = NstmMemory.BeginTransaction())
    {
        ...
        tx.Commit();
    }

However, if you want to guarantee the success of a transaction, you can force it to be retried indefinitely:

    NstmMemory.ExecuteAtomically(
        true,
        delegate
        {
            ...
        }
        );

Passing true as the first parameter tells ExecuteAtomically() to execute the delegate again if Commit() on the internal transaction fails during validation. ExecuteAtomically() recognizes a validation failure by receiving an NstmValidationFailedException from Commit().
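Conceptually, such an unbounded retry boils down to a loop around the transaction. The following is only a sketch of that idea, not NSTM's actual implementation: ValidationFailedException is a hypothetical stand-in for NstmValidationFailedException, RetrySketch is a made-up class, and the task delegate stands in for "begin transaction, run the delegate, commit".

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for NstmValidationFailedException.
public class ValidationFailedException : Exception { }

public static class RetrySketch
{
    // Runs 'task' again and again until it completes without a validation
    // failure. Returns the number of attempts that were needed.
    public static int RunUntilValid(ThreadStart task)
    {
        int attempts = 0;
        while (true)
        {
            attempts++;
            try
            {
                task();           // stands in for: begin tx, run delegate, commit
                return attempts;  // commit was valid, we are done
            }
            catch (ValidationFailedException)
            {
                // Another transaction changed a value this one has read.
                // Swallow the failure and run the whole task again.
            }
        }
    }
}
```

Only the validation failure is caught here. Any other exception thrown by the task leaves the loop and reaches the caller.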

If you just request automatic retry as above, there is no limit on the number of re-executions. So be careful! Some patterns of concurrent transactions might cause one of them to retry again and again and thus cause a thread to hang.

If you want to avoid such pitfalls, call the method with fine-grained parameters to tweak the retries. Here's the definition of the most comprehensive method overload:

    public static void ExecuteAtomically(
        NstmTransactionScopeOption scope,
        NstmTransactionIsolationLevel isolationLevel,
        NstmTransactionCloneMode cloneMode,
        int maxRetries,
        int sleepAfterRetryMsec,
        int maxProcessingTimeMsec,
        System.Threading.ThreadStart task
        )

In addition to the usual transaction properties you can tell ExecuteAtomically() to...

retry just a fixed number of times (maxRetries),
or to retry an arbitrary number of times, but not to try longer than maxProcessingTimeMsec milliseconds.

If you want to give other threads time to do their job between a transaction's retries, pass a delay to the method (sleepAfterRetryMsec).

To switch off any of these constraints, assign int.MaxValue to maxRetries and System.Threading.Timeout.Infinite to the millisecond parameters.

Of course you can also combine the parameters, e.g. by limiting retries to a maximum number and a maximum processing time. No further executions will be tried if either of these limits is reached.

If an exception is thrown during the transaction, the retries are aborted and the exception is passed up to your code. If the retry limit is reached without a valid commit, an NstmRetryFailedException is thrown.
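How the three limits could interact is sketched below. This is not NSTM's source code: BoundedRetry, CommitInvalidException and RetriesExhaustedException are hypothetical names (the last two stand in for NstmValidationFailedException and NstmRetryFailedException). The sketch assumes that maxRetries counts re-executions after the first attempt and that the sleep happens before each retry.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical stand-ins for NSTM's validation and retry-failed exceptions.
public class CommitInvalidException : Exception { }
public class RetriesExhaustedException : Exception { }

public static class BoundedRetry
{
    public static void Execute(int maxRetries, int sleepAfterRetryMsec,
                               int maxProcessingTimeMsec, ThreadStart task)
    {
        Stopwatch clock = Stopwatch.StartNew();
        for (int retries = 0; ; retries++)
        {
            try
            {
                task();   // stands in for: begin tx, run delegate, commit
                return;   // valid commit, stop retrying
            }
            catch (CommitInvalidException)
            {
                // Only validation failures lead to a retry;
                // every other exception propagates to the caller.
            }

            // Give up as soon as either limit is reached.
            bool outOfRetries = retries >= maxRetries;
            bool outOfTime = maxProcessingTimeMsec != Timeout.Infinite
                             && clock.ElapsedMilliseconds >= maxProcessingTimeMsec;
            if (outOfRetries || outOfTime)
                throw new RetriesExhaustedException();

            // Optionally let other threads make progress before the next try.
            if (sleepAfterRetryMsec > 0)
                Thread.Sleep(sleepAfterRetryMsec);
        }
    }
}
```

With maxRetries set to int.MaxValue and maxProcessingTimeMsec set to Timeout.Infinite, this loop behaves like the unbounded retry and can hang under persistent contention.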

Enjoy!

Hi Ralf, I've been investigating STM's and am planning on writing some articles on the topic for Code Project, in addition to continuing my article series (2 so far) on inter-task messaging. I would be interested in having you participate in the articles on the STM, and I'm also interested to know if the source is available for your work, as it looks quite excellent. Please feel free to contact me at marc[dot]clifton[at]gmail[dot].com. Thank you.

Marc Clifton - Thursday, January 17, 2008 1:04:37 PM

Sleep after retry or before retry?

1. First try
2. Retry immediately
3. Sleep
4. Retry
5. Sleep
6. ...

or

1. First try
2. Sleep
3. Retry
4. Sleep
5. Retry
6. ...

Ast - Saturday, January 26, 2008 9:38:24 PM
