Bend Message Deduplication on Azure Service Bus to Your Will
Duplicates detection functionality provided by Azure Service Bus can automatically remove duplicate messages sent to a queue or topic. Deduplication is always based on the value of the MessageId property. No other property can participate in deduplication.
In the real world, message deduplication can often depend on things that are part of the message payload itself. Let's say we process orders*. Deduplication would rather be based on the order ID and not message ID. There are a few creative solutions that allow custom deduplication. For example, perform deduplication outside of ASB broker by manually inspecting message payload and marking it as a duplicate. For example, using Azure Functions and Storage tables**. While approach like this one works, it has several drawbacks:
- Unnecessary intermediate steps
- Performance decrease
- No ability to take advantage of highly optimized and performant native deduplication
What's the solution?
Use native deduplication!
Wait, but isn't native deduplication limited to solely message ID?
Glad you've asked. Absolutely. It is. Though let's look at the MessageId
property of the BrokeredMessage
. It's a read/write property, meaning we can set it to custom values.
Custom value you said?
Let's read a bit more of the ASB documentation on deduplication.
Solved! To deduplicate order messages on OrderId
, we'll assign brokered message MessageId
property the value of OrderId
. Done. Now order messages will be deduplicated on order IDs***.
Hold your horses! What if I need to deduplicate based on several values from a message?
Same as with order id. Combining all property values and assigning as MessageId
. Except that there might be a size issue.
Size issue?! Yes. BrokeredMessage.MessageId
is limited to 128 characters. Would that be a deal breaker if generated ID needs to be more than 128 characters? Not at all. As a matter of fact, the entire payload could be used for deduplication. Here's an example:
var payload = serializerOfYourChoice.Serialize(payloadObject);
var msg1 = new BrokeredMessage(payload);
msg1.MessageId = CreateDeterministicIdFromHash(payload);
msg1.Label = "1st";
await sender.SendAsync(msg1).ConfigureAwait(false);
var msg2 = new BrokeredMessage(payload);
msg2.MessageId = CreateDeterministicIdFromHash(payload);
msg2.Label = "2nd";
await sender.SendAsync(msg2).ConfigureAwait(false);
The sample creates a GUID like ID by making an object hash using serialized object. For example, using JSON.Net you could get the serialized object and pass it to CreateGuidLikeIdFromHash
to provide the deterministic ID that is based on a hash. As result of this snippet, there will be only one message received when a queue has deduplication enabled.
CreateGuidLikeIdFromHash
method could be implemented in the following way:
static string CreateDeterministicIdFromHash(string input)
{
var inputBytes = Encoding.Default.GetBytes(input);
// use MD5 hash to get a 16-byte hash of the string
using (var provider = new MD5CryptoServiceProvider())
{
var hashBytes = provider.ComputeHash(inputBytes);
return new Guid(hashBytes).ToString();
}
}
[Update: as Clemens Vasters pointed out correctly, MD5, or any other cryptography hashes, should not be used for non-cryptographic purposes. Data.HashFunction library offers number of non-cryptographic hashes that can be used instead.]
Et voilĂ . Now you can leverage native ASB deduplication using your custom data from the message itself without unnecessary intermediaries or performance impact.
Stay deduplicated!
* as pointed out by one of the readers, order might not be the best example. Keep in mind, this is just to serve an example, not solve world's problems :)
** deduplication with Azure Functions sample by Michael Stephenson
*** RequiresDuplicateDetection
needs to be set to true
along with DuplicateDetectionHistoryTimeWindow
set to the time span duplicates detection is taking place per message