Simple way to cache objects and collections for greater performance and scalability

Sunday, November 1, 2009

.net C# LINQ

Caching of frequently used data greatly increases the scalability of your application since you can avoid repeated queries on database, file system or to webservices. When objects are cached, it can be retrieved from the cache which is lot faster and more scalable than loading from database, file or web service. However, implementing caching is tricky and monotonous when you have to do it for many classes. Your data access layer gets a whole lot of code that deals with caching objects and collection, updating cache when objects change or get deleted, expire collections when a contained object changes or gets deleted and so on. The more code you write, the more maintenance overhead you add. Here I will show you how you can make the caching a lot easier using Linq to SQL and my library AspectF. It’s a library that helps you get rid of thousands of lines of repeated code from a medium sized project and eliminates plumbing (logging, error handling, retrying etc) type code completely.

Here’s an example how caching significantly improves the performance and scalabitlity of applications. Dropthings – my open source Web 2.0 AJAX portal, without caching can only serve about 11 request/sec with 10 concurrent users on a dual core 64 bit PC. Here data is loaded from database as well as from external sources. Avg page response time is 1.44 sec.

After implementing caching, it became significantly faster, around 32 requests/sec. Page load time decreased significantly as well to 0.41 sec only. During the load test, CPU utilization was around 60%.

It shows clearly the significant difference it can make to your application. If you are suffering from poor page load performance and high CPU or disk activity on your database and application server, then caching Top 5 most frequently used objects in your application will solve that problem right away. It’s a quick win to make your application a lot faster than doing complex re-engineering in your application.

Common approaches to caching objects and collections

Sometimes the caching can be simple, for example caching a single object which does not belong to a collection and does not have child collections that are cached separately. In such case, you write simple code like this:

Is the object being requested already in cache?
- Yes, then serve it from cache.
- No, then load it from database and then cache it.

On the other hand, when you are dealing with cached collection where each item in the collection is also cached separately, then the caching logic is not so simple. For example, say you have cached a User collection. But each User object is also cached separately because you need to load individual User objects frequently. Then the caching logic gets more complicated:

Is the collection being requested already in cache?
- Yes. Get the collection. For each object in the collection:
  - Is that object individually available in cache?
    - Yes, get the individual object from cache. Update it in the collection.
    - No, discard the whole collection from cache. Go to next step:
- No. Load the collection from source (eg database) and cache each item in the collection separately. Then cache the collection.

You might be thinking why do we need to read each individual item from cache and why do we need to cache each item in collection separarely when the whole collection is already in cache? There are two scenarios you need to address when you cache a collection and individual items in that collection are also cached separately:

An individual item has been updated and the updated item is in cache. But the collection, which contains all those individual items, has not been refreshed. So, if you get the collection from cache and return as it is, you will get stale individual items inside that collection. This is why each item needs to be retrieved from cache separately.
An item in the collection may have been force expired in cache. For ex, something changed in the object or the object has been deleted. So, you expired it in cache so that on next retrieval it comes from database. If you load the collection from cache only, then the collection will contain the stale object.

If you are doing it the conventional way, you will be writing a lot of repeated code in your data access layer. For example, say you are loading a Page collection that belongs to a user. If you want to cache the collection of Page for a user as well as cache individual Page objects so that each Page can be retrieved from Cache directly. Then you need to write code like this:

public List<Page> GetPagesOfUserOldSchool(Guid userGuid)
{
    ICache cache = Services.Get<ICache>();
    bool isCacheStale = false;
    string cacheKey = CacheSetup.CacheKeys.PagesOfUser(userGuid);
    var cachedPages = cache.Get(cacheKey) as List<Page>;
    if (cachedPages != null)
    {
        var resultantPages = new List<Page>();
        // If each item in the collection is no longer in cache, invalidate the collection
        // and load again.
        foreach (Page cachedPage in cachedPages)
        {
            var individualPageInCache = cache.Get(CacheSetup.CacheKeys.PageId(cachedPage.ID)) as Page;
            if (null == individualPageInCache)
            {
                // Some item is missing in cache. So, the collection is stale. 
                isCacheStale = true;
            }
            else
            {
                resultantPages.Add(individualPageInCache);
            }
        }

        cachedPages = resultantPages;
    }

    if (isCacheStale)
    {
        // Collection not cached. Need to load collection from database and then cache it.
        var pagesOfUser = _database.GetList<Page, Guid>(...);
        pagesOfUser.Each(page =>
        {
            page.Detach();
            cache.Add(CacheSetup.CacheKeys.PageId(page.ID), page);
        });
        cache.Add(cacheKey, pagesOfUser);
        return pagesOfUser;
    }
    else
    {
        return cachedPages;
    }
}

Imagine writing this kind of code over and over again for each and every entity that you want to cache. This becomes a maintenace nightmare as your project grows.

Here’s how you could do it using AspectF:

public List<Page> GetPagesOfUser(Guid userGuid)
{
    return AspectF.Define
        .CacheList<Page, List<Page>>(Services.Get<ICache>(), 
                     CacheSetup.CacheKeys.PagesOfUser(userGuid), 
                     page => CacheSetup.CacheKeys.PageId(page.ID))
        .Return<List<Page>>(() => 
            _database.GetList<Page, Guid>(...).Select(p => p.Detach()).ToList());
}