304 Your images from a database
I was reading somewhere about some anecdotal evidence that Google doesn't like to index images that don't have some kind of modification time on them. When I relaunched CoasterBuzz last year, I moved all of my coaster pr0n to the database, and I've since noticed that none of the images are in fact indexed. Bummer.
This also pointed out to me that I was doing something annoying. I was reading the data out every time for every image request. Not exactly the most efficient use of resources. Static files come down with information in the headers indicating when they were last modified (IIS, and presumably any other Web server does this), so the next time the browser makes the request, it the server compares the time in the request header with that of the file, and returns a 304 "not modified" response, and no file.
That seemed like an obvious thing to do, even if it has no impact on Google indexing. Fortunately, it just required some refactoring of the IHttpHandler I had doing the work.
Sidebar: This is probably the point at which some people will make a big stink about serving images out of a database, and how it's bad for performance or scalability. That's a fine argument to make, but outside of doing obviously stupid things, this is not an issue here. I'd prefer to address performance and scalability problems if I have them, not when I might have them, or never have them. Seriously, this is a site that does somewhere between a half-million and a million page views a month depending on the season. There are no performance issues here.
So anyway, assuming for a moment that "photo" is a business object in this code, and it was determined by a query value to the handler, this is the meaty part of the ProcessRequest() method of the handler:
if (!String.IsNullOrEmpty(context.Request.Headers["If-Modified-Since"]))
{
CultureInfo provider = CultureInfo.InvariantCulture;
var lastMod = DateTime.ParseExact(context.Request.Headers["If-Modified-Since"], "r", provider);
if (lastMod == photo.SubmitDate.AddMilliseconds(-photo.SubmitDate.Millisecond))
{
context.Response.StatusCode = 304;
context.Response.StatusDescription = "Not Modified";
return;
}
}
byte[] imageData = GetImageData(photo);
context.Response.OutputStream.Write(imageData, 0, imageData.Length);
context.Response.Cache.SetCacheability(HttpCacheability.Public);
var adjustedTime = DateTime.SpecifyKind(photo.SubmitDate, DateTimeKind.Utc);
context.Response.Cache.SetLastModified(adjustedTime);
Yes, it probably needs to be refactored, and yes, it should probably be used in an IHttpAsyncHandler. But let's go through what's happening, starting at the bottom.
The last few lines write out the actual bytes of the image (MIME type was set in previous code), then set the cacheability and the modification time of the image, which in my case is stored with the bits. The goofy part is where we create a new DateTime to make its kind known. If you don't explicitly state that it's a UTC time, the SetLastModified() method apparently adjusts it. I happen to store most times as UTC, so that was one less thing to worry about. This adds a header in the response called Last-Modified, and gives it a value that looks something like "Sun, 22 Jun 2003 16:27:19 GMT" (note that it truncates milliseconds and ticks, as you may expect).
Now, on subsequent requests for the same image, the browser adds an If-Modified-Since header to the request, with the same date and time as the value. Here we're checking to see if the value is present on the request, and if so, let's see if we should do a 304. If it's there, we parse it into a DateTime and compare the time with the one stored in the business object. We're stripping off the milliseconds because the database will fill them in on our DateTime, and the incoming request doesn't have the same high resolution. If we have a match, we send out the 304 and return, not sending any more data or reading the bytes from the database.
You can do this pretty easily in ASP.NET MVC as well.
public ActionResult Image(int id)
{
var image = _imageRepository.Get(id);
if (image == null)
throw new HttpException(404, "Image not found");
if (!String.IsNullOrEmpty(Request.Headers["If-Modified-Since"]))
{
CultureInfo provider = CultureInfo.InvariantCulture;
var lastMod = DateTime.ParseExact(Request.Headers["If-Modified-Since"], "r", provider).ToLocalTime();
if (lastMod == image.TimeStamp.AddMilliseconds(-image.TimeStamp.Millisecond))
{
Response.StatusCode = 304;
Response.StatusDescription = "Not Modified";
return Content(String.Empty);
}
}
var stream = new MemoryStream(image.GetImage());
Response.Cache.SetCacheability(HttpCacheability.Public);
Response.Cache.SetLastModified(image.TimeStamp);
return File(stream, image.MimeType);
}
Let me start by saying that this was something I just prototyped. It's in dire need of refactoring, as much of the logic isn't stuff you'd normally put in a controller action. I think there's a method for returning nothing on the Controller base, but I don't remember off the top of my head. If there is, you'd use that instead of Content() in the 304 case.
I hope this helps someone out!