At Monetate, we serve a lot of content. To handle it all, our pipeline for processing requests has to be fast, which makes it difficult to introduce any time-consuming processing directly into the response-handling path. Some lengthy processes can be triggered offline (for example, long-running reports) or cached in various ways (via Akamai Edge caching or local nginx content caching). This deals with the bulk of the problem.

But some content is dynamic. For example, Monetate personalized email campaigns can generate targeted images for users. Requests for this sort of content can take advantage of some of the traditional caching methods mentioned above, but there are situations in which content generation takes far too long (2 seconds or even longer) to handle inline with normal request processing. The extra time could be spent inside lengthy image-processing routines or waiting on the response times of third-party servers. To keep the user experience uniform and speedy, we need a better way of handling these sorts of delays.

Cache Control Extensions

RFC 5861 describes two HTTP cache-control extensions for stale content. We are most interested in stale-while-revalidate, which allows us to keep serving (slightly) stale content quickly while simultaneously triggering a background refresh on the server. In exchange for tolerating slightly out-of-date content on a small percentage of requests, we get incredibly fast response times on all requests. Subsequent requests for the same content will begin responding with fresher data once background processing completes. Clients implementing this extension can “hide” any requests that occur during the “stale” time period (and still show the older content) while giving the server the chance to kick off any backend processing needed to generate fresher content.
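As a concrete illustration, a response that is fresh for 600 seconds and may be served stale for a further 30 seconds while revalidation runs would carry a header like the one built below. The helper name cache_control is hypothetical and the numbers are only examples; the directive syntax follows RFC 5861.

```python
def cache_control(max_age, stale_while_revalidate):
    """Build a Cache-Control value using the RFC 5861 stale-while-revalidate directive.

    Both arguments are in seconds. The function name is illustrative only.
    """
    return "max-age=%d, stale-while-revalidate=%d" % (max_age, stale_while_revalidate)


# Header as it would appear on the response.
header_line = "Cache-Control: " + cache_control(600, 30)
```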

Server-Side Stale-While-Revalidate

We can’t rely on robust client support, though, especially not in the email world. A similar strategy can be implemented on the server, regardless of the client’s caching behavior, to get the same benefit. Monetate uses CherryPy extensively for serving various types of content, and we can implement this pattern quite readily with some straightforward Python. Let’s start with a simple server that responds to requests with an image file. In our example, let’s assume the processing time required to generate the image is a big bottleneck that takes several seconds:
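A minimal sketch of such a server follows. To stay runnable without extra dependencies it is written as a plain WSGI callable served by the standard library's wsgiref, standing in for CherryPy. The names generate_image, make_app, and serve are hypothetical, and the image bytes are a placeholder.

```python
import time
from wsgiref.simple_server import make_server


def generate_image(delay):
    """Stand-in for slow image generation: sleep, then return placeholder bytes."""
    time.sleep(delay)  # simulated bottleneck
    return b"placeholder image bytes"


def make_app(delay=3.0):
    """Return a WSGI app whose every response waits `delay` seconds."""
    def app(environ, start_response):
        body = generate_image(delay)
        headers = [
            # A real server would send the image's actual type, e.g. image/png.
            ("Content-Type", "application/octet-stream"),
            ("Content-Length", str(len(body))),
        ]
        start_response("200 OK", headers)
        return [body]
    return app


def serve(host="localhost", port=8010):
    """Serve the slow app until interrupted."""
    with make_server(host, port, make_app()) as httpd:
        httpd.serve_forever()
```

Calling serve() starts the server on port 8010; nothing is cached yet, so every request pays the full delay.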

All requests to http://localhost:8010 will hang for three seconds due to the simulated bottleneck. To serve stale content from the cache, we simply need to expand the request handling to return cache headers and run the expensive processing only when the TTL for the image has expired. Some synchronization must be added (a semaphore from the Python threading module) to make sure only one request handler is responsible for refreshing the cache when it becomes stale. This avoids a potential stampede of requests when the TTL expires.
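The single-refresher rule can be sketched in isolation. The version below assumes a non-blocking acquire on a threading.Semaphore: the thread that wins does the refresh, and every other thread returns immediately instead of queueing. The function names try_refresh and refresh_cache are hypothetical.

```python
import threading
import time

SEMA = threading.Semaphore(1)  # at most one refresher at a time


def refresh_cache():
    """Placeholder for the expensive work (hypothetical); sleeps to simulate it."""
    time.sleep(3)


def try_refresh(refresh=refresh_cache):
    """Run `refresh` unless another thread is already refreshing.

    Returns True if this call performed the refresh, False if it skipped
    because the semaphore was already held.
    """
    if not SEMA.acquire(blocking=False):
        return False  # someone else is refreshing; do not pile on
    try:
        refresh()
    finally:
        SEMA.release()  # always free the semaphore, even if refresh fails
    return True
```

In a request handler, a False result would mean "serve the stale copy and move on", which is what prevents the stampede.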

The example defines a few module-level variables, including TIMEOUT and STALE_WHILE_REVALIDATE_TIMEOUT. The period after TIMEOUT has elapsed but before STALE_WHILE_REVALIDATE_TIMEOUT has elapsed is the window during which requests are still served from the cache but also trigger background processing to refresh it. Clients will see slightly stale content during this timeframe. Once background processing completes, fresher content will be served.

The image_processor function handles the background processing. Again, we’re using an artificial bottleneck, time.sleep(3). In real application code, this function could invoke other services, handle complex database processing, or do other lengthy tasks. Before this function is invoked, SEMA must first be acquired to ensure only one thread is performing the work. Other threads will continue to serve stale content in the meantime. Once the work has been completed, two memcache keys are set with the appropriate timeouts. These keys are used by the server to handle requests:
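The sketch below puts those pieces together as framework-neutral functions rather than a CherryPy handler. Several things are assumptions: a small in-memory TTLCache stands in for memcache, the key names "image" and "image_stale", the timeout values, and the helpers slow_generate and get_image are all hypothetical. Only TIMEOUT, STALE_WHILE_REVALIDATE_TIMEOUT, SEMA, and image_processor are names taken from the text.

```python
import threading
import time

TIMEOUT = 60                          # seconds the primary copy counts as fresh
STALE_WHILE_REVALIDATE_TIMEOUT = 300  # seconds the stale copy stays servable
SEMA = threading.Semaphore(1)         # held by whichever thread is refreshing


class TTLCache:
    """In-memory stand-in for memcache (an assumption): values expire after a timeout."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = {}

    def set(self, key, value, timeout):
        with self._lock:
            self._items[key] = (value, time.monotonic() + timeout)

    def get(self, key):
        with self._lock:
            value, expires_at = self._items.get(key, (None, 0.0))
        return value if time.monotonic() < expires_at else None


CACHE = TTLCache()


def slow_generate():
    """Placeholder for real image generation; the sleep is the fake bottleneck."""
    time.sleep(3)
    return b"placeholder image bytes"


def image_processor(generate=slow_generate, timeout=TIMEOUT,
                    stale_timeout=STALE_WHILE_REVALIDATE_TIMEOUT):
    """Regenerate the image and store it under both keys. The caller must hold SEMA."""
    try:
        image = generate()
        CACHE.set("image", image, timeout)
        CACHE.set("image_stale", image, stale_timeout)
        return image
    finally:
        SEMA.release()


def get_image(generate=slow_generate, timeout=TIMEOUT,
              stale_timeout=STALE_WHILE_REVALIDATE_TIMEOUT):
    """Return image bytes: fresh if possible, stale while a refresh runs, else wait."""
    args = (generate, timeout, stale_timeout)
    image = CACHE.get("image")
    if image is not None:
        return image  # primary key still live: nothing to do
    stale = CACHE.get("image_stale")
    if stale is not None:
        # Primary expired. Only the thread that wins SEMA starts a refresh.
        if SEMA.acquire(blocking=False):
            threading.Thread(target=image_processor, args=args, daemon=True).start()
        return stale
    # Both keys empty (first request, or a long gap in traffic): wait for the image.
    SEMA.acquire()
    image = CACHE.get("image")
    if image is not None:  # another thread filled the cache while we waited
        SEMA.release()
        return image
    return image_processor(*args)
```

A request handler would call get_image() and write the returned bytes to the response, along with whatever cache headers it wants clients to see.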

This CherryPy server implements stale-while-revalidate. On the very first request, both the primary cache and the secondary (stale) cache are empty, so the server is forced to invoke the time-consuming code to create the resource. Subsequent requests return the cached value immediately. Once the primary cache has expired, the server continues to return the stale image while launching the time-consuming routine to refresh the cache in the background. Because a semaphore is used, only one of the CherryPy threads will be busy doing this work while the other threads continue serving stale content as quickly as possible. When the time-consuming routine completes, the cache is refreshed and more up-to-date content begins to be served.

Shortcut, with 202 Accepted

In the example above, only the very first request suffers a long wait, while every other response is served immediately from the cache. To prevent any client request from having to wait (even the first one), there is another possible approach. When the cache is empty, we can quickly return a different HTTP status while the server performs the work in the background. For example:
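A minimal sketch of the empty-cache branch, kept framework-neutral: handle_request returns a (status, headers, body) tuple instead of writing a CherryPy response. A plain dict stands in for memcache and expiry is left out for brevity. The names handle_request and CACHE, the "/image" Location value, and the content type are all hypothetical.

```python
import threading

SEMA = threading.Semaphore(1)  # at most one background generation at a time
CACHE = {}                     # stand-in for memcache; expiry omitted for brevity


def image_processor(generate):
    """Generate the image in the background and cache it. The caller must hold SEMA."""
    try:
        CACHE["image"] = generate()
    finally:
        SEMA.release()


def handle_request(generate, location="/image"):
    """Return (status, headers, body) without ever blocking on generation."""
    image = CACHE.get("image")
    if image is not None:
        return 200, {"Content-Type": "image/png"}, image
    # Cache is empty: start the work once, then answer immediately.
    if SEMA.acquire(blocking=False):
        threading.Thread(target=image_processor, args=(generate,), daemon=True).start()
    # 202 Accepted: the request was received but processing is not finished.
    return 202, {"Location": location}, b""
```

Repeated requests while the image is still being generated also get a 202, and only the first of them starts the background thread.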

The only important difference here is that no client ever waits for a response. The very first request (when the cache is empty) is answered with 202 Accepted to indicate that processing has yet to be completed. It is often useful to include a ‘Location’ header field indicating where the image can be found later or where clients can monitor the progress of the background task. Using 202 Accepted may not make much sense to an end user in a browser. However, if this is an internal service, it is a well-formed and fast way to warm the cache.

When Speed Is Paramount

If the priority is to keep response times low, and it is acceptable to serve slightly stale content a small percentage of the time, then implementing stale-while-revalidate on the server is a great way to reduce latency for clients while still handling large and time-consuming backend processes.