First, a little background #

One of the key ways we configure and control our caching is by deciding how long content should be considered ‘valid’. This length of time, often referred to as the time-to-live or TTL, determines how long that content will live in a user’s local cache and how long the CDN will keep the previously retrieved copy instead of going back to your server for an updated one. Picking the right amount of time can be a challenge. The question to ask yourself is “how long am I ok with users continuing to get this version of this resource?” If the content is ‘immutable’, which is to say it never changes, then you should set the cache time very high (a year, for example).
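To make the numbers concrete, here is a minimal sketch (plain Python, not tied to any particular server framework) of how an origin might choose a Cache-Control header; the values are just the ones discussed above:

```python
# A minimal sketch of the idea above: pick a long TTL for content that never
# changes and a short one for everything else. max-age is expressed in seconds.
ONE_YEAR = 365 * 24 * 60 * 60       # 31536000 seconds

def cache_header(is_immutable: bool) -> str:
    if is_immutable:
        # 'immutable' additionally hints to browsers not to revalidate on refresh.
        return f"public, max-age={ONE_YEAR}, immutable"
    # A short, safer default for content that may change in place.
    return "public, max-age=600"

print(cache_header(True))   # public, max-age=31536000, immutable
```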

(This post is one of a series about CDNs, starting with an overview of what they are and how they work.)

Setting a very long cache time is often advised for a site’s CSS and JavaScript, but it is unlikely that you will truly never change those files, so how does that work in practice? Caches, including CDNs, use the URL of a piece of content, such as https://mydomain.com/main.css, to uniquely identify it, so one technique is to change that URL whenever your content updates. One version of your CSS could be referred to as https://mydomain.com/main.css?v=1, and when you update it, the URL changes to https://mydomain.com/main.css?v=2. Those two URLs are considered different pieces of content by a cache, so the user or the CDN will need to fetch the v=2 content as if it had never been seen before. This method of changing URLs can be done through a query string, filename, or path (/main-v2.css, /2020-02-02/main.css, etc.), and is generally referred to as ‘cache busting’. Changing the URL of these resources, and updating every place on your pages that references them, is usually part of the build and deployment step for your site; without that automation, keeping everything in sync by hand would be tedious and error-prone.
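For illustration, here is a minimal sketch of the build-step version of cache busting; the file names and paths are hypothetical, and a real build tool would normally handle this for you:

```python
# Sketch of a cache-busting step: fingerprint an asset with a content hash so
# its URL changes whenever its bytes change. Paths and names are hypothetical.
import hashlib
from pathlib import Path

def fingerprint(asset: Path) -> Path:
    digest = hashlib.sha256(asset.read_bytes()).hexdigest()[:8]
    busted = asset.with_name(f"{asset.stem}.{digest}{asset.suffix}")  # main.css -> main.3f2a9c1b.css
    busted.write_bytes(asset.read_bytes())
    return busted

def rewrite_references(page: Path, old_name: str, new_name: str) -> None:
    # Update every page that points at the old URL so it uses the new one.
    page.write_text(page.read_text().replace(old_name, new_name))

css = fingerprint(Path("static/main.css"))
rewrite_references(Path("public/index.html"), "main.css", css.name)
```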

The dark side of a long cache time #

Ok, so long cache times are good (less need to come back to your origin, quicker performance for repeat visitors to your site), you can force new content to be used by changing the URL, and that’s generally built into the build/deploy process. What about URLs you cannot change? Say the root HTML of your site (https://mydomain.com/index.html) generally only changes once a week, so you set the TTL for that page to 7 days. That’s awesome most of the time: the page will be served by the CDN and your server will have very little load. But what if you publish an embarrassing typo, or a blog post with a broken link? With a page that doesn’t support caching, or has a very short TTL, you can update the HTML and visitors a few minutes later will get the new version, embarrassing content removed. With a 7-day cache, you can make the update as quickly as you want, but people could keep getting the old version for a full week. This is the type of problem that can get people fired.

Assuming you can’t change the URL (it’s your home page in this example; you can’t ask people to go to https://mydomain.com/newpage.html), what you need to do is tell the CDN to please get rid of that old copy. This is known as ‘purging’ that content, and most CDNs provide a way to do it through their web UI and/or through an API. For the occasional embarrassing mistake, the web UI is probably fine, but if this is a regular need then the API comes in handy. So, all good, bad content gone? Well, not quite. When we set up our rules around caching, we are dealing with a couple of different aspects:

  • How long the CDN should cache our content
  • How long other caches, such as the user’s browser cache, should hold onto our content

In a simple setup, these two values are the same. We tell the CDN to ‘honor the origin cache headers’ (which means to use whatever cache settings we output as headers from our server), and we return headers like cache-control: max-age=604800. This tells the CDN it can use this version of our content for up to 7 days (the max-age value is in seconds). It is “up to 7 days” because the CDN might clear it out sooner if it isn’t getting a lot of requests; this sets the maximum time. Sticking with the idea of a simple setup, the CDN will generally pass that same header value (cache-control: max-age=604800) on in the response it sends to user browsers, telling those browsers that they can hang on to this response for 7 days too. This is generally awesome, because if the content is already cached in the browser, the user doesn’t have to make any request for it when they come back to the page. If only the CDN has it, we avoid the request back to our server, but the user still has to request it from the CDN. The issue, though, is that while we can ask the CDN to get rid of its copy, we cannot ask the user’s browser to do the same.
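As a small sketch of that simple setup (plain Python, just to show the arithmetic and the header; the exact way your server emits it will vary):

```python
# Sketch of the simple setup described above: the origin emits one Cache-Control
# header, and a CDN set to honor origin headers forwards the same value to browsers.
SEVEN_DAYS = 7 * 24 * 60 * 60        # 604800 seconds, the max-age used above

origin_headers = {"Cache-Control": f"public, max-age={SEVEN_DAYS}"}

# Both caches see the same instruction: the CDN may keep its copy for up to 7 days,
# and so may each visitor's browser. A purge can clear the CDN copy, but nothing
# can reach into the browser caches that already hold this response.
print(origin_headers["Cache-Control"])   # public, max-age=604800
```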

There are a few solutions to this, each with positive and negative aspects.

Short TTL on content that can be updated #

First, you could set long cache times only on URLs that change whenever their content changes (CSS, JavaScript, anything you can generate a unique URL for as part of the update), and set a very short TTL on everything else. This is what we do on Docs: the HTML pages have a 10-minute max-age, while we set the max-age of our theme files (all of which have unique URLs generated when we update them) to around a year. So, content (HTML) changes, which happen frequently and could be time-sensitive, will show up for the user within approximately 10 minutes of being published. This works: we avoid ever having a change out there that we can’t force to update, while still getting the benefit of long cache times on some of our content (the CSS/JS files). The downside is that we don’t get as much benefit on our HTML; even though most pages are updated every few days at most, we only set a 10-minute cache on them to avoid stale content lingering after an update.
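A hedged sketch of this first option, roughly mirroring the Docs split described above (the path patterns are hypothetical, and assume the CSS/JS URLs are already fingerprinted by the build):

```python
# Sketch of option 1: pick a TTL from the URL. Fingerprinted assets get a long,
# immutable cache; HTML and anything else that can change in place gets 10 minutes.
TEN_MINUTES = 10 * 60
ONE_YEAR = 365 * 24 * 60 * 60

def cache_control_for(path: str) -> str:
    if path.endswith((".css", ".js")):            # unique, fingerprinted URLs
        return f"public, max-age={ONE_YEAR}, immutable"
    return f"public, max-age={TEN_MINUTES}"       # HTML and other mutable content

assert cache_control_for("/theme/main.3f2a9c1b.css").startswith("public, max-age=31536000")
assert cache_control_for("/index.html") == "public, max-age=600"
```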

Short client TTL with a long CDN cache and purging #

The second option takes advantage of the fact that we can configure the CDN with a different cache time than what is sent to the user’s browser. We tell the CDN to cache everything for a long time (a year or more), but for content that might change (HTML pages in the Docs example) we send a much shorter cache time to the user’s browser (10 minutes). This means that browsers will still need to request updated content after ten minutes, but the CDN will hang onto that content for a long time, so our server still won’t get the request. When the content does change, we purge that specific piece of content from the CDN, and within ten minutes users will be seeing the update. This is how caching is handled here on duncanmackenzie.net, but in a bit of a brute-force fashion. When I publish a new blog post, I would likely only need to update a few pages (the home page, maybe some pages that list posts on a specific topic), but I’ve set it up so that every update just purges all of the content at the CDN. That’s not a huge issue for me, since I only have a few thousand pages, but it would be an issue for Docs, which has many millions of pages. This technique gets you the most offload of traffic from your server (good for load and cost), but it has the downside of being more complex, since you need an automated way to purge the appropriate pieces of content from the CDN when they are updated.
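And a hedged sketch of this second option, assuming a CDN that honors the standard s-maxage directive for shared caches and exposes an HTTP purge API; the endpoint, payload shape, and token below are hypothetical placeholders, not any specific CDN’s API:

```python
# Sketch of option 2: short browser TTL, long CDN TTL, plus a purge when content changes.
# max-age governs the browser; s-maxage applies only to shared caches such as a CDN.
import json
import urllib.request

TEN_MINUTES = 10 * 60
ONE_YEAR = 365 * 24 * 60 * 60

# Header for HTML pages that can change in place.
html_cache_control = f"public, max-age={TEN_MINUTES}, s-maxage={ONE_YEAR}"

def purge(urls: list[str]) -> None:
    """Ask the CDN to drop its cached copies of specific URLs after a publish.
    The endpoint and token are hypothetical placeholders."""
    request = urllib.request.Request(
        "https://api.example-cdn.com/v1/purge",
        data=json.dumps({"urls": urls}).encode(),
        headers={"Authorization": "Bearer <token>", "Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(request)

# After publishing a new post, purge just the pages whose HTML actually changed.
purge(["https://mydomain.com/", "https://mydomain.com/blog/"])
```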