Sep 03 2017
 

Today I’ll take a moment to expound on how web development has changed over the last two decades. Long ago, when we started back in the 90s, connections were slow and web pages didn’t change much.

Built into the design of the web itself is something you should be familiar with if you are reading this post: browser caching.

Let’s do a quick recap to get ourselves oriented: When a browser gets a page from a server, the two talk to one another using special messages sent up-front, known as ‘headers.’ The headers the client (browser) sends to the server are called the ‘request headers’ and the ones the server responds with are called, appropriately, ‘response headers.’

Contained in the response headers from the server is a special one, the Cache-Control header. This tells the browser whether it should hang onto a cached (saved) version of the response.

(Cache, which in English means ‘a hiding place’ or a ‘small storage place for quick retrieval,’ has come to be officially recognized as a computer term, having its own entry that derives from this original English meaning. The cache in a browser’s memory is a storage place for content retrieved previously, so that the browser doesn’t have to fetch it again.)

A normal cache-control response might be one of:

Cache-Control: max-age=0, must-revalidate

or

Cache-Control: max-age=86400

In the former scenario, the server is telling the client not to reuse its saved copy of this page without re-asking the server the next time the user visits. In the latter, it is telling the client the response is good for 1 day (86400 seconds), so if the user comes back less than 1 day later, the content does not need to be re-fetched.
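To make the two cases concrete, here is a minimal Ruby sketch of the freshness check a cache might perform. It is a simplification and not how any particular browser implements it; the helper names parse_cache_control and fresh? are made up for this illustration.

```ruby
# Parse a Cache-Control header value into a hash of directives.
# "max-age=86400" becomes {"max-age" => "86400"}; bare directives map to true.
def parse_cache_control(value)
  value.split(",").each_with_object({}) do |part, directives|
    name, arg = part.strip.split("=", 2)
    directives[name.downcase] = arg || true
  end
end

# Simplified freshness check: a stored response may be reused without
# contacting the server only while its age is below max-age.
def fresh?(header_value, age_in_seconds)
  directives = parse_cache_control(header_value)
  return false if directives["no-store"] || directives["no-cache"]

  max_age = directives["max-age"].to_s.to_i # a missing max-age counts as 0
  age_in_seconds < max_age
end

puts fresh?("max-age=0, must-revalidate", 10) # false: always re-ask the server
puts fresh?("max-age=86400", 3_600)           # true: one hour old, good for a day
puts fresh?("max-age=86400", 90_000)          # false: older than one day
```

Real caches also take the Date, Age, and Expires headers into account, which this sketch ignores.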

You’ll find a great discussion of the specific difference on this Stack Overflow post.

In the early days of the web when connections were slower and pages changed less frequently, it made sense to tell the browsers they didn’t need to reload the pages if they had a cached version.

In theory, great. But two decades later, this is often no longer true. For most modern Rails apps, and particularly at companies with a marketing-driven agenda, page responses should not be cached.

The table below adds some nuance to why this is what you want 90% of the time. Go ahead and cache your images, your CSS, and your JavaScript. (Sprockets does this for you beautifully by putting a fingerprint in each file name, which obviates the need to worry about cache-control headers. On all of those Sprockets-generated assets, go ahead and cache. We have ours set to 1 year.)
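Here is a rough Ruby sketch of why fingerprinted file names make long cache lifetimes safe. It is not Sprockets’ actual implementation; the helper fingerprinted_name and the sample CSS are invented for illustration.

```ruby
require "digest"

# Build a fingerprinted file name from the asset's contents.
# Hypothetical helper for illustration; Sprockets has its own implementation.
def fingerprinted_name(logical_name, contents)
  digest = Digest::SHA256.hexdigest(contents)
  ext = File.extname(logical_name)         # e.g. ".css"
  base = File.basename(logical_name, ext)  # e.g. "application"
  "#{base}-#{digest}#{ext}"
end

old_name = fingerprinted_name("application.css", "body { color: black; }")
new_name = fingerprinted_name("application.css", "body { color: navy; }")

puts old_name
puts new_name
puts old_name == new_name # false: changed contents produce a new URL
```

Because any change to the file produces a different name, a browser holding the old file for a year never serves stale content: the page simply links to a URL it has not seen before.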

But your Rails app is special and needs special concerns.

It’s important to understand I’m talking about two nuanced things: (1) To cache or not to cache (more details below), and (2) the defaults are wrong.

You’ll need to figure out the answer to question #1 on your own. The table below offers a basic guide to how to start to ask the right questions.

Remember, too, that this decision can be made on a controller-by-controller (even action-by-action) basis. Although this is tedious, a hybrid public-facing/private-facing app could use a hybrid caching strategy, especially if one area of the app relies on forms and another doesn’t.

The Problematic Default Cache-Control Headers

First of all, let’s discuss the default cache control headers for Rails:

Cache-Control: max-age=0, private, must-revalidate

I checked brand new test apps created with Rails 4.1, 4.2, 5.0, and 5.1, and this has been the default Cache-Control header for at least these versions of Rails.

In theory, this tells the client that it shouldn’t reuse a cached page, and that it must revalidate (that is, re-ask the server for at least the headers of the document) whenever it wants the page again.

Problem is, these default headers cause trouble on apps with forms (CSRF tokens), which is nearly all Rails apps. The client must first fetch a page containing the CSRF token before being allowed to submit the form.

When this token is used past its expiration, there is a mismatch between what the client sends and what the server has in its session information. As a result, Rails will raise an ActionController::InvalidAuthenticityToken exception.
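The mismatch can be sketched in plain Ruby as follows. This is a heavily simplified model and not Rails’ actual check (recent Rails versions also mask the token on each request); FakeSession and the local InvalidAuthenticityToken class are stand-ins invented for this example.

```ruby
require "securerandom"

# Stand-in for ActionController::InvalidAuthenticityToken.
class InvalidAuthenticityToken < StandardError; end

# Hypothetical, heavily simplified server-side session.
class FakeSession
  attr_reader :csrf_token

  def initialize
    @csrf_token = SecureRandom.base64(32)
  end

  # Raise unless the submitted token matches the one stored in this session.
  def verify!(submitted_token)
    raise InvalidAuthenticityToken unless submitted_token == csrf_token

    true
  end
end

first_session = FakeSession.new
token_in_cached_page = first_session.csrf_token # embedded in the form's HTML

# The session expires; the next visit gets a new session and a new token.
second_session = FakeSession.new

begin
  # The browser submits the form from its cached copy of the old page.
  second_session.verify!(token_in_cached_page)
rescue InvalidAuthenticityToken
  puts "stale token rejected"
end
```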

Newer clients (observed particularly in IE, Edge, Android Chrome, and other mobile browsers) appear not to respect the max-age=0 and must-revalidate directives at all, and seem to pass expired tokens from previously fetched pages, even with these cache-control defaults.

Your session expiry is up to you, but whatever you set it to, if a page is served with the default Rails Cache-Control headers, newer clients (notably, Android Chrome browsers, some recent versions of desktop Chrome, IE 11 and new Edge) hang onto the CSRF token across visits in such a way that old, invalid ones are submitted to the server. I documented this CSRF issue that appears to affect newer browsers here.

So, once you’ve decided if you’re going to cache at all (and which controllers you want to cache and not cache), you actually need to fix the Rails defaults for the non-cached responses. (To do this globally, see the bottom of this post. Modify the code below to do it on a specific controller.)

If you take the hybrid approach and have some responses non-cached and others with caching, you’re also going to be implementing a caching strategy (that is, a non-zero max-age) anyway, so you need to set the cache-control headers yourself (in the cached case, to a max-age that is greater than 0).

If you have been following along, this means that the default Rails headers are useless (you never want them!) because of the known issue with the clients above, and if you are implementing a caching strategy, you’re overriding the defaults anyway.

Marketing-Driven Company?

If you work in a company with a marketing department, or any department responsible for pushing out new content, you probably don’t want to cache your pages. Yes, you can cache pages at URLs where the content of the page doesn’t change.

These days, unlike 1997, if a visitor is coming back to your website, it’s probably because they are coming to see something new. Think about it: when was the last time you visited a site and then visited it a week later just to look at the same content? Doesn’t make sense.

To Cache Or Not To Cache

When… | You should…
POST and PUT requests | These are effectively never cached by browsers. You don’t even have to worry about these; the browser just automatically knows that responses to state-changing requests are not to be cached.
GET requests that contain a form token (like a CSRF token) | You probably want this to be no-cache, because of the CSRF bug described in this post.
GET requests that contain an API token in the response | You probably want this to be no-cache, because of the CSRF bug described in this post.
Your company has a ‘marketing department’ and puts up new content onto the home page frequently. | You probably don’t need or want caching, because if someone is coming back to your website again they have come to see new content.
You are a news website and your news articles have unique URLs. | Here, I could see good arguments for caching. For one thing, caching improves proxy server performance and is needed by CDNs.
The content you are delivering is actually accessed behind a CDN. | Here, you have to cache, because if you set your responses to all no-cache, your CDN will probably not cache it either (unless your CDN ignores your cache-control headers, which is possible but makes for a more confusing CDN strategy).
Your JS, images, CSS, and fonts as compiled by Sprockets (asset pipeline) | Yes, please, cache this.

Would the Real Cache-Busting Headers Please Stand Up

To properly no-cache your pages, do not use the Rails defaults.

You actually need these four directives in your Cache-Control header, and even then you need another header too.

Cache-Control: no-cache, no-store, max-age=0, must-revalidate

As well, even though this is supposedly a deprecated header, I’ve read some newer clients need this header to force the page to be fetched again from the server:

Pragma: no-cache

Here’s actually how to do it in Rails:

Add this to your application_controller.rb :

 before_action :set_cache_headers

 # Tell browsers and proxies never to store or reuse this response.
 def set_cache_headers
  response.headers["Cache-Control"] = "no-cache, no-store, max-age=0, must-revalidate"
  response.headers["Pragma"] = "no-cache" # for older HTTP/1.0 caches
  response.headers["Expires"] = "Mon, 01 Jan 1990 00:00:00 GMT" # a date in the past
 end

If you want to do restricted no-caching, that is, apply no-cache to only some controllers (like login pages, or pages with forms), you’ll want to call set_cache_headers only on those controllers and call a different method, one that sets a max-age and allows caching, on the others. (If that’s you, I’d suggest two methods: rename the one above to set_no_cache_headers, and add another one, set_cache_header, that allows caching.)
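The two-method approach could be sketched roughly like this. The module name CacheHeaderHelpers, the one-hour default, and the FakeController stand-in (there only so the sketch runs without Rails) are all assumptions made for this illustration.

```ruby
# Hypothetical helper module; the method names follow the suggestion above.
module CacheHeaderHelpers
  NO_CACHE_HEADERS = {
    "Cache-Control" => "no-cache, no-store, max-age=0, must-revalidate",
    "Pragma" => "no-cache",
    "Expires" => "Mon, 01 Jan 1990 00:00:00 GMT"
  }.freeze

  # For login pages and anything else that renders a form or token.
  def set_no_cache_headers
    NO_CACHE_HEADERS.each { |name, value| response.headers[name] = value }
  end

  # For pages that may be reused; one hour is an arbitrary example value.
  def set_cache_header(max_age = 3600)
    response.headers["Cache-Control"] = "max-age=#{max_age}"
  end
end

# Stand-ins for a Rails response and controller, so this runs without Rails.
FakeResponse = Struct.new(:headers)

class FakeController
  include CacheHeaderHelpers
  attr_reader :response

  def initialize
    @response = FakeResponse.new({})
  end
end

login_page = FakeController.new
login_page.set_no_cache_headers
puts login_page.response.headers["Cache-Control"]

article_page = FakeController.new
article_page.set_cache_header
puts article_page.response.headers["Cache-Control"]
```

In a real app you would presumably include the module in ApplicationController and then declare before_action :set_no_cache_headers or before_action :set_cache_header in each controller as needed. Rails also ships an expires_in helper that may be a better fit for the caching side.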
