What the Fastly outage can teach us about writing error messages

In case you missed it, Fastly's CDN had an outage for about 15 minutes on June 8, 2021, taking down some of the internet's largest websites, including the BBC, the UK government, Reddit, and the New York Times. Amazon.com also had its CSS fail to load.

If you happened to visit those websites during Fastly's outage, you saw the relatively unhelpful error message below:

Unhelpful Fastly Error Message

As a frontend developer, my eyes scan error messages like these for numbers - in this case, the "503" - indicating that the error isn't my fault, and I can move on with my life.

Unfortunately, most internet users aren't trained in the art of reading HTTP status codes, so this error message wasn't much use to them, especially since a solid portion of it was an in-joke.

We can write better error messages

The majority of internet users aren't developers, so writing only the error code and its name (503 Service Unavailable) isn't good enough.

The Nielsen Norman Group (back in 1998!) provided us with some basic guiding principles for writing better error messages:

  • Write in plain English (or whichever language you're supporting)
  • Tell the user exactly what went wrong
  • Tell the user how the problem can be fixed

More concretely, we can write better error messages by answering the following four questions:

  1. Who caused the error?
  2. What happened, and why?
  3. When will it be fixed?
  4. How can the user respond to the error?

If your error message covers those four points, then you can think about adding humour and some brand identity.
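One way to keep yourself honest about the four questions is to make them fields that every error message has to fill in. The sketch below is only an illustration: the `ErrorMessage` interface, its field names, and the `renderErrorMessage` function are all names I've made up for this example, not part of any library.

```typescript
// Hypothetical shape for an error message that answers the four questions.
interface ErrorMessage {
  causedBy: "us" | "user"; // 1. Who caused the error?
  what: string;            // 2. What happened, and why?
  when?: string;           // 3. When will it be fixed (or where to find updates)?
  actions: string[];       // 4. How can the user respond to the error?
}

// Turns the structured message into plain text, one line per answer.
function renderErrorMessage(message: ErrorMessage): string {
  const lines: string[] = [];
  lines.push(
    message.causedBy === "us"
      ? "This is a problem on our end."
      : "We couldn't complete your request."
  );
  lines.push(message.what);
  if (message.when !== undefined) {
    lines.push(message.when);
  }
  if (message.actions.length > 0) {
    lines.push("What you can do:");
    for (const action of message.actions) {
      lines.push(`  - ${action}`);
    }
  }
  return lines.join("\n");
}

// Example wording only; adapt it to your own service.
const outage: ErrorMessage = {
  causedBy: "us",
  what: "Our service is temporarily unavailable.",
  when: "Check our status page for updates.",
  actions: ["Try again in a few minutes", "Contact support if it's urgent"],
};

console.log(renderErrorMessage(outage));
```

Because `causedBy`, `what` and `actions` are required, the compiler complains if someone tries to ship an error screen that skips them. Humour and branding can be layered on top once those are in place.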

Who caused the error?

The last thing you want to do is make your users feel dumb, or as though they're at fault for an issue with the service. Communicating who caused the error helps clear up any confusion.

Explicitly focus on "we" when the error is caused by an issue on your end (typically HTTP status codes in the 5xx range).

An error message such as

Our service is down for maintenance

is infinitely better than:

Uh-oh!

For errors caused by the user (typically HTTP status codes in the 4xx range), be explicit about that too. For example, a 403 Forbidden error (where you know the user isn't authorized to view content) could be communicated as:

Access Denied. You do not have permission to view this page.
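If your error pages are generated from the HTTP status code, the "who" can be derived from the code's range. Here is a minimal sketch, assuming the usual convention that 5xx means the server is at fault and 4xx means the request was. The function names and the generic fallback wording are my own assumptions; only the 403 text comes from the example above.

```typescript
type CausedBy = "us" | "user" | "nobody";

// Maps an HTTP status code to whoever is responsible for the error.
function whoCausedError(status: number): CausedBy {
  if (status >= 500 && status <= 599) return "us";   // server errors
  if (status >= 400 && status <= 499) return "user"; // client errors
  return "nobody";                                   // not an error status
}

// Picks an opening line that makes the responsibility explicit.
function headlineFor(status: number): string {
  if (status === 403) {
    return "Access Denied. You do not have permission to view this page.";
  }
  switch (whoCausedError(status)) {
    case "us":
      return "We're having a problem on our end. It's not your fault.";
    case "user":
      return "We couldn't process your request. Please check it and try again.";
    default:
      return "";
  }
}

console.log(headlineFor(503));
console.log(headlineFor(403));
```

Specific codes such as 403 deserve their own wording. The range check is only a fallback so that no status code ends up with a bare "Uh-oh!".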

What happened, and why?

While users may not be technical, they still need an explanation of why they're seeing your error screen.

Take the classic 404 error message: "404 Not Found". You can make it significantly better for non-technical users with a single word:

Page not found.

Adding a "why" makes it even better, and gives them a way to fix the issue:

Page not found. You might have mistyped the URL.

When will it be fixed?

It's relatively difficult to keep an error message updated with details of your outage and when you expect the service to become available again.

A better approach is to link to your status page, your Twitter account, or both, as GitHub does:

An example of GitHub's service error page

How can the user respond to the error?

In the case of a 404, you might want to list some steps the user can take to fix the issue, such as:

  • Going back to the home page
  • Using your search bar to find the page if it's been moved
  • Contacting support

In the case of a 5xx server error, by contrast, you want to communicate to the user that there isn't much they can do, and that it's not their fault.
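The difference between the two cases can be captured in a small lookup. This is a sketch rather than a recommendation for any particular framework: the `Action` type, the `suggestedActions` function, and every URL in it (`/`, `/search`, `/support`, `https://status.example.com`) are placeholders I've invented for illustration.

```typescript
interface Action {
  label: string;
  href?: string; // omitted when there is nothing to click
}

// Returns the next steps to show the user for a given HTTP status code.
function suggestedActions(status: number): Action[] {
  if (status === 404) {
    // The user can do something about a missing page.
    return [
      { label: "Go back to the home page", href: "/" },
      { label: "Search for the page", href: "/search" },
      { label: "Contact support", href: "/support" },
    ];
  }
  if (status >= 500 && status <= 599) {
    // The user can't fix a server error, so say so and point to updates.
    return [
      { label: "There's nothing you need to do. This is our fault, and we're working on it." },
      { label: "Check our status page for updates", href: "https://status.example.com" },
    ];
  }
  return [];
}

for (const action of suggestedActions(404)) {
  console.log(action.href ? `${action.label} (${action.href})` : action.label);
}
```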

My favourite example of a company doing this well is Airbnb:

An example of Airbnb's service error page

They tell users:

  • that there's definitely an issue, and that they're working on it
  • to check their Twitter account for updates
  • how to get support for urgent issues
  • to expect slower responses while the site is experiencing downtime

Summary

We as developers should take the Fastly outage as an opportunity to improve our error messages. You never know when your witty in-joke might end up being seen by hundreds of millions of internet users, so just try to be helpful, and explain:

  1. Who caused the error?
  2. What happened, and why?
  3. When will it be fixed?
  4. How can the user respond to the error?

Of course, if you're running a CDN like Fastly, you'll likely still need request IDs and other diagnostics in your error message to help your support staff debug the issue. A human-readable error message doesn't have to come at the expense of technical information.
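As a rough sketch of how the two can coexist, the function below puts the human-readable message first and the diagnostics underneath it. The `Diagnostics` fields, the `withDiagnostics` name, and the request ID format are assumptions made for this example; your platform will have its own identifiers.

```typescript
// Hypothetical set of details a support team might ask for.
interface Diagnostics {
  status: number;
  requestId: string;
  timestamp: Date;
}

// Appends technical details below the message written for humans.
function withDiagnostics(humanMessage: string, details: Diagnostics): string {
  return [
    humanMessage,
    "",
    "If you contact support, please include these details:",
    `Error code: ${details.status}`,
    `Request ID: ${details.requestId}`,
    `Time: ${details.timestamp.toISOString()}`,
  ].join("\n");
}

console.log(
  withDiagnostics("We're having a problem on our end. It's not your fault.", {
    status: 503,
    requestId: "example-request-id", // placeholder value
    timestamp: new Date(),
  })
);
```

The ordering is the point: users read the first line and stop, while support staff can ask for the lines below it.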
