Strategy

Not all websites are created equal: learning from the London Marathon let down

By Sven Hammar, Apica
Published: 26 April 2017

Sunday was not a good day for e-commerce and e-giving, with eBay.co.uk reporting outages across large parts of the UK lasting over four hours, and the Virgin Money Giving website crashing the night before the 2017 London Marathon.

The crash, which was partially resolved but continued to affect site performance and donation submission well into the morning of the marathon, left thousands of people unable to make last-minute donations to runners’ fundraising pages.

While eBay.co.uk has not revealed the cause behind its prolonged outage, Virgin Money has apologised for its website crash, describing the performance issues as a ‘result of high demand.’

So when faced with running the official fundraising machine behind the biggest annual fundraising event on the planet (last year the event raised over £59.4m), how does an organisation make sure its website can stand the test of pre-race day and race day peak volume?

1. Traffic is not going to be the same every year. Check last year’s peak numbers, and gauge estimated growth – will it be 20, 50 or 100% growth? In most cases when sites crash, it’s because the growth factor has not been taken into consideration. The most important thing in website and app stress testing is to set a target of the expected maximum traffic plus an additional margin based on historic data, and then to test against it.

For example, we know that the Virgin Money Giving site was able to facilitate over £23m in donations on the day of the 2016 London Marathon. This was a useful benchmark.
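As a rough sketch of that arithmetic, the Python snippet below turns last year's peak into a test target under the three growth scenarios mentioned above. The 10,000-user peak and the 25 per cent margin are illustrative assumptions, not figures from Virgin Money Giving or the London Marathon.

```python
def target_peak(last_year_peak: float, growth: float, margin: float) -> float:
    """Return the load to test for: last year's peak, grown, plus a safety margin.

    growth and margin are fractions (0.5 means 50%).
    """
    if last_year_peak < 0 or growth < 0 or margin < 0:
        raise ValueError("inputs must be non-negative")
    return last_year_peak * (1 + growth) * (1 + margin)


# Hypothetical figure: 10,000 simultaneous users at last year's peak.
LAST_YEAR_PEAK = 10_000
SAFETY_MARGIN = 0.25  # assumed 25% headroom on top of expected growth

for growth in (0.20, 0.50, 1.00):
    target = target_peak(LAST_YEAR_PEAK, growth, SAFETY_MARGIN)
    print(f"growth {growth:.0%}: test for {target:,.0f} simultaneous users")
```

The margin is applied on top of the grown figure, so the headroom scales with the forecast rather than with last year's number.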

How many simultaneous users can your site handle, and how many does it need to handle? There are plenty of load testing tools on the market that can simulate serious peak environments quickly and easily. If your site fails during the load test, most of these tools will tell you exactly what went wrong.
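To make the idea of simultaneous users concrete, here is a minimal sketch of what a load test does under the hood, using only the Python standard library. It is not a substitute for a commercial tool: `run_load_test` and `fake_donation_request` are hypothetical names, and the fake request simply sleeps instead of calling a real site.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import median


def run_load_test(request_fn, simultaneous_users: int, requests_per_user: int = 1) -> dict:
    """Call request_fn from many threads at once and summarise the outcome."""
    if simultaneous_users < 1 or requests_per_user < 1:
        raise ValueError("need at least one user and one request per user")

    def one_user(_user_id):
        results = []
        for _ in range(requests_per_user):
            start = time.perf_counter()
            try:
                request_fn()
                ok = True
            except Exception:
                ok = False  # any exception counts as a failed request
            results.append((ok, time.perf_counter() - start))
        return results

    # One thread per simulated user, all started together.
    with ThreadPoolExecutor(max_workers=simultaneous_users) as pool:
        per_user = list(pool.map(one_user, range(simultaneous_users)))

    flat = [result for user in per_user for result in user]
    latencies = [elapsed for ok, elapsed in flat if ok]
    failures = sum(1 for ok, _ in flat if not ok)
    return {
        "requests": len(flat),
        "failures": failures,
        "error_rate": failures / len(flat),
        "median_latency_s": median(latencies) if latencies else None,
    }


def fake_donation_request():
    """Stand-in for a real HTTP call to a test environment (hypothetical)."""
    time.sleep(0.01)


print(run_load_test(fake_donation_request, simultaneous_users=50, requests_per_user=2))
```

In a real test, `request_fn` would submit a search or a donation against a test environment, and the error rate and latency would be compared with the target set from last year's peak.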

2. Get real when it comes to use cases. As much as it would have been more manageable for site users to donate in a phased, orderly fashion, it was always going to be a case of the day before the race, the morning before the race, and the time during the race being peak times. During these peak times, sites must be able to handle the highest possible volume of users and to process their search and donation submission requirements.

3. Test your site during non-peak times, but make sure you take ads and any other graphics into account. Testing your production site can give you security and comfort before the real event. Load testing three to four months before the big race is a good idea, as it leaves time to fix any weaknesses identified in the system. When testing, make sure that graphics and other real-time elements comparable to those used on race day are in place. Otherwise these elements could add traffic volume on the day that your tests never accounted for.
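A back-of-the-envelope calculation shows why the extra graphics matter. The sketch below compares outbound bandwidth for a page tested without race-day assets against the same page with them. Every number in it is a made-up illustration, and `peak_bandwidth_mbps` is a hypothetical helper, not part of any tool.

```python
def peak_bandwidth_mbps(page_views_per_second: float, page_weight_kb: float) -> float:
    """Outbound bandwidth in megabits per second for a given page weight.

    Uses 1 kB = 1,000 bytes and 1 Mbit = 1,000 kbit.
    """
    kilobits_per_second = page_views_per_second * page_weight_kb * 8
    return kilobits_per_second / 1000


# All figures below are assumed for illustration, not measurements.
PAGE_VIEWS_PER_SECOND = 500
BARE_PAGE_KB = 400        # page as tested without race-day assets
RACE_DAY_EXTRAS_KB = 600  # banners, ads and live graphics added for the event

bare = peak_bandwidth_mbps(PAGE_VIEWS_PER_SECOND, BARE_PAGE_KB)
full = peak_bandwidth_mbps(PAGE_VIEWS_PER_SECOND, BARE_PAGE_KB + RACE_DAY_EXTRAS_KB)
print(f"without race-day assets: {bare:,.0f} Mbit/s")
print(f"with race-day assets:    {full:,.0f} Mbit/s")
```

With these assumed figures the race-day version needs two and a half times the bandwidth of the page that was tested, even though the number of visitors is identical.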

4. Determine what it takes for your site to crash. Modern transactional websites have various components, all of which must interact and which may begin to behave differently under heavy loads. Ideally, when faced with heavy loads, some components can ‘carry the load’ so as not to introduce bottlenecks. However, in some sites, when one or more components fail, the site crashes. It’s up to companies to determine the weakest links through load testing, and to identify exactly which load levels will cause the site to slow down and which will finally lead to a complete crash.
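One common way to find those two load levels is a step test: raise the load in fixed increments and record where the site first becomes slow and where it first fails. The sketch below assumes a `measure` function that returns latency and error rate for a given number of users. The `simulated_site` model, its 8,000-user capacity and the two thresholds are all invented for illustration.

```python
from typing import Callable, Optional, Tuple


def find_thresholds(
    measure: Callable[[int], Tuple[float, float]],
    start: int,
    step: int,
    max_users: int,
    slow_latency_s: float = 2.0,
    crash_error_rate: float = 0.5,
) -> Tuple[Optional[int], Optional[int]]:
    """Ramp load in steps; return (first slow level, first crash level).

    measure(users) must return (latency in seconds, error rate 0..1).
    Either result is None if that threshold was never reached.
    """
    slow_at = crash_at = None
    for users in range(start, max_users + 1, step):
        latency, error_rate = measure(users)
        if slow_at is None and latency >= slow_latency_s:
            slow_at = users
        if error_rate >= crash_error_rate:
            crash_at = users
            break  # no point pushing a site that is already down
    return slow_at, crash_at


def simulated_site(users: int) -> Tuple[float, float]:
    """Toy model of a site with capacity for about 8,000 users (hypothetical)."""
    capacity = 8_000
    if users >= capacity:
        return 30.0, 1.0  # saturated: every request times out
    latency = 0.2 / (1 - users / capacity)  # latency climbs as load nears capacity
    return latency, 0.0


print(find_thresholds(simulated_site, start=1_000, step=500, max_users=12_000))
# prints (7500, 8000)
```

The gap between the two numbers is the warning zone: the range of load where users notice a slowdown but the site is still standing.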

Note that being temporarily slow at loads above the planned maximum is usually okay as long as the site can bounce back. Even when faced with higher-than-expected loads, the worst-case scenario should be that your site is briefly unresponsive rather than crashing completely, and that it is fully functional as soon as the load is reduced.

5. Test and build for resilience. Having your website crash is bad enough, but having it come back at reduced capacity will only annoy your users even more. Part of understanding the working components behind your site is understanding how to get the site back up and running quickly after overload. In this case, it’s important to test and determine what you need to change to avoid a complete shutdown, and what you need to do in order to achieve a full recovery in a full load volume environment.
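A recovery test can be sketched in the same spirit: measure a baseline, overload the system on purpose, return to normal load and check that response times come back to where they started. `recovers_fully` and `ToySite` below are hypothetical; the toy model only exists to show a site that bounces back and one that returns at reduced capacity.

```python
from typing import Callable


def recovers_fully(
    measure: Callable[[int], float],
    normal_users: int,
    overload_users: int,
    tolerance: float = 0.10,
    recovery_samples: int = 3,
) -> bool:
    """Overload the system, drop back to normal load, and compare latency.

    measure(users) returns latency in seconds at that load level.
    Returns True if the last post-overload sample is within `tolerance`
    of the baseline measured before the overload.
    """
    baseline = measure(normal_users)
    measure(overload_users)  # push past the limit on purpose
    after = [measure(normal_users) for _ in range(recovery_samples)]
    return after[-1] <= baseline * (1 + tolerance)


class ToySite:
    """Invented site model; `sticky` means overload leaves lasting damage."""

    def __init__(self, capacity: int, sticky: bool):
        self.capacity = capacity
        self.sticky = sticky
        self.degraded = False

    def measure(self, users: int) -> float:
        if users > self.capacity:
            self.degraded = self.sticky  # a sticky site stays degraded afterwards
            return 30.0
        return 1.5 if self.degraded else 0.3


resilient = ToySite(capacity=8_000, sticky=False)
fragile = ToySite(capacity=8_000, sticky=True)
print(recovers_fully(resilient.measure, normal_users=4_000, overload_users=12_000))
print(recovers_fully(fragile.measure, normal_users=4_000, overload_users=12_000))
```

The first site passes and the second fails, which is the distinction that matters here: both are overwhelmed at 12,000 users, but only one is back to normal once the load drops.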

Lastly, if your site does crash during peak times, make sure you follow crisis PR protocol: admit to the problem and apologise for it. To make up for the poor site performance, Virgin Money Giving boosted all donations made to charities on 23 and 24 April by 10 per cent – a very welcome gesture indeed.

Sven Hammar is the founder and Chief Strategy Officer of Apica