
DR: Test and test again

Published: 11-2016



A recent roundtable gathered business leaders to debate the findings of iland's 'State of IT Disaster Recovery Amongst UK Businesses' survey; Monica Brink, EMEA Marketing Director, summarises the conversation.

Outages happen more frequently than we think. In the month prior to the roundtable, British Airways lost its check-in facility due to a largely unexplained 'IT glitch'; ING Bank's regional data centre went offline due to a fire drill gone wrong (reports suggest that more than a million customers were affected); and Glasgow City Council lost its email for three days after a fire system blew in its data centre.

Our survey showed that 95% of companies had faced an IT outage in the past 12 months. We looked at some of the reasons for those outages, and top of the list were system failure and human error. It is often not the headline-grabbing causes, such as environmental threats, storms or terrorism, that bring our systems down, but more mundane, day-to-day issues. The group suggested that issues often occur at the application level rather than taking down the entire infrastructure.

We discussed the importance of managing expectations and how DR should be 'baked in' rather than seen as an add-on. Most businesses have a complex environment with legacy systems, so they can't really expect 100% availability all of the time. DR isn't about failing over an entire site any more; it's about pre-empting issues, for example by testing and making sure that everything is going to work before you make changes to a system.

When we asked respondents how catastrophic the impact of downtime would be, 42% said that even seconds of downtime would have a big impact. This figure rose to nearly 70% when it came to minutes. The group's advice was that businesses really need to focus on recovery times when looking at a DR solution.

The roundtable discussed 'over-confidence' in DR solutions: the survey found that 58% had issues when failing over, despite 40% being confident that their disaster recovery plans would work. Only 32% executed a failover, were confident and found that it all worked well. There appears to be a gap between believing your business is protected in a disaster and having that belief translate into a successful failover.

The bottom line is that DR strategies are prone to failure unless failover systems are thoroughly and robustly tested. Confidence in failover comes down to how often IT teams actually perform testing, and whether they are testing the aspects that really matter, such as the application level. Equally, are they testing network access, performance, security and so on? If testing only takes place once a year or once every few years, how confident can organisations be?

The group agreed that the complex web of interlocking IT systems is one of the biggest inhibitors to successful testing. Testing may be conducted on one part of a system in isolation, but if that part falls over in a real incident, it can often trigger a chain of events in other systems that the organisation cannot control.

There appears to be an intrinsic disconnect between the recovery times management wants to hear and what management is willing to spend.

In conclusion, we discussed the need to balance downtime against cost: nobody has an unlimited budget. Many of the issues raised in the survey can be traced directly back to not testing enough, or not doing enough high-quality testing. iland's overall advice from the survey is to test, test and test again.
