O2 failure and RBS/NatWest bank crisis highlight the reputational damage caused when IT fails

Andrew Sinclair, Head of Business Continuity Consultancy, Onyx Group, explains how these incidents demonstrate why risk needs to be addressed at Board level and managed from the top down to ensure any business continuity plan is effective.

This week has seen another calamitous IT failure, following possibly the worst and most visible one yet: the problems suffered by RBS/NatWest. For 24 hours, O2 customers have been unable to make or receive calls or text messages after the network suffered a major failure across the country yesterday afternoon.

This comes after the RBS failure, in which tens of thousands of customers could not be confident of the state of their bank accounts. Surely this is the most fundamental service provided by any bank? Aside from the immediate problems caused for both O2 and RBS customers, we have yet to fully understand the impact on customer confidence. It probably couldn’t have occurred at a worse time for the banking industry.

Yet this failure follows previous high-profile IT problems. Remember back in October 2011, when RIM had an almost complete failure of services for BlackBerry users across much of Europe? BlackBerry’s problem was subsequently found to have been caused by the failure of a relatively low-cost component, whereas the RBS problems began when a software change went wrong, causing a failure in the automated system that processes payments overnight.

What do these events tell us about our reliance on IT that we didn’t already know? Well, perhaps it’s the realisation that we now have systems whose risks and failure impacts we don’t really understand. We’ve built services which span the globe from single locations. And when they go wrong, it’s spectacular and can seriously damage an organisation’s reputation.

There’s never been a greater need for directors, senior management and owners of organisations to stop and look again at all the links in the chain of how they use or rely on IT to provide services to their customers.  It’s nowhere near enough now to say: “It’s resilient, IT won’t fail”.  How can you be sure? How can your customers be sure? For once, the international standards might have an answer.

In May this year, the international standard ISO 22301, which covers business continuity, was ratified and released. So there is a yardstick (or perhaps metre stick) which can be used to assess an organisation’s readiness to recover from something going wrong. The standard covers more than simply IT. It requires the organisation to be able to demonstrate that it has developed contingency plans and, crucially, to be able to prove that it has tested them.

Maybe our societal reliance on IT systems such as banking systems ought to prompt legislation to ensure the operators of such systems really are taking their responsibilities seriously. Adhering to an international standard and being audited by an external organisation to prove compliance with it is probably the best way forward for any business. Although the root of the problems suffered by RBS/NatWest, O2 and RIM is IT-related, these companies compounded the problems with weak communications to their customers. Fortunately, this is another area addressed by the standard.

Beyond the ISO 22301 standard, there are now many companies whose core business is keeping other organisations’ IT systems operational. Much has been written about datacentres and cloud computing services. These change the risk profile of how IT is used to deliver a service. Ensuring that your organisation’s IT is being managed in a suitable environment is crucial.

Once an organisation has its IT systems hosted in professional datacentres, or is using cloud services from a leading supplier, what can it do next? Test, test, test and test again. Run a continual programme of exercises involving everyone in the organisation until recovering from a major incident becomes something that your organisation is able to do. Test until disaster recovery is embedded in the DNA of your organisation.

The position of risk manager needs to be reinforced and given teeth in all organisations. If a board of directors has previously paid lip service to risk management and risk reports, surely the spectacular and damaging problems suffered by RBS/NatWest and now by O2 are enough to convince it to take them seriously.

We’re all relying on very complex systems these days. The humble phone has come a long, long way since Alexander Graham Bell made his first call in 1876. Nowadays we spend more time using the “phone” to check our email, to send instant messages, to check Facebook or to tweet. It’s no longer a device just for talking to each other.

So the systems that allow this to happen have become ever more complex. And yet we have one of the largest suppliers of these services unable to provide the very part of the service that its customers rely on most. How could this happen? Surely the supplier has a large and well-tested business continuity and recovery plan. If it does, then it’s not working for its customers.

With today’s complex systems, it’s vital that they are all exhaustively tested and tested again. If you can imagine it going wrong, then it should be tested to prove that your imagination is wrong. Nothing else will give directors confidence that their business will keep supplying the services their customers are paying for. The environment these complex systems are contained in is a vitally important part of the service. The datacentres and the network links between them must be designed, built and tested to the highest possible levels. It’s no comfort to customers if you have to turn around and say “We’re sorry, we didn’t think that could happen.”

I’d encourage all CEOs to ask just this simple question at their next board meeting: “How would we recover from a similar IT failure?” The very visible evidence is that the reputational damage done when IT systems that support core services fail just isn’t being thought through by the right people. Maybe the good to come from this latest calamity will be that all business continuity and recovery plans will now be taken as seriously as the finances of the company. Perhaps risk will finally be accepted as a board-level matter, since the RBS/NatWest and O2 incidents make it evident that this is where responsibility lies if it is not managed adequately.
