Dealing with digital disruption
When you read about “digital disruption”, the term almost invariably refers to the impact that new market entrants or emergent technologies are having on more traditional organisations and industries. This could take the form of emergent technologies such as blockchain (a digital ledger in which transactions are recorded chronologically, publicly and securely, and which is by design resistant to change or alteration), or of new technologies that are affecting activities traditionally completed by people, such as artificial intelligence for data analytics.
Almost every aspect of how organisations conduct their business is now dependent on some form of technology, with some being wholly dependent on their digital platforms for critical operations. In essence, innovation has created a major and, in some cases, all-encompassing single point of failure. In other words, there is no alternative to the activities being delivered through these businesses’ technology, so if the technology fails, the organisation fails. Organisations’ digital platforms are exposed to increasing threats, putting them in a situation where having an integrated process for dealing with digital disruption becomes essential.
Over the past ten years, organisations have become more aware of threats to their systems. Although this is an oversimplification, previously all they needed to worry about was operational failures in their technology, such as the loss of a data centre, hardware failure or human error. They addressed this by setting up backup systems to copy and store data offline elsewhere. In the event of a complete loss of technological capacity, a replica environment could be created and a backdated copy of the data loaded.
However, this solution meant that a fixed amount of data was lost: if backups were taken every day, the restored data could be missing up to a full day’s transactions. If we assume that an organisation had a contracted recovery service that needed a further day to bring its systems back and load the data copy, it could bring back yesterday’s data tomorrow, effectively losing 48 hours of data and processing capability.
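The arithmetic behind that 48-hour figure can be sketched as follows. This is an illustration rather than a sizing method: the function name `worst_case_exposure` is hypothetical, and the 24-hour recovery time is an assumption chosen to match the "yesterday’s data tomorrow" example.

```python
from datetime import timedelta


def worst_case_exposure(backup_interval: timedelta, recovery_time: timedelta) -> timedelta:
    """Worst-case gap between the age of the restored data and the moment service resumes.

    backup_interval: time between static backups; anything newer than the
                     last backup is lost if the failure happens just before
                     the next one is due.
    recovery_time:   assumed time the contracted recovery service needs to
                     rebuild the environment and load the backup.
    """
    return backup_interval + recovery_time


# Daily backups, with recovery assumed to take one further day.
exposure = worst_case_exposure(timedelta(hours=24), timedelta(hours=24))
print(exposure.total_seconds() / 3600)  # 48.0
```

Shortening either term reduces the exposure, which is why more frequent backups and faster recovery contracts were the usual levers under this model.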
As businesses’ dependency on technology increased, and improved solutions for replicating data became available, this old-school approach to providing a recovery capability based on a static data copy became obsolete. Organisations are now able to copy data in real time into a duplicated secondary environment. This removes the need for businesses to recover their systems and allows users to experience service continuity, meaning that companies can achieve seamless service delivery even in the event of a complete loss of their primary systems.
Although the technology environment is now protected from the technical loss of user-facing systems (the production environment) and the single point of failure mentioned earlier has been removed, this brave new world presents another, perhaps larger, problem.
Computer users became widely aware of viruses in the late 90s. Some early examples were a type of code written into software applications by the developers to ensure that they had some recourse in the event of licence fees not being paid. Developers could activate this code remotely, disabling the application or system in question. Today’s attacks, which include viruses, Trojans, worms, botnets and DDoS attacks, all have one thing in common: they rely on malicious code that is specifically designed to cause harm or stop an application or system working.
Unlike in the old-school disaster recovery service, in the new enhanced service continuity model any malicious code introduced into the production systems is replicated immediately into the secondary environment. This means that an organisation can be left with no IT capability and no unaffected backups at all.
And it is not just deliberate or malicious acts that present this problem, as human error can have the same impact: a well-intentioned, but ultimately damaging, change to the production environment is immediately replicated into the secondary environment, ensuring that both are disabled. This can happen irrespective of whether the systems are delivered via the cloud or sitting in the organisation’s proprietary data centre, and constitutes a major single point of failure.
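The effect can be shown with a toy model. This is a minimal sketch under simplifying assumptions: the `Environment` class and `replicate` function are hypothetical stand-ins, not a real replication product, and a "system" is reduced to a dictionary of records.

```python
import copy


class Environment:
    """A toy stand-in for a system: just a dictionary of records."""

    def __init__(self, records=None):
        self.records = dict(records or {})


def replicate(primary, secondary):
    # Real-time replication mirrors whatever is in the primary, good or bad.
    secondary.records = dict(primary.records)


primary = Environment({"acct-1": 100, "acct-2": 250})
secondary = Environment()
replicate(primary, secondary)

# A static copy taken before the change and kept offline.
offline_snapshot = copy.deepcopy(primary.records)

# A faulty change (or malicious code) corrupts production...
primary.records = {key: None for key in primary.records}
replicate(primary, secondary)  # ...and is mirrored straight away.

print(secondary.records == primary.records)  # True: both environments are corrupted
print(offline_snapshot)  # {'acct-1': 100, 'acct-2': 250}
```

Only the copy that sits outside the replication path survives, at the cost of being older than the live data.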
Let’s look at what happened to a high-profile financial institution. When a corrupted patch (or fix) was introduced into its production environment, this was immediately replicated in the secondary environment, leaving the organisation with no operable production system and no disaster recovery capability. This error took its core banking system offline, meaning that the bank’s customers had no access to electronic or telephone banking systems for weeks. This had a significant reputational and financial impact on the organisation.
Although it was accidental, and a failure of the change management process rather than the result of malicious intent, this incident highlighted the problem with the immediate replication of data from production to the secondary environment.
This brings us to ransomware, which was made famous by the WannaCry cyber-attack that affected over 300,000 machines globally and significantly hit the UK’s National Health Service in 2017. Ransomware, which locks users out of their files or systems and only allows them to regain access through payment, continues to be the most popular form of malicious software, or malware.
Although organisations fall victim to random ransomware attacks, more recently we have seen that these attacks can be targeted. In one case, an attacker notified an organisation that they were inside its systems by emailing a time-stamped snapshot of one of its critical databases. As further proof of the breach, they changed some of the senior executives’ email addresses. They then installed a running clock and provided a crypto-currency wallet into which the ransom had to be paid before the clock ran out.
Although the organisation had a disaster recovery procedure, the instant replication solution that it also had in place meant that the ransomware contagion had spread to that secondary environment as well. This left the business with few options and little time, making paying the ransom a more compelling proposition.
How should organisations protect their systems and data?
When the head of production at one of the world’s largest petrochemical companies was asked which of the organisation’s refineries was the most secure from an information security perspective, his answer was stark and simple: “the oldest”. When asked why it was that one, he said that that specific refinery’s refining process was not managed by computer systems, but by engineers and mechanical controls. This gave the production manager more confidence that the facility would remain secure, unlike the newer and more automated environments at other sites, whose industrial control systems could be manipulated by an attacker.
This reflects the growing trend towards returning to old-school disaster recovery methods as a worst-case recovery solution. Organisations are reviewing their business continuity programmes to ensure that they are protected from cyber attacks or accidental outages. They are also re-evaluating their organisational recovery point objectives for data, opting to take a clean, time-stamped and static copy of data and store it offline. (A recovery point objective is the maximum acceptable age of the data you get back when your system is restored. For example, a 24-hour recovery point means that, if your systems failed today, when you got your systems back, yesterday’s data would be the latest data you could recover.)
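A recovery point objective can be checked against a backup schedule with a few lines of code. The sketch below is illustrative only: the function name `recovered_data_age`, the dates and the 24-hour objective are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def recovered_data_age(backup_times, failure_time):
    """Age, at the moment of failure, of the newest backup taken before it."""
    usable = [t for t in backup_times if t <= failure_time]
    if not usable:
        raise ValueError("no backup predates the failure")
    return failure_time - max(usable)


# Hypothetical schedule: one static copy taken at midnight each day.
backups = [datetime(2024, 3, 1, 0, 0), datetime(2024, 3, 2, 0, 0)]
failure = datetime(2024, 3, 2, 18, 30)
rpo = timedelta(hours=24)

age = recovered_data_age(backups, failure)
print(age)         # 18:30:00
print(age <= rpo)  # True
```

Here the restored data would be 18.5 hours old, which is within the 24-hour objective; a failure just before the next midnight backup would sit right at the limit.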
The reasoning behind this is that, in the event of a cyber outage or ransomware attack, organisations can restore an old, clean version of their data. This ensures that, although some data will be lost due to the lag between when the old data was backed up and when the outage occurred, organisations can avoid the situations illustrated above, where they incurred significant costs, suffered damage to their reputation or had little choice but to pay the ransom.
In the financial sector, this so-called third-data capability is becoming more established and, although organisations will still experience loss of data, they will have a disaster recovery option available based on a static copy of their data that would not exist in an instant replication environment.
How to deal with digital disruption
The key to a successful response is early recognition that you are experiencing an incident or attack: as a forward-looking measure, you should define the key risk indicators that will provide early warning in the event of an incident.
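As one way of making key risk indicators concrete, the sketch below compares current readings with agreed thresholds and lists the indicators that should trigger escalation. Every indicator name and threshold here is hypothetical and chosen purely for illustration; real values would come from your own risk assessment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyRiskIndicator:
    name: str
    threshold: float  # reading at or above which the indicator should escalate


def breached(indicators, readings):
    """Return the names of indicators whose latest reading meets the threshold."""
    return [
        kri.name
        for kri in indicators
        if readings.get(kri.name, 0.0) >= kri.threshold
    ]


# Hypothetical indicators and thresholds, for illustration only.
indicators = [
    KeyRiskIndicator("failed_logins_per_hour", 500),
    KeyRiskIndicator("files_renamed_per_minute", 1000),
    KeyRiskIndicator("outbound_gb_per_hour", 50),
]
readings = {"failed_logins_per_hour": 120, "files_renamed_per_minute": 4200}

print(breached(indicators, readings))  # ['files_renamed_per_minute']
```

An indicator with no reading is treated as zero in this sketch; in practice a missing reading may itself be worth an alert.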
There have been recent cases in which IT teams have attempted to manage a breach without involving or informing their senior management or crisis management teams. In one example of a ransomware attack, the ransom was paid without the senior management team being informed. Unfortunately for the individual who paid the ransom, it soon became clear that the data affected was not retrievable, at which point the crisis management team was made aware of the breach. This delay created significant problems in terms of managing the incident and maintaining customer confidence.
Plan for the worst
Ensure that you have an integrated incident response capability in place. This needs to involve your senior management team, your information security/IT team, your corporate communications team and your legal counsel. There are five key steps to an effective incident response:
- Rapid recognition of the incident – ensure that you have clearly identified trigger points and an escalation process to involve the appropriate stakeholders.
- Investigation and containment – ensure that your team can identify and control the incident.
- Threat eradication – once the threat has been identified, enhance controls and renew or update passwords and encryption keys or lock down access points.
- Recovery – restore data from a previously unaffected backup and initiate an appropriate crisis communications process.
- Resolution and lessons learned – learn from incidents and mistakes and improve your readiness and defences.
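The five steps above can be tracked as an ordered sequence, so that an incident record always shows which stage has been reached. This is a minimal sketch: the `Step` enum and `next_step` function are hypothetical names, and a real process may loop back (for example, from recovery to containment) rather than move strictly forwards.

```python
from enum import Enum


class Step(Enum):
    RECOGNITION = 1
    INVESTIGATION_AND_CONTAINMENT = 2
    THREAT_ERADICATION = 3
    RECOVERY = 4
    RESOLUTION_AND_LESSONS_LEARNED = 5


def next_step(current):
    """Return the step that follows `current`, or None after the final step."""
    if current is Step.RESOLUTION_AND_LESSONS_LEARNED:
        return None
    return Step(current.value + 1)


# Walk an incident through all five steps in order.
step = Step.RECOGNITION
while step is not None:
    print(step.value, step.name)
    step = next_step(step)
```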
All of this needs to be supported by a comprehensive communications plan with messaging tailored to your respective internal and external audiences.
Having an established capability in place that incorporates these elements will significantly reduce the impact of an attack and potentially provide you with a competitive advantage.
Be resilient: enhance your awareness and be prepared.