10.03.2016 : Richard Donnelly

Turning black swans white


In New Zealand’s new environment of increased accountability for worker safety and system security, what keeps you and your directors awake at night? It is probably not business-as-usual (BAU) risks, but rather the things you don’t know about: the hidden vulnerabilities in your infrastructure, systems and practices that expose you to the possibility (however unlikely) of disruption, major incident or loss of life.

So how can you identify and prudently address such vulnerabilities before they turn into nasty surprises?

Over the last 18 months the New Zealand government has made clear its expectations regarding the prudent management of security, assets, and health and safety. There are a number of legislative and regulatory changes that are now in effect, or are about to be. These include:

  • The NZ Protective Security Requirements (PSR), December 2014, which sets out the Government’s expectations for risk-based approaches to managing the security of personnel, physical premises and information.
  • Cabinet Circular CO(15)5, Investment Management and Asset Performance in the State Services (July 2015), which sets out the Government’s expectations for evidence-based investment decisions and asset management in the State Sector.
  • The Health and Safety at Work Act (HSWA), coming into effect from April 2016, imposes a duty of care on all organisations to ensure, so far as is reasonably practicable, that the health and safety of workers and others is not put at risk by the activities of the organisation.

The PSR and CO(15)5 are mandatory for State Sector agencies, while the HSWA applies to all organisations in New Zealand. These new requirements span quite different domains and serve quite different purposes, but they share a common theme: increased accountability and transparency, and an emphasis on demonstrable, evidence-based decision-making and risk management.

When incidents happen, and especially when they involve loss of life, investigators and regulators will be asking the following questions:

  • Was it foreseeable?
  • What should you have known as a prudent operator?
  • Was there something more you could reasonably have done to eliminate or minimise the risk?

Simply having a scenario listed on a risk register will not be sufficient to demonstrate prudent management. You will need to be able to demonstrate that you knew and did what was reasonably practicable.

So in this new environment, it will probably not be BAU risks that will keep you awake at night. Rather, it is the things you don’t know about. It is thus becoming increasingly important to understand and evaluate the hidden vulnerabilities in your infrastructure, systems and practices which expose you to the possibility (however unlikely) of significant financial loss, disruption, major incident or loss of life.

In the aftermath of the 2011 Japan earthquake, the Japanese automotive industry lost billions of dollars after its production capacity was almost cut in half. This disruption was not due to damage to the car makers’ plants, but rather because the main plant of Renesas Electronics – which manufactured around 40% of the world’s supply of automotive microcontrollers – was severely damaged. This single point of failure in the automotive supply chain had not been identified prior to the event. While manufacturers such as Toyota had attempted to mitigate disruption risk by multi-sourcing with Tier 1 suppliers, they hadn’t realised that many Tier 1 suppliers shared some or all of their Tier 2 suppliers, including Renesas Electronics. As it turns out, industry surveys on supply chain disruption have shown that half of supply chain disruptions originate below Tier 1.

Where are the single points of failure that could disrupt your operations or threaten the continuity of your business?
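One way to start answering this question is to map which lower-tier suppliers sit behind your Tier 1 suppliers. The sketch below is illustrative only: the supplier names, the `TIER1_TO_TIER2` map and the `min_share` parameter are all hypothetical, and a real supply chain map would need data gathered from your suppliers.

```python
from collections import defaultdict

# Hypothetical supply map: each Tier 1 supplier and the Tier 2 suppliers it relies on.
TIER1_TO_TIER2 = {
    "Tier1-A": {"ChipCo", "SteelCo"},
    "Tier1-B": {"ChipCo", "PlasticsCo"},
    "Tier1-C": {"ChipCo", "SteelCo"},
}


def shared_tier2(tier1_to_tier2, min_share=1.0):
    """Return Tier 2 suppliers used by at least `min_share` of the Tier 1 suppliers.

    With min_share=1.0 the result lists Tier 2 suppliers that every Tier 1
    supplier depends on, i.e. candidates for a hidden single point of failure.
    """
    users = defaultdict(set)
    for tier1, tier2_suppliers in tier1_to_tier2.items():
        for tier2 in tier2_suppliers:
            users[tier2].add(tier1)
    total = len(tier1_to_tier2)
    return {
        tier2: sorted(tier1s)
        for tier2, tier1s in users.items()
        if len(tier1s) / total >= min_share
    }


common_to_all = shared_tier2(TIER1_TO_TIER2)
common_to_half = shared_tier2(TIER1_TO_TIER2, min_share=0.5)
print(common_to_all)
```

In this made-up example, multi-sourcing across three Tier 1 suppliers gives no protection against the loss of `ChipCo`, because all three depend on it.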

On 2 September 2006, a Royal Air Force Nimrod suffered an in-flight fire and subsequently crashed in Kandahar, Afghanistan, killing all fourteen crew members on board. The fire broke out in the bomb bay after a mid-air refuelling and is believed to have been caused by leaking fuel coming into contact with a high temperature duct. The investigation into the crash identified that a number of previous incidents and warning signs had occurred on other aircraft, but had never been recognised as risk indicators. The investigation also revealed serious flaws in the Nimrod safety case and the Royal Air Force’s safety case process.

Are you or your people vulnerable because your systems and practices don’t map and manage seemingly disparate risks?

Black Swan events
The Nimrod crash and the disruption to the Japanese automotive industry after the 2011 Japan earthquake are examples of High Impact, Low Probability (HILP) events, also sometimes referred to as Black Swan events. The term “Black Swan event” is taken from the title of a 2007 book by Nassim Nicholas Taleb and refers to the surprising discovery of black swans in Australia in 1697. That discovery turned the conventional knowledge of the time on its head: black swans were presumed not to exist as they were not known in Europe before that.

Events are regarded as Black Swans if they:

  • Are a surprise,
  • Have a major impact, and
  • Appear predictable with hindsight.

If you don’t find them first, critical vulnerabilities in your infrastructure, systems and practices may be revealed in the form of Black Swan events. They will be surprising, they will have a major impact on your business or its people, and the judgement after the fact may be that you should have known and done something about them.

So how can you find and address such vulnerabilities? And how can you do this without unnecessarily gold-plating your infrastructure or diverting investment required to maintain current service levels?

We hope the following guidelines will help.

Recommended steps to address HILP risks:
Step 1: Identify your mission critical assets and business functions

Undertake an impact assessment to evaluate the impact on your business (and on others) if your important assets and business functions become compromised. Your critical assets and functions are those for which the consequences of failure would be serious enough to justify avoidance measures, assuming they are practicable. These should be given higher risk management priority than non-critical assets and functions, even if the probability of occurrence is low.

When evaluating the criticality of assets and functions, focus on the severity of the potential consequences of failure or disruption. Ask yourself, how serious would the impacts be if…? Don’t worry about likelihood at this point.

In complex organisations, systems and networks, consider spatial (where) and temporal (when) variations as well as personnel (who):

  • Where: some locations will be more critical than others, for instance those which house essential command and control functions or for which there is no backup or alternative means of supply.
  • When: certain times may be more critical than others, for instance when a substation is running at peak capacity.
  • Who: the loss or departure of key personnel can be more damaging than failure of a critical physical asset, and take longer to recover from.

And remember to keep the criticality assessment simple. You don’t need to evaluate every possible consequence. The objective is to highlight those things you need to take a more detailed look at, so limit the assessment criteria to the minimum needed to achieve this (ideally no more than two or three).
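A criticality screen of this kind can be very small. The sketch below is a minimal illustration, not a prescribed method: the three criteria (`safety`, `service`, `financial`), the 1 to 5 consequence scale, the threshold of 4 and the example assets are all assumptions chosen for the example. Likelihood is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    """An asset or business function scored on consequence only (1 = minor, 5 = severe)."""
    name: str
    safety: int      # hypothetical criterion: harm to workers or others
    service: int     # hypothetical criterion: disruption to service or supply
    financial: int   # hypothetical criterion: financial loss

    def criticality(self) -> int:
        # The worst consequence drives criticality; likelihood is not considered here.
        return max(self.safety, self.service, self.financial)


CRITICAL_THRESHOLD = 4  # assumed cut-off for "take a more detailed look"


def critical_assets(assets, threshold=CRITICAL_THRESHOLD):
    """Return the names of assets at or above the threshold, most critical first."""
    ranked = sorted(assets, key=lambda a: a.criticality(), reverse=True)
    return [a.name for a in ranked if a.criticality() >= threshold]


assets = [
    Asset("Control room", safety=3, service=5, financial=4),
    Asset("Office printer", safety=1, service=1, financial=1),
    Asset("Zone substation at peak load", safety=4, service=4, financial=3),
]
shortlist = critical_assets(assets)
print(shortlist)
```

Taking the maximum score rather than an average is a design choice: an asset with one severe consequence should not be hidden by low scores on the other criteria.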

Step 2: Identify how those assets or business functions might be compromised

Key sources of risk include plant or equipment failure, operational error or accident, logistics disruption (e.g. failure of a key supply or supplier), situational or seasonal hazards (e.g. severe weather), security threats, and loss or departure of key staff.

When identifying risks, it is not necessary to quantify the actual likelihood of events. When it comes to high impact, low probability events we are dealing with uncertainty more than true risk. The test should be whether there is a credible mechanism for an event occurring regardless of whether such a thing is rare or has never occurred before.
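The credible-mechanism test can be recorded without any probability figures. The sketch below assumes a simple scenario record; the `Scenario` fields and the example entries are hypothetical, while the list of risk sources is taken from the paragraph above.

```python
from dataclasses import dataclass

# Risk sources as listed in Step 2.
RISK_SOURCES = [
    "plant or equipment failure",
    "operational error or accident",
    "logistics disruption",
    "situational or seasonal hazard",
    "security threat",
    "loss or departure of key staff",
]


@dataclass
class Scenario:
    asset: str
    source: str
    mechanism: str  # how the event could credibly occur; empty if none identified

    def __post_init__(self):
        if self.source not in RISK_SOURCES:
            raise ValueError(f"unknown risk source: {self.source}")


def credible_scenarios(scenarios):
    """Keep scenarios with a stated mechanism; no likelihood estimate is used."""
    return [s for s in scenarios if s.mechanism.strip()]


scenarios = [
    Scenario("Control room", "security threat", "unauthorised remote access to SCADA"),
    Scenario("Control room", "situational or seasonal hazard", ""),
    Scenario("Backup generator", "plant or equipment failure", "standby switch-over fails on demand"),
]
credible = credible_scenarios(scenarios)
print([(s.asset, s.source) for s in credible])
```

A scenario stays on the list because someone can explain how it could happen, not because it has happened before.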

Step 3: Consider whether you have done everything practicable to eliminate, prevent or mitigate the risks

Options include renewing or replacing assets which are in poor condition, improving operational procedures or operating environments to reduce the likelihood of error or accident, redesigning systems and networks to improve resilience or provide diversity, or altering maintenance strategies to improve reliability. When considering existing arrangements, look for evidence that they are effective and test assumptions. For instance, do your procedures for testing the operation of critical backup equipment also include a test of the duty/standby switch-over mechanisms?

The notion of reducing risk so far as is reasonably practicable has been introduced by the Health and Safety at Work Act (HSWA), which comes into effect in April 2016. This approach shifts the risk acceptance criterion from one of tolerability to a test of whether there is anything more that can practicably be done to further reduce the risk. In the aftermath of an incident involving loss of life, this is the test that will be applied by the courts. It is a prudent test to apply when considering HILP events, even where safety is not an immediate concern.

Step 4: Develop and implement risk reduction strategies

When evaluating the justification for further risk reduction, consider whether the effort and cost of implementation are grossly disproportionate to the risk; where they are not, further risk reduction may be prudent. This is a vital step in the risk management process. To do it well, decision makers will need clear visibility of the risks and an agreed basis for evaluating “grossly disproportionate”.
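One possible agreed basis is a simple ratio of cost to risk reduction, compared against a disproportion factor. The sketch below is an assumption made for illustration, not a test prescribed by the HSWA or by this article: the function name, the factor of 3 and the dollar figures are all hypothetical, and the loss estimates would in practice carry considerable uncertainty.

```python
def is_grossly_disproportionate(cost, risk_before, risk_after, factor=3.0):
    """Compare the cost of a measure with the risk reduction it buys.

    cost        -- annualised cost of the risk reduction measure
    risk_before -- estimated annual loss without the measure
    risk_after  -- estimated annual loss with the measure in place
    factor      -- assumed disproportion factor agreed by decision makers
    All money values must be in the same currency and on the same annual basis.
    """
    benefit = risk_before - risk_after
    if benefit <= 0:
        # The measure does not reduce the risk, so it cannot be justified on this basis.
        return True
    return cost / benefit > factor


# Hypothetical measure: 50,000 per year to cut the expected loss from 40,000 to 10,000.
modest_measure = is_grossly_disproportionate(50_000, 40_000, 10_000)
# Hypothetical measure: 200,000 per year for the same risk reduction.
costly_measure = is_grossly_disproportionate(200_000, 40_000, 10_000)
print(modest_measure, costly_measure)
```

Note that the first measure costs more than the benefit it delivers, yet it is not grossly disproportionate under the assumed factor, so implementing it may still be prudent. That is the difference between this test and a simple cost-benefit comparison.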

When developing risk reduction strategies, an important factor to consider is how critical customers might be affected. These are the customers who are particularly sensitive to a disruption of your services or supply (e.g. medically dependent customers for whom loss of power supply could be life threatening). If you know who these customers are and why they are critical, then you can develop appropriate risk and relationship management strategies for them.

Step 5: When you can’t anticipate the risk, prepare for the consequences

While you can do your best to identify every HILP risk, you are limited by known industry experience and your own world view. If the truly unexpected happens, the backup is to make sure you can satisfactorily and efficiently deal with the consequences through emergency preparedness, contingency and business continuity planning, and succession planning.

Conclusion
The New Zealand government has communicated its expectations for prudence in the management of system security and worker safety. There is a growing emphasis on demonstrable, evidence-based risk management and decision-making, both in the government sector and more widely. Risks to system security and worker safety are expected to be eliminated or minimised so far as is reasonably practicable, and risk-based asset management practices are expected to be adopted.

Key questions to ask yourself are:

  • What should I know about as a prudent operator?
  • What are my critical assets, systems and practices?
  • What would happen if they failed?
  • Have I done enough to control and manage the risk?
  • Have I done enough to prepare for the consequences of truly unpredictable events?

A systematic risk identification, exploration and mitigation process, as outlined above, can help you answer these questions.

About the Author

Richard Donnelly

Associate - Asset Performance

Richard is an Asset Performance and Risk Management Consultant. He holds a PhD in Enterprise Risk Management and has a background in asset management, risk management and management systems development. He is currently advising clients across a number of sectors on asset risk profiling, enterprise risk management and safety case management.

What Do You Think?

Matt Ensor · 10/03/2016 11:54:46 a.m.
Thanks Richard and Eric, this is excellent. There's good evidence to show that the level of diversity in the leadership team is also a key factor in how well organisations cope when Black Swan events occur. A homogeneous executive will typically over-rely on their own experience of what's worked in the past.