Thursday, April 27, 2017

Where does "Resilience" rank in NERC CIP?

The NERC CIP discussion has an odd blind spot when it comes to Reliability. To most people, Reliability seems to mean "always available with no outage."

I'll agree, that's a good goal for the Bulk Electric System. It's just not a very achievable goal for every single Cyber Asset in the BES.

In the world of Disaster Preparedness/Disaster Response, IT security, Business Continuity Planning, and the remainder of the Critical Infrastructure sectors, the main conversation is around Resilience, not perfection. 

We know bad guys will win sometimes. We know mistakes will be made sometimes. It's important to be able to recover quickly with minimal residual damage when these things happen. We have to look at Risk Management in terms of 5 strategies:


  • Avoidance
  • Transference
  • Reduction
  • Mitigation
  • Acceptance


    Avoiding risk is difficult when you're a large, stationary, tempting target. Transference is not practical for Critical Infrastructure: there's nobody outside the system to insure you against damage.

    So we try to reduce risk by reducing or eliminating vulnerabilities. We can't do anything about reducing threats. Many of our threats are either criminal elements or nation-state actors; these threats are for law enforcement and military force to deter. We're stuck trying to close off attack vectors (vulnerabilities to particular types of threats). We try to reduce the probability of exploit by narrowing the windows of opportunity for threat actors to exploit those vulnerabilities, with things like timely updates of security patches and malware signatures, and periodic tests of security mechanisms and logging. The problem is we can do a good job of whack-a-mole and still fail on a zero-day exploit (which is a combination of a threat and a vulnerability) that either didn't exist yesterday or was unknown to us.

    FERC has told Responsible Entities that we're not allowed to accept risk on our own, because our entanglement in the Bulk Electric System means we'd be accepting risk for the whole system, not just ourselves. The Regions who audit us have permission to accept risk on our behalf but no incentive to do so because they bear none of the cost of mitigation for those risks, and would be exposed to criticism and worse if it came to light that a risk deemed "acceptable" had been exploited.

    So, mitigation. It overlaps with risk reduction in common conversation; people often use the term mitigation to mean reducing vulnerabilities. From a Risk Manager's perspective that isn't precise: reducing vulnerabilities is risk reduction. Mitigation is more about controlling the amount of damage that can be done if an exploit is successful.

    One way of limiting the damage is having a means to quickly restore vital capabilities. Another way is to have a means of rapidly addressing vulnerabilities once they are exploited. And we can also spread our eggs out into different baskets so any particular attack can only get to some of them. Anything that limits the scope, impact, or duration of damage is a mitigation strategy.

    Mitigation plans are the heart of Resilience. When Reliability efforts reach the point of diminishing returns, we need to start talking about contingencies, and that means Resilience. How many times does this concept appear in the NERC CIP standards? Without losing the emphasis on risk reduction, we need to start including resilience strategies in our planning.
