Friday, May 5, 2017

The problem with EACMS

The NERC CIP Glossary is foundational to the (let us not forget "mandatory and enforceable") CIP Standards.

One of the terms defined there is Electronic Access Control or Monitoring System (EACMS):

"Cyber Assets that perform electronic access control or electronic access monitoring of the Electronic Security Perimeter(s) or BES Cyber Systems. This includes Intermediate Systems."

Disregard the last sentence for a moment. There are a few examples of throw-away statements like this added in NERC CIP for convenience, or exceptions where no one could think of wording that would be universally applicable.

Focus on the meat of the definition: it's an access control and access monitoring system. Outside of the NERC CIP Standards, this is generally known as AAA: Authentication, Authorization, and Accounting, which actually captures the steps involved in granting and monitoring access much better than the EACMS definition does. There are also tons of guidance and information on implementing AAA in the broader IT Security realm.
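
To make the distinction concrete, here is a minimal, hypothetical sketch of the three AAA steps (the function names, accounts, and roles are invented for illustration, not any particular product's API). Note that the accounting step records what was actually done during the session, which is exactly the piece the EACMS definition glosses over:

    import hashlib
    import time

    # Hypothetical user store and role grants; values are illustrative only.
    USERS = {"operator1": hashlib.sha256(b"correct horse battery staple").hexdigest()}
    ROLES = {"operator1": {"view_alarms", "acknowledge_alarms"}}  # no config changes granted
    AUDIT_LOG = []

    def authenticate(user, password):
        """Step 1: verify identity against the stored credential hash."""
        return USERS.get(user) == hashlib.sha256(password.encode()).hexdigest()

    def authorize(user, action):
        """Step 2: check the requested action against the user's granted role."""
        return action in ROLES.get(user, set())

    def account(user, action, allowed):
        """Step 3: record the attempt, whether it was allowed or not."""
        AUDIT_LOG.append({"time": time.time(), "user": user,
                          "action": action, "allowed": allowed})

    if authenticate("operator1", "correct horse battery staple"):
        for action in ("acknowledge_alarms", "change_relay_setting"):
            account("operator1", action, authorize("operator1", action))
    print(AUDIT_LOG)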

But ignore that for a moment too.

The real problem with the NERC CIP Standards and the applicable systems they list is that some systems that fall into the EACMS category (plus a number that don't) actually pose a much more significant risk than simple access management: they perform configuration management via "service accounts" with elevated privileges.

So, for example, an Active Directory system is an EACMS, even though it controls not only access but also configurations. Yet there is no requirement in NERC CIP to monitor the configuration changes made during a session, only the access to the system, specifically the failed and successful login attempts.
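
Even that access-monitoring piece is mechanically simple. Here's a minimal sketch, assuming the Windows Security log has been exported to a JSON-lines file (the file name and field names are assumptions for illustration; event IDs 4624 and 4625 are the standard successful and failed logon events):

    import json
    from collections import Counter

    successes, failures = Counter(), Counter()

    # Assumed export format: one JSON object per line with "EventID" and
    # "TargetUserName" fields. Adjust to whatever your log export produces.
    with open("security_log_export.jsonl") as f:
        for line in f:
            record = json.loads(line)
            user = record.get("TargetUserName", "unknown")
            if record.get("EventID") == 4624:      # successful logon
                successes[user] += 1
            elif record.get("EventID") == 4625:    # failed logon
                failures[user] += 1

    for user, count in failures.most_common(10):
        print(f"{user}: {count} failed logons, {successes[user]} successful")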

SCCM, by contrast, is not an EACMS, even though it has an agent installed on Windows devices and an elevated-privilege service account with Domain Administrator-equivalent permissions, simply because it doesn't control or monitor access attempts.

There is no requirement to protect these systems any differently than any other user-accessible system, like perhaps a data portal on a web server. There is no requirement to separate user or system access within an ESP based upon roles or impact levels (another problem with NERC CIP: impact level is based upon the facility's physical impact on the Bulk Electric System, not the Cyber System's impact on operations). Once you're in, you're in; there is no requirement, and not even an explicit security objective, to do more than guard the perimeter. As I've mentioned before, this is the "hard crunchy shell, soft gooey center" model from 20 years ago.

On top of that, there's an exemption to remote access requirements for machine-to-machine communications. A management system located outside the Electronic Security Perimeter isn't even required to have encryption, and has no special requirements above and beyond the simple baseline, change management, and logging requirements applied to any system used to support the BES.

Weird.

Monday, May 1, 2017

P, P, & T

Security is people, processes, and technology. Of these, there's a reason why technology is listed last.

Gadgets simply don't suffice without people driving them in a consistent, smart manner.

I bring this up because of the focus on device-level controls, measures, and impact criteria in NERC CIP. It doesn't get much more technology-oriented than expecting a security solution to be all-in-one on a particular box, rather than built from technical controls dispersed across the network, combined with process controls and people at the helm doing the monitoring.

Friday, April 28, 2017

GAAP... GASP...

Tom Alrich says that, at the RF CIP workshop, Lew Folkerth pointed out that:
"the key to being able to audit non-prescriptive requirements is for the entity to have to demonstrate that the measures they took were effective."

A lesson can be taken here from GAAP: "Generally Accepted Accounting Principles".

Paraphrasing GAAP: There is no absolute, perfect accounting during an audit. There are only sliding scales of better and worse practices. You must depart from [the accepted practice] if following it would lead to a material [mistake]. In the departure you must disclose, if practical, the reasons why compliance with the principle would result in a [mistake].

I know that the electric utility industry doesn't like outside influence, and has a severe "not-invented-here" allergy. But we really do need to move toward "Generally Accepted Security Practices" (GASP, to flippantly coin an acronym.)

So the take-away is that an entity should be able to demonstrate either that the novel approach they took is effective via testing (such as penetration testing), OR that it is widely "accepted" as a security best practice. Making every entity extensively, intrusively pen test every single product, software update, and configuration is counterproductive, and outside the core competency of the entity. Vendors and security firms test these things and make recommendations. Other than due diligence in researching a solution and verifying the provenance and integrity of software or firmware to be installed, an entity shouldn't have any particular obligation to "prove" perfect security of a commercial software offering or hardware device, because it's a distraction from the real issue.
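
That due diligence on provenance and integrity is, at least, cheap to demonstrate. A minimal sketch, assuming the vendor publishes a SHA-256 digest for the firmware or software image (the file name and arguments are placeholders; a real process would also verify a signature over the digest):

    import hashlib
    import sys

    def sha256_of(path):
        """Compute the SHA-256 digest of a file, reading it in 1 MB chunks."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    # Usage (placeholders): python verify.py firmware.bin <vendor-published-digest>
    if __name__ == "__main__":
        image, published = sys.argv[1], sys.argv[2].strip().lower()
        actual = sha256_of(image)
        if actual == published:
            print("OK: digest matches the vendor-published value")
        else:
            print(f"MISMATCH: computed {actual}")
            sys.exit(1)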

The real issue is providing overlapping defenses in depth.

Thursday, April 27, 2017

Where does "Resilience" rank in NERC CIP?

The NERC CIP discussion has an oddly blank spot when it comes to Reliability. Reliability seems to mean "always available with no outages" to most people.

I'll agree, that's a good goal for the Bulk Electric System. It's just not a very achievable goal for every single Cyber Asset in the BES.

In the world of Disaster Preparedness/Disaster Response, IT security, Business Continuity Planning, and the remainder of the Critical Infrastructure sectors, the main conversation is around Resilience, not perfection. 

We know bad guys will win sometimes. We know mistakes will be made sometimes. It's important to be able to recover quickly, with minimal residual damage, when these things happen. We have to look at Risk Management in terms of five strategies:


  • Avoidance
  • Transference
  • Reduction
  • Mitigation
  • Acceptance


Avoiding risk is difficult when you're a large, stationary, tempting target. Transference is not practical for Critical Infrastructure: there's nobody outside the system to insure you against damage.

So we try to reduce risk by reducing or eliminating vulnerabilities. We can't do anything about reducing threats: many of our threats are either criminal elements or nation-state actors, and those are for law enforcement and military force to deter. We're stuck trying to close off attack vectors (vulnerabilities to particular types of threats). We try to reduce the probability of exploit by narrowing the windows of opportunity for threat actors to exploit those vulnerabilities, with things like timely updates of security patches and malware signatures, and periodic tests of security mechanisms and logging. The problem is that we can do a good job of whack-a-mole and still fail to a zero-day exploit (a combination of a threat and a vulnerability) that either didn't exist yesterday or was unknown to us.
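
One way to make "narrowing the window" concrete is simply to measure it. A rough sketch (the patch identifiers and dates below are invented for illustration) of the exposure window between a security patch's release and its deployment:

    from datetime import date

    # Invented records for illustration: when a security patch was released
    # by the vendor versus when it was actually deployed.
    patches = [
        {"id": "PATCH-EXAMPLE-1", "released": date(2017, 3, 14), "deployed": date(2017, 4, 2)},
        {"id": "PATCH-EXAMPLE-2", "released": date(2017, 4, 11), "deployed": date(2017, 4, 20)},
    ]

    windows = [(p["id"], (p["deployed"] - p["released"]).days) for p in patches]

    for patch_id, days in windows:
        print(f"{patch_id}: window of opportunity open for {days} days")

    print(f"average window: {sum(d for _, d in windows) / len(windows):.1f} days")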

FERC has told Responsible Entities that we're not allowed to accept risk on our own, because our entanglement in the Bulk Electric System means we'd be accepting risk for the whole system, not just ourselves. The Regions that audit us have permission to accept risk on our behalf, but no incentive to do so: they bear none of the cost of mitigating those risks, and would be exposed to criticism and worse if it came to light that a risk deemed "acceptable" had been exploited.

So, mitigation. It sort of overlaps with risk reduction in common conversation; people often use the term mitigation to mean reducing vulnerabilities. That isn't really precise from a Risk Manager's perspective: reducing vulnerabilities is risk reduction. Mitigation is about controlling the amount of damage that can be done if an exploit is successful.

    One way of limiting the damage is having a means to quickly restore vital capabilities. Another way is to have a means of rapidly addressing vulnerabilities once they are exploited. And we can also spread our eggs out into different baskets so any particular attack can only get to some of them. Anything that limits the scope, impact, or duration of damage is a mitigation strategy.

    Mitigation plans are the heart of Resilience. When Reliability efforts reach the point of diminishing returns, we need to start talking about contingencies, and that means Resilience. How many times does this concept appear in the NERC CIP standards? Without losing the emphasis on risk reduction, we need to start including resilience strategies in our planning.

    Wednesday, April 26, 2017

    Rapidly evolving threats & slow-moving regulatory standards

    Over at the Anfield Group Blog, Chris Humphreys posts: 

The DOE’s Quadrennial Energy Review Report states that:
“The current cybersecurity landscape is characterized by rapidly evolving threats and vulnerabilities juxtaposed against the slower-moving prioritization and deployment of defense measures.” I lump regulatory standards and requirements into the “slower-moving prioritization and deployment of defense measures” as one of the key components to preventing a truly proactive stance on cybersecurity. Additional focus on recovery and resiliency needs to be a foundational element of any cybersecurity program because the idea that an organization can combat against 100% of cyber intrusions is false. What becomes critical is the recovery of the system if/when a successful cyberattack occurs.


I couldn't agree more. We will never eliminate all risk, so it behooves us to have a backup plan: resilient recovery strategies. NERC CIP's specific language around redundancy doesn't dismiss the importance of redundancy, but a lot of NERC CIP compliance folks do. The language says one cannot exclude a Cyber Asset from the scope of CIP simply because the system is redundant. Fair enough; redundancy doesn't protect against software vulnerabilities, malware, or misconfiguration. But too many people seem to think this means redundancy doesn't matter, and in fact there doesn't appear to be any requirement to have redundancy for Cyber Assets at all.

Something that NERC CIP doesn't do well: make clear that assessing technical controls for high availability at the system level, rather than at the device level, can provide a more accurate perspective on real cyber security. That high availability is achieved through redundancy of underlying infrastructure (perhaps switching and virtual network systems, or hypervisor infrastructure) that has little or nothing to do with BES functions, BES Information, and so on. Building resiliency in and eliminating reliance upon single devices (or, as I like to call them, "single points of failure") is a key part of virtualization's benefit.
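
The arithmetic behind that point is simple. A minimal sketch (the availability figures are invented for illustration, and component failures are assumed to be independent) of how redundancy at the infrastructure level raises availability at the system level:

    from math import prod

    def redundant_availability(component_availabilities):
        """Probability that at least one of several redundant components is up,
        assuming independent failures: A = 1 - product(1 - a_i)."""
        return 1 - prod(1 - a for a in component_availabilities)

    # Invented figures: a single host at 99% availability is down roughly
    # 3.65 days a year; three redundant hosts get you to about "six nines".
    single_host = 0.99
    clustered = redundant_availability([0.99, 0.99, 0.99])

    print(f"single device:  {single_host:.6f}")
    print(f"3-node cluster: {clustered:.6f}")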

The entire mindset behind, and promoted by, the NERC Glossary and the definition of BES Cyber Asset is to blame for this gap. Add to that the prescriptive requirements, the device-centric example measures, and the device-oriented Severity Level tables, and you get a self-reinforcing echo chamber about how to achieve reliability, one that makes it difficult to look outside the way it has always been done.

    Monday, April 24, 2017

    A Tale of Two Viewpoints: Mixed Trust vs Shared Infrastructure



In NERC CIP Standards drafting efforts, industry chatter, and auditing, there has been quite a bit of talk about “mixed trust”, meaning an environment that has both BES Cyber Systems and Cyber Assets not subject to the CIP standards. Non-CIP Cyber Assets may be systems that are under the Responsible Entity’s control, and may even be providing functions related to Grid Operation, but not functions which, “if rendered unavailable, degraded, or misused” for 15 minutes, would affect the reliability of the Bulk Electric System. For example, corporate business systems are not BES Cyber Assets, even though they are under the control of the same Responsible Entity as the BES Cyber Systems.


Here’s the thing: “mixed trust” is not the right term. Mixed trust would imply that Cyber Assets of different trust levels can access each other in an uncontrolled fashion. Nobody is proposing a relaxation of security controls between CIP and non-CIP assets. What is being proposed is “shared infrastructure”. At some level we all have shared infrastructure: the same building, the same power, the same Internet connection. Shared infrastructure doesn’t mean “mixed trust”. Shared infrastructure can have logical controls and isolation involved.


A few days back, I wrote about the Streetlight Effect in relation to Lew Folkerth's "Lighthouse" article in the March-April issue of Reliability First's newsletter. In it he talks about “Zones of Authority” in relation to the audit process for NERC compliance. At the end of the article he asserts that virtualization is a bad idea, not because of actual security concerns, but because of the auditor’s inability to look at things outside of the designated Electronic Security Perimeter required for BES Cyber Assets (a compliance concern). While I respect Lew's experience and appreciate the viewpoint, I disagree with that approach pretty strenuously.


Tom Alrich has another perspective on the article. He makes a point about security being enhanced if the RE’s entire network were in scope for NERC CIP compliance audits. I don’t think I can agree with that either (although he does acknowledge that the average compliance specialist would rather repeatedly hit themselves in the head with a hammer than take this approach, because of the burden of paperwork involved, so it doesn’t appear to be a serious suggestion). Such an approach would only be a net gain if compliance evidence production didn’t overwhelm the effort to secure things in the first place, and if all of the compliance requirements were strictly security requirements and not just designed to make audits easier.


But realistically, bringing security mechanisms that exist outside the ESP (Electronic Security Perimeter) into the scope of auditing doesn’t require making the entire corporate network subject to inspection by the Regions. Here's the problem with CIP-005 and the ESP: most NERC CIP standards require you to have a program or process to accomplish X; CIP-005, on the other hand, requires everything peripherally related to BES Cyber Systems to reside inside the ESP. What we have here is the "hard crunchy shell" concept from 20 years ago. It provides “bright line” criteria for audits and for categorizing assets as either in scope or out, because the auditor can require a diagram with the asset shown inside a neatly drawn “dotted red line” (inside joke for NERC CIP compliance specialists), but it doesn’t provide good security on its own.

Some of this difficulty is definitional. A Protected Cyber Asset (PCA) is any Cyber Asset "associated" with a BES Cyber System and can exist "in or on" an ESP, but this is only made specific in the NERC Glossary. A BES Cyber Asset must be contained in an ESP (CIP-005). An Intermediate System must reside outside the ESP. An EACMS can be in or out.


    Documenting a mechanism that provides a security function doesn’t require that everything else on that network be in scope for auditing. Authentication, Authorization, and Accounting (AAA) may be partly provided by an RSA Token system. That AAA system could reside inside or outside of the security zone where clients of the system exist. The documented controls for CIP AAA could bring that RSA server into scope without touching other, unrelated non-CIP servers on its network segment.


    The key to providing good security for BES Cyber Assets doesn’t rely solely upon whether a device is inside a particular perimeter. It depends upon the controls applied to accessing that device. A layered defense of the device includes boundary identification & control, but it doesn’t stop at the outer boundary.

    Let's address a couple specific points:


    "Lew says (or implies) that auditors are having differences of opinion with entities on the security of mixed-trust switches. It seems these entities have switches that implement both ESP and non-ESP VLANs. When the auditors tell them this isn’t secure, the entities point out that the non-ESP VLANs have just as good security as the ESPs do. So why aren’t they safe?"


Let's examine a scenario with two VLANs, one named "CIP" and one named "non-CIP". The main point really isn't that the security of "non-CIP" is as good as that of "CIP". The most important thing is that network isolation is maintained between them: the configuration of devices in one VLAN is irrelevant to devices in the other VLAN. If it's just a Layer 2 VLAN, there is no "mixed trust", because the traffic from "non-CIP" doesn't mix with that of "CIP". If it's a Layer 3 switch and routing takes place between the two VLANs, then an EAP must be identified (probably the VLAN's logical or virtual interface) and inbound/outbound access controls applied at that point. If it is a hypervisor situation, the same principle applies: vSwitch "non-CIP" doesn't have a traffic path that includes vSwitch "CIP", and the guests cannot communicate with each other. If you create an EAP between them, then you apply the inbound and outbound access controls at that point.
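
To make the EAP piece concrete, here's a minimal sketch of inbound/outbound rule evaluation at a routed interface between the two VLANs. The addresses, ports, and rules are invented for illustration; this is the concept, not any vendor's configuration syntax:

    from ipaddress import ip_address, ip_network

    # Invented example rules for an EAP between a "CIP" VLAN (10.10.0.0/24)
    # and a "non-CIP" VLAN (10.20.0.0/24). Direction is relative to "CIP".
    RULES = [
        ("inbound",  ip_network("10.20.0.0/24"), ip_network("10.10.0.0/24"), 443, "permit"),
        ("outbound", ip_network("10.10.0.0/24"), ip_network("10.20.0.53/32"), 514, "permit"),
    ]
    DEFAULT_ACTION = "deny"  # deny by default in both directions

    def evaluate(direction, src, dst, port):
        """Return the first matching rule's action, or the default deny."""
        for rule_dir, src_net, dst_net, rule_port, action in RULES:
            if (rule_dir == direction
                    and ip_address(src) in src_net
                    and ip_address(dst) in dst_net
                    and port == rule_port):
                return action
        return DEFAULT_ACTION

    print(evaluate("inbound", "10.20.0.15", "10.10.0.5", 443))   # permit
    print(evaluate("inbound", "10.20.0.15", "10.10.0.5", 3389))  # deny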

And lest we forget, a Layer 2 VLAN is not the only way to have a shared switch infrastructure. Software Defined Networks and network overlays exist on shared hardware infrastructure, but provide network isolation between the security zones as defined in configuration. This isolation, like that of the VLAN example above, is provided as a baseline function of the device, implemented in the control plane by the code base. Access to modify the configuration and code base is confined to a management plane interface. Traffic transiting the device doesn't have access to this function; only administrators accessing the device in the management plane do.
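
A toy model of that separation (the class and method names are invented purely for illustration): transit traffic is handled by a forwarding function that can only read the configuration, while only an authenticated management-plane session can change it:

    class SharedSwitch:
        """Toy model of data-plane/management-plane separation; names are invented."""

        def __init__(self):
            self._config = {"vlan_isolation": True}
            self._mgmt_sessions = set()

        def forward(self, packet):
            # Data plane: consults the config read-only, never modifies it.
            if self._config["vlan_isolation"] and packet["src_vlan"] != packet["dst_vlan"]:
                return "dropped"
            return "forwarded"

        def open_mgmt_session(self, operator, authenticated):
            # Management plane: only authenticated operators get a session.
            if authenticated:
                self._mgmt_sessions.add(operator)

        def set_config(self, operator, key, value):
            # Configuration changes require an established management session.
            if operator not in self._mgmt_sessions:
                return False
            self._config[key] = value
            return True

    sw = SharedSwitch()
    print(sw.forward({"src_vlan": "non-CIP", "dst_vlan": "CIP"}))  # dropped
    print(sw.set_config("anonymous", "vlan_isolation", False))     # False: no session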


    "Anything else, such as a VLAN that isn’t an ESP, is completely out of his or her purview; the auditor just has to assume these are completely insecure, and thus shouldn’t be found on the same switch as ESP VLANs."


Auditors aren't forced to assume anything. The configuration of the switch is in scope, because it provides isolation to the "CIP" VLAN. (It doesn't provide an ESP under the current model, because it doesn't provide an EAP with inbound/outbound ACLs; it simply provides isolation, which is one flaw in CIP-005's prescriptive approach. CIP-005 would benefit from being changed to a security objective of isolating more critical assets and traffic from less critical.) The configuration of devices in the "non-CIP" VLAN is out of scope, and if something is out of scope, auditors are not required to render an opinion on it.


    "the auditors can’t look at what isn’t in an ESP, which limits the kinds of evidence an entity can show them. "


Currently, auditors can examine EACMS that may exist outside the ESP. They can examine Intermediate Systems, which are required to be outside the ESP. So I don't think this is a valid assertion; it's an auditing-approach preference with deliberate blinders on.


    "most entities, if told they could force CIP auditors to consider security controls they have implemented for non-ESP networks, but only in return for having at least some of the cyber assets on those non-ESP networks fall into scope for CIP, would say “Thanks but no thanks. We’ll leave things as they are.”


Perhaps the entity might choose that. But the issue isn't "security controls implemented for non-ESP networks"; it is "security controls that exist outside the ESP that are implemented to protect CIP Assets". The ESP is a fundamentally limited construct, and must be considered fatally flawed when used without other layered defenses.