Friday, August 18, 2017

Virtualization in the CIP Environment Drives Discussion of Applicable Systems Classifications

Here's a draft of a Whitepaper I intend to submit to NERC.

Alt Title:

More Problems With EACMS

Summary:

One of the topics entwined in the NERC CIP virtualization discussion is the risk posed by virtual management consoles. This risk stems from consolidated interfaces and automation, where access to a management console can change or delete entire infrastructures, including virtual servers, networks, and storage. A shorthand phrase has been coined for this: the “Fewer, Bigger Buttons” problem.

Because this valid concern is not really addressed at all in NERC CIP, the scope of discussion quickly grew to cover similar Centralized Management Systems (CMS) in the physical arena as well, and from there to all kinds of systems that pose, or appear to pose, similar risks.

Statement of the Issues:

The 2016-02 SDT has been discussing several options publicly. One option is to classify CMS (a heretofore undefined term in NERC CIP, and not a standard industry term in cyber security either) as applicable systems and apply specific security requirements to them.

Another option the SDT has discussed is folding CMS into the existing EACMS definition. This is, more or less by default, the approach taken to date with legacy enterprise automation tools. This capability and risk have existed for a long time, largely unacknowledged in the CIP standards, and are not confined to virtualized environments. HP OpenView, IBM Tivoli, SolarWinds Orion, and CiscoWorks (to name just a few enterprise automation tools) have had the ability to affect the entire enterprise all at once for literally decades now.

The same is true of anti-malware and software patching systems. If a system in my network operates via a system account with administrative privileges that allow it to modify the configuration of a BCS, isn't that tool both a target and a potential attack vector? And if such a system inside my network is an EACMS, then aren't Microsoft's update sites, Ubuntu's package repositories, and Symantec's or McAfee's antivirus signature update sites on the Internet as much EACMS as any SCCM or antivirus server inside my network?

However, folding CMS into EACMS perpetuates and exacerbates a problem: EACMS has become a "catch-all" category of CIP-related Cyber Assets with one-size-fits-all requirements, regardless of the degree of risk posed by, or the technical constraints of, a particular system.

An example is the Intermediate System (IS). To make the IS subject to CIP requirements, it has to be categorized somehow as a type of applicable system. To function as an intermediary in practical terms (and by definition, per the NERC Glossary), it has to be outside the ESP. The IS has apparently been categorized as an EACMS simply because that is the only category currently available that allows applicable systems outside the ESP to be subject to CIP requirements.

Thus, the otherwise-unconnected phrase “This includes Intermediate Systems” was tacked onto the end of the EACMS definition. It is notable that no other examples had previously been given.

The Definition of Intermediate Systems:

“A Cyber Asset or collection of Cyber Assets performing access control to restrict Interactive Remote Access to only authorized users. The Intermediate System must not be located inside the Electronic Security Perimeter.”

Here we see a presumably auditable CIP requirement set in the definition of Intermediate System rather than in a table of requirements or in a security objective. We see a cyber security function (authentication and authorization) defined as a specific type of applicable device. And we see the security benefit of that function truncated to apply only to users who interactively access Cyber Assets inside the ESP from outside it, rather than extending the benefit of robust authentication, authorization, and accounting to all remote access.

Another example of the problem with the one-size-fits-all approach to EACMS compliance requirements is what is known as the “hall of mirrors” effect. Specifically, there may be some types of cyber security systems that should be required to sit behind a firewall. However, that requirement can’t apply to all EACMS without defining a new category, because a firewall is itself an EACMS. Without a new category, every EACMS would need to be inside an ESP and protected by another EACMS, which creates a recursive "hall of mirrors" effect without end.
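To make the recursion concrete, here is a toy model in Python. It is only a sketch under stated assumptions: the function name `protection_chain` and the category sets are hypothetical, and real requirements are far more nuanced than a single rule. It applies the rule "a system in a listed category must sit behind a firewall" repeatedly and returns `None` if firewalls are still being added after a fixed depth.

```python
from typing import List, Optional, Set


def protection_chain(
    start_category: str,
    must_be_firewalled: Set[str],
    firewall_category: str,
    max_depth: int = 10,
) -> Optional[List[str]]:
    """Apply 'a system in a listed category must sit behind a firewall' repeatedly.

    Returns the chain of categories, starting with the protected system, or
    None if firewalls are still being added after max_depth steps.
    """
    chain = [start_category]
    while chain[-1] in must_be_firewalled:
        if len(chain) > max_depth:
            return None  # each new firewall triggers the rule again: no end
        chain.append(firewall_category)  # put a firewall in front of the last system
    return chain


# Today: the rule covers all EACMS, and a firewall is itself an EACMS.
print(protection_chain("EACMS", {"EACMS"}, firewall_category="EACMS"))  # prints None

# Hypothetical split: the rule covers CMS only, and the firewall is a gateway (EAG).
print(protection_chain("CMS", {"CMS"}, firewall_category="EAG"))  # prints ['CMS', 'EAG']
```

In the first call every new firewall falls under the same rule, so the chain never closes; in the second, giving the firewall its own category (like the EAG category proposed below) ends the chain after one step.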

In addition to the catch-all and recursive problems I've just noted, there is also a missing component: risk-based assessment and mitigation. For example, a system that only monitors and logs access (such as Splunk or Tripwire) does not pose the same level of risk as a management console for a large virtualized Control Center infrastructure. The technical controls needed to mitigate the risk may also differ. A SIEM system presents a risk of leaking BES Cyber System Information; an electronic access control (AAA) system presents a risk of unauthorized access to, or modification of, a BES Cyber System’s operational parameters; and a Centralized Management Console (physical or virtual) presents an infrastructure reliability risk.

Risk-based assessment and mitigation would help here. It would acknowledge that a number of systems both monitor access and provide part of the access control solution without actually controlling traffic at the point of entry. These devices or systems may or may not benefit from being inside a protected boundary, or they may form part of the strategy that protects BES Cyber Assets. The technical means of implementing some multi-part systems may require that components sit outside the ESP or span it.

All of this appears to be an unfortunate artifact of the single-level security mindset inherent in the ESP approach (hardened perimeter defense) and the “All-In” nature of CIP Applicable Systems. Part of the problem (applicability creeping to increasingly peripheral systems) stems from using the ESP to define scope of applicability, rather than using risk to BES Cyber Systems and security objectives to define the scope of necessary controls. FERC does not allow a Registered Entity to assess and accept its own risks, due to the interconnected nature of risk to the BES. At the same time, NERC and the Regions have zero incentive to accept risks on behalf of entities: they bear none of the cost of mitigation, and would receive the lion's share of criticism in the event of a reliability failure or security breach. It is very difficult to create standards that are effective, comprehensible, and inclusive of different technical capabilities based on the existing definition of EACMS, or even on new definitions structured with the same ESP-as-hardened-perimeter mindset. The ESP is easy to visualize, and drawing a “red dotted line” around assets needing protection is convenient. It’s simply not sufficient. In contrast, a modern defense-in-depth approach based on systems rather than devices doesn’t require torturing definitions.

With these examples in mind, the issues caused by having EACMS as a “catch all” category are obvious. Some applicable systems were defined into existence purely for compliance purposes; the resulting standards are incompatible with broader cyber security best practices, and they apply one-size-fits-all requirements to the whole artificial group.

Recommendations:

So what do we do about it?
  1. The audit process needs to embrace an approach more concerned with meeting an objective than with satisfying a performance requirement. Then it doesn’t matter where something resides, as long as the applied combination of protections achieves the objective, and it doesn’t matter much what provides the control, as long as the assets needing protection receive it. For example, the implied security objective of CIP-005 is not to “have inbound and outbound rules”. That is one limited method of achieving the real objective, which is to protect the BCS from unauthorized and potentially malicious traffic. If you have a better method, you should be allowed to use it.
  2. It might be necessary to break up the EACMS category of applicable systems into discrete functions (bullet points below), so that the appropriate security objective and requirements for each can be derived and applied whether the systems in question are physical or virtual.
  • SIEM - Security Incident & Event Monitoring systems.
This would subtract the “M” for “Monitoring” from EACMS. These are systems that strictly monitor and collect information about the ESP and BES Cyber System electronic communications or status, but do not control access. A great deal of literature, discussion, guidance, and best practice has been published across a broad range of industries on how to implement SIEM securely. The risk presented by compromise of these systems revolves around the information they contain (such as configurations and event logs). The crux of the concept is that the protections already defined for BES Cyber System Information (BCSI) are adequate and effective for this information, and it is the information that needs protecting, not necessarily the SIEM system.


A rather large issue with EACMS is CIP-004's application of most personnel-oriented controls (training, background checks, etc.) to anyone with potential access to an EACMS. This kills sharing of any service whatsoever (AAA, SIEM, etc.), because anyone with any form of “access” to an EACMS gets sucked into CIP-004. Have an account on an EACMS AAA server but NO access to any BCS? Too bad: you still must have a CIP background check and be trained on the CIP program (beyond your ‘need to know’). It’s a symptom of the “all-in” issue.

Splitting these monitoring systems out and adjusting the requirements may allow entities to more easily use outsourced managed security service providers or global/enterprise-wide SIEM systems, and to correlate event information from their CIP operational environments with that from their non-CIP environments, providing increased security and reliability benefits. The concern is that under current standards, the CIP program and device-level CIP audits might be deemed to extend to Cyber Assets in the wider enterprise network or at the service provider that do not actually affect reliable operation of the BES, which could dilute attention from BES Cyber Security functions in favor of paperwork exercises.
  • EACS - Electronic Access Control System. The fundamental defining characteristic of an Electronic Access Control System is that it performs authorization of traffic or users. This is the gatekeeper function: the classic Authentication and Authorization functions of standard AAA.

In many cases these systems do not perform any active filtering of traffic passing through a particular interface. The primary duty of an EACS is to authenticate and authorize. Additional components of electronic access security strategies are accounting (logging) systems and gateways that actually pass or drop traffic. Compared with the SIEM discussion above, EACS and EAG move beyond the risk of unauthorized access to meta-information about an environment, to the risk of unauthorized access to and modification of operational parameters of the actual BES Cyber Systems.

In contrast, an application-level IPS could theoretically send a control signal from anywhere in the enterprise to anywhere in the enterprise, telling a specific host not to respond to a given type of packet, with a certain payload, from a particular address. It could do this dynamically, based on heuristics rather than signatures, yet this outstanding security solution would not be a compliant solution under our current regime.

Or a network-level intrusion prevention system combined with dynamic firewall rules may send a control message to a firewall instructing it to change an Access Control List (ACL) in response to traffic patterns indicating a threat. The IPS does not itself filter traffic in this scenario, but it is involved in controlling access. Similarly, an Active Directory server may lock out a user account after hours or after a number of incorrect password attempts. It is involved in controlling access (authentication and authorization), but it is not blocking traffic at layer 3, and from the current NERC CIP ESP perspective it is therefore irrelevant to boundary protection. All boundary protection requires access control, but not all access control is boundary protection.
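A minimal sketch of that division of labor, assuming hypothetical `Firewall` and `IntrusionPreventionSystem` classes (not any vendor's API) and a simple count-based trigger standing in for real heuristics. The firewall alone passes or drops traffic; the IPS only sends a control message that inserts a deny rule.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Firewall:
    """Hypothetical gateway: the only component that passes or drops traffic."""
    acl: List[Tuple[str, str]] = field(default_factory=list)  # (action, source)

    def insert_rule(self, action: str, source: str) -> None:
        # Dynamic rules go to the top so they take precedence (first match wins).
        self.acl.insert(0, (action, source))

    def decide(self, source: str) -> str:
        for action, rule_source in self.acl:
            if rule_source == source:
                return action
        return "deny"  # default deny for anything not explicitly permitted


@dataclass
class IntrusionPreventionSystem:
    """Hypothetical IPS: watches traffic but never filters it itself."""
    firewall: Firewall
    threshold: int = 3  # stand-in for real heuristics
    suspicious: Counter = field(default_factory=Counter)

    def observe(self, source: str, looks_malicious: bool) -> None:
        if not looks_malicious:
            return
        self.suspicious[source] += 1
        if self.suspicious[source] == self.threshold:
            # A control message to the firewall is its only effect on access.
            self.firewall.insert_rule("deny", source)


# Hypothetical addresses for illustration only.
fw = Firewall(acl=[("permit", "10.0.0.5"), ("permit", "10.0.0.9")])
ips = IntrusionPreventionSystem(firewall=fw)
for _ in range(3):
    ips.observe("10.0.0.9", looks_malicious=True)
print(fw.decide("10.0.0.9"), fw.decide("10.0.0.5"))  # prints: deny permit
```

The IPS never sits in the traffic path, yet after three suspicious events it has changed who gets through, which is exactly the kind of access control that a perimeter-only classification overlooks.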

A useful metaphor for access control systems that do not reside in or on an ESP/ESZ is a pair of military units with interlocking fields of fire, supporting each other against frontal assaults and flanking movements. Their defense relies on being positioned to assist one another, not on “being inside a fence”. Likewise, Electronic Access Control Systems (AAA) can work to protect Cyber Assets inside the ESZ from anywhere.
  • EAG - Electronic Access Gateway. The fundamental defining characteristics of an EAG are that it hosts the EAP and performs the active function of filtering or forwarding traffic at the demarcation point (boundary protection). It is primarily firewalls and routers that perform gateway functions at the layer 3 ESP boundary. Virtual firewalls and virtual routers inside a hypervisor perform the same function in the same manner. However, hypervisors themselves may not be EAGs if they are not configured with a virtual firewall or virtual router that provides a gateway function.
Modern security methods typically employ a defense-in-depth strategy using distributed AAA (Authentication, Authorization, and Accounting) systems to authenticate and authorize access to the Electronic Security Zone based on characteristics of the user and traffic, while the EAG subscribes to the AAA service for user permissions and filters (permits or denies) traffic based on source, destination, and port or protocol. Further, electronic access control strategies often employ multiple devices, each containing part of the AAA solution, so that compromise of one element of AAA does not cause the entire system to fail. Vendor and platform diversity within a defense-in-depth, systems-based approach to access control is generally one element of securing the entire system against vulnerabilities common to specific classes of devices (e.g. all-Windows or all-Linux environments may share configuration or malware vulnerabilities). The Accounting (logging) function is often used to detect (and formulate a strategy to correct) any failures in Authentication and Authorization.
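The same pattern in miniature: the gateway applies a network-layer source check, but delegates authentication and authorization to a separate AAA service and reports every decision back for accounting. All class and method names here are hypothetical, the credentials are stored in plain text purely to keep the example short, and a real EAG would filter on far richer criteria.

```python
import hmac
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Request:
    user: str
    source: str
    destination: str
    port: int


class AAAService:
    """Hypothetical AAA service that one or more gateways subscribe to."""

    def __init__(self, secrets: Dict[str, str], grants: Dict[str, Set[Tuple[str, int]]]):
        self._secrets = secrets  # user -> secret (plain text only for brevity)
        self._grants = grants  # user -> allowed (destination, port) pairs
        self.log: List[Tuple[Request, str]] = []  # accounting records

    def authenticate(self, user: str, secret: str) -> bool:
        stored = self._secrets.get(user)
        # compare_digest avoids leaking information through comparison timing
        return stored is not None and hmac.compare_digest(stored.encode(), secret.encode())

    def authorize(self, user: str, destination: str, port: int) -> bool:
        return (destination, port) in self._grants.get(user, set())

    def account(self, request: Request, decision: str) -> None:
        self.log.append((request, decision))


class AccessGateway:
    """Hypothetical EAG: enforces decisions but does not own user permissions."""

    def __init__(self, aaa: AAAService, allowed_sources: Set[str]):
        self.aaa = aaa
        self.allowed_sources = allowed_sources  # network-layer source filter

    def handle(self, request: Request, secret: str) -> bool:
        permitted = (
            request.source in self.allowed_sources
            and self.aaa.authenticate(request.user, secret)
            and self.aaa.authorize(request.user, request.destination, request.port)
        )
        self.aaa.account(request, "permit" if permitted else "deny")
        return permitted


# Hypothetical user, secret, and addresses.
aaa = AAAService({"operator1": "correct horse"}, {"operator1": {("10.1.1.20", 22)}})
gateway = AccessGateway(aaa, allowed_sources={"192.168.50.10"})
print(gateway.handle(Request("operator1", "192.168.50.10", "10.1.1.20", 22), "correct horse"))
print(gateway.handle(Request("operator1", "192.168.50.10", "10.1.1.20", 3389), "correct horse"))
```

Because permissions live in the AAA service rather than in each gateway, several gateways can share one set of grants, and the accounting log records denials as well as permits for later review.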

Although some EAG devices can perform various levels of authentication and authorization of traffic, many are not capable of complete AAA solutions in themselves, and therefore differ enough from EACS to warrant different technical control measures. Requirements that acknowledge this difference and allow the two to be handled differently will prevent the "hall of mirrors" effect described above.

For discussion purposes, the EAG acts somewhat like the legacy ESP: a logical demarcation point delineating PCA and BCS from non-CIP-applicable Cyber Assets. It may even be useful to replace the ESP completely, using ESZs with EAGs as demarcation points.
  • CMS - Centralized Management System. A system that uses an elevated-privilege account, either on behalf of an interactive user or in an automated fashion, to allow mass modification of BES Cyber Systems.
As discussed, these systems are the ones driving the SDT's apparent thought process. The risk they pose is not just unauthorized access to information or to BES Cyber Systems, but the ability to modify or destroy the BCS themselves or the infrastructure they rely on. These will need unique requirements over and above the others. It would obviously not be beneficial to create a mere reclassification and documentation exercise for entities that would see no meaningful security benefit.

Proposed definitions:

AAA: Authentication, Authorization and Accounting systems are Cyber Systems that control ‘Gatekeeper’ functions (electronic access methods and permissions, e.g. authentication of users or control messages that alter dynamic ACLs) for Cyber Systems.
CMS: Centralized Management Systems are Cyber Systems that perform automated management tasks and mass configuration of BCS, whether scheduled or on demand, using a dedicated service account credential.
EACMS: Deprecated.
EAG: An Electronic Access Gateway is a Cyber Asset that performs active electronic traffic control (filtering and/or forwarding) for ESZ boundary protection based upon the criteria given to it.
EAP: An Electronic Access Point is the logical interface on an EAG where traffic filtering operations take place. 
ESZ: An Electronic Security Zone is the logical container providing separation or isolation from threats or attack vectors to grouped Cyber Assets, said Cyber Assets being characterized by similar operational criticality, or sensitivity to compromised data confidentiality and integrity, as well as needing similar access controls, audit logging and/or monitoring requirements.
SIEM: A Security Incident & Event Monitoring system is a Cyber System that performs electronic monitoring of Electronic Security Zone(s) or Cyber Systems.
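One way to picture the proposal is as a lookup from each function to the risk it primarily carries, paraphrasing the risks described earlier (information leakage for SIEM, unauthorized access or modification for AAA and EAG, infrastructure reliability for CMS). The rule that a system performing several functions is assessed against each of them is my assumption, not part of the definitions above, and the inventory entries are illustrative readings, not an official classification.

```python
from enum import Enum
from typing import Dict, FrozenSet, List


class Function(Enum):
    SIEM = "monitoring"
    AAA = "authentication, authorization, accounting"
    EAG = "boundary filtering and forwarding"
    CMS = "centralized management"


# Primary risk of each function (a paraphrase of the risk discussion above).
PRIMARY_RISK: Dict[Function, str] = {
    Function.SIEM: "leakage of BES Cyber System Information",
    Function.AAA: "unauthorized access to or modification of BCS operational parameters",
    Function.EAG: "unauthorized traffic reaching BCS across the ESZ boundary",
    Function.CMS: "mass modification or destruction of infrastructure (reliability)",
}


def risks_for(functions: FrozenSet[Function]) -> List[str]:
    """Assumed rule: a system is assessed against every function it performs."""
    return [PRIMARY_RISK[f] for f in Function if f in functions]


# Illustrative inventory; these assignments are one reading, not an official classification.
inventory: Dict[str, FrozenSet[Function]] = {
    "Splunk": frozenset({Function.SIEM}),
    "Active Directory": frozenset({Function.AAA, Function.CMS}),
    "SCCM": frozenset({Function.CMS}),
    "ESP firewall": frozenset({Function.EAG}),
}

for name, functions in inventory.items():
    print(f"{name}: {'; '.join(risks_for(functions))}")
```

Under this reading, Active Directory picks up both AAA and CMS concerns because it controls configurations as well as access, while a log collector picks up only the information-protection concern.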

Friday, May 5, 2017

The problem with EACMS

The NERC CIP Glossary is foundational to the (let us not forget, "mandatory and enforceable") CIP Standards.

One of the terms defined there is Electronic Access Control or Monitoring System (EACMS):

"Cyber Assets that perform electronic access control or electronic access monitoring of the Electronic Security Perimeter(s) or BES Cyber Systems. This includes Intermediate Systems."

Disregard the last sentence for a moment. There are a few examples of throw-away statements like this added in NERC CIP for convenience, or exceptions where no one could think of wording that would be universally applicable.

Focus on the meat of the definition: it's an access control and access monitoring system. Outside of the NERC CIP Standards, this is generally known as AAA: Authentication, Authorization, and Accounting, which captures the steps involved in granting and monitoring access much better than the EACMS definition does. There is also tons of guidance and information on implementing AAA in the broader IT security realm.

But ignore that for a moment too.

The real problem with the NERC CIP Standards and the applicable systems they list is that some systems that fall into the EACMS category (plus a number that don't) pose a much more significant risk than simple access management. They actually perform configuration management via "service accounts" with elevated privileges.

So, for example, an Active Directory system is an EACMS, even though it controls not only access but also configurations. Yet there is no requirement in NERC CIP to monitor the configuration changes made during a session, only access to the system: specifically, failed and successful login attempts.

SCCM is not specifically an EACMS, even though it has an agent installed on Windows devices, and has an elevated privilege service account with Domain Administrator equivalent permissions. But it doesn't control or monitor access attempts.

There is no requirement to protect these systems any differently than any other user-accessible system, like perhaps a data portal on a web server. There is no requirement to separate user or system access within an ESP based on roles or impact levels (another problem with NERC CIP: impact level is based on the facility's physical impact on the Bulk Electric System, not the Cyber System's impact on operations). Once you're in, you're in, and there is no requirement, and not even an explicit security objective, to do more than guard the perimeter. As I've mentioned before, this is the "hard crunchy shell, soft gooey center" model from 20 years ago.

On top of that, there's an exemption from remote access requirements for machine-to-machine communications. A management system located outside the Electronic Security Perimeter isn't even required to use encryption, and has no special requirements beyond the simple baseline, change management, and logging requirements applied to any system used to support the BES.

Weird.

Monday, May 1, 2017

P, P, & T

Security is people, processes, and technology. Of these, there's a reason why technology is listed last.

Gadgets simply don't suffice without people driving them in a consistent, smart manner.

I bring this up because of the focus on device-level controls, measures, and impact criteria in NERC CIP. It doesn't get much more technology-oriented than expecting a security solution to be all-in-one on a particular box, rather than based upon a combination of technical controls dispersed across the network, in combination with process controls and people at the helm monitoring.

Friday, April 28, 2017

GAAP... GASP...

Tom Alrich reports that at the RF CIP workshop, Lew Folkerth pointed out that:
"the key to being able to audit non-prescriptive requirements is for the entity to have to demonstrate that the measures they took were effective."

A lesson can be taken here from the principles of GAAP: "Generally Accepted Accounting Principles".

Paraphrasing GAAP: There is no absolute, perfect accounting during an audit. There are only sliding scales of better and worse practices. You must depart from [the accepted practice] if following it would lead to a material [mistake]. In the departure you must disclose, if practical, the reasons why compliance with the principle would result in a [mistake].

I know that the electric utility industry doesn't like outside influence, and has a severe "not-invented-here" allergy. But we really do need to move toward "Generally Accepted Security Practices" (GASP, to flippantly coin an acronym.)

So the takeaway is that an entity should be able to demonstrate either that the novel approach it took is effective via testing (such as penetration testing), OR that it is widely "accepted" as a security best practice. Making every entity extensively and intrusively pen test every single product, software update, and configuration is counterproductive, and outside the entity's core competency. Vendors and security firms test these things and make recommendations. Beyond due diligence in researching a solution and verifying the provenance and integrity of software or firmware to be installed, an entity shouldn't have any particular obligation to "prove" the perfect security of a commercial software offering or hardware device, because that is a distraction from the real issue.

The real issue is providing overlapping defenses in depth.

Thursday, April 27, 2017

Where does "Resilience" rank in NERC CIP?

The NERC CIP discussion has an odd blind spot when it comes to Reliability. To most people, Reliability seems to mean "always available with no outage".

I'll agree, that's a good goal for the Bulk Electric System. It's just not a very achievable goal for every single Cyber Asset in the BES.

In the world of Disaster Preparedness/Disaster Response, IT security, Business Continuity Planning, and the remainder of the Critical Infrastructure sectors, the main conversation is around Resilience, not perfection. 

We know bad guys will win sometimes. We know mistakes will be made sometimes. It's important to be able to recover quickly with minimal residual damage when these things happen. We have to look at Risk Management in terms of 5 strategies:


  • Avoidance
  • Transference
  • Reduction
  • Mitigation
  • Acceptance


Avoiding risk is difficult when you're a large, stationary, tempting target. Transference is not practical for Critical Infrastructure: there's nobody outside the system to insure you against damage.

So we try to reduce risk by reducing or eliminating vulnerabilities. We can't do anything about reducing threats. Many of our threats are either criminal elements or nation-state actors, and these threats are for law enforcement and military force to deter. We're stuck trying to close off attack vectors (vulnerabilities to particular types of threats). We try to reduce the probability of exploitation by narrowing the window of opportunity for threat actors, with things like timely updates of security patches and malware signatures, and periodic tests of security mechanisms and logging. The problem is that we can play a good game of whack-a-mole and still fail against a zero-day exploit (a combination of a threat and a vulnerability) that either didn't exist yesterday or was unknown to us.

FERC has told Responsible Entities that we're not allowed to accept risk on our own, because our entanglement in the Bulk Electric System means we'd be accepting risk for the whole system, not just ourselves. The Regions that audit us have permission to accept risk on our behalf, but no incentive to do so: they bear none of the cost of mitigating those risks, and would be exposed to criticism and worse if a risk deemed "acceptable" were exploited.

So, mitigation. In common conversation it overlaps with risk reduction; people often use the term mitigation to mean reducing vulnerabilities. That isn't precise from a Risk Manager's perspective: reducing vulnerabilities is actually risk reduction. Mitigation is about controlling the amount of damage that can be done if an exploit succeeds.

One way of limiting the damage is having a means to quickly restore vital capabilities. Another is having a means of rapidly addressing vulnerabilities once they are exploited. We can also spread our eggs across different baskets so that any particular attack can only reach some of them. Anything that limits the scope, impact, or duration of damage is a mitigation strategy.

Mitigation plans are the heart of Resilience. When Reliability efforts reach the point of diminishing returns, we need to start talking about contingencies, and that means Resilience. How many times does this concept appear in the NERC CIP standards? Without losing the emphasis on risk reduction, we need to start including resilience strategies in our planning.

Wednesday, April 26, 2017

Rapidly evolving threats & slow-moving regulatory standards

Over at the Anfield Group Blog, Chris Humphreys posts:

"The DOE’s Quadrennial Energy Review Report states that:
“The current cybersecurity landscape is characterized by rapidly evolving threats and vulnerabilities juxtaposed against the slower-moving prioritization and deployment of defense measures.” I lump regulatory standards and requirements into the “slower-moving prioritization and deployment of defense measures” as one of the key components to preventing a truly proactive stance on cybersecurity. Additional focus on recovery and resiliency needs to be a foundational element of any cybersecurity program because the idea that an organization can combat against 100% of cyber intrusions is false. What becomes critical is the recovery of the system if/when a successful cyberattack occurs."


I couldn't agree more. We will never eliminate all risk, so it behooves us to have a backup plan: resilient recovery strategies. NERC CIP's specific language around redundancy doesn't dismiss the importance of redundancy, but a lot of NERC CIP compliance folks do. The language says one cannot exclude a Cyber Asset from the scope of CIP simply because the system is redundant. Fair enough: redundancy doesn't protect against software vulnerabilities, malware, or misconfiguration. But too many people seem to think this means redundancy doesn't matter, and in fact there doesn't appear to be any requirement to have redundancy for Cyber Assets.

Something NERC CIP doesn't do well is make clear that assessing technical controls for high availability at the systems level, rather than the device level, can give a more accurate perspective on real cyber security. That high availability is achieved through redundancy of underlying infrastructure (perhaps switching and virtual network systems, or hypervisor infrastructure) that has little or nothing to do with BES functions, BES information, etc. Building in resiliency and eliminating reliance on single devices (or as I like to call them, "single points of failure") is a key part of virtualization's benefit.
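The arithmetic behind systems-level availability is simple if one assumes the copies fail independently: a function that stays up while at least one of n copies is up has availability 1 - (1 - a)^n, where a is the availability of each copy. The sketch below (the function name is hypothetical) shows two 99%-available devices yielding roughly 99.99% for the function they jointly provide. The independence assumption is exactly what a shared software vulnerability, malware, or misconfiguration breaks, which is why redundancy complements the other controls rather than replacing them.

```python
def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of a function that stays up while at least one copy is up.

    Assumes copies fail independently; a common-mode failure (shared software
    flaw, malware, misconfiguration) breaks this assumption.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one copy")
    # The function is down only when every copy is down at the same time.
    return 1.0 - (1.0 - component_availability) ** copies


single = parallel_availability(0.99, 1)
pair = parallel_availability(0.99, 2)
print(f"one device: {single:.4%}, two redundant devices: {pair:.4%}")
```

Evaluated device by device, each box is still only 99% available; evaluated as a system, the function is far more available, which is the perspective the device-centric framing misses.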

The entire mindset behind, and promoted by, the NERC Glossary and the definition of BES Cyber Asset is to blame for this gap. Add the prescriptive requirements, device-centric example measures, and device-oriented Severity Level tables, and you get a self-reinforcing echo chamber about how to achieve reliability that makes it difficult to look beyond the way it has always been done.

Tuesday, April 25, 2017