Friday, April 28, 2017

GAAP... GASP...

Tom Alrich says that at the RF CIP workshop, Lew Folkerth pointed out that:
"the key to being able to audit non-prescriptive requirements is for the entity to have to demonstrate that the measures they took were effective."

A lesson can be taken here from GAAP: "Generally Accepted Accounting Principles".

Paraphrasing GAAP: There is no absolute, perfect accounting during an audit. There are only sliding scales of better and worse practices. You must depart from [the accepted practice] if following it would lead to a material [mistake]. In the departure you must disclose, if practical, the reasons why compliance with the principle would result in a [mistake].

I know that the electric utility industry doesn't like outside influence, and has a severe "not-invented-here" allergy. But we really do need to move toward "Generally Accepted Security Practices" (GASP, to flippantly coin an acronym).

So the take-away is that an entity should be able to demonstrate either that the novel approach they took is effective via testing (such as penetration testing), OR that it is widely "accepted" as a security best practice. Making every entity extensively, intrusively pen test every single product, software update, and configuration is counterproductive, and outside the core competency of the entity. Vendors and security firms test these things and make recommendations. Other than due diligence in researching a solution and verifying the provenance and integrity of software or firmware to be installed, an entity shouldn't have any particular obligation to "prove" perfect security of a commercial software offering or hardware device, because it's a distraction from the real issue.

The real issue is providing overlapping defenses in depth.
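Back on the due-diligence point above: verifying the provenance and integrity of software or firmware before installation is mostly mechanical. A minimal sketch in Python (the function name and file are illustrative, not any particular vendor's tooling):

```python
import hashlib

def verify_firmware(path, vendor_sha256):
    """Return True only if the file's SHA-256 digest matches the
    value published by the vendor (checked before installation)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest().lower() == vendor_sha256.lower()
```

This is due diligence, not a pen test: it proves the bits you received are the bits the vendor published, nothing more.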

Thursday, April 27, 2017

Where does "Resilience" rank in NERC CIP?

The NERC CIP conversation has an oddly blank spot when it comes to Reliability. To most people, Reliability seems to mean "always available, with no outage."

I'll agree, that's a good goal for the Bulk Electric System. It's just not a very achievable goal for every single Cyber Asset in the BES.

In the world of Disaster Preparedness/Disaster Response, IT security, Business Continuity Planning, and the remainder of the Critical Infrastructure sectors, the main conversation is around Resilience, not perfection. 

We know the bad guys will win sometimes. We know mistakes will be made sometimes. It's important to be able to recover quickly, with minimal residual damage, when these things happen. We have to look at Risk Management in terms of five strategies:


  • Avoidance
  • Transference
  • Reduction
  • Mitigation
  • Acceptance


    Avoiding risk is difficult when you're a large, stationary, tempting target. Transference is not practical for Critical Infrastructure; there's nobody outside the system to insure you against damage.

    So we try to reduce risk by reducing or eliminating vulnerabilities. We can't do much about reducing threats: many of our threats are criminal elements or nation-state actors, and those are for law enforcement and military force to deter. We're stuck trying to close off attack vectors (vulnerabilities to particular types of threats). We try to reduce the probability of exploit by narrowing the windows of opportunity for threat actors to exploit those vulnerabilities, with things like timely updates of security patches and malware signatures, and periodic tests of security mechanisms and logging. The problem is that we can do a good job of whack-a-mole and still fail on a zero-day exploit (a combination of a threat and a vulnerability) that either didn't exist yesterday, or was unknown to us.
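The "window of opportunity" idea can be made concrete: for a known vulnerability, exposure runs from public disclosure until the patch is actually deployed. A rough sketch (the dates are made up for illustration):

```python
from datetime import date

def exposure_window_days(disclosed, deployed=None, today=None):
    """Days a known vulnerability was (or has been) exploitable:
    from public disclosure until the patch was deployed, or until
    today if no patch has been deployed yet."""
    end = deployed if deployed is not None else today
    return max((end - disclosed).days, 0)

# Illustrative: disclosed April 1, patch deployed April 15 -> 14 days exposed.
window = exposure_window_days(date(2017, 4, 1), deployed=date(2017, 4, 15))
```

Timely patching shrinks this number; it can never make it negative, which is exactly the zero-day problem.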

    FERC has told Responsible Entities that we're not allowed to accept risk on our own, because our entanglement in the Bulk Electric System means we'd be accepting risk for the whole system, not just ourselves. The Regions who audit us have permission to accept risk on our behalf but no incentive to do so because they bear none of the cost of mitigation for those risks, and would be exposed to criticism and worse if it came to light that a risk deemed "acceptable" had been exploited.

    So, mitigation. It overlaps with risk reduction in common conversation; people often use "mitigation" to mean reducing vulnerabilities. From a Risk Manager's perspective that isn't precise: that's actually risk reduction. Mitigation is about controlling the amount of damage that can be done if an exploit is successful.

    One way of limiting the damage is having a means to quickly restore vital capabilities. Another way is to have a means of rapidly addressing vulnerabilities once they are exploited. And we can also spread our eggs out into different baskets so any particular attack can only get to some of them. Anything that limits the scope, impact, or duration of damage is a mitigation strategy.

    Mitigation plans are the heart of Resilience. When Reliability efforts reach the point of diminishing returns, we need to start talking about contingencies, and that means Resilience. How many times does this concept appear in the NERC CIP standards? Without losing the emphasis on risk reduction, we need to start including resilience strategies in our planning.

    Wednesday, April 26, 2017

    Rapidly evolving threats & slow-moving regulatory standards

    Over at the Anfield Group Blog, Chris Humphreys posts: 

    "The DOE’s Quadrennial Energy Review Report states that:
    “The current cybersecurity landscape is characterized by rapidly evolving threats and vulnerabilities juxtaposed against the slower-moving prioritization and deployment of defense measures.” I lump regulatory standards and requirements into the “slower-moving prioritization and deployment of defense measures” as one of the key components to preventing a truly proactive stance on cybersecurity. Additional focus on recovery and resiliency needs to be a foundational element of any cybersecurity program because the idea that an organization can combat against 100% of cyber intrusions is false. What becomes critical is the recovery of the system if/when a successful cyberattack occurs."


    I couldn't agree more. We will never eliminate all risk, so it behooves us to have a backup plan: resilient recovery strategies. NERC CIP's specific language around redundancy doesn't dismiss the importance of redundancy, but a lot of NERC CIP compliance folks do. The language says one cannot exclude a Cyber Asset from CIP scope simply because the system is redundant. Fair enough; redundancy doesn't protect from software vulnerabilities, malware, or mis-configuration. But too many people seem to think this means redundancy doesn't matter, and in fact there doesn't appear to be any requirement to have redundancy for Cyber Assets at all.

    Something NERC CIP doesn't do well: make clear that assessing technical controls for high availability at the system level, rather than the device level, can provide a more accurate perspective on real cyber security. That high availability is achieved through redundancy of underlying infrastructure (perhaps switching and virtual network systems, or hypervisor infrastructure) that has little or nothing to do with BES functions, BES Information, etc. Building resiliency in and eliminating reliance upon single devices (or as I like to call them, "single points of failure") is a key part of virtualization's benefit.

    The entire mindset behind and promoted by the NERC Glossary and the definition of BES Cyber Asset is to blame for this gap. Add to that the prescriptive requirements, device-centric example measures, and device-oriented Violation Severity Level tables, and you get a self-reinforcing echo chamber about how to achieve reliability that makes it difficult to look outside the way it has always been done.

    Monday, April 24, 2017

    A Tale of Two Viewpoints: Mixed Trust vs Shared Infrastructure



    In NERC CIP Standards Drafting efforts, industry chatter, and auditing, there has been quite a bit of talk about “mixed trust”, meaning an environment that has both BES Cyber Systems and Cyber Assets not subject to CIP standards. Non-CIP Cyber Assets may be systems that are under the Responsible Entity’s control, and may even be providing functions related to Grid Operation, but not functions which “if rendered unavailable, degraded, or misused” for 15 minutes will affect the reliability of the Bulk Electric System. For example, corporate business systems are not BES Cyber Assets, even though they are under the control of the same Responsible Entity as the BES Cyber Systems.


    Here’s the thing: “mixed trust” is not the right term. Mixed trust would imply that Cyber Assets of different trust levels can access each other in an uncontrolled fashion. Nobody is proposing a relaxation of security controls between CIP and non-CIP assets. What is being proposed is “shared infrastructure”. At some level we all have shared infrastructure: the same building, the same power, the same Internet connection. Shared infrastructure doesn’t mean “mixed trust”; shared infrastructure can have logical controls and isolation involved.


    A few days back, I wrote about the Streetlight Effect in relation to Lew Folkerth's "Lighthouse" article in the March-April issue of ReliabilityFirst's newsletter. In it he talks about “Zones of Authority” in relation to the audit process for NERC compliance. At the end of the article he makes an assertion about virtualization being a bad idea, not because of actual security concerns, but because of the auditor’s inability to look at things outside of the designated Electronic Security Perimeter (compliance concerns) required for BES Cyber Assets. While I respect Lew's experience and appreciate the viewpoint, I disagreed with that approach pretty strenuously.


    Tom Alrich has another perspective on the article. He makes a point about security being enhanced if the RE’s entire network were in scope for NERC CIP Compliance audits. I don’t think I can agree with that either (although he does acknowledge that the average compliance specialist would rather repeatedly hit themselves in the head with a hammer than take this approach because of the burden of paperwork involved, so it doesn’t appear to be a serious suggestion.) Such an approach would only be a net gain if compliance evidence production didn’t overwhelm the efforts to secure things in the first place. And if all of the compliance requirements were strictly security requirements and not just designed to make audits easier.


    But realistically, bringing security mechanisms that exist outside the ESP (Electronic Security Perimeter) into scope of auditing doesn’t require making the entire corporate network subject to inspection by the Regions. Here's the problem with CIP-005 and ESP. Most NERC CIP standards require you to have a program or process to accomplish X. CIP-005, on the other hand, requires everything peripherally related to BES Cyber Systems to reside inside the ESP. What we have here is the "hard crunchy shell" concept from 20 years ago. It provides “bright line” criteria for audit and categorizing assets as either in-scope or out, because the auditor can require a diagram with the asset shown inside a neatly drawn “dotted red line” (inside joke for NERC CIP compliance specialists), but it doesn’t provide good security on its own.

    Some of this difficulty is definitional. A PCA (Protected Cyber Asset) is any Cyber Asset "associated" with a BES Cyber System and can exist "in or on" an ESP, but this is only made specific in the NERC Glossary. A BES Cyber Asset must be contained in an ESP (CIP-005). An Intermediate System must reside outside the ESP. An EACMS can be either in or out.


    Documenting a mechanism that provides a security function doesn’t require that everything else on that network be in scope for auditing. Authentication, Authorization, and Accounting (AAA) may be partly provided by an RSA Token system. That AAA system could reside inside or outside of the security zone where clients of the system exist. The documented controls for CIP AAA could bring that RSA server into scope without touching other, unrelated non-CIP servers on its network segment.
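As a sketch of that scoping argument (the hostnames are hypothetical): only the mechanism named in a documented CIP control is pulled into audit scope, not its network neighbors.

```python
# Hosts on the same network segment as the AAA server (illustrative names).
segment = {"rsa-aaa-01", "print-01", "file-01", "web-01"}

# Documented CIP controls, mapping each control to the mechanism providing it.
documented_cip_controls = {"interactive-remote-access-aaa": "rsa-aaa-01"}

# Only mechanisms referenced by a documented control land in audit scope;
# unrelated servers on the segment stay out.
in_scope = segment & set(documented_cip_controls.values())
```

The documentation draws the scope boundary, not the network segment.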


    The key to providing good security for BES Cyber Assets doesn’t rely solely upon whether a device is inside a particular perimeter. It depends upon the controls applied to accessing that device. A layered defense of the device includes boundary identification & control, but it doesn’t stop at the outer boundary.

    Let's address a couple specific points:


    "Lew says (or implies) that auditors are having differences of opinion with entities on the security of mixed-trust switches. It seems these entities have switches that implement both ESP and non-ESP VLANs. When the auditors tell them this isn’t secure, the entities point out that the non-ESP VLANs have just as good security as the ESPs do. So why aren’t they safe?"


    Let's examine a scenario with two VLANs, one named "CIP" and one named "non-CIP". The main point really isn't that the security of "non-CIP" is as good as that of "CIP". The most important thing is that network isolation is maintained between them. The configuration of devices in one VLAN is irrelevant to devices in the other VLAN. If it's just a Layer 2 VLAN, there is no "mixed trust" because the traffic from "non-CIP" doesn't mix with that of "CIP". If it's a Layer 3 switch, and routing takes place between the two VLANs, then an EAP must be identified (probably the VLAN's logical or virtual interface) and inbound/outbound access controls applied at that point. If it is a Hypervisor situation, the same principle applies: vSwitch "non-CIP" doesn't have a traffic path that includes vSwitch "CIP", and the Guests cannot communicate with each other. If you create an EAP between them, then you apply the inbound and outbound access controls at that point.
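The routed (Layer 3) case can be sketched as a first-match ruleset at the EAP with a default deny. The addresses and port below are purely illustrative, not a recommended design:

```python
from ipaddress import ip_address, ip_network

# Hypothetical inbound/outbound controls at the EAP (the routed interface
# between the "non-CIP" and "CIP" VLANs). First match wins; default deny.
EAP_RULES = [
    # (source network, destination network, destination port, action)
    (ip_network("10.20.0.0/24"), ip_network("10.10.0.5/32"), 443, "permit"),
]

def eap_decision(src, dst, port):
    """Evaluate a single flow against the EAP ruleset."""
    for src_net, dst_net, rule_port, action in EAP_RULES:
        if ip_address(src) in src_net and ip_address(dst) in dst_net \
                and port == rule_port:
            return action
    return "deny"  # anything not explicitly permitted is denied
```

The point is where the control lives: at the identified EAP between the zones, not in the member devices of either VLAN.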

    And lest we forget, the L2 VLAN is not the only way to have a shared switch infrastructure. Software Defined Networks and Network Overlays exist on a shared hardware infrastructure, but provide network isolation between the security zones as defined in configuration. This isolation, like that of the VLAN example above, is provided as a baseline function of the device, implemented in the control plane by the code base. Access to modify configuration and code base is confined to a management-plane interface. Traffic transiting the device doesn't have access to this function; only administrators accessing the device in the management plane do.
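In toy-sketch terms, the management-plane separation amounts to this (the interface name is invented for illustration):

```python
MGMT_INTERFACE = "mgmt0"  # hypothetical dedicated management-plane port

def allow_config_change(ingress_interface, admin_authenticated):
    """Configuration and code-base changes are honored only when the
    request arrives on the management-plane interface from an
    authenticated administrator; transit (data-plane) traffic can
    never reach this function."""
    return ingress_interface == MGMT_INTERFACE and admin_authenticated
```

Traffic flowing through the box simply has no path to the function that changes the box.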


    "Anything else, such as a VLAN that isn’t an ESP, is completely out of his or her purview; the auditor just has to assume these are completely insecure, and thus shouldn’t be found on the same switch as ESP VLANs."


    Auditors aren't forced to assume anything. The configuration of the switch is in scope, because it provides isolation to the "CIP" VLAN. (It doesn't provide an ESP under the current model, because it doesn't provide an EAP with inbound/outbound ACLs; it simply provides isolation, which is one flaw in CIP-005's prescriptive approach. CIP-005 would benefit from being changed to a security objective of isolating more critical assets and traffic from less critical.) The configuration of devices in the "non-CIP" VLAN is out of scope, and if something is out of scope, auditors are not required to render an opinion on it.


    "the auditors can’t look at what isn’t in an ESP, which limits the kinds of evidence an entity can show them. "


    Currently, auditors can examine EACMS that may exist outside the ESP. They can examine Intermediate Systems which are required to be outside the ESP. I don't think this is a valid assertion. It's an auditing approach preference with deliberate blinders on.


    "most entities, if told they could force CIP auditors to consider security controls they have implemented for non-ESP networks, but only in return for having at least some of the cyber assets on those non-ESP networks fall into scope for CIP, would say “Thanks but no thanks. We’ll leave things as they are.”"


    Perhaps the entity might choose that. But the issue isn't "security controls implemented for non-ESP networks." It is "security controls that exist outside the ESP, that are implemented to protect CIP Assets". The ESP is a fundamentally limited construct, and must be considered fatally flawed when used without other layered defenses.

    Friday, April 21, 2017

    Cost: FREE
     
    Date:           April 25-26, 2017
    Upcoming Location:     Denver, CO
     

    https://aws.amazon.com/government-education/events/automating-security-cloud-workshop/
     

    Thursday, April 20, 2017

    From the EnergySec website:

    "EnergySec published comments on the Transmission Owners Control Center and Virtualizations white papers the Standards Drafting Team is seeking comments on. Since EnergySec does not formally vote on SDT ballot issues, we are not formally submitting these comments to the SDT. However, we are making them available for our members to use or revise as they see fit. The documents are available in the Members section of the EnergySec Community wiki."


    Not only are they not formally submitted, it would appear that almost no one on the SDT, or the NERC staff associated with it, actually has access to read them. You would think that something like this would be of more use to their membership if it were not behind a pay wall.

    Tuesday, April 18, 2017

    CIP-009 and Hypervisors

    Carlo asked a question about CIP-009 in the comments to a previous post.

    I hope this answers it well enough:

    Backup solutions are going to be specific to your Hypervisor choice. If you're working with a Type I Hypervisor (bare metal), your backup solution for the Hypervisor itself may very well be “install fresh from vendor media”, because there is very little specific data to be restored, and which Host a Guest resides upon is transparent to the Guest and/or users of the system. If you have a Type II Hypervisor that includes an operating system, and possibly performs other functions in addition to being a Hypervisor (which I would strongly recommend AGAINST), then your backup solution may be more complex.

    In most cases, backups for Hypervisors are not urgent, because you plan and implement spare capacity into your infrastructure. You should have more Hypervisor capacity than the Guests require. This allows for maintenance of Hypervisors (patching, upgrades) without taking any functionality offline: you simply move Guests from the Hypervisor being taken offline to other Hypervisors for the duration of the procedure. In an unplanned outage, the same process is used. Guests don’t rely on any specific Hypervisor; they simply need a Hypervisor.
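The spare-capacity rule can be sanity-checked with a simple N-1 calculation: for each Host, could the remaining Hosts absorb its Guests? The host names and capacity units below are arbitrary illustrations.

```python
def can_tolerate_host_loss(hosts):
    """hosts maps a Host name to {"capacity": units, "used": units}.
    Returns True if the Guests of any single Host could be absorbed
    by the spare capacity of the remaining Hosts."""
    for down, stats in hosts.items():
        spare_elsewhere = sum(h["capacity"] - h["used"]
                              for name, h in hosts.items() if name != down)
        if stats["used"] > spare_elsewhere:
            return False
    return True

# Illustrative three-Host cluster (units could be GB of RAM):
cluster = {
    "host-a": {"capacity": 128, "used": 64},
    "host-b": {"capacity": 128, "used": 64},
    "host-c": {"capacity": 128, "used": 64},
}
```

If this check fails, a single Host outage (planned or not) means some Guests have nowhere to go.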

    Backup solutions for Guests can work exactly the same as your traditional network-based backup management software. I wouldn't do it that way, but it's possible.

    Most Virtual Cluster Management consoles allow for "snapshots" of the running state of your Guest. This is much more complete than a backup of files and directories, and requires merely rebooting to a previous state rather than a process of restoring files and settings to a base image.

    The advantage here is that if your target to be restored is participating in a security domain such as Active Directory, you're merely restoring a previous state, not deleting and recreating AD objects with unique and possibly conflicting GUIDs. It's more complicated if it is an AD Domain Controller; this may require authoritative or non-authoritative restore procedures (depending on the state of the AD domain and how corrupted it may be).

    Depending on your Hypervisor choice, the Guest’s configuration data (number of processor cores, RAM, storage targets etc.) may be contained in the image of the Guest or in your private cloud cluster manager. This configuration data is rather static and rarely changes. In most cases, the cluster manager is a virtual machine itself and can be rebooted from a snapshot. It could also be backed up across the network using a traditional backup application.

    If you’re using standalone, unmanaged Hypervisors (perhaps for cost reasons), then you have more manual planning to do. You have to make sure that you have a process to identify target destination Hosts with adequate resources, and you should also maintain functional redundancy or security zone separation for manual moves of Guests during planned or unplanned outages. Where Guest movements are automated, this is managed by setting the affinity of certain Guests to targeted Hosts.

    For CIP-009-6 specifically, nothing changes until CIP-009-6 R1.3. Here we need to note that the Hypervisor doesn’t have any BES Cyber System Information. The Hypervisor is just a container, and doesn’t interact with the code base of the BES Cyber Systems themselves. So “processes for the backup and storage of information required to recover BES Cyber System functionality” {emphasis added} apply more strictly to the Guests individually (Cyber Asset) and to the Hypervisors as a general function (Cyber System).

    R1.4 and R1.5 are also going to have to be applied at the Cyber System level rather than the Cyber Asset level.

    Virtualization Webinar Today (NERC CIP Standards Drafting)

    Industry Webinar
    Project 2016-02 Modifications to CIP Standards Virtualization in the CIP Environment

    April 18, 2017 | 3:00 – 4:30 p.m. Eastern
     
    Click here for Webinar Registration
    Dial-in: 1-415-655-0002 | Access code: 731 913 110

    Background
    In addition to the Order No. 822 directives, the Project 2016-02 Modifications to CIP Standards Drafting Team (SDT) is addressing four issues identified by the CIP Version 5 Transition Advisory Group (V5 TAG). These issues are:
    • Cyber Asset and BES Cyber Asset Definitions;
    • Network and Externally Accessible Devices;
    • Transmission Owner (TO) Control Centers Performing Transmission Operator (TOP) Obligations; and
    • Virtualization
     
    Virtualization of Cyber Assets provides advantages for the availability, resiliency, and reliability of applications and functions hosted in such an environment when implemented in a secure manner. The SDT is offering a series of webinars in an effort to provide a resource for technical information related to several concepts relevant to virtualization. During the first webinar, held on March 21, 2017, the SDT discussed logical isolation and Centralized Management Systems (CMSs), in addition to introducing storage virtualization.
     
    Webinar Objectives
    This second webinar will discuss the Hypervisor with a special focus on template considerations, as well as multi-tenancy and the concepts of underlay hardware and Electronic Security Zones. This webinar will also expound on the introduction to storage virtualization provided in the first webinar, to include discussion of storage area networks and the manner in which underlying virtualization concepts are similar to server virtualization. Finally, this webinar will discuss storage virtualization scaling and data leakage.
     
    For more information or assistance, contact Katherine Street (via email or (404) 446-9702) or Mat Bunch (via email or (404) 446-9785).
    3353 Peachtree Road NE
    Suite 600, North Tower
    Atlanta, GA 30326
    404-446-2560 | www.nerc.com

     

    Monday, April 17, 2017

    Definitions matter

    There is a need to revise CIP language to clarify “programmable” due to differences between the current NERC CIP definition of Cyber Asset, the language in Section 215 of the Energy Policy Act of 2005 that discusses Cyber Assets as Electronic Programmable Devices, and commonly understood security standards and definitions of computers or cyber devices.

    There is a perception that the particular wording of “Cyber Asset” is a deliberate, well-thought-out, and legally binding definition. However, there are multiple inconsistencies between NERC CIP Standards, the Energy Policy Act, and FERC Orders which have not been legally challenged and have not prevented progress from being made to security standards. This being so, there is no practical benefit in objecting to modifications based on a presumption of precision in the original wording.

    These varying definitions have caused some confusion in categorizing Cyber Assets as in-scope. There may be gains to be achieved by modifying the definition to be more consistent both internally and with cross-sector IT security practices that are more technically in line with the way devices are designed by vendors and intended to be operated. When the parsing of the grammar becomes too circular, the utility of the definition is lost. The main goal of NERC CIP standards MUST be usefulness of the standard.

    Some commenters have made the point that NIST does not use the term “Cyber Asset” and recommend using the term “computer”. However “computer” also has connotations of server/workstation to many people and is not inclusive of other information processing devices such as network and security appliances, cyber-physical industrial control system devices, etc. “Cyber Asset” is a workable, comprehensible and inclusive term that provides benefit to the security discussion and therefore should be retained.

    Cyber Assets are platforms which can accept variable sets of encoded instructions known as operating systems and software programs. They use these instructions to manipulate data inputs to create outputs in the form of processed data or in the case of Cyber-Physical devices, control signals. This programming is stored in either volatile or non-volatile memory, and may reside in the device or on other devices in the overall Cyber System that provides storage services to the device.

    Conversely, dedicated devices which perform a function defined purely by the physical configuration of the device (DIP switches, jumper connectors, or EEPROM), and not in a changeable, encoded set of logic-based instructions, are not generally considered to be Cyber Assets, but rather microprocessors. The modification of that dedicated function (control plane logic) is not programmable via a human- or network-accessible communications interface (management plane) that can be interacted with logically by other Cyber Assets. Re-programming requires physical modifications to the microprocessor device, often by a vendor technician at a factory using tools that change the physical or electrical properties of the device. These devices are not in any practical way “programmable” by the user, and the risk of them being re-programmed maliciously or covertly is mitigated by physical access controls.
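The distinction being drawn can be restated as a simple decision rule. The attribute names below are my own shorthand for the argument above, not NERC Glossary terms:

```python
def is_programmable_cyber_asset(changeable_instruction_set,
                                logical_reprogramming_interface):
    """Shorthand for the distinction above: a device is treated as a
    Cyber Asset only if its function is defined by a changeable,
    encoded set of instructions AND that function can be modified
    through a human- or network-accessible (logical) interface.
    Devices requiring physical modification fall outside this rule."""
    return changeable_instruction_set and logical_reprogramming_interface

# An RTU with network-loadable firmware vs. a jumper-configured relay:
rtu = is_programmable_cyber_asset(True, True)
jumper_relay = is_programmable_cyber_asset(False, False)
```

Both conditions matter: changeable instructions alone, with no logical path to change them, leaves the risk in the physical-access domain.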


    Additionally, devices which have stored firmware that is not accessible unless installed in another device (such as, but not limited to, internal/external hard drives, flash drives, Ethernet or wireless NICs, USB devices, security dongles, serial adapters, etc.) are not Cyber Assets in themselves, because they are not capable of being re-programmed or of executing code without being installed (permanently or temporarily) in a Cyber Asset. These types of devices are peripheral components of a Cyber Asset, or removable media. While these devices may pose a risk of carrying malware, the means of mitigating that risk is separately covered by removable media controls and supply chain requirements.

    Friday, April 14, 2017

    Do you really need a requirement for that?

    Over at the Anfield Blog, CEO Chris Humphreys says:


    "3. Does your organization really need a formal standard to tell you that you should be testing any/all third party software/hardware before deploying it within your operational environment? 
    This is the most alarming concern I have. If you answered “yes” to the above question, the state of security within our industry is in horrible shape. Nothing gets me more fired up than when I speak to a security “expert” at a utility he says: “There’s no NERC requirement for me to do that.” I’m sure that’s exactly what the Iranians said before they installed those PLCs in their nuclear reactor."
    {my own emphasis added in the second paragraph.}

    I really can't add anything to that, other than it's not just a Supply Chain issue. It sort of places some other people's opinions about the benefit of "Mandatory and Enforceable Standards" in context.

    Thursday, April 13, 2017

    Cyber Assets vs Cyber Systems Confusion in the Virtual Environment

    I'm concerned to still see discussion in the Nerc-isphere about how to categorize a VM in the simplest of clear-cut examples: the virtual server/hypervisor combination. Some folks still disagree that it is necessary to treat each virtual machine and hypervisor as separate Cyber Assets. They think:


    "The hypervisor (parent) is the device or software which runs the virtual machine (child). The virtual machine (VM) cannot operate without the hypervisor. This shared relationship means that neither can be separate Cyber Assets. For example, if a VM has been identified as a BES Cyber Asset (BCA); the hypervisor that runs the VM is also a BCA; which also applies to PACS, EACMS, and PCA’s

    Treating the VM and hypervisor as separate Cyber Assets can cause mixed-trust virtual environments; the hypervisor runs CIP and corporate VM’s. CIP controls are only being applied to the CIP VM and not the hypervisor; even though the hypervisor “if rendered unavailable, degraded, or misused” can impact the CIP and corporate VM’s."


     Allow me to counter:

    The hypervisor (parent) is the device or software which runs the virtual machine (child).

    The Hypervisor or Host merely provides a container or environment for the virtual machine. In this respect the Hypervisor performs certain control plane functions on behalf of the Guest. However, the Hypervisor is not involved in the control plane decisions made by the Guest, and it does not need to (and should not be configured to) interact in the data plane of the Guests. They make computing decisions entirely independently of each other, including their reactions to inputs and malware. If an RE decided to treat Host and Guest as one Cyber Asset for compliance reasons, there would be no logically consistent framework to require the long-standing and well-known security best practice of separating the management and data planes of the Hypervisor and Guests. The Guest should be completely unaware of, and unable to interact with, the Host in a secure virtual environment. The Host and Guest more often than not run different operating systems, and it would be difficult to categorize that as one Cyber Asset for any practical purpose.

    The virtual machine (VM) cannot operate without the hypervisor.

    This is not strictly correct. More precisely, the VM cannot operate without a hypervisor. It is not dependent upon any specific hypervisor; rather, it can exist on any Hypervisor in the cluster, and this is one of its biggest advantages. Therefore treating them as one Cyber Asset is inappropriate, because this approach would not require distinct vulnerability assessments, patching, baseline configurations, etc.

    This shared relationship means that neither can be separate Cyber Assets.

    The assumption in this statement doesn’t work in both directions. Presuming that a Guest could not be a separate (meaning independent) Cyber Asset does not preclude a Hypervisor from being an independent Cyber Asset. After all, the Hypervisor is a complete hardware, operating system, (and potentially software) stack in itself. Depending on whether it is Type I or Type II, it might even run the same operating system as the Guests, with Hypervisor function software merely installed on top of a generic operating system. It exists with a hostname, an address, a particular set of open ports/APIs, and a network connection, and it responds to network traffic. The Hypervisor can exist and operate without a single Guest inside.

    For example, if a VM has been identified as a BES Cyber Asset (BCA); the hypervisor that runs the VM is also a BCA; which also applies to PACS, EACMS, and PCA’s

    This may be true, but it doesn’t negate the necessity of treating the Host and Guest as distinct Cyber Assets (for multiple reasons). At most it makes the entire assemblage a BCS.

    Treating the VM and hypervisor as separate Cyber Assets can cause mixed-trust virtual environments; the hypervisor runs CIP and corporate VM’s.

    This is argumentum ad consequentiam (an appeal to consequences), a well-known logical fallacy. Whether or not shared infrastructure is allowed, and whatever the consequences of allowing it, the distinct character of the Guest versus the Host does not change. The statement also presumes that sharing infrastructure between CIP-applicable systems and “Corporate” VMs is impossible to secure and must be prohibited. That remains to be proven.

    CIP controls are only being applied to the CIP VM and not the hypervisor; even though the hypervisor “if rendered unavailable, degraded, or misused” can impact the CIP and corporate VM’s.

    Again, this is an unproven assumption. Treating the Hypervisor and the Guest as separate Cyber Assets does not require a difference in controls; if both are BES Cyber Assets then the same requirements apply to both. Additionally, the technical configuration controls applied to a Hypervisor are generally different from those applied to a Guest (malware protection, for example). Again, the Host and Guest may run different operating systems with different open ports, APIs, software vulnerabilities, and baseline capabilities. 
    The question is whether a BCA Hypervisor can host a PCA or non-CIP Applicable System safely. The advisability of this approach depends upon whether or not sufficient security controls can be put in place to render negligible the risk of unavailability, degradation, or misuse of the BCA Hypervisor and associated BCA Guests. Risk assessment and acceptance are highly subjective and specific questions that depend more upon the overall architecture and defense in depth posture, than upon a simplistic question of "to virtualize, or not to virtualize."


    ¹ Control plane functions are not directly accessible by users of the system; they are embedded in the logic of the code base, and generally require modification to the code base to change. Virtualizing a server involves abstracting the hardware interactions, much like the Hardware Abstraction Layer (HAL) does in Windows. The difference is that the Guest operating system simply sends its hardware access requests to the Hypervisor rather than to firmware. This has a three-fold security benefit. 


    1. The users and software accessing the Guest in the Data plane cannot substitute malware in place of authorized device drivers. Since device drivers are one of the worst malware vectors after phishing attacks, this narrows the attack surface of the Guest OS.
    2. The Hypervisor can run a different operating system than the guest. Data plane attacks against guest firmware are ineffectual against the Guest, because it has no device drivers present and no direct access to hardware. They are also ineffective against the Hypervisor, because there is no Data plane access between the two, and the device drivers that actually interact with hardware belong to a different operating system than the one visible to the attacker.
    3. Drivers that actually interact with the hardware are generally a smaller subset of better-vetted drivers approved by the Hypervisor vendor, as long as you make the smart choice and go with a Type I Bare Metal Hypervisor.
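    The mediation described in points 1–3 can be sketched in a few lines. This is purely illustrative, not any vendor's hypervisor API; the device and driver names are hypothetical. The point is that the guest addresses only virtual devices, and the hypervisor dispatches to its own vetted driver set.

```python
# Illustrative sketch (hypothetical names, not a real hypervisor API):
# a Type I hypervisor mediating guest hardware access so that only
# vendor-vetted drivers ever touch real hardware.

VETTED_DRIVERS = {"net0": "vendor_vetted_nic", "disk0": "vendor_vetted_scsi"}

class Hypervisor:
    def handle_guest_io(self, device: str, request: bytes) -> bytes:
        # The guest never names a real driver; it addresses a virtual device.
        driver = VETTED_DRIVERS.get(device)
        if driver is None:
            raise PermissionError(f"no vetted driver for virtual device {device!r}")
        return self._dispatch(driver, request)

    def _dispatch(self, driver: str, request: bytes) -> bytes:
        # Stand-in for the real hardware interaction.
        return b"ok:" + driver.encode()

hv = Hypervisor()
assert hv.handle_guest_io("net0", b"send") == b"ok:vendor_vetted_nic"
```

    A guest that tries to reach hardware outside the vetted set simply has no path to it, which is the attack-surface reduction the footnote describes.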

    Tuesday, April 11, 2017

    Electronic Security Perimeter

    Over at his blog, Tom Alrich commented to me:


    "objectives-based requirements are the only way not to have the situation you discussed (drawing from Lew Folkerth's recent article in RF's Newsletter)- where an entity can be doing great things for cyber security that the auditors can't even consider because they're not in their "zone of authority" (in this case, the ESP). 

    Objectives-based requirements would have to be built on top of a framework of concepts that goes beyond just BCS and ESP's, to include all of the entity's computing infrastructure (including "IT" networks)."


    I definitely agree that the ESP construct is inherently limiting. For one thing, it is established by arbitrarily drawing a line and identifying where the access point (with controls applied) is, and it exists only at OSI Model Layer 3: routable protocols. Lest we forget, not all security takes place at Layer 3; it's simply the easiest place to measure.


    Right now, guidance from NERC and the regions tells us that we cannot use a Layer 2 switch with some ports inside the ESP and some outside. Some would argue you can't use any switch this way, even a Layer 3 or 4 switch. However, there is no requirements language that says this is the rule, and it is not a clearly established, absolute security best practice. There can't be any NERC CIP requirements around this because the ESP is defined strictly at Layer 3, and it was defined that way on purpose, to exclude Layer 2 connections from needing ACLs (access control lists) and firewalls. Here's the relevant reasoning from the CIP-005 Guidelines & Technical Basis: 



    "This requirement applies only to communications for which access lists and ‘deny by default’ type requirements can be universally applied, which today are those that employ routable protocols. Direct serial, non-routable connections are not included as there is no perimeter or firewall type security that should be universally mandated across all entities and all serial communication situations. There is no firewall or perimeter capability for an RS232 cable run between two Cyber Assets. Without a clear ‘perimeter type’ security control that can be applied in practically every circumstance, such a requirement would mostly generate technical feasibility exceptions (“TFEs”) rather than increased security."


    It's not easy to measure security being applied at Layer 2, so we define this type of connectivity out of relevance. Good security would apply controls upstream, at those devices which can participate, but the device-centric focus of early CIP standards language leads people away from thinking in these terms. Security doesn't have to be "perimeter" based; in a modern layered approach, security functions are generally distributed across multiple devices and systems within your network. They're not all "on the perimeter," so the current NERC CIP standards apply poorly, if at all, to this strategy. We need some changes to the dominant paradigm. At the same time, we don't need auditors crawling into every ancillary system in our IT department.
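    For concreteness, the "deny by default at Layer 3" model the G&TB language relies on reduces to matching routable traffic against an explicit permit list and dropping everything else. A minimal sketch, with entirely hypothetical networks and rules:

```python
# Hedged sketch of 'deny by default' access control at Layer 3.
# Networks, hosts, and ports below are invented for illustration.
from ipaddress import ip_address, ip_network

PERMITS = [
    # (source network, destination network, destination port)
    (ip_network("10.1.0.0/24"),  ip_network("10.2.0.0/24"), 443),  # e.g. HMI -> historian
    (ip_network("10.3.0.10/32"), ip_network("10.2.0.0/24"), 22),   # e.g. jump host -> ESP
]

def eap_allows(src: str, dst: str, port: int) -> bool:
    """True only if some explicit permit matches; otherwise default deny."""
    s, d = ip_address(src), ip_address(dst)
    return any(s in sn and d in dn and port == p for sn, dn, p in PERMITS)

assert eap_allows("10.1.0.5", "10.2.0.9", 443)
assert not eap_allows("203.0.113.7", "10.2.0.9", 443)  # unmatched: dropped
```

    Note that nothing in this model can see a Layer 2 frame: a switch port carries no IP address to match on, which is exactly why the ESP construct has nothing to say about it.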


    Some concepts that add to the discussion:


    Accessing a device means the ability to view or modify its configuration settings. This may be because the device interacts only in the data plane and doesn't have a separate management plane (e.g., a Windows Server), or it could be access via a dedicated management port.


    Traffic transiting a device means that the packets are forwarded via that device. It doesn't necessarily mean that the user originating the traffic has the ability to view or modify the device in question, in most cases they may not even be aware that the device exists in the traffic stream. Packets crossing a router, switch, or firewall in the data plane would be examples.


    Management plane is the logical function which allows a user to interact with the configuration settings of a system. It includes any dedicated management ports, the user interface (terminal emulation or graphical user interface), and the network connectivity to these if accessing remotely (really remote access, meaning from anywhere not directly connected, not NERC CIP remote access, which only means from outside the ESP).



    Control Plane is the logic embedded in the code base of the system. Admins can modify the control plane logic, but usually only by replacing the code base (upgrades, patching, etc.)

    Data plane is just packet switching. Traffic is generated by users or devices and sent to a destination IP address. It is switched, routed, or forwarded by devices inline along the way. This traffic has no access to the Control Plane or Management Plane on a properly configured network appliance (including hypervisors).
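    The separation these definitions describe can be shown with a toy model. Everything here is hypothetical (interface and attribute names are mine, not any vendor's): the management plane answers only on a dedicated interface, while transiting traffic in the data plane has no code path to configuration state at all.

```python
# Toy model of management/data plane separation (hypothetical names).

class NetworkAppliance:
    def __init__(self, mgmt_if: str = "mgmt0"):
        self._mgmt_if = mgmt_if
        self._config = {"hostname": "sw1"}

    def manage(self, interface: str, key: str, value: str) -> None:
        # Management plane: configuration is reachable only via the
        # dedicated management interface.
        if interface != self._mgmt_if:
            raise PermissionError("config access denied off the management interface")
        self._config[key] = value

    def forward(self, packet: dict) -> dict:
        # Data plane: transiting packets are forwarded; this path never
        # reads or writes self._config.
        return {**packet, "hops": packet.get("hops", 0) + 1}

sw = NetworkAppliance()
assert sw.forward({"dst": "10.0.0.1"})["hops"] == 1
try:
    sw.manage("eth1", "hostname", "evil")   # data-plane-facing port: refused
except PermissionError:
    pass
```

    A user whose packets merely transit the device corresponds to `forward()`; a user who can reach `manage()` on `mgmt0` is accessing the device in the sense defined above.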

    Monday, April 10, 2017

    Determining Metrics & Requirements

    On Thursday, I wrote a bit about defining BES Cyber Assets, in response to some older discussion on the WICF Forum that is still an undecided topic of discussion today.

    It's funny how often the definition of BES Cyber Asset has to be discussed. Tom touches on it again here in talking about the difficulty of an auditor determining whether something is a BCA based upon legal semantics. The BES CA definition is:

    "A Cyber Asset that if rendered unavailable, degraded, or misused would, within 15 minutes of its required operation, misoperation, or non-operation, adversely impact one or more Facilities, systems, or equipment, which, if destroyed, degraded, or otherwise rendered unavailable when needed, would affect the reliable operation of the Bulk Electric System. Redundancy of affected Facilities, systems, and equipment shall not be considered when determining adverse impact. Each BES Cyber Asset is included in one or more BES Cyber Systems."

    In NERC CIP Standards, definitions are often contrary to plain language or commonly understood terminology, and all too often include definition by exclusion, even when it tortures reading comprehension. In the definition above, the intent is to close a loophole that one might exploit to get out of compliance scope. To the extent that an identically configured redundant system might be just as vulnerable to a flaw or cyber attack, it's a valid point. But there are many scenarios where redundancy alone makes a functional impact a very remote possibility.

    The same article from Tom crosses over to touch upon the Streetlight Effect I wrote about here and here, and is particularly apt with the tagline I ended the second article with: "When your auditor insists upon a simple and clean way to measure compliance at the device level, they may be doing us all a disservice."

    It boils down to a common management problem: I can't manage what I can't measure, because to determine the effectiveness of a process I must determine whether it changes an outcome, and whether that change is positive or negative. Determining what to measure, and how it relates to a specific process, is necessary, but not always easy. Some measurements are meaningless: they either don't relate to the process, or they relate to the wrong (or misunderstood) process, and therefore indicate results unrelated to the change I'm trying to make. If I don't choose the right test points and interpret the results correctly, I gain nothing but a thin cover story for why it's not my fault when things go drastically wrong. We have lots of people measuring lots of things and producing tons of documents to prove that they're testing them.

    But are we looking at the right things or just the easy ones?

    Friday, April 7, 2017

    5 Things About the Electric Industry

    Amy Thomas, Government Relations Director, American Public Power Association blogged "Five Things You May Not Know about Cybersecurity in Electricity" back on February 24th.


    The public may not know these things about the Electric Utility Industry, but insiders do and it's not ALL puppies and sunshine:
    1. The electric utility sector is the only critical infrastructure sector (besides nuclear power plants, which are a part of the overall sector) that has mandatory and enforceable standards in place for cybersecurity.
    {emphasis in original} 

    That's certainly interesting, and it begs a couple of questions. Why don't the other critical infrastructure sectors have mandatory standards? And is the Electric sector now generally acknowledged to be more secure than those other sectors? (Not lower risk, but actual result.) If so, is that increased security directly attributable to these mandatory standards?
    2. The regularly updated, highly technical cybersecurity standards that govern electric utilities are drafted by the North American Electric Reliability Corporation, approved by the Federal Energy Regulatory Commission, and enforced by fines of up to $1 million per day per infraction.
    It's true the standards have been updated several times, but Version 3 was in place from 2010 to 2016. I'm not sure that meets a practical definition of “regular updates,” given the pace of change in information technology.

    3. The process for crafting cybersecurity standards for utilities has provided, and continues to provide, a solid foundation for strengthening the industry’s security posture and allowed standards to evolve with constantly changing threats.

    Lest we forget, these standards are reliability standards. Cybersecurity (protection from attack) is an aspect of reliability. It's good that the process exists to update these standards, however slow and cumbersome it may be.

    4. Standards alone are not enough to protect the grid. That’s why the American Public Power Association and our member utilities have worked to develop close partnerships with others in the industry and the federal government. We share threat information to prepare for and respond to cyber attacks.

    Amen, preach it! Standards rely upon voluntary compliance; even in mandatory schemes, you can't possibly enforce everything without a great deal of goodwill and honest effort from the participants in the system. Standards are also always compromised by conflicting interests: they must be a low bar to be universally applicable, and must be universal to be legal (but your specific circumstances, or mine, may really need more than a minimal approach).

    5. The Association recently signed a three-year cooperative agreement with the Department of Energy for up to $7.5 million to help public power utilities better understand and implement cybersecurity protections, resiliency, and advanced control concepts.

    Training is always a good idea; even better is cross-training. The electric utility industry seems to have a “not-invented-here” mindset that dismisses lessons and parallels from outside the niche. We also have a reluctance to modify our approach to cyber-physical “operations technology.” Vendors have slipped behind in providing security controls for these types of devices, and the time-sensitive nature of some functions makes it problematic to add more devices in front of them.

    To address the not-invented-here problem, NERC can work to incorporate NIST's work and that of other standards bodies into the standards drafting process. As the only mandatory, enforceable regulatory scheme in the Critical Sectors, it's tough to proceed without sometimes nailing down precise definitions and meanings, because lawsuits and immense fines may hinge upon a misplaced comma or an incautious throw-away phrase. However, “inside baseball” jargon should be avoided whenever possible. NERC needs to avoid adding terms to the NERC glossary which don't match commonly-understood usage in other IT security contexts. An IT Security professional who comes from outside the electric utility industry should be able to understand and apply CIP standards without an historical dissertation for context, or learning a new language of tortured and twisted definitions.

    I like that "resilience" (corrected for the terrible sin against grammar) is mentioned in the article. Oddly enough, this term, ubiquitous in disaster recovery and business continuity planning disciplines across the world, doesn't feature in NERC CIP standards. FERC's approach to risk management doesn't really address it, at least not in the NERC CIP cybersecurity standards. 

    Indeed, as I've had pointed out to me multiple times by NERC staff, FERC's position is that acceptance of risk is not allowable at the entity level (FERC Order 706, Section 139, Para. 154). However, an integral part of widely accepted resilience strategies is to acknowledge residual risks and plan for their possible occurrence, with graceful failure modes and scope limitations as risk mitigation tactics. It doesn't seem practical to get the ERO or Regional Entities to approve residual risk; there's no incentive for them to do so. They bear none of the costs of compliance and (impossible to achieve) risk elimination, but could be (almost certainly would be) blamed for a security breach that appears to be related to an explicitly accepted risk.

    Things that make you go "Hmmm... "

    Thursday, April 6, 2017

    Trial Balloon

    Open up the current CIP-005, and go to page 15 (towards the bottom) to see the original, then compare it side by side to this (unofficial, non-authoritative, completely speculative, proposed, draft) language... What do you think?

    *********************************************************************************

    Requirement R1 requires isolation of BES Cyber Systems from other systems of differing trust levels by requiring Boundary Protection and controlled Network Ports, Protocols, and Services via identified Electronic Access Points between the applicable BCS and Non-CIP Cyber Systems. Electronic Security Perimeters are also used to identify a defense boundary for some BES Cyber Systems that may not inherently have sufficient cyber security functionality, such as devices that lack authentication capability.


    All applicable BES Cyber Systems that are connected to a network must reside in a defined Electronic Security Zone (ESZ). Even standalone networks that have no external connectivity to other networks must have a defined ESZ. The ESP is a demarcation of the security zone containing the BES Cyber System, and it also provides clarity for entities to determine what systems or Cyber Assets are in scope and what requirements they must meet. The ESP is used in:

    • Defining the isolation boundary between CIP-applicable Cyber Assets and Non-CIP Cyber Assets, including the location where certain controls are applied, i.e. Electronic Access Points (EAP).
    • Defining the scope of ‘Associated Protected Cyber Assets’ that must also meet certain CIP requirements.
    • Defining the boundary inside of which:
      • All of the Cyber Assets meet the requirements of the highest impact BES Cyber System that is in the zone (the ‘high water mark’) –or-
      • Cyber Assets reside in security zones characterized by specific sets of controls applied to a class or classes of BCS according to risk or impact criteria


    The CIP Cyber Security Standards do not require network segmentation of BES Cyber Systems by impact classification. Many different impact classifications may be mixed within an ESP. However, all of the Cyber Assets and BES Cyber Systems within the ESP must either be protected at the level of the highest impact BES Cyber System present in the ESP (i.e., the “high water mark”) where the term “Protected Cyber Assets” is used, or grouped into security zones with discrete security controls applied to each zone.

    - The CIP Cyber Security Standards accomplish the high water mark by associating all other Cyber Assets within the ESP, even other BES Cyber Systems of lesser impact, as “Protected Cyber Assets” of the highest impact system in the ESP. For example, if an ESP contains both a high impact BES Cyber System and a low impact BES Cyber System, each Cyber Asset of the low impact BES Cyber System is an “Associated Protected Cyber Asset” of the high impact BES Cyber System and must meet all requirements with that designation in the applicability columns of the requirement tables.
    - The alternative to the high water mark approach is to categorize BCS and Associated PCA into security zones according to their risk or impact, and apply those controls which are required for the impact rating to the zone in which they reside. Security zones are isolated from other zones of differing risk or impact rating by controlling traffic, allowing only that which is explicitly identified as necessary.

    If there is external routable connectivity to any CIP-applicable Cyber Asset, then an Electronic Access Point (EAP) must be identified where inbound and outbound access controls are applied to traffic traversing the ESP. Responsible Entities should know what traffic needs to cross an EAP and document those reasons to ensure the EAPs limit the traffic to only those known communication needs. These include, but are not limited to, communications needed for normal operations, emergency operations, support, maintenance, and troubleshooting.

    The control strategy implemented at the EAP should apply to both inbound and outbound traffic. The standard added outbound traffic control, as it is a prime indicator of compromise and a first level of defense against zero day vulnerability-based attacks. If Cyber Assets within the ESZ become compromised and attempt to communicate to unknown hosts outside the ESP (usually ‘command and control’ hosts on the Internet, or compromised ‘jump hosts’ within the Responsible Entity’s other networks acting as intermediaries), the EAPs should function as a first level of defense in stopping the exploit. This does not limit the Responsible Entity from controlling outbound traffic at the level of granularity that it deems appropriate and large ranges of internal addresses may be allowed.

    The SDT’s intent is that the Responsible Entity knows what other Cyber Assets or ranges of addresses a BES Cyber System needs to communicate with and limits the communications to that known range. For example, most BES Cyber Systems within a Responsible Entity should not have the ability to communicate through an EAP to any network address in the world, but should probably be at least limited to the address space of the Responsible Entity, and preferably to individual subnet ranges or individual hosts within the Responsible Entity’s address space.

    The SDT’s intent is not for Responsible Entities to document the inner workings of stateful firewalls, where connections initiated in one direction are allowed a return path. The intent is to know and document what systems can talk to what other systems or ranges of systems on the other side of the EAP, such that rogue connections can be detected and blocked.
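    The intent in the two paragraphs above (document known flows, detect anything outside them) can be sketched as a simple set difference. The flow table entries below are invented for illustration; real documentation would list actual systems, address ranges, and services.

```python
# Sketch of "know and document what systems can talk to what other systems,
# such that rogue connections can be detected." All entries are hypothetical.

KNOWN_FLOWS = {
    # (initiating system, destination system, service)
    ("ems-server", "historian",  "tcp/443"),
    ("jump-host",  "ems-server", "tcp/22"),
}

def rogue_flows(observed: set) -> set:
    """Return observed flows that are not in the documented set."""
    return observed - KNOWN_FLOWS

seen = {
    ("ems-server", "historian",      "tcp/443"),   # documented: fine
    ("ems-server", "198.51.100.20",  "tcp/8443"),  # undocumented outbound
}
assert rogue_flows(seen) == {("ems-server", "198.51.100.20", "tcp/8443")}
```

    The undocumented outbound flow is exactly the "prime indicator of compromise" case: a BES Cyber System initiating traffic to a host nobody ever documented a need for.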
    This requirement applies only to communications for which access lists and ‘deny by default’ type requirements can be universally applied, which today are those that employ routable protocols. Direct serial, non-routable connections are not included, as there is no perimeter or firewall type security that should be universally mandated across all entities and all serial communication situations. There is no firewall or perimeter capability for a serial cable run between two Cyber Assets, and such a requirement would mostly generate technical feasibility exceptions (“TFEs”) rather than increased security. However, the technical security control does not need to be applied at the device level; it can be applied at the system level.

    For example, a relay may be controlled via a serial connection (RS-232, etc.) from a device with an Ethernet port. This device, generally a terminal server or computer, may have multiple serial connections, a console port, and one or more Ethernet connections capable of interacting with a remote access client via a routable protocol. The terminal server generally has the capability to act as an AAA client, requesting authentication, authorization, and providing or participating in logging for Accounting purposes with a network based service or combination of services such as RADIUS, TACACS+ or LDAP directory services. This provides the opportunity for enhanced security such as multi-factor authentication which is typically not natively available in relays or terminal servers.

    The security objective of providing controlled access and isolation is achieved by controlling the external addressability of the serial device rather than placing a security mechanism between the serial ports of the device and its immediate upstream serial controller. An IP-serial converter that has an Ethernet port outside and serial connection(s) inside is externally addressable, and on a practical level passes through that external addressability to the device receiving the serial connection. The control should be applied to the security zone where the Ethernet connection resides, upstream of the relay.

    As for dial-up connectivity, the Standard Drafting Team’s intent of this requirement is to prevent situations where a phone number alone can establish direct connectivity to the BES Cyber Asset. If a dial-up modem is implemented in such a way that it simply answers the phone and connects the line to the BES Cyber Asset with no authentication of the calling party, it is a vulnerability to the BES Cyber System. The requirement calls for some form of authentication of the calling party before completing the connection to the BES Cyber System. Some examples of acceptable methods include dial-back modems, modems that must be remotely enabled or powered up, and modems that are only powered on by onsite personnel when needed, along with a policy stating that they are disabled after use. If the dial-up connectivity is used for Interactive Remote Access, then Requirement R2 also applies.
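    The dial-back pattern mentioned above reduces to one rule: never bridge an inbound call directly; hang up and dial a pre-authorized number instead. A minimal sketch with placeholder identities and phone numbers:

```python
# Sketch of dial-back modem logic (identities and numbers are placeholders).
# The inbound call itself is never connected; the caller is "authenticated"
# by the fact that only a pre-authorized number is ever dialed back.

AUTHORIZED_CALLBACKS = {"maintenance": "+1-555-0100"}

def handle_inbound_call(claimed_identity: str, dial_out) -> bool:
    """Drop the inbound call; connect only by dialing back a known number."""
    number = AUTHORIZED_CALLBACKS.get(claimed_identity)
    if number is None:
        return False          # unknown caller: never connect
    dial_out(number)          # trust derives from the known callback number
    return True

calls = []
assert handle_inbound_call("maintenance", calls.append)
assert calls == ["+1-555-0100"]
assert not handle_inbound_call("stranger", calls.append)
```

    An attacker who merely knows the modem's phone number gains nothing: the connection is only ever completed to the number on file, not to the line that called in.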

    The standard adds a requirement to detect malicious communications for Control Centers. This is in response to FERC Order No. 706, Paragraphs 496-503, where ESPs are required to have two distinct security measures such that the BES Cyber Systems do not lose all perimeter protection if one measure fails or is misconfigured. The Order makes clear that this is not simply redundancy of firewalls, thus the SDT has decided to add the security measure of malicious traffic inspection as a requirement for these ESPs. Technologies meeting this requirement include Intrusion Detection or Intrusion Prevention Systems (IDS/IPS) or other forms of deep packet inspection. These technologies go beyond source/destination/port rule sets and thus provide another distinct security measure at the ESP.