30 March 2017

Industrial control systems: The holy grail of cyberwar

Joe Weiss

MARCH 24, 2017 - Industrial control systems (ICSs) are critical to the operation of a modern society. ICSs were designed to be reliable and safe, rather than cybersecure, and to ensure safe operations within specific known engineered states.

These systems carefully manage transitions between operational states to control risk. Those states are defined to protect against randomly occurring failures of one component or a few components. However, focused cyberattacks such as Stuxnet or Aurora that can push a system into known dangerous states are not commonly expected in the normal operation of ICSs. This essay identifies a number of critical issues that threat analysts, policymakers, and critical infrastructure protection personnel need to understand and address: namely, how cyber compromise of ICSs or of physical system design features can cause physical damage to hardware and/or physical processes.

Hackers view exploits that can damage physical processes as the holy grail of cyberwarfare. Any device that can cause catastrophic damage through remote operation of cybercomponents could be a target for compromise. The more high-risk components that can be compromised in an ICS, the greater the risk to the operator and the value to the attacker.

ICSs are not designed to ensure resilience against concerted attacks that intend to place components in dangerous operating states. Because ICSs and their components were designed before cyberthreats were considered, securing these systems will be a growing area of cyberwarfare and engineering research.

Cyberincidents have been defined by Presidential Policy Directive 41 (PPD-41), issued in July 2016, to be electronic communications between systems that can impact confidentiality, integrity, and availability. As of July 2016, I have amassed a database of more than 800 actual ICS cyberincidents. Most of the incidents were not malicious, and most were not identified as being cyber.

Malicious cyberattacks against ICSs have been relatively rare. One of the first known cyberattacks against ICSs in the critical infrastructure was the attack against the Maroochy wastewater system in Australia in 2000. This attack was carried out by a disgruntled employee, not a nation-state, but it demonstrated several key points that were used in later nation-state attacks. It was an attack directly against the control systems, not the IT network. It was done by a knowledgeable insider. Finally, the attack was not immediately evident as being a cyberattack. Stuxnet and Aurora utilized these attributes.

Arguably the most famous ICS cyberattack was Stuxnet. Stuxnet was a sophisticated, nation-state cyberattack targeting the control systems in industrial infrastructure. Stuxnet bypassed the engineered protective components (control and safety systems) to execute unauthorized commands, compromising the integrity of the system and driving it into a dangerous operational state. Stuxnet was not malware in the normal sense and therefore would not have been detected by IT defenses. The Stuxnet code consisted of controller-generic software, software specific to Siemens, and software specific to the target centrifuges. Consequently, the underlying infrastructure of Stuxnet can be applied to any industrial process and to the ICSs of any vendor. The attack damaged more than 1,000 centrifuges in Iran’s Natanz nuclear fuel enrichment plant. It went on for more than a year, with the attack being masked, before it was identified as being cyber-related. That is, it appeared to be “normal” mechanical failures of the centrifuges, not a cyberattack.

The electrical engineering community has known for more than 100 years that connecting Alternating Current (AC) equipment out-of-phase with the electric grid can damage the equipment. The Aurora vulnerability is the name for a class of electric system power line attacks that manipulate physical forces to do damage through manipulation of substation protective relays. It is also not malware and therefore would not be detected by IT defenses.

Until the March 2007 test at the Idaho National Laboratory (INL) named “Aurora,” most people in industry felt that the out-of-phase condition could only be caused by accident, not by malicious attacks. The INL Aurora test was an intentional cyberattack that destroyed a large diesel generator. The Aurora test was not a traditional hack, but a demonstration that cyberconditions could lead to physical equipment damage. In the case of the Aurora demonstration, the relays were opened and closed using cybermeans to exploit the physical gap in protection of the electric grid.

The Aurora vulnerability occurs when the substation protective relay devices (think of the fuses in your home circuit box) are opened and then reclosed out-of-phase (that is, the sine waves do not line up) with the electric system. This out-of-phase condition results in physical damage to any Alternating Current (AC) rotating equipment, such as generators and induction motors, and potentially to transformers connected to the substation.
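To give a rough sense of why the phase angle at reclosing matters, the short Python sketch below computes the voltage that appears across an open breaker when the two sides have the same magnitude and frequency but differ in phase. This is a textbook idealization and not a model of the INL test: the function name is made up for illustration, and the calculation ignores impedances, frequency drift, and machine dynamics. For two equal sinusoids separated by an angle delta, the difference has magnitude 2 * V * |sin(delta / 2)|.

```python
import math


def breaker_voltage_pu(angle_deg: float, v_pu: float = 1.0) -> float:
    """Voltage magnitude across an open breaker, in per unit.

    Idealized assumption: both sides are sinusoids of equal magnitude
    (v_pu) and equal frequency, offset by angle_deg degrees.
    """
    delta = math.radians(angle_deg)
    return 2.0 * v_pu * abs(math.sin(delta / 2.0))


# Print the idealized voltage difference for a few phase offsets.
for angle in (0, 30, 60, 120, 180):
    print(f"{angle:>3} deg -> {breaker_voltage_pu(angle):.2f} pu")
```

In this simplified picture the voltage difference is zero when the sine waves line up and reaches twice the nominal voltage at 180 degrees, which is why closing at a large angle drives large currents and torques into connected rotating equipment.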

Because of concern about the damage that could be caused by this event, Aurora was initially classified by the Department of Homeland Security (DHS) as being “For Official Use Only.” With the exception of the CNN tape, the Aurora information was classified until July 2014. At that time, DHS mishandled a freedom of information request on Google Aurora (a different event but using the same “Aurora” name), declassifying more than 800 pages on the INL Aurora test. The mistake was important because the only way to prevent an Aurora attack is with specific hardware mitigation devices. However, the North American electric utilities have been very slow to employ the appropriate hardware protection, making the DHS disclosure even more disconcerting. Recent studies have demonstrated that many protective relays can be hacked, leading to potential Aurora events or other significant grid disturbances.

In December 2015, the Ukrainian electric grid was cyberattacked and more than 230,000 customers lost power. The power outage was caused by remotely opening the protective relays, which is step 1 of Aurora. For reasons only the attackers can provide, the attackers chose not to reclose the relays (as in the Aurora case), which could have caused significant long-term damage to Ukraine’s electric grid and other critical infrastructures. Grid operation was restored within hours because the Ukrainian operators were still used to operating the grid manually. I don’t believe the same can be said of the North American electric system.

What's unique about targeted ICS attacks?

The cyberattacker looks at the facility and its ICSs in a holistic way, identifying physical vulnerabilities of the controllers and the process and ways to exploit such vulnerabilities by digital manipulations. There are very few people with the expertise to understand the physical process being controlled, the control system domain with its unique design features, and the exploitation of IT vulnerabilities.

Traditional IT cyberattacks focus on Windows boxes using zero-day vulnerabilities (previously unknown vulnerabilities) or other IT flaws to capture valuable data (data breach) or cause a denial of service (loss of data). Targeted ICS attacks can be built on top of IT attacks but take aim at the physical process. Targeted ICS attacks such as Stuxnet and Aurora exploit the legitimate product or system design features. Additionally, IT is focused on Advanced Persistent Threats (APT) and traditional insider threats. Threats such as Stuxnet and Aurora are persistent design vulnerabilities that exploit features that are inherent in the design of the ICS and systems and cannot be corrected by installing a security patch.

Unfortunately, many ICS devices, including new ones, are still insecure by design, and many legacy ICSs cannot implement IT security technologies. Yet the devices won’t be replaced because they still work. The culture gap that exists between the IT organization and the control system organizations exacerbates the physical threats when attempting to secure ICSs. I believe a major part of Stuxnet’s success was that it was arguably the only instance where IT, control system, and physical security teams tightly coordinated to make the attack successful.

It is an unfortunate fact that this coordination still doesn’t happen (with very rare exceptions) when trying to protect ICSs.

The cascading effect of a compromised ICS

ICSs are generally a system of systems. Consequently, the effect of an ICS compromised by cyberattack or intrusion is not limited to the zone of equipment that the ICS is responsible for operating. ICSs and their logic are designed to coordinate with other ICSs in the overall system operation. This coordination is necessary to ensure that the ICSs react to the faults that occur in their zone. However, cases occur where ICSs, when they operate, affect not only their own zone of equipment but also zones controlled by other ICSs, causing cascading effects.

Another aspect of ICSs is their connection to communication equipment that sends and receives the information and commands used to operate the ICSs. These communication devices are part of a command operation system known as SCADA (Supervisory Control and Data Acquisition). SCADA equipment usually resides at an operation center where system operators monitor the ICSs and operate them when system conditions warrant it. While the majority of the equipment that makes up a SCADA system resides in the control center network behind firewalls, localized SCADA communication equipment directly connected to the ICSs can be as vulnerable as the ICSs themselves. A digital attack or intrusion on these localized communication systems would have a greater effect on the overall system because it gives the attacker access to all ICSs connected to them. The attacker could then operate all of those ICSs, causing a broader and more far-reaching effect on system operations.
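The difference between compromising a single ICS and compromising the communication equipment in front of it can be illustrated with a simple reachability calculation. The Python sketch below is only an illustration: the topology, the device names, and the idea of modeling "can send commands to" as a directed link are all assumptions made for this example, not a description of any real system.

```python
from collections import deque

# Hypothetical topology: each key can send commands to the devices listed.
# Every name here is invented for illustration.
LINKS = {
    "control_center": ["gateway_a", "gateway_b"],
    "gateway_a": ["relay_a1", "relay_a2", "rtu_a"],
    "gateway_b": ["relay_b1", "plc_b"],
    "relay_a1": [],
    "relay_a2": [],
    "rtu_a": [],
    "relay_b1": [],
    "plc_b": [],
}


def reachable(start: str, links: dict) -> set:
    """Return every device that can be commanded, directly or
    indirectly, from the device named start (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in links.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)  # do not count the starting device itself
    return seen


for device in ("relay_a1", "gateway_a", "control_center"):
    print(device, "->", sorted(reachable(device, LINKS)))
```

In this made-up layout, a compromised relay reaches nothing beyond itself, while a compromised local gateway reaches every device behind it, which is the broader effect described above.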

Modern industrial systems operate with standard ICSs from a few vendors (roughly half internationally-based and half US-based) with similar architectures, similar training, and often even the same default passwords. This has implications that are much more important than the increasing network connectivity that is often identified as the biggest ICS security problem. ICS cybersecurity processes and practices are either not available, inadequate, or just not being followed. Additionally, control system designs generally lack the cybersecurity requirements and engineering (hardware and software) needed to protect against the many failure modes related to attacks by hackers and to unintentional failures caused by increased complexity.

Although Stuxnet was only designed to attack certain systems, it is the design approach Stuxnet used that is novel and long lasting. As mentioned previously, much of the Stuxnet approach is generic and can be applied to any ICS from any manufacturer against any process. 

As the same ICSs are used across multiple industries, it means that a compromise of the ICS features in systems in one facility or industry can affect all facilities or industries that utilize those systems and devices. More often than not, physical vulnerabilities for a production process and plant configuration have been known for a long time. 

The BlackEnergy malware compromised the human-machine interface (HMI) of several major ICS vendors, allowing root access to HMIs used in multiple industries worldwide. BlackEnergy was used in the 2015 Ukrainian hack and has infected many US electric grids since late 2014. Given that all of the cyberattack mechanisms used in Ukraine can be used against the US electric grid, it is unclear why the Department of Energy (DOE), DHS, and the North American Electric Reliability Corporation (NERC) have chosen to explicitly play down this threat to the US electric grid.

To target specific damage or destruction, it is necessary to understand the process. I gave a presentation at a major petrochemical organization and asked their ICS experts what they could do if they wanted to create damage. The looks on their faces ranged from blank stares, to looks of horror, to snide grins as this question was not what they were taught to consider. From the attacker’s point of view, exploiting features rather than bugs has a significant advantage as they can’t be expeditiously fixed by a vendor releasing a patch, and having the end-users implement it. Instead, the attacker can be confident that those vulnerabilities (i.e., design features) have been in place for years, even after successful exploits are out in the wild. 

To create significant physical damage, it generally takes compromising both the ICS that optimizes the process and the safety systems that are used to prevent damage to equipment and people. However, these features were not designed to withstand a cyberattack. The Stuxnet attack bypassed the automated safety systems and prevented the manual safety systems from being initiated. Aurora uses the safety systems to produce the targeted attack. ICSs with some protection against cyberattacks are just being released, but the installation and upgrade cycle will be very long. Moreover, the new cyberprotection is not guaranteed to address novel attack scenarios.

Critical infrastructure systems such as turbine and substation controls, and their vulnerabilities, are much better known than the centrifuges attacked by Stuxnet. Applying the “Stuxnet” approach to these systems would not be difficult, as specific exploit software for many ICS vendors, such as Metasploit modules, is available on the web, often for free.

One other point is that cascading failures of the electric grid were viewed as a worst case. An example of a cascading outage was the 2003 Northeast Blackout that lasted two to three days. Cyberthreats, however, provide the ability to both damage equipment and attack multiple locations leading to extended long-term outages with the need to replace or repair long-lead time equipment. Consequently, cyberattacks can cause long term outages without the traditional “need” for a cascading effect.

Is existing guidance adequate for targeted ICS threats?

While the ICS security community has discussed the insider threat for many years, insiders who unwittingly create cybervulnerabilities or deploy cyberweapons generally have not been addressed. Obviously, they play a much more important role than the very small subset of insiders that may have malicious intentions.

Most ICS cyber security guidance and training is given to end-users and ICS vendors. There is very little guidance available to others such as system integrators. However, system integrators are often used to implement new designs and to upgrade older legacy designs. This becomes a very important issue with older legacy systems where the original vendor is no longer supporting its products and consequently is unaware of how its systems are being reconfigured. 

There have been many actual ICS cyberincidents that occurred because insiders created or “exploited” cybervulnerabilities without being aware of it. Unintentional cases include implementing network interfaces with other networks that were supposed to be isolated; connecting IT systems to ICSs that were not previously identified as being connected; implementing dial-up or wireless access to ICSs that were supposed to be isolated; and connecting compromised laptops or USB devices (also known as thumb drives) to ICS networks. Three examples where insiders unknowingly created cybervulnerabilities resulting in significant impacts were the 2007 Plant Hatch nuclear plant shutdown, the 2008 Florida outage, and the 2010 San Bruno, Calif. natural gas pipeline rupture. Stuxnet itself is a case where the system integrator is thought to have unintentionally inserted the malware.

What is necessary to prevent an ICS attack?

Protecting ICSs requires a combination of appropriate cybersecurity technologies, segmented architecture, and detailed understanding of the overall systems at any point in time. This includes knowing what hardware, software and firmware has actually been installed, if it has been changed, how it has been connected, and to what it is interconnected inside and outside the facility. As Aurora is a physical gap in protection, it requires hardware for mitigation. The hardware mitigation identifies the Aurora conditions and isolates the loads before they can be reconnected out-of-phase. Specific hardware solutions were developed to address Aurora and made available to industry along with guidance from NERC. However, there continues to be no requirement to install the requisite hardware mitigation. Unfortunately, there is so much misinformation about Aurora that nine years after the INL test very few utilities have implemented actual protection in the US electric industry and almost none have done so outside the US. Because of the July 2014 DHS disclosure of previously classified information, Aurora is now known in the hacker community but many utilities continue to ignore its hardware mitigation.
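The inventory requirement described above (knowing what is installed, whether it has changed, and what it is connected to) can be sketched as a comparison between an approved baseline and what is actually observed. The Python example below is a minimal illustration only: the record layout, the asset names, and the function name are assumptions made for this sketch, and a real inventory process would need far more detail than firmware versions and connection lists.

```python
def inventory_drift(baseline: dict, observed: dict) -> list:
    """Compare observed assets against an approved baseline and
    return human-readable findings (hypothetical record layout)."""
    findings = []
    for asset, seen in sorted(observed.items()):
        known = baseline.get(asset)
        if known is None:
            findings.append(f"{asset}: not in baseline")
            continue
        if seen["firmware"] != known["firmware"]:
            findings.append(
                f"{asset}: firmware {known['firmware']} -> {seen['firmware']}"
            )
        new_peers = set(seen["connections"]) - set(known["connections"])
        for peer in sorted(new_peers):
            findings.append(f"{asset}: new connection to {peer}")
    for asset in sorted(set(baseline) - set(observed)):
        findings.append(f"{asset}: in baseline but not observed")
    return findings


# Invented example data for illustration only.
BASELINE = {
    "plc_1": {"firmware": "2.1", "connections": ["hmi_1"]},
    "relay_7": {"firmware": "5.0", "connections": ["gateway_a"]},
}
OBSERVED = {
    "plc_1": {"firmware": "2.3", "connections": ["hmi_1", "office_lan"]},
    "laptop_x": {"firmware": "n/a", "connections": ["plc_1"]},
}

for line in inventory_drift(BASELINE, OBSERVED):
    print(line)
```

The point of the sketch is the categories of finding (unknown devices, changed firmware, unexpected connections, missing devices), which correspond to the unintended interconnections that tend to go unnoticed when no current inventory exists.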

Could a major ICS cyberincident happen here? The answer is two-fold. Hardware mitigation is available to prevent Aurora. However, the utility industry, NERC, FERC, and the NRC have not required the hardware mitigation to be implemented. Consequently, the unfortunate answer is that Aurora, and incidents like it, are very plausible. Aurora also introduces a different mindset. The traditional mindset is that all industries rely on the grid and therefore the grid must have a high level of cyberprotection. However, Aurora uses the electric substations as the vehicle for launching attacks against any generator, AC motor, or transformer connected to the substation. Consequently, it is the grid that can be the source of the attack. Depending on the equipment, the damage from an Aurora attack can take months to repair or replace, assuming the equipment can be manufactured, transportation is available for delivery, and trained staff are available to install it.

What does ICS malware mean to multiple industries?

Many of the same control systems used in the electric industry are used in other industries. The vulnerabilities in general are the same especially if they are “design features” of the control systems. Consequently, both Stuxnet and Aurora affect a wide swath of industries and government entities besides the electric industry. 

Closely aligned to the electric utility is electrified rail, where operations depend on electric substations and AC equipment in the electrified locomotives. Diesel-electric locomotives, of course, have their own vulnerability to Aurora. In the refining industry, pumps and compressors are critical infrastructure, and their failure can shut down the entire refinery for days or weeks. A catastrophic failure can be disastrous to the refinery. The burgeoning natural gas industry, which is providing some measure of energy independence to the US for the first time since the late 1960s, is vulnerable to Aurora because of the critical use of compressors. Pumps, motors, and compressors in the water and wastewater utilities are vulnerable as well. Additionally, ICS honeypots (systems that look like real systems but are actually “test systems” used to identify who is attempting to attack them) are being attacked by Chinese, Russian, Iranian, North Korean, and other hacking groups. The chemicals, fine chemicals, food, and pharmaceuticals industries are also vulnerable, to greater or lesser degrees. Far less well known is that the military is a very major user of rotating machinery, as well as of building automation systems, boiler control systems, SCADA, and distributed control systems. How effective would the Pentagon be if its critical infrastructure were compromised?

How widespread are ICS cyberincidents?

ICS cyberincidents have impacted electric grids, power plants, nuclear plants, hydro facilities, pipelines, chemical plants, oil and gas facilities, manufacturing, and transportation around the world. Impacts have ranged from trivial to significant: environmental releases, equipment damage, widespread electric outages, injuries, and deaths.

ICS-specific cyberthreats such as Stuxnet and Aurora are real and different from the typical IT threats. The industry and regulatory cybersecurity approaches need to be expanded to address ICS-specific threats based on system design features, not just IT vulnerabilities. While the risk differs depending on the industry and the application, the equipment is often the same. However, there does not appear to be sufficient industry understanding of, or desire to address, these very critical issues. A significant effort must be made at all levels of management within government and industry before it is too late.

Control systems cybersecurity expert Joseph M. Weiss is an international authority on cybersecurity, control systems, and system security. Weiss weighs in on cybersecurity, science and technology, security, emerging threats, and more on his Unfettered Blog.
