“For those on the ramparts of the world’s sole superpower, the digital winds are blowing an icy chill through the triumphant glow of the post-Cold War,” reads the tagline of “A Farewell to Arms,” an article by journalist and writer John Carlin. The article, which delves into the potentially devastating effects of cyber-warfare, became the inspiration for the blockbuster Die Hard 4.0.
In the movie, the antagonist Thomas Gabriel launches a series of attacks known as a “Firesale.” Gabriel first sabotages a city’s transport system and triggers anthrax alarms in federal buildings, then takes over telecommunication networks and financial systems, and finally disrupts utility infrastructure such as power grids and gas pipelines.
The movie depicts the scale of destruction a country can suffer when its critical infrastructure is attacked. Attacks like these have the potential to set a country’s economy back by decades through financial and operational losses. Above all, such attacks on critical infrastructure might even result in the loss of human life.
The US Cybersecurity and Infrastructure Security Agency (CISA) defines as critical those sectors whose assets, systems, and networks, whether physical or virtual, are considered so vital to the country that their incapacitation or destruction would have a debilitating effect on security, national economy, national public health or safety, or any combination thereof.
It is up to each country’s analysis and discretion to identify a particular sector or establishment as critical infrastructure. But irrespective of the country and its definition, such infrastructure is at the core of the country’s foundation, enabling it to function. Thus, any severe attack on critical infrastructure can cause unprecedented devastation.
With the advent of “Industry 4.0,” automation and digitization have connected plants, factories, and utility systems that were earlier siloed and protected by closed systems to an ever-expanding IT infrastructure.
The collection of software and hardware that works together to carry out an industrial task is called Operational Technology or OT (a term used interchangeably with Industrial Control Systems/Industrial Automation and Control Systems). With the convergence of IT and OT networks, the cybersecurity risks inherent in IT have become an intrinsic part of digitized factories and industries, increasing the attack surface available to threat actors. These industrial environments consist of complex systems that are not easy to breach, but this has not deterred ambitious attackers. They are constantly evolving and developing multi-stage attacks that could cause significant destruction. Additionally, unlike IT environments, where cybersecurity practices are somewhat mature, OT environments lag in cybersecurity maturity by years, if not decades.
Being part of critical infrastructure, these OT environments are prized targets for countries or groups with malicious intent. The cost of conducting a conventional war in terms of people, money, and goodwill is far greater than carrying out a cyber-attack on the critical infrastructure of a country. Such attacks have the added advantage of nominal expenditure and covertness, thereby resulting in little or, in most cases, zero accountability.
Two days before Christmas, on 23rd December 2015, at around 1530 hours, seven 110 kV and twenty-three 35 kV substations in Ukraine went offline for three hours. Later, it was revealed that three “oblenergos” (energy companies) were attacked, and 225,000 customers lost access to electricity on that cold winter evening.
Investigations revealed a well-planned attack involving multiple attackers who were trained and well-equipped to conduct such a large-scale operation. The attackers followed the stages of the ICS cyber kill chain and demonstrated a variety of cyber-capabilities. The actual intrusion happened three to four months before the outage. This indicated both the strong reconnaissance capabilities of the attackers and the weak adversary detection capabilities of the energy companies.
The tactics, techniques, and procedures (TTPs) of the attack comprised the following:
- Attackers gained an initial foothold through spear-phishing aimed at system administrators and other IT staff of the power companies. They used macro-enabled Word documents to deliver the BlackEnergy 3 malware. Subsequently, they began gathering information and harvesting credentials to gain access to the ICS network.
- The affected companies did not have security and perimeter controls such as firewalls to segregate the IT and OT networks. The attackers harvested the VPN credentials used by employees to connect to the OT network remotely. From this point on, detection would have been difficult, since the attackers were leveraging genuine, standard functionality of the system.
- Attackers then reconfigured the backup power of the control centres. Before launching the final assault, they initiated a telephone denial of service (TDoS) attack on customer-care call centres to prevent any customers from reporting the outage.
- To thwart any recovery mechanisms, the attackers wrote malicious firmware to the serial-to-Ethernet converters (the devices used to relay commands from operators to substation control systems), rendering them useless and beyond recovery.
- Further, they used a malicious piece of code known as KillDisk to overwrite the master boot record (MBR) of the operator machines (HMIs), rendering them inoperable and depriving operators of their “view function.”
- Finally, after accessing the operator stations, at 1530 hours on 23rd December 2015, the attackers opened the circuit breakers of the power supply, plunging over two hundred thousand people into darkness.
- One of the key highlights of the attack was that none of the malicious code deployed directly caused the outage. The circuit breakers were opened from the operator stations, which are authorized to do so under regular operations. BlackEnergy 3, the malicious converter firmware, KillDisk, and the TDoS attack were used only to secure the attackers’ access to the operator stations and to hamper response and recovery.
Industrial tasks such as producing electricity, mining, or refining crude oil are extremely hazardous and can cause significant harm to both the workforce and the environment. Although organizations typically have qualified engineers, operators, alarms, and detection systems to minimize these risks, there is still scope for human error, as witnessed in multiple industrial disasters worldwide. The Bhopal gas tragedy, which killed more than 3,000 people and affected over 200,000, is one such example.
To counter or prevent such unforeseen situations, safety systems are put in place, which in most cases act as the last line of defense.
One such safety net, consisting of hardware sensors, software, and logic solvers, is called a Safety Instrumented System (SIS). Various elements of an SIS have predefined thresholds which, when crossed, trigger functions that either shut down the entire system or initiate operations to bring the metrics back below the threshold levels.
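The threshold behaviour described above can be illustrated with a minimal sketch. It is not how any particular SIS is implemented (real logic solvers run on dedicated, safety-certified hardware); the `SafetyFunction` type, the two-threshold scheme, and the pressure values are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"
    CORRECT = "corrective_action"
    SHUTDOWN = "emergency_shutdown"


@dataclass(frozen=True)
class SafetyFunction:
    """A hypothetical safety function with two illustrative thresholds."""
    name: str
    high_limit: float  # crossing this starts a corrective operation
    trip_limit: float  # crossing this shuts the process down


def evaluate(function: SafetyFunction, reading: float) -> Action:
    """Toy logic solver: map a sensor reading to the action to take."""
    if reading >= function.trip_limit:
        return Action.SHUTDOWN
    if reading >= function.high_limit:
        return Action.CORRECT
    return Action.NONE


# Hypothetical limits for a pressure measurement, in bar.
reactor_pressure = SafetyFunction("reactor_pressure_bar", high_limit=8.0, trip_limit=10.0)

for value in (6.5, 8.7, 10.2):
    print(value, evaluate(reactor_pressure, value).value)
```

The point of the sketch is that the safety decision depends entirely on the integrity of the thresholds and the logic solver, which is why an attacker who can tamper with either effectively removes the last line of defense.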
In August 2017, attackers targeted the “Triconex Emergency Shutdown System (ESD),” a major SIS at a prominent Saudi Arabian petrochemical plant, with the goal of taking the ESD offline. The TRISIS/TRITON malware deployed by the attackers leveraged a zero-day vulnerability in the firmware. According to Schneider Electric, the ESD’s OEM, the malware escalated privileges to completely take over the system.
Luckily, a flaw in the malware prevented it from causing any damage. Although the actual intention behind the malware could not be uncovered, there is no doubt that the attack, had it succeeded, could have caused exceptional damage to the plant in terms of safety incidents and lost production.
The investigation pointed towards well-funded, nation-sponsored attackers. The system they attacked had proprietary components, such as its operating system and protocols, which could only have been deciphered through a complex and lengthy reverse engineering process.
This incident can be compared to a criminal removing all fire extinguishers from a building just before starting a fire.
The lessons from this attack were significant, as this was the first known attack on an SIS.
Difference between IT and OT Security
Before exploring the vulnerabilities present in OT systems that enable such attacks, it is important to understand some of the key differences between IT & OT establishments.
|Aspect|IT|OT|
|---|---|---|
|Priority|Confidentiality, Integrity, and Availability, depending on the system under consideration.|Safety and availability of people, environment, and equipment take precedence, then integrity of systems, and finally confidentiality of network drawings and other details.|
|Patch Management|Easily defined: enterprise-wide, remote, and automated.|Patches require exhaustive testing and qualification before being pushed into the ICS/OT network.|
|Incident Response and Forensics|Easily developed and deployed. Few regulatory requirements. Embedded within technology.|Uncommon beyond system resumption activities. No forensics beyond event re-creation.|
|Physical and Environmental Security|Poor (office systems) to excellent (critical operations systems).|Good to excellent (operations centers; guards, gates, and guns).|
|Security Skills/Awareness|Usually good.|Usually poor.|
|Consequence of Disruption|Financial and reputational losses.|Loss of life, large-scale public impact (e.g., disruption of electric grid distribution), financial losses, and regulatory implications.|
The differences mentioned above highlight the security challenges in OT. Security implementations are often too risky or too difficult to carry out in a running plant, which leaves OT networks exposed and/or ill-managed.
Different layers of OT have different sets of security challenges
Most OT networks around the world are designed as per the “Purdue Enterprise Reference Architecture (PERA),” also referred to as the “Purdue Model,” named after Purdue University, where it was developed.
The model defines the different layers of an OT/ICS network and details their segregation. This helps in achieving an air-gapped network in which the layers are insulated from each other.
Layers 0 and 1 form the control network, consisting of sensors and actuators. Layers 2 and 3 form the plant network. The convergence of IT with OT happens after layer 3. Hence layers 4 and 5 are usually part of the IT network.
Each layer has its own components. In layer 2, for example, you will find the HMI (Human Machine Interface) or engineering stations, which run primarily on Windows-based desktops due to software dependencies. The layers are connected to each other through network devices such as switches or protocol converters. Therefore, each layer has to deal with its own unique spectrum of physical and logical security challenges.
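As a rough illustration of the layering described above, the sketch below maps layers to the three zones named in the text and checks a list of observed flows against a simple rule. The rule itself (traffic may only stay within a layer or cross to a directly adjacent one) is an assumed example policy, not something the Purdue Model mandates, and the flows are hypothetical.

```python
# Zone grouping as described in the text: 0-1 control, 2-3 plant, 4-5 IT.
ZONES = {0: "control", 1: "control", 2: "plant", 3: "plant", 4: "it", 5: "it"}


def zone_of(layer: int) -> str:
    """Return the zone a Purdue layer belongs to."""
    return ZONES[layer]


def flow_allowed(src_layer: int, dst_layer: int) -> bool:
    """Assumed policy: allow traffic only within a layer or between adjacent layers."""
    return abs(src_layer - dst_layer) <= 1


# Hypothetical (source layer, destination layer) pairs seen on the network.
flows = [(2, 1), (4, 3), (5, 1), (4, 0)]

for src, dst in flows:
    verdict = "allow" if flow_allowed(src, dst) else "deny"
    print(f"L{src} ({zone_of(src)}) -> L{dst} ({zone_of(dst)}): {verdict}")
```

A check like this makes flat or weakly segregated networks visible: any flow that jumps straight from the IT layers to the control layers shows up as a denial.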
We can bifurcate the weaknesses of OT/ICS networks into two umbrella-categories, namely, “technological” and “procedural.”
Technological weaknesses start with the design of the network architecture. Some OT networks are improperly exposed to other networks, either without strong perimeter controls or with segregation deliberately weakened to increase convergence. Often, the remote access provided to OEMs for troubleshooting and maintenance is poorly configured. Such misconfigurations may allow malicious entities to intrude into OT networks.
For instance, in February 2021, a hacker remotely accessed the water treatment plant of Oldsmar, a city in Florida, and briefly raised the level of lye in the drinking water. Thankfully, an operator spotted the breach and immediately returned the chemical to its normal level, saving 15,000 residents from poisoning-related complications.
Another major vulnerability is the lack of infrastructure hardening. Often the critical elements of an OT network, such as HMIs/operator stations, engineering stations, and switches, are not hardened. Many of these elements also run on default, vendor-provided configurations. This gives attackers significant leverage, as the details of these default configurations are available in the public domain. Systems and network devices are often not patched or updated frequently for operational reasons, and it is normal for them to go without updates for years. OT systems also use weak clear-text protocols or protocols with known security issues.
Even where security controls are employed on OT systems, they tend to be weak. For example, the antivirus software in use is most often outdated.
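The hardening gaps listed above lend themselves to a simple inventory check. The following is a minimal sketch, assuming an asset inventory is already available; the `Asset` fields, the list of clear-text protocols, and the one-year patch window are all illustrative assumptions rather than values taken from any standard.

```python
from dataclasses import dataclass
from datetime import date

# Assumed examples of protocols that carry data without encryption.
CLEARTEXT_PROTOCOLS = {"telnet", "ftp", "http", "modbus-tcp"}
MAX_PATCH_AGE_DAYS = 365  # hypothetical policy threshold


@dataclass
class Asset:
    """A hypothetical inventory record for one OT device."""
    name: str
    default_config: bool
    protocols: set
    last_patched: date


def audit(asset: Asset, today: date) -> list:
    """Return a list of hardening findings for a single asset."""
    findings = []
    if asset.default_config:
        findings.append("uses vendor default configuration")
    exposed = sorted(asset.protocols & CLEARTEXT_PROTOCOLS)
    if exposed:
        findings.append("clear-text protocols: " + ", ".join(exposed))
    if (today - asset.last_patched).days > MAX_PATCH_AGE_DAYS:
        findings.append("not patched within policy window")
    return findings


# Hypothetical operator station.
hmi = Asset("hmi-01", True, {"http", "rdp"}, date(2019, 3, 1))
for finding in audit(hmi, today=date(2021, 6, 1)):
    print(f"{hmi.name}: {finding}")
```

In practice the patch-age finding would feed a risk decision rather than an automatic update, since OT patches need testing before deployment.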
From a procedural point of view, too, the guidelines and policies of OT networks are weak. The procedures and policies related to the operation and maintenance of a plant are usually well-defined, but those pertaining to information security are not. The following are a few processes that are extremely critical for the security of OT networks but are often ignored.
- Asset Management (Procurement, Implementation & Decommissioning)
- Patch Management
- Remote Access Management
- Configuration Management
- Backup & Restore
- Incident Management
- Access Control Management
How can we prevent such attacks?
As attacks on such establishments increase, so do the products and the market for OT security. Lately, OEMs who provide and integrate OT/ICS plants and systems have been adopting the security-by-design concept. There are specific standards, frameworks, and agencies guiding this transition from plain-Jane, unsecured, flat-networked OT environments to a more secure arena.
The concerned system owners, integrators, and even assessors should adopt the “defence-in-depth” paradigm.
Defence-in-depth allows us to implement controls at each layer for various devices and introduces a deterrence factor for attackers. It also helps in robust incident response and monitoring. Furthermore, there are tasks that are elementary but vital, and that provide strong protection against attacks when implemented and monitored correctly.
- Secure Network Design: Ensure that the different layers and networks are properly segregated. Redundancy of devices and network paths will prove valuable in case of failure. Also, ensure that there is provision to introduce monitoring technologies and an industrial DMZ.
- Device Hardening: Ensure that all manageable/configurable devices such as desktops, servers, switches, and RTUs are hardened, with unnecessary, insecure, or default services and configurations appropriately managed.
- Device Updates & Patches: Ensure that the devices in all layers are up to date. Where updates are not compatible with the OT applications/functions, ensure that controls are implemented to mitigate the risks originating from outdated operating systems and applications.
- Policies and Procedures: Extend your current IT security policies and procedures to cater to the complex requirements of OT networks. Follow mature, industry-specific standards/frameworks such as NERC-CIP for power establishments, IEC 62443 for manufacturing units, C2M2-ONG for oil/gas, etc. For some establishments, there could be regulatory requirements. Those not under such an obligation can adapt these standards to their needs or refer to a universal maturity framework like NIST CSF.
- Monitoring & Visibility: Ensure that mechanisms are in place to provide visibility into the OT network, including the devices connected, the operations performed, and the configuration changes made.
- Incident Response & Recovery: Implement mechanisms that facilitate forensic activities and incident handling. Ensure that there is enough visibility and detection capability to carry out such processes with ease.
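One elementary way to get visibility into configuration changes and newly connected devices is to compare the current state of the network against a known-good baseline. The sketch below assumes device configurations can be exported as text; the device names and configuration strings are hypothetical, and a real deployment would collect them through whatever mechanism the vendor supports.

```python
import hashlib


def fingerprint(config_text: str) -> str:
    """Return a SHA-256 fingerprint of a device configuration."""
    return hashlib.sha256(config_text.encode("utf-8")).hexdigest()


def detect_drift(baseline: dict, current: dict) -> dict:
    """Compare current configs (device -> text) with baseline fingerprints."""
    report = {"changed": [], "new": [], "missing": []}
    for device, config in current.items():
        if device not in baseline:
            report["new"].append(device)          # device not seen at baseline time
        elif fingerprint(config) != baseline[device]:
            report["changed"].append(device)      # configuration was modified
    report["missing"] = [d for d in baseline if d not in current]
    return report


# Hypothetical baseline taken when the network was in a known-good state.
baseline = {
    "plc-01": fingerprint("setpoint=100"),
    "switch-01": fingerprint("vlan 10"),
}
# Hypothetical snapshot collected later.
current = {
    "plc-01": "setpoint=100",
    "switch-01": "vlan 10\nvlan 99",
    "laptop-77": "unknown",
}

print(detect_drift(baseline, current))
```

Storing only fingerprints in the baseline keeps sensitive configuration details out of the monitoring system, at the cost of not being able to show what exactly changed.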