General guidelines and realities of managing a cybersecurity program for critical national infrastructure
By Juan Vargas, Cybersecurity and Engineering Consultant, Artech, LLC
What’s the reality of managing a cybersecurity program for critical national infrastructure? Twenty years ago, we had no idea. Companies didn’t have to get serious about protecting infrastructure until the North American Electric Reliability Corporation (NERC), in the wake of the attacks on 9/11, forced power companies into mandatory compliance with an early version of its Critical Infrastructure Protection (CIP) standards. That change effectively created an entire ecosystem of products and services for the world of Operational Technology (OT) that we didn’t know we needed.
While the definition of critical infrastructure may change in the future (it’s been circulating in the news that the United States may expand the definition to include water plants), my background is where it all started: power generation. Over the years I’ve watched organizations reorganize again and again to keep up with ever-changing regulation. It is only fair for newcomers to get a proper introduction to what has worked and what hasn’t.
A common misconception about managing an OT cybersecurity program is that it is mostly about choosing the right software. Or the newest software. Or the most powerful software. While the software tools have gotten significantly better, cheaper, and more effective, the biggest challenge has been managing and executing highly advanced programs with the existing talent pool: a workforce that we didn’t train to use or understand IT software, and executives who don’t see the return on investment of these initiatives.
Painful as it has been, the NERC CIP standards have been largely successful in their goal of helping protect critical infrastructure. How safe power plants really are may deserve a deeper analysis in the future, but at least they are safer than before and get better with every iteration of the standards. The standards have been so successful that they have become an international reference for others to follow. Canada and Mexico have adopted them, as well as several parts of Europe and many countries in Central and South America. They are a great starting point for any nation seeking to improve its resilience and reliability. And given that NERC revises the standards frequently, the trend will likely continue in years to come. But how does this translate into actions a program manager can take to avoid the growing pains?
Understanding the usual challenges of a program manager
Let’s start with what seems logical. A newly hired program manager gets support from management to roll out a program on a limited budget. The program manager knows he needs software to make it work, so imagine he walks into a store stocked with every cybersecurity product on the market. He thinks of the easiest possible solution to the problem. Can he buy cutting-edge software and ask his IT team to install and support it at the power plant? Sure. But that’s a big mistake. Who is going to support it? Does IT know how the control system works? Will the control system vendor support you when things break? As it turns out, support is the keyword here and is crucial for your program’s success because, otherwise, you are on your own. Soon after, you will have to become an expert in things you should not have to be an expert in. This approach rarely works in OT because it is very slow and costly. It relies too heavily on the program manager’s ability to quickly become a subject matter expert and to hire and retain top talent to build a customized program that works. The reality is that IT methods don’t translate well into the OT world, vendors won’t support your decisions, and your program will suffer greatly each time an employee leaves for a better job.
We have learned that the least expensive and most effective way to manage a cybersecurity program is by having a long-term relationship with key vendors and learning to develop three internal competencies that scale well for power plants. Those competencies follow three career paths in compliance, engineering, and operations.
Why working with vendors is important
To expand further, you are not looking for the lowest price when working with OT vendors. Instead, you are looking for a reputable cybersecurity strategy, guaranteed integration with your control system, and customer support strong enough to give your teams the help they need. A long-term relationship with vendors will also help alleviate talent attrition and training needs. Lastly, unlike IT vendors, OT vendors have experience with power plant personnel and their operational realities.
The role of the compliance analyst
You meet your compliance needs with the help of compliance analysts. They are typically company employees, preferably people who are very comfortable extracting and manipulating data from various sources. You also want them to be good at coding. And if anyone has to know the NERC requirements, it is them. They may go to the plant for a few days now and then, but the bulk of their work is back at the office or perhaps at home. They aim to avoid NERC fines by generating evidence that the power plants comply with CIP standards. You will also use the compliance data to develop your Key Performance Indicators (KPIs) for upper management and to inform your engineering team’s decisions when they do maintenance.
If your program is new and you are hiring an inexperienced (but technically sound) analyst, then the best strategy is to let them work at a single site first. Let them get acquainted with the compliance requirements and the tools available at the plant until they figure out a way to automate the extraction of this data and can handle multiple sites simultaneously. Power companies have often merged the roles of compliance analyst and engineer, but the results have not been great because the two roles often have conflicting interests. Conversely, a well-trained compliance analyst could easily oversee five or more plants as their methods improve. They can also train new hires, reducing the learning curve once the program is underway.
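As a rough illustration of what automating compliance evidence can look like once the data is extracted, here is a minimal Python sketch that rolls per-asset patch records up into a per-site KPI. The record layout (`AssetRecord`), the function name, and the 90-day threshold are all hypothetical choices for illustration; they are not NERC requirements or any vendor’s export format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AssetRecord:
    """One row of extracted evidence (hypothetical layout)."""
    site: str
    asset: str
    last_patched: date


def patch_compliance_by_site(records, as_of, max_age_days):
    """Return {site: fraction of assets patched within max_age_days of as_of}."""
    totals = {}  # site -> (assets within window, total assets)
    for rec in records:
        within, count = totals.get(rec.site, (0, 0))
        is_current = (as_of - rec.last_patched).days <= max_age_days
        totals[rec.site] = (within + int(is_current), count + 1)
    return {site: within / count for site, (within, count) in totals.items()}


# Hypothetical sample data for two sites.
sample = [
    AssetRecord("Plant A", "HMI-01", date(2024, 1, 10)),
    AssetRecord("Plant A", "HIST-01", date(2023, 6, 1)),
    AssetRecord("Plant B", "HMI-01", date(2024, 2, 1)),
]
kpi = patch_compliance_by_site(sample, as_of=date(2024, 3, 1), max_age_days=90)
```

The same per-site figures can feed both the report for upper management and the engineers’ maintenance planning, which is why it pays to keep the extraction in one place.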
Engineers provide routine maintenance
Cybersecurity engineers are typically a rotating workforce traveling to different sites for maintenance. They install software patches, update antivirus definitions and various software packages, and troubleshoot common issues. The right engineer will know computer systems very well and be confident troubleshooting for hours until they find a solution. Recruiting IT professionals has not yielded results as good as hiring former control engineers. Smaller companies that don’t find it economically viable to employ full-time engineers can outsource these roles to vendors. A trained engineer can typically complete their tasks in about one week per site. Patching every plant once a month is too costly and resource-intensive for most companies, so most power plants complete these tasks less frequently, for example, once every three months. Under extraordinary circumstances, NERC allows for exceptions. But in general, it is not ideal to rely on exceptions because the risk of non-compliance is higher.
The goal of the engineers is to provide a working system for compliance analysts to extract their data from and to provide company employees with the tools to protect their systems. When a change is needed, the engineers are the people that know the software intimately to make the changes. However, engineers are not the end users of most tools they help maintain.
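The cost difference between monthly and quarterly patching is simple arithmetic, and a short Python sketch makes it visible. It takes the figure of about one week per site from above; the 13-week quarter, the 4-week month, the 20-site fleet, and the `available_fraction` parameter (time left after travel, training, and leave) are assumptions added for illustration.

```python
import math


def engineers_needed(num_sites, weeks_per_site=1.0, cycle_weeks=13,
                     available_fraction=1.0):
    """Minimum engineers so every site is serviced once per patch cycle."""
    if num_sites < 0 or weeks_per_site <= 0 or cycle_weeks <= 0:
        raise ValueError("sites must be >= 0; durations must be positive")
    if not 0 < available_fraction <= 1:
        raise ValueError("available_fraction must be in (0, 1]")
    workload_weeks = num_sites * weeks_per_site        # total work per cycle
    capacity_weeks = cycle_weeks * available_fraction  # one engineer's capacity
    return math.ceil(workload_weeks / capacity_weeks)


# Hypothetical fleet of 20 sites.
quarterly = engineers_needed(20, cycle_weeks=13)  # one visit per quarter
monthly = engineers_needed(20, cycle_weeks=4)     # one visit per month
```

For this hypothetical fleet, the quarterly cycle needs two engineers and the monthly cycle needs five, before accounting for any time away from site work.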
Who are the end users?
Letting local employees at a power plant run the day-to-day cybersecurity operations can be a controversial decision. It is the norm in the IT world to have a dedicated (read “trusted”) team handle computer security concerns and relieve employees of any responsibility for configuring their computers. However, in the OT world, your operations team has to be co-responsible for information security because they will eventually need to enable or disable features, install new software, and work with vendors to complete their work. In general, experience has shown that relying on external engineers for these tasks results in security gaps, long waiting times, and a lack of oversight and accountability.
Here’s a thought experiment we can use to frame it in terms of what we already know. Most employees have no medical training, and it stands to reason that it would be dangerous for them to make medical decisions for themselves or others. However, it is a well-established practice to train employees in CPR and a series of first-response techniques so they can care for others while medical professionals are on their way. Similarly, because we have limited resources, we have to train a subset of the power plant employees in first-response procedures to keep their systems safe. It is the same reason we don’t have doctors sprinkled around the office. That is not to say that every employee has the same level of access. Operators, I&C Technicians, and DCS Engineers may all have different access levels. And some features, like access to the firewall configuration, may not be accessible to anyone at the site. You give access to people based on what they can protect.
What does the Program Manager do?
Finally, the program manager’s role is to understand the big picture and allocate capital resources to keep the program running. They must find the right talent, which is a tremendous challenge, and communicate the team’s vision so that people can go out and do their jobs. Program managers will also negotiate with vendors over time to compensate for temporary talent gaps and to customize their software offering to reflect changing realities. Awareness of these realities leads to another very tough challenge for the program manager: to be realistic about what the cybersecurity program can accomplish for the organization.
That last idea is often left unexplored. As advanced as your cybersecurity program may be, a great manager understands that there are many moving parts that a sophisticated attacker could exploit in ways we can’t even imagine. For example, they know that they cannot have engineers deploy patches in real time. There is a lag. End users can be sloppy from time to time. And no one in their organization may have the tools or expertise necessary to block or even detect a zero-day exploit. Hence it is vital to enable event logging, often to an external server, and to have a contingency plan. The logs will be your black box, helping you write a post-mortem and work with vendors to understand what happened. The contingency plan will help you contain a problem as soon as it is detected. Sometimes the contingency plan is as simple as a clearly identified uplink cable to the firewall that plant operators disconnect in an emergency to isolate the control network.
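To show what logging to an external server can mean in practice, here is a minimal Python sketch that forwards application events to a remote syslog collector using the standard library’s `logging.handlers.SysLogHandler`. It is only a sketch: the function name, the logger name `ot.security`, and the message format are assumptions, and a real control system would normally use the forwarding mechanism its vendor supports.

```python
import logging
import logging.handlers


def build_remote_logger(host, port=514, name="ot.security"):
    """Return a logger that forwards records to a syslog server over UDP.

    host and port identify the external log server (hypothetical values are
    supplied by the caller); UDP is SysLogHandler's default transport.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(
        logging.Formatter("%(name)s: %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

A caller would create the logger once with the collector’s address and then record events such as `logger.warning("failed login on HMI-01")`. Because the records leave the machine right away, they remain available for the post-mortem even if the originating host is compromised or isolated. UDP delivery is not guaranteed, so this sketch trades reliability for simplicity.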
We’ve come a long way. Many companies are still trying to figure out the roles for their employees, and many are still writing exceptions because they can’t keep up with patching even once a year. A few still believe they can merge OT into IT. And several others are taking advantage of the opportunities presented by the Biden administration to improve their cybersecurity programs in new ways. Some others, mainly unregulated utilities, don’t yet have a cybersecurity program. As a spectator of different strategies, I see the value of witnessing these organizational experiments and providing insight into what appears to be working best.
About the Author
Juan Vargas is a Cybersecurity and Engineering Consultant at Artech, LLC.
A graduate of Carnegie Mellon University, he started his career doing data analysis at Intel Corp before focusing on automation and control systems at Emerson Electric and finally becoming a cybersecurity expert for those systems. He has worked with most control systems in power generation and on various projects for all of the Top 10 utility companies in the United States.
Juan can be reached on Twitter @JuanVargasCMU.