'Data storm' blamed for nuclear plant shutdown
Malfunctioning control device causes fatal spike in traffic
The US House of Representatives' Committee on Homeland Security called this week for the Nuclear Regulatory Commission (NRC) to further investigate the cause of excessive network traffic that shut down an Alabama nuclear plant.
During the incident, which happened last August at Unit 3 of the Browns Ferry nuclear power plant, operators manually shut down the reactor after two water recirculation pumps failed. The recirculation pumps control the flow of water through the reactor, and thus the power output of boiling-water reactors (BWRs) like Browns Ferry Unit 3. An investigation into the failure found that the controllers for the pumps locked up following a spike in data traffic - referred to as a "data storm" in the NRC notice - on the power plant's internal control system network. The deluge of data was apparently caused by a separate malfunctioning control device, known as a programmable logic controller (PLC).
In a letter dated 14 May but released to the public on Friday, the Committee on Homeland Security and the Subcommittee on Emerging Threats, Cybersecurity, and Science and Technology asked the chairman of the US Nuclear Regulatory Commission to continue to investigate the incident.
"Conversations between the Homeland Security Committee staff and the NRC representatives suggest that it is possible that this incident could have come from outside the plant," committee chairman Bennie G Thompson (D-Miss.) and subcommittee chairman James R Langevin (D-RI) stated in the letter.
"Unless and until the cause of the excessive network load can be explained, there is no way for either the licensee (power company) or the NRC to know that this was not an external distributed denial-of-service attack."
The August 2006 incident is the latest network threat to affect the nation's power utilities. In January 2003, the Slammer worm disrupted systems of Ohio's Davis-Besse nuclear power plant, but did not pose a safety risk because the plant had been offline since the prior year. However, the incident did prompt a notice from the NRC warning all power plant operators to take such risks into account.
In August 2003, nearly 50 million homes in the northeastern US and neighbouring Canadian provinces suffered from a loss of power after early warning systems failed to work properly, allowing a local outage to cascade across several power grids. A number of factors contributed to the failure, including a bug in a common energy management system and the MSBlast, or Blaster, worm which quickly spread among systems running Microsoft Windows, eventually claiming more than 25 million systems.
I first came across this phenomenon about 8 or 9 years ago. Several PLCs were connected to an Ethernet network (the Ethernet connection was only for remote monitoring/programming - not for control), and all had faulted, shutting down a hydro-electric power station. When I investigated, I discovered that all the PLCs had erased their application software, looking like they'd just come out of the box.
An IT engineer had been on site at the time replacing a blade in a hub.
When I discussed it with the manufacturer, it turned out this was a known problem, due to what they called an 'Ethernet Storm', but not one they considered serious enough to need publicising. They even had a fix, but wanted £2000 per processor to upgrade the firmware.
I pointed out that there were serious implications especially for Chemical/Nuclear plants etc. and that they should be proactively addressing the problem. They eventually agreed to upgrade the problem to something called a 'code 10' and that way all our processors would be upgraded for free. In the new revision, there was a new register called an 'Ethernet Storm counter'.
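How the manufacturer's 'Ethernet Storm counter' register actually works is not something I know. As a rough sketch of the general idea only, assuming a storm means "more frames than a set limit within a sliding time window", it could look like the following. The class name, the threshold and the window length are all made up for illustration; this is Python, not PLC firmware.

```python
from collections import deque


class StormCounter:
    """Sliding-window frame counter that latches how many storms were seen.

    Hypothetical illustration; not the vendor's register logic.
    """

    def __init__(self, max_frames, window_s):
        self.max_frames = max_frames  # frames tolerated per window (assumed)
        self.window_s = window_s      # window length in seconds (assumed)
        self.arrivals = deque()       # timestamps of recent frames
        self.storm_count = 0          # number of storms detected so far
        self.in_storm = False

    def on_frame(self, now):
        """Record one received frame at time `now`; return True while in a storm."""
        self.arrivals.append(now)
        # Drop timestamps that have aged out of the window.
        while self.arrivals and now - self.arrivals[0] > self.window_s:
            self.arrivals.popleft()
        over = len(self.arrivals) > self.max_frames
        if over and not self.in_storm:
            self.storm_count += 1  # count each storm once, on its rising edge
        self.in_storm = over
        return self.in_storm
```

A device could use the returned flag to stop servicing the Ethernet port until traffic calms down, while the counter itself gives the maintenance engineer evidence that a storm happened at all.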
Since then, the problem has recurred and we are now uprevving all our processors to the latest revision of firmware - which they say is now definitely Ethernet storm resistant (we'll see!).
In critical applications where control system components need to communicate with each other, we do use ControlNet, Modbus, etc., monitor for device failures and follow all the other good practice that Ranjan advocates, but Ethernet is widely used for non-critical connections for MIS, remote monitoring etc. Who would foresee that a non-critical connection to an Ethernet network could erase the memory of a processor?
Why Ethernet? Use a Field Bus
My concern is that people are using Ethernet when there are numerous rugged industrial buses such as PROFIBUS and ControlNet, not to mention the options of segmenting your network (no matter what bus you use), de-rating the network load, AND putting monitoring in your PLC or PAC code to look for devices falling off the bus and let the SCADA operator know.
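The monitoring part can be sketched as a simple heartbeat watchdog. This is only an illustration in Python, assuming each device is expected to report in at least once per timeout period; on a real PLC or PAC it would be written in ladder logic or structured text, and the device names and timeout below are invented.

```python
class BusWatchdog:
    """Tracks the last time each device was heard from on the bus.

    Hypothetical illustration of "look for devices falling off the bus".
    """

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s  # silence longer than this counts as lost
        self.last_seen = {}         # device name -> time of last heartbeat

    def heartbeat(self, device, now):
        """Call whenever a valid message arrives from `device`."""
        self.last_seen[device] = now

    def missing(self, now):
        """Return the devices that have been silent for longer than the timeout."""
        return sorted(
            device
            for device, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


# Example scan: two pump controllers, one of which stops responding.
watchdog = BusWatchdog(timeout_s=2.0)
watchdog.heartbeat("pump_a", now=0.0)
watchdog.heartbeat("pump_b", now=0.0)
watchdog.heartbeat("pump_a", now=3.0)
for device in watchdog.missing(now=3.5):
    print(f"ALARM: {device} has dropped off the bus")  # raise to the SCADA operator
```

The point of the design is that the alarm depends only on silence, so it fires whether the device has locked up, lost its link, or been drowned out by traffic.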
This system was just badly engineered. Everyone knows that PLCs and PACs have flaky Ethernet stacks. They hate too much chatter and disappear from the network from time to time. The automation software for SCADA isn't much better.
If it is critical, I would look at wiring/networking it in such a manner as to make it just a little bit rugged.
PLCs don't run Windows. They are firmware devices. It's always possible that there was a hardware failure in the network equipment.