To say that Will Brown was unhappy would be an understatement. He felt cornered and he was looking for a way out.
Will, operations director for the Western Sky Data Center, had just learned that the facility had recently exceeded the peak load limit in its power contract – a costly mistake that raised its electricity rates.
He was already struggling to keep maintenance and operational costs within budget. Higher-density servers were becoming a critical issue too: the facility had just brought four new racks of servers online to meet promised service levels – even though average traffic had remained relatively flat.
The last thing Will wanted to do was explain why they were now paying a penalty rate for power. Especially because the explanation was so frustrating.
The error occurred on the first hot day of summer, when the HVAC was working just as hard as the servers themselves. The backup diesel generators were eventually fired up – but not until it was too late. Simply put, nobody figured out in time that the facility was exceeding its peak load limit.
Will had learned that an alarm had gone off that should have directed attention to power usage, but it was one of many alarms that afternoon. Staff had been kept busy with an air conditioning unit, unusually high server traffic and a stubborn electrical control issue that had forced them to take a piece of gear offline for repair. By the time anyone was free to deal with the power usage alarm, the damage was done.
Will desperately wanted to yell at someone, but from everything his investigation had uncovered, nobody had failed to do their job. Somehow, he didn’t expect that “It was just one of those things” was going to be a satisfactory explanation at headquarters.
How can Will improve monitoring of power usage in the future? Is there something he can do about the variety of other issues that constantly chip away at the data center’s performance?