In the face of global upheaval on multiple axes – from rapid climate change and a transition to cleaner energy supplies, to a reliance on digital infrastructure to underpin our modern economy – data centre operators are increasingly concerned about the power usage effectiveness (PUE) of their facilities.
PUE is a ratio that describes how efficiently a data centre uses energy; specifically, it compares the total energy a facility consumes with the energy that actually reaches its computing equipment. According to the International Energy Agency, strong growth in demand for data services continues to be mostly offset by ongoing efficiency improvements for servers, storage devices, network switches and data centre infrastructure, as well as the high and growing share of services met by highly efficient cloud and hyperscale data centres.
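As a point of reference, PUE is conventionally expressed as total facility energy divided by IT equipment energy. The symbol names below are labels chosen here for illustration, not notation taken from any cited report:

```latex
\[
\mathrm{PUE} = \frac{E_{\text{total facility}}}{E_{\text{IT equipment}}}
\]
```

A value of 1.0 would mean every unit of energy drawn by the site reaches the computing equipment; anything above that is overhead such as cooling and power distribution.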
Despite this, there is growing concern that digital infrastructure will put undue pressure on energy grids and emissions targets. In Ireland, Hannah Daly, lecturer in sustainability at University College Cork, told an online panel that the predicted fast growth of digital infrastructure would make emissions targets unmeetable, as data centres already consume ten per cent of the Irish grid’s output. Elsewhere, opposition to data centre construction in the Netherlands, on the grounds of sustainability, has led to temporary bans in Amsterdam and Flevoland, and has left a new Meta development in Zeewolde subject to a vote of approval by the national government.
Given the intense scrutiny that data centres are placed under in terms of energy consumption, it is surprising to see a rise in what are known as ‘zombie servers’. Essentially, a zombie server is a server that no longer has any external communications or visibility and contributes no compute resources, but continues to consume electricity. The Uptime Institute, a digital infrastructure advisory organisation, goes further and expands the concept of a zombie server to include servers that are less than five per cent utilised.
One of the most common complaints about PUE as a measure is that it does not take into account the overall performance of a facility from an efficiency or utilisation standpoint. The only way to keep PUE even, according to Dale Sartor, a scientist and engineer at Lawrence Berkeley National Laboratory, is to increase the IT load. But, obviously, this is not a recommended practice if the loads are made up of poorly utilised zombie servers and old, inefficient IT equipment.
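The complaint above can be made concrete with a little arithmetic. The sketch below uses invented figures and assumes, as a simplification, that facility overhead stays fixed when IT load rises; in a real facility cooling demand would also change with load, so the numbers are illustrative only.

```python
def pue(total_facility_kwh: float, it_kwh: float) -> float:
    """Return PUE: total facility energy divided by IT equipment energy."""
    if it_kwh <= 0:
        raise ValueError("IT energy must be positive")
    return total_facility_kwh / it_kwh


# Hypothetical figures, chosen only for illustration.
overhead_kwh = 400.0    # cooling, power distribution, lighting (assumed fixed)
useful_it_kwh = 1000.0  # servers doing real work
zombie_it_kwh = 250.0   # servers drawing power but doing no useful work

pue_without_zombies = pue(overhead_kwh + useful_it_kwh, useful_it_kwh)
pue_with_zombies = pue(
    overhead_kwh + useful_it_kwh + zombie_it_kwh,
    useful_it_kwh + zombie_it_kwh,
)

print(f"PUE without zombies: {pue_without_zombies:.2f}")
print(f"PUE with zombies:    {pue_with_zombies:.2f}")
```

Under these assumptions the facility with zombie servers reports the lower, apparently better PUE, even though it draws 250 kWh more and delivers no additional useful work. PUE cannot tell a busy server from an idle one.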
Additionally, a 2017 report from Koomey Analytics and Anthesis shows that 25 to 33 per cent of data centre investments are tied up with zombie servers and virtual machines, and that capital (totalling in the tens of billions of dollars in the US alone) is generating zero financial return. Clearly, this is an issue that cannot be ignored.
Identifying zombie servers and security risks
Jay Dietrich, research director of sustainability at the Uptime Institute, says that much of the past focus on zombie servers has been on those in enterprise data centres. More recently, however, zombified virtual machines (VMs) based in the cloud have added to the problem.
So, how do these servers deteriorate to the point that they become comatose? “I would argue it is human nature,” explains Dietrich. “What it really boils down to is companies do not have processes to track the behaviour of their servers, or, in some cases, where their servers are even located. There is no way for companies to know that their VM or server has been inactive for five weeks, for example.
“On top of that, data centre operators sometimes do not know the owner of particular VMs or servers, because their records are not up to date and accurate. That adds another level of complication because operators cannot just go around and shut down inactive machines; eventually, they will come across a customer who will be upset if they do. So, the path of least resistance for operators is to just simply leave them alone. Not to mention that those servers, no matter whether they are active or comatose, will generate revenue for the operator.”
However, there are more sinister reasons why identifying and eliminating zombie servers should be a pressing issue. Zombie servers are unlikely to have the latest security patches, which makes them an open door into many enterprise data centres, one that malicious actors can exploit to reach stored data. With reliability and security being key drivers of data centre success, addressing this issue should be of great importance.
What to do with zombie servers
Unfortunately, companies have thus far seemed either reluctant to act on this issue or oblivious to it. The 2017 Koomey Analytics and Anthesis analysis showed that about one-quarter of physical servers were zombies in companies that had taken no action to remove them, a group that includes the vast majority of companies running enterprise data centres.
The report concludes that: “Chief financial officers everywhere should leap to fix this problem, because the potential for improving operations and management of these facilities is vast and because zombie servers represent a security risk.”
“For me, it is about process discipline,” asserts Dietrich. “Companies need to look at their utilisation curves on a periodic basis and see where power is consumed. There are lots of software packages out there that can do this automatically, and are specifically built for workload placement to optimise the utilisation of servers that are running, as well as tell users which servers are no longer in use.”
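A periodic review of this kind can be sketched in a few lines. The example below is a minimal illustration, not any vendor's tool: the `ServerRecord` type, the sample data and the choice of CPU utilisation as the metric are all assumptions made here, and the five per cent threshold is the Uptime Institute figure cited earlier in the article.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional

# Threshold in per cent, taken from the Uptime Institute's broader definition.
ZOMBIE_UTILISATION_THRESHOLD = 5.0


@dataclass
class ServerRecord:
    """Hypothetical inventory entry for a physical server or VM."""
    name: str
    owner: Optional[str]          # None when records do not identify an owner
    cpu_utilisation: List[float]  # periodic samples, in per cent


def find_zombie_candidates(
    records: List[ServerRecord],
    threshold: float = ZOMBIE_UTILISATION_THRESHOLD,
) -> List[ServerRecord]:
    """Return records whose average utilisation is below the threshold."""
    return [
        r for r in records
        if r.cpu_utilisation and mean(r.cpu_utilisation) < threshold
    ]


# Invented sample data for illustration only.
records = [
    ServerRecord("web-01", "ecommerce", [42.0, 55.5, 38.0]),
    ServerRecord("batch-07", None, [0.5, 1.0, 0.0]),
    ServerRecord("vm-113", "finance", [4.0, 3.5, 6.0]),
]

candidates = find_zombie_candidates(records)
unowned = [r for r in candidates if r.owner is None]

for r in candidates:
    print(f"{r.name}: owner={r.owner or 'UNKNOWN'}")
```

Separating out candidates with no recorded owner reflects the difficulty Dietrich describes: a machine cannot safely be shut down until someone has been found to confirm it is no longer needed.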
Once identified, cloud-based zombie servers can usually be repurposed. The machine or VM is taken offline and then scrapped, sent for refurbishment, or reassigned to someone else. Alternatively, the workloads of several servers running at around five per cent utilisation can be consolidated onto fewer machines.
Nevertheless, identifying a zombie server and its owner remains a complex process that businesses and operators must continue to improve upon. New research should also be commissioned in order to track progress, helping to eliminate the prevalent institutional and measurement issues that inhibit the discovery and elimination of zombie servers. The alternative is to continue to allow them to act as a silent parasite on the digital infrastructure world: leeching electricity, rendering investments unproductive, and weakening cyber security defences.