Google's data center in Council Bluffs, Iowa, reportedly experienced an electrical incident at noon on Monday, August 8, 2022. According to local police, three electricians were critically injured while working at a substation near the data center building.
Hours after the explosion, some of Google's services, including Search and Maps, reportedly suffered brief outages.
The accident was the second at a Google data center within a month. On July 19, a Google Cloud data center in London suffered an outage. According to a report released by Google, the incident was caused by "simultaneous failures of multiple, redundant cooling systems." It happened during the record heatwave in London, when outdoor temperatures were so high that the cooling systems could not keep the machines at a safe operating temperature. Service was not restored until the following morning.
The importance of data centers needs no explanation, yet news of data center incidents such as explosions, fires, and power outages seems to have become more frequent in recent years. With extreme heat now affecting much of the world, summer appears to have become a nightmare for data centers.
Data centers are facing more risks in summer
A data center houses a large amount of equipment and consumes a significant amount of energy. Over the years, accidents in data centers have repeatedly caused substantial damage.
In July 2014, a fire broke out in the data center of Chongqing Rural Commercial Bank, destroying the entire server room. A direct loss of over RMB 100 million was reported;
In October 2015, the data center of Microsoft Azure in Shanghai was affected by a power outage caused by a fire in server rooms, resulting in the cloud computing service being unable to provide services to users in industries such as finance, internet, and real estate;
In April 2017, a fire caused by a faulty UPS battery pack broke out in the network data center of the Beijing University of Posts and Telecommunications. The incident cut off internet access at several universities in Beijing;
In August 2018, a fire broke out in a building under construction for an AWS data center in Tokyo. During the eight-hour fire, five people died and 50 were injured;
In November 2018, KT, one of the three largest telecommunications companies in South Korea, suffered a fire at its building in downtown Seoul. The accident disrupted several public services, including police, hospital, and financial services;
In March 2021, OVHcloud, one of Europe's leading cloud computing companies, suffered a severe fire in one of its server rooms in Strasbourg, France. The fire affected 3.6 million websites, and some customer data could not be recovered as a result.
Fire at OVHcloud. Image source: Internet
The server rooms of data centers hold massive amounts of data, so keeping them secure is essential to the entire computing and information system. These rooms are fragile, however, and keeping them safe requires the involvement of the whole organization. A fire in a server room can cause irreparable damage.
Fires account for a substantial portion of accidents in data centers. Common causes include:
- UPS batteries.
- Excessive cable load. Adding equipment to a server room is easy, but upgrading the cabling to carry the extra load is harder. An overloaded cable can overheat and start a fire.
- Failure of air conditioners or other electrical equipment. Air conditioning is essential in a server room, but the equipment itself, along with electric heaters and humidifiers, can also cause fires.
- Secondary fires caused by flame spread.
- High temperatures and thunderstorms.
A data center with many computers running simultaneously will generate a considerable amount of heat, so a cooling system is essential to ensure heat is dissipated as quickly as possible. Furthermore, summers have become hotter in recent years, and outdoor temperatures place an increased strain on the cooling system of data centers.
In July, a record-breaking extreme heatwave hit the UK, which caused Google's cooling system in London to fail. Data centers are normally designed to withstand high temperatures, but the intense heat these days has far exceeded many operators' expectations.
Statistical data indicates that the optimal operating temperature for data center equipment is 22°C. Above this baseline, computer reliability drops by 25% for every 10°C increase in temperature.
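The rule of thumb above can be turned into a rough estimate. The sketch below assumes that the 25% drop compounds for every 10°C above the 22°C baseline, which is only one possible reading of the figure (a linear reading is also plausible). The function name `relative_reliability` and its parameters are illustrative, not taken from any standard.

```python
def relative_reliability(temp_c, baseline_c=22.0, drop_per_step=0.25, step_c=10.0):
    """Estimate reliability relative to the baseline temperature.

    Assumption: the 25% drop compounds for every 10 degrees C above the
    baseline. At or below the baseline the estimate is 1.0 (no penalty).
    """
    excess = max(0.0, temp_c - baseline_c)
    return (1.0 - drop_per_step) ** (excess / step_c)


for temp in (22, 32, 42):
    print(f"{temp} C -> {relative_reliability(temp):.4f}")
```

Under this assumption, equipment at 32°C keeps about 75% of its baseline reliability, and at 42°C about 56%.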
There can be little doubt that cooling systems are essential to data centers, but excessive use can result in significant CO2 emissions, exacerbating the greenhouse effect and creating a vicious circle.
Many technology companies are now exploring green, low-carbon, and energy-efficient cooling methods to cope with extreme weather while reducing energy consumption.
All for cooling
The need to dissipate heat has prompted some operators to choose unconventional locations for their data centers.
In 2013, Meta (Facebook) established a data center in Luleå, a city in northern Sweden near the Arctic Circle. It uses giant fans to draw in cold outside air to cool its servers;
On top of the glacier at the South Pole, U.S. scientists have built a data center that contains a high-performance computing cluster with more than 1,200 cores and three petabytes of storage;
In 2015, Alibaba Cloud opened its Qiandao Lake Data Center, where the average annual temperature is about 17°C. The constant temperature of the deep lake water lets the data center rely on lake water alone for cooling on 90% of the days in a year, saving over 80% of cooling energy;
In 2018, Microsoft sank a data center prototype with over 800 servers off the coast of Scotland's Orkney Islands;
The undersea data center of Microsoft. Image Source: People.cn
In Vita Berg Park, Stockholm, Sweden, a data center called Pionen White Mountains sits 30 meters beneath granite rock.
Owned by the Norwegian shipping company Smedvig, the Green Mountain data center is hidden inside a mountain next to a cold fjord that provides cooling water.
Located in Guian New Area, Guizhou province, the Tencent Guian Qixing Data Center has all its core equipment hidden in a mountain tunnel covering over 30,000 square meters.
The Tencent Guian Qixing Data Center. Image source: Xiaoxiang Morning Herald
It has been estimated that power accounts for 50% to 70% of a data center's overall costs, and that nearly half of this power cost goes to air conditioning. The sites described above suggest that high-latitude areas (including the polar regions), locations near water, and remote mountain regions are all viable choices, because their natural environments help maximize a data center's energy efficiency.
Take the deep sea as an example. Water conducts heat better than air, so the ocean can absorb much of the heat generated by the servers and keep temperatures down. Microsoft verified this and also reported that its undersea data center outperformed traditional data centers, with a failure rate one-eighth of that on land.
The same holds for remote mountain regions. Guian New Area in Guizhou province lies approximately 1,100 meters above sea level, with average temperatures of 14°C to 16°C and summer temperatures that rarely exceed 25°C. Thanks to the cool climate and the thick layers of soil and rock, caves dug into the mountains hold a more constant temperature, which greatly relieves the pressure on the cooling system. Building data centers in the mountains also limits human interference, which reduces accidents and helps protect user data.
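The two estimates above can be combined into a quick back-of-the-envelope figure for how much of a data center's total cost goes to cooling. The sketch below treats "nearly half" as exactly 0.5, which slightly overstates the result; the function name `cooling_share_of_total` is illustrative.

```python
def cooling_share_of_total(power_share, cooling_fraction=0.5):
    """Share of total cost spent on cooling.

    power_share: fraction of total cost that goes to power (0.50 to 0.70 here).
    cooling_fraction: fraction of the power cost spent on air conditioning
    (assumed to be 0.5 as a stand-in for "nearly half").
    """
    return power_share * cooling_fraction


low = cooling_share_of_total(0.50)
high = cooling_share_of_total(0.70)
print(f"Cooling is roughly {low:.0%} to {high:.0%} of total cost")
```

With these inputs, cooling alone comes to roughly a quarter to a third of overall costs, which helps explain why site selection focuses so heavily on natural cooling.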
Early detection and correction of problems
Choosing a data center site requires careful consideration; however, not every data center can take advantage of favorable local conditions.
Data centers are best protected through routine security procedures:
First, it is imperative to have an excellent offsite disaster recovery backup in place. This is the key to preventing complete data loss and ensuring that the system operates normally. In particular, it includes the following:
- Performing offsite backups of vital local data.
- Testing regularly whether the backup data is available.
- Providing hot redundancy of key data processing systems to ensure high availability.
- Choosing the backup mode that best fits other considerations, such as business needs and costs.
Second, make plans for both daily operations and emergencies. Daily operation and maintenance of a data center involves routine inspections, application changes, hardware and software upgrades, and responses to unexpected failures. Many data centers now run fully automated inspection systems that can customize inspection routes, generate inspection tasks automatically, and produce inspection reports with a single click. Operators should also establish early warning mechanisms and specifications to catch potential problems before they escalate. In addition, emergency plans and regular drills reduce the risk of downtime when an unexpected event does occur.
Third, reduce energy consumption. For instance:
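One simple way to test a backup regularly is to compare checksums of the source file and its offsite copy. The sketch below uses SHA-256 from Python's standard library; the helper names `sha256_of` and `backup_matches` are hypothetical, and a real deployment would typically also perform periodic test restores.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(source_path, backup_path):
    """True if the backup copy is byte-for-byte identical to the source."""
    return sha256_of(source_path) == sha256_of(backup_path)
```

A matching checksum shows only that the copy is intact, not that the system can actually be restored from it, so this check complements restore drills rather than replacing them.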
- in the data center, lights should be turned off when no employees are present;
- periodically check if idle equipment is still operating;
- in server rooms with hot aisle containment, servers should be placed back to back so that hot exhaust is confined to the enclosed aisle, maximizing airflow and cooling efficiency;
- double loop pipes should be installed in the air conditioning system to increase safety and reliability;
- the thermal load should be evenly distributed throughout the racks to minimize "hot spots".
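The last point in the list, spreading thermal load evenly across racks, can be approximated with a simple greedy heuristic: place the hottest servers first, each on the rack with the lowest running total. The sketch below is illustrative only. The function name `distribute_heat_load` and the wattage figures are made up, and real placement must also respect rack space, power circuits, and airflow.

```python
import heapq


def distribute_heat_load(server_watts, rack_count):
    """Greedily assign servers to racks to even out total heat per rack.

    Hottest servers are placed first, each on the rack with the lowest
    running total. This is a heuristic, not a guaranteed optimum.
    """
    heap = [(0.0, rack) for rack in range(rack_count)]  # (load, rack index)
    heapq.heapify(heap)
    racks = [[] for _ in range(rack_count)]
    for watts in sorted(server_watts, reverse=True):
        load, rack = heapq.heappop(heap)  # rack with the least heat so far
        racks[rack].append(watts)
        heapq.heappush(heap, (load + watts, rack))
    return racks


# Hypothetical heat output per server, in watts.
racks = distribute_heat_load([500, 400, 300, 300, 200, 100], rack_count=3)
print([sum(r) for r in racks])
```

For this example input, each of the three racks ends up with a total of 600 W, so no single rack becomes a hot spot.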
There is no doubt that data centers play a crucial role in the operation of an organization. Business and IT leaders must therefore keep an open mind and take a proactive approach to designing a robust IT infrastructure that can maintain business continuity during a potentially catastrophic event.