Kyoto University has lost a massive 77TB of critical research data from its supercomputer because Hewlett Packard Enterprise (HPE) issued a software update that caused a script to malfunction and delete backup data. As a result, days of work are gone, and a significant part of the wiped-out data is lost forever.
Kyoto University lost about 34 million files from 14 research groups, generated from December 14 to December 16, according to The Stack. GizChina reported that the university could not restore the data of four groups from backup, so that data is gone for good. Specialists at Kyoto initially thought that up to 100TB had been lost, but the final tally came to 77TB.
HPE pushed an update that caused a script designed to delete log files more than ten days old to malfunction. Instead of deleting only the old log files stored alongside backups in a high-capacity storage system, the script wiped out all files from the backup, erasing 77TB of critical research data.
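For illustration only, here is a minimal Python sketch of what a defensively written "delete logs older than ten days" routine might look like. It is not HPE's script, which has not been published in this report; the function name `purge_old_logs`, its parameters, and the `.log` suffix are all hypothetical. The sketch assumes the logs sit directly inside a single directory and shows three common safeguards: refusing an empty path, matching only log files, and defaulting to a dry run.

```python
import time
from pathlib import Path


def purge_old_logs(log_dir, max_age_days=10, suffix=".log", dry_run=True, now=None):
    """Remove files ending in `suffix` directly under `log_dir` whose
    modification time is more than `max_age_days` days in the past.
    Returns the list of matching paths (deleted only if dry_run is False)."""
    # Guard: an empty path would otherwise resolve to the current directory.
    if not log_dir or not str(log_dir).strip():
        raise ValueError("log_dir is empty; refusing to run")
    root = Path(log_dir).resolve()
    if not root.is_dir():
        raise ValueError(f"{root} is not a directory")
    # Guard: never operate on the top of a filesystem.
    if root == Path(root.anchor):
        raise ValueError("refusing to purge a filesystem root")

    current = time.time() if now is None else now
    cutoff = current - max_age_days * 86400  # seconds per day

    matched = []
    for path in sorted(root.iterdir()):
        # Skip directories, symlinks, and anything that is not a log file.
        if path.is_symlink() or not path.is_file():
            continue
        if path.suffix != suffix:
            continue
        if path.stat().st_mtime < cutoff:
            if not dry_run:
                path.unlink()
            matched.append(path)
    return matched
```

Calling `purge_old_logs("/some/log/dir")` only reports what would be removed; passing `dry_run=False` performs the deletion. The point of the guards is that a missing or empty directory argument makes the routine stop with an error rather than act on some other location.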
HPE admitted that its software update caused the problem and took 100% responsibility.
"From 17:32 on December 14, 2021 to 12:43 on December 16, 2021, due to a defect in the program that backs up the storage of the supercomputer system (manufactured by Japan Hewlett Packard), the supercomputer system [malfunctioned]," a statement by HPE translated by Google reads. "As a result, an accident occurred in which some data of the high-capacity storage (/LARGE0) was deleted unintentionally. […] The backup log of the past that was originally unnecessary due to a problem in the careless modification of the program and its application procedure in the function repair of the backup program by Japan Hewlett Packard, the supplier of the super computer system. The process of deleting files malfunctioned as the process of deleting files under the /LARGE0 directory."
The team has suspended the backup process on the supercomputer. Kyoto University plans to resume backups by the end of January, after fixing the software problem and the script and taking measures to prevent a recurrence.