Data hoarders race to preserve data from rapidly disappearing U.S. federal websites
Websites, databases, and associated YouTube channels quickly being archived by volunteers

U.S. President Donald Trump has issued an executive order that has led many government agencies to take down webpages and entire sites to comply. In response, data hoarders across the internet are racing to preserve that content before it goes offline. MuckRock reports that the End of Term Archive, which includes the Internet Archive, Stanford University, the Common Crawl Foundation, the University of North Texas, and Webrecorder, has already saved more than 500 terabytes from .gov domains.
It's reported that more than 8,000 government pages have been taken down, including the Department of Justice database detailing the criminal charges and convictions of January 6 rioters, LGBTQ+ rights and HIV-related information from the Centers for Disease Control and Prevention, and the Climate and Economic Justice Screening Tool released by the Council on Environmental Quality, among others.
Because of this, the r/DataHoarder subreddit is rallying its more than 832,000 members to help save the data in danger of being taken offline and deleted. u/didyousayboop shared on the subreddit that the Archive Team, a group of volunteer digital archivists led by Jason Scott (the Free Range Archivist and Software Curator at the Internet Archive), is asking for help with its US Government project. This effort is focused on archiving all government content, especially data that is at risk of being removed because of the current administration’s efforts.
We’ve also seen several threads on r/DataHoarder asking for help backing up specific pages and websites. These include NOAA, USAID, the National Center for Education Statistics, the National HIV Curriculum, CDC Immunization Publications, and more. One user even asked for help downloading the videos on USAID’s YouTube channels, fearing that they would be next after the USAID website went down.
Aside from requests to back up data and volunteers acting on them, we’re also seeing others volunteering to host the archived site data for free on their domains.
This is one of the biggest archiving efforts we’ve seen, with a huge community of storage geeks doing their best to download and preserve historical online data. If you want to join them and help save the information hosted on government servers, you can check out the instructions u/didyousayboop left on r/DataHoarder.
Jowi Morales is a tech enthusiast with years of experience working in the industry. He’s been writing with several tech publications since 2021, where he’s been interested in tech hardware and consumer electronics.

Heiro78
It's a bad day when what was free information is being stripped. I've probably never visited any of these sites but it sucks to see them going since I'm sure others utilize them regularly.

dimar
Reminds me when Intel took down all motherboard BIOS files from their site, I took my time to save all of it.

3tank
Seemed ok when orgs like the internet archive were deleting items on their own end that were inconvenient to the last party in charge

bit_user
3tank said: Seemed ok when orgs like the internet archive were deleting items on their own end that were inconvenient to the last party in charge
Deleting something for political reasons goes against the ethos of a true archivist. I don't believe it was Internet Archive that you're thinking of.

helper800
"Book burnings. Always the forerunners. Heralds of the stake, the ovens, the mass graves."
-Geraldine Brooks

Heiro78
3tank said: Seemed ok when orgs like the internet archive were deleting items on their own end that were inconvenient to the last party in charge
I only spent around 5 minutes looking, but this is the only related article I could find to the Internet Archive deleting data.
https://blog.gingerbeardman.com/2024/08/01/psa-internet-archive-glitch-deletes-years-of-user-data-and-accounts/
Can you clarify what you mean or point to a news article about it?

thestryker
The only good thing is that everyone saw this coming from a ways away. The extra bad part is that it's far more widespread than the last time.
This is one of those things where you'd like to see the preservation of data codified in law. Not to say that the websites need not change as administrations come and go, but simply that any public data remain accessible in some form.