We’re about to enter the Digital Dark Ages
Online archives are vanishing, and they’re taking our history with them.
The long-promised digital apocalypse has finally arrived, and it was heralded by a blog post.
Published on July 18, the post’s headline sounded pretty arcane. “Google URL Shortener links will no longer be available,” it declared. I know, I know — not exactly an attack of alien zombies from the death dimension. But the news nevertheless freaked me out. It means another swath of the web is about to disappear.
Here’s the gist: Google used to have an online service that generated pithy, user-friendly versions of long, commercially unwieldy uniform resource locators — the key addresses that identify everything on the web. Shorter URLs are easier to track and better for online commerce. Google stopped shortening addresses back in 2019, but the concise URLs it had already created kept right on doing their job. Click on one and it would take you to the right webpage, the way it’s supposed to.
No more. In the blog post, Google announced that as of next year, all of the existing shortened URLs are getting turned off. Poof. And on the web, if your URL doesn’t work, you might as well not exist. You are unreachable. Without laborious renaming, everything behind those links — billions of them, a decade of digital content — will become inaccessible. Gone. Ask not for whom the 404 message tolls.
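For the curious, here is roughly what a link shortener does, and what switching it off means. This is a minimal sketch and not Google's actual system: the lookup table, the short code `abc123`, the example.com address, and the choice of a 301 redirect are all illustrative assumptions.

```python
# Hypothetical in-memory stand-in for a link-shortening service.
# The short code and the long address are invented for illustration.
SHORT_LINKS = {
    "abc123": "https://example.com/a/very/long/address?with=tracking",
}


def resolve(code, table):
    """Return (status, target) roughly the way a redirect service might."""
    target = table.get(code)
    if target is None:
        # No mapping: the visitor gets "404 Not Found".
        return 404, None
    # Mapping found: the visitor is redirected to the long address.
    return 301, target


before = resolve("abc123", SHORT_LINKS)  # service running
after = resolve("abc123", {})            # service switched off, table gone

print(before)
print(after)
```

The page behind the long address has not moved in the second case. Only the lookup that pointed to it has disappeared, which is why every place that published the short link is left with a dead end.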
Now, rendering a bunch of web content invisible isn’t the end of days. Not by itself. The problem is, this kind of thing keeps happening. And it’s getting worse. Social networks go bankrupt. Digital journalism sites close up shop. Companies pull their online products. Links rot. Files get not found. The cloud, as wags have noted, is really just “someone else’s computers.” And when clouds get turned off, not even the silver lining is left to tell the tale.
Maybe none of this matters much right now. But it will. The internet has become the default archive of our history and culture. And the whole thing is burning down before our eyes, like the Library of Alexandria — only worse. For the first time since people started carving letters into rocks, we’re making a time with no history. We’re about to enter the Digital Dark Ages.
Attempts to quantify the scope of the problem are heartbreaking. Half of the links in US Supreme Court decisions no longer lead to the information being cited. A report in 2021 found that a full quarter of the more than 2.2 million hyperlinks on The New York Times website were broken. Even worse, the Pew Research Center estimates that a quarter of everything put on the web from 2013 to 2023 is inaccessible, and that almost 40% of the web as it existed in 2013 is simply not there today, a decade later.
The degradation of those links wouldn’t panic me so much if they hadn’t replaced what came before them — if museum storerooms and dusty library stacks still served as the warehouses of our collective memory. It’s not that I miss the days of wrangling with old newspapers preserved on microfiche, or trying to sweet-talk a librarian into an international interlibrary loan. I’m glad lots of old movies are streaming and many out-of-print books are only a few clicks away. But archives and databases are more than places to keep old stuff; what we save defines who we are. Today, so much of everything is only digital that when it disappears, it leaves a hole in our shared culture.
Gawker is gone. So is the archive of The Awl, the beloved culture-criticism site. You can go to a library and read the entire output of long-dead newspapers like the Los Angeles Herald Examiner or New York Newsday, but God help you if you want to read old Vice articles. Shenanigans over the ownership of what used to be Paramount have resulted in the deletion of decades’ worth of shows on MTV and Comedy Central.
The Cartoon Network archive is gone. So are Yahoo Groups, Yahoo Answers, big chunks of the Imgur photo service, the spicy parts of Tumblr that got zapped in a porn purge, everything that ever happened on Friendster and the other pre-Facebook social networks, Club Penguin, Neopets, Geocities, AOL, and Prodigy. Vast swaths of video games made for obsolete systems are unplayable memories.
Hard drives have a finite lifespan, and the ones the music industry used for storage in the 1990s ahead of the transition to digital are crumbling. The Department of Veterans Affairs is legally required to preserve all medical records for 75 years after the death of a vet — but it’s having problems, in part because of a balky digital records system. And that’s not to mention things like personal photographs, most of which now exist only on your phone, and nowhere else. Every email you sent or received in your last job, or anything a deceased relative had on their now-unusable computer? These are the things that make us us. Yet I dare you to find them.
There are always brave souls out there who try to rescue scrolls from a burning library. But it’s hard to rescue something that exists only in the ether. “If a library burns down, it’s a tragedy, but most of the books survive elsewhere,” says Mark Graham, a leading internet archivist. “But the digital world is inherently fragile and potentially ephemeral.”
Graham is director of the Wayback Machine, a decades-old project that seeks to collect and save digital copies of web pages, for posterity. Gawker? Yeah, they got most of it. And that Pew study I mentioned, which showed that more than a third of the recent internet had vanished? “When we redid their study using their data, we found that about two-thirds of that material was safely stored on the Wayback Machine,” Graham says. “So really only a ninth is gone.”
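Graham's arithmetic is easy to check. A minimal sketch, assuming the round fractions quoted here (a third of pages vanished, two-thirds of those archived) rather than Pew's exact figures:

```python
from fractions import Fraction

# Round figures taken from the quotes above, not from Pew's raw data.
vanished = Fraction(1, 3)        # share of pages gone from the live web
archived_share = Fraction(2, 3)  # share of the vanished pages held by the Wayback Machine

# What is truly lost is the vanished material that was never archived.
truly_lost = vanished * (1 - archived_share)

print(truly_lost)
```

A third of a third is a ninth, which is where Graham's figure comes from. The result moves with the inputs, so a slightly larger "vanished" share gives a slightly larger loss.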
The Wayback Machine automatically archives more than a billion URLs every day. It also performs constant maintenance on the hundreds of millions of links across all 320 language editions of Wikipedia, which are atrophying at a rate of 10,000 URLs a day. Most recently, Graham worked on preserving 5,000 videos from a YouTube channel run by Rohingya activists, whose people were subjected to genocide in 2017. “They asked us to archive it because YouTube regularly deletes videos from their platform,” Graham says. “They don’t even leave metadata up, so you don’t know what was deleted.” He says he got all of the videos except one, which was age-restricted.
Usually, the Wayback Machine’s biggest obstacle is paywalls. Most of the articles in the world’s scientific journals, for example, are widely available to anyone with a university affiliation. But the articles are prohibitively expensive for the rest of us — even if our tax dollars paid for the research they describe. An archive isn’t really an archive if no one can afford the entry fee.
But now there’s a new threat to archiving our lives: artificial intelligence. When websites don’t want to let AI slurp up their content, they block a certain kind of digital crawler-bot — the same species of critter the Wayback Machine uses. “That’s happened almost overnight,” Graham says. AI, with its insatiable hunger for training data, can’t access the sites. But neither can the preservationists. In the wake of artificial intelligence, more intelligence is going to vanish.
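The article's sources don't name the blocking mechanism, but one common way for a site to turn crawlers away is a robots.txt file, a voluntary convention that well-behaved bots check before fetching pages. The sketch below assumes a blanket rule and two made-up bot names, `example-ai-trainer` and `example-archiver`. It uses Python's standard `urllib.robotparser` to show why a rule written with AI scrapers in mind can shut out a preservation crawler too.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site that wants all bots kept out.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given robots.txt lets this crawler fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Both bot names are invented for illustration.
page = "https://example.com/story.html"
ai = allowed("example-ai-trainer", page)
archive = allowed("example-archiver", page)

print(ai, archive)
```

A site could list specific crawlers by name instead of using the wildcard, which would leave archivists free to visit. That takes a deliberate choice, though, and the file is only a request that bots may or may not honor.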
Let’s be clear: This is about more than just losing a few news articles or clips from your favorite Adult Swim cartoon. What an archive is able to save, down to what formats fit in its file cabinets or data banks, literally determines what gets remembered. If you preserve, say, bank records from the 18th century but not sewing patterns, your annals are going to leave out a lot of people. Similarly, if your digital archive retains only the records of profitable businesses — because the ones that go bust wind up nuking their servers — you lose the memory of everything those deceased companies labored for. And what gets remembered about the past determines what we’re able to do in the present. “Society is memory,” says Marlene Manoff, who served as a senior collection strategist at MIT Libraries. “When you lose that memory, what does that mean?”
Unreadable hard drives and vanishing links aren’t the only threats to the historical record. Consider the selfie. Fifteen years ago, a researcher from the Scripps Institution of Oceanography named Loren McClenachan wanted to know whether commercial overfishing and environmental changes were making fish smaller. So she looked at five decades’ worth of pictures of winning sportfishing catches off Key West, Florida. The fishing boat company that ran the competitions, it turned out, had kept all the physical photographs, most of which had the date handwritten on the back.
Armed with those artifacts, McClenachan was able to show that over the prior half-century, the sizes of prize-winning catches had declined by more than 50%. None of that data would have been available if all the fishers had kept the records of their catches on their phones. Instead we’d be subject to what’s known as “shifting baseline syndrome” — the common assumption that whatever’s normal today was the norm in the past, too.
As the internet vanishes and we store our lives on our devices, we’re actively choosing to punch huge gaps in our historical record. It’s self-inflicted cultural amnesia, made worse by the fact that most of the web is in the hands of large corporations that place little value on preservation. “Over the long term, you can’t preserve a digital object in its original form,” says Manoff, the former MIT librarian. “But in the case of corporate ownership, the likelihood of responsible long-term stewardship of digital content in any form becomes increasingly unlikely.”
The Dark Ages, as historians used to call the early centuries of medieval Europe, lasted for 500 years. Our digital version may never end. A postliterate society leaves exactly as much of a mark on the world as a preliterate one. Which is to say, not much of a mark at all.