Bit Rot: The Internet Never Forgets – Or Does It?

Bit rot: The Internet never forgets – or does it? (Al Jazeera, March 18, 2014):

Planned obsolescence and flipping bits may be putting our digital archives at risk

At The Guardian’s 2013 Activate conference in London, the computer scientist and Internet founder Vint Cerf, when asked about the future of libraries in the digital age, expressed concern. “I am really worried right now about the possibility of saving bits but losing their meaning and ending up with bit rot,” he said. “You have a bag of bits that you saved for a thousand years, but you don’t know what they mean because the software that was needed to interpret them is no longer available or it’s no longer executable … This is a serious, serious problem, and we have to solve that.”

“Bit rot”? The term is nightmarish, conjuring images of a computer system gone haywire, cannibalizing itself from the inside. The phenomenon it describes — the self-erasure of computer bits, caused by aging software’s obsolescence, leading to an irrevocable loss of data — directly contradicts the popular belief that digital data are permanent. Comparatively, the fire at the Library of Alexandria was more straightforward.

But bit rot, and its perceived threat, is contested in the library and archival communities. Some say it exists, while others call it a joke, “the digital equivalent of ‘my dog ate it.’” Even among the believers, its definition is murky and often contradictory. The tech blog Ars Technica describes it as “a random bit here or there” flipping and erasing itself, while Cerf’s description relies more on the planned obsolescence of the software used to read those bits. Compared with paper, the turnaround for corruption is astonishingly short. Floppy disks from 1985, the Software Preservation Society notes, “are frequently found rotten.” Meanwhile, the Abusir Papyri, a series of administrative documents dating to ancient Egypt’s Old Kingdom, are more than 4,000 years old and still legible.

Jane Mandelbaum, project manager of the Library of Congress’ IT office, is emphatic when she tells Al Jazeera, “‘Bit rot’ is not a term that we use in the library. It’s not a term that we use in the IT part of our IT infrastructure.”

“We talk about bit preservation,” says Leslie Johnston, chief of the library’s repository development.

Why not talk about bit rot? According to Thomas Youkel, chief of the library’s systems engineering and networking, the term is misleading. Bit degradation is, by design, expected. “Statistically, it’s more likely that a bit is going to change. If you lose one pixel, it’s not a bad thing. You’d still have a picture … This is a technical term, but if you lose a bit in a pointer, you might lose something.” (A pointer is a value that tells a program where a piece of data is stored.) The loss of one bit, then, is more akin to the loss of a page number in a book’s index: irritating but hardly a guaranteed disaster.
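
Youkel’s distinction between a pixel and a pointer can be illustrated with a minimal sketch. The Python below is not drawn from the Library of Congress; the names `flip_bit`, `pixel` and `records` are hypothetical, and a list index stands in for a pointer. It flips a single bit in a grayscale pixel value and a single bit in an index, then compares the damage.

```python
def flip_bit(value: int, bit: int) -> int:
    """Return value with one bit inverted (XOR against a one-bit mask)."""
    return value ^ (1 << bit)

# A grayscale pixel (0-255): flipping the lowest bit shifts brightness by one step.
pixel = 200
damaged_pixel = flip_bit(pixel, 0)      # 201

# An index standing in for a pointer: it says where a record is stored.
records = ["chapter 1", "chapter 2", "chapter 3", "chapter 4"]
index = 1
damaged_index = flip_bit(index, 1)      # 1 XOR 2 = 3, a different record
lost_index = flip_bit(index, 3)         # 1 XOR 8 = 9, past the end of the list

print("pixel:", pixel, "->", damaged_pixel)
print("record:", records[index], "->", records[damaged_index])
print("index 9 still points at a record?", lost_index < len(records))
```

The flipped pixel is nearly indistinguishable from the original, while the flipped index either retrieves the wrong record or points at nothing at all.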

Nancy McGovern, head of curation and preservation services at the Massachusetts Institute of Technology, shares this ambivalence. “Bit rot is an issue for digital content,” she writes in an email, but preservationists guard against this by making many digital copies of an original object and its data and storing these copies across multiple locations.

“Bit rot can affect an object, but not all copies would degrade at the same rate,” she says.
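
One way to see why independent copies help is a byte-by-byte majority vote. This is an illustrative assumption, not a method McGovern describes: the function `repair_by_majority` and the sample strings are hypothetical, and real repositories more commonly use checksums to identify which copy is intact. The sketch relies only on the point made above, that copies rarely degrade in the same place at the same time.

```python
from collections import Counter

def repair_by_majority(copies: list[bytes]) -> bytes:
    """Rebuild a byte string by taking the most common byte at each position."""
    if not copies:
        raise ValueError("need at least one copy")
    length = len(copies[0])
    if any(len(copy) != length for copy in copies):
        raise ValueError("copies must all be the same length")
    repaired = bytearray()
    for position in range(length):
        votes = Counter(copy[position] for copy in copies)
        repaired.append(votes.most_common(1)[0][0])
    return bytes(repaired)

# Three copies held in different places; two have decayed in different spots.
copy_a = b"Abusir Papyri"
copy_b = b"Abusjr Papyri"   # one byte damaged
copy_c = b"Abusir Pbpyri"   # a different byte damaged

restored = repair_by_majority([copy_a, copy_b, copy_c])
print(restored)
```

Because no two copies are damaged at the same position, every position still has two votes for the original byte. If the same byte decayed in most copies, the vote would fail, which is why copies are spread across locations.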

Creating these copies is key to the digital preservation process. The Library of Congress’ Carl Fleischhauer says, “Our stratagem is to immediately migrate the content” received onto “safer, more secure storage systems.”

At the Library of Congress, checksums (“a mathematical way of saying that this is the state of the file,” explains Johnston) are used to monitor the material over time. Data received on older, more vulnerable media, such as personal hard drives or CD-ROMs, are transferred to disk images, after which labels are created and photographed for documentation purposes. The labels are monitored for degradation alongside the data they describe. Throughout, Youkel says, “you have to actively manage the data. And that’s what we do.”
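
The checksum monitoring Johnston describes can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the article does not say which algorithm or tooling the library uses, so SHA-256 from the standard `hashlib` module is an assumption, and `sha256_of`, `is_unchanged` and the file name `item.bin` are hypothetical. A checksum is recorded when a file arrives, then recomputed later and compared.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_unchanged(path: Path, recorded: str) -> bool:
    """True if the file still matches the checksum recorded earlier."""
    return sha256_of(path) == recorded

with tempfile.TemporaryDirectory() as folder:
    item = Path(folder) / "item.bin"
    item.write_bytes(b"administrative records")

    recorded = sha256_of(item)              # stored when the file is received
    ok_before = is_unchanged(item, recorded)

    # Simulate bit rot: flip the lowest bit of the first byte.
    data = bytearray(item.read_bytes())
    data[0] ^= 0b00000001
    item.write_bytes(bytes(data))
    ok_after = is_unchanged(item, recorded)

print("matches before damage:", ok_before)
print("matches after damage:", ok_after)
```

A single flipped bit changes the checksum, so the comparison fails and the damaged file can be replaced from another copy. The checksum only detects the change; it cannot repair it.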

National and academic libraries monitor their on-site systems. But as digital formats like e-books become increasingly popular, prompting public libraries to make the transition from analog to digital, the real threat might be a question of ownership and accessibility, not bit rot.

BiblioTech, the nation’s first all-digital public library, opened its doors in Bexar County, Texas, in September 2013. Since then, it has proved popular with library patrons, and a second branch opened in January. For head librarian Ashley Eklof, maintaining digital data is not yet a concern, but it will be.

“When you talk about bit rot,” she tells Al Jazeera, “I think library vendors, the digital vendors, are going to be facing that much more with how they host the material … We will, once we get digital content from independent authors,” whose e-book files will be monitored or maintained not by outside vendors but by the libraries.

For now, BiblioTech does not host its files on site, unlike the Library of Congress. Rather, it uses a cloud-based arrangement maintained by 3M Library Systems, which stores the e-books on its servers, which are out of state and not accessible to BiblioTech’s librarians. This could leave the library and its 800 e-readers without content in the event of a technological glitch or Internet failure outside its control. “If for whatever reason the Internet stops, just is not there,” says Eklof, “then it’s very difficult to ensure access to that content … [If] all that stuff is gone, then we need a … copy, whether that’s print or whether that’s an e-book on a flash drive.”

This vulnerability might explain why in 2011 the Internet Archive, a nonprofit dedicated to preserving the Web via screen captures accessible through its website, the Wayback Machine, announced that it would begin preserving paper books alongside its digital content, a “physical archive of the Internet Archive.” For all of their perceived ease and flexibility, digital archives, which rely on Internet access and electricity to preserve and present their content, are inherently less stable than their print counterparts.

E-books bring new price negotiations and purchasing agreements. “I’ve noticed that for e-books, for the program we’ve had the longest,” Eklof points out, “they’re on average about $25 per e-book.” That, she says, is “pretty average” and comparable to what both libraries and consumers pay for hardcover books, but prices for e-books bought through vendors like 3M can vary, depending on the publisher.

Through 3M, “Random House, for example, is $85 per [e-]book,” Eklof says, echoing a 2013 price comparison report compiled by Colorado’s Douglas County library system. This report, promoted by the American Library System on its blog, shows a discrepancy between 3M’s library e-book pricing and consumer retail prices from Amazon and Barnes & Noble. For J.K. Rowling’s “The Cuckoo’s Calling,” for example, Amazon charges consumers $6.50 for an e-book version, while libraries pay 3M $78 for the same file. Publishers control how many times a library is allowed to lend the e-book in question. “You may have heard librarians say that some e-books are just on lend,” Eklof says. “We’re just potentially borrowing them to lend them out, so some of our books are going to expire. We have to give them back, essentially, and purchase new copies.” The time on loans varies. “Some of [the loans] are after 26 checkouts. Some are after 52 checkouts. And some are after a year, so however many people use it, it will expire after a year,” she says.

This is effectively bit rot by design: data erasing itself after a certain amount of time.

In the event of licensing disagreements or copyright disparities, even temporary ownership won’t guarantee a user access to his or her books. In 2009, after a dispute with the digital publisher MobileReference, Amazon deleted copies of “1984” from readers’ Kindles. Though users bought these copies through Amazon and Amazon later refunded the purchases, it’s a revealing precedent: The same Internet connection that is required for downloading these books can be used to erase them.

According to Eklof, BiblioTech owns about 85 percent of its collection outright. “We get to keep [them],” she says, “and we have those books forever. They’ll never, essentially, decay, as long as they’re basically on some servers.”

“The fact that it’s somewhere out there on the Internet,” says Eklof, means that e-books and other content provided by the library will be accessible, “as long as the Internet doesn’t crash.”

After all, “data is just data,” as Fleischhauer at the Library of Congress says. Like the systems that hold it, it is human-made. Dust to dust: All bit rot proves is that digital is as ephemeral as paper.

1 thought on “Bit Rot: The Internet Never Forgets – Or Does It?”

  1. It could also be a convenient way to get rid of truth by those who work assiduously to dumb down the people to keep them “controllable.” It is an idea full of fallacy, but those who embrace such folly can do a lot of damage.
    About a year ago, some young person asked me about Enron accounting, how it worked, and so on. I told him to Google it………but fortunately checked while still on the phone with him. They have cleaned up that process dramatically.
    It was a story I followed with great interest. I even copied the hearings (what a waste of time they are!) to listen when I got home from work. It was complex, but not that difficult to follow.
    What Enron would do is to take a huge debt, say $100 million, and move it off their books into a shell corporation. Not only did it rid Enron of the debt on paper, but they took it a step further. They listed the debt as an asset of the shell, and had the shell give Enron huge work orders equal to or exceeding the amount of the debt. Not only did they get rid of the debt, but they turned it into another source of big income. As long as the market went up, it worked. Once the market moved south, Enron went with it.
    Even worse, other big corporations who employed hundreds of thousands of workers were caught playing the same game. Global Crossing, WorldCom and many others…….all of them collapsed leaving millions of workers without jobs, pensions or any chance of recovery.
    That was the first outright plundering of US workers. Millions lost their homes, and everything else they thought they had. The only one tried was Ken Lay, the CEO of Enron who conveniently died before sentencing saving all the stolen money for his heirs. Nobody else, save Jeff Skilling, even got a hand slap. Skilling is now out of jail after serving less than 4 years of a 14 year FEDERAL sentence. All he had to do was pay some chump change.
    Federal sentences normally are not subject to shorter sentences or parole……but in this case, it was.
    Nobody else has paid except the millions who lost all…….and there is nothing for them.
    I thought I could retire years ago…..but most of my assets are gone………
    This behavior by our government has cost us even more than the billions that vanished into the pockets of the greedy guts overnight. It has cost us our long built credibility, our reputation, honor (if they even know what that means any longer) and respect in the world.
    Once it became clear nothing would be done to repair the damage or punish the wrongdoers, the rest of the world began to move away from us. It was silent, gradual……but it has cost us our leadership.
    Leadership is won by respect and straight dealings, and it cannot be forced. A nation that led so effectively turned bully overnight, and the US started attacking nations that had done nothing to warrant it. Now, they threaten war at the drop of a hat, all trust is gone, and statesmen have become foolish gasbags.
    The Bush doctrine, a monstrosity, has become what is left of foreign relationships. As a result, the US is viewed with fear and distrust. Russia has better public relations than we do.
    The dollar has been dropped by half of the world, US financial power has been cut sharply. Prices are soaring because the value of the dollar is dropping. Electronic currencies have replaced any need for a world reserve currency. The US has relied on this status to underwrite the endless printing press, and it no longer works. This is destroying us, and even the US cannot fight the whole world.
    The US is declining very rapidly. Nothing can be done to slow or stop it……..we are finished.
