Bit Rot & Data Loss: Are You Aware of Its Damage Potential?

Bit rot is a complex phenomenon. Most publications, such as http://arstechnica.com/information-technology/2014/01/bitrot-and-atomic-cows-inside-next-gen-filesystems/ and http://www.economist.com/node/21553445, discuss it only in the context of hard drives and tapes. The reality is much worse, because bit rot also affects RAM and CPUs, and damaged RAM can be quite difficult to spot.

 

How Damaged RAM Can Interfere with Your Data and Affect Your Backups

When a file is updated, it is loaded into RAM either in full or in part. RAM defects can affect single bits or whole words at a time, and sometimes a single bit gets stuck at one value. A single stuck bit is enough to turn an A into a C or a 1 into a 3. If that happens inside a database or an Excel sheet, it can go unnoticed for quite some time. RAID arrays and backups usually offer no protection against this, because to the system the file appears to have been modified by the user.
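To make the effect concrete, here is a minimal Python sketch of what a single flipped or stuck bit does to one ASCII character. The function names are made up for this illustration, and the sketch only models the arithmetic, not real hardware.

```python
def flip_bit(value: int, bit: int) -> int:
    """Return value with the given bit position inverted."""
    return value ^ (1 << bit)


def stuck_at_one(value: int, bit: int) -> int:
    """Return value as it would be stored if the given bit were stuck at 1."""
    return value | (1 << bit)


def single_bit_neighbors(ch: str) -> list:
    """All characters reachable from ch by flipping exactly one of its 8 bits."""
    code = ord(ch)
    return [chr(flip_bit(code, bit)) for bit in range(8)]


def bits_differing(a: str, b: str) -> int:
    """Number of bit positions in which two single characters differ."""
    return bin(ord(a) ^ ord(b)).count("1")


if __name__ == "__main__":
    # Printable characters one bit flip away from "A"
    print(single_bit_neighbors("A")[:6])
    # A digit stored through a cell whose bit 1 is stuck at 1
    print(chr(stuck_at_one(ord("1"), 1)))
```

In ASCII, "A" and "C" differ in exactly one bit, as do "1" and "3", so one bad bit silently produces another perfectly valid character. "A" and "B" differ in two bits, which would take a multi-bit or whole-word defect.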

 

Tapes, CDs, DVDs, Hard Drives, as well as RAM and CPUs Are Affected

In reality, cosmic radiation, electromagnetic interference, power surges, static charge, manufacturing defects, and other influences can cause bits to fail on tapes, disks, CDs, RAM, and even within CPUs.

Bit rot sometimes causes bytes to be written to the wrong place. For example, the byte ‘A’ is supposed to be written to address 10, but due to a fault affecting the pointer register, or due to a defect inside the RAM chip itself, the A ends up being written to a totally different location.

Intensive tests with popular RAM checkers, such as MemTest86+, showed that two kinds of RAM defects are very difficult to spot with these tools: defects that affect the address bus rather than the byte contents, and multiple bit failures within the same RAM word. The reason is that this type of software tests one section of RAM at a time.
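The following toy model sketches why an address fault can slip past a test that only checks one cell at a time. Everything in it is hypothetical: `FaultyMemory` simulates a RAM whose address line is stuck at 0, and the two test functions are simplified stand-ins. It makes no claim about how MemTest86+ or any other tool is actually implemented.

```python
class FaultyMemory:
    """Toy RAM; if stuck_bit is given, that address line always reads as 0."""

    def __init__(self, size: int, stuck_bit=None):
        self.cells = [0] * size
        # A mask of -1 (all ones) leaves addresses untouched: a healthy module.
        self.mask = -1 if stuck_bit is None else ~(1 << stuck_bit)

    def _decode(self, address: int) -> int:
        # The fault: one address bit is forced to 0, so two addresses alias.
        return address & self.mask

    def write(self, address: int, value: int) -> None:
        self.cells[self._decode(address)] = value

    def read(self, address: int) -> int:
        return self.cells[self._decode(address)]


def cell_by_cell_test(mem: FaultyMemory, size: int) -> bool:
    """Write a pattern to one cell and read it straight back."""
    for address in range(size):
        for pattern in (0x00, 0xFF, 0xAA, 0x55):
            mem.write(address, pattern)
            if mem.read(address) != pattern:
                return False
    return True


def address_in_address_test(mem: FaultyMemory, size: int) -> bool:
    """Fill every cell with its own address first, then read everything back."""
    for address in range(size):
        mem.write(address, address)
    return all(mem.read(address) == address for address in range(size))


if __name__ == "__main__":
    mem = FaultyMemory(16, stuck_bit=1)
    mem.write(10, ord("A"))              # intended for address 10 ...
    print(chr(mem.read(8)))              # ... but it landed on address 8
    print(cell_by_cell_test(mem, 16))    # the fault goes unnoticed
    print(address_in_address_test(mem, 16))
```

In this model the write and the read-back travel through the same broken address path, so every cell appears healthy when tested on its own. The fault only shows up when the whole range is written before any of it is read back, because aliased addresses overwrite each other.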

This type of data corruption can lead to data loss that goes unnoticed for a very long time. Worse, data files may be corrupted each time they are moved, updated, or backed up; hence, the backups become corrupted as well.

Forever-Incremental Backups: A Risky Concept

Bit rot is one of the reasons why so-called “synthetic full” backups or “forever-incremental” backups are risky. Because the data is never written fresh, from scratch, the backed-up data on the backup media may become corrupt, and this may go unnoticed for a long time until the data needs to be restored. Forever-incremental backups may also become corrupted when backup files are merged on a machine with a RAM issue. Reading the data into corrupt memory corrupts the backup files, too, unbeknownst to the user.
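One way to notice backup files that change after they were written is to record a hash of each file and recheck it later. The sketch below is only an illustration using Python's standard library; the function names and the idea of a per-directory manifest are assumptions made for this example, not a feature of any particular backup product. It also has a clear limit: it detects corruption that happens after the hash was recorded, and it cannot help if the data was already damaged in RAM before it was hashed.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large backup files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(backup_dir: Path) -> dict:
    """Map each file's path (relative to backup_dir) to its SHA-256 hash."""
    return {
        p.relative_to(backup_dir).as_posix(): sha256_of(p)
        for p in sorted(backup_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(backup_dir: Path, manifest: dict) -> list:
    """Return the relative paths that are missing or no longer match."""
    damaged = []
    for rel_path, expected in manifest.items():
        target = backup_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            damaged.append(rel_path)
    return damaged
```

A typical use would be to call `build_manifest` right after a backup finishes, keep the result somewhere other than the backup media, and run `verify_manifest` on a schedule. Any path it returns points to a file that should not be trusted for a restore or a merge.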

What About ECC RAM?

Having server-grade ECC RAM certainly offers some protection, but only against a certain class of single bit errors in RAM. In all the cases we have investigated, and the investigations were very labor intensive and took weeks to complete, ECC RAM was in use, and it was faulty. As mentioned above, in most cases even expensive and popular RAM validation tools did not spot the problem. The most common bit rot problem that passed all RAM checker tools undetected was RAM address bus corruption.
Note that bit rot also occurs in all other components that use memory, such as hard drives (SSD or mechanical) and other embedded devices that contain RAM. Hence, when files are copied from one network server to another, the data passes through dozens of different devices, each of which could potentially corrupt it in transit.
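To show why error correction covers only a limited class of errors, here is a toy single-error-correcting, double-error-detecting (SECDED) code in Python. It is a sketch under stated assumptions: it protects just 4 data bits with an extended Hamming code, whereas ECC modules typically protect 64 data bits with 8 check bits using the same basic idea. The function names are invented for this example.

```python
def encode(nibble: int) -> list:
    """Encode 4 data bits as an 8-bit codeword (index 0 = overall parity)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    word = [0] * 8
    word[3], word[5], word[6], word[7] = d        # data bit positions
    word[1] = word[3] ^ word[5] ^ word[7]         # parity over 1, 3, 5, 7
    word[2] = word[3] ^ word[6] ^ word[7]         # parity over 2, 3, 6, 7
    word[4] = word[5] ^ word[6] ^ word[7]         # parity over 4, 5, 6, 7
    word[0] = sum(word[1:]) % 2                   # overall parity bit
    return word


def flip(word: list, *positions: int) -> list:
    """Return a copy of word with the given bit positions inverted."""
    damaged = list(word)
    for position in positions:
        damaged[position] ^= 1
    return damaged


def decode(word: list):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    word = list(word)
    syndrome = 0
    for position in range(1, 8):
        if word[position]:
            syndrome ^= position
    overall_ok = sum(word) % 2 == 0
    if syndrome == 0 and overall_ok:
        status = "ok"
    elif not overall_ok:
        # Odd number of flips: the decoder assumes exactly one and repairs it.
        word[syndrome] ^= 1  # syndrome 0 points at the overall parity bit
        status = "corrected"
    else:
        # Even number of flips: detectable, but not repairable.
        return "uncorrectable", None
    nibble = word[3] | (word[5] << 1) | (word[6] << 2) | (word[7] << 3)
    return status, nibble


if __name__ == "__main__":
    word = encode(0b1011)
    print(decode(flip(word, 5)))                  # one bad bit: repaired
    print(decode(flip(word, 5, 6)))               # two bad bits: flagged only
    print(decode(flip(encode(0), 3, 5, 6)))       # three bad bits: wrong data
```

One flipped bit is repaired and two are at least reported, but with three flipped bits this decoder reports a successful correction while returning the wrong value. Note also that a code like this only checks the bits of a word against each other, so in this model a perfectly valid codeword that lands at the wrong address would not be flagged at all.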

The Solution

Using backup software is generally a good idea; however, regular backups will only help if media is being rotated and the bit rot is limited to the media and not the server itself. In the case of bit rot occurring on backup storage media, you have some protection.

However, if bit rot occurs in RAM or other internal hardware, chances are that backups will become corrupted, too, whenever the backup process happens to use the defective RAM area. If you are lucky, server code ends up in the defective RAM area and Windows will blue screen; yet, this may or may not happen. We investigated these RAM defects on servers that had been operating for many months without a blue screen.

Some new file systems and RAID arrays can spot single bit errors, but those errors are only a subset of all possibilities (see the Ars Technica link above). At the moment, the only actionable advice left is to test RAM and disks periodically, which requires a day or so of downtime.

 

Signs of Bit Rot in RAM

Sadly, there are usually no signs that bit rot is happening. However, you can sometimes spot bit rot issues early on if you pay attention to the following:

  • Blue screens and server crashes. These must always be investigated and never ignored.
  • Characters show up in Word, Excel, or other documents where they weren’t entered.
  • Numbers visibly change in Excel sheets or other text-based documents.
  • Spots on screen, such as individual pixels, that seem to ‘stick’, are out of place, and are not part of the original image.
  • Documents (for example, Excel files) can no longer be opened, and the application that opens them crashes or reports an error.

The second and third signs are often overlooked because people assume they simply made a typo.

Counteracting Bit Rot

To prevent data loss from bit rot, you will need to use reliable backup software that goes far beyond a simple copy tool for files. Whether you also need cloud backup, Hyper-V backup, or other virtual machine backup, the tool will need to take backups frequently and on a schedule automatically.

Live backup is particularly important for servers, but also for PCs, as users do not want to be interrupted while at work. On the server application side, you will find that BackupChain offers a great range of features for Windows Server backup, VirtualBox backup, and VMware backup, including deduplication for incremental and differential backups.
