Bit Rot & Data Loss: Are You Aware of Its Damage Potential?

Bit rot is a very complex phenomenon.

While most publications, such as Ars Technica, discuss bit rot in terms of hard drives and tapes, the reality is much worse because bit rot also affects RAM and CPUs. Damaged RAM can be quite difficult to spot, too.


How Damaged RAM Can Interfere with Your Data and Affect Your Backups

When a file is updated, it's loaded into RAM either in full or partially. RAM defects sometimes affect single bits or whole words at a time.

Sometimes a single bit "freezes up" and always reads back as the same value. In ASCII, that can turn an A into a C or a 1 into a 3. If this happens inside a database or an Excel sheet, it can go unnoticed for quite some time.
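
To make the single-bit case concrete, here is a minimal Python sketch of a stuck bit. The function names and the idea of modeling a memory cell as one byte are assumptions made for illustration only; real hardware faults are not this tidy. In ASCII, forcing bit 1 to the value 1 changes 'A' (0x41) into 'C' (0x43) and '1' (0x31) into '3' (0x33).

```python
def apply_stuck_bit(byte: int, bit: int, stuck_value: int) -> int:
    """Return the value a one-byte cell would hold if `bit` were stuck at `stuck_value`."""
    mask = 1 << bit
    if stuck_value:
        return byte | mask           # bit is forced to 1
    return byte & ~mask & 0xFF       # bit is forced to 0


def corrupt_at(data: bytes, index: int, bit: int, stuck_value: int) -> bytes:
    """Simulate storing `data` when only the cell at `index` has a stuck bit."""
    damaged = bytearray(data)
    damaged[index] = apply_stuck_bit(damaged[index], bit, stuck_value)
    return bytes(damaged)


if __name__ == "__main__":
    # A hypothetical spreadsheet cell; only the second character sits in the bad RAM cell.
    print(corrupt_at(b"Total: 1100", 8, 1, 1).decode("ascii"))
```

Note that the damaged value is still a perfectly valid character, which is why nothing downstream complains.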

RAID arrays and backups usually offer no protection against this phenomenon. To the system, the file appears to have been modified by the user.


Tapes, CDs, DVDs, Hard Drives, as well as RAM and CPUs are affected

In reality, cosmic radiation, electromagnetic interference, power surges, static charge, manufacturing defects, and other influences can cause bits to fail on tapes, disks, CDs, in RAM, and even within CPUs.

Bit rot sometimes causes bytes to be written to the wrong location. For example, the byte 'A' is supposed to be written to address 10, but due to a fault in the register holding the pointer or a defect inside the RAM chip itself, the 'A' ends up being written to a totally different section of memory.
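
The misdirected write can be sketched with a toy memory model. Everything here is hypothetical: the FaultyMemory class, the 16-byte size, and the choice of address bit 3 being stuck at 0 are assumptions picked so that address 10 (binary 1010) aliases to address 2 (binary 0010). The point of the sketch is that reading address 10 back looks perfectly fine, while the data that used to live at address 2 is silently gone.

```python
class FaultyMemory:
    """Toy RAM model in which one address line is stuck at 0 (illustrative assumption)."""

    def __init__(self, size: int, stuck_address_bit: int) -> None:
        self.cells = bytearray(size)
        self.stuck_mask = ~(1 << stuck_address_bit)

    def _effective(self, address: int) -> int:
        # The stuck line clears one bit of every address the CPU sends.
        return address & self.stuck_mask

    def write(self, address: int, value: int) -> None:
        self.cells[self._effective(address)] = value

    def read(self, address: int) -> int:
        return self.cells[self._effective(address)]


if __name__ == "__main__":
    mem = FaultyMemory(size=16, stuck_address_bit=3)
    mem.write(2, ord("Z"))    # unrelated data stored earlier
    mem.write(10, ord("A"))   # intended for address 10, lands on address 2
    print(chr(mem.read(10)))  # the write appears to have worked
    print(chr(mem.read(2)))   # but the earlier data has been overwritten
```

A check that writes a location and immediately reads it back through the same faulty path sees nothing wrong, which is one reason this class of defect is easy to miss.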

Our tests with popular RAM checkers, such as MemTest86+, have shown that RAM defects affecting the address rather than the byte contents are very difficult to spot, because this type of software tests one section of RAM at a time.

This type of data corruption can lead to data loss going unnoticed for a very long time.

What makes this tragic is that data files can be corrupted every time they are moved, updated, or backed up; hence, the backups become corrupted as well.


The Solution

Using backup software is certainly a good idea; however, regular backups only help if the media are rotated and the bit rot is limited to the media rather than the server itself. If bit rot occurs on the backup storage media, you have some protection.

However, if bit rot occurs in RAM or other internal hardware, backups may become corrupted too, whenever the backup process happens to use the defective RAM area. If you are lucky, server code ends up in the defective RAM area and Windows will blue screen; yet this may or may not happen. We have investigated RAM defects on servers that had been operating for many months without blue screening.

Some newer file systems and RAID arrays can detect single-bit errors, but those errors are only a subset of all the possibilities; see the Ars Technica article mentioned above.
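
As a rough illustration of how checksum-based detection works, and where it stops helping, consider the following Python sketch. It is not how any particular file system implements its checks; the 4096-byte block size, the use of CRC32, and the function names are all assumptions chosen for brevity. A checksum recorded while the data was still good reveals a later bit flip on the media, but a checksum computed after the data was already damaged in RAM simply vouches for the damaged data.

```python
import zlib

BLOCK_SIZE = 4096  # assumed block size for this sketch


def block_checksums(data: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Compute one CRC32 per block of data."""
    return [zlib.crc32(data[i:i + block_size]) for i in range(0, len(data), block_size)]


def find_corrupt_blocks(data: bytes, expected: list[int],
                        block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the indexes of blocks whose checksum no longer matches the stored one."""
    actual = block_checksums(data, block_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


def flip_bit(data: bytes, byte_index: int, bit: int) -> bytes:
    """Return a copy of data with a single bit inverted."""
    damaged = bytearray(data)
    damaged[byte_index] ^= 1 << bit
    return bytes(damaged)


if __name__ == "__main__":
    original = bytes(range(256)) * 64           # 16384 bytes, i.e. 4 blocks
    stored = block_checksums(original)          # recorded while the data was good

    rotted_on_disk = flip_bit(original, 5000, 2)
    print(find_corrupt_blocks(rotted_on_disk, stored))      # block 1 is flagged

    # If RAM damages the data before the checksum is taken, nothing is flagged.
    rotted_in_ram = flip_bit(original, 5000, 2)
    stored_too_late = block_checksums(rotted_in_ram)
    print(find_corrupt_blocks(rotted_in_ram, stored_too_late))
```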

At the moment, the only actionable advice left is to test RAM and disks periodically, which requires a day or so of downtime.


Signs of Bit Rot in RAM

You can spot these issues early on if you pay attention to the following:

1. Blue screens and server crashes. These must always be investigated and never ignored.

2. Characters show up in Word, Excel, or other documents where they weren't entered.

3. Numbers change on Excel sheets.

4. Spots on the screen, such as individual pixels, that seem to 'stick', look out of place, and are not part of the original image.

5. Files can no longer be opened and the application that opens them crashes or errors out.

Signs 2 and 3 are often overlooked because people assume they made a typo.

Counteracting Bit Rot

To prevent data loss from bit rot, you will need to use reliable backup software that goes far beyond a simple copy tool for files.

Live backup is particularly important for servers, but also for PCs, as users do not want to be interrupted while at work.

