Keep enough disk space free at all times: At the source as well as the target
Free disk space is of paramount importance. All kinds of strange software errors emerge, even from Microsoft Windows itself, when resources are low. If too little space is available, disk access slows down dramatically.
Backups can’t work properly when disk space is low since old backups won’t be cleared until the next new backup has completed.
Buy 4x the storage space you need
Data usage grows exponentially. If you agree with your team that 10 TB are enough, you’ll find a year down the road you really needed over 20 TB.
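To see why a generous margin matters, the following minimal sketch projects storage needs under compound growth. The function name is made up for illustration, and the yearly doubling is only an assumption taken from the 10 TB to 20 TB example above; measure your own growth rate before relying on it.

```python
def projected_usage_tb(current_tb: float, annual_growth_factor: float, years: int) -> float:
    """Project storage usage, assuming the same growth factor applies every year."""
    return current_tb * annual_growth_factor ** years


if __name__ == "__main__":
    # Assumption: usage doubles every year, as in the 10 TB -> 20 TB example.
    for year in range(4):
        print(f"year {year}: {projected_usage_tb(10, 2.0, year):.0f} TB")
```

Under that assumption, the 4x mark is reached after only two years.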
It’s a reality in many businesses that it is more expensive to identify which old files could be safely deleted than to just add more backup disk space. Wasting an hour of an engineer’s time at work usually costs several times more than a terabyte of disk space. In addition, the value of older files may not be clear at present, but a catastrophic event may make these files very valuable in the future. In almost all cases you’re better off holding on to business data files for as long as possible.
Backup systems also require a certain amount of slack space to work properly, usually around 200% of the space required for a full backup. You’ll need to keep this amount of space available at all times and monitor its availability.
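A slack-space check along these lines could run before every backup. This is a sketch only: the function names are hypothetical, and the default factor of 2.0 simply encodes the "around 200%" rule of thumb above. The only library call used is `shutil.disk_usage` from the Python standard library.

```python
import shutil


def required_free_bytes(full_backup_bytes: int, slack_factor: float = 2.0) -> int:
    """Free space to keep available: slack_factor times one full backup."""
    return int(full_backup_bytes * slack_factor)


def has_enough_slack(free_bytes: int, full_backup_bytes: int, slack_factor: float = 2.0) -> bool:
    """True if the free space covers the required slack."""
    return free_bytes >= required_free_bytes(full_backup_bytes, slack_factor)


def check_target(path: str, full_backup_bytes: int) -> bool:
    """Check the volume that holds `path` (the backup target or the source)."""
    free = shutil.disk_usage(path).free
    return has_enough_slack(free, full_backup_bytes)


if __name__ == "__main__":
    TB = 1000 ** 4
    # Assumption for illustration: a full backup is about 1 TB.
    print("enough slack:", check_target(".", 1 * TB))
```

The same check should be pointed at both the source volume and the backup target, since either one running out of space can break the backup.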
Eliminate single points of failure: Use multiple storage devices and use backup rotation
Backing up data on the same disk or physical machine where it came from does not eliminate all risks of data loss. In fact, it only protects from accidental deletion. Data corruption, hard drive crashes, server failure, and other common events can quickly lead to a 100% loss of all data.
Ideally, you want to have at least three different backups: a local external drive backup, a network backup, and an offsite backup stored at a remote location. Each of these backups gives you benefits at a certain cost. External drives are very fast and cost very little; however, they are sensitive, provide no redundancy, and reside right next to the server or workstation and could be damaged by viruses, electromagnetic interference, and other environmental factors. Offsite backups protect from local disasters but usually take a long time to complete and restore, and consume expensive internet bandwidth.
Use internal, external, and offsite storage
Using three types of storage media gives you better control over all potential scenarios and the costs involved. Internal backups are amazingly fast and cheap. They ensure the most common scenarios of data loss are handled immediately, quickly, at a very low cost, and without major interventions.
External storage, such as NAS backups and network server backups, uses centralized, redundant storage so that data is automatically protected by multiple copies at a centralized location, where functionality and reliability are monitored daily.
Offsite backups, aka cloud backups, are the premium solution to protect against local disasters, such as fires, power surges, lightning, floods, and viruses that spread through the organization’s LAN. Offsite backups also allow restore operations to be done from any location in the world.
Monitor backups frequently
Backups need to be monitored frequently to ensure everything is running smoothly.
It is important to check that backups are indeed running at the scheduled time and frequency, and that enough disk space is available at source and destination.
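One simple way to automate the "is it still running on schedule" check is to compare the age of the newest backup file with the expected interval. The sketch below assumes backups land as files in a single folder and that file modification times are trustworthy; the function names and the one-hour grace period are illustrative, not part of any particular backup product.

```python
from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional


def newest_backup_time(backup_dir: Path) -> Optional[datetime]:
    """Modification time of the most recent file in backup_dir, or None if empty."""
    times = [p.stat().st_mtime for p in backup_dir.iterdir() if p.is_file()]
    return datetime.fromtimestamp(max(times)) if times else None


def is_backup_overdue(
    last_success: datetime,
    now: datetime,
    interval: timedelta,
    grace: timedelta = timedelta(hours=1),
) -> bool:
    """True if more than interval + grace has passed since the last backup."""
    return now - last_success > interval + grace
```

A scheduled task could call `is_backup_overdue(newest_backup_time(folder), datetime.now(), timedelta(days=1))` and send an alert when it returns True, after handling the case where the folder is empty.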
Do a trial restore several times a year
Storage media can easily be corrupted, but most users are not aware of it. CDs, DVDs, hard drives, and even tapes are sensitive to the environment. Bits can freeze up or flip, a phenomenon called bit rot (also see this article), and hence cause data loss through corruption. Bad sectors on disks may show up after some wear and tear, and even brand-new drives are sometimes found to be partially defective. As storage density increases, this is becoming a serious issue.
Flash disks and SSD drives have the advantage of offering lightning fast throughput rates; however, their failure rates are much higher and wear-and-tear issues are more likely to occur on those kinds of media.
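A trial restore is only meaningful if the restored data is actually compared against something. One common approach, sketched here with hypothetical function names, is to compare SHA-256 checksums of the original and the restored file using Python's standard `hashlib` module.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so that large backups do not need to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(original: Path, restored: Path) -> bool:
    """True if the restored file is bit-for-bit identical to the original."""
    return sha256_of(original) == sha256_of(restored)
```

Comparing against the live original only works for files that have not changed since the backup was taken. For files that change often, it is safer to record the checksum at backup time and compare the restored file against that stored value.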
Run a disk check on all source and backup drives and a RAM check on the server
Servers are usually equipped with better RAM chips that carry an additional parity bit to detect single-bit errors. However, as manufacturers keep increasing memory density, this parity covers only a small range of potential RAM issues.
Data corruption caused by RAM defects is extremely difficult to detect.
Disks may also cause substantial data loss when data blocks decay and cause bit rot.
A problem at the source, be it RAM or a disk issue, will corrupt all backups, no matter how they were taken and where they were sent.
A bit rot issue at the target makes that particular backup unrecoverable. Depending on the backup software being used, the software may not be able to restore the entire image because of a single bad bit! That is because some software terminates on error rather than skipping it.
Use deduplication to save time and space
Deduplication is a great way to augment data compression. By using incremental backup or differential backup, you can reduce backup media usage by a factor of five or more. A typical incremental backup is usually around 5% of the original size per day.
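The savings are easy to estimate with a little arithmetic. The sketch below compares a daily full backup with one full backup followed by daily incrementals, using the 5% per day figure mentioned above. The function names and the 30-day, 1,000 GB scenario are assumptions for illustration, and real change rates vary widely.

```python
def storage_daily_full(full_size_gb: float, days: int) -> float:
    """Space used when a complete copy is stored every day."""
    return full_size_gb * days


def storage_incremental(full_size_gb: float, days: int, daily_change: float = 0.05) -> float:
    """Space used by one full backup on day 1 plus an incremental on each later day."""
    return full_size_gb * (1 + daily_change * (days - 1))


if __name__ == "__main__":
    full, days = 1000.0, 30  # hypothetical: 1,000 GB of data, 30 days of history
    a = storage_daily_full(full, days)
    b = storage_incremental(full, days)
    print(f"daily full: {a:.0f} GB, incremental: {b:.0f} GB, ratio: {a / b:.1f}x")
```

With these numbers, 30 days of history take about 2,450 GB instead of 30,000 GB, roughly a twelvefold reduction.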
Use encryption where necessary to protect against theft and manipulation
In some settings the security requirements are strict, such as when data needs to be kept in a HIPAA-compliant way. You will need to use encryption to protect data against theft, unauthorized access, and manipulation.
When planning a new backup system, keep in mind that encryption requires additional CPU resources.
Have a retention plan
Not all data is created equal. Many files have an expected lifespan. After that, these files may be safely deleted. Enterprise-level backup software can be instructed to retain certain files for a certain time period only, based on the type of file. For example, you may want to hold on to Microsoft Word files for 5 years, but keep large database files for only 3 months.
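A retention rule of this kind can be expressed as a simple lookup table. The sketch below is not the configuration format of any particular backup product. The file extensions (`.docx` for Word files, `.bak` for database dumps), the one-year default, and the function name are all assumptions chosen to mirror the example above.

```python
from datetime import date, timedelta
from pathlib import PurePath

# Hypothetical retention table: file extension -> how long to keep backups.
RETENTION = {
    ".docx": timedelta(days=5 * 365),  # Word files: about 5 years
    ".bak": timedelta(days=90),        # database dumps: about 3 months
}
DEFAULT_RETENTION = timedelta(days=365)  # assumption for all other file types


def is_expired(filename: str, backed_up_on: date, today: date) -> bool:
    """True if the backup of this file is older than its retention period."""
    suffix = PurePath(filename).suffix.lower()
    keep_for = RETENTION.get(suffix, DEFAULT_RETENTION)
    return today - backed_up_on > keep_for
```

For example, a `.bak` file backed up on January 1 would be flagged as expired by June 1 of the same year, while a `.docx` file from the same day would be kept.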
Storage costs money. Server time and resources also cost money. If you can reduce the amount of data being backed up and the time that data has to be retained, you could potentially save thousands of dollars. A good plan needs to be created to match your situation.
You may want to consider the following questions:
Which files are really big and how long should they be protected in a backup store?
How many files are there to be protected?
How often are these files being updated?
How many GB per day are effectively generated by addition and updates?
Who generates these files and how?
How much money would it cost if these files were lost?
For Hyper-V, see this article which lists additional questions to consider.
Plan for recovery times
Each type of disaster you want to plan for has a cost attached, and each needs its own data recovery procedure.
Usually time is the most important factor. When you have a team of engineers or surgeons waiting for data to be recovered rather than working, it’s not only frustrating but also quite costly to the organization.
By having multiple backup strategies in place, you can avoid unnecessary delays.
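A back-of-the-envelope estimate makes the cost of waiting concrete. The sketch below computes how long a restore takes at a given sustained throughput and what the idle time costs. The function names, throughput figures, team size, and hourly rate are all hypothetical; plug in your own measurements.

```python
def restore_hours(data_gb: float, throughput_mb_per_s: float) -> float:
    """Hours needed to restore data_gb at a sustained throughput (1 GB = 1000 MB)."""
    seconds = data_gb * 1000 / throughput_mb_per_s
    return seconds / 3600


def downtime_cost(hours: float, people_waiting: int, hourly_rate: float) -> float:
    """Cost of people waiting for the restore instead of working."""
    return hours * people_waiting * hourly_rate


if __name__ == "__main__":
    # Hypothetical: 2,000 GB to restore, 10 people waiting at 100 per hour.
    for label, speed in [("local drive", 100.0), ("offsite", 10.0)]:
        h = restore_hours(2000, speed)
        print(f"{label}: {h:.1f} h, cost {downtime_cost(h, 10, 100.0):.0f}")
```

Even with made-up numbers, the difference between a local and an offsite restore is usually an order of magnitude, which is why the fast local copy matters for the common cases.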
Each backup strategy has a backup and recovery cost attached to it. Sometimes backups will be faster at the expense of a slower recovery; however, since recoveries are much less frequent than the daily or hourly backup, you want to balance cost and benefit in your risk analysis.
This is usually done by deploying a hybrid backup plan:
Set up hybrid backup plans for each backup and recovery scenario
Hybrid backup plans focus on what kinds of disasters are expected to occur and how likely and costly they would be.
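One way to put numbers on "how likely and how costly" is to rank scenarios by expected annual loss, that is, frequency times cost per event. This is a generic risk-analysis sketch rather than a method prescribed here; the class, the function, and every figure in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Scenario:
    name: str
    events_per_year: float  # estimated frequency
    cost_per_event: float   # estimated cost of one occurrence

    @property
    def expected_annual_loss(self) -> float:
        return self.events_per_year * self.cost_per_event


def rank_by_expected_loss(scenarios: List[Scenario]) -> List[Scenario]:
    """Sort scenarios so the most expensive expected loss comes first."""
    return sorted(scenarios, key=lambda s: s.expected_annual_loss, reverse=True)


if __name__ == "__main__":
    # All numbers below are made up for illustration.
    plan = [
        Scenario("accidental deletion", 50, 200),
        Scenario("laptop theft", 1, 5000),
        Scenario("site-wide disaster", 0.01, 2_000_000),
    ]
    for s in rank_by_expected_loss(plan):
        print(f"{s.name}: {s.expected_annual_loss:.0f} per year")
```

Expected loss is only a starting point. A rare event that would end the business deserves protection even if its expected annual loss looks small on paper.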
Accidental deletions may be very likely to occur. As mentioned previously, those events are best handled locally, ideally by the end users themselves to minimize downtime, and by having a local copy of all data that is easy and quick to access, such as an external drive or an additional internal drive. Note that this method works even if the network is down.
You then consider the next likely scenario, such as laptop theft or complete breakdown of a machine. How would you recover quickly, and how much time is available for recovery?
These are all business-related questions that need to be answered before getting to the technical implementation.
You would want to get a clear idea of what recovery would look like before investing in computer hardware because, depending on your needs, hardware costs could easily skyrocket.
Conversely, when the organization is presented with the actual cost of a backup system that meets its initial “requirements”, the teams usually change their minds and reduce their demands for total backup storage and recovery times.
Some Technical Ideas