Discussion of Backup Strategies and Samples

Backup strategies often revolve around certain constraints, such as:

• backup time is limited
• backups are performed around 500-1,000x more often than restores; hence, backup time windows tend to be prioritized over restore time
• storage space is limited
• human labor is expensive
• business consequences of data loss may range from expensive to fatal
• business data is complex to manage
• data structures can have a complex interrelationship that may not be immediately obvious
• there are different backup strategies with pros and cons for each
• each backup strategy requires a particular restore strategy

“One size fits all” is hence a practical impossibility. There is no single, simple strategy that works for everyone; the best strategy is one specifically tailored to your business and the kind of data you are dealing with. As much as IT admins would love to give their users a fancy ‘click here to back everything up’ button, doing so would mean acting as if none of the above issues existed.
For that reason, a backup application with fancy graphics but few options does its users a disservice by making backups seem trivial, and many people fall into the trap of thinking that easy to use means the job gets done. A simplistic tool may indeed be simple to set up, but along the way users may overlook many of the issues above, because they are unaware of the consequences of the intentional and unintentional choices they made when they set up backups with a narrow view of the actual problem at hand.

This is a good moment to recall the quote often attributed to Albert Einstein: “Everything should be made as simple as possible, but not simpler.”

What a business really wants is a balanced backup strategy that satisfies all data protection needs to an adequate level, not merely a simple one.

Consider the following questions and facts:
• what sorts of data do we have?
• what is the total data volume we have in GB?
• how much of it is static?
• how much data is being added daily? By whom? Why?
• can we group data into chunks based on importance?
• what are the labor costs in case there’s downtime?
• in contrast to that, how costly are storage and hardware?
• are there alternative types of storage that are cheaper / faster / more reliable than the ones currently used?
• how much time do we have to complete a daily backup?
• which groups of data should we back up hourly, daily, weekly, and monthly?
• how many TB or GB per day need to be backed up?
• how fast are the available networks (if backing up to a network or cloud server)?
• would we back up more than one server over the network at a time? (Concurrent transfers compete for the same bandwidth and can congest the network, which greatly reduces throughput per server)
• how much time do we have to restore from a catastrophic failure?
• in case of a data loss event, which data needs to be restored and in which order?
• how much money would a catastrophic failure cost in downtime, in $/h?
• is it ever cheaper to simply re-create the work lost rather than to archive it?
• do you really need to back up everything every day?
• do you really need to restore everything within minutes?
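Several of the questions above (data volume per day, network speed, the backup time window, backing up several servers at once) boil down to simple arithmetic. The following is a minimal back-of-the-envelope sketch, not a sizing tool: the function names, the 70% efficiency factor, and all sample figures are hypothetical assumptions, and it treats servers backing up at the same time as sharing one link.

```python
# Back-of-the-envelope backup-window check. All numbers are hypothetical
# placeholders; replace them with your own measurements.

def transfer_hours(data_gb: float, link_mbit_s: float, efficiency: float = 0.7) -> float:
    """Hours needed to move data_gb (decimal GB) over a link rated at
    link_mbit_s. `efficiency` is an assumed fraction of the nominal
    bandwidth actually achieved (protocol overhead, disk speed, contention)."""
    megabits = data_gb * 1000 * 8              # GB -> megabits
    usable_mbit_s = link_mbit_s * efficiency   # realistic throughput
    return megabits / usable_mbit_s / 3600     # seconds -> hours


def fits_window(daily_gb_per_server: list[float], link_mbit_s: float,
                window_hours: float, efficiency: float = 0.7) -> tuple[float, bool]:
    """Servers backing up over the same link at the same time share its
    bandwidth, so the total daily volume is what has to fit the window."""
    hours = transfer_hours(sum(daily_gb_per_server), link_mbit_s, efficiency)
    return hours, hours <= window_hours


if __name__ == "__main__":
    # Three hypothetical servers, a 1 Gbit/s link, an 8-hour nightly window.
    hours, ok = fits_window([200, 150, 400], link_mbit_s=1000, window_hours=8)
    print(f"estimated transfer time: {hours:.1f} h, fits window: {ok}")
```

With these placeholder numbers the 750 GB of daily changes take roughly 2.4 hours on a 1 Gbit/s link, but on a 100 Mbit/s link the same volume would need almost a full day, which is exactly the kind of finding that forces a change of strategy.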

Facts to consider:
• in most restore scenarios, only a few files or small amounts of data need to be restored
• adding storage is usually much cheaper than figuring out ways to pack more data into existing storage, but sometimes it’s the other way around
• disk backups are much faster than tapes
• NAS backup devices range from very reliable to buggy to useless; the same goes for hard drives
• hard drives are very fast and high-density, but their quality is on the decline and failure rates are going up
• your infrastructure may contain single points of failure, such as NAS devices or storage servers
• a plan that prioritizes certain files over others pays off: some data folders are usually much more important than others, in terms of both loss risk and restore time.
• full backups are usually done only once or very infrequently.
• backup time can be dramatically reduced by using incremental backups (default BackupChain algorithm)

As you can see, if your data is really important and you have a lot of it, you have many decisions to make.
The word ‘decide’ shares its Latin root with ‘-cide’ (killing off, cutting off), so deciding also means figuring out what not to do. Many people forget that not making a decision is itself a decision: a decision not to pursue any of the known and unknown courses of action.
Hence, to satisfy various business interests, you need several backups that compensate for the cons and benefit from the pros of each approach.
Just as stock market investors talk about ‘diversifying their portfolios’, backup strategies pursue very much the same idea: diversifying data loss risks. And just as you could still lose all your money in the stock market despite diversification, the same goes for backups. Even if you back up in dozens of different ways, the risk of total loss is lower, but a small risk of total failure persists, and the only certain outcome of further diversification is that costs keep going up.
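The diversification argument can be made concrete with a little probability. The sketch below rests on a strong, openly optimistic assumption: each backup copy is lost independently, with the same hypothetical probability over the period considered. The failure probability and cost per copy are made-up figures for illustration only.

```python
# Diversification arithmetic under a strong simplifying assumption: each
# backup copy is lost independently, with the same probability p_fail over
# the period considered. Real copies often fail together (same site, same
# malware, same buggy NAS firmware), so treat the results as optimistic.

def total_loss_probability(p_fail: float, copies: int) -> float:
    """Probability that every one of `copies` independent backups is lost."""
    return p_fail ** copies


def total_cost(cost_per_copy: float, copies: int) -> float:
    """Costs grow with every additional copy (storage, hardware, labor)."""
    return cost_per_copy * copies


if __name__ == "__main__":
    p_fail, cost_per_copy = 0.05, 100.0  # hypothetical figures
    for copies in range(1, 5):
        print(f"{copies} copies: P(total loss) = {total_loss_probability(p_fail, copies):.2e}, "
              f"cost = {total_cost(cost_per_copy, copies):.0f}")
```

The probability of total loss shrinks geometrically with each copy but never reaches zero, while the cost grows with every copy added, which is the trade-off described above.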

Sample Strategies

Most experienced users opt for a hybrid strategy: on each server, they set up several backup tasks with different schedules.
Perhaps you would want:
A: an uncompressed, always ready-to-use backup that mirrors the original folder structure, kept on a local destination, plus a second copy on a network destination
B: compressed offsite storage, to protect from local disasters
C: compressed centralized, local storage, to protect from regular failures, such as disk crashes
D: compressed and deduplicated medium term storage, to also protect from virus and corruption risks and other long-term issues
E: compressed long-term storage

How often should the backup run?

This is an interesting question. It’s not uncommon for hard drives to fail during backups, for the simple reason that backups stress hard drives quite a bit. That’s one good reason to back up less often.
In addition, backing up more often requires more management and either more storage or a shorter backup history window. For example, if only 3 backups fit on the storage device used for B, those 3 backups could cover 3 hours or 3 months, depending on the schedule. If backups run hourly, a file is accidentally deleted, and it takes the owner 4 hours to notice, the file is lost: the oldest retained backup is only about 3 hours old, so every remaining copy was taken after the deletion.
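The retention example above can be checked with a few lines of code. This is a simplified model under stated assumptions: it treats the history window as the number of retained backups times the schedule interval and ignores exactly where the deletion falls between two runs. The function names are hypothetical.

```python
from datetime import timedelta

# Simplified model: the history window is roughly slots x interval, ignoring
# exactly where the deletion falls between two backup runs.

def retention_window(slots: int, interval: timedelta) -> timedelta:
    """Approximate age of the oldest backup when only `slots` backups fit
    on the destination and a new one runs every `interval`."""
    return slots * interval


def still_recoverable(slots: int, interval: timedelta, noticed_after: timedelta) -> bool:
    """A deleted file can be restored only if some retained backup predates
    the deletion, i.e. the loss is noticed within the retention window."""
    return noticed_after <= retention_window(slots, interval)


if __name__ == "__main__":
    four_hours = timedelta(hours=4)
    print(still_recoverable(3, timedelta(hours=1), four_hours))   # hourly: False
    print(still_recoverable(3, timedelta(days=30), four_hours))   # roughly monthly: True
```

The flip side, as the next paragraph points out, is that with a roughly monthly schedule a file created and deleted between two runs was never backed up at all.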
On the other hand, if backups run only once a week, people may create new files and lose them in between backup cycles.
Hence, it makes sense to group data into parts depending on available storage, size, and importance.
You would then set up separate tasks with different schedules and timings to implement each of the A to E strategies above.
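Grouping data and assigning each group its own schedule and retention also lets you estimate storage needs per task. The sketch below is a crude estimate under loudly stated assumptions: the DataGroup class, the three sample groups, and all sizes are hypothetical, changes are assumed to be spread evenly over the day, and each run is assumed to capture only what changed since the previous run (real data that is rewritten repeatedly will need more space).

```python
from dataclasses import dataclass


@dataclass
class DataGroup:
    """A hypothetical group of data backed up by its own task."""
    name: str
    full_size_gb: float     # size of one full backup
    daily_change_gb: float  # new or changed data per day
    interval_hours: float   # 1 = hourly, 24 = daily, 168 = weekly
    versions_kept: int      # incremental versions retained

    def storage_gb(self) -> float:
        # Crude estimate: one full copy plus retained increments, assuming
        # changes are spread evenly and each run captures only what changed
        # since the previous run.
        increments = self.versions_kept * self.daily_change_gb * self.interval_hours / 24
        return self.full_size_gb + increments

    def history_days(self) -> float:
        # How far back the retained versions reach.
        return self.versions_kept * self.interval_hours / 24


groups = [
    DataGroup("accounting database", 50, 2, 1, 72),
    DataGroup("file shares", 500, 10, 24, 30),
    DataGroup("archive", 2000, 1, 168, 12),
]

if __name__ == "__main__":
    for g in groups:
        print(f"{g.name}: ~{g.storage_gb():.0f} GB for {g.history_days():.0f} days of history")
    print(f"total: ~{sum(g.storage_gb() for g in groups):.0f} GB")
```

In this made-up example, hourly backups of the small database cost little storage but only reach back 3 days, while weekly backups of the large archive reach back 12 weeks; each group gets the trade-off that matches its importance.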

If you require assistance in setting up backup tasks, please don’t hesitate to contact our helpline.

Full, Incremental or Differential?

BackupChain, by default, always creates a full backup first. Then only changed files are backed up.

When incremental deduplication is also enabled, file contents are scanned and only the changed portions of each file are backed up.

The differential deduplication strategy compares changes against the last full backup, whereas the incremental strategy compares against the most recent backup cycle.
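The difference between the two baselines can be illustrated with block-level change detection. This is a generic sketch, not BackupChain's actual deduplication algorithm, which is not described here: fixed-size blocks with SHA-256 fingerprints are an assumption, and the block size is tiny only to keep the demo readable.

```python
import hashlib

# Fixed-size blocks with SHA-256 fingerprints are an assumption made for this
# sketch; the block size is tiny only to keep the demo readable.
BLOCK_SIZE = 4


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Split data into fixed-size blocks and fingerprint each one."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(baseline: bytes, current: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Indices of blocks in `current` that are new or differ from `baseline`.
    Only these blocks would need to be stored in the new backup."""
    old = block_hashes(baseline, block_size)
    new = block_hashes(current, block_size)
    return [i for i, h in enumerate(new) if i >= len(old) or h != old[i]]


if __name__ == "__main__":
    full = b"AAAABBBBCCCCDDDD"   # file contents at the full backup
    day1 = b"AAAAXXXXCCCCDDDD"   # block 1 changed
    day2 = b"AAAAXXXXCCCCYYYY"   # block 3 changed as well
    # Incremental: baseline is the previous backup cycle.
    print("incremental, day 2:", changed_blocks(day1, day2))   # [3]
    # Differential: baseline is the last full backup.
    print("differential, day 2:", changed_blocks(full, day2))  # [1, 3]
```

On day 2 the incremental backup stores only block 3, while the differential stores blocks 1 and 3 because it always measures against the full backup. Fixed-size blocks are the simplest choice; an insertion near the start of a file shifts every later block, which is why some tools use content-defined chunking instead.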

If you want every restore to take at most two steps (the full backup plus one differential), choose differential; however, this option uses storage less efficiently, because the amount of change relative to the last full backup tends to grow over time.

Incrementals are more storage-efficient; however, as the chain of increments grows longer, restores may take a little more time, because each increment in the chain has to be applied in order.
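The trade-off between restore steps and storage can be quantified with a small model. The sketch below assumes, hypothetically, that every cycle changes a different fixed amount of data and that every backup set is kept for version history; the 100 GB full backup and 5 GB of changes per cycle are made-up figures.

```python
# Compares the two strategies after `cycles` backups since the last full,
# assuming (hypothetically) that every cycle changes a different
# `change_gb` of data and that every backup set is kept for version history.

def restore_steps(strategy: str, cycles: int) -> int:
    """Backup sets that must be read to restore the latest state."""
    if cycles == 0:
        return 1                  # the full backup alone
    if strategy == "differential":
        return 2                  # full + latest differential
    if strategy == "incremental":
        return 1 + cycles         # full + every increment, applied in order
    raise ValueError(f"unknown strategy: {strategy}")


def storage_gb(strategy: str, full_gb: float, change_gb: float, cycles: int) -> float:
    """Total space used by the full backup plus all later backup sets."""
    if strategy == "incremental":
        # Increment k holds only the changes made during cycle k.
        return full_gb + change_gb * cycles
    if strategy == "differential":
        # Differential k holds everything changed since the full: k * change_gb.
        return full_gb + change_gb * cycles * (cycles + 1) / 2
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    for strategy in ("incremental", "differential"):
        print(f"{strategy}: {restore_steps(strategy, 7)} restore steps, "
              f"{storage_gb(strategy, 100, 5, 7):.0f} GB after 7 cycles")
```

After 7 cycles in this model, incrementals need 8 restore steps but only 135 GB, while differentials need just 2 steps but 240 GB, because each differential repeats all earlier changes.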

To compensate for these trade-offs and to keep backup chains short, so that old chains can be deleted and cleaned up more frequently, BackupChain creates periodic full backups at configurable intervals. That way, both differential and incremental backups get a fresh full copy to build on at predetermined intervals.
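The effect of periodic full backups on an incremental chain can be sketched as follows. This models daily incremental runs only; the 30-run horizon and the 7-run full interval are hypothetical values chosen for illustration, not BackupChain defaults or setting names. For differentials, the same periodic full simply resets the growing size of each differential.

```python
# Models daily incremental runs with a new full backup every `full_every`
# runs; the interval is a hypothetical value, not a BackupChain default.

def schedule(runs: int, full_every: int) -> list[str]:
    """Label each run 'full' or 'incr'; run 0 is always a full backup."""
    return ["full" if i % full_every == 0 else "incr" for i in range(runs)]


def split_chains(runs: list[str]) -> list[list[str]]:
    """Group runs into chains that each start with a full backup. Every
    chain except the newest can be deleted as a unit without breaking
    restores from later chains."""
    chains: list[list[str]] = []
    for kind in runs:
        if kind == "full":
            chains.append([kind])
        else:
            chains[-1].append(kind)
    return chains


if __name__ == "__main__":
    for full_every in (30, 7):
        chains = split_chains(schedule(30, full_every))
        print(f"full every {full_every} runs: chain lengths {[len(c) for c in chains]}")
```

Without a periodic full, 30 daily runs form one chain of 30 backup sets that can only be cleaned up as a whole; with a full every 7 runs, no restore needs more than 7 sets, and each older chain can be removed on its own.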

For long file revision histories and cloud backups, it’s best to use incremental deduplication.

VMs on a large, fast storage medium could do well with a differential backup strategy.