Cleanup Settings for Virtual Machine Backups

In this article we discuss the traditional incremental backup scheme, an important concept in backup engineering, and how it relates to the cleanup of virtual machine backups. The first section of this article discusses the theory behind the incremental backup scheme, and the last few slides show the cleanup in action, depicted over several backup cycles.

What is the Traditional Incremental Backup Scheme?

The traditional incremental backup scheme is popular because it prioritizes speed over storage usage. Speed is generally, but not always, more important to the business, and storage costs are decreasing every year. It’s generally cheaper and easier to add more storage than to have backup processes spend additional time reorganizing archives unnecessarily.

A major problem with so-called ‘forever-incremental’ or ‘synthetic forever’ schemes is that at some point they require a reorganization, or a merge operation of some sort. This merging substantially increases the process time required to finish a backup cycle. Forever-incremental types of strategies also rely on the backup pieces stored in the backup storage, which may have become corrupt or damaged over time.

What is a Backup Chain?

The traditional incremental scheme does not require any additional processing in the backup folder. A full backup, generally compressed, is taken first, followed by a certain number of incremental or differential backups. This forms a backup chain (hence the product name BackupChain). Once the first backup chain is complete, a new one is begun and the process continues in this fashion forever. At some point, to prevent the storage from filling up, a cleanup needs to take place. The cleanup is achieved simply by removing the oldest backup chain in its entirety from the storage. This is a very quick and efficient process.

Pros and Cons of the Traditional Incremental Backup Scheme

The traditional incremental backup scheme has two main advantages. The first is that backups are always written fresh. If, for any reason, files in the backup storage become corrupt, through any means such as bit rot or even vandalism, the software will at some point start a new backup chain automatically. The second is that the cleanup operation is instantaneous. There is no post-processing required, no merging, no reading, no writing. This is ideal for remote and slow storage but also saves considerable amounts of time even with high-speed local backups.

The main disadvantage is that cleanup cannot occur at each backup cycle because a backup chain can only be deleted in its entirety. While the first backup chain is growing, no cleanup is possible because the items in the chain are interdependent and cannot be removed individually. For a cleanup to occur, there must be at least two backup chains present, and the number of backups remaining after the deletion must be at least the limit specified in BackupChain. For example, if you wish to keep at least 10 backups (the last 10 backup cycles) in your backup storage, then deleting the first backup chain must leave behind at least 10 backups. Hence, there will have to be far more than 10 backups in the storage before a cleanup can take place and reduce the total to 10 (or more). Please see the last few slides for a depiction of the cleanup process.
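The cleanup condition described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this article, not BackupChain's actual implementation; the function name and the chain length of 7 used in the example are assumptions.

```python
from typing import List


def can_delete_oldest_chain(chain_sizes: List[int], minimum_backups: int) -> bool:
    """Decide whether the oldest chain may be removed.

    chain_sizes lists the number of backups in each chain, oldest chain first
    (hypothetical representation, chosen for this sketch).
    """
    # A single chain can never be cleaned up: its items depend on each other.
    if len(chain_sizes) < 2:
        return False
    # Deleting the oldest chain must leave at least the configured minimum.
    remaining = sum(chain_sizes[1:])
    return remaining >= minimum_backups


# Keep at least 10 backups; chains of 7 (one full plus six increments) assumed.
print(can_delete_oldest_chain([7, 7], 10))     # False: only 7 would remain
print(can_delete_oldest_chain([7, 7, 3], 10))  # True: 10 would remain
```

With these assumed numbers, 17 backups must be present before the first cleanup brings the total back down to 10.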

 

Slide1

Slide2

An additional caveat presented in the slide above is that very long backup chains are more efficient with storage but require a longer restore process. First there is a full compressed backup, which is then followed by N increments. If the last backup in the chain is to be restored, the entire chain needs to be restored, including all increments. Because each increment refers to the previous one, the storage used is minimal, as each increment only contains the changes since the last backup. But the restore operation takes longer because it has to work through each increment. The exception here is a differential backup. In a differential backup the comparison is always made to the last full backup; hence, there are at most two steps required for a restore, which is faster. But because the differential backup looks back at the last full backup, it tends to use more and more backup storage as the total number of changed blocks increases. Over time, differential backup chains tend to grow large because they are less efficient with storage, i.e. because the percentage of changed blocks between now and the last full backup tends to increase quickly.
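The difference in restore effort can be sketched as a count of backup files that have to be processed. The function below is a simplified model based only on the description above; the function name and the position numbering are assumptions made for this sketch.

```python
def files_needed_for_restore(position: int, mode: str) -> int:
    """Count the backup files read to restore one backup in a chain.

    position 0 is the full backup; position k is the k-th backup after it.
    mode is "incremental" or "differential".
    """
    if position < 0:
        raise ValueError("position must be 0 or greater")
    if position == 0:
        return 1  # the full backup alone
    if mode == "incremental":
        return position + 1  # the full backup plus every increment up to this one
    if mode == "differential":
        return 2  # the full backup plus the one differential
    raise ValueError("mode must be 'incremental' or 'differential'")


# Restoring the last backup of a chain with 30 backups after the full one.
print(files_needed_for_restore(30, "incremental"))   # 31
print(files_needed_for_restore(30, "differential"))  # 2
```

The model counts files only; it does not capture that each differential is typically larger than an increment taken on the same day.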

For the above reasons, BackupChain allows users to configure the number of increments or differentials per backup chain. In certain situations it makes sense to create very long backup chains. In other scenarios, a short chain may be better. For example, if you know that your huge 4 TB VM has very little change per day, it will make perfect sense to create very long incremental backup chains, especially if the storage is also remote and slow. If the percentage of content change in each backup is high, this will result in larger increments per day. If the increments for a VM are rather large, it would make sense to keep backup chains short.

By breaking a backup chain, i.e. by starting a new backup chain, you improve restore speeds and increase the frequency at which a cleanup may occur. The trade-off is that short backup chains are not as storage efficient and ‘waste’ more backup space, in exchange for fewer restore steps and more frequent cleanups.

 

Slide3

Slide4

Slide5

Slide6

Slide7

Slide8

Slide9

Slide10

In the above slide we see that even though four backups have completed and we want to keep just two, it’s still not possible to delete anything. The reason is simple: since a backup chain may only be deleted in its entirety, deleting the first chain would leave just one backup, but the limit defines two backups as the minimum. Hence, BackupChain waits for more backups to complete.

Slide11

Slide12

Note that the above slide shows the cleanup occurring at the end of day 5. The fifth backup must complete first; at that point there are two backups on top of the first backup chain of three (3+2=5). Once that’s the case, BackupChain deletes the first backup chain and leaves you with two backups, which is the defined limit in this example.
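The day-by-day behavior in the slides can be reproduced with a small simulation. This is a sketch under assumptions: one backup per day, a chain of three backups (one full plus two increments, inferred from the 3+2=5 above), a minimum of two backups, and a cleanup check after every completed backup. The function name is hypothetical.

```python
from typing import List, Tuple


def simulate_cleanup(days: int, chain_length: int, minimum_backups: int) -> List[Tuple[int, int, bool]]:
    """Return (day, backups in storage after the day, cleanup happened) per day."""
    chains: List[int] = []  # number of backups per chain, oldest chain first
    history = []
    for day in range(1, days + 1):
        # Start a new chain when there is none yet or the newest one is complete.
        if not chains or chains[-1] == chain_length:
            chains.append(0)
        chains[-1] += 1
        # The oldest chain goes only if the remaining backups still meet the minimum.
        cleaned = False
        if len(chains) >= 2 and sum(chains[1:]) >= minimum_backups:
            chains.pop(0)
            cleaned = True
        history.append((day, sum(chains), cleaned))
    return history


for day, total, cleaned in simulate_cleanup(8, chain_length=3, minimum_backups=2):
    print(day, total, "cleanup" if cleaned else "")
```

Under these assumptions the first cleanup happens at the end of day 5, leaving two backups, and the next one at the end of day 8.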

Slide13

 

Specific Examples

Let’s say you wanted a full backup once a week and your backups run daily. In our example below, we would define in the Deduplication tab the following:

In our example above, we chose 6 increments. A full backup plus six increments equals 7, a week. We chose differential above because we know that this particular VM changes the same blocks over and over again (the VM contains a specific kind of database with internals well known to the user).

In the File Versioning / Cleanup tab, we define 7 as the minimum because storage is limited:

Note that we set the same limit for all types of files involved. In this case it’s a VMware virtual machine backup. Note that, based on the slides and the other information above, the backup will actually keep up to 14 backups in the backup storage. In the long term, the number of backups in the backup folder will fluctuate between 7 and 14. The reason is that the first backup chain will grow to 7 backups. BackupChain then starts a new backup chain. Once it also reaches 7 backups, the total is 14 and a cleanup is initiated, because deleting the first 7 backups (the oldest chain) leaves 7 backups. Before that, a cleanup is not allowed to occur because deleting the first chain too early would leave you with fewer than 7 backups and would violate the “Minimum Number of File Versions” setting shown above, which is set to 7 in this example.
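The 7-to-14 range follows from simple arithmetic: a cleanup fires as soon as the total reaches the minimum plus one full chain, and it removes exactly one full chain. A minimal sketch, assuming one backup per cycle and a cleanup check after each backup (the function name is hypothetical):

```python
from typing import Tuple


def retention_range(chain_length: int, minimum_versions: int) -> Tuple[int, int]:
    """Return the long-term (lowest, highest) number of backups in storage.

    chain_length counts the full backup plus its increments or differentials.
    """
    if chain_length < 1 or minimum_versions < 1:
        raise ValueError("both values must be at least 1")
    lowest = minimum_versions                  # right after a cleanup
    highest = minimum_versions + chain_length  # right before a cleanup
    return lowest, highest


print(retention_range(chain_length=7, minimum_versions=7))  # (7, 14)
print(retention_range(chain_length=3, minimum_versions=2))  # (2, 5)
```

The second call corresponds to the slide example, where the total peaks at five backups and drops to two.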

Delayed Deletion for Snapshots / Checkpoint Cleanup

In the special case of virtual machine backups, you may want to consider setting a ‘delayed deletion period’. Backups are done, among many other things, to prevent data loss caused by accidental deletion. In the case of VM checkpoints, however, the checkpoint may have been deleted on purpose. Even so, its backup will be kept indefinitely in the backup storage unless you define a delayed deletion period. If you enter ’30 days’ (without the quotes) in the above cells that currently show ‘never delete’, BackupChain will wait 30 days after it detects that the original files were deleted before removing their backups.

In the case of Hyper-V, and because Hyper-V stores checkpoints as AVHD and AVHDX files, you would define rules for *.AVHD and *.AVHDX to handle the checkpoint cleanup after 30 days or whatever time frame makes sense in your setting.

If you choose to use a delayed deletion period, it’s very important that you define the period to be much longer than your regular cleanup triggered by ‘minimum number of file versions’. From our above example we know that backups will be kept for 7 to 14 days. Hence, defining the delayed deletion period as 30 days or more (which adds some extra slack) is safe. Defining it as less than 14 days, say 7 days, would not be safe, as the oldest backups might contain checkpoints and be over 7 days old.
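As a rough sanity check, the delayed deletion period can be compared against the age of the oldest backup that may still be in storage. The sketch below assumes daily backups by default and uses the minimum plus one chain as the upper bound, as in the example above; the function and parameter names are hypothetical and not BackupChain settings.

```python
def is_delay_safe(delay_days: int, chain_length: int, minimum_versions: int,
                  interval_days: int = 1) -> bool:
    """Check that the delay exceeds the age of the oldest backup still kept."""
    # Up to minimum_versions + chain_length backups can be in storage at once.
    max_backups_kept = minimum_versions + chain_length
    oldest_backup_age_days = max_backups_kept * interval_days
    return delay_days > oldest_backup_age_days


print(is_delay_safe(30, chain_length=7, minimum_versions=7))  # True
print(is_delay_safe(7, chain_length=7, minimum_versions=7))   # False
```

This only tests the bare minimum; the recommendation above is to make the period much longer than the regular cleanup window.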
