Pros and Cons of Deduplication Technologies, VHDX, Hyper-V

Tremendous storage savings

Data deduplication means removing duplicate segments from data files. It is therefore a form of data compression, although it is applied in a variety of ways. Hyper-V backup software, for example, uses deduplication to reduce backup storage usage dramatically by comparing the last backup of a Hyper-V virtual machine to its current state. When backups run often, little data changes between runs, so the delta, the actual difference between a file and its previous version, is small as well. Instead of copying a large virtual machine in full over and over, Hyper-V backup software that uses deduplication stores only what has changed and thus uses expensive storage space far more efficiently.
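
As a rough illustration of the idea, the following Python sketch deduplicates two backup generations at block level. The fixed 4096-byte block size, the SHA-256 hashing, and the names split_blocks, backup, and store are assumptions made for this example; they do not describe how any particular backup product works internally.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, chosen only for illustration


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def backup(data: bytes, store: dict[str, bytes]) -> list[str]:
    """Store only blocks not seen before and return the list of block
    hashes (the 'recipe') needed to rebuild this version."""
    recipe = []
    for block in split_blocks(data):
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:   # new data costs storage
            store[digest] = block
        recipe.append(digest)     # known data costs only a reference
    return recipe


# Hypothetical disk image made of 1000 distinct blocks.
monday = b"".join(i.to_bytes(4, "big") * (BLOCK_SIZE // 4) for i in range(1000))

# The next day, a single block has changed.
tuesday = bytearray(monday)
tuesday[5 * BLOCK_SIZE:6 * BLOCK_SIZE] = b"\xff" * BLOCK_SIZE

store: dict[str, bytes] = {}
recipe_monday = backup(monday, store)
blocks_after_monday = len(store)
recipe_tuesday = backup(bytes(tuesday), store)
blocks_added_tuesday = len(store) - blocks_after_monday

print(f"first backup stored {blocks_after_monday} blocks")
print(f"second backup added {blocks_added_tuesday} block(s)")
```

In this toy setup the second backup adds one block to the store rather than another thousand, which is the storage saving described above.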

Additional CPU processing needed

Sometimes data isn’t compressed because the benefit of reducing its size is small compared to the CPU load required to compress it in the first place. Compression makes sense when storage space is very limited and reasonably good CPU resources are available. When storage is abundant, CPU resources are inadequate, or both, it may be better to leave data as-is rather than try to shrink it. In addition, some data, such as encrypted files, is effectively incompressible.

Deduplication is no different. It pays off only at the cost of additional processing: blocks need to be tracked and individually compressed. To benefit from deduplication, the source system should ideally have a good CPU. Another consideration is the target media. If it is rather slow, deduplication makes sense because the additional processing is offset by bandwidth savings.

Faster processing time

If a very slow target is being used, such as the internet, the overall processing time may actually be much shorter when using deduplication. If the target is very fast and storage space is plentiful, it may be unnecessary to utilize deduplication, or worse, deduplication may slow down the entire process without providing benefits.
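
The trade-off between extra processing and transfer time can be estimated with a back-of-envelope model. The sketch below is only that: the function names and all figures (file size, change rate, scan speed, link speeds) are hypothetical, and it assumes scanning and sending do not overlap.

```python
def full_copy_seconds(size_mb: float, link_mb_per_s: float) -> float:
    """Time to send the whole file with no deduplication."""
    return size_mb / link_mb_per_s


def dedup_seconds(size_mb: float, changed_fraction: float,
                  cpu_mb_per_s: float, link_mb_per_s: float) -> float:
    """Time to scan the whole file on the source, then send only changed data.
    Simplification: scanning and sending are assumed not to overlap."""
    scan = size_mb / cpu_mb_per_s
    send = size_mb * changed_fraction / link_mb_per_s
    return scan + send


# Hypothetical figures: 100,000 MB virtual disk, 2% changed since the last
# backup, source able to scan 200 MB/s.
SIZE_MB, CHANGED, CPU_MB_S = 100_000, 0.02, 200

for label, link in (("slow link, 1 MB/s", 1), ("fast target, 1000 MB/s", 1000)):
    plain = full_copy_seconds(SIZE_MB, link)
    dedup = dedup_seconds(SIZE_MB, CHANGED, CPU_MB_S, link)
    print(f"{label}: full copy {plain:.0f} s, deduplicated {dedup:.0f} s")
```

With these made-up numbers, deduplication is far faster over the slow link, while on the fast target the scan alone takes longer than simply copying the file.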

Multistep / multiphase restore

Accessing or restoring deduplicated data may require several steps or phases. Deduplication relies on a complex storage format that has to be written and read every time data is accessed, and this additional complexity may affect access speeds. Because deduplication stores references wherever data blocks repeat, the target storage media needs to provide fast random access; otherwise restore speed may become an issue.
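
To make the extra restore work concrete, here is a minimal Python sketch of one possible layout: a full backup followed by generations that hold only changed blocks. The layout and the restore function are assumptions for illustration, not the on-disk format of any real product.

```python
def restore(generations: list[dict[int, bytes]], block_count: int) -> tuple[bytes, int]:
    """Rebuild the newest version of a file.

    generations[0] is the full backup; each later entry maps block index to
    new content for only the blocks that changed. Returns the rebuilt data
    and the number of lookups that were needed.
    """
    out = []
    lookups = 0
    for index in range(block_count):
        # Walk back from the newest generation until the block is found.
        for generation in reversed(generations):
            lookups += 1
            if index in generation:
                out.append(generation[index])
                break
        else:
            raise KeyError(f"block {index} is missing from every generation")
    return b"".join(out), lookups


full = {0: b"A", 1: b"B", 2: b"C"}   # full backup of a three-block file
day1 = {1: b"b"}                      # block 1 changed
day2 = {2: b"c"}                      # block 2 changed

data, lookups = restore([full, day1, day2], block_count=3)
print(data, lookups)
```

Restoring the three-block file takes six lookups here instead of three, and each lookup may land in a different place on the storage media, which is why random access speed matters.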

Random Access Media

Target media needs to offer fast random access. Tape backups are at a great disadvantage for the simple reason that winding tape to the right position takes far longer than a seek on a hard disk, which is in the millisecond range. Even with hard drives, seek time becomes an issue when the disk is heavily fragmented, because mechanical hard drives are optimized for sequential access and seek times quickly add up. If data is deduplicated over very many generations, the accumulated seeks may increase access times significantly.
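
How quickly seek times add up can be estimated with simple arithmetic. The function and all figures below are hypothetical and ignore caching and read-ahead; they are meant only to show the order of magnitude.

```python
def restore_seconds(size_mb: float, seq_mb_per_s: float,
                    scattered_blocks: int, seek_ms: float) -> float:
    """Sequential read time plus one seek per block stored out of order."""
    return size_mb / seq_mb_per_s + scattered_blocks * seek_ms / 1000


# Hypothetical figures: 100,000 MB restore, 150 MB/s sequential read,
# 10 ms average seek time on a mechanical hard drive.
contiguous = restore_seconds(100_000, 150, scattered_blocks=0, seek_ms=10)
fragmented = restore_seconds(100_000, 150, scattered_blocks=500_000, seek_ms=10)

print(f"contiguous: {contiguous / 60:.0f} min, fragmented: {fragmented / 60:.0f} min")
```

Under these assumptions, half a million scattered blocks at 10 ms per seek add 5,000 seconds to a restore that would otherwise take about 11 minutes.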

Slow Target Link / Media

Deduplication allows backups to be sent efficiently over slow, high-latency links. Enterprise-grade cloud backup is usually powered by a deduplication engine because internet upload bandwidth is usually very limited. The additional processing that deduplication requires at the source is minimal compared to the time needed to upload a large block of data.

Interdependencies and risk of data corruption

Interdependencies between backups increase the risk of data corruption. When data is spread out in many pieces, possibly over several storage media, the risk of data loss increases because it takes only one bad block or bad disk to damage the file. Naturally there are ways to minimize this risk and the risk of bit rot; however, a certain level of risk always remains. Full backups, or “full files”, on the other hand, do not depend on other files; hence, the risk of loss or damage is limited to the file itself and the storage media carrying it, in isolation from other files and media.
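
One common way to detect this kind of damage is to store a checksum with every block and verify it periodically. The Python sketch below assumes a block store keyed by SHA-256 digest; the names sha, find_damaged, and affected_backups and the sample data are hypothetical.

```python
import hashlib


def sha(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


def find_damaged(store: dict[str, bytes]) -> set[str]:
    """Return the digests of blocks whose content no longer matches its key."""
    return {digest for digest, block in store.items() if sha(block) != digest}


def affected_backups(recipes: dict[str, list[str]], damaged: set[str]) -> list[str]:
    """List every backup whose recipe references at least one damaged block."""
    return sorted(name for name, recipe in recipes.items()
                  if damaged.intersection(recipe))


# Two backup generations that share the 'boot' and 'system' blocks.
store = {sha(b): b for b in (b"boot", b"system", b"data-v1", b"data-v2")}
recipes = {
    "monday": [sha(b"boot"), sha(b"system"), sha(b"data-v1")],
    "tuesday": [sha(b"boot"), sha(b"system"), sha(b"data-v2")],
}

store[sha(b"system")] = b"sYstem"   # simulate bit rot in one shared block

damaged = find_damaged(store)
broken = affected_backups(recipes, damaged)
print(broken)
```

A single damaged shared block makes both backups in this example unrestorable, whereas two independent full backups would each have their own copy of that block.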

Global Deduplication Offers Additional Potential Savings

While in-file deltas deduplicate individual files, global deduplication attempts to achieve additional savings by removing redundant data blocks across different files. For example, if a dozen servers are backed up to a central server and each of them stores a large block of identical data, global deduplication can identify the duplicates and eliminate them from backup storage. Naturally, global deduplication involves a higher degree of complexity and interdependency. As mentioned above, since storage of digital information is not perfect, a single instance of bit rot may corrupt far more than just one file at a time, unless the deduplication system has been designed to cope with these issues.
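
The difference between per-file and global deduplication can be sketched by counting unique blocks. In the Python example below, short byte strings stand in for data blocks, and the three servers and their contents are invented for illustration.

```python
import hashlib
from collections import Counter


def digests(blocks: list[bytes]) -> list[str]:
    return [hashlib.sha256(block).hexdigest() for block in blocks]


# Hypothetical setup: every server carries the same 100 operating system
# blocks plus 10 blocks of its own data.
os_image = [f"os-{i}".encode() for i in range(100)]
servers = {
    name: os_image + [f"{name}-data-{i}".encode() for i in range(10)]
    for name in ("web", "db", "mail")
}

# Deduplicating each server separately keeps one copy of the OS per server.
per_server_blocks = sum(len(set(digests(blocks))) for blocks in servers.values())

# Global deduplication keeps one copy of each block across all servers.
reference_counts = Counter(d for blocks in servers.values()
                           for d in set(digests(blocks)))
global_blocks = len(reference_counts)

print(f"per-server: {per_server_blocks} blocks, global: {global_blocks} blocks")
print(f"most-shared block is referenced by {max(reference_counts.values())} servers")
```

The same reference counts that produce the savings also show the added risk: every block of the shared operating system image is needed by all three servers, so losing one such block affects all of them.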

Summary

All this complexity limits the applicability of deduplication. Deduplication requires far more processing and complex on-disk storage formats because data blocks depend on one another. Regular file systems do not need to maintain such interconnections and are hence much simpler to access and work with. For the same reasons, data recovery is also far easier on damaged “regular” file systems than on disks that use deduplication. When deciding whether to deploy deduplication, one therefore needs to take into account all the potential advantages and disadvantages, which in turn depend very much on the nature of the overall system and the data being processed.

Tools

Significant savings for VHDX deduplication and Hyper-V deduplication may be achieved using BackupChain, our all-in-one backup solution. Interested in cloud backup? Try our unlimited cloud storage plans.
