The following in-depth article summarizes common Hyper-V and HyperV virtual machine backup tips and tricks that our BackupChain tech support team gathered over time since we first rolled out in 2009. BackupChain now also supports backups on Windows Server 2016 as well as Windows 10.
While some of the points below focus on backup, others are more general and apply to regular Windows Server tuning without Hyper-V.
Tip #1: Use specialized backup software instead of scripts
Backup of virtual machines using scripts is common and often lacking reliability. The reason is simple: many scripts lack proper Hyper-V and VSS integration, error handling, logging, error detection, and verification. Files may be missing or corrupt and one would never notice it until the backup is actually used for a restore. If you factor in all the work needed to write a great backup script with complete error control and handling, you are better off buying a dedicated tool instead. Let’s not forget that specialized backup tools also update the backup configuration automatically whenever the VM changes and whenever Microsoft updates Hyper-V; that’s a crucial step overlooked in many scripts. Check out BackupChain for a complete backup solution for Hyper-V.
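To give a feel for how much a script has to do beyond a plain copy, here is a minimal Python sketch of just two of the items listed above: error handling and verification. The function names are made up for this illustration, and the sketch has no VSS or Hyper-V integration at all, so it is not safe for copying the disks of a running VM; it only shows the verification idea.

```python
import hashlib
import logging
import shutil
from pathlib import Path

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("backup-sketch")


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large disk images do not have to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, target: Path) -> bool:
    """Copy one file, then re-read both sides and compare their checksums."""
    try:
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
    except OSError as error:
        # Log the failure instead of silently skipping the file.
        log.error("copy failed for %s: %s", source, error)
        return False
    if sha256_of(source) != sha256_of(target):
        log.error("verification failed for %s", source)
        return False
    log.info("verified %s", target)
    return True
```

Even this small piece doubles the read load, because every file is read again after the copy. That is part of the reason a complete script grows quickly.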
Tip #2: Back up often
Backups of virtual machines tend to be quite large and large files take a long time to back up. In addition, virtual machine disks take up a lot of space, too; hence, it costs time as well as storage space.
Those Hyper-V administrators utilizing scripts have a hard time with their Hyper-V copies. Because scripts can’t perform deduplication, each backup, even if compressed, uses quite a bit of storage space. The backup processing time is also several times higher because scripts can’t make use of multiple CPU cores during processing. The regular ZIP algorithm, by the way, uses a single CPU core only, even if your server has 32.
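The multi-core point can be illustrated with a short Python sketch. It splits the data into chunks and compresses each chunk on its own worker thread; CPython's zlib module releases the interpreter lock while compressing, so several cores can work at once. The chunk size and the function names are arbitrary choices for this example, and this is not how BackupChain or any particular tool does it internally.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB per chunk; an arbitrary choice for this sketch


def compress_parallel(data: bytes, workers: int = 4, level: int = 6) -> list[bytes]:
    """Split data into chunks and compress each chunk independently."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps the chunk order, so the result can be decompressed in sequence.
        return list(pool.map(lambda chunk: zlib.compress(chunk, level), chunks))


def decompress_chunks(compressed: list[bytes]) -> bytes:
    """Reverse compress_parallel by decompressing the chunks in order."""
    return b"".join(zlib.decompress(chunk) for chunk in compressed)
```

The trade-off is a slightly worse compression ratio, since each chunk is compressed without knowledge of the others.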
When the host server breaks down, you want your latest backup to be as recent as possible, because everything that changed since that backup is potentially lost. Two additional problems are accidental deletion and software corruption: by the time the damage is noticed, the last “good” backup may have already been cleaned up.
Tip #3: How to uncover the interdependencies of virtual machines
Running a backup on a “regular” physical server is much less complicated than backing up a Hyper-V host, for the simple reason that on a Hyper-V host you are dealing with several servers at a time, all of which share the storage and other resources offered by the host.
One of these shared resources, apart from the host’s hard disk space and RAM, is VSS (Volume Shadow Copy Service). This service is the heart of live backup. Whenever a live backup has to be performed, VSS is integrated into the backup process.
VSS, in turn, contacts all VMs. Via the Hyper-V Integration Services, each VM and each service inside that VM receives a signal to prepare for backup. When several VMs are backed up simultaneously, for example, when multiple-VM consistency is required, it’s not uncommon for VSS to contact a dozen different services, such as Microsoft Exchange, SQL Server, Oracle databases, IIS Servers, and so on. Obviously this puts a lot of stress on the host server and uses a tremendous amount of resources.
Unfortunately this type of stress is also error prone. When errors occur, some users intuitively blame the messenger: the backup software. However, the real underlying reason for almost all backup errors has to do with the host and the VM’s configuration. Each VM has a way of vetoing live backup, and each service inside that VM as well; hence, the number of potential causes rises quickly into the hundreds.
Upon inspection of all relevant Windows Event Viewer logs, one can witness the complexity of VSS and how all services integrate. Managing Hyper-V and backing it up is quite a different world from managing physical machines as Hyper-V interconnects many system components between several servers.
The IT admin’s “second home” is hence the Windows Event Viewer, where most of the issues usually show up with supporting details.
Tip #4: Fine-tuning VSS (Volume Shadow Copy Service)
Microsoft’s default values for VSS are, unfortunately, in need of some tuning; otherwise, backups are prone to fail. Microsoft’s out-of-the-box settings also cause the server to run slower due to unnecessary system shadows accumulating in the system.
Using VSSUIRUN.EXE or vssadmin you can check on each drive’s storage area limits. See http://backupchain.com/Cannot-Find-anymore-diff-area-candidates-for-volume-Volume-shadow-copy-service-error-troubleshooting.html
It’s recommended to use No Limit or a significantly high number to buffer all write operations during lengthy backups. If disk space is tight, you will need to buy a new disk array or move VMs off to free up space, as it is extremely important to have enough free disk space available for VSS.
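For those who prefer the command line, the vssadmin calls for checking and raising the storage area limit can be wrapped as in the Python sketch below. The helper names are invented for this example. The sketch assumes the shadow storage stays on the same volume it protects, and that UNBOUNDED is the command-line counterpart of the No Limit setting. The commands need an elevated prompt on the Windows host.

```python
import subprocess


def list_shadow_storage_command() -> list[str]:
    """Command that prints the current storage area limits for all volumes."""
    return ["vssadmin", "list", "shadowstorage"]


def resize_shadow_storage_command(volume: str, max_size: str = "UNBOUNDED") -> list[str]:
    """Command that changes the storage area limit for one volume.

    max_size may be UNBOUNDED or a size such as "50GB" or "20%".
    Assumption: the storage area lives on the same volume it protects.
    """
    return [
        "vssadmin", "resize", "shadowstorage",
        f"/For={volume}", f"/On={volume}", f"/MaxSize={max_size}",
    ]


def run(command: list[str]) -> str:
    """Run a vssadmin command on the host and return its text output."""
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return completed.stdout
```

For example, run(resize_shadow_storage_command("D:")) would remove the limit on drive D:, provided the call is made on the host with administrator rights.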
Tip #5: Keep enough free disk space and RAM on host and inside VMs
Having plenty of free disk space is crucial for good NTFS and VSS performance.
A disk with less than 10% free space left is likely to run 50% slower when data is added or updated frequently.
Apart from speed issues, VSS requires a lot of temp space, too. When backups run for a long time, disk write accesses (changed disk blocks) need to be buffered in a dedicated area on disk. That area is known as “VSS Storage Area” and keeps growing until the backup finishes; hence, the longer the backup takes and the more data is being updated during backups, the more space is needed for VSS to work properly.
When backups start up, all services integrated into VSS receive a signal and prepare for backup. These services are also likely to consume considerable amounts of resources to get everything prepared. For that reason you would want to err on the side of caution and keep at least 15% of the disk, and no less than 10 GB, free.
In addition, the time for all VSS aware services to prepare is fixed and very limited. A fragmented disk or almost full disk will delay the process unnecessarily, resulting in the famous “VSS flush timeout” error.
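The free-space rule of thumb above is easy to check automatically before a backup starts. The following Python sketch reads the rule as two conditions that must both hold (15% free and 10 GB free), which is an interpretation; the function names are made up for this example.

```python
import shutil

MIN_FREE_FRACTION = 0.15         # at least 15% of the volume free
MIN_FREE_BYTES = 10 * 1024 ** 3  # and at least 10 GB free


def has_enough_free_space(total_bytes: int, free_bytes: int) -> bool:
    """Apply both thresholds; the volume passes only if it meets both."""
    if total_bytes <= 0:
        return False
    enough_absolute = free_bytes >= MIN_FREE_BYTES
    enough_relative = free_bytes / total_bytes >= MIN_FREE_FRACTION
    return enough_absolute and enough_relative


def check_volume(path: str) -> bool:
    """Check the volume that contains the given path, e.g. "C:\\" or "/"."""
    usage = shutil.disk_usage(path)
    return has_enough_free_space(usage.total, usage.free)
```

The same check can be run on the host volumes and inside each VM, since both sides need room for VSS to work.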
You would also want to be aware of the phenomenon of fragmented RAM. For example, a VM is turned off and you try to start it up again but it won’t start because not enough contiguous RAM is available. The server may have X GB RAM available and the VM should fit in it, but if it isn’t free RAM all in one piece, it won’t suffice. This is something that often occurs when VSS takes a VM into Saved State for a brief moment and then powers the VM up again. The power up event fails, however, when there isn’t enough contiguous space available.
Another recommendation: the system paging file should be fixed-size and ideally located on a dedicated disk. Size it to be 3x the physical RAM size of your server.
Tip #6: Avoid dynamically growing / expanding virtual disks
Dynamically expanding disks are a huge no-no. Yes, they are convenient but even Microsoft recommends you don’t use them on a production system.
The reason for this is they create heavy file fragmentation, which results in unnecessary jumps from sector to sector. In addition, the VM host needs to do additional work to determine where each block is located.
Regardless of whether Hyper-V or VMware is being used, the problem is in the “physics” of how data is stored on a mechanical disk. When you see performance plummeting after adding just a couple of VMs, you may be wondering what the issue is: it’s most likely expanding disks.
When backups are triggered, new blocks are written inside the VMs, too, and all dynamic VHDs will need to grow simultaneously. Hence, using these kinds of disk layouts will significantly delay backups and potentially break VSS if there’s too much disk activity.
Tip #7: Back up just often enough
If you ask a manager how often backups should run, the answer is frequently: as often as possible; or better: one after the other for 24 hours.
Do we really want backups to run as often as possible? VSS is the only way to get a “proper” application and crash consistent backup in Windows and Hyper-V. When backups are started, there is an enormous spike of activity going on as each VSS aware service is asked to prepare its data structures for backup. All this activity consumes CPU resources and hard drives are running at max speed as all disk caches are flushed out.
As a backup company that has been around for over a decade, we often receive reports of hard drives dying during backups. That isn’t a coincidence. Backups are quite heavy on the disk drive. SSD drives have an additional issue: they may be fast but each cell can only be written a certain number of times before wearing out. Hence, apart from overheating, wear and tear is real, causes data loss, and is expensive.
Ideally you wouldn’t want backups to run too often; only as often as necessary. Backups consume server time and resources. Backups also need storage space and use up labor (supervising and managing backups, for example). Assuming total backup space remains constant, backing up “too often” results in shorter time frames being held in the backup history. In business terms this means you may not be able to go back in time far enough to catch accidental deletions and other data loss issues, unless considerably more backup space is available.
It may be better to use a hybrid storage plan: full backups once a month with a year retention, and daily incremental backups with just a week or two weeks retention, for example.
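The hybrid plan above boils down to a simple retention rule per backup type. Here is a minimal Python sketch of that rule, with the retention periods from the example (one year for fulls, two weeks for incrementals) as constants. The names are hypothetical. The sketch also deliberately ignores chain dependencies: a real tool has to make sure an incremental is never left without the earlier backups it builds on.

```python
from datetime import date, timedelta

FULL_RETENTION = timedelta(days=365)        # monthly full backups, kept for a year
INCREMENTAL_RETENTION = timedelta(days=14)  # daily incrementals, kept for two weeks


def should_keep(kind: str, taken_on: date, today: date) -> bool:
    """Decide whether a backup is still inside its retention window."""
    age = today - taken_on
    if kind == "full":
        return age <= FULL_RETENTION
    if kind == "incremental":
        return age <= INCREMENTAL_RETENTION
    raise ValueError(f"unknown backup kind: {kind!r}")
```

With these numbers the store holds roughly 12 full backups plus 14 incrementals at any time, instead of 365 daily backups for the same year of history.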
Tip #8: Check Disk Warnings and take them seriously
In case you have never come across a Disk Warning before in Windows: it almost always means a bad disk. The system is actually doing you a favor letting you know in advance that the disk is bad. Unfortunately, the guys at Microsoft chose the word “warning” when “error” would have been more appropriate, probably because there’s a slim chance that you are lucky and it’s “just” a driver issue and not really a bad disk.
Please do not overlook disk warnings: they are almost certainly caused by bad disks. You need to copy your data and replace the disk immediately.
Tip #9: Keep Hyper-V Integration Services up-to-date
Hyper-V Integration Services control all kinds of things. The proper communication between host and VM requires up-to-date integration services. In addition, critical driver software is also contained in integration service components.
When it comes to live virtual machine backups, Hyper-V Integration Services are of paramount importance because the VSS backup signal is passed from the host into the VM via integration service components. Without this working flawlessly, Hyper-V can’t perform live backups at all. Backups are then run in Saved State mode, meaning the VMs are saved, powered off, and come back online after the VSS shadow has completed.
Unfortunately Microsoft hasn’t found a convenient way to fix this issue. Each time you update Windows on the Hyper-V host, Microsoft may or may not update the Hyper-V Integration Services on the host side only. You then need to mount the vmguest.iso into each VM to check if an update is required. On Windows Server 2012 you can see if the integration services are up-to-date in the Hyper-V Manager. Yet, you need to check manually and update manually. In addition, these updates require the VM to be rebooted. It’s no surprise then that many IT admins push this off and later forget about it.
We do have a script (requires Windows Server 2012 or later) to help you out with this one; it checks each VM’s integration service version.
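The script itself is not reproduced in this text. As a stand-in, the following Python sketch shows only the comparison step: given the version the host expects and the version each VM reports, it lists the VMs that need attention. How the versions are collected is left out; on Windows Server 2012 the Hyper-V PowerShell module's Get-VM output is assumed to include an IntegrationServicesVersion value per VM. All names and the sample versions are illustrative.

```python
from typing import Optional


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as "6.3.9600.16384" into a comparable tuple."""
    return tuple(int(part) for part in text.strip().split("."))


def vms_needing_update(expected: str, installed: dict[str, Optional[str]]) -> list[str]:
    """Return the names of VMs whose integration services look outdated.

    A VM that reports no version at all (None or empty) is flagged too,
    because it has to be checked by hand.
    """
    wanted = parse_version(expected)
    flagged = []
    for name, version in installed.items():
        if not version or parse_version(version) < wanted:
            flagged.append(name)
    return sorted(flagged)
```

A VM flagged here still needs the manual update and reboot described above; the sketch only saves the step of checking each VM one by one.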
Tip #10: Keep enough disk space available in your backup target
BackupChain and other pro tools don’t overwrite files in the backup target; instead, files are added and old ones are cleaned up later. This strategy ensures you always have a good backup in the backup store.
The side effect of this strategy is that it requires some slack space. You need to keep at least 25% of the backup space free in the backup media to ensure there is enough room to buffer fluctuations. Having plenty of space free also speeds up backups considerably as NTFS works more efficiently when large chunks of space are free.
Tip #11: Buy more disk space than you think you need
Trying to squeeze a lot of data into a small container is usually a bad idea because it takes a lot of work and trial-and-error to find better ways to pack data. Considering the amazingly low cost of storage space, especially hard drives, it makes economic sense to simply add more storage instead.
A lot of backups fail because VMs and data files grow unexpectedly. It’s good practice to allocate 3 to 5 times the backup space you initially think you need.
Disk space should be plentiful in Windows, too, for the reasons stated in previous sections.
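As a quick worked example of the sizing advice in Tips #10 and #11, the Python sketch below turns the two rules of thumb into numbers: 3 to 5 times the initially estimated backup space, and at most 75% of the backup media in use so that 25% stays free. The function names are made up, and combining the two rules this way is only one possible reading.

```python
def backup_capacity_range(initial_estimate_gb: float) -> tuple[float, float]:
    """Return the low and high end of the 3x to 5x sizing rule, in GB."""
    return 3 * initial_estimate_gb, 5 * initial_estimate_gb


def usable_before_slack_limit(capacity_gb: float, slack_fraction: float = 0.25) -> float:
    """Return how much of the backup media can be used while keeping the slack free."""
    return capacity_gb * (1 - slack_fraction)
```

For an initial estimate of 500 GB, the first function suggests buying between 1,500 GB and 2,500 GB; on a 2,000 GB target, the second one says backups should not grow beyond 1,500 GB.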
Tip #12: Keep your software up-to-date
Backup tools depend on detailed system APIs and work very closely integrated with Windows. As a result, when Microsoft makes breaking changes or adds new features and APIs, backup tools need to adapt to the new infrastructure.
Especially in the case of Hyper-V, Microsoft has been actively developing how virtual machines are managed and backed up and continues to do so. Some industry analysts claim that Hyper-V has become one of the biggest selling points for Windows Server altogether.
An outdated backup tool isn’t capable of detecting new APIs and extensions to Hyper-V; hence, keeping it up-to-date is critical for good backups.
Tip #13: Why you need to install Windows Updates in VMs and on Hyper-V host often
Not installing Windows Updates doesn’t only increase the vulnerability of the server for malware attacks, it’s also bad practice because Microsoft continuously fixes its own bugs and interoperability issues.
In case you are not aware of it, an operating system these days ships with over 100,000 known bugs! Most of those bugs are reported by customers and fixed after release in subsequent Windows Updates; hence, if you don’t update Windows regularly, you’re running a buggy OS that could have been fixed a long time ago…
If you don’t want to be a guinea pig for Microsoft, skip the cutting edge and wait a couple of years before upgrading to a new OS, like most corporations do. Procrastination does indeed pay off sometimes.
Hyper-V is a hot topic for Microsoft. Microsoft adds lots of extensions and bug fixes to Hyper-V on a regular basis. Many of those bug fixes address the Hyper-V Integration Services and require those to be added, too. For that reason, Windows Update needs to be run inside the VMs as well. Furthermore, Hyper-V Integration Services need to be updated inside each VM each time you update Windows on the host, see Tip #9.
And all of the above will require reboots! We all know how much we hate them. But the seasoned IT admin knows Microsoft. And Microsoft requires reboots for pretty much everything, sometimes even after checking an email. But let’s not be sad, each reboot is a great opportunity for a new beginning!
Tip #14: Test for RAM and disk errors periodically
RAM errors and disk errors are facts of life. It’s only a matter of time until a disk fails or a bit of RAM freezes and corrupts your valuable data at random places. Every experienced IT admin knows well that RAM and disk errors are amongst the worst nightmares to deal with. They are indeed like a cancerous tumor that goes unnoticed for a long time, and by the time it is noticed, lots of damage has already been done.
RAM and disk checks should be done when a new server is bought and at regular intervals thereafter. It is amazing how “malicious” RAM and disk failures can be, hiding out in the system for months if not years, going unnoticed and corrupting various files and databases before they are finally noticed by chance.
Think about it: a file, say a Word document or a VM disk, is loaded into a broken RAM cell where it gets corrupted. It is then saved back to disk. Voilà, the file is now damaged. The $1,000 entry in Excel turned into $2,000. If you are lucky, the application will crash or complain the next time the file is opened. If not, the error again goes unnoticed for some time; the books now show an imaginary extra $1,000 and no one knows where it came from. And, to make matters worse, the damaged file is now being backed up with the damage in it as a new file version, which will one day replace the original “good” version…
Because people trust technology almost blindly, they are more likely to put blame on the person entering the info than to suspect a hardware fault. After all, computers are infallible, or aren’t they?
While it takes a night of downtime, it’s definitely a good idea to schedule these tests to happen at least twice a year for peace of mind. Unfortunately, as RAM and disk densities go up, so do error rates and probabilities.
Tip #15: Choosing the right compression mode
Choosing a compression rate that’s too low wastes disk space but keeps processing speeds high. The idea here is that switching to the next higher compression rate may give you better storage usage at a minimal increase in CPU time.
For example, making a 1 to 1 copy of a large file (hence no compression is involved) may take longer than storing it compressed, if access to the storage media is slower than access to the source, which is the case when network backups are taken.
At the other side of the spectrum, choosing a compression rate that is too high will burn a lot of CPU time and provide very limited gains.
How do we choose the best compression mode? Since it depends on the data and hardware you have, it’s a matter of trial-and-succeed and requires experimentation.
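Such an experiment can be as simple as compressing a representative sample of your data at a few levels and comparing size against time. The Python sketch below does this with zlib, which merely stands in for whatever compression modes your backup tool offers; the function name and the chosen levels are assumptions made for this example.

```python
import time
import zlib


def compare_levels(sample: bytes, levels: tuple[int, ...] = (1, 6, 9)) -> list[dict]:
    """Compress the same sample at several levels and record ratio and time."""
    if not sample:
        raise ValueError("sample must not be empty")
    results = []
    for level in levels:
        started = time.perf_counter()
        compressed = zlib.compress(sample, level)
        elapsed = time.perf_counter() - started
        results.append({
            "level": level,
            "ratio": len(compressed) / len(sample),  # lower means better compression
            "seconds": elapsed,
        })
    return results
```

If the ratio barely improves from one level to the next while the time keeps climbing, the lower level is usually the better choice for that data. The sample should come from your real VM disks, since synthetic data compresses very differently.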
Tip #16: Run the right number of VMs from the same physical disk
Running too many VMs off the same disk or partition isn’t a good idea. How much is too much depends on the quality of your hardware and the activity level of all VMs.
First, VSS works on the partition level; hence, even if you have a huge disk, cutting it into separate partitions can reduce VSS overhead, as backups will focus on just one partition at a time (provided you back up one VM at a time, of course).
But during normal operation, splitting one physical disk into multiple partitions isn’t as good as having several physical disks or disk arrays and spreading the I/O load evenly.
Having several disk arrays is better than one huge array, even if it’s a good RAID stripe system. Separate disk arrays spread the load in a determined way and if one array crashes, you still have the other ones working (hopefully).
However, having more physical disks increases error probabilities: more heat, more vibration, more power being drawn from the power source, more air movement required, and each device adds its own failure rate simply by being a separate device.
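The effect of the device count on failure probability can be put into a formula: if each disk fails within a given period with probability p, and failures are independent, the chance that at least one of n disks fails is 1 - (1 - p)^n. The Python sketch below computes this. The independence assumption is a simplification, since shared heat, vibration, and power tend to make things worse, and the 2% figure used in the note is purely illustrative.

```python
def any_disk_fails(per_disk_probability: float, disk_count: int) -> float:
    """Probability that at least one of several independent disks fails."""
    if not 0.0 <= per_disk_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if disk_count < 0:
        raise ValueError("disk_count must not be negative")
    return 1.0 - (1.0 - per_disk_probability) ** disk_count
```

With a hypothetical 2% failure chance per disk per year, one disk gives 2%, while eight disks give roughly 15% that at least one of them fails in that year.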
Tip #17: Plan for backups when purchasing infrastructure
Backups place a significant load on server CPU, hard drives, and network interfaces. In addition, backups require space and sometimes temporary space as well.
All these factors need to be considered when purchasing servers and storage.
Ideally you would want to allocate CPUs and CPU cores to specific VMs permanently and leave a certain number of cores free for the host system and backup processes.
Tip #18: Avoid NTFS Compression, Encryption, and Bitlocker
Apart from the fact that Windows still contains bugs in its NTFS compression implementation, compression and encryption are handled much more efficiently by the backup application. For those reasons it is not recommended to use NTFS compression or encryption on backup target folders.
Under no circumstances would it be advisable to use a compressed or encrypted folder as the source folder for your virtual machines. Doing so would sacrifice most of the expensive hardware resources you paid for. By using a compressed and/or encrypted VHD file, you probably burn over 70% of available resources. In other words, the host server could probably host more than twice as many VMs if they are hosted on regular disks.
BackupChain — Backup Software for Windows Server and Much More
BackupChain is a Windows backup tool offering many features beyond Hyper-V backup: cloud backup, live backup and virtual machine backup for various hypervisors, FTP backup, SQL backup, granular backup for virtual machine image backups and much more.
By default, BackupChain backup software uses a version backup mechanism to create incremental backups in all kinds of infrastructure protection scenarios: Exchange Server backup, online backup, or typical file server backup.
Apart from Hyper-V backup, BackupChain also supports VirtualBox backup and VMware backup for VMware Workstation, Player, and Server.
Download BackupChain today and make it your favorite Windows Server 2016 backup software.