What impact does the use of snapshots during backup have on Hyper-V VM performance?

melissa@backupchain · 10-15-2019, 08:26 AM

When you create a backup of a Hyper-V VM, using snapshots can noticeably change how that virtual machine performs. You might think it's a simple process, but there are several things to consider regarding how snapshots affect performance.

Snapshots, or checkpoints as Microsoft calls them, allow you to capture the state of a virtual machine at a specific point in time. This feature is great for quick backups and restoring VMs to a known good state. However, as with most things in IT, it’s not just rainbows and butterflies. There are performance overheads associated with using snapshots, and it's essential to understand what those are before you start leaning on them.

When you take a standard snapshot of a VM, it's like taking a picture of the VM's current state, including its memory, processor state, and disks. Once the snapshot is created, all new writes go to a differencing disk (an .avhdx file) instead of the original virtual hard disk (VHD). This is where performance can begin to take a hit. The more snapshots you stack up, the longer the chain of differencing disks Hyper-V has to manage: writes land in the newest differencing disk, which must grow to hold them, and reads of data that hasn't changed since a snapshot must be resolved through each layer back to the parent disk. As a result, I/O operations need more processing to manage these differencing disks.
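To make the read/write asymmetry concrete, here's a toy Python model of a snapshot chain. It's a sketch under simplifying assumptions, not the real VHDX/AVHDX on-disk format: each layer is just a dict mapping block numbers to data, and the "probes" count stands in for lookup work, not measured latency. The class and method names are made up for illustration.

```python
from typing import Dict, List, Optional, Tuple


class DiskChain:
    """Toy model: a base disk plus a stack of differencing disks, newest last.

    Illustrative only; real VHDX/AVHDX files use block allocation tables,
    bitmaps, and caching that this model ignores.
    """

    def __init__(self, base_blocks: Dict[int, bytes]) -> None:
        # layers[0] is the parent (base) disk; later entries are differencing disks.
        self.layers: List[Dict[int, bytes]] = [dict(base_blocks)]

    def checkpoint(self) -> None:
        # Taking a snapshot freezes the current top layer and starts a new, empty one.
        self.layers.append({})

    def write(self, block: int, data: bytes) -> None:
        # Writes always land in the newest (leaf) layer; parents are never modified.
        self.layers[-1][block] = data

    def read(self, block: int) -> Tuple[Optional[bytes], int]:
        # Reads search from the leaf back toward the base; the number of layers
        # probed is a rough stand-in for the extra lookup work per read.
        probes = 0
        for layer in reversed(self.layers):
            probes += 1
            if block in layer:
                return layer[block], probes
        return None, probes


disk = DiskChain({0: b"boot", 1: b"data"})
disk.checkpoint()
disk.write(1, b"data-v2")
disk.checkpoint()
disk.checkpoint()
print(disk.read(1))  # (b'data-v2', 3): found in the first differencing disk
print(disk.read(0))  # (b'boot', 4): unchanged block falls through to the base
```

Writes only ever touch the newest layer, but a block that hasn't been rewritten since the first snapshot has to be looked up through every layer, so the lookup work grows with the depth of the chain.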

This performance hit often goes unnoticed until it's too late. Imagine you have a production VM running mission-critical applications. You decide to create a snapshot for backup purposes, thinking it's just a simple precaution. As soon as you do, disk I/O in the VM might slow down noticeably, especially if you have multiple snapshots piled up. The overhead grows with the depth of the chain: every read of a block that hasn't been rewritten has to be resolved through the chain of differencing disks, and new writes have to allocate space in the newest one, adding latency to I/O operations. You might notice applications lagging or, even worse, timing out under heavy load.

A real-life example comes to mind from a project at my last job. A colleague took a snapshot of a VM running a database server as a quick backup, without much thought about the potential repercussions. Over a couple of weeks, that snapshot multiplied like rabbits: more snapshots piled on top of it, each adding another differencing disk, and every change to the database inflated the newest one, making each I/O operation more cumbersome. Eventually, users started experiencing delays in data retrieval, which were traced back to the snapshots. Resolving it was painful: the snapshots had to be consolidated into a single VHD, and downtime was incurred during the process.

With a tool like BackupChain, the impact of running too many snapshots might not be felt immediately, since features designed for efficient snapshot management help mitigate the performance hit. Snapshot management becomes a lot easier and some overhead can be minimized, but you still have to be cautious about how many snapshots you keep and for how long. It's crucial to think about your backup strategy holistically rather than focusing only on the perils of snapshot management.

The cumulative performance impact can indeed be significant, but there's more to it. When you're running I/O-heavy workloads, such as SQL servers or multi-user applications, the latency introduced by snapshots can push performance beyond acceptable limits. It's not uncommon to see CPU usage rise as the host works through all the additional I/O these snapshots create. You might find yourself wondering whether you're running a VM or a sluggish old PC.

One easily controllable factor that many admins overlook is how often they create snapshots. You might think frequent snapshots provide an extra layer of protection, and they do, but the cumulative effect becomes detrimental if they're poorly managed. Each snapshot adds complexity and overhead, especially under high load. If snapshots are part of your backup routine, make sure the routine is well thought out and strictly limits how many exist at the same time.
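One way to keep that in check is a simple retention rule: cap how many snapshots each VM may hold and how old they may get, and flag everything else for merging. The Python below is a hypothetical sketch; the Checkpoint record, the function name, and the limits of two per VM and 24 hours are my own assumptions, not anything Hyper-V defines. In practice the list would come from Hyper-V itself (for example via the Get-VMSnapshot PowerShell cmdlet), and removing a snapshot with Remove-VMSnapshot is what triggers the merge of its differencing disk.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class Checkpoint:
    # Hypothetical record; field names are illustrative, not a Hyper-V API.
    vm: str
    name: str
    created: datetime


def checkpoints_to_merge(
    checkpoints: List[Checkpoint],
    now: datetime,
    max_per_vm: int = 2,
    max_age: timedelta = timedelta(hours=24),
) -> List[Checkpoint]:
    """Return snapshots that exceed the per-VM count limit or the age limit."""
    by_vm: Dict[str, List[Checkpoint]] = {}
    for cp in checkpoints:
        by_vm.setdefault(cp.vm, []).append(cp)

    selected: List[Checkpoint] = []
    for vm_cps in by_vm.values():
        # Newest first, so the snapshots we keep are the most recent ones.
        vm_cps.sort(key=lambda cp: cp.created, reverse=True)
        for index, cp in enumerate(vm_cps):
            too_many = index >= max_per_vm
            too_old = now - cp.created > max_age
            if too_many or too_old:
                selected.append(cp)
    # Stable, readable order: per VM, oldest first.
    return sorted(selected, key=lambda cp: (cp.vm, cp.created))


now = datetime(2019, 10, 15, 8, 0)
cps = [
    Checkpoint("sql01", "nightly-1", now - timedelta(hours=50)),
    Checkpoint("sql01", "nightly-2", now - timedelta(hours=26)),
    Checkpoint("sql01", "nightly-3", now - timedelta(hours=2)),
    Checkpoint("web01", "pre-patch", now - timedelta(hours=3)),
]
for cp in checkpoints_to_merge(cps, now):
    print(cp.vm, cp.name)  # prints "sql01 nightly-1", then "sql01 nightly-2"
```

Here nightly-1 is flagged for exceeding the count limit and nightly-2 for being older than 24 hours, while the recent snapshots on both VMs are kept.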

Something to think about is how VMs share physical resources on the host machine. Each VM requires its share of CPU, memory, and, crucially, disk I/O. So if you've got multiple VMs with snapshots, they'll compete for those I/O resources, leading to even slower performance across the board. Have you ever had the mouse lag while running several VMs? That's often I/O contention kicking in, and snapshots can amplify it.

You may find that, especially during backup operations, throughput drops significantly if other VMs on the host are also creating their own snapshots. For instance, if I'm running three VMs and take a snapshot of each one, each VM's backup could take significantly longer because the backups are all competing for the same storage.
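A rough way to see why is to treat the host's storage as a fixed amount of throughput that concurrent backups split evenly. That's a big simplification (real storage behaves differently for random versus sequential I/O, and caching and queue depth matter), and the VM sizes and the 300 MB/s figure below are hypothetical, but it shows the shape of the problem. The function name is made up for illustration.

```python
from typing import Dict


def fair_share_finish_times(
    sizes_mb: Dict[str, float], throughput_mb_s: float
) -> Dict[str, float]:
    """Finish time in seconds for each backup when all start together and
    split the host's throughput equally among whichever are still running.

    Simplified model: storage throughput is assumed to be the only bottleneck.
    """
    remaining = dict(sizes_mb)
    finish: Dict[str, float] = {}
    clock = 0.0
    while remaining:
        share = throughput_mb_s / len(remaining)
        # The job with the least data left finishes next.
        name, left = min(remaining.items(), key=lambda kv: kv[1])
        elapsed = left / share
        clock += elapsed
        # Every running job progresses at the same per-job rate meanwhile.
        for other in remaining:
            remaining[other] -= share * elapsed
        del remaining[name]
        finish[name] = clock
    return finish


sizes = {"vm1": 300 * 1024, "vm2": 300 * 1024, "vm3": 300 * 1024}  # 300 GB each
alone = {vm: mb / 300 for vm, mb in sizes.items()}  # each backup run by itself
together = fair_share_finish_times(sizes, 300)      # all three at once
print(alone["vm1"], together["vm1"])  # 1024.0 3072.0
```

With three equal VMs, each backup takes three times as long when they run together, which also means each VM's snapshot, and the differencing disk collecting its writes, stays open three times as long.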

It's also crucial to look at the broader implications of using snapshots for backups. Take a step back and evaluate how long you keep these snapshots before merging them back into the VHD. Keeping snapshots around for days, let alone weeks, compounds performance issues, since I/O has to resolve these stacked layers of differencing disks. I've often noticed that a well-timed consolidation can work wonders in restoring performance, cutting down dramatically on latency spikes.
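Staying with the toy-model idea, the sketch below shows what consolidation buys you: merging folds every differencing layer into the base, with newer writes winning, so every block is found in the first layer checked. It's an illustration under simplified assumptions with hypothetical function names; in Hyper-V the real equivalent happens when a snapshot is deleted and its AVHDX is merged into the parent disk.

```python
from typing import Dict, List


def read_probes(layers: List[Dict[int, bytes]], block: int) -> int:
    """Count how many layers a read checks, searching newest to oldest."""
    for probes, layer in enumerate(reversed(layers), start=1):
        if block in layer:
            return probes
    return len(layers)


def merge_chain(layers: List[Dict[int, bytes]]) -> List[Dict[int, bytes]]:
    """Fold every differencing layer into the base; newer writes win."""
    merged: Dict[int, bytes] = {}
    for layer in layers:  # oldest (base) first, so later layers overwrite
        merged.update(layer)
    return [merged]


# Base disk plus four differencing layers, two of them still empty.
chain = [{0: b"a", 1: b"b"}, {1: b"b2"}, {2: b"c"}, {}, {}]
before = [read_probes(chain, blk) for blk in (0, 1, 2)]
after_chain = merge_chain(chain)
after = [read_probes(after_chain, blk) for blk in (0, 1, 2)]
print(before, after)  # [5, 4, 3] [1, 1, 1]
```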

An additional layer of complexity comes from dynamically expanding disks. If I'm working with dynamically expanding VHDs rather than fixed-size ones, the impact of snapshots can vary further. A dynamically expanding disk adds its own I/O overhead because it must allocate space as data is written, and that overhead becomes more pronounced with live snapshots. Simply put, the more dynamically expanding disks in the chain, the more complicated the storage path becomes, amplifying the negative effects of active snapshots.
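Here's a small sketch of where the allocate-on-write cost comes from. It counts how many writes land in a block that hasn't been allocated yet; in a real dynamically expanding disk, each of those means growing the file and updating its block allocation metadata. The 1 MiB block size, the offsets, and the function name are hypothetical. Worth keeping in mind: differencing disks are themselves dynamically expanding, so once a snapshot exists, new writes pay this cost even if the base disk is fixed-size.

```python
from typing import Iterable, Set

MIB = 1024 * 1024  # hypothetical block size for this illustration


def allocation_events(writes: Iterable[int], block_size: int, preallocated: bool) -> int:
    """Count first-touch block allocations for a sequence of written byte offsets.

    preallocated=True models a fixed-size disk (nothing left to allocate);
    preallocated=False models a dynamically expanding or differencing disk,
    where the first write into each block must allocate it.
    """
    if preallocated:
        return 0
    allocated: Set[int] = set()
    events = 0
    for offset in writes:
        block = offset // block_size
        if block not in allocated:
            allocated.add(block)  # first write into this block: allocate it
            events += 1
    return events


offsets = [0, 4096, 1_048_576, 1_052_672, 5_000_000]
print(
    allocation_events(offsets, MIB, preallocated=False),
    allocation_events(offsets, MIB, preallocated=True),
)  # 3 0
```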

Consider testing a snapshot management plan. It's invaluable to monitor performance metrics before, during, and after taking snapshots: track throughput, I/O wait times, and application response times. I always suggest using performance monitoring tools to measure the actual impact snapshots have. Over time you'll recognize patterns and can tweak your strategy accordingly.
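For the before/during/after comparison, even a few lines of Python over exported latency samples go a long way. This sketch assumes you've already collected per-phase latency samples in milliseconds (for instance from Performance Monitor's LogicalDisk "Avg. Disk sec/Transfer" counter, converted from seconds); the function names and sample values are hypothetical. It reports the percent change in median and 95th-percentile latency relative to the baseline.

```python
import statistics
from typing import Dict, List


def summarize(samples_ms: List[float]) -> Dict[str, float]:
    """Median and 95th-percentile latency for one monitoring phase."""
    # quantiles(n=20) returns 19 cut points; index 18 is the 95th percentile.
    cuts = statistics.quantiles(samples_ms, n=20, method="inclusive")
    return {"p50": statistics.median(samples_ms), "p95": cuts[18]}


def compare(baseline: List[float], phase: List[float]) -> Dict[str, float]:
    """Percent change of each statistic relative to the baseline phase."""
    base, other = summarize(baseline), summarize(phase)
    return {k: 100.0 * (other[k] - base[k]) / base[k] for k in base}


# Hypothetical samples (ms) from before and during a snapshot-based backup.
before = [1.8, 2.0, 2.1, 2.2, 2.4, 2.5, 2.6, 3.0, 3.1, 4.0]
during = [2.5, 2.9, 3.3, 3.6, 4.1, 4.4, 5.0, 6.2, 7.5, 9.8]
print(compare(before, during))
```

Watching p95 rather than only the average is deliberate: snapshot-related slowdowns tend to show up as latency spikes, which an average can hide.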

Don't forget to prepare for restoration, should that time come. You could have an ideal backup plan set up, but if restoration takes too long or hurts performance during peak periods, those backups become far less useful. If snapshots are your only form of backup and their presence complicates restores, you may find you've prioritized short-term gains over a more effective long-term solution.

Adopt a strategic approach to incorporating snapshots into your backup plan, giving thought to how, when, and why. While the ease of taking snapshots can be tempting, they come with an implicit cost. Knowing the potential performance impacts can save you from unforeseen headaches down the line. Focusing on balanced management will let you enjoy the benefits of Hyper-V snapshots while minimizing their downsides.