08-12-2018, 12:41 PM
Let's break down the Windows error message:
"The shadow copies of volume C: were deleted because the shadow copy storage could not grow in time. Consider reducing the IO load on the system or choose a shadow copy storage volume that is not being shadow copied."
Compressed error messages like the one above need to be read multiple times, with high concentration and lots of coffee...
When a backup is done using VSS, a shadow of the partition (for example, C:) is created.
The backup proceeds by reading from the shadow, not from the actual C: drive.
What is a shadow? A 'fake' but consistent view of the hard drive from the past. The hard drive is being backed up live, which means other applications are constantly changing files. But we want a consistent view, not a live view.
Remember what happens when you take a photo with very long shutter speed at night and there is motion? It will end up looking blurred.
What you want is a snapshot, so fast, that there is no motion and the pic is sharp. With backups we want a 100% exact representation of what the C: drive looked like at a certain time.
And this representation must be kept alive for as long as the backup takes. That representation of the drive's exact past content is a 'volume shadow' provided by VSS.
Now VSS has to keep 'faking' this past view of the drive's contents, and it's hard work when there are lots of applications writing to the hard drive, changing content. VSS has to buffer all the changed sectors, in order to preserve the past content of the changed sectors.
Hence, the more write activity occurs, the more buffering VSS has to do. VSS uses a hidden file to buffer these changed sectors, so it can potentially run out of space if that buffer file gets too big (as a result of too many sectors being changed on disk during the backup).
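To make the buffering idea concrete, here is a minimal toy sketch of copy-on-write in Python. It is not how VSS is implemented internally; the class and method names (Volume, diff_area, read_snapshot) are made up for illustration, and it works on list entries instead of real disk sectors.

```python
class Volume:
    """Toy volume: a list of blocks plus at most one copy-on-write snapshot."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        # Hypothetical stand-in for the hidden buffer file:
        # block index -> content the block had when the snapshot was taken.
        self.diff_area = None

    def create_snapshot(self):
        # Creating the snapshot copies nothing; it only starts tracking changes.
        self.diff_area = {}

    def write(self, index, data):
        # The first overwrite of a block after the snapshot rescues the old content.
        if self.diff_area is not None and index not in self.diff_area:
            self.diff_area[index] = self.blocks[index]
        self.blocks[index] = data

    def read_snapshot(self, index):
        # The backup reads through here: rescued content if the block changed,
        # otherwise the live block, which is still identical to the past.
        if self.diff_area is None:
            raise RuntimeError("no snapshot exists")
        return self.diff_area.get(index, self.blocks[index])


vol = Volume(["a", "b", "c"])
vol.create_snapshot()
vol.write(1, "B")
vol.write(1, "BB")
print(vol.read_snapshot(1), vol.blocks[1], len(vol.diff_area))
```

Note that the buffer only grows when a block is changed for the first time after the snapshot, so its size depends on how many distinct blocks get written during the backup.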
But the problem here is not space.
The error says "storage could not grow in time", and that means TIME is the issue. When these buffering activities take too long because there are too many of them, caused by "too much" writing to disk, VSS would potentially have to choke the entire system to catch up and keep the shadow alive. Because of a hard, fixed timeout inside VSS, it instead decides to abandon ship and delete the shadow. Naturally, the backup then fails because the device it reads from suddenly disappears.
So to recap, this is a situation where "too much" writing to disk occurs, more than the hardware can handle in a reasonable time. "Reasonable" is simply a timeout inside VSS that can't be changed.
What causes "too much" writing to disk? The message actually tells you what to look into: "Consider reducing the IO load on the system". This means there are services/applications that are hammering the disk with block changes. Remember every changed block has to be 'rescued' by VSS in a buffer file, so each written sector will end up hitting the disk twice.
Now factor in that this is a mechanical disk. So there are sector seek times of 10-50 ms, potentially for most affected sectors.
Factor in that the drive may be heavily fragmented. Fragmentation requires the heads to move around much more, and this increases seek time.
Factor in a potentially unfortunate setup: someone took a desktop drive and loaded it up with 10 VMs that are likely disk-heavy. A desktop drive is not a suitable medium for VM activity, which is very heavy. 10 VMs is the load of 10 entire servers! Putting that on a cheap desktop drive for production purposes is, well, let's call it 'uninformed'...
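To put rough numbers on the double write and the seek times mentioned above, here is a back-of-the-envelope sketch. It assumes the worst case where every disk operation pays a full seek and nothing is cached or merged, which a real disk and OS will usually beat; the function name is made up for illustration.

```python
def max_app_writes_per_second(seek_ms, ios_per_write=2):
    """Crude upper bound on application writes per second for one mechanical disk.

    seek_ms:       assumed cost of one disk operation in milliseconds
    ios_per_write: disk operations caused by one application write
                   (2 = the write itself plus rescuing the old sector)
    """
    disk_ops_per_second = 1000 / seek_ms
    return disk_ops_per_second / ios_per_write


for seek_ms in (10, 50):
    print(seek_ms, "ms seek:", max_app_writes_per_second(seek_ms), "writes/s")
```

Under these assumptions the disk manages somewhere between 10 and 50 scattered writes per second while the shadow is alive, and all 10 VMs have to share that.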
This brings us to another important guideline:
When you design and build servers (or order them custom-built if you're allergic to electronics), you absolutely must isolate the load to separate RAID arrays (or at least disks).
You could build a huge RAID array for your VMs or DBs with, say, 64TB of space, all in one chunk. Problem: VSS right now won't work with more than 16TB (which is a problem, as the biggest HDD is already 14TB today).
Second problem: even if it did support 64TB, VSS works at the partition level.
If you put 64TB all in one partition, then VSS has to buffer all changes of the entire 64TB partition for the duration of a backup, even if you just want to back up a small 100GB VM....
You see that would also be unfortunate....but you will find it in too many companies...
At the very least, you could split the 64TB into smaller partitions, still using the same RAID array underneath; that's OK. Then strategically place the VMs so that their combined write load is distributed across the partitions rather than concentrated in one.
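One simple way to do that placement is a greedy pass: take the VMs from heaviest to lightest writer and always put the next one on the partition with the least write load so far. The sketch below assumes you already have a rough write-load figure per VM (for example from performance counters); the VM names and numbers are invented for illustration.

```python
import heapq


def place_vms(vm_write_load, partition_count):
    """Greedily spread VMs over partitions so the write load is roughly even.

    vm_write_load:   dict of VM name -> measured write load (any consistent unit)
    partition_count: number of partitions to spread the VMs over
    Returns a dict of partition index -> list of VM names.
    """
    # Min-heap of (total write load so far, partition index).
    heap = [(0.0, p) for p in range(partition_count)]
    heapq.heapify(heap)
    placement = {p: [] for p in range(partition_count)}

    # Heaviest writers first, each onto the currently least-loaded partition.
    for name, load in sorted(vm_write_load.items(), key=lambda kv: kv[1], reverse=True):
        total, p = heapq.heappop(heap)
        placement[p].append(name)
        heapq.heappush(heap, (total + load, p))
    return placement


loads = {"sql": 80, "mail": 60, "web": 30, "dc": 10}  # hypothetical figures
print(place_vms(loads, 2))
```

This is a heuristic, not an optimal packing, but it keeps the two heaviest writers off the same partition, which is the main thing the shadow copy cares about.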
Even better for performance and backups: use completely separate RAID arrays.