By Mike Zolla, Director, Corporate Systems Engineering, EMC Backup Recovery Systems Integrations Lab
In my last post, I walked through what VMware’s Changed Block Tracking (CBT) enables from a VM protection standpoint and how – at a high level – EMC Avamar takes advantage of the API.
Now, let’s dig a little deeper and look at how backup software integrates with VMware’s vStorage APIs for Data Protection (VADP). Remember, it’s this API that enables backup vendors to leverage CBT, and the way EMC Avamar uses it is what differentiates it from competitive solutions.
The Challenge…
If you recall, CBT provides supporting backup software with the modified data that needs to be protected. Typical backup software has to take the modified data (i.e., the list of blocks) from the CBT log and figure out which files the blocks are part of.
For example, consider Exchange running within a VM. Though the application may change only one 4 KB block within the VMDK, the CBT log reports a 64 KB change for the VMDK file. The backup application has to scan all 64 KB even though only 4 KB was modified, because VMware marks the entire 64 KB block as changed. From the guest OS perspective, that 64 KB block may contain parts of many different files.
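To make the granularity effect concrete, here is a minimal Python sketch of how small guest writes round up to whole tracking units. The 64 KB unit comes from the example above; the function name `dirty_extents` and the (offset, length) representation are assumptions made for illustration and are not part of any VMware API.

```python
GRANULARITY = 64 * 1024  # tracking unit assumed from the Exchange example


def dirty_extents(writes, granularity=GRANULARITY):
    """Map guest writes, given as (offset, length) pairs in bytes, to merged
    (offset, length) extents aligned to the tracking granularity."""
    blocks = set()
    for offset, length in writes:
        if length <= 0:
            continue
        first = offset // granularity
        last = (offset + length - 1) // granularity
        blocks.update(range(first, last + 1))

    extents = []
    for block in sorted(blocks):
        start = block * granularity
        # Merge with the previous extent when the two are contiguous.
        if extents and extents[-1][0] + extents[-1][1] == start:
            extents[-1] = (extents[-1][0], extents[-1][1] + granularity)
        else:
            extents.append((start, granularity))
    return extents


# A single 4 KB write still dirties a full 64 KB unit.
one_write = dirty_extents([(409600, 4096)])
```

In this sketch, a 4 KB write that happens to straddle a 64 KB boundary would mark two units, so the reported change can be 32 times larger than the data actually written.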
This may not seem like an issue, but most backup applications need to store this data natively, as the VM sees it, so that they can provide file-level recovery from an image backup. Backup performance then depends on the number of files within the VM as well as the amount of change, which makes sizing a solution difficult.
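The translation step described above can be sketched roughly as follows. This is an illustration only: the `file_map` layout table and the `files_touching` function are hypothetical, and real products resolve blocks to files through the guest file system's own metadata rather than a simple dictionary.

```python
def files_touching(extent, file_map):
    """Return the names of files whose on-disk ranges overlap a dirty extent.

    extent   -- (offset, length) of a changed region, in bytes
    file_map -- hypothetical guest layout: {name: [(offset, length), ...]}
    """
    start, length = extent
    end = start + length
    hits = []
    # Every file is checked, so the work grows with the file count,
    # not just with the amount of changed data.
    for name, ranges in file_map.items():
        if any(off < end and start < off + ln for off, ln in ranges):
            hits.append(name)
    return sorted(hits)


layout = {
    "mailbox.edb": [(393216, 8192)],
    "log.txt": [(401408, 4096)],
    "other.bin": [(1048576, 4096)],
}
touched = files_touching((393216, 65536), layout)
```

Even with an index instead of a linear scan, the lookup cost is tied to how many files the VM holds, which is why sizing becomes hard to predict.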
The Solution…
What makes Avamar’s “block-based” architecture more efficient is its ability to identify only the unique blocks (e.g., the 4 KB Exchange block described above) from the CBT log, then compress and store them.
Because Avamar scans the CBT log on a block-by-block basis, it doesn’t have to do any translation, meaning that it doesn’t have to figure out which blocks belong with which files. As a result, backup performance scales linearly as the VMDK/change rate goes up – regardless of the number of files within a VM. Plus, users get the added benefit of file-level recovery from the image-based backup.
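As a rough illustration of the block-based idea (not Avamar’s actual implementation), the sketch below splits a changed extent into chunks, fingerprints each one, and compresses and stores only the chunks not seen before. The fixed 4 KB chunk size, the SHA-256 fingerprint, zlib compression, and the in-memory dictionary used as a store are all assumptions chosen to keep the example small.

```python
import hashlib
import zlib

CHUNK = 4 * 1024  # assumed chunk size, matching the 4 KB example


def backup_extent(data, store, chunk_size=CHUNK):
    """Store only previously unseen chunks of `data` in `store`.

    Returns (recipe, new_chunks): the ordered list of chunk fingerprints
    needed to rebuild the extent, and how many chunks were newly stored.
    """
    recipe = []
    new_chunks = 0
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            store[digest] = zlib.compress(chunk)
            new_chunks += 1
        recipe.append(digest)
    return recipe, new_chunks


def restore_extent(recipe, store):
    """Rebuild an extent from its recipe of chunk fingerprints."""
    return b"".join(zlib.decompress(store[d]) for d in recipe)


# First backup of a 64 KB extent made of 16 distinct 4 KB chunks.
store = {}
base = b"".join(bytes([i]) * CHUNK for i in range(16))
_, first_new = backup_extent(base, store)

# Change one 4 KB chunk; the whole 64 KB extent is reported as dirty again.
changed = base[:5 * CHUNK] + b"\xff" * CHUNK + base[6 * CHUNK:]
recipe, second_new = backup_extent(changed, store)
```

Nothing in this loop looks at files: the work depends only on how many bytes are scanned, which is the property the post attributes to the block-based approach.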