For most of IT, riding technology curves is like jumping on a kid’s trampoline: dangerous, sure, but oh-so exhilarating (note: ignore the trampoline manufacturer’s “weight limits” at your own peril).
Faster processors, larger disk capacities, flash storage and higher-performance networking enable new applications, drive server and storage virtualization and allow businesses to generate and analyze data 24×7. And all this puts enormous stress on the backup environment.
Bigger storage capacities don’t just make backup windows feel smaller; they actually make them smaller. Server virtualization squeezes the resources available for the backup. Just imagine trying to drink the Pacific Ocean through a straw (ignore the whole “salt water” thing and focus on the amount of water), and that’s the backup challenge created by the technology curves powering the rest of IT.
How will we meet backup windows in highly virtualized production environments or in the world of big data? Inevitably, there will be a discontinuity in how we run backups. Over the past 15 years, there have been two competing backup data flows. First is the dominant “backup client” approach: a backup client reads data from the server and sends it to an intermediary server, which writes the data to a storage device (tape, optical device, disk, etc.). To meet backup windows, backup clients now leverage incrementals, synthetic fulls and source-side deduplication. Second is the “client-free” approach: the primary data owner reads its own data and sends the stream directly to a storage device. This includes database and NDMP backups as well as versioned replicas.
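To make the contrast concrete, here is a minimal, purely illustrative sketch of the two data flows in Python. Every class and function name below is made up for this post; none of it is any vendor’s actual API.

```python
"""Toy sketch of the two backup data flows described above.
All names are hypothetical, invented for illustration only."""

class ProductionServer:
    """Stand-in for an application host with a backup agent installed."""
    def __init__(self, blocks):
        self.blocks = blocks
    def agent_read_all(self):
        # "Backup client" flow: the agent scans and reads every block itself.
        return list(self.blocks)

class DataOwner:
    """Stand-in for the system that owns the primary data
    (hypervisor, database, NAS filer) and can stream it directly."""
    def __init__(self, blocks):
        self.blocks = blocks
    def stream(self):
        # "Client-free" flow: the owner reads its own data and streams it out.
        yield from self.blocks

def backup_client_flow(server, media_server_buffer, storage):
    # Agent -> intermediary (media) server -> storage device.
    media_server_buffer.extend(server.agent_read_all())
    storage.extend(media_server_buffer)

def client_free_flow(owner, storage):
    # Owner -> storage device; no per-host agent in the data path.
    storage.extend(owner.stream())

if __name__ == "__main__":
    blocks = [f"block-{i}" for i in range(4)]
    tape, disk = [], []
    backup_client_flow(ProductionServer(blocks), [], tape)
    client_free_flow(DataOwner(blocks), disk)
    print(tape == disk == blocks)  # both flows land the same data; only the path differs
```

Both flows end up with the same bits on backup storage; the difference that matters is who does the reading and how many hops the data takes.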
Traditionally, “client-free” backups have lacked either the features or the optimizations of “backup client” approaches (e.g., catalogue, source-side deduplication and optimized recovery workflows). The reasons for the disparity are too complex to analyze in a simple blog post (it’s about money) and may never be fully understood (it’s also about controlling your data by putting it in a vendor-specific format), but it’s enough to say that the market works in funny ways (don’t forget about the desire to have a footprint on every server in your environment). But that’s all about to change.
The inexorable increase in data growth will drive the ascension of “client-free” backup flows. The “backup client” approach is reaching its apex with source-side deduplicated backups (e.g., via Avamar or Data Domain Boost) that are stored as deduped full backups on disk. What happens on a heavily loaded ESX server that lacks the CPU cycles and I/O bandwidth to scan for changed blocks? When can you pummel the mission-critical Oracle database to identify the changed data? Will the client scan for changed files on the NAS server ever complete?
The answer is simple: The backup application must depend on the data owner to tell it what data needs protection. After all, who better than the VMware hypervisor, the Oracle database, or the NAS server to efficiently identify the data that has changed since the last backup?
To scale with the environment, backup applications and the applications and systems that own the primary data must collaborate: data owners need to efficiently identify changed data, and backup applications need to turn that changed data into first-class backups. The partnership between primary data owners and backup applications has already begun. Some common examples (with a concept sketch after the list):
- VMware Changed Block Tracking (CBT) ─ With VMware tracking the changed blocks between backups, applications like Avamar dramatically increase their source-side deduplication performance.
- NAS incremental backups ─ Solutions like the Avamar NDMP Accelerator (for NetApp and Celerra/VNX) and CommVault-managed NetApp SnapVault transform rapidly identified changed data into full backups for long-term retention and rapid recovery.
- Oracle’s Incrementally Updated Backups and Block Change Tracking ─ Solutions like Data Domain and Avamar combine Oracle’s high-performance, low-impact incremental forever backups with deduplication to securely store multiple full backups for reliable, rapid recovery.
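The common pattern in all three examples is the same: the data owner hands the backup application a list of what changed, and the backup application folds those changes into a new, directly restorable full. Here is a minimal sketch of that handoff; the names are hypothetical and this is not VMware’s CBT API, NDMP, or Oracle block change tracking, just the shape of the idea.

```python
"""Minimal sketch of the changed-data handoff described above.
All functions and data structures are invented for illustration."""

def data_owner_changed_blocks(current_disk, last_backup_id, change_log):
    # The data owner (hypervisor, database, filer) already tracks what
    # changed since the last backup, so it can answer without a full scan.
    return {offset: current_disk[offset] for offset in change_log[last_backup_id]}

def backup_app_synthesize_full(previous_full, changed_blocks):
    # The backup application merges the deltas into the prior full copy,
    # producing a new restorable full ("incremental forever").
    new_full = dict(previous_full)
    new_full.update(changed_blocks)
    return new_full

if __name__ == "__main__":
    disk = {0: "a", 1: "b", 2: "c"}      # primary data, keyed by block offset
    full_0 = dict(disk)                   # initial full backup
    disk[1] = "B"                         # production writes happen...
    change_log = {"full_0": {1}}          # ...and the owner records which blocks changed
    deltas = data_owner_changed_blocks(disk, "full_0", change_log)
    full_1 = backup_app_synthesize_full(full_0, deltas)
    print(full_1 == disk)                 # True: a full backup built from only the deltas
```

The heavy lifting of finding the changed data moves to the system that already knows where it is, while the backup application still owns cataloguing, retention and recovery.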
Backup applications will continue to orchestrate both the backup and recovery processes, but the data owner becomes an equal stakeholder in optimizing both the backup and recovery workflows.
It seems simple enough: Over time, you won’t be able to meet your backup window with the current methods – even source-side deduplication. The applications and systems that own the primary data hold the keys to meeting the backup window, by identifying the new data to protect. Backup applications must collaborate with server virtualization vendors, primary applications and primary storage systems to deliver complete solutions around “client-free” backup workflows.
But, of course, there is nothing simple about alliances between large companies. There is nothing simple about one company ceding control and influence to another. And there is nothing simple about the ramifications to your IT team, its critical vendors, and your job responsibilities.
The technology curves driving server virtualization and expanding data sets do not lead to simple answers for the backup team. But they are implacable. Backup processes will change to cope with the curves because the existing solutions will ultimately stretch, rip and then fail completely (not unlike my son’s trampoline).
How are you scaling your backups to meet your ever-compressing backup windows today? Are you feeling the pain of Moore’s Backup Administrator Law? Have you adopted any of the collaborative solutions between primary data owners and backup applications? If so, what has been your experience?