For most of IT, riding technology curves is like jumping on a kid’s trampoline: dangerous, sure, but oh-so exhilarating (note: ignore trampoline manufacturer’s “weight limits” at your own peril).
Faster processors, larger disk capacities, flash storage and higher-performance networking enable new applications, drive server and storage virtualization and allow businesses to generate and analyze data 24×7. And all this puts enormous stress on the backup environment.
Bigger storage capacities don’t just make backup windows feel smaller; they actually make them smaller. Server virtualization squeezes the resources available for the backup. Just imagine trying to drink the Pacific Ocean through a straw (ignore the whole “salt water” thing and focus on the amount of water), and that’s the backup challenge created by the technology curves powering the rest of IT.
How will we meet backup windows in highly virtualized production environments or in the world of big data? Inevitably, there will be a discontinuity in how we run backups. Over the past 15 years, there have been two competing backup data flows. First is the dominant “backup client” approach: A backup client reads data from the server and sends it to an intermediary server, which writes the data to a storage device (tape, optical device, disk, etc.). To meet backup windows, backup clients now leverage incrementals, synthetic fulls and source-side deduplication. Second is the “client-free” approach: The primary data owner reads the data and sends the stream to a storage device. This includes database and NDMP backups as well as versioned replicas.
Traditionally, “client-free” backups have lacked either the features or the optimizations of “backup client” approaches (e.g., catalogue, source-side deduplication and optimized recovery workflows). The reasons for the disparity are too complex to analyze in a simple blog post (it’s about money) and may never be fully understood (it’s also about controlling your data by putting it in a vendor-specific format), but it’s enough to say that the market works in funny ways (don’t forget about the desire to have a footprint on every server in your environment). But that’s all about to change.
The inexorable growth of data will drive the ascension of “client-free” backup flows. The “backup client” approach is reaching its apex with source-side deduplicated backups (e.g., via Avamar or Data Domain Boost) that are stored as deduped full backups on disk. What happens on a heavily loaded ESX server that lacks the CPU cycles and I/O bandwidth to scan for changed blocks? When can you pummel the mission-critical Oracle database to identify the changed data? Will the client scan for changed files on the NAS server ever complete?
The answer is simple: The backup application must depend on the data owner to tell it what data needs protection. After all, who better than the VMware, Oracle, or the NAS server to efficiently identify the data that has changed since the last backup?
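To make the contrast concrete, here is a minimal Python sketch of the two ways to find changed blocks. Everything in it is hypothetical and for illustration only: `TrackedVolume`, `scan_for_changes` and the 4-byte block size are made-up names and values, not the API of VMware, Oracle, or any backup product. The point is only the read count: the scanning client has to read every block, while the data owner already knows what changed.

```python
import hashlib

BLOCK_SIZE = 4  # assumption: tiny blocks so the example is easy to follow


class TrackedVolume:
    """Hypothetical data owner that records which blocks were written."""

    def __init__(self, data):
        self.blocks = [bytes(data[i:i + BLOCK_SIZE])
                       for i in range(0, len(data), BLOCK_SIZE)]
        self.dirty = set()  # indexes of blocks written since the last backup
        self.reads = 0      # block reads, counted to compare the two approaches

    def write(self, index, payload):
        self.blocks[index] = bytes(payload)
        self.dirty.add(index)  # the owner notes the change as it happens

    def read(self, index):
        self.reads += 1
        return self.blocks[index]

    def changed_since_last_backup(self):
        """Return the changed block indexes and reset the tracking set."""
        changed = sorted(self.dirty)
        self.dirty.clear()
        return changed


def scan_for_changes(volume, previous_hashes):
    """Backup-client style: read and hash every block to find the changed ones."""
    changed = []
    for i in range(len(volume.blocks)):
        digest = hashlib.sha256(volume.read(i)).hexdigest()
        if previous_hashes.get(i) != digest:
            changed.append(i)
        previous_hashes[i] = digest
    return changed
```

In this toy model, a single write to a four-block volume costs the scanning client four block reads to rediscover, while the tracked volume hands back the one changed index without reading anything. Scale the four blocks up to a multi-terabyte datastore and you have the ESX, Oracle and NAS questions above.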
To scale with the environment, backup applications and primary data owning applications/systems must collaborate: data owners need to efficiently identify changed data and backup applications need to turn that changed data into first-class backups. The partnership between primary data owners and backup applications has already begun. Some common examples:
- VMware Changed Block Tracking (CBT): With VMware tracking the changed blocks between backups, applications like Avamar dramatically increase their source-side deduplication performance.
- NAS incremental backups: Solutions like the Avamar NDMP Accelerator for NetApp and Celerra/VNX and CommVault-managed NetApp SnapVault transform rapidly identified changed data into full backups for long-term retention and rapid recovery.
- Oracle’s Incrementally Updated Backups and Block Change Tracking: Solutions like Data Domain and Avamar combine Oracle’s high-performance, low-impact incremental forever backups with deduplication to securely store multiple full backups for reliable, rapid recovery.
Backup applications will continue to orchestrate both the backup and recovery processes, but the data owner becomes an equal stakeholder in optimizing both the backup and recovery workflows.
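What does it mean to turn changed data into a “first-class” backup? Here is one way to picture it, as a rough Python sketch under stated assumptions. `DedupStore` and its methods are hypothetical names invented for this example and do not describe how Avamar, Data Domain, or any other product is implemented. The sketch also assumes fixed-size blocks and a volume that does not grow between backups.

```python
import hashlib


class DedupStore:
    """Hypothetical content-addressed store: each unique block is kept once."""

    def __init__(self):
        self.chunks = {}   # digest -> block bytes
        self.backups = {}  # backup name -> list of digests (a full "recipe")

    def _put(self, block):
        digest = hashlib.sha256(block).hexdigest()
        self.chunks.setdefault(digest, block)  # store only if not seen before
        return digest

    def full_backup(self, name, blocks):
        """Store every block and record the complete recipe."""
        self.backups[name] = [self._put(b) for b in blocks]

    def incremental_as_full(self, name, base, changed):
        """Build a new full recipe from a base backup plus changed blocks.

        `changed` maps block index -> new bytes, as reported by the data owner.
        """
        recipe = list(self.backups[base])  # copy, so the base stays restorable
        for index, block in changed.items():
            recipe[index] = self._put(block)
        self.backups[name] = recipe

    def restore(self, name):
        """Any backup restores from its own recipe; no chain of incrementals."""
        return b"".join(self.chunks[d] for d in self.backups[name])


store = DedupStore()
store.full_backup("monday", [b"aaaa", b"bbbb", b"cccc"])
store.incremental_as_full("tuesday", "monday", {1: b"BBBB"})
```

After Tuesday’s run, the store holds four unique blocks rather than six, yet both `store.restore("monday")` and `store.restore("tuesday")` return a complete image. Only the changed block moved, but the result behaves like a full backup at recovery time.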
It seems simple enough: Over time, you won’t be able to meet your backup window with the current methods – even source-side deduplication. The applications and systems that own the primary data hold the keys to meeting the backup window, by identifying the new data to protect. Backup applications must collaborate with server virtualization vendors, primary applications and primary storage systems to deliver complete solutions around “client-free” backup workflows.
But, of course, there is nothing simple about alliances between large companies. There is nothing simple about one company ceding control and influence to another. And there is nothing simple about the ramifications to your IT team, its critical vendors, and your job responsibilities.
The technology curves driving server virtualization and expanding data sets do not lead to simple answers for the backup team. But they are implacable. Backup processes will change to cope with the curves because the existing solutions will ultimately stretch, rip and then fail completely (not unlike my son’s trampoline).
How are you scaling your backups to meet your ever-compressing backup windows today? Are you feeling the pain of Moore’s Backup Administrator Law? Have you adopted any of the collaborative solutions between primary data owners and backup solutions? If so, what has been your experience?
The power bills aren't bad. I have good equipment so it dumbs down a little when it's not in heavy use. Just pick out stuff that is green. Like hard drives. Don't get lots of small ones. Get one drive that can handle your data but make it a green drive. And get a decent power supply. Don't go too cheap as they will run hot and that makes it so you have to cool your place down. Just a few simple things can make a server run with less power.
- Monica, September 14, 2012 at 2:08 am
Lenny, Caitlin Moore, EMC senior product marketing manager for Data Domain, suggests you check out this paper: http://www.emc.com/collateral/hardware/white-papers/h8110-oracle-rman-data-domain-wp.pdf. It documents best practices for Oracle and Data Domain - including using incrementally updated backups. Let me know if it answers your questions. Happy to help further.
- Heidi Biggar, March 28, 2012 at 1:29 pm
I'm interested in any good articles/postings on using Oracle’s Incrementally Updated Backups and Block Change Tracking, with Data Domain's ability to de-dupe. Using RMAN, NetBackup (7.1) and Oracle DB 10gR2 and 11gR2. Thank You! Lenny
- Lenny, March 27, 2012 at 8:55 pm
You make a good point on controlling backup windows. I was very focused on integrating with the data owning technology (storage, application, hypervisor), but it also helps tremendously to work with the data-owning human beings. Proper data layout of files and applications can dramatically reduce backup times. Typical engineer – I forgot how big a difference the people can make! As for the rate of pain increase, I’m glad to hear your pain isn’t doubling quite as fast as I projected. But, really, we need to start REDUCING the pain. Baseline budgeting for pain is simply unacceptable; backup teams are paying more than their fair share in agony taxes. OK. I better stop before I slash interest rates in the blog via my horrid puns. Thanks for being the first to post to my blog posts. We’ll make sure you get a prize (a lame one, but a prize nonetheless)! -Sm
- Stephen Manley, September 23, 2011 at 8:49 am
Backup windows are a problem. I find you have to structure the information you are backing-up in such a way as to not require frequent backups of your data. The partnership you talk about between the owners and the backup team is critical. Also, while Moore's Law applies to some degree, it is at a slower rate. Backup technology is improving, but there are limits and you don't acquire it as fast. Maybe every 24-32 months backup pains double. -Pie
- Pie, September 15, 2011 at 5:01 pm