“Architecture should speak of its time and place, but yearn for timelessness,” Frank Gehry.
During the EMC Backup Recovery Systems’ keynote at EMC World, Guy “Haybale” Churchward shared his perspective as a British homeowner. His house was built 150+ years ago, and it will stand for another 150+ years. Therefore, while he makes it his home right now, he feels a responsibility to improve it for the next owner (check out his recent blog post). The home ties together people who will never meet. The right architecture, from St Paul’s in London to Hagia Sophia in Istanbul to Guy’s house, can both connect and inspire across generations.
In this series, I introduced the Protection Storage Architecture and explored the Protection Storage component. This time – Data Source Integration. (To start the series at the beginning, click here.)
Data Source Integration – Why Does it Matter?
Performance and visibility. When they are missing, users lose confidence in the protection team. They slow their development. They roll their own solutions. They lose data.
Performance and visibility. That’s how the protection team can drive the business. Faster backups and restores minimize data loss and downtime, reduce management complexity, and increase the likelihood of data recovery. With visibility into their data protection, application teams and end users gain confidence, accelerate innovation, and remain safe.
Performance and visibility. How can the protection team deliver? Data source integration. Each team believes its data source – the application, the hypervisor, the storage array or the server – “owns” the data (in a virtualized world, multiple teams claim data ownership, until things go wrong; then, all of a sudden, it’s the backup team’s data). The data source touches every bit of information that its users generate or access; its management interface provides administrative control. By sitting in the data path, the data source can optimize protection performance. By incorporating protection controls into its UI, the data source can provide visibility to the data owners in their preferred interfaces.
Data source integration delivers the protection performance and visibility that organizations need.
Data Source Integration – Performance
Data sources optimize protection performance compared to traditional backup clients because they sit in the data path.
A standard backup agent works very hard, but not very smart. The agent sits idle until backup time, when it wakes up and looks for new data to protect (I’m assuming you’re running incremental forever versioned replication; if you’re still running frequent fulls, this discussion may feel like you’re sitting in a Peugeot 306, watching the TGV train thunder by). Backup agents look at every file in the data set, checking timestamps to detect whether each one has been modified. Yes, the agents look at every … single … file. Once it locates a new or modified file, a modern agent then checksums the data to identify the new data within that file (a critical optimization for protecting large files or using a low-bandwidth network). Backup clients run the storage equivalent of a search for needles in haystacks. This approach is far better than running a full backup (maybe you’re sitting in a Ford Aspire watching South Korea’s KTX2 train zoom past), but customers continue to reach traditional backup clients’ scalability limits.
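To make the cost of that scan concrete, here is a minimal Python sketch of the agent-side approach described above. It is an illustration under stated assumptions, not any vendor's implementation: the names `scan_for_changes`, `chunk_checksums`, `catalog`, and the 4 KB `CHUNK_SIZE` are all hypothetical, and the sketch ignores deleted or truncated files.

```python
import hashlib
import os

CHUNK_SIZE = 4096  # hypothetical chunk size, chosen only for illustration


def chunk_checksums(path):
    """Return one SHA-256 digest per fixed-size chunk of the file."""
    digests = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            digests.append(hashlib.sha256(chunk).hexdigest())
    return digests


def scan_for_changes(root, catalog):
    """Walk every file under root and return (files_examined, changed_chunks).

    catalog maps path -> {"mtime": ..., "chunks": [...]} from the previous
    backup and is updated in place. Every file is stat'ed on every run,
    even if nothing changed; that is the scalability problem.
    """
    examined = 0
    changed = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            examined += 1
            mtime = os.stat(path).st_mtime_ns
            previous = catalog.get(path)
            if previous is not None and previous["mtime"] == mtime:
                continue  # timestamp unchanged: skip the checksum pass
            old_chunks = previous["chunks"] if previous else []
            new_chunks = chunk_checksums(path)
            for index, digest in enumerate(new_chunks):
                if index >= len(old_chunks) or old_chunks[index] != digest:
                    changed.append((path, index))
            catalog[path] = {"mtime": mtime, "chunks": new_chunks}
    return examined, changed
```

Note that `files_examined` always equals the total file count, no matter how little data changed. The work grows with the size of the haystack, not the number of needles.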
The data source, on the other hand, can track exactly what data needs to be protected. Whether it is the application, the hypervisor, the storage, or the server, it owns the data. The data source executes the users’ every data creation, modification, and deletion, so it can keep a log, a journal, or a bitmap of those changes. Therefore, at backup time, the data source already knows exactly what to protect. There is no need to look at every file, no need to checksum every chunk. Instead of searching for needles in a haystack, the data source hands the backup process a pre-ordered set of needles. Even better, when it comes time to restore, it can ask for just those needles back!
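A minimal Python sketch of the data-source side, assuming a toy in-memory volume, might look like the following. The class `TrackedVolume`, its methods, and the 4 KB `BLOCK_SIZE` are hypothetical names for illustration; real products such as those listed below implement change tracking in their own ways, and a real bitmap would use one bit per block rather than one byte.

```python
BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


class TrackedVolume:
    """Toy in-memory volume that flags each block written since the last backup."""

    def __init__(self, num_blocks):
        self.data = bytearray(num_blocks * BLOCK_SIZE)
        # One flag per block; stands in for a change bitmap.
        self.dirty = bytearray(num_blocks)

    def write(self, offset, payload):
        """Apply a write and mark every block it touches as changed."""
        if not payload:
            return
        if offset < 0 or offset + len(payload) > len(self.data):
            raise ValueError("write falls outside the volume")
        self.data[offset:offset + len(payload)] = payload
        first = offset // BLOCK_SIZE
        last = (offset + len(payload) - 1) // BLOCK_SIZE
        for block in range(first, last + 1):
            self.dirty[block] = 1

    def changed_blocks(self):
        """Return [(block_number, block_bytes)] for changed blocks, then reset."""
        result = []
        for block, flag in enumerate(self.dirty):
            if flag:
                start = block * BLOCK_SIZE
                result.append((block, bytes(self.data[start:start + BLOCK_SIZE])))
        self.dirty = bytearray(len(self.dirty))  # clear flags for the next cycle
        return result
```

Because the volume sees every write as it happens, the backup process never reads unchanged data and never computes a checksum. It only reads the tracking structure and the blocks it points to.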
Some of the leading vendors that can optimize backups by tracking changes include: VMware (Changed Block Tracking), Oracle (Block Change Tracking), EMC (RecoverPoint, TimeFinder Clones, SyncIQ, …), NetApp (SnapDiff), and Microsoft (Filter Drivers and Change Journal). In other words, the options are widespread.
Because they sit in the data path and can track the new and modified data, integration with data sources can reduce backup and recovery time from days and hours to minutes or seconds.
Data Source Integration – Visibility
Data sources optimize protection visibility by connecting to users via their preferred interfaces.
Technology developers define ‘simple’ differently from the rest of us. Take EMC’s “very simple” goal management system. Every quarter, I must approve my employees’ MBOs in this application. While it has a well-designed UI and management flow, you can guess what I’m doing 5 minutes before close-of-business on the MBO deadline. I’m screaming at my computer about the incomprehensibility of the system, the pointlessness of MBOs and the series of Palahniuk-level horrors I want to visit upon HR, IT and the application developers. When you log in to an interface once a quarter, no matter how simple, you re-learn it each time. If I could approve via my normal tools – email, bug tracking system or source code repository – MBOs would take under a minute. Now that’s simple!
Regardless of how simple, elegant, or fun… another interface adds complexity, especially when the customer rarely uses the interface. End-users and administrators do not want to log into a backup application interface. They want to see and manage their protection from their primary tool – vSphere, Oracle, SAP, Unisphere, NFS/CIFS share, etc. If their application does not support a protection view, the backup vendor should provide an interface with the same look and feel as their common tool. Only then will they feel comfortable with the protection environment.
Not surprisingly, the same data source vendors who are optimizing the protection data path are also enhancing the protection control path.
Because they are the users’ central interface, integration with the data sources can improve visibility into, and confidence in, the protection environment.
Data Source Integration – Proof that Data Protection Matters
Performance and visibility have driven the decade-long renaissance of protection innovation. The industry’s data source titans understand that data protection matters. Oracle, Microsoft, VMware, NetApp, and EMC have optimized protection performance and visibility. Ten years ago, these vendors would have said, “That’s a backup software problem” or “Upgrade your hardware to get better backup performance” because companies do not spend resources solving “somebody else’s problem”.
Today, they invest because protection has become their problem. Protection is the primary inhibitor to the growth of their big data applications and infrastructure. As the data sources, they have both the incentive and the unique ability to help solve the problem. Their investment in delivering solutions demonstrates the importance of data protection to your environment.
Therefore, as you design your protection environment for today and the future, data source integration is a critical component of your architecture. Protection has become integral to the data sources, so the data sources must be integrated into your protection architecture.