I will be a delegate for Cloud Field Day 5 on April 10-12. During the event we will be attending presentations from several vendors, which will be livestreamed. Before I leave on this grand adventure, I wanted to familiarize myself with each of the presenters and consider how their product/solution integrates with cloud computing. I’m also interested to hear from you about what questions you might have for each vendor, or topics you’d like me to bring up. As a delegate, I am meant to represent the larger IT community, so I want to know what you think! In this post I am going to consider Cohesity and how a backup company can become a data aggregator.
Cohesity at first glance appears to be a backup product vendor hawking a hardware appliance. In fact, I first heard of them in direct comparison with Rubrik. Whether or not that’s a fair comparison, it is fair to say that both vendors have a hardware appliance that performs backup and recovery of data.
And just like Rubrik, Cohesity has set its sights on loftier goals than being a backup company. The reason has to do with aggregation theory and the relationship of backup products to their consumers and the data being backed up. I am largely basing this on the aggregation theory ideas that Ben Thompson has laid out on his blog Stratechery. When you think about a value chain, there are three players: the supplier, the distributor, and the consumer. It’s a reductive approach, and I realize there’s more to it than that, but the simple idea stands. Suppliers are the organizations that produce a good or service they want to sell. The distributor brokers the buying and selling of those goods and services between suppliers and consumers. And the consumer purchases them.
Let’s apply that model to a backup appliance and the software it runs. The thing being supplied is data, and every device that stores data is a potential supplier. Backup software collects that data in a central repository and makes it available to a consumer, which in this case is the target of a recovery. In other words, backup software acts as a distribution channel: it aggregates the information stored in disparate systems and makes that information available when and where it is needed.
One of the original focuses of aggregation theory was Google. Google aggregates information from disparate sources and surfaces that data to consumers. Suppliers who are interested in improving their contact with consumers can pay Google to surface results in a way that is favorable to them. Each individual supplier has very limited power, and the consumers also have very little power. By aggregating search results, Google has put itself in a position of power, and its earnings have borne this out. In the last 10 years, Google’s ad business has accumulated over $620B in revenue. Being able to aggregate the world’s data and serve it up to hungry consumers was key. The other major key was a simple, fast user experience. Yahoo was once considered a rival to Google, but it chose a curated portal approach that made its website slow, clunky, and harder to use. Google kept things spartan and simple, and also managed to create some amazing search algorithms that blew the competition out of the water.
What does all this have to do with a backup company? Backups take all your important information and keep a historical record for however long you need it. That sounds a bit like Google, doesn’t it? Since it has all that information, it would probably make sense to index and catalog it. Maybe it also makes sense to run some data analysis tools on it. In fact, you could run data analysis across multiple datasets that aren’t traditionally stored together, since the backup software is aggregating all the data in your organization. What can you do with all that information?
Those are some quick and easy ones, but I’m certain there are plenty more. The main point here is that the backup software in your organization has aggregated all your important data in one location. Why aren’t we doing more with it?
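To make the indexing-and-cataloging idea a little more concrete, here is a minimal sketch of what "doing more with it" could look like: walk a few backup snapshots, record file metadata in a small catalog, and then ask questions across datasets that were never stored together in production. The snapshot paths and catalog schema here are invented placeholders for illustration, not Cohesity’s (or any other vendor’s) actual format.

```python
# A minimal sketch: build a catalog of files across backup snapshots,
# then query it across datasets. Paths and schema are hypothetical.
import os
import sqlite3


def build_catalog(snapshot_roots, db_path="catalog.db"):
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files "
        "(snapshot TEXT, path TEXT, size INTEGER, mtime REAL)"
    )
    for root in snapshot_roots:
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                st = os.stat(full)
                conn.execute(
                    "INSERT INTO files VALUES (?, ?, ?, ?)",
                    (root, full, st.st_size, st.st_mtime),
                )
    conn.commit()
    return conn


def largest_files(conn, limit=10):
    # One cross-dataset question: where does the bulk of our data live,
    # regardless of which system it originally came from?
    return conn.execute(
        "SELECT path, size FROM files ORDER BY size DESC LIMIT ?", (limit,)
    ).fetchall()


if __name__ == "__main__":
    # Hypothetical snapshot locations for two different source systems.
    catalog = build_catalog(["/backups/fileserver01", "/backups/sqlprod"])
    for path, size in largest_files(catalog):
        print(f"{size:>12}  {path}")
```

That is obviously a toy, but it is the kind of question that becomes trivial once all of your data sits behind one aggregation point.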
In the past there were a lot of technical limitations on the amount of data you could keep and how quickly you could retrieve it. Storing backups on tape and shipping them to an offsite location was fine for disaster recovery and restoring missing data, but it was not practical for quick indexing and retrieval. Modern storage solutions and the public cloud have removed many of those blockers. Instead of doing incremental backups throughout the week and a full every weekend, almost all solutions now perform some kind of incremental backup that gets turned into a synthetic full programmatically. Data deduplication is basically a given for these solutions, and the only question is how much data is included in each dedupe library. Physical disk storage has been plummeting in price, with the average cost per GB now in the neighborhood of 2.5 cents. And with the advent of public cloud storage, costs have dropped even further, with support for backing up locally and archiving to the cloud for longer-term retention. In most ways we’ve solved the data storage problem.
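As a rough illustration of the incremental-plus-dedupe point above, here is a toy sketch: fixed-size chunks are hashed and stored once, and a "synthetic full" is rehydrated from a chunk recipe without going back to the source system. The chunk size and data structures are arbitrary choices for illustration, not how any particular product implements it.

```python
# Toy deduplication and synthetic-full illustration (not a product design).
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB chunks, an arbitrary choice


def chunk_and_store(data: bytes, store: dict) -> list:
    """Split data into chunks, store each unique chunk once, return the recipe."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # dedupe: identical chunks stored once
        recipe.append(digest)
    return recipe


def synthetic_full(recipe: list, store: dict) -> bytes:
    """Rehydrate a full copy from a chunk recipe without touching the source."""
    return b"".join(store[d] for d in recipe)


if __name__ == "__main__":
    store = {}
    monday = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE
    tuesday = b"A" * CHUNK_SIZE + b"C" * CHUNK_SIZE  # only one chunk changed
    r1 = chunk_and_store(monday, store)
    r2 = chunk_and_store(tuesday, store)
    print(len(store))                            # 3 unique chunks, not 4
    print(synthetic_full(r2, store) == tuesday)  # True: a full built from increments
```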
The other problem, frankly, is the backup software itself. As someone who has worked with multiple vendors, I can say that almost all of them have a terrible interface that is mired in the old way of doing things. With very few exceptions, the backup UI is ugly and confusing. The backup policies and processes are arcane and needlessly complex. And the backup clients are unreliable and constantly in need of tending. If backup vendors want to embrace the potential of being an aggregator, they have to do what other aggregators have done: build a compelling and easy user interface. That innovation is highly unlikely to come from one of the incumbents. They are locked in with their existing user base, forced to keep supporting their terrible user interface and processes for customers who have spent enough time with them to be trapped in a Stockholm syndrome type of arrangement. I’ve seen it at clients who have bizarre and incredibly inefficient processes built on top of their backup solution, where any shift in the solution, even a positive one, results in great wailing and gnashing of teeth.
The innovation must come from a new vendor. One who can pioneer a better interface and user experience, leading the way to embrace all of the amazing possibilities a data aggregator makes available. Is that Cohesity’s vision? I have no idea, but I intend to find out at Cloud Field Day 5.
Do you have questions for Cohesity? Let me know and I’ll be happy to ask them too.