Our Take on the Data Deluge, and What’s Next
There’s been a stunning rise lately in the number—and type—of private companies worldwide offering technology in various corners of the data sector. As Matthew Scullion, the CEO of our portfolio company Matillion*, put it in TechCrunch recently as his firm raised another $150 million in financing, data is “the new currency” for business.
Matillion, founded in the U.K., focuses on helping organizations extract, transform and migrate critical data from the disparate corporate silos in which it often resides. But that’s just one of roughly six data sub-sectors we’ve identified in this market, all of which hold the potential for rapid growth. Others include storing and managing data once it’s extracted; analyzing and modeling data to glean useful insights that could help a business make better decisions; and taking that process even further by visualizing and dashboarding that data in an easily understandable way.
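The extract-transform-load pattern behind that first sub-sector can be sketched in a few lines of Python. This is a toy illustration only, not Matillion's actual product; the record fields and table names are invented, and an in-memory SQLite database stands in for a cloud data warehouse:

```python
import sqlite3

# Toy "silo": raw sales records as they might be extracted from a source system.
# (Field names are hypothetical, for illustration only.)
raw_records = [
    {"order_id": 1, "amount": "19.99", "region": "emea"},
    {"order_id": 2, "amount": "5.00",  "region": "amer"},
]

def transform(record):
    # Normalize types and casing before loading into the central store.
    return (record["order_id"], float(record["amount"]), record["region"].upper())

# Load into a central system of record (here, in-memory SQLite standing in
# for a cloud data warehouse).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER, amount REAL, region TEXT)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [transform(r) for r in raw_records])

total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
```

Once the data is unified this way, downstream analytics can query one consistent store instead of chasing each silo separately.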
Today, our company Collibra*, which focuses on data intelligence—particularly around areas like compliance—also hit a corporate milestone when it announced its latest $250 million financing. It all underscores just how detailed and granular the data market has become, and how much market value is up for grabs as companies both 1) increasingly seek out better data to make more-informed decisions, and 2) use data to improve customers’ experiences.
So what’s driving this data deluge? And how long can it continue? Our research and discussions with hundreds of companies over the past five-plus years have highlighted six key factors driving the creation and growth of data and business-intelligence (BI) companies. They’ve also given us insight into how the market may shift in the coming years, so we’re sharing some predictions here, too.
Literally, zettabytes of data
The first factor driving the growth of new, data-focused technology is simply the unbelievable volume of data being produced today—data that needs to go somewhere to be useful. Data is created all around us whenever we interact with mobile applications, shop online or even contact customer support. If technology is being used, data is being created. Research firm IDC predicts that the global datasphere will grow to 143 zettabytes (for context, each zettabyte is 1 trillion gigabytes) by 2024—a 26% compound annual growth rate from the 45 zettabytes that existed in 2019.
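For readers who want to check the arithmetic, compounding IDC's 45 zettabytes at 26% per year over the five years from 2019 to 2024 does land at roughly 143 zettabytes:

```python
# IDC figures: 45 ZB in 2019, growing at a ~26% compound annual growth rate.
base_zb = 45
cagr = 0.26
years = 5  # 2019 -> 2024

projected_zb = base_zb * (1 + cagr) ** years
print(round(projected_zb))  # roughly 143
```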
It’s obvious, but worth saying anyway: the shift to the cloud is real!
We are still very early in the public-cloud adoption journey, as the majority of data still resides in legacy, on-premise data centers. By 2025, IDC estimates that approximately 46% of the world’s stored data will reside in public-cloud environments. This directly drives the massive increase in data, and in new data technologies, because the cost of compute and storage in the public cloud is much lower: there are no upfront capital-expenditure requirements, and access to data is often governed by reasonable, pay-as-you-go or consumption-based pricing. In addition, the automation that comes with the cloud allows companies to free up system engineers from worrying about customizing on-premise systems, and instead focus on other data-management priorities. The migration to cloud promotes flexibility, scalability and cost efficiency in a way not previously possible with on-premise deployments.
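The appeal of consumption-based pricing can be made concrete with a back-of-the-envelope model. The rates below are purely hypothetical placeholders (real cloud pricing varies by provider, region and tier); the point is simply that cost scales linearly with usage and requires no upfront capital expenditure:

```python
# Hypothetical, illustrative rates only -- not any provider's actual pricing.
STORAGE_PER_GB_MONTH = 0.023   # $ per GB-month (assumed)
COMPUTE_PER_HOUR = 0.10        # $ per compute-hour (assumed)

def monthly_cloud_bill(stored_gb, compute_hours):
    """Pay-as-you-go: the bill tracks actual usage, with no upfront capex."""
    return stored_gb * STORAGE_PER_GB_MONTH + compute_hours * COMPUTE_PER_HOUR

# A small workload costs a few dollars a month...
small = monthly_cloud_bill(100, 20)      # 100 GB stored, 20 compute-hours
# ...and scaling usage 10x simply costs 10x, rather than a new capex cycle.
large = monthly_cloud_bill(1000, 200)
```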
Consumers need information, and they need it now.
Batch data sets have historically been used for many analytics needs; in this method, data is gathered over time prior to being analyzed. There are, and will continue to be, great use cases for batch analytics, including managing payroll or customer billing. But with the advent of mobile computing and the Internet of Things, among other trends, there is a pressing new need to analyze data in real time. Use cases here include fraud detection, tracking real-time ETAs on ridesharing applications, managing the temperature of your home as the day progresses, and many more. Per IDC, the market for real-time or continuous analytics is expected to grow to $4.4 billion by 2024. Aside from enabling a different set of applications, real-time analytics contributes heavily to the growth of data, given the constant need for up-to-date information.
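The batch-versus-real-time distinction can be sketched in a few lines of Python. This is a simplified illustration, with invented event values; the batch path waits for the full data set, while the streaming path keeps a running answer after every event, which is the pattern behind use cases like fraud checks and live ETAs:

```python
events = [120, 80, 45, 300, 95]  # e.g., transaction amounts arriving over time

# Batch: wait until all the data is collected, then analyze it in one pass.
batch_total = sum(events)

# Real-time: update a running aggregate as each event arrives, so an
# up-to-date answer is available at every moment, not just at the end.
running_total = 0
snapshots = []
for amount in events:
    running_total += amount
    snapshots.append(running_total)  # current answer after this event

# Both approaches converge on the same final number...
assert snapshots[-1] == batch_total
# ...but only the streaming version had an answer available mid-stream.
```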
Data is messy, and it’s everywhere.
It’s clear that data is exploding in many different forms. The second-order problem is that the data lives everywhere. For example, an enterprise’s customer data may live in Salesforce, its employee data in Workday, log data in Sumo Logic*, event data in Segment—the list of potential repositories goes on and on. As a result of this massive data sprawl, a modern data-technology stack has emerged to effectively integrate and join these disparate data silos, unify them in a central system of record and prepare the data living in them for batch or real-time analytics. Each of these steps has created a massive market opportunity for data companies as customers seek unification.
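The unification step can be illustrated with a minimal sketch: joining records from several silos on a shared customer key to produce one record per customer. The source names and fields below are invented stand-ins, not the actual schemas of the products mentioned above:

```python
# Toy stand-ins for three data silos (field names are hypothetical).
crm_records = [{"customer_id": "c1", "name": "Acme Corp"}]           # e.g., a CRM
billing_records = [{"customer_id": "c1", "plan": "enterprise"}]      # e.g., billing
event_records = [{"customer_id": "c1", "last_login": "2021-11-01"}]  # e.g., events

def unify(*sources):
    """Join records from disparate silos into one record per customer."""
    unified = {}
    for source in sources:
        for record in source:
            key = record["customer_id"]
            unified.setdefault(key, {}).update(record)
    return unified

customers = unify(crm_records, billing_records, event_records)
# customers["c1"] now combines CRM, billing and event fields in one place.
```

In practice this joining happens inside a warehouse or integration layer rather than in application code, but the shape of the problem is the same: disparate records, one shared key, one unified system of record.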
The rise of the chief data officer (CDO)
The CDO was a position that barely existed a decade ago. Today, many organizations have one, wagering that data is a key asset that must be protected and mined correctly. The rise of the CDO has shaped an attractive go-to-market opportunity for data-infrastructure companies, as the CDO normally has a well-defined budget to buy new technologies. People in this position usually have teams staffed with data analysts, data scientists, data engineers, data architects and business analysts, all of whom also can advocate for new data technology. The rise of the CDO is a key reason many new data companies have flourished.
Removing sales friction with open-source software
Open-source software has provided an attractive way for data-infrastructure companies to get their products into the hands of engineers faster, driving the bottoms-up adoption of many data technologies. As we’ve written previously, including when we introduced our Battery Open-Source Software (BOSS) Index back in 2017, open-source software helps many organizations struggling to manage huge volumes of structured and unstructured data, as they can download and modify source code from relevant open-source projects and tailor it to their own needs. This approach has resonated across the modern data stack through open-source platforms including Databricks*, Confluent*, dbt, Prefect and others, and continues to be a preferred mode of consumption.
We don’t expect the data deluge to slow down anytime soon. In our recent, annual State of the OpenCloud presentation, we noted that there remains a 75% disruption potential for cloud, which serves as a leading indicator of the market opportunity for data management. The implication of the explosion of data is the explosion in the market opportunity for platforms sitting across the data toolkit, and we’re excited to continue investing heavily across this thesis.