If you want to see the value of data in the AI market, look at Databricks and its latest funding round.
Databricks is raising $10 billion in its Series J round. Databricks said last month that it had raised $8.6 billion so far in the round, which is led by Thrive Capital, an OpenAI investor, and co-led by Andreessen Horowitz and other investors. The round and Databricks’ $62-billion valuation underscore how central enterprise data has become to technology and AI.
Here, we look at lessons for AI execs in Databricks’ rise to this point, with insights from AI and data leaders.
Databricks' AI Path
- Building Category Leadership Into AI Leadership
- Making Strategic Investments in AI
- Maximizing Enterprise Data for AI Success
Building Category Leadership Into AI Leadership
Databricks’ Series J is among the largest fundraising rounds in the AI market. It places Databricks in a small group of AI companies that raised billions in a single round last year, such as OpenAI’s $6.6-billion round, xAI’s $6-billion round and Anthropic’s $4-billion investment from Amazon. What sets Databricks apart from the others is that it isn’t known for a mass-market AI foundation model.
Databricks was able to attract this level of investment for several reasons: it’s an established data company, founded in 2013, with pioneering open-source roots in big data and machine learning (ML); it built a data platform on open-source technologies that’s supported by an ecosystem of more than 1,000 partners; it developed a line of AI capabilities and integrated them into its data platform so enterprises can build and deploy AI applications with their own data; and it reports that more than 10,000 customers use its platform.
While many companies in the data market competed for niches, Databricks emerged as a leader in the unified data market. And while many data and B2B software companies were pivoting to AI, Databricks was more native to AI technology and comparatively quick to offer its customers integrations that bridge enterprise data and AI development. Investors see Databricks as a rare gem in the AI market: It’s a relatively mature player in data, which is foundational for enterprises, and it’s built for AI.
Capitalizing on the AI Market
Databricks has been a “strong player in the data space for almost a decade now,” said Randall Hunt, CTO, Caylent.
“Databricks is becoming a strong player in the AI space as well, and it's not contrived marketing moments, like some of their competitors — there's actual real software and value there,” Hunt said.
“This $10-billion raise is a chance to give long-term employees liquidity before an IPO, which could come with long lockout periods.”
Investor interest in AI, and in Databricks specifically, is high because there are “very few options for late-stage investors to deploy capital at the scale at which Databricks is operating,” said Chris Resch, chief revenue officer at Indicium.
“AI is clearly a segment that has attracted an enormous amount of investment, but it has largely been across fragmented, smaller companies with lower requirements for capital,” Resch said.
Transformational infrastructure shifts don't happen often, and the early leaders can gain a “disproportionate first-mover advantage,” said Krishna Subramanian, co-founder and COO, Komprise.
“This is why many investors are interested in opportunities at the intersection of infrastructure and AI, especially data management for AI, since there is no AI without data,” Subramanian said.
Related Article: What NVIDIA's No. 1 Market Cap Tells the AI Industry
Making Strategic Investments in AI
As the generative AI market surged, Databricks made a number of AI moves: It rebranded itself as a data and AI company, and it developed and released an open-source large language model called Dolly.
Most importantly, Databricks developed and released a series of AI capabilities for its flagship Data Intelligence Platform: Mosaic AI, tooling to build, deploy and govern AI; Vector Search, a governed vector database designed for retrieval-augmented generation (RAG); Mosaic AI Agent Framework, for building RAG apps; Model Serving; Mosaic AI Gateway, to manage and govern generative AI models; Model Training; Feature Store; and AutoML, to accelerate the work of data scientists and enable low-code model development.
Big tech companies and specialized providers aside, Databricks’ AI capabilities are far broader and deeper than those of many data and B2B software companies with vaguely defined AI-enabled platforms. In those cases, AI’s role in the platform’s technology and functionality isn’t immediately clear to customers and prospects. Often, the new AI simply enhances older, pre-existing machine learning technology. Investors and enterprises recognize the AI substance in Databricks’ platform.
Meeting AI Demand
Databricks’ go-to-market focus on AI was the “same one every other player in the space made to take advantage of the hype and zeitgeist, but the difference between the fakers and the makers is what drives long-term value,” Hunt said.
“Databricks had real software and AI offerings, not just marchitecture — marketing-driven architectures,” Hunt said.
Databricks’ founders also have a “rich background and pedigree in open-source,” which can attract a related set of employees and customers, including in the machine learning market, Hunt said.
“The market demands drove Databricks to where they are, and their ability to leverage the tools that ML engineers actually enjoy using only benefits them,” Hunt said. “Developers and ML engineers have significantly more purchasing power and influence than they've had in the past, and developers tend to prefer the toolchain they're already familiar with.”
Another factor in Databricks’ rise is that “every” enterprise is looking for ways to deploy AI in its products or internally to drive efficiency, Resch said.
Subramanian added that IT leaders are focused on “preparing for AI and getting their data ready for AI.”
“Infrastructure vendors, including Databricks, can help organizations with this transformation, which is strategic, but will take a few quarters to translate into meaningful revenue gains. This is especially because we are largely in the model-building phase of AI,” Subramanian said.
Related Article: How is Big Tech Growing AI Revenue?
Maximizing Enterprise Data for AI Success
Databricks is benefiting from the fact that AI leaders consider data the fuel of AI. Without data repositories for training, companies can’t create foundation models, off-the-shelf models or custom models. Companies also rely on data repositories for RAG, which gives a model supplemental data from outside its training set.
Even with an AI model in place, targeted data is essential to developing internal and external AI applications, including AI agents, designed for business outcomes. To create generative AI software, a company must source, collect, process, manage and store data, then integrate it with an AI model and any APIs. The data must serve the AI’s end-user functionality, because bad or irrelevant data produces a poor or unreliable AI tool.
Because data is mission-critical to AI technologies, companies must apply sound data science and data management practices, including a centralized data platform that integrates with third-party tools and supports end-to-end AI development. The selection of a data platform, such as Databricks, is a foundational choice for enterprises looking to compete in the AI era.
Driving AI With Data
“The right data platform and organization is really an optimization play,” Hunt said. “What are the minimum viable pieces of context I can provide for training or inference? The right data platform makes that data lineage tracking, cataloging, searching, etc. all very straightforward.”
“When foundation models are all trained on the same data sets, it is your enterprise's unique data that becomes the differentiator.”
A simple wrapper around a foundation model was “good enough to make some demos” at the start of the generative AI surge, but now companies “must use their data to enrich the inferences these models make,” Hunt said.
“They can do it through RAG or fine-tuning or some combination, but without your enterprise's own data, why would anyone purchase inferences from you when they could go directly to the model provider?” Hunt said.
A “properly architected data platform” is essential for the deployment of AI-enabled applications, supporting availability, throughput, governance and cost optimization, Resch said.
“Data and AI are inextricably interwoven with each other,” Resch said.
AI models, particularly in generative AI, can address enterprise use cases “only if they have access to corporate data,” Subramanian said.
“The right data platform should be able to look across all the corporate data stores, curate and find the right data, filter out sensitive data and move the right data to the right application with the right privileges and with proper data auditing and data governance,” Subramanian said.
“Since doing this at scale manually is technically unfeasible, data management platforms that can index, search, curate, move and govern data ingestion to AI are critical.”
Related Article: Clean Your Data! Why Clean Data Is Foundational for Effective AI