News emerged in mid-August that Microsoft was planning to build an AI service with Databricks and make it available through its Azure cloud unit.
The move, first reported in The Information, could strain the Microsoft-OpenAI partnership because it would allow Microsoft to sell Databricks machine learning and data analytics tools to Azure customers, tools that would let those organizations build their own AI models and, in the process, sidestep OpenAI.
The fly in the ointment, depending on how you view these things, is that this would likely distance Microsoft from OpenAI, whose LLMs Microsoft used to build the Copilot for Microsoft 365 and other Microsoft platforms.
However, there is no evidence to suggest such a rift is coming. If anything, the move underlines Microsoft's commitment to serving and expanding its AI portfolio. In its Q4 earnings call at the end of July, Microsoft outlined a comprehensive plan of spending and investment in AI, including more data centers to support its generative AI ambitions over the next quarter. Chief Financial Officer Amy Hood told analysts that this kind of spending is likely to increase every quarter for at least the next year.
This is the context of the Databricks rumors, and to suggest that Microsoft has somehow fallen out with OpenAI is, at best, speculative. Microsoft is simply diversifying and opening its platform to as wide a range of open source and proprietary AI offerings as possible.
Should the move happen, Microsoft's choice of Databricks would be an interesting one. Here's why.
Databricks vs. OpenAI
According to Vladislav Bilay, a cloud DevOps engineer with experience implementing AI and supporting big data services, what sets this move apart is the strategic contrast between Databricks' approach and OpenAI's.
OpenAI focuses on creating exclusive AI models and licensing them to partners — including Microsoft — for integration into various services such as Microsoft 365, Windows and Bing. The synergy with Microsoft on multiple fronts underscores their collaborative ethos.
Meanwhile, Databricks' platform acts as a springboard for businesses seeking to harness the power of artificial intelligence to derive insights and solutions from their data.
“The infusion of Databricks' software into Azure serves as a strategic response to the surging requisites of businesses for tailored AI tools,” Bilay said. “Microsoft recognizes the escalating need within the corporate landscape for AI solutions that are intricately aligned with distinct operational contexts.”
By enabling the deployment of Databricks' technology on Azure, he added, Microsoft has embarked on a journey to facilitate the creation of bespoke AI applications, empowering companies to wield AI as a precision instrument in crafting solutions that cater to their precise business imperatives.
Databricks and Azure
Databricks is a good example of commercialized open source, Erik Gfesser, director and chief architect at Deloitte Global, told us.
This means open source software (e.g., Apache Spark) bundled with commercially supported proprietary software, typically offered by a vendor to provide features not otherwise available, such as infrastructure management that simplifies, or removes entirely, work customers would otherwise have to do themselves.
AI workloads, including all its subsets such as machine learning, can already run on Databricks alongside what has traditionally been called big data processing.
But while Databricks is available on all three major public clouds (AWS, Azure and Google Cloud Platform), only Azure provides it as a native service, Gfesser said; on AWS and GCP, Databricks is available only via third-party marketplaces.
“Microsoft and Databricks worked together to offer Databricks as a first-party Azure service, and Microsoft continues to stand behind this offering," he said.
It is important to note that Microsoft already offers Azure services that overlap to some extent with Azure Databricks (e.g., Azure Synapse, and on its future roadmap, Microsoft Fabric). It also already offers Databricks services through Azure, though the company has yet to include AI in those services.
While Microsoft likely has many reasons for partnering with Databricks, Gfesser believes two stand out:
- AWS: Databricks was a first mover on AWS, gaining a competitive advantage that Microsoft sought to replicate by partnering with Databricks on a native service.
- Ecosystem: Microsoft tends to cater to enterprise technology shops that are arguably often not as technically savvy as smaller firms or tech firms, and many engineers have become accustomed to the Microsoft ecosystem over the years.
OpenAI is different, Gfesser said, because it was originally open source, well before Microsoft got involved. The timing of this involvement, however, coincided with it becoming a household name seemingly overnight, leading to the emergence of Azure OpenAI.
Yet OpenAI's models are "black boxed" and not trained with customer data unless a given customer opts in, and opting in means sharing data, something many firms seek to avoid. Models trained with customer data, however, are likely to be the most accurate, provided the training performed is on par with what a proprietary model such as OpenAI's can offer. But there are always tradeoffs to be made.
Databricks, Gfesser added, continues to follow the commercialized open source route, offering open source models that can be trained with customer data while keeping the data plane separate from the control plane. In practice, that means data stays where it is: the data used to build LLMs in the data lake doesn't move anywhere.
“Yes, training costs will be incurred, but again this is one of the trade-offs that every firm looking to implement AI needs to consider,” he said, adding that in many cases, these trade-offs may be worth the competitive advantage that is gained as a result.
“Based on offerings that Databricks continues to churn out for Azure customers, the additional AI tooling that Databricks will be rolling out are expected to provide additional ease of use for their target markets."
Competing Offerings
All things considered, competition is not necessarily a bad thing either — particularly in this space.
Yaoshiang Ho, co-founder and head of product at Masterful AI, said Azure and its main rivals in the cloud computing space, Amazon Web Services and Google Cloud, frequently offer products that compete with their own. They do this, he said, to give customers the platforms they want. For instance, although Amazon Web Services has its own SQL database, it also offers Microsoft's SQL Server and Oracle.
Azure has historically had a very close relationship with Databricks. In fact, Databricks is a first-party partner to Azure, which means Databricks is well integrated with other Azure services and has integrated billing, Ho said.
“I would read this latest development as just a continuation of the strategy of Azure supporting whatever products are best able to serve customers, even if competitive with its own, rather than any signal about its commitment to its partnership with OpenAI,” he said.