Is the Data Cloud Alliance for open data or for Google?

The Alliance believes that by committing to open data standards, access and integration across data platforms and applications, it can dramatically accelerate business transformations and close the gap between data and value.

Have you ever felt that your organization spends too much time building integrations between its various business systems and platforms, or testing yet another analytics tool that promises to bridge the gap between one data set and another? A new group, the Data Cloud Alliance, could soon provide answers.

Announced in early April, the Data Cloud Alliance already includes both heavyweights and newcomers in databases and data management, including Google Cloud, Accenture, Deloitte, Elastic, MongoDB, Redis, and more. For now, they are focused on breaking down the barriers between different systems and platforms so that inaccessible data does not prevent organizations from undergoing the digital transformation they need.

“By committing to open data standards, access and integration between the most popular data platforms and applications today, we believe we can dramatically accelerate business transformations and close the gap between data and value,” said Gerrit Kazmaier, vice president and general manager of Databases, Data Analytics, and Business Intelligence at Google Cloud.

What does the Data Cloud Alliance think?

Alliance members believe that solving the problems of managing and analyzing the growing proliferation of data requires common digital data standards and a “commitment to open data”.

Members of the Data Cloud Alliance will contribute to common industry data models, open standards, and end-to-end integrations that simplify the deployment and maintenance of complex data lakes and analytics pipelines. They will also address challenges around data governance, privacy, and loss prevention, which are key concerns for organizations in heavily regulated industries or those dealing with personally identifiable information (PII).

But the Alliance also seems to recognize that none of these complex pipelines will work without qualified people to maintain them. In its release, the Data Cloud Alliance says its members will implement new educational efforts to close the skills gap and attract more people to modern data and analytics platforms.

Each Alliance member will provide the APIs, infrastructure, and integrations necessary for organizations to move data between any number of platforms and environments, whether on-premises or in public, private, or hybrid clouds. Ideally, these commitments will combine to accelerate the adoption of best practices in data analytics and AI/ML applications across industries, especially for organizations that have traditionally been left behind by the sheer complexity of data.

What does this mean for data-hungry organizations?

The answer seems to be “nothing” – at least for a while. The Data Cloud Alliance website is light on details, aside from the commitments above.

The only specific initiative, platform, or environment mentioned other than Google Cloud itself is Delta Lake, an open-source storage framework for building a lakehouse on top of a data lake, compatible with Apache Spark, PrestoDB, Kafka, Snowflake, and more. It addresses data reliability issues by providing ACID transactions, scaling to petabytes of data, and keeping previous versions of the data accessible for full audit trails.
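To make the idea of atomic commits and access to previous versions more concrete, here is a minimal Python sketch. It is a toy in-memory model, not the Delta Lake API: the `VersionedTable` class and its `commit` and `read` methods are hypothetical names invented for illustration, and real lakehouse storage works on files and a transaction log rather than Python tuples.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VersionedTable:
    """Toy illustration only; hypothetical names, NOT the Delta Lake API."""

    # One immutable snapshot of the rows per committed version.
    # Version 0 is the empty table.
    _snapshots: list = field(default_factory=lambda: [()])

    @property
    def latest_version(self) -> int:
        return len(self._snapshots) - 1

    def commit(self, new_rows: list) -> int:
        """Append rows as one all-or-nothing commit; return the new version."""
        # Validate every row before touching any state, so a bad batch
        # leaves the table exactly as it was.
        for row in new_rows:
            if not isinstance(row, dict):
                raise TypeError("each row must be a dict")
        snapshot = self._snapshots[-1] + tuple(dict(r) for r in new_rows)
        self._snapshots.append(snapshot)
        return self.latest_version

    def read(self, version: Optional[int] = None) -> list:
        """Return copies of the rows as of a version (default: latest)."""
        if version is None:
            version = self.latest_version
        if not 0 <= version <= self.latest_version:
            raise ValueError(f"unknown version {version}")
        return [dict(r) for r in self._snapshots[version]]


if __name__ == "__main__":
    table = VersionedTable()
    v1 = table.commit([{"id": 1, "status": "new"}])
    v2 = table.commit([{"id": 2, "status": "new"}])
    print("as of v1:", table.read(v1))
    print("latest:  ", table.read())
```

Because every commit adds a new snapshot instead of overwriting the old one, an auditor can ask what the table looked like at any earlier version, which is the property the article refers to as a full audit trail.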

David Meyer, Senior Vice President of Products at Databricks, said, “Databricks is thrilled to partner with Google Cloud to drive data sharing based on open standards like Delta Lake. The Data Cloud Alliance reinforces our commitment to open data sharing and the open data lakehouse paradigm, which enables data teams to collaborate more effectively.”

Delta Lake was launched by Databricks in October 2017, then made open source in early 2019. Later that year, the Linux Foundation announced that it was taking ownership of the project to drive adoption and contributions under a neutral and open governance model that could grow its community beyond existing Databricks customers. Delta Lake has since been implemented by thousands of organizations, including big names like Comcast, Viacom, Alibaba, Tencent, and more.

Another interesting point came from Mark Van de Wiel, Field CTO at Fivetran, who highlighted the “first step of analytics – data integration, especially from SaaS and database sources” as a major area of concern. The primary concern seems to be “time-to-data-driven”: the time it takes an organization to manage enough data, and derive enough meaning from it, to confidently claim that it is undergoing true digital transformation.

Time will tell whether the Data Cloud Alliance will deliver meaningful open standards, as OpenMetrics and OpenTelemetry have done for the observability industry, especially since there is a lack of truly neutral groups at this point. A pessimistic view is that the Alliance will center all of its efforts on Google Cloud itself, which would limit its impact even if it results in new standards, easier integrations, or better skills development resources.

But the Alliance seems bullish on itself – Lan Guan, Head of Accenture Cloud First Data & AI, said, “With the Data Cloud Alliance, we are joining with our ecosystem partners to focus maniacally on open standards for data exchange on the cloud continuum.”