How Estuary Helps Businesses Harness Real-Time, Historical Data Pipelines


Data may well be the most valuable resource in the world today, given the role it plays in driving all kinds of business decisions. But combining data from SaaS applications and other sources to unlock information is a major undertaking, all the more difficult when it comes to real-time, low-latency data streaming.

New York-based Estuary is working to solve this problem with a “data operations platform” that combines the advantages of “batch” and “stream” data processing pipelines.

“There is a Cambrian explosion of databases and other data tools that are extremely valuable for businesses but difficult to use,” Estuary co-founder and CEO David Yaffe told VentureBeat. “We help customers extract their data from their current systems and integrate it into these cloud-based systems without having to maintain the infrastructure, in a way that is optimized for each of them.”

To achieve this goal, Estuary announced today that it has raised $7 million in a funding round led by FirstMark Capital, with participation from a large number of angel investors, including Datadog CEO Olivier Pomel and Cockroach Labs CEO Spencer Kimball.

The state of play

Batch data processing, for the uninitiated, describes the concept of integrating data in batches at fixed intervals, and can be used for things like processing last week’s sales data to compile a departmental report. Stream data processing, on the other hand, taps into data in real time. This is especially useful if a business wants to generate insights about sales as they occur, for example, or if a customer support team needs all recent data on a customer, including their purchases and their interactions on the website.
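The contrast between the two models can be sketched in a few lines of illustrative Python. This is a toy example with invented data, not Estuary’s API: the batch function aggregates a completed interval all at once, while the streaming function updates its answer as each event arrives.

```python
from collections import defaultdict

# Toy sales events for one interval: (customer, amount)
events = [("alice", 30), ("bob", 12), ("alice", 5), ("carol", 99)]

def batch_report(batch):
    """Batch style: wait for the interval to close, then aggregate once."""
    totals = defaultdict(int)
    for customer, amount in batch:
        totals[customer] += amount
    return dict(totals)

def stream_totals(event_iter):
    """Streaming style: update running totals as each event arrives,
    so a current answer is always available."""
    totals = defaultdict(int)
    for customer, amount in event_iter:
        totals[customer] += amount
        yield dict(totals)  # a fresh snapshot after every event

# Both end at the same answer; streaming just gets there continuously.
print(batch_report(events))             # {'alice': 35, 'bob': 12, 'carol': 99}
print(list(stream_totals(events))[-1])  # {'alice': 35, 'bob': 12, 'carol': 99}
```

The end results match; the difference is latency — the batch report only exists after the interval closes, while the streaming snapshot is usable after every event.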

While there have been significant advances in batch data processing in terms of the ability to extract data from SaaS systems with minimal technical support, the same has not been true for real-time data. “Engineers who work with low-latency operational systems still have to manage and maintain a huge infrastructure load,” said Yaffe. “At Estuary, we bring the best of both worlds to data integrations: the simplicity and data retention of batch processing systems and the [low] latency of streaming.”

Above: A conceptualization of Estuary’s platform.

All of the above is already possible using existing technologies, of course. If a business wants low-latency data capture, it can use open source tools like Pulsar or Kafka to set up and manage its own infrastructure. Or it can use existing vendor tools such as HVR, which Fivetran recently acquired, though these are primarily focused on capturing real-time data from databases, with limited support for SaaS applications.

Estuary, by contrast, offers a fully managed ELT (extract, load, transform) service.

“We are creating a new paradigm,” Yaffe said. “So far, there have been no products to extract data from SaaS applications in real time — for the most part, this is a new concept. We’re essentially releasing a millisecond-latency version of Airbyte, which runs on SaaS, databases, pub/sub, and file stores.”

There has been an explosion of activity in the data integration space lately, with dbt Labs raising $150 million to help analysts transform data in the warehouse and Airbyte closing a $26 million round. Elsewhere, GitLab has created an open source data integration platform called Meltano. Estuary certainly fits in with these players, but it intends to set itself apart by focusing on both batch and streaming data processing, covering more use cases in the process.

“It’s such a different goal that we don’t see ourselves as competitive with them, but some of the same use cases could be accomplished by either system,” Yaffe said.

The story so far

Yaffe was previously the co-founder and CEO of Arbor, a data-focused marketing technology company that he sold to LiveRamp in 2016. At Arbor he created Gazette, the backbone of Estuary’s managed offering, Flow, which is currently in private beta.

Businesses can use Gazette “as a replacement for Kafka,” according to Yaffe, and it has been fully open source since 2018. Gazette creates a real-time data lake that stores data as regular files in the cloud and allows users to integrate with other tools. This can be a useful solution on its own, but using it as part of a holistic ELT toolset requires considerable engineering resources, which is where Flow comes in. Businesses use Flow to integrate all the systems they need to generate, process, and consume data, unifying the “batch versus streaming paradigms” to ensure that a company’s current and future systems are “synchronized around the same data sets.”
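The “data lake as regular files” idea can be illustrated with a minimal sketch — hypothetical Python, not Gazette’s actual (Go-based) API; the file name and helper functions are invented for illustration. Records are appended to a newline-delimited JSON journal that any downstream tool can read as a plain file.

```python
import json
import pathlib

# Hypothetical journal file standing in for a cloud-storage object.
journal = pathlib.Path("events.jsonl")
journal.unlink(missing_ok=True)  # start fresh for the demo

def append_record(record: dict) -> None:
    """Append one event as a line of JSON — the journal is append-only."""
    with journal.open("a") as f:
        f.write(json.dumps(record) + "\n")

def read_records() -> list[dict]:
    """Any consumer can re-read the journal as ordinary lines of JSON."""
    with journal.open() as f:
        return [json.loads(line) for line in f]

append_record({"user": "alice", "action": "purchase", "amount": 30})
append_record({"user": "bob", "action": "pageview"})
print(len(read_records()))  # → 2
```

Because the journal is just a file, batch consumers (re-reading the whole file) and streaming consumers (tailing new lines) can share the same storage — which is the unification of paradigms the article describes.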

Flow is source-available, which means it offers many of the freedoms associated with open source, except that its Business Source License (BSL) prevents developers from creating competing products from the source code. On top of this, Estuary sells a fully managed version of Flow.

“Gazette is a great solution to what many businesses are doing today, but it still requires talented engineering teams to build and operate applications that move and process their data — it doesn’t have the ergonomics of the tooling in the batch space,” explained Yaffe. “Flow takes the concept of streaming [that] Gazette enables and makes it as easy as Fivetran for data capture. Companies use it to achieve this kind of advantage without having to manage infrastructure or be experts in building and operating stream processing pipelines.”

Although Estuary doesn’t publish its prices, Yaffe said it charges based on the amount of input data Flow captures and processes each month. As for existing clients, Yaffe wasn’t free to disclose specific names, but he said the typical customer operates in marketing or advertising technology, and that companies also use Flow to migrate data from on-premises databases to the cloud.

