
Striim passes data “upstream” to Google BigQuery

Data travels faster. As we noted here recently, modern enterprises are increasingly using data streaming technologies designed to funnel a torrent (in a positive sense) of real-time data to and from applications, through analytics engines and database structures.

Some of this data flow will reside and be processed in well-known corporate databases from vendors that even the average non-technical layperson will have heard of. Other elements of the stream must be processed and manipulated through the newer, more powerful services offered by the leading hyperscaler Cloud Service Providers (CSPs).

Getting data from an (often legacy) database into a hyperscaler data service involves more than investing in a new cable or clicking a button.

Stream on Striim

Logically named to convey a sense of data flow from the get-go, Striim, Inc. (pronounced stream, as in river) not only works to create and build the data pipelines that get data from traditional databases into new cloud services, it also functions to filter, transform, enrich and correlate that data along its journey.

The company’s Striim for BigQuery is a cloud-based streaming service that uses Change Data Capture (CDC) technology (a database process designed to track, identify and then act on changed data in a given data set) to integrate and replicate data from enterprise databases such as Oracle, MS-SQL, PostgreSQL, MySQL and others into the Google Cloud BigQuery enterprise data warehouse.

In short, it feeds the Google BigQuery cloud data service for business intelligence.
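
To make the CDC idea concrete, here is a minimal, hypothetical sketch in Python of the pattern at work: change events captured from a source database’s transaction log are replayed, in order, against a replica. The event shape and function names here are illustrative, not Striim’s actual API.

```python
# A toy model of Change Data Capture replication: consume captured change
# events (inserts, updates, deletes) and replay them against a replica.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChangeEvent:
    table: str            # source table the change came from
    operation: str        # "INSERT", "UPDATE" or "DELETE"
    key: dict             # primary-key columns identifying the row
    row: Optional[dict]   # new column values (None for a DELETE)

def apply_change(replica: dict, event: ChangeEvent) -> None:
    """Replay one captured change against an in-memory replica table."""
    pk = tuple(sorted(event.key.items()))
    if event.operation == "DELETE":
        replica.pop(pk, None)
    else:                 # INSERT and UPDATE both upsert the row
        replica[pk] = event.row

# Usage: replicate two captured changes into a toy target table.
replica: dict = {}
apply_change(replica, ChangeEvent("orders", "INSERT", {"id": 1}, {"id": 1, "total": 99}))
apply_change(replica, ChangeEvent("orders", "UPDATE", {"id": 1}, {"id": 1, "total": 120}))
print(replica)            # the replica now reflects the updated row
```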

To explain the technology in full, Google BigQuery is a fully managed (i.e. delivered as a cloud-based Platform-as-a-Service) serverless (a virtualization technique that matches server resources more accurately to requirements at the actual point of use) data warehouse (a data management technology created by bringing together information from multiple sources) that enables scalable analysis across petabytes (1,024 terabytes apiece) of data with built-in machine learning capabilities.
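
For the more hands-on reader, querying BigQuery looks like this with Google’s published google-cloud-bigquery Python client; the project, dataset, table and column names below are placeholders.

```python
# A minimal sketch of running an analytical query against BigQuery.
# pip install google-cloud-bigquery
from google.cloud import bigquery

client = bigquery.Client()  # picks up credentials from the environment

sql = """
    SELECT customer_id, SUM(total) AS lifetime_value
    FROM `my-project.sales.orders`
    GROUP BY customer_id
    ORDER BY lifetime_value DESC
    LIMIT 10
"""
for row in client.query(sql).result():  # blocks until the job completes
    print(row.customer_id, row.lifetime_value)
```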

Organizations using this technology can now create a new data pipeline to stream transactional data from hundreds or even thousands of tables to Google BigQuery with sub-second end-to-end latency. This is the kind of intelligence needed if we are to enable real-time analytics and solve pressing operational issues.
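
To make “streaming into BigQuery” concrete, the sketch below pushes rows through the google-cloud-bigquery client’s streaming-insert API. It illustrates the general low-latency delivery idea rather than Striim’s own delivery mechanism, and the table name is a placeholder.

```python
# Streaming rows into a BigQuery table one small batch at a time.
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.sales.orders"  # placeholder target table

rows = [
    {"id": 1001, "customer_id": 7, "total": 42.50},
    {"id": 1002, "customer_id": 9, "total": 18.00},
]
errors = client.insert_rows_json(table_id, rows)  # returns [] on success
if errors:
    print("Rows failed to insert:", errors)
```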

“Enterprises are increasingly looking for solutions that can import critical data stored in databases into Google BigQuery with speed and reliability,” said Sudhir Hasbe, senior director of product management, Google Cloud.

Water-based data flow analogies

If it seems like we’ll never run out of water-based data flow analogies, we probably won’t. This is a technology area where organizations need to replicate data from multiple databases (many of which they were operating before the so-called digital transformation era) and transfer that data to cloud data warehouses, data lakes and data lakehouses.

Why would companies need to do this and move data in this direction? To enable their data science and analytics teams to optimize decision-making and business workflows. But there are, traditionally, two problems: a) legacy data warehouses are not easily scalable or powerful enough to provide real-time analytics capabilities; and b) cloud-based data ingestion platforms often require significant effort to configure.

Striim for BigQuery provides a user interface that allows users to configure and observe the current and historical status and performance of their data pipelines, reconfigure those pipelines to add or remove tables on the fly, and repair pipelines in the event of a failure.

Fresh data, come get it

Alok Pareek is executive vice president of engineering and products at Striim. He highlights the need for what he calls “fresh data” (i.e. real-time streaming data that operates at the speed of modern life and business, where ubiquitous user mobile devices and new intelligent machines create their own perpetual information channels) in order to make good business decisions.

“Our customers are increasingly using BigQuery for their data analytics needs. We designed Striim for BigQuery for operational ease, simplicity, and resiliency so users can quickly and easily extract business value from their data. We have automated schema management, snapshot functionality [a means of saving the current state of a data stream to start a new version or for backup & recovery purposes], CDC coordination [see above definition] and managing failures in data pipelines to deliver a pleasant user experience,” Pareek said.
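
The snapshot and CDC coordination Pareek mentions follows a well-known pattern: take a bulk copy of the table first, note the change-log position at the moment the copy began, then replay only the changes recorded after that mark, so nothing is missed and nothing is applied twice. A toy Python sketch (the names are illustrative, not Striim’s API):

```python
# Snapshot-then-CDC bootstrap: bulk copy, then catch up from the log mark.
class ToySource:
    def __init__(self):
        self.rows = {1: "alpha", 2: "beta"}  # current table state
        self.log = []                        # ordered change log

    def write(self, key, value):
        self.rows[key] = value
        self.log.append((key, value))

src = ToySource()
mark = len(src.log)                 # 1) note the change-log position
snapshot = dict(src.rows)           # 2) take the bulk snapshot copy
src.write(3, "gamma")               # a change lands mid-migration
for key, value in src.log[mark:]:   # 3) replay changes made after the mark
    snapshot[key] = value
print(snapshot)                     # the replica now matches the live source
```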

There is automation here too. Striim for BigQuery continuously monitors and reports pipeline status and performance. When it detects tables that can’t be synced to BigQuery, it automatically quarantines errant tables and keeps the rest of the pipeline operational, avoiding hours of pipeline downtime.
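
A hypothetical sketch of that quarantine behaviour, reduced to a toy loop in Python (deliver() and the event shapes are invented for illustration):

```python
# Isolate tables that repeatedly fail delivery; keep the rest flowing.
quarantined = set()

def deliver(table, row):
    if table == "bad_table":                  # simulate a table that can't sync
        raise RuntimeError("schema mismatch")
    print(f"delivered to {table}: {row}")

events = [("orders", {"id": 1}), ("bad_table", {"id": 2}), ("orders", {"id": 3})]

for table, row in events:
    if table in quarantined:                  # skip tables already set aside
        continue
    try:
        deliver(table, row)
    except RuntimeError as exc:
        quarantined.add(table)                # isolate the errant table...
        print(f"quarantined {table}: {exc}")  # ...and keep the pipeline running
```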

With Striim for BigQuery, Striim strives to continuously ingest, process, and deliver large volumes of real-time data from a variety of sources (on-premises or in the cloud) to support multi-cloud and hybrid cloud infrastructures. It collects real-time data from enterprise databases (using non-intrusive change data capture), log files, messaging systems, and sensors and delivers it to virtually any target, on-premises or in the cloud, with sub-second latency, enabling real-time operations and analytics.

Hyperscaler indifference?

This is all great, i.e. we can get data from Oracle and the other aforementioned databases into the hyperscaler Cloud Service Provider (CSP) clouds from Google, AWS and Microsoft better, faster, more easily and more cost-effectively. We can even do so with a greater number of additional services (cleansing, filtering and so on).

So why don’t the big cloud players offer this type of technology?

In truth, they do – remember when we said that cloud-based data ingestion platforms often require significant effort to set up? Many of these functions are possible with the hyperscalers and it’s not hard to find reams of documentation on the web from the big three clouds detailing the internal mechanics of snapshots, streaming and schema management. It’s just more expensive, usually not as dedicated a service (they have the biggest clouds on the planet to run, after all) and generally without all the types of add-ons discussed here.

The water-based data stream analogies will continue – coming next: the data jet, probably.