Dataflow Accelerator

Organisations that need ETL at scale for batch and streaming data are moving from Hadoop to Dataflow for lower cost and easier management. This accelerator brings the expertise and tooling to make your Dataflow migration a success.

20% Quicker to production*

30% Cost Savings*

Typical Challenges

  • Translating existing Hadoop/Spark pipelines to Apache Beam (Dataflow) is complex and time-consuming
  • Infrastructure management can become a problem if it is not handled properly during the transition period, when the old and new systems run in parallel
  • Orchestrating the new Dataflow-powered pipelines can be challenging when migrating away from Hadoop's legacy orchestrators (Oozie, etc.)
  • Identifying and masking PII is difficult
  • Data reconciliation tests need to be rewritten for Dataflow
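
To make the reconciliation challenge concrete, here is a minimal sketch of one common approach: compare the legacy and migrated outputs by row count plus an order-independent checksum. It is plain Python rather than Beam code, and the names `fingerprint` and `reconcile` and the sample rows are hypothetical, chosen for illustration only.

```python
import hashlib
import json
from typing import Iterable, Mapping, NamedTuple


class Fingerprint(NamedTuple):
    row_count: int
    checksum: int


def fingerprint(rows: Iterable[Mapping[str, object]]) -> Fingerprint:
    """Summarise a dataset as a row count plus an order-independent checksum."""
    count = 0
    checksum = 0
    for row in rows:
        # Serialise with sorted keys so column order does not affect the hash.
        encoded = json.dumps(row, sort_keys=True, default=str).encode("utf-8")
        digest = hashlib.sha256(encoded).digest()
        # Adding hashes modulo 2**64 is commutative, so row order is ignored,
        # while a duplicated or altered row still changes the total.
        checksum = (checksum + int.from_bytes(digest[:8], "big")) % (1 << 64)
        count += 1
    return Fingerprint(count, checksum)


def reconcile(legacy_rows, migrated_rows) -> bool:
    """Return True when both outputs contain the same rows, in any order."""
    return fingerprint(legacy_rows) == fingerprint(migrated_rows)


# Hypothetical sample data: same rows, different row and column order.
legacy = [{"id": 1, "amount": 10.5}, {"id": 2, "amount": 3.0}]
migrated = [{"amount": 3.0, "id": 2}, {"id": 1, "amount": 10.5}]
print(reconcile(legacy, migrated))
```

Because both the count and the sum are associative and commutative, the same idea should carry over to a distributed combine step in the new pipeline, although the exact wiring depends on the pipeline in question.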

What we do

  • Rapidly translate and engineer your Hadoop-powered pipelines into Dataflow ETL processes, using templates, patterns and bespoke encoders.
  • Automate the infrastructure management with Terraform scripts.
  • Implement auto-scaling to cover your needs.
  • Implement data pipeline management using orchestration tools such as Cloud Composer.
  • Provision Stackdriver so that each stage of the pipeline has monitoring and automated testing.
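
As an illustration of the auto-scaling step, the sketch below assembles the command-line options that a Beam Python pipeline typically passes to the Dataflow runner to enable throughput-based auto-scaling with a worker cap. The helper name `dataflow_args` and all project, region and bucket values are assumptions made for this example, not details of any particular engagement.

```python
from typing import List


def dataflow_args(project: str, region: str, temp_location: str,
                  max_workers: int) -> List[str]:
    """Build Dataflow runner flags with auto-scaling capped at max_workers."""
    if max_workers < 1:
        raise ValueError("max_workers must be at least 1")
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        # Let the service add or remove workers based on throughput.
        "--autoscaling_algorithm=THROUGHPUT_BASED",
        # Upper bound on workers, which also bounds the cost of a run.
        f"--max_num_workers={max_workers}",
    ]


# Hypothetical values for illustration only.
args = dataflow_args("my-project", "europe-west2", "gs://my-bucket/tmp", 20)
print(" ".join(args))
```

In a real pipeline a list like this would usually be handed to Beam's `PipelineOptions`; the right worker cap depends on the workload and budget.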

What you get

  • Faster migration of your pipelines
  • Cost reductions (typically ~30% compared with a standard Hadoop setup) from auto-scaling and from unifying batch and stream processing
  • A significant productivity increase, as engineers can quickly provision their own lab environments without needing involvement or oversight from central IT
  • Comprehensive monitoring and testing

* Example savings seen after implementation

Augmented intelligence, expertly engineered

For any inquiries please email

info@datasparq.ai

Orion House, 5 Upper St Martin’s Lane
London WC2H 9EA