Evolutionary Data Infrastructure

Kafka Streams, Apache Flink, Google BigQuery, Apache Airflow
  1. Apache Kafka: A Distributed Streaming Platform.
    Therefore, I still recommend using a streaming framework such as Apache Flink or Apache Kafka Streams.

    #Stream Processing #Analytics #Workflow Automation 14 social mentions

  2. Google BigQuery: A fully managed data warehouse for large-scale data analytics.
    Pricing:
    • Open Source
    In addition, batch tasks require knowledge of each service's data schema in order to extract the data correctly and save it to the corresponding warehouse table. Assuming our data warehouse is BigQuery on GCP, the schema of each warehouse table also needs to be created and modified manually.

    #Data Management #Data Warehousing #Data Dashboard 35 social mentions

  3. Apache Airflow: A platform to programmatically author, schedule, and monitor data pipelines.
    Pricing:
    • Open Source
    For batch processing, I recommend using Apache Airflow, which is easy to manage and makes it easy to script DAGs that cover multiple batch processing scenarios.

    #Workflows #Workflow Automation #Data Pipelines 65 social mentions
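
The BigQuery excerpt above notes that each warehouse table's schema has to be created and modified by hand. As a rough illustration of what that manual step involves, the sketch below renders a BigQuery-style CREATE TABLE statement from a service's field list. The table name `analytics.orders`, the `ORDERS_SCHEMA` fields, and the `create_table_ddl` helper are all hypothetical and do not come from the article; the type mapping covers only a handful of common types.

```python
from datetime import datetime

# Partial mapping from Python types to BigQuery (GoogleSQL) column types.
BQ_TYPES = {
    str: "STRING",
    int: "INT64",
    float: "FLOAT64",
    bool: "BOOL",
    datetime: "TIMESTAMP",
}


def create_table_ddl(table: str, schema: dict[str, type]) -> str:
    """Render a CREATE TABLE statement for a (hypothetical) service schema."""
    columns = ",\n".join(
        f"  {name} {BQ_TYPES[py_type]}" for name, py_type in schema.items()
    )
    return f"CREATE TABLE IF NOT EXISTS `{table}` (\n{columns}\n)"


# Hypothetical schema for an "orders" service; the field names are assumptions.
ORDERS_SCHEMA = {
    "order_id": str,
    "amount": float,
    "paid": bool,
    "created_at": datetime,
}

if __name__ == "__main__":
    print(create_table_ddl("analytics.orders", ORDERS_SCHEMA))
```

Every time a service adds or renames a field, both the extraction task and a statement like this one need to be updated, which is the maintenance cost the excerpt describes.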
