How to Build a Streaming Deduplication Pipeline with Kafka, GlassFlow, and ClickHouse

  1. Apache Kafka is an open-source message broker project developed by the Apache Software Foundation and written in Scala and Java.
    Pricing:
    • Open Source
    Kafka: Our trusty message bus. Events land here first.

    #Stream Processing #Data Integration #ETL 144 social mentions

  2. ClickHouse is an open-source column-oriented database management system that allows generating analytical data reports in real time.
    Pricing:
    • Open Source
    ClickHouse: A fast columnar database. It will be our final destination for clean data. And, for simplicity in this tutorial, we'll cleverly use it as our "memory" or state store to remember which events we've already seen recently.

    #Databases #Relational Databases #Data Warehousing 57 social mentions
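
The "memory" idea described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the tutorial's actual code: the `RecentlySeenStore` class, the `deduplicate` function, and the `event_id` field are hypothetical names, and a plain in-memory dictionary stands in for the ClickHouse lookup so the example runs without Kafka, GlassFlow, or ClickHouse.

```python
import time
from typing import Callable, Dict, Iterable, Iterator


class RecentlySeenStore:
    """In-memory stand-in for the "recently seen" state store.

    Assumption: in the real pipeline this lookup would be a query
    against ClickHouse; a dict keeps the sketch self-contained.
    """

    def __init__(self, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window_seconds = window_seconds
        self.clock = clock
        self._seen: Dict[str, float] = {}  # event ID -> time first seen

    def check_and_mark(self, event_id: str) -> bool:
        """Return True if event_id is new within the window, and remember it."""
        now = self.clock()
        # Forget IDs that have aged out of the deduplication window.
        expired = [key for key, seen_at in self._seen.items()
                   if now - seen_at > self.window_seconds]
        for key in expired:
            del self._seen[key]
        if event_id in self._seen:
            return False
        self._seen[event_id] = now
        return True


def deduplicate(events: Iterable[dict], store: RecentlySeenStore,
                id_field: str = "event_id") -> Iterator[dict]:
    """Yield only events whose ID has not been seen recently."""
    for event in events:
        if store.check_and_mark(event[id_field]):
            yield event


if __name__ == "__main__":
    store = RecentlySeenStore(window_seconds=3600)
    incoming = [{"event_id": "a"}, {"event_id": "b"}, {"event_id": "a"}]
    for clean_event in deduplicate(incoming, store):
        print(clean_event)
```

The time window bounds how much state has to be remembered: an ID that reappears after the window has passed is treated as new, which is the trade-off implied by "seen recently".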
