
Hydrating a Data Lake using Query-based CDC with Apache Kafka Connect and Kubernetes on AWS

Apache Parquet, Amazon S3, Amazon RDS for PostgreSQL
  1. Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem.
    Pricing:
    • Open Source
    This post describes how to use Kafka Connect to move data out of an Amazon RDS for PostgreSQL relational database and into Kafka. It continues by moving the data out of Kafka into a data lake built on Amazon Simple Storage Service (Amazon S3). The data imported into S3 will be converted to Apache Parquet columnar storage file format, compressed, and partitioned for optimal analytics performance by Kafka Connect.

    #Databases #Big Data #Relational Databases 19 social mentions

  2. Amazon S3 is an object storage service where users can store their business data on a safe, cloud-based platform. Amazon S3 operates in 54 availability zones within 18 geographic regions and 1 local region.

    #Cloud Hosting #Object Storage #Cloud Storage 171 social mentions

  3. Amazon RDS for PostgreSQL: PostgreSQL as a Service

    #Databases #Relational Databases #Cloud Hosting 14 social mentions
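
The pipeline described in the post (PostgreSQL to Kafka, then Kafka to Parquet files on S3) could be sketched as a pair of Kafka Connect connector definitions. The excerpt does not name the connectors, so this sketch assumes the Confluent JDBC source connector and the Confluent S3 sink connector; the host, database, table, bucket, and column names are hypothetical placeholders, and credentials and converter settings are left out. The JSON it prints has the shape expected by the Kafka Connect REST API (`POST /connectors`).

```python
import json


def jdbc_source_config(host, database, table, timestamp_column):
    """Query-based CDC source: poll `table` for rows whose timestamp column advanced.

    Assumes the Confluent JDBC source connector; credentials are omitted.
    """
    return {
        "name": f"source-{table}",
        "config": {
            "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
            "connection.url": f"jdbc:postgresql://{host}:5432/{database}",
            # "timestamp" mode re-queries the table and picks up new or updated rows
            "mode": "timestamp",
            "timestamp.column.name": timestamp_column,
            "table.whitelist": table,
            # Resulting topic name is topic.prefix + table name
            "topic.prefix": f"{database}.",
            "poll.interval.ms": "5000",
            "tasks.max": "1",
        },
    }


def s3_sink_config(topic, bucket, region):
    """S3 sink: write records as compressed, time-partitioned Parquet files.

    Assumes the Confluent S3 sink connector; Parquet output also needs a
    schema-aware converter (not shown here).
    """
    return {
        "name": f"sink-{topic}",
        "config": {
            "connector.class": "io.confluent.connect.s3.S3SinkConnector",
            "topics": topic,
            "s3.bucket.name": bucket,
            "s3.region": region,
            "storage.class": "io.confluent.connect.s3.storage.S3Storage",
            "format.class": "io.confluent.connect.s3.format.parquet.ParquetFormat",
            "parquet.codec": "gzip",
            # Partition objects by record time so queries can prune by date
            "partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
            "path.format": "'year'=YYYY/'month'=MM/'day'=dd",
            "partition.duration.ms": "86400000",
            "locale": "en-US",
            "timezone": "UTC",
            "timestamp.extractor": "Record",
            "flush.size": "1000",
            "tasks.max": "1",
        },
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only
    source = jdbc_source_config("db.example.internal", "shop", "orders", "updated_at")
    sink = s3_sink_config("shop.orders", "example-data-lake", "us-east-1")
    print(json.dumps(source, indent=2))
    print(json.dumps(sink, indent=2))
```

Query-based CDC of this kind only sees rows that still exist and whose timestamp column changed, so deletes are not captured; that trade-off is why the polling mode and timestamp column are the key settings on the source side.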
