Software Alternatives & Reviews

Help me figure out ETL and storage for user search and click logs. So lost in all the DB alternatives.

  1. Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem.
    Pricing:
    • Open Source
    Write a Python script to convert the JSON data you receive to Parquet, then put the Parquet files in the S3 bucket. Parquet is an open source data serialization format. Almost all the major tooling (including pandas) supports it, so even if you move to another solution (Snowflake, etc.) your data will already be set up. It also compresses the data and makes aggregate queries like counts and sums faster.

    #Databases #Big Data #Relational Databases 19 social mentions

  2. Minio is an open-source minimal cloud storage server.
    Buy a server for your office and get a big SSD (multiple terabytes). Run Minio on it. Minio is an open source, self-hosted object store that exposes an S3-compatible API.

    #Cloud Storage #Cloud Computing #Object Storage 154 social mentions

  3. Amazon S3 is an object storage service where users can store data from their business on a safe, cloud-based platform. Amazon S3 operates in 54 availability zones within 18 geographic regions and 1 local region.
    Set up an [AWS S3 Bucket](https://aws.amazon.com/s3/) to store the data. Storing it will likely cost less than $100 a year. The bucket can be set up so that only you and your coworker have access. It can even be restricted to requests coming from your office network (for example, by IP address) if you need that kind of security.

    #Cloud Hosting #Object Storage #Cloud Storage 170 social mentions
