Software Alternatives & Reviews

Beginner question about transformation

Apache Spark, Apache Parquet
  1. Apache Spark is an engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing.
    Pricing:
    • Open Source
    You should also consider how the data is distributed. In a company with machine learning workflows, the same data may need to pass through different workflows that use different technologies, and it may be stored somewhere other than a data warehouse. For example, feature engineering might be done in Spark, with the results stored in a binary format such as Parquet in a data lake or object store.

    #Databases #Big Data #Big Data Analytics 56 social mentions

  2. Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem.
    Pricing:
    • Open Source

    #Databases #Big Data #Relational Databases 19 social mentions
