Beginner question about transformation

This page summarizes and extends the software alternatives mentioned in the source post on Reddit.

2023-03-02

Apache Spark

Apache Spark is an engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing.
Pricing:
- Open Source
You should also consider how the data is distributed. In a company with machine learning workflows, the same data may need to pass through different workflows that use different technologies, and it may be stored somewhere other than a data warehouse. For example, feature engineering might run in Spark, with the results stored in a binary format such as Parquet in a data lake or object store.

#Databases #Big Data #Big Data Analytics 56 social mentions
Apache Parquet

Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem.
Pricing:
- Open Source

#Databases #Big Data #Relational Databases 19 social mentions
