Software Alternatives & Reviews

Perform computation over 500 million vectors

  1. Apache Spark is an engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing.
    Pricing:
    • Open Source
    I would guess that Apache Spark would be an okay choice, with the data stored locally in Avro or Parquet files. Just processing the data in Python would also work, IMO.

    #Databases #Big Data #Big Data Analytics 56 social mentions

  2. Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem.
    Pricing:
    • Open Source
    I would guess that Apache Spark would be an okay choice, with the data stored locally in Avro or Parquet files. Just processing the data in Python would also work, IMO.

    #Databases #Big Data #Relational Databases 19 social mentions
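For the "just process it in Python" route mentioned above, a single streaming pass with O(dim) memory is often enough for simple aggregates, even at the 500-million-vector scale. A minimal sketch, using synthetic data as a stand-in for an Avro/Parquet read (`stream_vectors` and `mean_vector` are hypothetical names, not from any library):

```python
import random
from typing import Iterable, Iterator, List

def stream_vectors(n: int, dim: int, seed: int = 0) -> Iterator[List[float]]:
    # Stand-in for iterating over vectors read from Avro/Parquet files;
    # yields synthetic uniform[0, 1) vectors for illustration.
    rng = random.Random(seed)
    for _ in range(n):
        yield [rng.random() for _ in range(dim)]

def mean_vector(vectors: Iterable[List[float]], dim: int) -> List[float]:
    # One pass over the stream, keeping only running component sums
    # and a count, so memory stays constant regardless of n.
    sums = [0.0] * dim
    count = 0
    for v in vectors:
        for i, x in enumerate(v):
            sums[i] += x
        count += 1
    return [s / count for s in sums]

# Small demo run; for 500M vectors the same loop applies, just slower.
print(mean_vector(stream_vectors(10_000, 3), 3))
```

Spark buys you parallelism and fault tolerance on top of this pattern; plain Python trades those away for simplicity when one pass on one machine is fast enough.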
