Apache Spark
Apache Spark is an engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing.
Apache Spark Alternatives
The best Apache Spark alternatives based on verified products, community votes, reviews and other factors.
-
Flink is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations.
-
Airflow is a platform to programmatically author, schedule and monitor data pipelines.
-
Open-source software for reliable, scalable, distributed computing
-
Apache Kafka is an open-source message broker project developed by the Apache Software Foundation and written in Scala and Java.
-
Databricks provides a Unified Analytics Platform that accelerates innovation by unifying data science, engineering and business.
-
Apache Storm is a free and open source distributed realtime computation system.
-
Apache Hive data warehouse software facilitates querying and managing large datasets residing in distributed storage.
-
Google Cloud Dataflow is a fully-managed cloud service and programming model for batch and streaming big data processing.
-
Amazon Kinesis services make it easy to work with real-time streaming data in the AWS cloud.
-
Amazon Elastic MapReduce is a web service that makes it easy to quickly process vast amounts of data.
-
Fast column-oriented distributed data store
-
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.
-
Distributed SQL Query Engine for Big Data (by Facebook)
Categories: Databases, Big Data, Big Data Analytics, Big Data Infrastructure