Spark for beginners - and you


1

Apache Storm

Apache Storm is a free and open source distributed realtime computation system.
Pricing:
- Open Source
Streaming: Spark Streaming's latency is at least 500 ms because it operates on micro-batches of records instead of processing one record at a time. Native streaming tools such as Storm, Apex, or Flink might be better suited to low-latency applications.
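
To make the micro-batch point concrete, here is a minimal sketch in plain Python (not Spark code). It assumes a simplified model in which a record is only handled once its batch window closes, and it ignores scheduling and processing time, so real latencies would be higher. The function names and the 0.5 s interval are illustrative assumptions.

```python
import math


def micro_batch_latencies(arrival_times, batch_interval):
    """Time each record waits until its batch window closes.

    Simplified model: processing and scheduling time are ignored.
    """
    latencies = []
    for t in arrival_times:
        # The record is picked up at the end of the window it arrived in.
        batch_end = (math.floor(t / batch_interval) + 1) * batch_interval
        latencies.append(batch_end - t)
    return latencies


def per_record_latencies(arrival_times):
    """Record-at-a-time model: each record is handled on arrival."""
    return [0.0 for _ in arrival_times]


# Hypothetical arrival times in seconds, with a 0.5 s batch interval.
arrivals = [0.0, 0.25, 0.5, 1.25]
batched = micro_batch_latencies(arrivals, batch_interval=0.5)
streamed = per_record_latencies(arrivals)

print(batched)   # waiting time added by batching, per record
print(streamed)  # no batching delay in the record-at-a-time model
```

In this model every record waits up to one full batch interval before any work starts, which is the delay a record-at-a-time engine avoids.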

#Big Data #Data Dashboard #Data Warehousing 11 social mentions

2

Spark Streaming

Spark Streaming makes it easy to build scalable and fault-tolerant streaming applications.

Spark is a big data framework and currently one of the most popular tools for big data analytics. It contains libraries for data analysis, machine learning, graph analysis, and streaming live data. In general, Spark is faster than Hadoop because it does not write intermediate results to disk. It is not a data storage system: we can run Spark on top of HDFS or read data from other sources such as Amazon S3. It is designed for data analytics, machine learning, streaming, and graph analytics.

#Big Data #Stream Processing #Data Management 5 social mentions
3

Apache Spark

Apache Spark is an engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing.
Pricing:
- Open Source
#Databases #Big Data #Big Data Analytics 70 social mentions
4

Apache Mesos

Apache Mesos abstracts resources away from machines, enabling fault-tolerant and elastic distributed systems to easily be built and run effectively.
Pricing:
- Open Source
Cluster Modes: We can run Spark on a cluster in Standalone mode or through a cluster manager, either YARN or Mesos.
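
In practice, the cluster mode is usually selected through the master URL passed to Spark. The helper below is a hypothetical sketch, not part of Spark: it only builds the URL string for each mode. The host names are placeholders, and the ports shown (7077 for a standalone master, 5050 for Mesos) are the commonly used defaults and may differ in a given deployment.

```python
def master_url(mode, host=None, port=None):
    """Build a Spark master URL string for a given cluster mode.

    Hypothetical helper for illustration; Spark itself just takes the string.
    """
    if mode == "local":
        # Run on the local machine using all available cores.
        return "local[*]"
    if mode == "yarn":
        # The cluster location is read from the Hadoop configuration.
        return "yarn"
    if mode in ("standalone", "mesos"):
        if host is None:
            raise ValueError(f"{mode} mode needs a master host")
        scheme, default_port = {
            "standalone": ("spark", 7077),
            "mesos": ("mesos", 5050),
        }[mode]
        return f"{scheme}://{host}:{port or default_port}"
    raise ValueError(f"unknown cluster mode: {mode}")


# Placeholder host names, for illustration only.
print(master_url("standalone", "master.example.com"))
print(master_url("yarn"))
print(master_url("mesos", "mesos.example.com"))
```

The resulting string would typically be given to `spark-submit --master` or set as `spark.master` in the application's configuration.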

#Developer Tools #Containers As A Service #DevOps Tools 11 social mentions
5

Hadoop

Open-source software for reliable, scalable, distributed computing
Pricing:
- Open Source
Hadoop is an ecosystem of tools for big data storage and data analysis. It is older than Spark and writes intermediate results to disk, whereas Spark tries to keep data in memory whenever possible, which makes Spark faster in many use cases.

#Databases #Relational Databases #Big Data 25 social mentions
6

Apache Flink

Flink is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations.
Pricing:
- Open Source
Streaming: Spark Streaming's latency is at least 500 ms because it operates on micro-batches of records instead of processing one record at a time. Native streaming tools such as Storm, Apex, or Flink might be better suited to low-latency applications.

#Stream Processing #Developer Tools #Web Frameworks 41 social mentions

7

Apache Apex

Apache Apex is an enterprise-grade unified stream and batch processing engine.

Streaming: Spark Streaming's latency is at least 500 ms because it operates on micro-batches of records instead of processing one record at a time. Native streaming tools such as Storm, Apex, or Flink might be better suited to low-latency applications.

#Big Data #Data Dashboard #Data Warehousing 1 social mention