Machine Learning Pipelines with Spark: Introductory Guide (Part 1)

1

Spark Streaming

Spark Streaming makes it easy to build scalable and fault-tolerant streaming applications.

Spark Streaming: The component for real-time data processing and analytics.

#Stream Processing #Big Data #Data Management 3 social mentions
2

Apache Spark

Apache Spark is an engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing.
Pricing:
- Open Source
Apache Spark is a fast and general open-source engine for large-scale, distributed data processing. Its flexible in-memory framework allows it to handle batch and real-time analytics alongside distributed data processing.

#Databases #Big Data #Big Data Analytics 56 social mentions
3

Scikit-learn

scikit-learn (formerly scikits.learn) is an open source machine learning library for the Python programming language.
Pricing:
- Open Source
The concepts of Spark’s machine learning pipelines are similar to those of the Scikit-learn project. They follow Spark’s “ease of use” characteristic, giving you one more reason for adoption. You will learn more about these main concepts in this guide.

#Data Science And Machine Learning #Data Science Tools #Python Tools 28 social mentions
4

Pandas

Pandas is an open source library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
Pricing:
- Open Source
DataFrames are a Pandas-like, intuitive, high-level API for working with data in Spark. A DataFrame organizes data in a structured, tabular format of rows and columns, similar to a spreadsheet or a relational database table. If you have worked with Pandas before, DataFrames should feel familiar.
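
To make the rows-and-columns idea concrete, here is a minimal sketch using Pandas itself rather than Spark. The sample data and the variable names (`people`, `over_30`) are invented for this illustration and do not come from the guide. The Spark DataFrame API is similar in spirit but not identical.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
people = pd.DataFrame(
    {
        "name": ["Ada", "Grace", "Linus"],
        "age": [36, 45, 29],
    }
)

# Each column has a name and a type, as in a relational table.
print(people.dtypes)

# Rows can be filtered with a boolean condition on a column.
over_30 = people[people["age"] > 30]
print(over_30)
```

In PySpark, and assuming an active `SparkSession` bound to the name `spark`, a Pandas DataFrame like this one can be handed to `spark.createDataFrame(people)` to obtain a Spark DataFrame.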

#Data Science And Machine Learning #Data Science Tools #Python Tools 199 social mentions
5

Apache Mesos

Apache Mesos abstracts resources away from machines, enabling fault-tolerant and elastic distributed systems to easily be built and run effectively.
Pricing:
- Open Source
Spark runs locally, on standalone clusters, and on Hadoop YARN, Apache Mesos, Kubernetes, and managed Hadoop platforms.
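
In practice, the deployment target is usually selected through the master URL passed to Spark. The sketch below builds that string for each target named above. The helper `master_url` and the example host names are hypothetical and exist only for this illustration; the URL schemes themselves (`local[*]`, `spark://`, `yarn`, `mesos://`, `k8s://`) are the ones Spark accepts.

```python
def master_url(target, host=None, port=None):
    """Return a Spark master URL string for a deployment target.

    Hypothetical helper for illustration; it is not part of Spark.
    """
    # Local mode and YARN need no host: YARN reads the cluster
    # location from the Hadoop configuration.
    if target == "local":
        return "local[*]"  # run locally, one worker thread per core
    if target == "yarn":
        return "yarn"

    schemes = {
        "standalone": "spark://",
        "mesos": "mesos://",
        "kubernetes": "k8s://https://",
    }
    if target not in schemes:
        raise ValueError(f"unknown deployment target: {target!r}")
    if host is None or port is None:
        raise ValueError(f"{target} requires both host and port")
    return f"{schemes[target]}{host}:{port}"


# Example host names below are placeholders, not real clusters.
print(master_url("local"))
print(master_url("standalone", "cluster.example.com", 7077))
print(master_url("kubernetes", "api.example.com", 6443))
```

The resulting string is what would typically be passed to `spark-submit --master` or to `SparkSession.builder.master(...)`.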

#Developer Tools #Containers As A Service #DevOps Tools 7 social mentions
6

Kubernetes

Kubernetes is an open source orchestration system for Docker containers.
Pricing:
- Open Source
Spark runs locally, on standalone clusters, and on Hadoop YARN, Apache Mesos, Kubernetes, and managed Hadoop platforms.

#Developer Tools #DevOps Tools #Containers As A Service 287 social mentions