Machine Learning Pipelines with Spark: Introductory Guide (Part 1)

Spark Streaming, Apache Spark, Scikit-learn, Pandas, Apache Mesos, Kubernetes
  1. Spark Streaming makes it easy to build scalable and fault-tolerant streaming applications.
    Spark Streaming: The component for real-time data processing and analytics.

    #Stream Processing #Big Data #Data Management 3 social mentions

  2. Apache Spark is an engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing.
    Pricing:
    • Open Source
    Apache Spark is a fast and general open-source engine for large-scale, distributed data processing. Its flexible in-memory framework allows it to handle batch and real-time analytics alongside distributed data processing.

    #Databases #Big Data #Big Data Analytics 56 social mentions

  3. scikit-learn (formerly scikits.learn) is an open source machine learning library for the Python programming language.
    Pricing:
    • Open Source
    The concepts are similar to those of the Scikit-learn project. They follow Spark’s “ease of use” characteristic, giving you one more reason for adoption. You will learn more about these main concepts in this guide.

    #Data Science And Machine Learning #Data Science Tools #Python Tools 28 social mentions

  4. Pandas is an open source library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
    Pricing:
    • Open Source
    DataFrames are a Pandas-like, intuitive high-level API for working with data in Spark. A DataFrame organizes data in a structured, tabular format of rows and columns, similar to a spreadsheet or a table in a relational database. If you have worked with Pandas before, you should be familiar with DataFrames.

    #Data Science And Machine Learning #Data Science Tools #Python Tools 199 social mentions

  5. Apache Mesos abstracts resources away from machines, enabling fault-tolerant and elastic distributed systems to easily be built and run effectively.
    Pricing:
    • Open Source
    Spark runs locally, on standalone clusters, and on Hadoop YARN, Apache Mesos, Kubernetes, and managed Hadoop platforms.

    #Developer Tools #Containers As A Service #DevOps Tools 7 social mentions

  6. Kubernetes is an open source orchestration system for Docker containers.
    Pricing:
    • Open Source
    Spark runs locally, on standalone clusters, and on Hadoop YARN, Apache Mesos, Kubernetes, and managed Hadoop platforms.

    #Developer Tools #DevOps Tools #Containers As A Service 287 social mentions
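
The deployment targets named in the Apache Mesos and Kubernetes entries above correspond to different `--master` values passed to `spark-submit`. The following is a minimal sketch of that mapping, not part of the guide itself: the host names, ports, the application file `pipeline.py`, and the helper `spark_submit_command` are all hypothetical placeholders chosen for illustration. Only the URL schemes (`local[*]`, `spark://`, `yarn`, `mesos://`, `k8s://`) come from Spark.

```python
from typing import List

# Example master URLs per deployment target.
# Host names and ports are placeholders (assumptions), not real endpoints.
MASTER_URL_EXAMPLES = {
    "local": "local[*]",                              # in-process, all local cores
    "standalone": "spark://master-host:7077",         # Spark standalone cluster
    "yarn": "yarn",                                   # Hadoop YARN (located via Hadoop config)
    "mesos": "mesos://mesos-host:5050",               # Apache Mesos
    "kubernetes": "k8s://https://k8s-api-host:6443",  # Kubernetes API server
}


def spark_submit_command(target: str, app_path: str) -> List[str]:
    """Build a spark-submit argument list for a hypothetical application."""
    master = MASTER_URL_EXAMPLES[target]
    return ["spark-submit", "--master", master, app_path]


if __name__ == "__main__":
    # Print the command line for a local run of the placeholder application.
    print(" ".join(spark_submit_command("local", "pipeline.py")))
```

Changing the target key selects a different cluster manager while the application file stays the same. A real cluster would substitute its own master address.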
