Software Alternatives & Reviews

Python & ETL 2020: A List and Comparison of the Top Python ETL Tools

Apache Airflow, Luigi, Pandas, Bubbles, Blaze, Dask
  1. Airflow is a platform to programmatically author, schedule and monitor data pipelines.
    Pricing:
    • Open Source
    When does Apache Airflow make sense? If you're performing long ETL jobs or your ETL has multiple steps, Airflow will let you restart from any point during the ETL process. That said, Apache Airflow is not a library, so it has to be deployed, and it may make less sense for small ETL jobs.
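
The restart-from-any-point idea can be illustrated without Airflow itself. The sketch below is plain Python, not Airflow's API: a hypothetical `run_pipeline` helper records each finished step in a JSON checkpoint file, so a rerun skips the steps that already succeeded. The helper, the step names, and the checkpoint format are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def run_pipeline(steps, state_path):
    """Run (name, callable) steps in order, skipping steps a previous run finished."""
    state_path = Path(state_path)
    done = json.loads(state_path.read_text()) if state_path.exists() else []
    for name, func in steps:
        if name in done:
            continue  # finished in an earlier run, so do not repeat it
        func()  # an exception here leaves the checkpoint file untouched
        done.append(name)
        state_path.write_text(json.dumps(done))  # checkpoint after each success
    return done


# Hypothetical three-step job whose middle step fails on the first attempt.
calls = []
attempts = {"transform": 0}


def extract():
    calls.append("extract")


def transform():
    attempts["transform"] += 1
    if attempts["transform"] == 1:
        raise RuntimeError("simulated transient failure")
    calls.append("transform")


def load():
    calls.append("load")


steps = [("extract", extract), ("transform", transform), ("load", load)]
state_file = Path(tempfile.mkdtemp()) / "pipeline_state.json"

try:
    run_pipeline(steps, state_file)
except RuntimeError:
    pass  # first run stops at the failing step

completed = run_pipeline(steps, state_file)  # second run resumes at "transform"
```

Airflow keeps task state in its own metadata database rather than a local file; the checkpoint file here only mimics that behavior at small scale.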

    #Workflows #Workflow Automation #Data Pipelines 65 social mentions

  2. Luigi is a Python module that helps you build complex pipelines of batch jobs.
    When does Luigi make sense? If you need to automate simple ETL processes (like logs), Luigi can handle them rapidly and without much setup. For complex tasks, Luigi is limited by its strict pipeline-like structure.

    #DevOps Tools #Workflow Automation #Workflows 9 social mentions

  3. Pandas is an open source library providing high-performance, easy-to-use data structures and data analysis tools for Python.
    Pricing:
    • Open Source
    When it comes to ETL, you can do almost anything with Pandas if you're willing to put in the time. Plus, Pandas is extraordinarily easy to run. You can set up a simple script to load data from a PostgreSQL table, transform and clean that data, and then write it to another PostgreSQL table.
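
A minimal sketch of the load, transform, write flow described above. To keep it runnable without a database server, an in-memory SQLite database stands in for the PostgreSQL database; the table names (`raw_orders`, `clean_orders`), the columns, and the cleaning rules are hypothetical and chosen only for illustration.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for the source and target database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, " alice ", 10.0), (2, "BOB", None), (3, "carol", 7.5)],
)
conn.commit()

# Extract: read the source table into a DataFrame.
df = pd.read_sql_query("SELECT * FROM raw_orders", conn)

# Transform: drop rows with a missing amount and normalize customer names.
clean = df.dropna(subset=["amount"]).copy()
clean["customer"] = clean["customer"].str.strip().str.lower()

# Load: write the cleaned rows to a second table.
clean.to_sql("clean_orders", conn, index=False, if_exists="replace")

rows = conn.execute(
    "SELECT id, customer, amount FROM clean_orders ORDER BY id"
).fetchall()
```

Against a real PostgreSQL server the same three steps would apply, with the SQLite connection swapped for a connection to that server.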

    #Data Science And Machine Learning #Data Science Tools #Python Tools 196 social mentions

  4. petl
    This product hasn't been added to SaaSHub yet

  5. Bonobo
    This product hasn't been added to SaaSHub yet

  6. Drop comment bubbles anywhere on the screen, on any website.
    Bubbles is another Python framework that you can use to run ETL. Unlike some other ETL frameworks, Bubbles describes pipelines with metadata rather than scripts. While Bubbles is written for Python, the author claims that it's not meant to be Python-exclusive. A driving theme of Bubbles is that it's technology-agnostic, so you don't have to worry about working with or accessing data, just the transformation.

    #Chrome Extensions #Screen Recording #Video Maker 1 social mention

  7. Beautiful Soup: a library designed for screen-scraping HTML and XML.
    BeautifulSoup: This Python tool pulls data out of web pages (XML, HTML). It integrates with many ETL tools, such as petl.

    #Web Scraping #Data Extraction #Data

  8. PyQuery
    This product hasn't been added to SaaSHub yet

  9. Blaze is an application launcher that distinguishes itself from the others by being able to automate recurrent tasks performed in the file system or in any application on Microsoft Windows.
    Blaze: This is an interface for querying data. It is part of the "Blaze Ecosystem," a framework for ETL that uses Blaze, Dask, Datashape, DyND, and Odo.

    #Apple Watch #Design Tools #Sales

  10. Dask natively scales Python. Dask provides advanced parallelism for analytics, enabling performance at scale for the tools you love.
    Pricing:
    • Open Source
    Dask: You can use Dask for parallel computing via task scheduling. It can also process continuous data streams. Again, it is part of the "Blaze Ecosystem."

    #Workflows #Databases #Software Development 16 social mentions

  11. Datashape
    This product hasn't been added to SaaSHub yet

  12. DyND
    This product hasn't been added to SaaSHub yet

  13. NOTE: Odo.io has been discontinued.
    Easy and secure network access management, without VPNs
    Odo: This lets you move data between multiple containers. Odo lets you use the native CSV loading capabilities of SQL databases, which is faster than trying to load with Python.

    #Productivity #AI #Writing Tools

  14. Joblib
    This product hasn't been added to SaaSHub yet

  15. lxml
    This product hasn't been added to SaaSHub yet

  16. Retrying
    This product hasn't been added to SaaSHub yet

  17. riko
    This product hasn't been added to SaaSHub yet

  18. Xplenty is the #1 SecurETL - allowing you to build low-code data pipelines on the most secure and flexible data transformation platform. No longer worry about manual data transformations. Start your free 14-day trial now.
    Pricing:
    • Free Trial
    Customer story: Keith Slater, Senior Developer at Creative Anvil, connected multiple data sources with Amazon Redshift to transform, organize and analyze their customer data. "Before we started with Xplenty, we were trying to move data from many different data sources into Redshift. Xplenty has helped us do that quickly and easily. The best feature of the platform is having the ability to manipulate data as needed without the process being overly complex. Also, the support is great - they’re always responsive and willing to help."

    #ETL #Data Integration #Monitoring Tools

  19. AWS Data Pipeline
    This product hasn't been added to SaaSHub yet

  20. Fully managed extract, transform, and load (ETL) service

    #Data Integration #ETL #Data Workflow 13 social mentions

  21. AWS Batch enables developers, scientists, and engineers to easily and efficiently run hundreds of thousands of batch computing jobs on AWS.
    AWS Batch: This is used for batch computing jobs on AWS resources. It scales extremely well and is well suited to engineers looking to run large compute jobs.

    #Cloud Computing #Cloud Hosting #Development 14 social mentions

  22. Google Cloud Dataflow is a fully-managed cloud service and programming model for batch and streaming big data processing.

    #Big Data #Data Dashboard #Data Warehousing 14 social mentions

  23. Learn more about Azure Data Factory, the easiest cloud-based hybrid data integration solution at an enterprise scale. Build data factories without the need to code.

    #ETL #Data Integration #Workflow Automation 3 social mentions

  24. Toil
    This product hasn't been added to SaaSHub yet

  25. Pachyderm is an open source analytics engine that uses Docker containers for distributed computations.
    Pricing:
    • Open Source
    Pachyderm: This is another great alternative to tools like Airflow. Here's a great GitHub writeup about some of the simple differences between Airflow and Pachyderm. Note: Pachyderm has an open-source edition on their website.

    #Data Science And Machine Learning #Data Science Notebooks #Machine Learning Tools 1 social mention

  26. Probably the best marketing automation software. Fully featured, from email to product recommendations, surveys, A/B testing and lots more. All-in-one marketing automation software to increase sales and conversion rate.
    Mara: This is another ETL framework for Python. It's a middle-ground between pure Python and Apache Airflow, so it's fast and simple to set up.

    #ETL #Data Integration #Data Pipelines

  27. Pinterest's open source, scalable workflow manager
    Pinball: This is Pinterest's workflow manager. It has auto-retries, priorities, and overrun policies, and it scales horizontally.

    #Tech #ETL #Web Service Automation

  28. Azkaban is a batch workflow job scheduler created at LinkedIn to run Hadoop jobs.
    Pricing:
    • Open Source

    #Workflow Automation #DevOps Tools #Automation 3 social mentions

  29. Dray.it
    This product hasn't been added to SaaSHub yet
    Dray.it: This is a Docker workflow engine that helps tremendously with resource management.

  30. Spark helps you take your inbox under control. Instantly see what’s important and quickly clean up the rest. Spark for Teams allows you to create, discuss, and share email with your colleagues
    Spark: This is a full-blown toolkit that has tons of helpful tools. With Spark Streaming you can set up your entire batch streaming ETL.

    #Email #Email Clients #Calendar 30 social mentions

  31. Pentaho Data Integration (ETL), a.k.a. Kettle

    #Data Integration #Workflows #ETL

  32. Scriptella is an open source ETL (Extract-Transform-Load) and script execution tool written in Java.

    #Data Integration #ETL #Stream Processing

  33. NOTE: Apatar has been discontinued.
    Apatar provides a set of tools for data integration and migration.

    #Data Integration #ETL #Monitoring Tools

  34. Learn about how the world's most widely used business intelligence suite leverages open source for the best and most cost-effective reporting, dashboards and analytics available.

    #Data Dashboard #Business Intelligence #Data Visualization

  35. Level up your Java code and explore what Spring can do for you.

    #Workflow Automation #Databases #Data Dashboard 2 social mentions

  36. EasyBatch
    This product hasn't been added to SaaSHub yet

  37. GETL
    This product hasn't been added to SaaSHub yet

  38. JSR 352
    This product hasn't been added to SaaSHub yet

  39. Crunch is a No Judgment Gym that believes in making serious exercise fun by fusing fitness and entertainment. Join Crunch for all your fitness needs!

    #Health & Wellness #Health And Fitness #Gym And Fitness Studio Management

  40. NoFlo is a JavaScript implementation of Flow-Based Programming (FBP).

    #Business Intelligence #Automation #Data Dashboard 2 social mentions

  41. Extraload
    This product hasn't been added to SaaSHub yet

  42. Empujar
    This product hasn't been added to SaaSHub yet

  43. Datapumps
    This product hasn't been added to SaaSHub yet

  44. proc-that
    This product hasn't been added to SaaSHub yet
