Software Alternatives & Reviews

Python & ETL 2020: A List and Comparison of the Top Python ETL Tools

Recommended and mentioned products

  1. Airflow is a platform to programmatically author, schedule, and monitor data pipelines.

    Air flow maximization. about 7 days ago

    LOL, not sure if you're joking or not, but... this sub is for a software package called Airflow (https://airflow.apache.org/), not physical airflow.
  2. Luigi is a Python module that helps you build complex pipelines of batch jobs.

    Data pipelines with Luigi about 28 days ago:

    At Wonderflow we're doing a lot of ML / NLP using Python and recently we are enjoying writing data pipelines using Spotify's Luigi.
  3. Pandas is an open source library providing high-performance, easy-to-use data structures and data analysis tools for Python.

    Best Data Structure for this? about 1 day ago

    If you really want to store it all (labels included) in one data structure, you should look up pandas.
  4. petl

  5. Bonobo

  6. Bubbles: drop comment bubbles anywhere on the screen, on any website.

    Ask HN: Who is hiring? (November 2021) about 3 months ago:

    Bubbles | Remote (US timezones) | Senior Full-stack Engineer | Full-time | https://usebubbles.com Bubbles is enabling the world to work asynchronously by building the next generation communication platform. Rather than being stuck in Zoom all day and having to work around other people's schedules, Bubbles enables you to work on your own time and at your own pace while still having the context you need to discuss...
  7. Beautiful Soup: a library designed for screen-scraping HTML and XML.

  8. PyQuery

  9. Blaze is an application launcher that distinguishes itself from the others by automating recurrent tasks performed in the file system or in any application on Microsoft Windows.

  10. Dask natively scales Python. Dask provides advanced parallelism for analytics, enabling performance at scale for the tools you love.

    What does it mean to scale your python powered pipeline? about 16 days ago:

    Dask: Distributed data frames, machine learning and more.
  11. Datashape

  12. DyND

  13. Easy and secure network access management, without VPNs

  14. Joblib

  15. lxml

  16. Retrying

  17. riko

  18. Xplenty is the #1 SecurETL - allowing you to build low-code data pipelines on the most secure and flexible data transformation platform. No longer worry about manual data transformations. Start your free 14-day trial now.

  19. AWS Data Pipeline is a cloud-based data workflow service that helps you process and move data between different AWS services and on-premises data sources.

    Any data engineers familiar with building pipelines in AWS? about 9 months ago

    Unfortunately there are just so many options for data ingest. Any programming language could be used, and there's plenty of off-the-shelf software and SaaS solutions to do it too. For example, it could be done with AWS Data Pipeline (https://aws.amazon.com/datapipeline), or maybe there's just an EC2 virtual machine running some custom Python code that is doing it.
  20. AWS Glue: fully managed extract, transform, and load (ETL) service

    Machine Learning Best Practices for Public Sector Organizations about 3 months ago:

    AWS Glue is a fully managed ETL service that makes it simple and cost-effective to categorize, clean, enrich, and migrate data from a source system to a data store for ML.
  21. AWS Batch enables developers, scientists, and engineers to easily and efficiently run hundreds of thousands of batch computing jobs on AWS.

    Launching VM instances only when needed about 3 months ago

    If you're looking for something a bit more managed, check out Batch. It's basically a managed AWS service that does a lot of what I describe, kind of the same, but kind of differently. Batch has its own workflow peculiarities, but you may prefer dealing with those rather than dealing with something custom hacked together to behave like it.
  22. Google Cloud Dataflow is a fully-managed cloud service and programming model for batch and streaming big data processing.

    Google Pub/Sub client library for R about 2 months ago:

    Stream data into Dataflow pipelines from R.
  23. Azure Data Factory is a cloud-based hybrid data integration solution at enterprise scale. Build data factories without the need to code.

    Deploying Azure Data Factory using Bicep about 8 months ago

    I'm also planning to do more content with Azure Data Factory, so I thought it'd be good to make a video combining the two.
  24. Toil

  25. Pachyderm is an open source analytics engine that uses Docker containers for distributed computations.

  26. Probably the best marketing automation software. Fully featured, from email to product recommendations, surveys, A/B testing and lots more. All-in-one marketing automation software to increase sales and conversion rate.

  27. Pinball

  28. Azkaban is a batch workflow job scheduler created at LinkedIn to run Hadoop jobs.

  29. Dray.it

  30. Spark helps you take control of your inbox. Instantly see what’s important and quickly clean up the rest. Spark for Teams allows you to create, discuss, and share email with your colleagues.

    Hello Reddit, I'm Product Lead with 7yrs of experience from... about 1 month ago

    For email I've been using Spark for the past year, and it helps me a lot in reaching near mailbox 0.
  31. Pentaho Data Integration (ETL), a.k.a. Kettle

  32. Scriptella is an open source ETL (Extract-Transform-Load) and script execution tool written in Java.

  33. Apatar provides a set of tools for data integration and migration.

  34. Learn about how the world's most widely used business intelligence suite leverages open source for the best and most cost-effective reporting, dashboards and analytics available.

  35. Level up your Java code and explore what Spring can do for you.

    Ask HN: What's your Go-to web stack for Java? about 7 months ago:

    Spring Boot: is my go to for REST APIs or workers https://spring.io/projects/spring-boot Spring Batch: for async/batch work https://spring.io/projects/spring-batch.
  36. EasyBatch

  37. GETL

  38. JSR 352

  39. Crunch is a No Judgment Gym that believes in making serious exercise fun by fusing fitness and entertainment. Join Crunch for all your fitness needs!

  40. NoFlo is a JavaScript implementation of Flow-Based Programming (FBP).

    Building a basic app using open-source instead of coding it... about 5 days ago:

    Well, you can build independent services for your project and use some OOB solutions for each service, but you will still need to write some code to properly call your APIs and utilize the result. There is a visual, blocks-based project: https://noflojs.org/.
  41. Extraload

  42. Empujar

  43. Datapumps

  44. proc-that