Software Alternatives & Reviews

Data pipelines with Luigi

spaCy Luigi Dask Apache Airflow
  1. spaCy is a library for advanced natural language processing in Python and Cython.
    Pricing:
    • Open Source
    We have tasks that require many different spaCy language models to be loaded at once, and we load them in many processes at once.

    #Natural Language Processing #NLP And Text Analytics #Spreadsheets 58 social mentions

  2. Luigi is a Python module that helps you build complex pipelines of batch jobs.
    At Wonderflow we're doing a lot of ML / NLP in Python, and we have recently been enjoying writing data pipelines with Spotify's Luigi.

    #DevOps Tools #Workflow Automation #Workflows 9 social mentions

  3. Dask natively scales Python. It provides advanced parallelism for analytics, enabling performance at scale for the tools you love.
    Pricing:
    • Open Source
    To do that, we use Dask, simply creating on-demand local (or remote) clusters in the task's run() method.

    #Workflows #Databases #Software Development 16 social mentions

  4. Airflow is a platform to programmatically author, schedule and monitor data pipelines.
    Pricing:
    • Open Source
    Moreover, configuring and deploying Luigi's scheduler on a server or pod for production use is easy, which may not be the case for similar tools such as Apache Airflow.

    #Workflows #Workflow Automation #Data Pipelines 65 social mentions
