I work at VMware, and we use one tool for the whole ELT. It was built internally because there was no good alternative at the time, and we have since open-sourced it: https://github.com/vmware/versatile-data-kit. Source: about 1 year ago
"suggestions on how to reduce the time spent on initially generating and adjusting the code" is using some tools that automate ELT. Here's one open-source tool I'm working on with my team: https://github.com/vmware/versatile-data-kit. Source: over 1 year ago
Have you heard about Versatile Data Kit (https://github.com/vmware/versatile-data-kit)? I think it meets your needs perfectly. Source: over 1 year ago
Versatile Data Kit is a framework to build, run, and manage your data pipelines with Python or SQL on any cloud: https://github.com/vmware/versatile-data-kit. Here's a list of good first issues: https://github.com/vmware/versatile-data-kit/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22. Join our Slack channel to connect with our team: https://cloud-native.slack.com/archives/C033PSLKCPR. Source: over 1 year ago
There are some DE tools now that provide automation, so you don't need advanced Python to build your pipelines - like this one: https://github.com/vmware/versatile-data-kit. Source: over 1 year ago
Okay, I will explain what I am doing and how I see the "fun" in the project. I work with an open-source framework for data engineers. The community members are developers and people who use the tool - DEs. I do facilitate a monthly community meeting for everyone to meet and discuss important topics, but that's the only part that takes their direct time, and it's totally voluntary, so DEs usually don't... Source: over 1 year ago
If you're looking for a one-tool solution for your ETL, I think you should check out Versatile Data Kit. You can build workloads for each of the steps in your process - e.g. however many Selenium scrapers, transformation batches, etc., assuming you aren't using anything besides Python and SQL - and then schedule them to run periodically, in your case weekly. You can configure a connection to whatever SQL database... Source: over 1 year ago
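To make the comment above concrete, here is a minimal sketch of what a weekly extract-transform-load step could look like in this style. The `run(job_input)` entry point and the `execute_query` / `send_object_for_ingestion` methods follow the pattern shown in the VDK project's examples; the query text, table names, and the `FakeJobInput` stub are illustrative assumptions added here so the snippet runs standalone without vdk installed.

```python
# Sketch of a VDK-style data job step (assumptions: table names and the
# query are invented for illustration; FakeJobInput is a local stand-in
# for vdk's real job_input object, used only so this file runs by itself).

def run(job_input):
    # 1. Extract: pull the raw rows gathered during the past week
    rows = job_input.execute_query(
        "SELECT url, price FROM raw_listings "
        "WHERE scraped_at > date('now', '-7 days')"
    )
    # 2. Transform: a plain-Python cleanup step (drop invalid prices)
    cleaned = [{"url": u, "price": round(p, 2)} for u, p in rows if p > 0]
    # 3. Load: ingest each cleaned record into the destination table
    for record in cleaned:
        job_input.send_object_for_ingestion(
            payload=record, destination_table="weekly_listings"
        )


class FakeJobInput:
    """Minimal stand-in for vdk's job_input, for demonstration only."""

    def __init__(self, rows):
        self.rows = rows          # rows a real DB query would return
        self.ingested = []        # records "sent" for ingestion

    def execute_query(self, sql):
        # A real job_input would run sql against the configured database.
        return self.rows

    def send_object_for_ingestion(self, payload, destination_table):
        self.ingested.append((destination_table, payload))


stub = FakeJobInput([("https://a.example", 10.5), ("https://b.example", -1.0)])
run(stub)
print(stub.ingested)  # only the valid record reaches weekly_listings
```

In a real deployment the framework supplies `job_input` (wired to whatever SQL database you configured) and the weekly cadence comes from the job's schedule configuration, not from the step code itself.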
I think for your case using only Versatile Data Kit might work. Source: over 1 year ago
Open source, good for basic SQL and/or Python skills, extensible, and the team provides support in setting up and adopting the framework: https://github.com/vmware/versatile-data-kit. I'm the community manager for this project, and I built my first full ELT pipeline (tracking GitHub stats) in my first month, totally by myself, with no previous experience. It covers the full data journey. Oh, and it has Airflow integration, with that... Source: over 1 year ago
Here's Versatile Data Kit, a framework for anyone with basic Python or SQL skills to build their data pipelines, hope someone finds it useful! Cheers! Source: almost 2 years ago
This is an informative page about Versatile Data Kit. You can review and discuss the product here. The primary details have not been verified within the last quarter and may be outdated. If you think we are missing something, please use the means on this page to comment or suggest changes. All reviews and comments are highly encouraged and appreciated, as they help everyone in the community make an informed choice. Please always be kind and objective when evaluating a product and sharing your opinion.