Amazon SageMaker might be a bit more popular than Apache Arrow. We know of 36 links to it since March 2021, compared with 33 links to Apache Arrow. We track product recommendations and mentions on public social media platforms and blogs. They can help you identify which product is more popular and what people think of it.
In comes Polars: a brand new dataframe library, or, as the author Ritchie Vink describes it, a query engine with a dataframe frontend. Polars is built on top of the Arrow memory format and is written in Rust, which is a modern, performant, and memory-safe systems programming language similar to C/C++. - Source: dev.to / 2 months ago
One is related to the heritage of being built around the NumPy library, which is great for processing numerical data, but becomes an issue as soon as the data is anything else. Pandas 2.0 has started to bring in Arrow, but it's not yet the standard (you have to opt-in and according to the developers it's going to stay that way for the foreseeable future). Also, pandas's Arrow-based features are not yet entirely on... - Source: dev.to / 5 months ago
IMO a good first step would be to use the txr FFI to write a library for Apache Arrow: https://arrow.apache.org/. - Source: Hacker News / 5 months ago
Polars is an open-source library for Python, Rust, and NodeJS that provides in-memory dataframes, out-of-core processing capabilities, and more. It is based on the Rust implementation of the Apache Arrow columnar data format (you can read more about Arrow on my earlier blog post “Demystifying Apache Arrow”), and it is optimised to be blazing fast. - Source: dev.to / 12 months ago
Apache Arrow (Arrow for short) is an open source project that defines itself as "a language-independent columnar memory format" (more on that later). It is part of the Apache Software Foundation, and as such is governed by a community of several stakeholders. It has implementations in several languages (C++ and also Rust, Julia, Go, and even JavaScript) and bindings for Python, R and others that wrap the C++... - Source: dev.to / 12 months ago
Damn straight. Oh, wait, some vendors have claimed to build an end-to-end solution. But, meh, that’s marketing talk. Take, for example, a well-known platform like Amazon Sagemaker, which describes itself as “a fully managed service that brings together a broad set of tools to enable high-performance, low-cost machine learning (ML) for any use case.” It’s a great platform. My startup has even partnered with them.... - Source: dev.to / 11 days ago
At this point, probably everyone has heard about OpenAI, GPT-4, Claude or any of the popular Large Language Models (LLMs). However, using these LLMs in a production environment can be expensive or nondeterministic regarding their results. I guess that is the downside of being good at everything; you could be better at performing one specific task. This is where HuggingFace can be utilized. HuggingFace provides... - Source: dev.to / about 1 month ago
Generative Artificial Intelligence (GenAI) is a type of artificial intelligence that can generate text, images, or other media using generative models. AWS offers a range of services for building and scaling generative AI applications, including Amazon SageMaker, Amazon Rekognition, AWS DeepRacer, and Amazon Forecast. AWS has also invested in developing foundation models (FMs) for generative AI, which are... - Source: dev.to / 4 months ago
Amazon and Azure already have much of what you're talking about in AWS SageMaker and Azure MLOps. - Source: 11 months ago
And there have been several platforms that help fine-tune pretrained models, such as Google Cloud AutoML and Amazon Sagemaker. These tools are often fairly easy to use, but they come at a cost. They can be expensive, depending on the size of your dataset. Another option is Finetuner+, that also fine-tunes like AutoML and Sagemaker. The big advantage is that you don't need to transfer your data to other GPUs,... - Source: about 1 year ago
Delta Lake - Application and Data, Data Stores, and Big Data Tools
TensorFlow - TensorFlow is an open-source machine learning framework designed and published by Google. It represents computations as data flow graphs, in which nodes represent mathematical operations.
Redis - Redis is an open source in-memory data structure project implementing a distributed, in-memory key-value database with optional durability.
IBM Watson Studio - Increase productivity by giving your team a single environment to work with the best of open source and IBM software, to build and deploy an AI solution.
Apache Parquet - Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem.
Pega Platform - The best-in-class, rapid no-code Pega Platform is unified for building BPM, CRM, case management, and real-time decisioning apps.