ACID Transactions
Delta Lake provides ACID transaction capabilities, which ensure data integrity and reliability across operations, allowing for data consistency even in the case of concurrent reads and writes.
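One way to picture how concurrent writers can stay consistent is an ordered commit log with an optimistic check: a writer may commit only if nobody else committed since it last read. The sketch below is a toy model in plain Python, not the Delta Lake API. The names ToyTransactionLog and CommitConflict are hypothetical, and a real engine typically does more work here, such as checking whether the two writes actually conflict before failing.

```python
class CommitConflict(Exception):
    """Raised when another writer committed first."""


class ToyTransactionLog:
    """Toy ordered commit log (hypothetical; NOT the Delta Lake API)."""

    def __init__(self):
        self.entries = []  # entries[i] holds the actions of version i

    def latest_version(self):
        """Return the newest committed version, or -1 if the log is empty."""
        return len(self.entries) - 1

    def commit(self, read_version, actions):
        """Append `actions` only if nobody committed since `read_version`."""
        if read_version != self.latest_version():
            raise CommitConflict(
                f"read version {read_version}, log is at {self.latest_version()}"
            )
        self.entries.append(list(actions))
        return self.latest_version()


log = ToyTransactionLog()
seen_by_a = log.latest_version()  # both writers start from the same snapshot
seen_by_b = log.latest_version()

log.commit(seen_by_a, ["add file-a"])  # writer A wins the race
try:
    log.commit(seen_by_b, ["add file-b"])  # writer B is now stale
    b_conflicted = False
except CommitConflict:
    b_conflicted = True
    # Writer B re-reads the log and retries against the new version.
    log.commit(log.latest_version(), ["add file-b"])
```

In this toy, readers always see a whole number of commits, never a half-applied write, which is the property the ACID guarantee is meant to convey.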
Time Travel
Delta Lake enables time travel, allowing users to query snapshots of data at different points in the past. This feature is useful for auditing, debugging, and recovering data.
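The idea of querying past snapshots can be sketched with an append-only list of commits, where a snapshot at version N is rebuilt from commits 0 through N. This is a simplified illustration in plain Python under the assumption of append-only writes. ToyVersionedTable and its methods are hypothetical names, not part of Delta Lake.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVersionedTable:
    """Toy append-only commit log (hypothetical; NOT the Delta Lake API)."""

    commits: list = field(default_factory=list)  # commits[i] = rows added in version i

    def append(self, rows):
        """Record a new commit and return its version number."""
        self.commits.append(list(rows))
        return len(self.commits) - 1

    def snapshot(self, version=None):
        """Rebuild the table as of `version` (latest if omitted)."""
        if version is None:
            version = len(self.commits) - 1
        if not 0 <= version < len(self.commits):
            raise ValueError(f"unknown version {version}")
        return [row for commit in self.commits[: version + 1] for row in commit]


table = ToyVersionedTable()
v0 = table.append([{"id": 1}])
v1 = table.append([{"id": 2}, {"id": 3}])

print(table.snapshot(v0))  # rows as of the first commit
print(table.snapshot())    # rows as of the latest commit
```

Because older commits are never modified, reading an earlier version is just a matter of stopping the replay sooner, which is what makes auditing and recovery cheap in this model.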
Scalability
Delta Lake was designed to run on Apache Spark, which lets it scale across big data workloads and handle large volumes of data.
Schema Evolution
Delta Lake supports schema evolution, allowing schema changes such as adding or deleting columns without significantly affecting data ingestion or requiring a rewrite of historical data.
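A rough way to see why adding a column does not force a rewrite of historical data: older rows simply lack the new column, and the reader fills it in as missing at query time. The sketch below uses plain Python dictionaries as stand-ins for stored rows. The function read_with_schema and the sample columns are hypothetical, chosen only for illustration.

```python
def read_with_schema(stored_rows, schema):
    """Project each stored row onto `schema`, filling absent columns with None."""
    return [{col: row.get(col) for col in schema} for row in stored_rows]


# Rows written before the "country" column existed are left untouched.
old_rows = [{"id": 1, "name": "a"}]
# Rows written after the schema change carry the new column.
new_rows = [{"id": 2, "name": "b", "country": "DE"}]

current_schema = ["id", "name", "country"]
rows = read_with_schema(old_rows + new_rows, current_schema)

for row in rows:
    print(row)
```

The old row comes back with country set to None rather than being rewritten on disk, which mirrors the behaviour described above.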
Unified Batch and Streaming
Delta Lake offers support for both batch and streaming data processing, simplifying data pipelines and reducing the complexity of data workflows.
We have collected here some useful links to help you find out if Delta Lake is good.
Check the traffic stats of Delta Lake on SimilarWeb. The key metrics to look for are: monthly visits, average visit duration, pages per visit, and traffic by country. Moreover, check the traffic sources. For example, "Direct" traffic is a good sign.
Check the "Domain Rating" of Delta Lake on Ahrefs. The domain rating is a measure of the strength of a website's backlink profile on a scale from 0 to 100. It shows the strength of Delta Lake's backlink profile compared to the other websites. In most cases a domain rating of 60+ is considered good and 70+ is considered very good.
Check the "Domain Authority" of Delta Lake on MOZ. A website's domain authority (DA) is a search engine ranking score that predicts how well a website will rank on search engine result pages (SERPs). It is based on a 100-point logarithmic scale, with higher scores corresponding to a greater likelihood of ranking. This is another useful metric to check if a website is good.
Read the latest comments about Delta Lake on Reddit. This can help you find out how popular the product is and what people think about it.
A common solution is using open table formats like Apache Iceberg (others are Delta Lake and Apache Hudi). With these tools you get the benefits of traditional database functionality on your data lake, i.e. ACID guarantees and transactions. The Iceberg specification defines an open table format that enables accessing related data stored in separate files in a distributed storage system, as one table. - Source: dev.to / 8 months ago
Delta Lake: Delta Lake is an open-source storage layer that provides ACID transactions, scalable metadata management, and data versioning on top of existing data lakes. It aims to bring reliability and performance optimizations to big data workloads while ensuring data integrity and consistency. - Source: dev.to / about 1 year ago
When it comes to stream processing systems, Iceberg support varies across vendors. Databricks, which oversees Spark Streaming, focuses on Delta Lake. Apache Flink, heavily influenced by Alibaba's contributions, promotes Paimon, an alternative to Iceberg. RisingWave, on the other hand, fully embraces Iceberg. Rather than focusing solely on one table format, RisingWave aims to support various catalog services,... - Source: dev.to / over 1 year ago
Delta Lake is a storage layer framework that provides reliability to data lakes. It addresses the challenges of managing large-scale data in lakehouse architectures, where data is stored in an open format and used for various purposes, like machine learning (ML). Data engineers can build real-time pipelines or ML applications using Delta Lake because it supports both batch and streaming data processing. It also... - Source: dev.to / almost 2 years ago
There is a neat example, of how a third party project belonging to the Linux Foundation, is implementing UserDefinedLogicalNodeCore: MetricObserver in delta-rs. The developer had to use only #[derive(Debug, Hash, Eq, PartialEq)] to get dyn_eq and dyn_hash implemented. - Source: dev.to / almost 2 years ago
Delta is pretty great, lets you do upserts into tables in Databricks much easier than without it. I think the website is here: https://delta.io. - Source: Hacker News / over 2 years ago
Apache Iceberg is one of the three types of lakehouse, the other two are Apache Hudi and Delta Lake. - Source: dev.to / over 2 years ago
The Apache Spark / Databricks community prefers Apache Parquet or Linux Foundation's delta.io over JSON. Source: over 2 years ago
Databricks provides Jupyter lab like notebooks for analysis and ETL pipelines using spark through pyspark, sparkql or scala. I think R is supported as well but it doesn't interop as well with their newer features as well as python and SQL do. It interfaces with cloud storage backend like S3 and offers some improvements to the parquet format of data querying that allows for updating, ordering and merged through... - Source: Hacker News / almost 3 years ago
Structured, Semi-structured and Unstructured can be stored in one single format, a lakehouse storage format like Delta, Iceberg or Hudi (assuming those don't require low-latency SLAs like subsecond). Source: almost 3 years ago
Take a look at Delta Lake https://delta.io, it enables a lot of database-like actions on files. Source: about 3 years ago
This sounds like a new trending destination to take selfies in front of, but it's even better than that. Delta Lake is an "open-source storage layer designed to run on top of an existing data lake and improve its reliability, security, and performance." (source). It lets you interact with an object storage system like you would with a database. - Source: dev.to / about 3 years ago
You are right, delta.io is just a framework. Sorry for the unclear question. Another try: when you host spark on your own with delta as table format compared to usage of Databricks, what are the differences? Source: about 3 years ago
I mean the difference between using the delta.io framework to let it run on your own machines/VMs vs using Databricks and having clusters defined. Source: about 3 years ago
Is there actually any company implementing delta.io self hosted beside microsoft/synapse and databricks? Would it be worth the effort compared to the features microsoft/databricks bring to the table? Source: about 3 years ago
We are happy to announce our third opensource project - Delta Fetch. Delta Fetch is a configurable HTTP API service for accessing Delta Lake tables. Service is highly configurable, with possibility to filter your Delta tables by selected columns. - Source: dev.to / about 3 years ago
I'd suggest looking at the open table formats. Delta Lake does an excellent job at providing batch and streaming APIs for Spark. This would unify your workloads. It would follow the medallion architecture, which is a bit more popular lately. Aspects of the lambda architecture can still be present in the medallion model, especially when real-time requirements are present. Source: over 3 years ago
I've installed the stack (Hadoop, Hive, Spark) into a Centos VM, built everything from sources to make sure it fits together. Then added Delta Lake (delta.io) from their maven repo. Source: over 3 years ago
(I configured delta.io in $SPARK_HOME/conf/spark-defaults.xml so it's loaded & available). Source: over 3 years ago
You can query data organized in many open table formats like Apache Iceberg and Delta Lake. (Here is a good article on what is a table format and the differences between different ones). - Source: dev.to / almost 4 years ago
Bit more specific: https://delta.io/. Source: almost 4 years ago
Is Delta Lake good? This is an informative page that will help you find out. Moreover, you can review and discuss Delta Lake here. The primary details have not been verified within the last quarter, and they might be outdated. If you think we are missing something, please use the means on this page to comment or suggest changes. All reviews and comments are highly encouraged and appreciated, as they help everyone in the community make an informed choice. Please always be kind and objective when evaluating a product and sharing your opinion.