Software Alternatives, Accelerators & Startups

Optimizing Distributed Joins: The Case of Google Cloud Spanner and DataStax Astra DB

Apache Spark, Google Cloud Spanner
  1. Apache Spark is an engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing.
    Pricing:
    • Open Source
    Shuffle and broadcast joins are better suited to batch or near real-time analytics. For example, Apache Spark uses them as its main join strategies. Co-located and pre-computed joins are faster and can serve online transaction processing in real-time applications. They frequently depend on data being organized according to specialized storage schemes that the database supports.

    #Databases #Big Data #Big Data Analytics 70 social mentions

  2. Google Cloud Spanner is a horizontally scalable, globally consistent, relational database service.
    Pricing:
    • Paid (fully managed cloud service)
    Fast distributed joins are an important consideration when selecting a scalable database that must support real-time, high-throughput, data-driven applications. In this article, we discussed how shuffle, broadcast, co-located, and pre-computed joins work. We explained that shuffle and broadcast joins are better suited to batch or near real-time analytics because they may require moving data among the nodes of a cluster, which is expensive. Co-located and pre-computed joins are faster and are well suited to real-time applications. Using Google Cloud Spanner, we demonstrated how a fully managed, cloud-native relational database can take advantage of fast co-located joins. Using DataStax Astra DB, we demonstrated how a serverless, cloud-native NoSQL database can take advantage of even faster pre-computed joins.

    #Databases #Relational Databases #Tool 17 social mentions
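To make the data-movement argument concrete, here is a minimal in-memory sketch of shuffle and broadcast joins over a simulated three-node cluster. It is not how Spark implements them; the function names (`partition_of`, `shuffle_join`, `broadcast_join`), the modulo partitioning, and the sample rows are all hypothetical. The sketch counts how many rows must cross the network before each node can join its data locally.

```python
from collections import defaultdict

Row = tuple[int, str]       # (join_key, payload)
Cluster = list[list[Row]]   # cluster[i] holds the rows stored on node i

def partition_of(key: int, n_nodes: int) -> int:
    # Hash partitioning on the join key; modulo stands in for a real hash.
    return key % n_nodes

def local_hash_join(left: list[Row], right: list[Row]) -> list[tuple[int, str, str]]:
    # Build a hash table on the right side, then probe it with the left side.
    index: dict[int, list[str]] = defaultdict(list)
    for key, payload in right:
        index[key].append(payload)
    return [(key, lp, rp) for key, lp in left for rp in index.get(key, [])]

def repartition(table: Cluster) -> tuple[Cluster, int]:
    # Send every row to the node that owns its join key; count rows that move.
    n = len(table)
    parts: Cluster = [[] for _ in range(n)]
    moved = 0
    for node, rows in enumerate(table):
        for row in rows:
            target = partition_of(row[0], n)
            if target != node:
                moved += 1
            parts[target].append(row)
    return parts, moved

def shuffle_join(left: Cluster, right: Cluster) -> tuple[list, int]:
    # Both inputs are repartitioned by key, then each node joins locally.
    left_parts, moved_left = repartition(left)
    right_parts, moved_right = repartition(right)
    result = []
    for l_rows, r_rows in zip(left_parts, right_parts):
        result.extend(local_hash_join(l_rows, r_rows))
    return result, moved_left + moved_right

def broadcast_join(large: Cluster, small: Cluster) -> tuple[list, int]:
    # The small table is copied to every node; the large table stays put.
    # Assumes both tables are spread over the same set of nodes.
    small_rows = [row for rows in small for row in rows]
    moved = sum(len(small_rows) - len(rows) for rows in small)
    result = []
    for rows in large:
        result.extend(local_hash_join(rows, small_rows))
    return result, moved

# Hypothetical 3-node layout: orders keyed by customer_id, plus a small customers table.
orders: Cluster = [[(1, "o1"), (2, "o2")], [(3, "o3"), (1, "o4")], [(2, "o5"), (3, "o6")]]
customers: Cluster = [[(3, "c3")], [(1, "c1")], [(2, "c2")]]

shuffled, shuffle_moved = shuffle_join(orders, customers)
broadcast, broadcast_moved = broadcast_join(orders, customers)
print(shuffle_moved, broadcast_moved)         # rows sent over the network: 4 6
print(sorted(shuffled) == sorted(broadcast))  # True
```

In this toy layout the customers table already happens to be partitioned by key, so the shuffle moves only the 4 misplaced order rows, while the broadcast sends each of the 3 customer rows to the 2 nodes that lack it, i.e. (n - 1) × |small| = 6 rows. Either way, rows travel between nodes before any join work starts, which is the cost that makes these strategies a better fit for analytics than for real-time queries.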
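Co-located and pre-computed joins avoid that movement by deciding data placement ahead of time. The minimal sketch below contrasts the two under hypothetical assumptions: in `ColocatedStore`, customers and their orders are placed on nodes by the same key, loosely in the spirit of Spanner's interleaved tables; in `PrecomputedStore`, the joined result is written into a denormalized table partitioned by customer, in the query-first modeling style used with Cassandra-based databases such as Astra DB. The class names, the `orders_by_customer` table, and the placement rule are illustrative only, not Spanner or Astra DB APIs.

```python
from collections import defaultdict

N_NODES = 3

def node_for(customer_id: int) -> int:
    # Hypothetical placement rule: rows are assigned to nodes by customer_id.
    return customer_id % N_NODES

class ColocatedStore:
    """Customers and their orders are placed by the same key, so matching
    rows always live on the same node."""

    def __init__(self) -> None:
        self.customers = [dict() for _ in range(N_NODES)]  # node -> {cid: name}
        self.orders = [[] for _ in range(N_NODES)]         # node -> [(cid, oid)]

    def insert_customer(self, cid: int, name: str) -> None:
        self.customers[node_for(cid)][cid] = name

    def insert_order(self, cid: int, oid: str) -> None:
        self.orders[node_for(cid)].append((cid, oid))

    def join_for(self, cid: int) -> list[tuple[str, str]]:
        # The join still runs at query time, but entirely on one node.
        node = node_for(cid)
        name = self.customers[node].get(cid)
        if name is None:
            return []
        return [(name, oid) for c, oid in self.orders[node] if c == cid]

class PrecomputedStore:
    """The join result is materialized at write time in a denormalized
    table whose partition key is customer_id."""

    def __init__(self) -> None:
        self.customer_names: dict[int, str] = {}
        # Hypothetical table "orders_by_customer": partition key = customer_id.
        self.orders_by_customer: dict[int, list[tuple[str, str]]] = defaultdict(list)

    def insert_customer(self, cid: int, name: str) -> None:
        self.customer_names[cid] = name

    def insert_order(self, cid: int, oid: str) -> None:
        # Pay the join cost once, on write, by copying the customer name
        # into the order row (assumes the customer was inserted first).
        self.orders_by_customer[cid].append((self.customer_names[cid], oid))

    def join_for(self, cid: int) -> list[tuple[str, str]]:
        # Reading the "join" is a single partition lookup; no matching needed.
        return list(self.orders_by_customer.get(cid, []))

def load(store):
    # Same sample data for both stores.
    store.insert_customer(1, "Ada")
    store.insert_customer(2, "Grace")
    store.insert_order(1, "o1")
    store.insert_order(2, "o2")
    store.insert_order(1, "o3")
    return store

colocated = load(ColocatedStore())
precomputed = load(PrecomputedStore())
print(colocated.join_for(1))    # [('Ada', 'o1'), ('Ada', 'o3')]
print(precomputed.join_for(1))  # [('Ada', 'o1'), ('Ada', 'o3')]
```

Both stores return the same answer without moving rows between nodes. The difference is where the work happens: the co-located store still matches rows at query time (locally), while the pre-computed store shifts that cost to `insert_order`, at the price of duplicated data that must be rewritten if, say, a customer's name changes.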
