The SnappyData Blog

  • Real-Time Streaming ETL with SnappyData

    Sudhir Menon

    In this blog we introduce the rationale for real-time streaming ETL and the advantages of the SnappyData approach to it. We also compare the SnappyData approach with older approaches to ETL and show how it overcomes their limitations. SnappyData's ETL tool is currently under development and will be generally available (GA) later this year.

  • SnappyData takes on Aerospike: a Performance Benchmark

    Swati Sawant & Sumedh Wale

    In this blog we compare the performance of SnappyData and Aerospike on analytics-class and point-lookup-class queries.

  • How Mutable DataFrames improve join performance in Spark SQL

    Sudhir Menon

    In this blog we showcase a credit card fraud detection example in which a vanilla Spark approach to joining a streaming DataFrame with a static DataFrame limits performance. We demonstrate how using Mutable DataFrames inside SnappyData improves that performance. Code examples are provided.

  • Running Spark SQL CERN queries 5x faster on SnappyData

    Sudhir Menon

    In a recent blog post, Luca Canali from CERN tested the performance improvement between Spark 1.6 and Spark 2.0 using a Spark SQL join with two conditions. CERN found a 7x performance improvement from 1.6 to 2.0. We ran the same query on equivalent hardware on SnappyData and found a further 5x performance improvement from Spark 2.0 to SnappyData. Learn more inside.

  • Joining a billion rows 20x faster than Apache Spark

    Sumedh Wale

    One of Databricks’ best-known blog posts describes joining a billion rows in a second on a laptop. Since this is a fairly easy benchmark to replicate, we thought, why not try it on SnappyData and see what happens? We found that when joining two columns with a billion rows, SnappyData is nearly 20x faster.

  • SnappyData 0.7 now available: Up to 20x faster than Spark SQL and many more enhancements

    Neeraj Kumar

    In this release, we are excited to demonstrate performance of up to 20x over Apache Spark 2.0, depending on the Spark SQL workload in question. Scan-dependent workloads perform much better on SnappyData (the changes are discussed in this blog). We have also improved the developer experience through one-click cloud services, better documentation, a new UI that extends the Spark console, a dedicated section in our documentation with ready-made code snippets that illustrate different aspects of the product, and many Synopses Data Engine improvements.