The SnappyData Blog

  • The 5 most-read SnappyData blog posts in 2017

    Pierce Lamb

    What SnappyData, Spark, and in-memory database topics got the most attention on our blog in 2017? Find out inside.

  • Benchmarking Apache Spark with Cassandra, Kudu, Alluxio, Spark cache and SnappyData

    Swati Sawant, Kishor Bachav & Shyja Prabhu

    In this post, we compare SnappyData with the Spark cache, Kudu, Alluxio, and Cassandra, each accessed through its Spark connector, and show that SnappyData is roughly 1-3 orders of magnitude faster than these other stores at loading data, running analytics queries, and performing point lookups and point updates.

  • Making Apache Spark the most versatile, fast data platform ever

    Jags Ramnarayan

    SnappyData 1.0 is now generally available. In the last year, the team closed about 1000 JIRA tickets and improved performance 5-10 fold while supporting several customers and the community. The project added roughly 200K lines of source code and another 70K lines of test code. Learn more in this blog.

  • How Mutable DataFrames improve join performance in Spark SQL

    Sudhir Menon

    In this blog we showcase a credit card fraud detection example in which the vanilla Spark approach to joining a streaming DataFrame with a static DataFrame limits performance. We demonstrate how Mutable DataFrames inside SnappyData improve that performance. Code examples are provided.

  • Running Spark SQL CERN queries 5x faster on SnappyData

    Sudhir Menon

    In a recent blog post, Luca Canali from CERN tested the performance improvement between Spark 1.6 and Spark 2.0 using a Spark SQL join with two conditions. CERN discovered a 7x performance improvement from 1.6 to 2.0. We ran the same query on equivalent hardware on SnappyData and discovered a 5x performance improvement from Spark 2.0 to SnappyData. Learn more inside.

  • Joining a billion rows 20x faster than Apache Spark

    Sumedh Wale

    One of Databricks’ best-known blog posts describes joining a billion rows in a second on a laptop. Since this is a fairly easy benchmark to replicate, we thought, why not try it on SnappyData and see what happens? We found that for joining two columns with a billion rows, SnappyData is nearly 20x faster.

  • SnappyData 0.7 now available: Up to 20x faster than Spark SQL and many more enhancements

    Neeraj Kumar

    In this release, we are excited to demonstrate performance of up to 20x over Apache Spark 2.0, depending on the Spark SQL workload in question. Scan-dependent workloads perform much better on SnappyData (the changes are discussed in this blog). We have improved the developer experience through one-click cloud services, better documentation, a new UI that extends the Spark console, a dedicated section in our documentation with ready-made code snippets that illustrate different aspects of the product, and many Synopses Data Engine improvements.

  • SnappyData as The Data Store for Spark

    Rishitesh Mishra

    SnappyData turns Spark into a datastore that supports real-time Spark applications. It supports a high volume of writes, point updates, and point queries. SnappyData can store data in the same executor JVMs as Spark (Unified Mode) or out of process (Split Mode). The focus of this document is on Unified Mode.