The SnappyData Blog

  • TPC-H Benchmark: Apache Spark vs SnappyData

    Trilok Khairnar

    In this blog we present our findings comparing Apache Spark to SnappyData on the industry-standard TPC-H benchmark. SnappyData executes every query faster than Apache Spark, with a total execution time one fourth that of Apache Spark.

  • Why every Spark developer should care about Kubernetes

    Jags Ramnarayan, Amogh Shetkar, Shirish Deshmukh

    Kubernetes is all the rage right now, but why should Spark developers care about it? How can Kubernetes help with the problems Spark developers run into when provisioning, deploying, scaling, monitoring, and transitioning Spark clusters? Find out more inside.

  • Real-Time Streaming ETL with SnappyData

    Sudhir Menon

    In this blog we introduce the rationale for real-time streaming ETL and the advantages of the SnappyData approach to it. We also compare the SnappyData approach with older approaches to ETL and show how it overcomes their limitations. SnappyData's ETL tool is currently under development and will be generally available later this year.

  • SnappyData takes on Aerospike: a Performance Benchmark

    Swati Sawant & Sumedh Wale

    In this blog we compare the performance of SnappyData and Aerospike when executing analytics-class and point-lookup-class queries.

  • How to get SnappyData running on Amazon Web Services in a few clicks

    Pierce Lamb

    In this blog we detail a new feature on SnappyData's site called "CloudBuilder." CloudBuilder makes it easy to get a SnappyData cluster up and running on AWS in a few clicks, often finding the cheapest deployment possible.

  • The 5 most-read SnappyData blog posts in 2017

    Pierce Lamb

    What SnappyData, Spark, and in-memory database topics got the most attention on our blog in 2017? Find out inside.

  • Benchmarking Apache Spark with Cassandra, Kudu, Alluxio, Spark cache and SnappyData

    Swati Sawant, Kishor Bachav & Shyja Prabhu

    In this blog we compare SnappyData with the Spark cache, Kudu, Alluxio, and Cassandra, using their Spark connectors, and show that SnappyData is roughly 1-3 orders of magnitude faster than these other stores at loading data and performing analytics queries, point lookups, and point updates.

  • Making Apache Spark the most versatile, fast data platform ever

    Jags Ramnarayan

    SnappyData 1.0 is now generally available. In the last year, the team closed about 1000 JIRA tickets and improved performance 5-10 fold, all while supporting several customers and the community. The project added roughly 200K lines of source code and another 70K lines of test code. Learn more in this blog.

  • How Mutable DataFrames improve join performance in Spark SQL

    Sudhir Menon

    In this blog we showcase a credit card fraud detection example in which the vanilla Spark approach to joining a streaming DataFrame with a static DataFrame limits performance. We demonstrate how using Mutable DataFrames inside SnappyData improves performance. Code examples are provided.

  • Running Spark SQL CERN queries 5x faster on SnappyData

    Sudhir Menon

    In a recent blog post, Luca Canali from CERN tested the performance improvement between Spark 1.6 and Spark 2.0 using a Spark SQL join with two conditions. CERN found a 7x performance improvement from Spark 1.6 to Spark 2.0. We ran the same query on equivalent hardware on SnappyData and found a 5x performance improvement from Spark 2.0 to SnappyData. Learn more inside.