The SnappyData Blog

  • SnappyData Preview Officially Launched

    Jags Ramnarayan

    The SnappyData v1.0 preview officially became available for download on February 1, 2016. The code is also available in our public GitHub repo. Learn more about the state of SnappyData and how we got here in this post.

  • When Should Approximate Query Processing Be Used?

    Barzan Mozafari

    There are many conflicting opinions about whether Approximate Query Processing (AQP) should be used and, if so, under what circumstances. This post demonstrates four cases where AQP makes more sense than traditional processing of the full dataset.

  • We are hiring in India

    Yogesh Mahajan

    SnappyData is looking for engineers interested in tackling the problems of building large-scale distributed and stream-based processing systems.

  • Approximate is the New Precise

    Jags Ramnarayan

    Do we always need to know the exact answer to every question, especially if that answer is slow to arrive? Or is it far more valuable to have near-correct answers, based on a clear understanding of the past, delivered much more quickly? Read about the approximation methods SnappyData uses to deliver extremely fast answers.

  • The SnappyData Airline Data Demo

    Pierce Lamb

    This demo uses Apache Zeppelin to show Spark SQL analytic queries executing over about 40 million records of historical airline data in Spark, and the same queries executing over about 3% of that data in SnappyData's stratified samples. Accuracy and latency are compared.

  • The SnappyData Smart Meter Analytics Demo

    Sudhir Menon

    In this demo, we use the SnappyData platform to answer important trending and TopK queries for a data set comprising 2.6 billion data points spread across 100,000 homes (1,000 homes in each of 100 different zip codes in California).

  • The SnappyData Technology Vision

    Jags Ramnarayan, Sudhir Menon

    The SnappyData vision is to create a real-time analytics platform that combines probabilistic data structures, approximate query processing, and in-memory distributed data management to deliver powerful analytic querying and alerting capabilities on Apache Spark at a fraction of the cost of traditional big data analytics platforms.