Blink and it's done: Interactive queries on very large data

In this demonstration, we present BlinkDB, a massively parallel, sampling-based approximate query processing framework for running interactive queries on large volumes of data. The key observation in BlinkDB is that one can make reasonable decisions in the absence of perfect answers. BlinkDB extends the Hive/HDFS stack and can handle the same set of SPJA (selection, projection, join and aggregate) queries as supported by these systems. BlinkDB provides real-time answers along with statistical error guarantees, and can scale to petabytes of data and thousands of machines in a fault-tolerant manner. Our experiments using the TPC-H benchmark and on an anonymized real-world video content distribution workload from Conviva Inc. show that BlinkDB can execute a wide range of queries up to 150× faster than Hive on MapReduce and 10–150× faster than Shark (Hive on Spark) over tens of terabytes of data stored across 100 machines, all with an error of 2–10%.
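
To illustrate the general idea of answering an aggregate query from a sample and attaching an error bound, here is a minimal Python sketch. It is not BlinkDB's implementation: the function name `approximate_mean`, the synthetic data, the uniform sampling scheme, and the normal-approximation confidence interval are all assumptions chosen for illustration.

```python
import math
import random
import statistics


def approximate_mean(values, sample_size, z=1.96, seed=0):
    """Estimate the mean of `values` from a uniform random sample.

    Returns (estimate, half_width). The interval estimate +/- half_width is an
    approximate confidence interval (z=1.96 gives roughly 95%) based on the
    normal approximation; no finite-population correction is applied.
    This is an illustrative sketch, not BlinkDB's actual method.
    """
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)  # uniform, without replacement
    estimate = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(sample_size)
    return estimate, z * std_err


# Synthetic "table column" (hypothetical data, fixed seed for repeatability).
data_rng = random.Random(42)
population = [data_rng.uniform(0, 100) for _ in range(200_000)]

exact = statistics.fmean(population)                  # full scan
estimate, half_width = approximate_mean(population, sample_size=2_000)

print(f"exact mean       : {exact:.3f}")
print(f"approximate mean : {estimate:.3f} +/- {half_width:.3f}")
```

The sample here is 1% of the rows, so the approximate answer touches far less data than the full scan, and the reported half-width tells the reader how much to trust it. That is the trade-off the abstract describes, shown at toy scale.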

Publication Info

Year: 2012
Language: en

Article Details

Link Of The Paper: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.649.704

Related publications

BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data

Knowing when you're wrong: Building fast and reliable approximate query processing systems