Despite continuous investments in data technologies, query latency still poses a significant challenge. Modern analytic solutions require near real-time responsiveness, both to make them interactive and to support automated processing. Current technologies (Hadoop, Spark, Dataflow) scan the dataset to execute queries and focus on providing scalable data storage and in-memory concurrent data processing to maximize task execution speed. We argue that these solutions fail to offer an adequate level of interactivity, since they depend on continual access to the data. In this paper, we present a method for query approximation, also known as approximate query processing (AQP), that reduces the need to scan data during inference (query calculation), thus enabling rapid query processing. We use an LSTM network to learn the relationship between queries and their results, and to provide a rapid inference layer for predicting query results. Our method (referred to as "Hunch") produces a lightweight LSTM network that provides high query throughput. We evaluated our method on 12 datasets and compared it to state-of-the-art AQP engines (VerdictDB, BlinkDB) in terms of query latency, model weight, and accuracy. The results show that our method predicted query results with a normalized root mean squared error (NRMSE) ranging from approximately 1% to 4%, which, for the majority of our datasets, was better than the results of the benchmarks. Moreover, our method was able to predict the results of up to 120,000 queries per second (when streamed together), with a single-query latency of no more than 2 ms.
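To make the reported accuracy metric concrete, the following is a minimal sketch of how an NRMSE between exact and predicted query results could be computed. The abstract does not state how the RMSE is normalized, so dividing by the range of the exact results is an assumption here (normalizing by the mean is another common convention). The function name `nrmse` and the sample values are hypothetical and are not taken from the paper.

```python
import math

def nrmse(actual, predicted):
    """RMSE divided by the range of the actual values (assumed normalization)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean of squared differences between exact and predicted results.
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    value_range = max(actual) - min(actual)
    if value_range == 0:
        raise ValueError("actual values have zero range; NRMSE is undefined")
    return math.sqrt(mse) / value_range

# Hypothetical exact query results and model predictions, for illustration only.
exact = [100.0, 200.0, 300.0, 400.0]
approx = [103.0, 196.0, 300.0, 405.0]
error = nrmse(exact, approx)
print(f"NRMSE: {error:.2%}")
```

Under this normalization, an NRMSE of 1% means the typical prediction error is about one hundredth of the spread of the true query results.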