Abstract
4 min readDIFS: A Distributed Index for Features in Sensor Networks Benjamin Greenstein , Deborah Estrin , Ramesh Govindan , Sylvia Ratnasamy , and Scott Shenker Department of Computer Science, University of California at Los Angeles, email: ben,destrin @cs.ucla.edu Department of Computer Science, University of Southern California, email: ramesh@usc.edu Intel Research, Berkeley, email: sylvia@intel-research.net International Computer Science Institute, email: shenker@icsi.berkeley.edu Abstract— Sensor networks pose new challenges in the collec- tion and distribution of data. Recently, much attention has been focused on standing queries that use in-network aggregation of time series data to return data statistics in a communication- efficient manner. In this work, rather than consider searches over time series data, we consider searches over semantically rich high-level events, and present the design, analysis, and numerical simulations of a spatially distributed index that provides for efficient index construction and range searches. The scheme provides load balanced communication over index nodes by using the governing property that the wider the spatial extent known to an index node, the more constrained is the value range covered by that node. I. I NTRODUCTION There has been considerable interest recently in research in wireless sensor networks, a technology that promises analysis of and interaction with the environment at spatial and temporal densities not possible using conventional approaches. The nodes in such networks are equipped with sensors, local storage, CPUs and radio communication facilities, allowing them to both sense the local environment and communicate locally with other sensors in order to construct semantically rich conclusions about the environment that they are sensing, such as detecting the presence of animals, or of hotspots, or of other “events” [12], [15], [16], [23]. The primary resource constraint of nodes in such networks is energy. Nodes are expected to be long-lived (deployed not for hours, but for years), untethered (both in terms of communication and power), and unattended (and so must be self-configuring and self-adapting). Energy must be carefully budgeted and conserved, so all sensornet algorithms must minimize energy use. The primary energy consumer in such systems is radio transmission. For one scenario, Pottie and Kaiser explain that the cost of transmitting 1Kb a distance of 100 meters is approximately equal to the cost of executing three million CPU instructions [21]. Furthermore, the cost of reception in these systems is often almost as much as that of transmission. Sensornets collect a tremendous amount of detailed time series data about the environment. As sensornet research and experience has accumulated, many different approaches for accessing this data have been proposed. The conventional approach to storing time series data is to have all sensing nodes feed their data to a central repository external to the sensing environment. While this approach provides for complete flexibility in processing the data, it incurs signifi- cant energy expenditure to send every sensor reading to an external site. Furthermore, links near a gateway to an external storage repository will become communication bottlenecks as the network size and amount of data produced increase. Consequently, in energy-constrained sensor networks it may be necessary to store data locally at or near the location of generation. One approach to retrieve this stored data is to flood a query to all nodes that could potentially have suitable data, and have them send their response to the (perhaps external) querying node. In such an approach, data is sent when (i.e., in response to an actual query) and where it is required. Some queries will originate within the sensornet itself, and in that case it makes little sense to send the data to an external site only to have it shipped back to the internal querying node. Furthermore, if far more data is collected than is actually required to answer queries, then this local storage approach results in significant energy savings. There are two extensions of this approach that lead to further energy savings. First, the data can be processed, aggregated, and/or pruned as it propagates toward the query sink. The authors of Directed Diffusion, TAG, and others describe par- ticular forms of in-network aggregation and pruning of data that can select relevant data and produce statistics such as medians, averages or maximums [16], [1]. This approach uses “data-centric” routing in that queries are not directed towards individual nodes, but rather are stated only in terms of the desired data. Second, the data can be processed locally to identify high-level “events” that are of interest. These events can refer directly to sensor readings, such as areas of relatively high temperatures, or to the conclusions of rather sophisticated identification algorithms, such as animal or vehicle sightings. In either case, the queries are directly for such events, and the responses contain summarized data about such events. Here, too, the routing is data-centric, but the queries (and responses) deal with higher-level abstractions. These energy saving extensions reduce the energy required to respond to queries, but do not alter the basic “flood-then- respond” approach which incurs an inherent cost of flooding each query to all nodes (or at least to all nodes that could possibly have relevant data). If the rate of queries is relatively
Discussion(0)
No comments yet. Be the first to comment.