The problem of searching over a large number of data streams to identify one that exhibits certain features of interest is considered. Each data stream is assumed to be generated by one of two possible statistical distributions, with cumulative distribution functions F<sub>0</sub> and F<sub>1</sub>, and the objective is to identify one sequence generated by F<sub>1</sub> as quickly as possible and before a pre-specified deadline. Furthermore, the generation of the data streams is assumed to follow a known dependency kernel, such that the likelihood of a sequence being generated by F<sub>1</sub> depends on the underlying distributions of the other data streams. The optimal sequential sampling strategy is characterized, and numerical evaluations illustrate the gains from incorporating information about the dependency structure into the design of the sampling process.
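To make the setting concrete, the following is a minimal sketch of a deadline-constrained search over dependent streams. It is not the optimal strategy characterized in the paper. It is a simple threshold heuristic under assumptions that are not in the abstract: F<sub>0</sub> and F<sub>1</sub> are taken to be unit-variance Gaussians with means 0 and 1, the dependency kernel is taken to be a first-order two-state Markov chain with hypothetical parameters `eps0` and `eps1`, and the function names and the thresholds `lower` and `upper` are illustrative.

```python
import math


def gaussian_pdf(x, mean, std=1.0):
    """Density of N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior_update(pi, x, mean0=0.0, mean1=1.0):
    """Bayes update of P(stream is from F1) after observing one sample x.

    Assumption: F0 = N(mean0, 1) and F1 = N(mean1, 1).
    """
    f0 = gaussian_pdf(x, mean0)
    f1 = gaussian_pdf(x, mean1)
    return pi * f1 / (pi * f1 + (1.0 - pi) * f0)


def next_prior(pi, eps0, eps1):
    """Prior that the next stream is from F1, given belief pi about the current one.

    Assumption: a two-state Markov kernel with
    eps1 = P(next is F1 | current is F1), eps0 = P(next is F1 | current is F0).
    """
    return pi * eps1 + (1.0 - pi) * eps0


def threshold_search(streams, deadline, eps0, eps1, prior, lower, upper):
    """Heuristic search: sample the current stream, declare it if the posterior
    reaches `upper`, move to the next stream if it falls to `lower`, and stop
    once `deadline` samples have been drawn.

    `streams` is a list of zero-argument callables, each returning one sample.
    Returns the index of the declared stream, or None if the deadline expires.
    """
    idx, pi = 0, prior
    for _ in range(deadline):
        if idx >= len(streams):
            break
        x = streams[idx]()
        pi = posterior_update(pi, x)
        if pi >= upper:
            return idx
        if pi <= lower:
            # Carry what was learned about this stream into the next prior.
            pi = next_prior(pi, eps0, eps1)
            idx += 1
    return None
```

Setting `eps0 == eps1` makes `next_prior` ignore the current belief, which recovers a search over independent streams. Comparing the two cases on simulated data is one way to see how dependency information can shorten the search.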