Mid-level vision refers to a collection of visual processes and their associated principles, including contour grouping, region grouping, and figure/ground organization. Evidence from psychophysics strongly suggests that mid-level vision is a key component of the visual system and interacts heavily with object and scene recognition. We develop a unified probabilistic framework for mid-level vision, both motivated by and evaluated on human-annotated collections of natural images.
We use human-marked boundaries in natural images as ground-truth data. Empirical studies of these boundaries reveal a number of power-law distributions, confirming the intuition that natural images are multi-scale, or nearly scale-invariant, in nature. We show that pixel-based Markov models fail to capture such invariance.
We propose a scale-invariant mid-level representation built from the bottom up. We detect edges in an image, build a discrete piecewise linear approximation of the edges, and finally use constrained Delaunay triangulation (CDT) to complete gaps that have no gradient evidence and to partition the image into regions. We show that the CDT graph is a compact representation with little loss of structure.
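The piecewise linear approximation step can be illustrated with a minimal sketch. The abstract does not say which approximation algorithm is used, so the recursive splitting rule below (in the style of Ramer-Douglas-Peucker) is an assumption, as are the function names and the `tolerance` parameter. Edge detection and the CDT step are not shown.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def approximate_polyline(points, tolerance):
    """Approximate an ordered contour by a polyline (hypothetical helper).

    Splits recursively at the point farthest from the chord joining the
    endpoints until every point lies within `tolerance` of its chord.
    """
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    worst_index, worst_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > worst_dist:
            worst_index, worst_dist = i, d
    if worst_dist <= tolerance:
        return [first, last]
    left = approximate_polyline(points[:worst_index + 1], tolerance)
    right = approximate_polyline(points[worst_index:], tolerance)
    # The split point ends `left` and starts `right`; keep it once.
    return left[:-1] + right


# An L-shaped pixel chain collapses to its two straight segments.
contour = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
vertices = approximate_polyline(contour, tolerance=0.5)
```

In a pipeline like the one described, the resulting line segments would be passed as constraints to a CDT routine, which then adds the remaining triangulation edges across the gaps.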
On top of the CDT representation, we formulate mid-level vision as contour and region labeling problems. We use conditional random fields (CRF) to capture interactions between contours, junctions and regions. Efficient inference on the CRF is done with loopy belief propagation, and maximum likelihood parameters are obtained through gradient descent.
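To make the inference step concrete, here is a minimal sketch of sum-product loopy belief propagation on a generic pairwise model. It is not the model described above: the potentials, the graph, and the names `loopy_bp`, `unary`, and `pairwise` are illustrative assumptions, and parameter learning is not shown.

```python
import numpy as np


def loopy_bp(unary, edges, pairwise, iterations=50):
    """Approximate node marginals of a pairwise model (illustrative sketch).

    unary:    (n, k) array of nonnegative node potentials.
    edges:    list of undirected edges (i, j).
    pairwise: dict mapping (i, j) to a (k, k) array indexed [x_i, x_j].
    """
    unary = np.asarray(unary, dtype=float)
    n, k = unary.shape
    neighbors = {i: [] for i in range(n)}
    messages = {}  # messages[(i, j)]: message from i to j over states of j
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
        messages[(i, j)] = np.full(k, 1.0 / k)
        messages[(j, i)] = np.full(k, 1.0 / k)

    for _ in range(iterations):
        new_messages = {}
        for (i, j) in messages:
            # Combine the evidence at i with all incoming messages except j's.
            incoming = unary[i].copy()
            for h in neighbors[i]:
                if h != j:
                    incoming *= messages[(h, i)]
            psi = pairwise[(i, j)] if (i, j) in pairwise else pairwise[(j, i)].T
            msg = psi.T @ incoming  # sum over the states of node i
            new_messages[(i, j)] = msg / msg.sum()
        messages = new_messages  # synchronous update

    beliefs = unary.copy()
    for i in range(n):
        for h in neighbors[i]:
            beliefs[i] *= messages[(h, i)]
    return beliefs / beliefs.sum(axis=1, keepdims=True)


# Three binary nodes in a chain, with potentials that favor equal labels.
unary = np.array([[0.7, 0.3], [0.5, 0.5], [0.2, 0.8]])
edges = [(0, 1), (1, 2)]
attract = np.array([[2.0, 1.0], [1.0, 2.0]])
pairwise = {(0, 1): attract, (1, 2): attract}
beliefs = loopy_bp(unary, edges, pairwise)
```

On a tree such as this chain, the beliefs converge to the exact marginals. On graphs with cycles the same updates give only an approximation, which is the sense of "loopy".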
We apply the CDT/CRF framework to various mid-level vision problems, including curvilinear grouping, figure/ground assignment, and figure/ground segmentation. Through extensive experiments on large annotated datasets, we demonstrate quantitatively the effectiveness of mid-level visual cues in natural images: we show that curvilinear grouping improves boundary detection; we show that figure/ground organization is feasible without object knowledge; and we show how low-, mid-, and high-level visual cues can be integrated and systematically analyzed in our framework.