Probabilistic models for mid-level vision

Jitendra Malik; Xiaofeng Ren

2006

Abstract

Mid-level vision refers to a collection of visual processes and their associated principles, including contour grouping, region grouping, and figure/ground organization. Evidence from psychophysics strongly suggests that mid-level vision is a key component of the visual system and interacts heavily with object and scene recognition. We develop a unified probabilistic framework for mid-level vision, both motivated by and evaluated on human-annotated collections of natural images. We use human-marked boundaries in natural images as ground-truth data. Empirical studies of these boundaries reveal a number of power-law distributions, confirming the intuition that natural images are multi-scale, or nearly scale-invariant, in nature. We show that pixel-based Markov models fail to capture such invariance. We propose a bottom-up, scale-invariant mid-level representation: we detect edges in an image, build a discrete piecewise linear approximation of the edges, and finally use constrained Delaunay triangulation (CDT) to complete gradient-less gaps and partition the image into regions. We show that the CDT graph is a compact representation with little loss of structure. On top of the CDT representation, we formulate mid-level vision as contour and region labeling problems. We use conditional random fields (CRFs) to capture interactions between contours, junctions, and regions. Efficient inference on the CRF is done with loopy belief propagation, and maximum likelihood parameters are obtained through gradient descent. We apply the CDT/CRF framework to various mid-level vision problems, including curvilinear grouping, figure/ground assignment, and figure/ground segmentation.
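The abstract names loopy belief propagation as the inference engine for the CRF but gives no details. The Python below is a minimal sketch of sum-product loopy BP under simplifying assumptions: binary labels (for example, "on"/"off" for each contour), only pairwise potentials with fixed values, and a synchronous message schedule. The thesis's CRF also has junction-level factors and learned feature weights, so this illustrates only the inference step; the function names `loopy_bp` and `exact_marginals` and the toy model are hypothetical. Because a CDT graph contains cycles, BP there is approximate; the brute-force enumerator is included to check it on tiny models (on trees, BP is exact).

```python
import itertools


def loopy_bp(unary, pairwise, n_iters=50, tol=1e-9):
    """Sum-product loopy belief propagation for a pairwise model with binary labels.

    unary:    list of [phi_i(0), phi_i(1)] strictly positive node potentials
    pairwise: dict mapping each undirected edge (i, j), listed once, to a
              2x2 table psi[x_i][x_j] of strictly positive edge potentials
    Returns approximate marginals [b_i(0), b_i(1)] for every node i.
    (Hypothetical sketch, not the thesis implementation.)
    """
    n = len(unary)
    neighbors = {i: [] for i in range(n)}
    for i, j in pairwise:
        neighbors[i].append(j)
        neighbors[j].append(i)

    def psi(i, j, xi, xj):
        # Each table is stored once per edge; read it transposed for the reverse direction.
        if (i, j) in pairwise:
            return pairwise[(i, j)][xi][xj]
        return pairwise[(j, i)][xj][xi]

    # messages[(i, j)][x_j]: message from node i to node j, initialized uniform.
    messages = {(i, j): [1.0, 1.0] for i in range(n) for j in neighbors[i]}

    for _ in range(n_iters):
        new_messages = {}
        for i, j in messages:
            out = []
            for xj in (0, 1):
                total = 0.0
                for xi in (0, 1):
                    # phi_i(x_i) * psi_ij(x_i, x_j) * messages into i from all neighbors except j
                    prod = unary[i][xi] * psi(i, j, xi, xj)
                    for k in neighbors[i]:
                        if k != j:
                            prod *= messages[(k, i)][xi]
                    total += prod
                out.append(total)
            z = out[0] + out[1]
            new_messages[(i, j)] = [out[0] / z, out[1] / z]  # normalize to avoid under/overflow
        # Synchronous schedule: stop once no message changes by more than tol.
        delta = max((abs(new_messages[key][x] - messages[key][x])
                     for key in messages for x in (0, 1)), default=0.0)
        messages = new_messages
        if delta < tol:
            break

    # Belief: local potential times all incoming messages, normalized.
    beliefs = []
    for i in range(n):
        b = list(unary[i])
        for k in neighbors[i]:
            b = [b[x] * messages[(k, i)][x] for x in (0, 1)]
        z = b[0] + b[1]
        beliefs.append([b[0] / z, b[1] / z])
    return beliefs


def exact_marginals(unary, pairwise):
    """Brute-force marginals by enumerating all 2**n labelings (tiny models only)."""
    n = len(unary)
    marg = [[0.0, 0.0] for _ in range(n)]
    for labels in itertools.product((0, 1), repeat=n):
        p = 1.0
        for i, xi in enumerate(labels):
            p *= unary[i][xi]
        for (i, j), table in pairwise.items():
            p *= table[labels[i]][labels[j]]
        for i, xi in enumerate(labels):
            marg[i][xi] += p
    return [[m[0] / (m[0] + m[1]), m[1] / (m[0] + m[1])] for m in marg]


if __name__ == "__main__":
    # Hypothetical toy model: four contour fragments on a closed loop (a cycle,
    # so BP is approximate). Fragment 0 has strong local edge evidence; the
    # pairwise term rewards neighboring fragments whose labels agree, a crude
    # stand-in for continuity between contours.
    unary = [[0.2, 0.8], [0.5, 0.5], [0.5, 0.5], [0.6, 0.4]]
    agree = [[2.0, 1.0], [1.0, 2.0]]
    cycle = {(0, 1): agree, (1, 2): agree, (2, 3): agree, (3, 0): agree}
    for name, marg in (("loopy BP", loopy_bp(unary, cycle)),
                       ("exact", exact_marginals(unary, cycle))):
        print(name, [round(m[1], 3) for m in marg])
```

On a tree-structured model the two functions agree up to rounding; on the four-node loop above, the loopy BP marginals are only approximations of the exact ones, which is the price paid for tractable inference on large graphs with cycles.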
Through extensive experiments on large annotated datasets, we demonstrate quantitatively the effectiveness of mid-level visual cues in natural images: curvilinear grouping improves boundary detection, figure/ground organization is feasible without object knowledge, and low-, mid-, and high-level visual cues can be integrated and systematically analyzed in our framework.

