The Manhattan World Assumption: Regularities in scene statistics which enable Bayesian inference.

James M. Coughlan
Smith-Kettlewell Eye Research Inst.
2318 Fillmore St.
San Francisco, CA 94115
coughlan@ski.org

A.L. Yuille
Smith-Kettlewell Eye Research Inst.
2318 Fillmore St.
San Francisco, CA 94115
yuille@ski.org

Abstract

Preliminary work by the authors made use of the so-called “Manhattan world” assumption about the scene statistics of city and indoor scenes. This assumption stated that such scenes were built on a Cartesian grid, which led to regularities in the image edge gradient statistics. In this paper we explore the general applicability of this assumption and show that, surprisingly, it holds in a large variety of less structured environments including rural scenes. This enables us, from a single image, to determine the orientation of the viewer relative to the scene structure and also to detect target objects which are not aligned with the grid. These inferences are performed using a Bayesian model with probability distributions (e.g. on the image gradient statistics) learnt from real data.

1 Introduction

In recent years, there has been growing interest in the statistics of natural images (see Huang and Mumford [4] for a recent review). Our focus, however, is on the discovery of scene statistics which are useful for solving visual inference problems.

For example, in related work [5] we have analyzed the statistics of filter responses on and off edges and hence derived effective edge detectors.

In this paper we present results on statistical regularities of the image gradient responses as a function of the global scene structure. This builds on preliminary work [2] on city and indoor scenes. This work observed that such scenes are based on a Cartesian coordinate system which puts (probabilistic) constraints on the image gradient statistics.

Our current work shows that this so-called “Manhattan world” assumption about the scene statistics applies far more generally than to urban scenes. Many rural scenes contain sufficient structure in the distribution of edges to provide a natural Cartesian reference frame for the viewer. The viewer’s orientation relative to this frame can be determined by Bayesian inference. In addition, certain structures in the scene stand out by being unaligned to this natural reference frame. In our theory such structures appear as “outlier” edges, which makes them easier to detect. Informal evidence that human observers use a form of the Manhattan world assumption is provided by the Ames room illusion, see figure (6), where the observers appear to erroneously make this assumption, thereby grotesquely distorting the sizes of objects in the room.

2 Previous Work and Three-Dimensional Geometry

Our preliminary work on city scenes was presented in [2]. There is related work in computer vision for the detection of vanishing points in 3-d scenes [1], [6] (which proceeds through the stages of edge detection, grouping by Hough transforms, and finally the estimation of the geometry).

We refer the reader to [3] for details on the geometry of the Manhattan world and report only the main results here. Briefly, we calculate expressions for the orientations of x, y, z lines imaged under perspective projection in terms of the orientation of the camera relative to the x, y, z axes. The camera orientation relative to the xyz axis system may be specified by three Euler angles: the azimuth (or compass angle) α, corresponding to rotation about the z axis, the elevation β above the xy plane, and the twist γ about the camera’s line of sight. We use Ψ = (α, β, γ) to denote all three Euler angles of the camera orientation. Our previous work [2] assumed that the elevation and twist were both zero which turned out to be invalid for many of the images presented in this paper.

We can then compute the normal orientation of lines parallel to the x, y, z axes, measured in the image plane, as a function of film coordinates $(u, v)$ and the camera orientation $\Psi$. We express the results in terms of orthogonal unit camera axes $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$, which are aligned to the body of the camera and are determined by $\Psi$. For x lines (see Figure 1, left panel) we have $\tan\theta_x = -(u c_x + f a_x)/(v c_x + f b_x)$, where $\theta_x$ is the normal orientation of the x line at film coordinates $(u, v)$ and $f$ is the focal length of the camera. Similarly, $\tan\theta_y = -(u c_y + f a_y)/(v c_y + f b_y)$ for y lines and $\tan\theta_z = -(u c_z + f a_z)/(v c_z + f b_z)$ for z lines. In the next section we will see how to relate the normal orientation of an object boundary (such as x, y, z lines) at a point $(u, v)$ to the magnitude and direction of the image gradient at that location.
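The three tangent formulas above share one form, so they can be evaluated by a single routine. The following is a minimal sketch, not the authors' implementation: the function name `normal_orientation` and the convention of passing the camera axes as 3-tuples of (x, y, z) components are assumptions made for illustration, and the derivation of the axes from the Euler angles (given in [3]) is not reproduced here.

```python
import math

def normal_orientation(u, v, f, a, b, c, axis):
    """Normal orientation (radians, modulo pi) of the image of a line
    parallel to scene axis `axis` (0 = x, 1 = y, 2 = z) at film point (u, v).

    a, b, c are the unit camera axes given as (x, y, z) components, so
    a[0] plays the role of a_x, c[2] the role of c_z, and so on.
    """
    num = -(u * c[axis] + f * a[axis])
    den = v * c[axis] + f * b[axis]
    # atan2 copes with den == 0; an orientation is only defined modulo pi.
    return math.atan2(num, den) % math.pi

# Illustrative choice: camera axes equal to the scene x, y, z axes.
a, b, c = (1, 0, 0), (0, 1, 0), (0, 0, 1)
theta_x = normal_orientation(0.0, 0.0, 1.0, a, b, c, axis=0)  # pi / 2
theta_y = normal_orientation(0.0, 0.0, 1.0, a, b, c, axis=1)  # 0
```

At a vanishing point both numerator and denominator vanish and the orientation is undefined; Python's `math.atan2(0, 0)` returns 0 there rather than raising, so a caller may want to treat that case separately.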


Figure 1: (Left) Geometry of an x line projected onto the (u, v) image plane. θ is the normal orientation of the line in the image. (Right) Histogram of edge orientation error (displayed modulo 180°). Observe the strong peak at 0°, indicating that the image gradient direction at an edge is usually very close to the true normal orientation of the edge.

3 $P_{on}$ and $P_{off}$: Characterizing Edges Statistically

Since we do not know where the x, y, z lines are in the image, we have to infer their locations and orientations from image gradient information. This inference is done using a purely local statistical model of edges. A key element of our approach is that it allows the model to infer camera orientation without having to group pixels into x, y, z lines. Most grouping procedures rely on the use of binary edge maps which often make premature decisions based on too little information. Edge detection is particularly difficult here because some of the images are of poor quality (underexposed or overexposed) and some lack x, y, z lines that are long enough to group reliably.





Following work by Konishi et al [5], we determine distributions $P_{on}(E_u)$ and $P_{off}(E_u)$ for the image gradient magnitude $E_u$ at position $\mathbf{u}$ in the image, conditioned on whether we are on or off an edge. These distributions quantify the tendency for the image gradient to be high on object boundaries and low off them, see Figure 2. They were learned by Konishi et al for the Sowerby image database, which contains one hundred presegmented images.


We extend the work of Konishi et al by putting probability distributions on how accurately the image gradient direction estimates the true normal direction of the edge. These were learned for this dataset by measuring the true orientations of the edges and comparing them to those estimated from the image gradients.

This gives us distributions on the magnitude and direction of the intensity gradient, $P_{on}(\mathbf{E}_u | \theta)$ and $P_{off}(\mathbf{E}_u)$, where $\mathbf{E}_u = (E_u, \phi_u)$, $\theta$ is the true normal orientation of the edge, and $\phi_u$ is the gradient direction measured at point $\mathbf{u} = (u, v)$. We make a factorization assumption that $P_{on}(\mathbf{E}_u | \theta) = P_{on}(E_u) P_{ang}(\phi_u - \theta)$ and $P_{off}(\mathbf{E}_u) = P_{off}(E_u) U(\phi_u)$. $P_{ang}(\cdot)$ (with argument evaluated modulo $2\pi$ and normalized to 1 over the range 0 to $2\pi$) is based on experimental data, see Figure 1 (right), and is peaked about 0 and $\pi$. In practice, we use a simple box-shaped function to model the distribution: $P_{ang}(\delta\theta) = (1 - \epsilon)/4\tau$ if $\delta\theta$ is within angle $\tau$ of 0 or $\pi$, and $\epsilon/(2\pi - 4\tau)$ otherwise (i.e. the chance of an angular error greater than $\pm\tau$ is $\epsilon$).

In our experiments $\epsilon = 0.1$ and $\tau = 4°$ for indoors and $6°$ outdoors. By contrast, $U(\cdot) = 1/2\pi$ is the uniform distribution.
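The box-shaped angular model is simple enough to write down directly. The sketch below is illustrative only: the function name `p_ang` and the default arguments (the outdoor setting, with the error probability set to 0.1 and the half-width to 6 degrees) are choices made here, not code from the paper.

```python
import math

def p_ang(delta, eps=0.1, tau=math.radians(6.0)):
    """Box-shaped density for the angular error `delta` (radians).

    The density is (1 - eps) / (4 * tau) within `tau` of 0 or pi and
    eps / (2 * pi - 4 * tau) elsewhere, so it integrates to 1 over [0, 2*pi).
    """
    d = delta % math.pi              # distance to {0, pi} is periodic in pi
    dist = min(d, math.pi - d)
    if dist <= tau:
        return (1.0 - eps) / (4.0 * tau)
    return eps / (2.0 * math.pi - 4.0 * tau)

# Uniform density over [0, 2*pi), used for off-edge and outlier pixels.
U = 1.0 / (2.0 * math.pi)
```

The two peaks, each of width twice the half-width, carry total mass of one minus the error probability, and the remaining arc carries the error probability itself, which is how the normalization in the text works out.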

4 Bayesian Model

We devised a Bayesian model which combines knowledge of the three-dimensional geometry of the Manhattan world with statistical knowledge of edges in images. The model assumes that, while the majority of pixels in the image convey no information about camera orientation, most of the pixels with high edge responses arise from the presence of x, y, z lines in the three-dimensional scene. An important feature of the Bayesian model is that it does not force us to decide prematurely which pixels are on and off an object boundary (or whether an on pixel is due to x, y, or z), but allows us to sum over all possible interpretations of each pixel.

The image data $\mathbf{E}_u$ at a single pixel $\mathbf{u}$ is explained by one of five models $m_u$: $m_u = 1, 2, 3$ mean the data is generated by an edge due to an x, y, z line, respectively, in the scene; $m_u = 4$ means the data is generated by an outlier edge (not due to an x, y, z line); and $m_u = 5$ means the pixel is off-edge. The prior probability $P(m_u)$ of each of the edge models was estimated empirically to be 0.02, 0.02, 0.02, 0.04, 0.9 for $m_u = 1, 2, \ldots, 5$.

Using the factorization assumption mentioned before, we assume the probability of the image data $\mathbf{E}_u$ has two factors, one for the magnitude of the edge strength and another for the edge direction:

$P(\mathbf{E}_u | m_u, \Psi, \mathbf{u}) = P(E_u | m_u) \, P(\phi_u | m_u, \Psi, \mathbf{u})$ (1)

where $P(E_u | m_u)$ equals $P_{off}(E_u)$ if $m_u = 5$ or $P_{on}(E_u)$ if $m_u \neq 5$. Also, $P(\phi_u | m_u, \Psi, \mathbf{u})$ equals $P_{ang}(\phi_u - \theta(\Psi, m_u, \mathbf{u}))$ if $m_u = 1, 2, 3$ or $U(\phi_u)$ if $m_u = 4, 5$.

Here $\theta(\Psi, m_u, \mathbf{u})$ is the predicted normal orientation of lines determined by the equation $\tan\theta_x = -(u c_x + f a_x)/(v c_x + f b_x)$ for x lines, $\tan\theta_y = -(u c_y + f a_y)/(v c_y + f b_y)$ for y lines, and $\tan\theta_z = -(u c_z + f a_z)/(v c_z + f b_z)$ for z lines.

In summary, the edge strength probability is modeled by $P_{on}$ for models 1 through 4 and by $P_{off}$ for model 5. For models 1, 2 and 3 the edge orientation is modeled by a distribution which is peaked about the appropriate orientation of an x, y, z line predicted by the camera orientation at pixel location $\mathbf{u}$; for models 4 and 5 the edge orientation is assumed to be uniformly distributed from 0 through $2\pi$.

Rather than decide on a particular model at each pixel, we marginalize over all five possible models (i.e. creating a mixture model):

$P(\mathbf{E}_u | \Psi, \mathbf{u}) = \sum_{m_u=1}^{5} P(\mathbf{E}_u | m_u, \Psi, \mathbf{u}) \, P(m_u)$ (2)
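The per-pixel mixture can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the learned magnitude distributions are not available here, so their values at the pixel are passed in as plain numbers (`p_on`, `p_off`), the three predicted orientations are passed in as `thetas`, and all function and parameter names are hypothetical. The priors and the box-shaped angular model are those stated in the text.

```python
import math

# Prior P(m_u) for m_u = 1..5, as stated in the text.
PRIORS = (0.02, 0.02, 0.02, 0.04, 0.9)

def p_ang(delta, eps=0.1, tau=math.radians(6.0)):
    """Box-shaped angular error density, peaked about 0 and pi."""
    d = delta % math.pi
    if min(d, math.pi - d) <= tau:
        return (1.0 - eps) / (4.0 * tau)
    return eps / (2.0 * math.pi - 4.0 * tau)

def pixel_likelihood(p_on, p_off, phi, thetas, priors=PRIORS):
    """Mixture likelihood of one pixel's gradient data given the camera orientation.

    p_on, p_off: values of the on-edge and off-edge magnitude densities
                 at this pixel's gradient magnitude (supplied by the caller)
    phi:         measured gradient direction at the pixel
    thetas:      predicted normal orientations (theta_x, theta_y, theta_z)
    """
    uniform = 1.0 / (2.0 * math.pi)
    total = 0.0
    for m in range(3):                       # models 1-3: x, y, z lines
        total += p_on * p_ang(phi - thetas[m]) * priors[m]
    total += p_on * uniform * priors[3]      # model 4: outlier edge
    total += p_off * uniform * priors[4]     # model 5: off edge
    return total
```

A strong edge whose direction matches one of the predicted orientations yields a larger value than the same edge at an unaligned direction, which is what lets the sum over pixels discriminate between candidate camera orientations.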

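To combine the evidence over all pixels, denoted by $\{\mathbf{E}_u\}$, we assume that the image data is conditionally independent across pixels given the camera orientation: $P(\{\mathbf{E}_u\} | \Psi) = \prod_u P(\mathbf{E}_u | \Psi, \mathbf{u})$.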

(Although the conditional independence assumption neglects the coupling of gradients at neighboring pixels, it is a useful approximation that makes the model computationally tractable.) Thus the posterior distribution on the camera orientation is given by $\prod_u P(\mathbf{E}_u | \Psi, \mathbf{u}) P(\Psi)/Z$, where $Z$ is a normalization factor and $P(\Psi)$ is a uniform prior on the camera orientation.

To find the MAP (maximum a posteriori) estimate, our algorithm maximizes the log posterior term $\log[P(\{\mathbf{E}_u\} | \Psi) P(\Psi)] = \log P(\Psi) + \sum_u \log[\sum_{m_u} P(\mathbf{E}_u | m_u, \Psi, \mathbf{u}) P(m_u)]$ numerically by searching over a quantized set of compass directions $\Psi$ in a certain range. For details on this procedure, as well as coarse-to-fine techniques for speeding up the search, see [3].
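The search over a quantized set of orientations amounts to evaluating the log posterior at each candidate and keeping the best. The sketch below shows only that outer loop and is not the procedure of [3]: the names `log_posterior`, `map_estimate` and `toy_likelihood` are hypothetical, the coarse-to-fine speed-ups are omitted, and the toy per-pixel likelihood uses a single scalar angle in place of the full orientation and geometry.

```python
import math

def log_posterior(psi, pixels, pixel_likelihood, log_prior=None):
    """log P(psi) plus the sum over pixels of the log per-pixel likelihood."""
    lp = 0.0 if log_prior is None else log_prior(psi)   # uniform prior: constant
    return lp + sum(math.log(pixel_likelihood(px, psi)) for px in pixels)

def map_estimate(candidates, pixels, pixel_likelihood, log_prior=None):
    """Exhaustive search over a quantized set of candidate orientations."""
    return max(candidates,
               key=lambda psi: log_posterior(psi, pixels, pixel_likelihood, log_prior))

# Toy stand-in for the per-pixel mixture: each pixel is just a gradient
# direction, and the predicted orientation is taken to be psi itself.
def toy_likelihood(phi, psi, tau=math.radians(6.0)):
    d = (phi - psi) % math.pi
    aligned = min(d, math.pi - d) <= tau
    return 0.9 / (2.0 * math.pi) + 0.1 * (2.0 if aligned else 0.02)

candidates = [math.radians(k) for k in range(0, 180, 10)]
pixels = [math.radians(40.0)] * 20 + [math.radians(100.0)] * 5
best = map_estimate(candidates, pixels, toy_likelihood)   # the 40-degree candidate
```

In a fuller version each candidate would be a triple of Euler angles and the per-pixel likelihood would be the five-model mixture evaluated with the orientations predicted from the camera geometry.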

5 Experimental Results

This section presents results on the domains for which the viewer orientation relative to the scene can be detected using the Manhattan world assumption. In particular, we demonstrate results for: (I) indoor and outdoor scenes (as reported in [2]), (II) rural English road scenes, (III) rural English fields, (IV) a painting of the French countryside, (V) a field of broccoli in the American mid-west, (VI) the Ames room, and (VII) ruins of the Parthenon (in Athens). The results show strong success for inference using the Manhattan world assumption even for domains in which it might seem unlikely to apply. (Some examples of failure are given in [3]; for example, for a helicopter in a hilly scene the algorithm mistakenly interprets the hill silhouettes as horizontal lines.)

The first set of images were of city and indoor scenes in San Francisco with images taken by the second author [2]. We include four typical results, see figure 3, for comparison with the results on other domains.

Figure 3: Estimates of the camera orientation obtained by our algorithm for two indoor scenes (left) and two outdoor scenes (right). The estimated orientations of the x, y lines, derived for the estimated camera orientation Ψ, are indicated by the black line segments drawn on the input image. (The z line orientations have been omitted for clarity.) At each point on a subgrid two such segments are drawn – one for x and one for y. In the image on the far left, observe how the x directions align with the wall on the right hand side and with features parallel to this wall. The y lines align with the wall on the left (and objects parallel to it).

We now extend this work to less structured scenes in the English countryside. Figure (4) shows two images of roads in rural scenes and two fields. These images come from the Sowerby database. The next three images were either downloaded from the web or digitized (the painting). These are the mid-west broccoli field, the Parthenon ruins, and the painting of the French countryside.

6 Detecting Objects in Manhattan world

We now consider applying the Manhattan assumption to the alternative problem of detecting target objects in background clutter. To perform such a task effectively requires modelling the properties of the background clutter in addition to those of the target object. It has recently been appreciated that good statistical modelling of the image background can improve the performance of target recognition [7].

Figure 4: Results on rural images in England without strong Manhattan structure. Same conventions as before. Two images of roads in the countryside (left panels) and two images of fields (right panels).

Figure 5: Results on an American mid-west broccoli field, the ruins of the Parthenon, and a digitized painting of the French countryside.

The Manhattan world assumption gives an alternative way of probabilistically modelling background clutter. The background clutter will correspond to the regular structure of buildings and roads and its edges will be aligned to the Manhattan grid. The target object, however, is assumed to be unaligned (at least, in part) to this grid. Therefore many of the edges of the target object will be assigned to model 4 by the algorithm. (Note the algorithm first finds the MAP estimate $\Psi^*$ of the compass orientation, see section (4), and then estimates the model by doing MAP of $P(m_u | \mathbf{E}_u, \Psi^*, \mathbf{u})$ to estimate $m_u$ for each pixel $\mathbf{u}$.) This enables us to significantly simplify the detection task by removing all edges in the images except those assigned to model 4.
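The per-pixel model assignment described above can be sketched as an argmax over the five joint terms, since the posterior over models is proportional to likelihood times prior. As before this is a hedged illustration: `map_model` and `is_outlier` are hypothetical names, the magnitude density values `p_on` and `p_off` and the predicted orientations `thetas` (computed at the estimated camera orientation) are assumed to be supplied by the caller, and the numbers in the usage lines are made up.

```python
import math

# Prior P(m_u) for m_u = 1..5, as stated in the text.
PRIORS = (0.02, 0.02, 0.02, 0.04, 0.9)

def p_ang(delta, eps=0.1, tau=math.radians(6.0)):
    """Box-shaped angular error density, peaked about 0 and pi."""
    d = delta % math.pi
    if min(d, math.pi - d) <= tau:
        return (1.0 - eps) / (4.0 * tau)
    return eps / (2.0 * math.pi - 4.0 * tau)

def map_model(p_on, p_off, phi, thetas, priors=PRIORS):
    """Most probable model index (1..5) for one pixel.

    The posterior over models is proportional to likelihood times prior,
    so the shared normalizing constant can be ignored in the argmax.
    """
    uniform = 1.0 / (2.0 * math.pi)
    joint = [p_on * p_ang(phi - t) * pr for t, pr in zip(thetas, priors[:3])]
    joint.append(p_on * uniform * priors[3])    # model 4: outlier edge
    joint.append(p_off * uniform * priors[4])   # model 5: off edge
    return 1 + max(range(5), key=joint.__getitem__)

def is_outlier(p_on, p_off, phi, thetas):
    """True when the pixel is assigned to model 4 (edge unaligned with the grid)."""
    return map_model(p_on, p_off, phi, thetas) == 4

thetas = (0.0, math.pi / 2, math.pi / 4)        # made-up predicted orientations
aligned_edge = map_model(0.2, 0.001, 0.0, thetas)     # model 1 (x line)
unaligned_edge = map_model(0.2, 0.001, 1.2, thetas)   # model 4 (outlier)
```

Keeping only the pixels for which `is_outlier` is true corresponds to removing all edges except those assigned to model 4.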


