Automatic Extraction of Generic House Roofs
from High Resolution Aerial Imagery
Frank Bignone, Olof Henricsson, Pascal Fua+ and Markus Stricker
Communications Technology Laboratory
Swiss Federal Institute of Technology ETH
CH-8092 Zurich, Switzerland
+ SRI International, Menlo Park, CA 94025, USA
Abstract. We present a technique to extract complex suburban roofs
from sets of aerial images. Because we combine 2-D edge information,
photometric and chromatic attributes, and 3-D information, we can deal with complex houses. We neither assume the roofs to be flat or rectilinear, nor do we require parameterized building models. From only one image, 2-D edges and their corresponding attributes and relations are extracted. Using a segment stereo matcher based on all available images, the 3-D locations of these edges are computed. The 3-D segments are then grouped into planes and 2-D enclosures are extracted, thereby allowing us to infer adjoining 3-D patches describing roofs of houses. To achieve this, we have developed a hierarchical procedure that effectively pools the information while keeping the combinatorics under control. Of particular importance is the tight coupling of 2-D and 3-D analysis.
1 Introduction
The extraction of instances of 3-D models of buildings and other man-made objects is currently a very active research area and an issue of high importance to many users of geo-information systems, including urban planners, geographers, and architects.
Here, we present an approach to extract complex suburban roofs from sets of aerial images. Such roofs can neither be assumed to be flat nor to have simple rectangular shapes. In fact, their edges may not even form ninety-degree angles. They do tend, however, to lie on planes. This specific problem is a typical example of the general Image Understanding task of extracting instances of generic object classes that are too complex to be handled by purely image-based approaches and for which no specific template exists.
Because low-level methods typically fail to extract all relevant features and often find spurious ones, existing approaches use models to constrain the problem [15]. Traditional approaches rely almost exclusively on the use of edge-based features and their 2-D or 3-D geometry. Although 3-D information alleviates the problem, instantiating the models is combinatorially explosive. This difficulty is typically handled by using very constrained models, such as flat rectilinear roofs or a parameterized building model, to reduce the size of the search space.
We acknowledge the support given to this research by ETH under project 13-1993-4.
These models may be appropriate for industrial buildings with flat roofs and perpendicular walls, but not for the complicated suburban houses that can be found in scenes such as the one in Fig. 1.
It has been shown, however, that combining photometric and chromatic region attributes with edges leads to vastly improved results over the use of either alone [6, 11]. The houses of Fig. 1 require more flexible models than the standard ones. We define a very generic roof primitive: we take it to be a 3-D patch that is roughly planar and encloses a compact polygonal area with consistent chromatic and luminance attributes. We therefore propose an approach that combines 2-D and 3-D edge geometry with region attributes. This is not easy to implement because the complexity of the approach is likely to increase rapidly with the number of information sources. Furthermore, these sources of information should be as robust as possible, but none of them can be expected to be error-free, and this must be taken into account by the data-fusion mechanism.
Figure 1. Two of the four registered 1800 × 1800 images that are part of our residential dataset (courtesy of the Institute of Photogrammetry and Geodesy at ETH Zurich).
To solve this problem, we have developed a procedure that relies on hierarchical hypothesis generation, see Fig. 2. The procedure starts with a multi-image coverage of a site, extracts 2-D edges from a source image, and computes the corresponding photometric and chromatic attributes and their similarity relationships. Using both geometry and photometry, it then computes the 3-D locations of these edges and groups them into infinite planes. In addition, 2-D enclosures are extracted and combined with the 3-D planes into instances of our roof primitive, that is, 3-D patches. All extracted hypotheses of 3-D patches are ranked according to their geometric quality. Finally, the best set of mutually consistent 3-D patches is retained, thus defining a scene parse. This procedure has proven powerful enough that, in contrast to other approaches to generic roof extraction (e.g. [14, 6, 4, 13, 7, 12]), we need not assume the roofs to be flat or rectilinear, nor use a parameterized building model.
Note that, even though geometric regularity is the key to the recognition of man-made structures, imposing constraints that are too tight, such as requiring that edges on a roof form ninety-degree angles, would prevent the detection of many structures that do not satisfy them perfectly. Conversely, constraints that are too loose lead to combinatorial explosion. Here we avoid both problems by working in 2-D and 3-D, grouping only edges that satisfy loose coplanarity constraints, weak 2-D geometric constraints, and similarity constraints on their photometric and chromatic attributes. None of these constraints is very tight but, because we pool a lot of information from multiple images, we are able to retain only valid object candidates.
Figure 2. Our hierarchical framework, a feed-forward scheme in which several components of the 2-D framework mutually exchange data and aggregates with the 3-D modules.
We view the contribution of our approach as the ability to robustly combine information derived from edges, photometric and chromatic area properties, geometry and stereo, to generate well organized 3-D data structures describing complex objects while keeping the combinatorics under control. Of particular importance is the tight coupling of 2-D and 3-D analysis.
For our experiments, we use a state-of-the-art dataset produced by the Institute of Geodesy and Photogrammetry at ETH Zurich. It consists of a residential and an industrial scene with the following characteristics: 1:5,000 image-scale vertical aerial photography, four-way image overlap, color imagery, geometrically accurate film scanning with 15-micron pixel size, precise sensor orientation, and accurate ground truth including a DTM and manually measured building CAD models. The latter are important for quantitatively evaluating our results.
Our hierarchical parsing procedure is depicted in Fig. 2. Below we describe each of its components: 2-D edge extraction, computation of photometric and chromatic attributes, definition of similarity relationships among 2-D contours, 3-D edge matching and coplanar grouping, extraction of 2-D enclosures, and finally, generation and selection of candidate 3-D object models. Last, we present and discuss our results.
2 Attributed Contours and their Relations
2.1 Edge Detection and Edgel Aggregation
Our approach is based on grouping contour segments. The presented work does not require a particular edge detector; however, we believe it is wise to use the best operator available to obtain the best possible results. For this reason we use the SE energy operator (suppression and enhancement) recently presented in [8]. The operator produces a more accurate representation of edges and lines in images of outdoor scenes than traditional edge detectors, owing to its superior handling of interferences between edges and lines, for example at sharp corners.
The edge and line pixels are then aggregated into coherent contour segments using the algorithm described in [10]. The result is a graph representation of contours and vertices, as shown in Fig. 3B. Each contour has geometric attributes such as its coarse shape, that is, straight, curved, or closed.
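As an illustration, a coarse-shape attribute of this kind can be computed from a contour's chord-to-arc-length ratio. The sketch below is our own minimal reconstruction; the class name, tolerances, and the exact classification rule are assumptions, not the authors' implementation:

```python
from dataclasses import dataclass
import math


@dataclass
class Contour:
    points: list  # ordered (x, y) edgel positions

    def shape(self, straight_tol=0.98, close_tol=2.0):
        """Coarse shape label: 'closed', 'straight', or 'curved'.

        A contour is 'closed' when its endpoints (nearly) coincide,
        'straight' when the endpoint chord is almost as long as the
        arc length, and 'curved' otherwise.
        """
        p, q = self.points[0], self.points[-1]
        chord = math.dist(p, q)
        arc = sum(math.dist(a, b) for a, b in zip(self.points, self.points[1:]))
        if chord < close_tol and arc > close_tol:
            return "closed"
        if arc > 0 and chord / arc >= straight_tol:
            return "straight"
        return "curved"
```

In a contour graph, each such `Contour` would additionally carry references to its two vertices; only the shape attribute is sketched here.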
Figure 3. (A) A 350 × 350 cut-out from the dataset in Fig. 1; (B) the resulting attributed graph with all its contours and vertices; (C) the flanking regions with their corresponding median luminance attributes. The background is black.
2.2 Photometric and Chromatic Contour Attributes
The contour graph contains only basic information about geometry and connectivity. To increase its usefulness, image attributes are assigned to each contour and vertex. The attributes reflect either properties along the actual contour (e.g. integrated gradient magnitude) or region properties on either side, such as chromatic or photometric homogeneity.
Since we are dealing with fairly straight contours, the construction of the flanking regions is particularly simple. A flanking region is constructed by translating the original contour in the direction of its normal; we define one flanking region on each side of the contour. When neighboring contours interfere with the constructed region, a truncation mechanism is applied. In Fig. 3C we display all flanking regions. For more details we refer to [9].
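For a straight contour with endpoints p0 and p1, the translation construction can be sketched as follows. The width parameter and the four-corner rectangle representation are illustrative assumptions, and the truncation mechanism of [9] is omitted:

```python
import numpy as np


def flanking_regions(p0, p1, width=3.0):
    """Return the two rectangles obtained by translating the segment
    p0-p1 along its unit normal, one on each side of the contour.

    Each rectangle is given as a 4x2 array of corners:
    [p0, p1, p1 + offset, p0 + offset].
    """
    p0, p1 = np.asarray(p0, dtype=float), np.asarray(p1, dtype=float)
    d = p1 - p0
    n = np.array([-d[1], d[0]]) / np.linalg.norm(d)  # unit normal
    regions = []
    for side in (+1.0, -1.0):
        off = side * width * n
        regions.append(np.array([p0, p1, p1 + off, p0 + off]))
    return regions
```

The pixels inside each rectangle would then be sampled to compute the region's photometric and chromatic attributes.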
To establish robust photometric and chromatic properties of the flanking regions, we need a color model that accurately represents colors under a variety of illumination conditions. We chose to work with HVC color spaces since they separate the luminant and chromatic components of color. The photometric attributes are computed by analyzing the value component, whereas the chromatic attributes are derived from the hue and chroma components. As the underlying color space we use the CIE(L*a*b*) color space because of its sound psychophysical foundation; it was created to measure perceptual color differences [16].
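The hue and chroma components follow directly from the a* and b* coordinates (hue as the angle and chroma as the radius in the a*-b* plane), with L* serving as the value component. A minimal sketch of this HVC-style decomposition (the function name is ours):

```python
import math


def lab_to_hvc(L, a, b):
    """Split CIE(L*a*b*) into an HVC-style triple:
    hue angle in degrees, value (lightness L*), and chroma
    (radial distance in the a*-b* plane)."""
    chroma = math.hypot(a, b)
    hue = math.degrees(math.atan2(b, a)) % 360.0
    return hue, L, chroma
```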
Since each flanking region is assumed to be fairly homogeneous (due to the way it is constructed), the data points contained in each region tend to concentrate in a small region of the color space. As we deal with images of aerial scenes, where disturbances like chimneys, bushes, shadows, or regular roof texture are likely to fall within the defined regions, the computation of region properties must take outliers into account. Following the approach in [11], we represent photometric attributes by the median luminance and the interquartile range (IQR), see Fig. 3C. The chromatic region properties are computed analogously from the CIE(a*b*) components and are represented by the center of the chromatic cluster and the corresponding spreads.
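The robustness of the median/IQR summary against such disturbances can be seen in a short sketch (the function name is ours; a small bright "chimney" cluster leaves the statistics unchanged):

```python
import numpy as np


def robust_region_stats(values):
    """Median and interquartile range of the samples in one flanking
    region: outlier-resistant summaries of luminance (or a*/b*)."""
    v = np.asarray(values, dtype=float)
    q1, med, q3 = np.percentile(v, [25, 50, 75])
    return med, q3 - q1
```

With eight roof pixels of luminance 10 and two bright outliers (200, 250), the median stays at 10 and the IQR at 0, whereas the mean would be pulled to about 53.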
2.3 Contour Similarity Relations
Although geometric regularity is a major component in the recognition of man-made structures, neglecting other sources of information that corroborate the relatedness among straight contours imposes unnecessary restrictions on the approach. We propose to form a measure that relates contours based on similarity in position, orientation, and photometric and chromatic properties.
For each straight contour segment we define two directional contours pointing in opposite directions. Two such directional contours form a contour relation with a defined logical interior. For each contour relation we compute four scores based on similarity in luminance, chromaticity, proximity, and orientation, and combine them into a single similarity score by summation.
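The paper does not give the individual score functions, so the following combination is only a sketch under assumptions of ours: each term is an exponential or cosine kernel in [0, 1], the field names (`lum`, `ab`, `mid`, `theta`) are illustrative, and the sigmas are made-up scale parameters:

```python
import math


def similarity_score(c1, c2, s_lum=30.0, s_chr=10.0, s_pos=20.0):
    """Sum of four similarity terms (luminance, chromaticity, proximity,
    orientation) for two directed contours; each term lies in [0, 1],
    so identical contours score 4.0."""
    lum = math.exp(-abs(c1["lum"] - c2["lum"]) / s_lum)
    chrom = math.exp(-math.dist(c1["ab"], c2["ab"]) / s_chr)
    pos = math.exp(-math.dist(c1["mid"], c2["mid"]) / s_pos)
    d = abs(c1["theta"] - c2["theta"]) % math.pi
    ori = math.cos(min(d, math.pi - d))  # 1 for parallel contours
    return lum + chrom + pos + ori
```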
Three consecutive selection procedures are applied, retaining only the best non-conflicting interpretations. The first selection involves only two contours (resp. four directional contours) and aims at reducing the eight possible interpretations to at most four. The second selection procedure removes shortcuts among three directed contours. The final selection is highly data-driven and aims at reducing the number of contour relations from each directed contour to only the locally best ones. All three selection procedures are based on analysis of the contour similarity scores. Due to lack of space, we refer to [11] for more details.
3 Segment Stereo Matching
Many methods for edge-based stereo matching rely on extracting straight 2-D edges from images and then matching them [1]. These methods, although fast and reliable, have one drawback: if an edge extracted from one image is occluded or only partially defined in one of the other images, it may not be matched.
In outdoor scenes, this happens often, for example when shadows cut edges.
Another class of methods [2] consists of moving a template along the epipolar line to find correspondences. It is much closer to correlation-based stereo and avoids the problem described above. We propose a variant of the latter approach for segment matching that can cope with noise and ambiguities. Edges are extracted from only one image (the source image) and are matched in the other images by maximizing an "edginess measure" along the epipolar line. The source image is the nadir (most top-down) image because it is assumed to contain few (if any) self-occluded roof parts. Geometric and photometric constraints are used to reduce the number of 3-D interpretations of each 2-D edge. We outline this approach below and refer the interested reader to [3] for further details.
The edginess measure f integrates, along a virtual segment, the image gradient magnitude weighted by the agreement between the local gradient orientation and the segment direction, where G(r) is the image gradient at r and φ(r) its orientation. The function f is maximal when the virtual segment lies on a straight edge and decreases quickly with any translation or rotation. Further, f can be large even if the edge is only partially visible in the image, that is, occluded or broken.
The search for the most likely counterparts of the source edge now reduces to finding the maxima of f by discretizing the segment's position along the epipolar line and its orientation, and performing a 2-D search.
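Under illustrative assumptions of ours — the epipolar line is taken as horizontal, and edginess weights gradient magnitude by a cos 2Δθ orientation-agreement term (the paper's exact functional is not reproduced in this excerpt) — the discretized 2-D search can be sketched as:

```python
import numpy as np


def edginess(grad_mag, grad_ori, p0, p1, n=20):
    """Mean gradient magnitude along the virtual segment p0-p1, weighted
    by agreement between the local gradient orientation and the segment
    normal (cos 2*delta, so the sign of the gradient does not matter)."""
    t = np.linspace(0.0, 1.0, n)
    xs = np.round(p0[0] + t * (p1[0] - p0[0])).astype(int)
    ys = np.round(p0[1] + t * (p1[1] - p0[1])).astype(int)
    normal = np.arctan2(p1[1] - p0[1], p1[0] - p0[0]) + np.pi / 2
    agree = np.cos(2.0 * (grad_ori[ys, xs] - normal))
    return float(np.mean(grad_mag[ys, xs] * agree))


def best_match(grad_mag, grad_ori, p0, p1, shifts, angles):
    """2-D search: discretize the shift along the (here, horizontal)
    epipolar line and a small rotation about the segment midpoint, and
    keep the maximum-edginess pose."""
    mid = np.array([(p0[0] + p1[0]) / 2.0, (p0[1] + p1[1]) / 2.0])
    half = np.array([(p1[0] - p0[0]) / 2.0, (p1[1] - p0[1]) / 2.0])
    best_score, best_pose = -np.inf, None
    for s in shifts:
        for a in angles:
            c, sn = np.cos(a), np.sin(a)
            h = np.array([c * half[0] - sn * half[1],
                          sn * half[0] + c * half[1]])
            m = mid + np.array([s, 0.0])
            score = edginess(grad_mag, grad_ori, m - h, m + h)
            if score > best_score:
                best_score, best_pose = score, (s, a)
    return best_pose, best_score
```

On a synthetic gradient image containing one vertical edge, the search recovers the shift that moves the virtual segment onto that edge.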
In the presence of parallel structures, the edginess typically has several maxima that cannot be distinguished using only two images. However, using more than two images, we can reduce the number of matches and only keep the very best by checking for consistency across image pairs.
We can further reduce the hypothesis set by using the photometric edge attributes of Section 2.2, after photometric equalization of the images. We compute the 2-D projections of each candidate 3-D edge into all the images. The image photometry in areas that pertain to at least one side of the 2-D edges should be similar across images. Figure 4 shows all matched 3-D segments as well as the manually measured CAD model for the house in Fig. 3A.
4 Coplanar Grouping of 3-D Segments
To group 3-D segments into infinite planes, we propose a simple method that accounts for outliers in the data. It proceeds in two steps:
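The excerpt cuts off before the two steps are given. As a generic illustration of outlier-tolerant coplanar grouping — a RANSAC-style plane fit of our own devising, not the authors' procedure — one can repeatedly fit a plane to sampled segment endpoints and keep the largest consensus set:

```python
import numpy as np


def group_coplanar(segments, tol=0.5, iters=200, seed=0):
    """RANSAC-style grouping: fit planes to sampled segment endpoints and
    return the indices of the largest set of segments whose endpoints all
    lie within `tol` of one plane.  `segments` is an (n, 2, 3) collection
    of 3-D endpoint pairs."""
    rng = np.random.default_rng(seed)
    pts = np.asarray(segments, dtype=float)        # shape (n, 2, 3)
    best = np.array([], dtype=int)
    for _ in range(iters):
        i, j = rng.choice(len(pts), size=2, replace=False)
        a, b = pts[i]                              # both endpoints of segment i
        c = pts[j][0]                              # one endpoint of segment j
        n = np.cross(b - a, c - a)
        norm = np.linalg.norm(n)
        if norm < 1e-9:                            # degenerate (collinear) sample
            continue
        n /= norm
        d = n @ a                                  # plane: n . x = d
        dist = np.abs(pts @ n - d)                 # per-endpoint distances, (n, 2)
        inliers = np.where((dist < tol).all(axis=1))[0]
        if len(inliers) > len(best):
            best = inliers
    return best
```

Requiring both endpoints to be near the plane is what makes whole segments, rather than isolated points, the unit of grouping.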