FREE ELECTRONIC LIBRARY - Dissertations, online materials

Pages:   || 2 | 3 | 4 | 5 |   ...   | 11 |

«Image Alignment and Stitching: A Tutorial Richard Szeliski1 1 Microsoft Research, USA, szeliski Abstract This tutorial reviews image ...»

-- [ Page 1 ] --

Foundations and Trends R in

Computer Graphics and Vision

Vol. 2, No 1 (2006) 1–104

c 2006 R. Szeliski

DOI: 10.1561/0600000009

Image Alignment and Stitching: A Tutorial

Richard Szeliski1


Microsoft Research, USA, szeliski@microsoft.com


This tutorial reviews image alignment and image stitching algorithms.

Image alignment algorithms can discover the correspondence relation-

ships among images with varying degrees of overlap. They are ide-

ally suited for applications such as video stabilization, summarization, and the creation of panoramic mosaics. Image stitching algorithms take the alignment estimates produced by such registration algorithms and blend the images in a seamless manner, taking care to deal with potential problems such as blurring or ghosting caused by parallax and scene movement as well as varying image exposures. This tutorial reviews the basic motion models underlying alignment and stitching algorithms, describes effective direct (pixel-based) and feature-based alignment algorithms, and describes blending algorithms used to pro- duce seamless mosaics. It ends with a discussion of open research prob- lems in the area.

1 Introduction Algorithms for aligning images and stitching them into seamless photo- mosaics are among the oldest and most widely used in computer vision.

Frame-rate image alignment is used in every camcorder that has an “image stabilization” feature. Image stitching algorithms create the high-resolution photo-mosaics used to produce today’s digital maps and satellite photos. They also come bundled with most digital cameras currently being sold, and can be used to create beautiful ultra wide- angle panoramas.

An early example of a widely used image registration algorithm is the patch-based translational alignment (optical flow) technique developed by Lucas and Kanade [123]. Variants of this algorithm are used in almost all motion-compensated video compression schemes such as MPEG and H.263 [113]. Similar parametric motion estima- tion algorithms have found a wide variety of applications, including video summarization [20,93,111,203], video stabilization [81], and video compression [95,114]. More sophisticated image registration algorithms have also been developed for medical imaging and remote sensing – see [29, 71, 226] for some previous surveys of image registration techniques.

1 2 Introduction In the photogrammetry community, more manually intensive meth- ods based on surveyed ground control points or manually registered tie points have long been used to register aerial photos into large-scale photo-mosaics [181]. One of the key advances in this community was the development of bundle adjustment algorithms that could simultane- ously solve for the locations of all of the camera positions, thus yielding globally consistent solutions [207]. One of the recurring problems in creating photo-mosaics is the elimination of visible seams, for which a variety of techniques have been developed over the years [1,50,135,136,148].

In film photography, special cameras were developed at the turn of the century to take ultra wide-angle panoramas, often by exposing the film through a vertical slit as the camera rotated on its axis [131]. In the mid-1990s, image alignment techniques were started being applied to the construction of wide-angle seamless panoramas from regular hand-held cameras [43, 124, 193, 194]. More recent work in this area has addressed the need to compute globally consistent alignments [167, 178, 199], the removal of “ghosts” due to parallax and object movement [1, 50, 178, 210], and dealing with varying exposures [1, 116, 124, 210]. (A collection of some of these papers can be found in [19].) These techniques have spawned a large number of commercial stitching products [43, 168], for which reviews and comparison can be found on the Web.

While most of the above techniques work by directly minimizing pixel-to-pixel dissimilarities, a different class of algorithms works by extracting a sparse set of features and then matching these to each other [7, 30, 35, 38, 129, 227]. Feature-based approaches have the advantage of being more robust against scene movement and are potentially faster, if implemented the right way. Their biggest advantage, however, is the ability to “recognize panoramas,” i.e., to automatically discover the adjacency (overlap) relationships among an unordered set of images, which makes them ideally suited for fully automated stitching of panoramas taken by casual users [30].

What, then, are the essential problems in image alignment and stitching? For image alignment, we must first determine the appropriate mathematical model relating pixel coordinates in one image to pixel coordinates in another. Section 2 reviews these basic motion 3 models. Next, we must somehow estimate the correct alignments relating various pairs (or collections) of images. Section 3 discusses how direct pixel-to-pixel comparisons combined with gradient descent (and other optimization techniques) can be used to estimate these parameters. Section 4 discusses how distinctive features can be found in each image and then efficiently matched to rapidly establish correspondences between pairs of images. When multiple images exist in a panorama, techniques must be developed to compute a globally consistent set of alignments and to efficiently discover which images overlap one another.

These issues are discussed in Section 5.

For image stitching, we must first choose a final compositing surface onto which to warp and place all of the aligned images (Section 6).

We also need to develop algorithms to seamlessly blend overlapping images, even in the presence of parallax, lens distortion, scene motion, and exposure differences (Section 6). In the last section of this survey, additional applications of image stitching and open research problems were discussed.

2 Motion Models Before we can register and align images, we need to establish the mathematical relationships that map pixel coordinates from one image to another. A variety of such parametric motion models are possible, from simple 2D transforms, to planar perspective models, 3D camera rotations, lens distortions, and the mapping to nonplanar (e.g., cylindrical) surfaces [194].

To facilitate working with images at different resolutions, we adopt a variant of the normalized device coordinates used in computer graphics [145, 216]. For a typical (rectangular) image or video frame, we let the pixel coordinates range from [−1, 1] along the longer axis, and [−a, a] along the shorter, where a is the inverse of the aspect ratio, as shown in Figure 2.1.1 For an image with width W and height H, the equations mapping integer pixel coordinates x = (x, y) to normalized device coordinates x = (x, y) are 2x − W 2y − H x= and y =, where S = max(W, H). (2.1) S S 1 Incomputer graphics, it is usual to have both axes range from [−1, 1], but this requires the use of two different focal lengths for the vertical and horizontal dimensions, and makes it more awkward to handle mixed portrait and landscape mode images.

–  –  –

Fig. 2.1 Mapping from pixel coordinates to normalized device coordinates.

Note that if we work with images in a pyramid, we need to halve the S value after each decimation step rather than recomputing it from max(W, H), since the (W, H) values may get rounded off or truncated in an unpredictable manner. Note that for the rest of this paper, we use normalized device coordinates when referring to pixel coordinates.

2.1 2D (planar) Motions Having defined our coordinate system, we can now describe how coordinates are transformed. The simplest transformations occur in the 2D plane and are illustrated in Figure 2.2.

–  –  –

Table 2.1 Hierarchy of 2D coordinate transformations.

The 2 × 3 matrices are extended with a third [0T 1] row to form a full 3 × 3 matrix for homogeneous coordinate transformations.

Name Matrix Number of d.o.f. Preserves Icon

–  –  –

Hierarchy of 2D Transformations The preceding set of transformations are illustrated in Figure 2.2 and summarized in Table 2.1. The easiest way to think of these is as a set of (potentially restricted) 3 × 3 matrices operating on 2D homogeneous coordinate vectors. Hartley and Zisserman [86] contains a more detailed description of the hierarchy of 2D planar transformations.

The above transformations form a nested set of groups, i.e., they are closed under composition and have an inverse that is a member of the same group. Each (simpler) group is a subset of the more complex group below it.

2.2 3D Transformations A similar nested hierarchy exists for 3D coordinate transformations that can be denoted using 4 × 4 transformation matrices, with 3D equivalents to translation, rigid body (Euclidean) and affine transformations, and homographies (sometimes called collineations) [86].

The process of central projection maps 3D coordinates p = (X, Y, Z) to 2D coordinates x = (x, y, 1) through a pinhole at the camera origin onto a 2D projection plane a distance f along the z axis,

–  –  –

Fig. 2.3 Central projection, showing the relationship between the 3D and 2D coordinates p and x, as well as the relationship between the focal length f and the field of view θ.

–  –  –

where K = diag(f, f, 1) is called the intrinsic calibration matrix.2 This matrix can be replaced by a more general upper-triangular matrix K that accounts for nonsquare pixels, skew, and a variable optic center location [86]. However, in practice, the simple focal length scaling used above provides high-quality results when stitching images from regular cameras.

In this paper, I prefer to use a 4 × 4 projection matrix, P, K 0 x∼ p = P p, ˜ (2.13) 0T 1 which maps the homogeneous 4-vector p = (X, Y, Z, 1) to a special kind of homogeneous screen vector x = (x, y, 1, d). This allows me to denote ˜ the upper-left 3 × 3 portion of the projection matrix P as K (making it compatible with the computer vision literature), while not dropping altogether the inverse screen depth information d (which is also sometimes called the disparity d [144]). This latter quantity is necessary to reason about mappings between images of a 3D scene, as described below.

What happens when we take two images of a 3D scene from different camera positions and/or orientations (Figure 2.4a)? A 3D point p gets mapped to an image coordinate x0 in camera 0 through the ˜

–  –  –

2 The last column of K usually contains the optical center (cx, cy ), but this can be set to zero if we use normalized device coordinates.

10 Motion Models combination of a 3D rigid-body (Euclidean) motion E 0,

–  –  –

Unfortunately, we do not usually have access to the depth coordinates of pixels in a regular photographic image. However, for a planar scene, we can replace the last row of P 0 in (2.13) with a general plane equation, n0 · p + c0 that maps points on the plane to d0 = 0 values ˆ (Figure 2.4b). Then, if we set d0 = 0, we can ignore the last column of M 10 in (2.17) and also its last row, since we do not care about the final z-buffer depth. The mapping equation (2.17) thus reduces to

–  –  –

3 For points off the reference plane, we get out-of-plane parallax motion, which is why this representation is often called the plane plus parallax representation [110, 165, 196].

4 Note that for a single pair of images, the fact that a 3D plane is being viewed by a set

–  –  –

Fig. 2.5 Pure 3D camera rotation. The form of the homography (mapping) is particularly simple and depends only on the 3D rotation matrix and focal lengths.

Rotational Panoramas The more interesting case is when the camera undergoes pure rotation (Figure 2.5), which is equivalent to assuming all points are very far from the camera, i.e., on the plane at infinity.

Setting t0 = t1 = 0, we get the simplified 3 × 3 homography

–  –  –

each image is intrinsically more stable than estimating a full 8-d.o.f.

homography, which makes this the method of choice for large-scale image stitching algorithms [30, 178, 199].

Parametrizing 3D Rotations If we are going to represent panoramas using a combination of rotations and focal lengths, what is the

best way to represent the rotations themselves? The choices include:

–  –  –

2.3 Cylindrical and Spherical Coordinates An alternative to using homographies or 3D motions to align images is to first warp the images into cylindrical coordinates and to then use a pure translational model to align them [43, 193]. Unfortunately, this only works if the images are all taken with a level camera or with a known tilt angle.

Assume for now that the camera is in its canonical position, i.e., its rotation matrix is the identity, R = I, so that the optic axis is aligned with the z axis and the y axis is aligned vertically. The 3D ray corresponding to an (x, y) pixel is therefore (x, y, f ).

We wish to project this image onto a cylindrical surface of unit radius [193]. Points on this surface are parameterized by an angle θ and a height h, with the 3D cylindrical coordinates corresponding to (θ, h) given by

–  –  –

Fig. 2.7 An example of a cylindrical panorama: (a) two cylindrically warped images related by a horizontal translation; (b) part of a cylindrical panorama composited from a sequence of images.

related by a pure horizontal translation.6 This makes it attractive as an initial class project in an introductory computer vision course, since the full complexity of the perspective alignment algorithm (Subsections 3.5 and 4.3) can be avoided. Figure 2.7 shows how two cylindrically warped images from a leveled rotational panorama are related by a pure translation [199].

Pages:   || 2 | 3 | 4 | 5 |   ...   | 11 |

Similar works:

«Journal of Consciousness Studies www.imprint-academic.com/jcs Jana M. Iverson and Esther Thelen Hand, Mouth and Brain The Dynamic Emergence of Speech and Gesture Introduction The past fifteen years have seen a resurgence of interest in ideas of embodiment, the claim that bodily experiences play an integral role in human cognition (e.g., Clark, 1997; Johnson, 1987; Sheets-Johnstone, 1990; Varela et al., 1991). The notion that mind arises from having a body that interacts with the environment in...»

«JNCC Report No. 374 The numbers of inshore waterbirds using the Greater Thames during the non-breeding season; an assessment of the area’s potential for qualification as a marine SPA Andy Webb, Ben J. Dean, Susan H. O’Brien, Ilka Söhle, Claire McSorley, James B. Reid, Peter A. Cranswick, Lucy E. Smith and Colette Hall May 2009 © JNCC, Peterborough 2009 ISSN 0963 8901 The numbers of inshore waterbirds using the Greater Thames during the non-breeding season; an assessment of the area’s...»

«Obra reproducida sin responsabilidad editorial EL PÚBLICO Federico García Lorca Advertencia de Luarna Ediciones Este es un libro de dominio público en tanto que los derechos de autor, según la legislación española han caducado. Luarna lo presenta aquí como un obsequio a sus clientes, dejando claro que: La edición no está supervisada por nuestro departamento editorial, de forma que no nos responsabilizamos de la fidelidad del contenido del mismo. 1) Luarna sólo ha adaptado la obra para...»

«CAPACITY DEVELOPMENT FOR SUSTAINABLE HUMAN DEVELOPMENT: CONCEPTUAL AND OPERATIONAL SIGNPOSTS I. INTRODUCTION 1. What is new? Or different? The SHD mission of UNDP has major implications for the way UNDP does capacity development. SHD is not something outsiders can do for people; it is something they must do for themselves. Developing people's capacities for sustainable human development is thus one of the most effective ways of fostering SHD. But capacity development for SHD to be effective,...»

«Welding Simulations of Aluminum Alloy Joints by Finite Element Analysis Justin D. Francis Thesis submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial fulfillment of the requirements for the degree of Master of Science in Aerospace Engineering Dr. Eric Johnson, Chair Dr. Rakesh Kapania Dr. Zafer Gurdal Dr. Tom-James Stoumbos April 2002 Blacksburg, Virginia Keywords: weld simulation, GMAW, aluminum, finite element analysis Welding Simulations of Aluminum...»

«Media,http://mcs.sagepub.com/ Culture & Society Nation-branding and transnational consumption: Japan-mania and the Korean wave in Taiwan Shuling Huang Media Culture Society 2011 33: 3 DOI: 10.1177/0163443710379670 The online version of this article can be found at: http://mcs.sagepub.com/content/33/1/3 Published by: http://www.sagepublications.com Additional services and information for Media, Culture & Society can be found at: Email Alerts: http://mcs.sagepub.com/cgi/alerts Subscriptions:...»

«A Thesis Submitted for the Degree of Doctor of Area Studies Politics of Electoral Reform in Thailand タイにおける選挙制度改革の政治 Siripan Nogsuan Sawasdee November 2014 Doctor of Area Studies A Thesis Submitted for the Degree of Doctor of Area Studies Politics of Electoral Reform in Thailand タイにおける選挙制度改革の政治 Siripan Nogsuan Sawasdee สิ ริพรรณ นกสวน สวัสดี Graduate School of Asian and African Area Studies Kyoto...»

«Unaccompanied Resident Brochure 2011 RAF Alconbury Unaccompanied Housing Building 652 01480 843115 1 DSN: 268 3115 Welcome to RAF Alconbury Unaccompanied Housing (UH) Welcome to RAF Alconbury! We are proud of our base and sincerely hope your stay will be both pleasant and rewarding. This brochure was developed to familiarize you with RAF Alconbury dormitory standards. Your knowledge and compliance with these standards will make your stay pleasant and more comfortable. When you have many people...»

«FLIGHT ATTENDANT MEDICAL RESEARCH INSTITUTE PUBLICATIONS AND PRESENTATIONS 2002-2013 Table of Contents Table of Contents 2013 Publications 2013 Presentations and Abstracts 2013 Book Chapters, etc., 2013 2012 Publications 2012 Presentations and Abstracts 2012 Book Chapters, etc., 2012 2011 Publications 2011 Presentations and Abstracts 2011 Book Chapters, etc., 2011 2010 Publications 2010 Presentations and Abstracts 2010 Book Chapters, etc., 2010 2009 Publications 2009 Presentations and Abstracts...»

«Faculty Handbook Revision Dates: 08/2014, 07/2013, 5/13, 8/12, 8.11, 8.08 Office of the Provost This document is updated periodically for corrections or changes. The most current version of this document is noted in the date which appears in the footer section. For questions or concerns, please contact the Office of the Provost: 928-523-2230. To access an electronic copy of this handbook, please visit the Provost’s web site: http://nau.edu/provost/ For additional information related to...»

«Final Project Report Great Lakes Fish and Wildlife Restoration Act Project Title: The importance of the larval stage to cisco recruitment variation in the Great Lakes Project Sponsor: Chippewa Ottawa Resource Authority FWS Agreement Numbers: 30181-8-G018 and 30181-8-G046 Principal Investigators: Mark Ebener, Dan Yule, Randy Claramunt, Ken Cullis, Michael Jones, Seth Moore, Henry Quinlan, Ed Roseman, Steve Schram, Don Schreiner, Shawn Sitar, Jason Stockwell Final Report Authors: Jared Myers,...»

«The Hollywood Oscars are a Modern Reenactment of an Ancient Egyptian Sun-god Festival by Jeremy James From time to time we like to expose the extent to which this world is run by people who hate Christianity and who worship the gods of ancient paganism, especially the gods of ancient Egypt. An earlier paper examined the role of witchcraft in Hollywood, with particular reference to an episode of the television series, Two and a Half Men. Last year, we published a detailed study (over 200 pages)...»

<<  HOME   |    CONTACTS
2016 www.dissertation.xlibx.info - Dissertations, online materials

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.