Three-Dimensional Face Recognition ∗
Alexander M. Bronstein, Michael M. Bronstein and Ron Kimmel
Department of Computer Science
Technion – Israel Institute of Technology, Haifa 32000, Israel
First version: May 18, 2004; Second version: December 10, 2004.
An expression-invariant 3D face recognition approach is presented. Our basic assumption is that facial expressions can be modelled as isometries of the facial surface. This allows us to construct expression-invariant representations of faces using the canonical forms approach. The result is an efficient and accurate face recognition algorithm, robust to facial expressions, that can distinguish between identical twins (the first two authors). We demonstrate a prototype system based on the proposed algorithm and compare its performance to classical face recognition methods.
The numerical methods employed by our approach do not require the facial surface explicitly. The surface gradient field, or equivalently the surface metric, is sufficient for constructing the expression-invariant representation of any given face. This allows us to perform the 3D face recognition task while avoiding the surface reconstruction stage.
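The canonical forms construction referred to above can be illustrated with classical multidimensional scaling: given the matrix of pairwise geodesic distances between surface samples, an embedding into a low-dimensional Euclidean space is obtained from the eigendecomposition of the double-centered squared-distance matrix. The following NumPy sketch is our own illustration of this generic MDS step, not the paper's implementation; the function name and the toy planar example are assumptions for demonstration only.

```python
import numpy as np

def canonical_form(D, dim=3):
    """Embed points with pairwise (geodesic) distance matrix D (n x n)
    into R^dim by classical multidimensional scaling (MDS).
    The embedding depends only on the intrinsic metric, so isometric
    deformations of the surface map to (nearly) congruent embeddings."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered squared distances
    w, V = np.linalg.eigh(B)                   # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:dim]            # keep the top-dim eigenpairs
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

# toy check: four corners of a unit square; geodesic = Euclidean here,
# so the canonical form reproduces the configuration up to a rigid motion
X = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Y = canonical_form(D, dim=2)
D2 = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
```

On a curved surface the input distances would be geodesic rather than Euclidean, computed, for example, by fast marching directly from the surface metric.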
Keywords: expression-invariant 3D face recognition, isometry invariant, facial expressions, multidimensional scaling.
1. Introduction

Automatic face recognition has been traditionally associated with the fields of computer vision and pattern recognition. Face recognition is considered a natural, non-intimidating, and widely accepted biometric identification method (Ashbourn, 2002; Ortega-Garcia et al., 2004). As such, it has the potential of becoming the leading biometric technology.
Unfortunately, it is also one of the most diﬃcult pattern recognition problems. So far, all existing solutions provide only partial, and usually unsatisfactory, answers to the market needs.
In the context of face recognition, it is common to distinguish between the problem of authentication and that of recognition. In the first case, the enrolled individual (probe) claims the identity of a person whose template is stored in the database (gallery). We refer to the data used for a specific recognition task as a template. The face recognition algorithm needs to compare a given face with a given template and verify their equivalence. Such a setup (one-to-one matching) can occur when biometric technology is used to secure financial transactions, for example, in an automatic teller machine (ATM). In this case, the user is usually assumed to be collaborative.

∗ This research was partially supported by the Dvorah Fund of the Technion, the Bar Nir Bergreen Software Technology Center of Excellence and the Technion V.P.R. Fund - E. and J. Bishop Research Fund.

c 2004 Kluwer Academic Publishers. Printed in the Netherlands.
The second case is more difficult. Recognition implies that the probe subject should be compared with all the templates stored in the gallery database. The face recognition algorithm should then match a given face with one of the individuals in the database. Finding a terrorist in a crowd (one-to-many matching) is one such application. Needless to say, no collaboration can be assumed in this case. At the current technological level, one-to-many face recognition with non-collaborative users is practically unsolvable: one who intentionally wishes not to be recognized can always deceive any face recognition technology. In the following, we will assume collaborative users.
Even collaborative users in a natural environment present high variability of their faces – due to natural factors beyond our control. The greatest diﬃculty of face recognition, compared to other biometrics, stems from the immense variability of the human face. The facial appearance depends heavily on environmental factors, for example, the lighting conditions, background scene and head pose. It also depends on facial hair, the use of cosmetics, jewelry and piercing. Last but not least, plastic surgery or long-term processes like aging and weight gain can have a signiﬁcant inﬂuence on facial appearance.
Yet, much of the facial appearance variability is inherent to the face itself. Even if we hypothetically assume that external factors do not exist, for example, that the facial image is always acquired under the same illumination, pose, and with the same haircut and make up, still, the variability in a facial image due to facial expressions may be even greater than a change in the person’s identity (see Figure 1).
Figure 1. Face recognition with varying lighting, head pose, and facial expression is a non-trivial task.
1.1. Two-dimensional face recognition: Invariant versus generative approaches

Trying to make face recognition algorithms insensitive to illumination, head pose, and the other factors mentioned above is one of the main efforts of current research in the field. Broadly speaking, there are two alternatives in approaching this problem. One is to find features that are not affected by the viewing conditions; we call this the invariant approach. Early face recognition algorithms advocated the invariant approach by finding a set of fiducial points such as eyes, nose, mouth, etc. and comparing their geometric relations (feature-based recognition) (Bledsoe, 1966; Kanade, 1973; Goldstein et al., 1971) or comparing the face to a whole facial template (template-based recognition) (Brunelli and Poggio, 1993).
It appears, however, that very few reliable fiducial points can be extracted from a 2D facial image in the presence of pose, illumination, and facial expression variability. As a result, feature-based algorithms are forced to use a limited set of points, which provides low discrimination ability between faces (Cox et al., 1996). Likewise, templates used in template matching approaches change due to variation of pose or facial expression (Brunelli and Poggio, 1993). Using elastic graph matching (Wiskott, 1995; Wiskott et al., 1997) as an attempt to account for the deformation of templates due to the flexibility of the facial surface has yielded limited success, since the attributed graph is merely a flat representation of a curved 3D object (Ortega-Garcia et al., 2004).
Appearance-based methods that treat facial images as vectors of a multidimensional Euclidean space and use standard dimensionality reduction techniques to construct a representation of the face (eigenfaces (Turk and Pentland, 1991) and similar approaches (Sirovich and Kirby, 1987; Hallinan, 1994; Pentland et al., 1994)) require accurate registration between facial images. The registration problem brings us back to reliably identifying fiducial points on the facial image independently of the viewing conditions and the internal variability due to facial expressions. As a consequence, appearance-based methods perform well only when the probe image is acquired in conditions similar to those of the gallery image (Georghiades et al., 2001).
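In its simplest eigenfaces form, the appearance-based pipeline discussed above reduces to PCA on vectorized images followed by nearest-neighbour matching in the resulting subspace. The following is a minimal sketch of that generic scheme; random vectors stand in for registered facial images, and all names are illustrative rather than taken from any cited implementation.

```python
import numpy as np

def eigenfaces(images, k):
    """PCA on vectorized facial images ('eigenfaces').
    images: (n_samples, n_pixels) array; returns the mean face and
    the top-k principal directions."""
    mu = images.mean(axis=0)
    A = images - mu
    # SVD avoids forming the large n_pixels x n_pixels covariance matrix
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    return mu, Vt[:k]

def project(image, mu, basis):
    """Coordinates of an image in the PCA subspace."""
    return basis @ (image - mu)

# matching: nearest gallery template in the PCA subspace
rng = np.random.default_rng(0)
gallery = rng.standard_normal((5, 64))        # 5 "faces", 64 "pixels" each
mu, basis = eigenfaces(gallery, k=3)
codes = np.array([project(g, mu, basis) for g in gallery])
probe = gallery[2] + 0.01 * rng.standard_normal(64)  # noisy copy of face 2
d = np.linalg.norm(codes - project(probe, mu, basis), axis=1)
```

The sketch makes the registration requirement concrete: the subtraction of the mean face is meaningful only if corresponding pixels depict corresponding facial locations.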
The second alternative is to generate synthetic images of the face in new, unseen conditions. Generating facial images with new pose and illumination requires some 3D facial surface as an intermediate stage. It is possible to use a generic 3D head model (Huang et al., 2002), or estimate a rough shape of the facial surface from a set of observations (e.g. using photometric stereo (Georghiades et al., 1998;
Georghiades et al., 2001)) in order to synthesize new facial images, and then apply standard face recognition methods like eigenfaces (Sirovich and Kirby, 1987; Turk and Pentland, 1991) to the synthetic images.
Yet, facial expressions appear to be more problematic to synthesize.
Approaches modelling facial expressions as warping of the facial image do not capture the true geometric changes of the facial surface, and are therefore useful mainly for computer graphics applications. That is, the results may look natural, but fail to represent the true nature of the expression.
Figure 2 shows a simple visual experiment that demonstrates the generative approach. We created synthetic faces of Osama Bin Laden (Figure 2c) and George Bush (Figure 2d) in different poses by mapping the respective textures onto the facial surface of another subject (Figure 2a,b). The resulting images are easily recognized as the world's number one terrorist and the forty-third president of the United States, though in both cases the facial geometry belongs to a completely different individual. This is explained by the fact that the human visual system relies mainly on the 2D information of the face to perform recognition.
Figure 2. Simple texture mapping on the same facial surface can completely change the appearance of the 2D facial image and make the same face look like George Bush or Osama Bin Laden.
Simple texture mapping in our experiment allowed us to create naturally-looking faces; yet, the individuality of the subject, concealed in the 3D geometry of his face, was completely lost. This reveals the intrinsic weakness of all 2D face recognition approaches: the face is a 3D object, and using only its 2D projection can be misleading. Exaggerating this example, if one had the ability to draw any face on his facial surface,
he could make himself look essentially like any person and deceive any 2D face recognition method. Practically, even with very modest instruments, makeup specialists in the theater and movie industry can completely change the facial appearance of actors.
1.2. Three-dimensional face recognition

Three-dimensional face recognition is a relatively recent trend that, in some sense, breaks the long tradition of mimicking the human visual recognition system that the 2D methods follow. As evaluations such as the Face Recognition Vendor Test (FRVT) demonstrate in an unarguable manner that the current state of the art in 2D face recognition is insufficient for demanding biometric applications (Phillips et al., 2003), the use of 3D information has become an emerging research direction, in the hope of making face recognition more accurate and robust.
Three-dimensional facial geometry represents the internal anatomical structure of the face rather than its external appearance, which is influenced by environmental factors. As a result, unlike the 2D facial image, the 3D facial surface is insensitive to illumination, head pose (Bowyer et al., 2004), and cosmetics (Mavridis et al., 2001). Moreover, 3D data can be used to produce invariant measures out of the 2D data (for example, given the facial surface, the albedo can be estimated from the 2D reflectance under the assumption of Lambertian reflection).
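The albedo estimate mentioned above can be sketched as follows: under the Lambertian model I = ρ⟨n, l⟩, with unit surface normals n available from the 3D data and a known light direction l, the albedo ρ is recovered per pixel by division. The toy NumPy illustration below assumes a single, known, distant light source and no shadows or specularities; all names are ours.

```python
import numpy as np

def estimate_albedo(intensity, normals, light, eps=1e-6):
    """Per-pixel albedo under the Lambertian model I = rho * <n, l>.
    intensity: (n,) observed brightness; normals: (n, 3) unit surface
    normals from the 3D data; light: (3,) unit light direction.
    Pixels facing away from the light are assigned zero albedo."""
    shading = normals @ light                       # cosine shading term <n, l>
    return np.where(shading > eps,
                    intensity / np.maximum(shading, eps),
                    0.0)

# synthetic check: render with a known albedo, then recover it
n = np.array([[0., 0., 1.], [0., 0.6, 0.8]])
l = np.array([0., 0., 1.])
rho_true = np.array([0.5, 0.9])
I = rho_true * (n @ l)
rho = estimate_albedo(I, n, l)
```

The recovered albedo, unlike the raw intensity image, no longer depends on the pose of the face relative to the light source.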
However, while 2D face recognition uses a conventional camera, 3D face recognition requires a more sophisticated sensor capable of acquiring depth information, usually referred to as a depth or range camera or a 3D scanner. The 3D shape of the face is usually acquired together with a 2D intensity image. This is one of the main disadvantages of 3D methods compared to 2D ones; in particular, it prohibits the use of legacy photo databases, like those maintained by police and special agencies.
Early papers on 3D face recognition revealed the potential hidden in the 3D information rather than presented working algorithms or extensive tests. In one of the first papers on 3D face recognition, Cartoux et al. (1989) approached the problem by finding the plane of bilateral symmetry through the facial range image, and either matching the extracted profile of the face, or using the symmetry plane to compensate for the pose and then matching the whole surface. Similar approaches based on profiles extracted from 3D face data were also described in the follow-up papers by Nagamine et al. (1992), Beumier and Acheroy (1998) and Gordon (1997).
"IJCV - second review - 3".tex; 14/12/2004; 20:14; p.5 6 Michael M. Bronstein, Alexander M. Bronstein, Ron Kimmel Achermann et al. (1997), Hesher et al. (2003), Mavridis et al. (2001), Chang et al. (2003) Tsalakanidou et al. (2003) explored the extension of conventional dimensionality reduction techniques, like Principal Component Analysis (PCA), to range images or combination of intensity and range images. Gordon (1992) proposed representing a facial surface by a feature vector created from local information such as curvature and metric. The author noted that the feature vector is similar for diﬀerent instances of the same face acquired in diﬀerent conditions, except “variation due to expression” (Gordon, 1992).
Lee and Milios (1990) and Tanaka et al. (1998) proposed performing curvature-based segmentation of the range image into convex regions and computing the Extended Gaussian Image (EGI) locally for each region. A different local approach, based on Gabor filters in 2D and point signatures in 3D, was presented by Wang et al. (2002).
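The EGI used by these local approaches is, in essence, a histogram of surface normals on the unit sphere. The following is only a crude illustrative sketch with uniform spherical-coordinate bins and no solid-angle weighting, not the cited authors' implementation.

```python
import numpy as np

def extended_gaussian_image(normals, n_bins=8):
    """Crude EGI: a histogram of unit surface normals over an
    n_bins x n_bins grid in spherical coordinates (polar, azimuth),
    normalized to a probability distribution. Regions of similar
    shape produce similar distributions of normal directions."""
    theta = np.arccos(np.clip(normals[:, 2], -1.0, 1.0))  # polar angle in [0, pi]
    phi = np.arctan2(normals[:, 1], normals[:, 0])        # azimuth in [-pi, pi]
    H, _, _ = np.histogram2d(theta, phi, bins=n_bins,
                             range=[[0., np.pi], [-np.pi, np.pi]])
    return H / H.sum()

# a flat patch (all normals along +z) concentrates in the first polar row
flat = np.tile([0., 0., 1.], (50, 1))
egi = extended_gaussian_image(flat)
```

Two regions can then be compared by any histogram distance, for example the L1 distance between their EGIs.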
Finally, many theoretical works, including Medioni and Waupotitsch (2003) and Achermann and Bunke (2000), as well as some commercial systems, use rigid surface matching in order to perform 3D face recognition. However, facial expressions change the 3D facial surface no less significantly than they change the 2D intensity image; hence, modelling faces as rigid objects is invalid when facial expressions are considered.
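Rigid surface matching of the kind used in the works cited above is typically implemented with some variant of the iterative closest point (ICP) algorithm; the cited papers do not specify their exact procedure, so the following is only a generic sketch: alternate nearest-neighbour correspondence (brute force here) with a least-squares (Kabsch) rigid fit. It is precisely this rigidity assumption that facial expressions violate.

```python
import numpy as np

def best_rigid(P, Q):
    """Least-squares rotation R and translation t with R @ p + t ~ q
    for corresponding rows of P and Q (Kabsch algorithm)."""
    cP, cQ = P.mean(0), Q.mean(0)
    H = (P - cP).T @ (Q - cQ)                  # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cQ - R @ cP

def icp(P, Q, iters=30):
    """Iterative closest point: alternate nearest-neighbour matching
    and rigid fitting; returns a copy of P aligned to Q."""
    X = P.copy()
    for _ in range(iters):
        idx = ((X[:, None, :] - Q[None, :, :]) ** 2).sum(-1).argmin(1)
        R, t = best_rigid(X, Q[idx])
        X = X @ R.T + t
    return X

# synthetic check: Q is a slightly rotated and translated copy of P
g = np.linspace(0., 1., 3)
P = np.array(np.meshgrid(g, g, g)).reshape(3, -1).T   # 27 lattice points
a = 0.05                                              # small rotation about z
R0 = np.array([[np.cos(a), -np.sin(a), 0.],
               [np.sin(a),  np.cos(a), 0.],
               [0., 0., 1.]])
Q = P @ R0.T + np.array([0.1, -0.05, 0.02])
err = np.linalg.norm(icp(P, Q) - Q, axis=1).mean()
```

For deformable surfaces such as expressive faces, the residual of such a rigid fit remains large no matter how the pose is optimized, which motivates the isometric model adopted in this paper.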
We note that the topic of facial expressions in 3D face recognition is scarcely addressed in the literature, which makes it difficult to draw any conclusions about the robustness of the available algorithms. Many of the cited authors mention the problem of facial expressions, yet none of them addresses it explicitly, nor was any of the algorithms, except that of Wang et al. (2002), tested on a database with sufficiently large (if any) variability of facial expressions.
1.3. The 3DFACE approach