Robust Real-Time Eye Detection and
Tracking Under Variable Lighting Conditions
and Various Face Orientations
Zhiwei Zhu a, Qiang Ji b
Department of Electrical, Computer, and Systems Engineering,
Rensselaer Polytechnic Institute
JEC 6219, Troy, NY 12180-3590
Department of Electrical, Computer, and Systems Engineering,
Rensselaer Polytechnic Institute JEC 6044, Troy, NY 12180-3590 Abstract Most eye trackers based on active IR illumination require distinctive bright pupil eﬀect to work well. However, due to a variety of factors such as eye closure, eye occlusion, and external illumination interference, pupils are not bright enough for these methods to work well. This tends to signiﬁcantly limit their scope of application.
In this paper, we present an integrated eye tracker to overcome these limitations.
By combining the latest technologies in appearance-based object recognition and tracking with active IR illumination, our eye tracker can robustly track eyes under variable and realistic lighting conditions and under various face orientations.
In addition, our integrated eye tracker is able to handle occlusion and glasses, and to simultaneously track multiple people at different distances and poses to the camera. Results from extensive experiments show a significant improvement of our technique over existing eye tracking techniques.
Key words: Eye Tracking, Support Vector Machine, Mean Shift, Kalman Filtering

Preprint submitted to Elsevier Science, 5 July 2004

1 Introduction

As one of the salient features of the human face, human eyes play an important role in face detection, face recognition, and facial expression analysis.
Robust non-intrusive eye detection and tracking is a crucial step for vision-based man-machine interaction technology to be widely accepted in common environments such as homes and offices. Eye tracking has also found applications in other areas, including monitoring human vigilance, gaze-contingent smart graphics, and assisting people with disabilities.

The existing work in eye detection and tracking can be classified into two categories: traditional image-based passive approaches and active IR-based approaches. The former detect eyes based on the unique intensity distribution or shape of the eyes; the underlying assumption is that the eyes appear different from the rest of the face in both shape and intensity, so eyes can be detected and tracked by exploiting these differences. The active IR-based approach, on the other hand, exploits the spectral (reflective) properties of pupils under near-IR illumination to produce the bright/dark pupil effect. Eye detection and tracking is then accomplished by detecting and tracking pupils.
The traditional methods can be broadly classified into three categories: template based methods [3–9,8,10,11], appearance based methods [12–14], and feature based methods [15–23]. In the template based methods, a generic eye model, based on the eye shape, is designed first. Template matching is then used to search the image for the eyes. Nixon proposed an approach for accurate measurement of eye spacing using the Hough transform. The eye is modeled by a circle for the iris and a "tailored" ellipse for the sclera boundary.
Their method, however, is time-consuming, needs a high-contrast eye image, and only works with frontal faces. Deformable templates are commonly used [3–5]. First, an eye model is designed, which is allowed to translate, rotate, and deform to fit the best representation of the eye shape in the image. Then, the eye position is obtained through a recursive process in an energy minimization sense. While this method can detect eyes accurately, it requires that the eye model be properly initialized near the eyes. Furthermore, it is computationally expensive, and requires good image contrast for the method to converge correctly.
The appearance based methods [12–14] detect eyes based on their photometric appearance. These methods usually need to collect a large amount of training data, representing the eyes of different subjects, under different face orientations, and under different illumination conditions. These data are used to train a classifier such as a neural network or the Support Vector Machine, and detection is achieved via classification. Pentland et al. extended the eigenface technique to the description and coding of facial features, yielding eigeneyes, eigennoses, and eigenmouths. For eye detection, they extracted appropriate eye templates for training and constructed a principal component projective space called "Eigeneyes". Eye detection is accomplished by comparing a query image with an eye image in the eigeneyes space. Huang et al. also employed the eigeneyes to perform initial eye position detection. Huang et al. presented a method to represent eye images using wavelets and to perform eye detection using an RBF neural network classifier. Reinders et al. proposed several improvements on the neural network based eye detector. The trained neural network eye detector can detect rotated or scaled eyes under different lighting conditions, but it is trained on frontal-view face images only.
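The eigeneyes idea described above can be sketched in a few lines. The following is a minimal illustration, not the cited authors' implementation: a PCA subspace is learned from flattened eye patches, and a query patch is scored by its reconstruction error in that subspace (small for eye-like patches, large otherwise). The function names and the choice of SVD are ours.

```python
import numpy as np

def train_eigeneyes(patches, k=8):
    """Learn a PCA 'eigeneyes' subspace from flattened training eye patches."""
    X = np.asarray(patches, dtype=float)      # (n_samples, n_pixels)
    mean = X.mean(axis=0)
    # SVD of the centered data gives the principal components directly,
    # without forming the covariance matrix.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]                        # mean patch and top-k eigeneyes

def eigeneye_distance(patch, mean, components):
    """Reconstruction error of a patch in the eigeneyes subspace."""
    x = np.asarray(patch, dtype=float).ravel() - mean
    coeffs = components @ x                    # project into eigenspace
    recon = components.T @ coeffs              # back-project
    return np.linalg.norm(x - recon)           # small => eye-like appearance
```

Detection then amounts to scanning candidate windows and accepting those whose distance falls below a threshold, which is essentially how "comparing a query image with an eye image in the eigeneyes space" operates.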
Feature based methods explore the characteristics of the eyes (such as the edge and intensity of the iris, and the color distributions of the sclera and the flesh) to identify distinctive features around the eyes. Kawato et al. proposed a feature based method for eye detection and tracking. Instead of detecting eyes, they propose to detect the point between the two eyes, which the authors believe is more stable and easier to detect than the eyes themselves. Eyes are subsequently detected as two dark parts, symmetrically located on each side of the between-eye point. Feng et al. [8,9] designed a new eye model consisting of six landmarks (eye corner points). Their technique first locates the eye landmarks based on the variance projection function (VPF), and the located landmarks are then employed to guide the eye detection. Experiments show that their method fails if the eye is closed or partially occluded by hair or face orientation. In addition, their technique may mistake eyebrows for eyes. Tian et al. proposed a new method to track the eye and recover the eye parameters. The method requires manually initializing the eye model in the first frame. The eye's inner corner and eyelids are tracked using a modified version of the Lucas-Kanade tracking algorithm. The edge and intensity of the iris are used to extract the shape information of the eye. Their method, however, requires a high-contrast image to detect and track eye corners and to obtain a good edge image.
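The variance projection function mentioned above is simple enough to state concretely. As a rough sketch (our formulation, following the general idea rather than the cited paper's exact definition), the VPF along one direction is the per-row or per-column intensity variance; its peaks mark high-contrast structures such as the iris and eye corners:

```python
import numpy as np

def variance_projection(img, axis=1):
    """Variance Projection Function: intensity variance of each row
    (axis=1) or each column (axis=0). Uniform regions give ~0; rows or
    columns crossing high-contrast structures (iris, corners) give peaks."""
    return np.asarray(img, dtype=float).var(axis=axis)
```

Landmark localization then reduces to finding peaks of this 1-D signal, which is far cheaper than 2-D template matching, but, as noted, degrades when the eye is closed or occluded.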
In summary, the traditional image based eye tracking approaches detect and track the eyes by exploiting the eyes' differences in appearance and shape from the rest of the face. The special characteristics of the eye, such as the dark pupil, white sclera, circular iris, eye corners, and eye shape, are utilized to distinguish the human eye from other objects. But due to eye closure, eye occlusion, variability in scale and location, different lighting conditions, and face orientations, these differences often diminish or even disappear. Wavelet filtering [25,26] has been commonly used in computer vision to reduce illumination effects by removing subbands sensitive to illumination change, but it only works under slight illumination variation, and illumination variation for eye tracking applications can be significant. Hence, the eye image will not look much different in appearance or shape from the rest of the face, and the traditional image based approaches cannot work very well, especially for faces with non-frontal orientations, under different illuminations, and for different subjects.
Eye detection and tracking based on active remote IR illumination is a simple yet effective approach. It exploits the spectral (reflective) properties of the pupil under near-IR illumination. Numerous techniques [27–31,1] have been developed based on this principle, including some commercial eye trackers [32,33]. They all rely on an active IR light source to produce the dark or bright pupil effects. Ebisawa et al. generate the bright/dark pupil images based on a differential lighting scheme using two IR light sources (on and off the camera axis). The eye can be tracked effectively by tracking the bright pupils in the difference image resulting from subtracting the dark pupil image from the bright pupil image. In later work, they further improved their method by using pupil brightness stabilization to eliminate glass reflections. Morimoto et al. also utilize the differential lighting scheme to generate the bright/dark pupil images, and pupil detection is done after thresholding the difference image. A larger temporal support is used to reduce artifacts, caused mostly by head motion, and geometric constraints are used to group the pupils.
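The differential lighting scheme is easy to illustrate. The following is a minimal sketch of the core idea, under our own simplifying assumptions (a single fixed threshold, one candidate blob reported by its centroid); it is not any cited system's implementation:

```python
import numpy as np

def detect_pupil_candidates(bright, dark, thresh=50):
    """Differential lighting: the pupil glows in the on-axis (bright-pupil)
    frame but not the off-axis (dark-pupil) frame, so subtracting the two
    suppresses everything that looks the same under both illuminations."""
    diff = np.asarray(bright, dtype=int) - np.asarray(dark, dtype=int)
    mask = diff > thresh                  # candidate pupil pixels
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return mask, None                 # no pupil visible (e.g. eye closed)
    return mask, (xs.mean(), ys.mean())   # centroid of the candidate pixels
```

Note that a glare spot that is equally bright in both frames cancels in the difference image; this is exactly why the scheme suppresses static bright distractors but, as discussed next, not all of them.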
Most of these methods require a distinctive bright/dark pupil effect to work well. The success of such a system strongly depends on the brightness and size of the pupils, which are often affected by several factors, including eye closure, eye occlusion due to face rotation, external illumination interference, and the distance of the subject to the camera. Figures 1 and 2 summarize different conditions under which the pupils may not appear very bright or may even disappear. These conditions include eye closure (Figure 1 (a)), oblique face orientations (Figure 1 (b), (c), and (d)), the presence of other bright objects due to either eyeglass glare or motion (Figure 2 (a) and (b)), and external illumination interference (Figure 2 (c)).
The absence of bright pupils, or even weak pupil intensity, poses serious problems for the existing eye tracking methods using IR, for they all require relatively stable lighting conditions, users close to the camera, small out-of-plane face rotations, and open and un-occluded eyes.

Fig. 2. (a) Original image; (b) the corresponding thresholded difference image, which contains other bright regions around the real pupil blobs due to either eyeglass glare or rapid head motion; (c) weak pupil intensity due to strong external illumination interference.

These conditions impose
serious restrictions on their systems as well as on the user, and therefore limit their application scope. Realistically, however, lighting can be variable in many application domains, the natural movement of the head often involves out-of-plane rotation, and eye closures due to blinking and winking are physiological necessities for humans. Furthermore, thick eyeglasses tend to disturb the infrared light so much that the pupils appear very weak. It is therefore very important for an eye tracking system to be able to track eyes robustly and accurately under these conditions as well.
To alleviate some of these problems, Ebisawa proposed an image difference method based on two light sources to perform pupil detection under various lighting conditions. The background can be eliminated using the image difference method, and the pupils can be detected by setting the threshold as low as possible in the difference image. They also proposed an ad hoc algorithm for eliminating glare on the glasses, based on thresholding and morphological operations. However, the automatic determination of the threshold and of the structuring element size for the morphological operations is difficult, and the threshold value cannot be set as low as desired without compromising the efficiency of the algorithm. Also, eliminating the noise blobs according to their sizes alone is not sufficient. Haro proposed to perform pupil tracking by combining eye appearance, the bright pupil effect, and motion characteristics, so that pupils can be separated from other equally bright objects in the scene. To do so, they proposed to verify the pupil blobs using a conventional appearance based matching method and the motion characteristics of the eyes.
But their method cannot track closed or occluded eyes, or eyes with weak pupil intensity due to external illumination interference. Ji et al. proposed a real-time subtraction technique and a special filter to eliminate external light interference. But their technique fails to track closed or occluded eyes. To handle the presence of other bright objects, their method performs pupil verification based on the shape and size of pupil blobs to eliminate spurious pupil blobs.
But spurious blobs often have shapes and sizes similar to those of the real pupil blobs, as shown in Figure 2, making it difficult to distinguish the real pupil blobs from the noise blobs based on shape and size alone.
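To make the limitation concrete, here is a rough sketch of the kind of shape/size verification such methods rely on (our own illustrative rules and thresholds, not any cited system's): a blob passes if its area is pupil-sized, its bounding box is roughly square, and it roughly fills that box. A round glare spot of similar size would pass exactly the same test, which is the failure mode noted above.

```python
import numpy as np

def plausible_pupil(blob_mask, min_area=20, max_area=400, min_fill=0.5):
    """Naive shape/size verification of a candidate blob (boolean mask)."""
    ys, xs = np.nonzero(blob_mask)
    area = len(xs)
    if not (min_area <= area <= max_area):
        return False                      # too small or too large for a pupil
    h, w = ys.ptp() + 1, xs.ptp() + 1     # bounding-box height and width
    if max(h, w) > 2 * min(h, w):
        return False                      # too elongated (e.g. a glare streak)
    # Fill ratio: a filled disk covers ~0.78 of its bounding box.
    return area / (h * w) >= min_fill
```

Such rules reject streaks and speckles but cannot separate a pupil from a compact round glare blob, motivating the appearance based verification adopted in this paper.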
In this paper, we propose a real-time robust method for eye tracking under variable lighting conditions and face orientations, based on combining appearance-based methods with the active IR illumination approach. Combining the respective strengths of different complementary techniques and overcoming their shortcomings, the proposed method uses active infrared illumination to brighten subjects' faces and produce the bright pupil effect. The bright pupil effect and the appearance of the eyes are utilized simultaneously for eye detection and tracking. The latest technologies in pattern classification (the Support Vector Machine) and in object tracking (mean-shift) are employed for pupil detection and tracking based on the eyes' appearance.
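Of the two components named above, mean-shift tracking admits a particularly compact sketch. The following is a generic illustration of one mean-shift iteration on a weight image (e.g. a likelihood back-projection of the eye's appearance); it is a textbook version of the algorithm, with our own function names, not the tracker developed later in this paper:

```python
import numpy as np

def mean_shift_step(weights, center, radius):
    """One mean-shift iteration: move a circular window to the weighted
    centroid of the pixels it covers. Iterating this step climbs to a
    local mode (peak) of the weight image."""
    cy, cx = center
    ys, xs = np.mgrid[0:weights.shape[0], 0:weights.shape[1]]
    inside = (ys - cy) ** 2 + (xs - cx) ** 2 <= radius ** 2
    w = weights * inside                  # restrict to the current window
    total = w.sum()
    if total == 0:
        return center                     # nothing under the window: stay put
    return (float((w * ys).sum() / total),
            float((w * xs).sum() / total))
```

In an eye tracker, the weight image would encode how eye-like each pixel's neighborhood appears, so the window locks onto the eye and follows it between frames.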
Some of the ideas presented in this paper have been briefly reported previously. In this paper, we describe our algorithm in detail.