MULTILAYER BACKGROUND MODELING UNDER
OCCLUSIONS FOR SPATIO-TEMPORAL SCENE
ANALYSIS
A Dissertation
Presented to
The Academic Faculty
By
Shoaib Azmat
In Partial Fulfillment
of the Requirements for the Degree
Doctor of Philosophy
Electrical and Computer Engineering
School of Electrical and Computer Engineering
Georgia Institute of Technology
Copyright © 2014 by Shoaib Azmat
MULTILAYER BACKGROUND MODELING UNDER
OCCLUSIONS FOR SPATIO-TEMPORAL SCENE ANALYSIS
Dr. Linda Wills, Advisor, School of ECE, Georgia Institute of Technology
Dr. Bo Hong, School of ECE, Georgia Institute of Technology
Dr. Aaron Lanterman, School of ECE, Georgia Institute of Technology
Dr. Scott Wills, Co-Advisor (Posthumous), School of ECE, Georgia Institute of Technology
Dr. Jeffrey Vetter, College of Computing, Georgia Institute of Technology
Dr. James Hamblen, School of ECE, Georgia Institute of Technology
Date Approved: June 10, 2014

In memory of Dr. Scott Wills
I am grateful to Dr. James Hamblen, Dr. Bo Hong, Dr. Aaron Lanterman, and Dr. Jeffrey Vetter for serving on my committee and for providing valuable feedback and suggestions. I want to thank the Higher Education Commission of Pakistan and the Fulbright Program of the USA for the scholarship that allowed me to pursue my graduate studies. I also want to thank my colleagues at the MOVES Lab at Georgia Tech, Dr. Dana Forsthoefel and Qianao Ju, for their support and company.
Finally, I want to thank my parents, R D Khan and M J Khan, and my siblings R Azmat, B Azmat, S Azmat, N Azmat, A Khan, H Khan, and R Khan. Their support and encouragement always acted as a catalyst for achieving my goals.
Figure 4 Unimodal vs multimodal background modeling
Figure 5 Traditional vs two-layer background modeling
Figure 6 Two-layer background modeling pixel-level
Figure 7 Need for multi-layer background modeling
Figure 11 Ghost removal based on three histograms
Figure 12 Changing background scenario: (a) Original BG (b) Brown box added (c) Red box added (use main background layer (a) to calculate H2 and H3)
Figure 14 Abandoned object detection in a crowded scene at different points in time
Figure 15 Blocks added and removed at different points in time, a red circle indicates new entries while blue indicates the ones already there, a black circle indicates the objects removed while white indicates the ones removed from the initial background: (a) Initial background (b) Three blocks added (c) Three more blocks added, three removed including one from initial background (d) Three more blocks added
Figure 16 Object layer removal based on occlusion reasoning: (a) Brown box added (b) Red box added occluding brown box (c) Brown box removed (d) Brown box added occluding red box
Figure 17 Occlusion reasoning effect: (a) Original image (b) Ground truth (c) Pixel-based (d) Pixel-based (e) Object-based [TM3]
Figure 19 Pixel vs. object-based modeling: (a) Original image (b) Ground truth (c) Pixel-based (d) Pixel-based (e) Object-based [TM3]
Figure 20 Outdoor cars, filtering at 50% observability threshold: (a) Original image (b) Ground truth (c) Unfiltered (d) Filtered
Figure 21 Indoor boxes, filtering at 50% observability threshold: (a) Original image (b) Ground truth (c) Unfiltered (d) Filtered
Figure 22 Outdoor cars: (a) Total number of pixel errors at 50% observability (b) FP vs TP at 50% observability (c) % Observability vs. no. of layer errors
Figure 23 Indoor boxes: (a) Total number of pixel errors at 50% observability (b) FP vs TP at 50% observability (c) % Observability vs. no. of layer errors
Figure 26 Spatial displacement scenarios: Scenario1, moved object from original background; Scenario2, moved object; Scenario3, partially displaced object; Scenario4, partially occluded object
Figure 27 A change in a bag position has been recognized
Figure 28 An object distance with itself dist(PP) and a different object dist(PQ) in the four scenarios: (a) 64-bin histogram (b) 512-bin histogram (c) 4096-bin histogram
Figure 31 Un-coalesced array of structures (left), coalesced structure of arrays (right)
Figure 32 Asus AT3IONT-I NVIDIA ION GPU platform
Figure 34 Speed ups over a single core of Atom CPU as a result of various performance optimizations, cumulatively applied left to right
Figure 35 Speed ups for different number of pixels per thread implementations over a single pixel per thread implementation
Figure 39 Temporarily removing TM3 speed bottlenecks for testing (columns 2-4) yields a higher fraction of the MMM speed for TM3 on the ION GPU; the first and last columns again show TM3 speed as a fraction of MMM speed on ION and Atom, respectively, from the previous figure
This dissertation presents an efficient multi-layer background modeling approach to distinguish among midground objects: objects whose existence spans time scales between the extremes of short-term ephemeral appearances (foreground) and long-term stationary persistence (background). The dissertation makes three contributions.
In the first contribution, a multilayer object-based background modeling technique for video surveillance, called temporal multimodal mean (TM3), is presented. The technique temporally models a scene in which multiple interacting midground objects occur at different time scales. Unlike multilayer pixel-based background modeling approaches, TM3 correctly models scenes with long-term occlusions and ghost objects. TM3 represents a scene in which multiple midground objects enter, leave, and occlude each other at different points in time. This yields richer information about the temporal properties of a scene than traditional foreground/background segmentation, including when a particular object arrived in or left the scene and the occlusion relationships among different objects while they are in the scene.
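To make the idea of temporal object layers concrete, the following Python sketch keeps one record per stationary midground object holding its arrival time, its departure time, and the layers it occludes. The class and method names (`ObjectLayer`, `LayeredScene`, `add`, `remove`, `present_at`, `occluders_of`) are hypothetical; the sketch only illustrates the kind of temporal and occlusion information described above, not how TM3 detects or segments objects.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ObjectLayer:
    """One stationary midground object kept as its own layer (hypothetical record)."""
    obj_id: int
    arrived: int                  # frame at which the object became stationary
    left: Optional[int] = None    # frame at which it left the scene; None while present
    occludes: set = field(default_factory=set)  # ids of layers this object covers


class LayeredScene:
    """Illustrative temporal layer stack; not the published TM3 implementation."""

    def __init__(self):
        self.layers = {}
        self._next_id = 0

    def add(self, frame, occludes=()):
        """Register a newly stationary object, optionally on top of existing layers."""
        layer = ObjectLayer(self._next_id, frame, occludes=set(occludes))
        self.layers[layer.obj_id] = layer
        self._next_id += 1
        return layer.obj_id

    def remove(self, obj_id, frame):
        """Record the departure time instead of discarding the layer's history."""
        self.layers[obj_id].left = frame

    def present_at(self, frame):
        """Ids of objects in the scene at the given frame, in insertion order."""
        return [
            layer.obj_id for layer in self.layers.values()
            if layer.arrived <= frame and (layer.left is None or frame < layer.left)
        ]

    def occluders_of(self, obj_id, frame):
        """Ids of objects present at `frame` that cover the given object."""
        present = set(self.present_at(frame))
        return [
            layer.obj_id for layer in self.layers.values()
            if obj_id in layer.occludes and layer.obj_id in present
        ]


# Usage loosely following the brown/red box scenario in the list of figures.
scene = LayeredScene()
brown = scene.add(frame=10)                  # brown box becomes stationary
red = scene.add(frame=20, occludes=[brown])  # red box placed in front of it
print(scene.occluders_of(brown, 25))         # [1]
scene.remove(brown, frame=30)                # brown box taken away from behind
print(scene.present_at(35))                  # [1]
```

Keeping a departure time rather than deleting the record is what lets such a model answer "when did this object leave?" after the fact.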
Multi-layer (and two-layer) background modeling techniques that model objects that have become stationary incorrectly detect a new object if an existing midground or background object is displaced. The second contribution presents a novel spatio-temporal reasoning mechanism, called spatio-temporal multimodal mean (STM3), which is based on multilayer background modeling and object appearances and conserves the state of moved objects in a scene. The algorithm extends our temporal multimodal mean (TM3) algorithm to spatial analysis. STM3 consistently models midground and background objects upon a partial or full change of position and conserves existing objects, removing them only once they leave the scene. An important feature of this algorithm is that it avoids falsely detecting new objects when existing objects are displaced in the scene.
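STM3 relies on object appearance to decide whether a newly detected region is an existing object that has moved. A minimal sketch of that decision follows, assuming a quantized joint RGB histogram as the appearance descriptor (the list of figures mentions 64-, 512-, and 4096-bin histograms, which correspond to 4, 8, and 16 levels per channel) and an L1 histogram distance with a fixed threshold. The distance measure, the threshold, and the function names are illustrative assumptions, not the dissertation's exact choices.

```python
from collections import Counter


def color_histogram(pixels, levels_per_channel=8):
    """Normalized joint RGB histogram with levels_per_channel**3 bins.

    levels_per_channel = 4, 8, 16 gives 64, 512, 4096 bins.
    `pixels` is an iterable of (r, g, b) tuples with 8-bit channels.
    """
    step = 256 // levels_per_channel
    counts = Counter()
    for r, g, b in pixels:
        # quantize each channel, then flatten the (r, g, b) bin coordinates to one index
        idx = ((r // step) * levels_per_channel + g // step) * levels_per_channel + b // step
        counts[idx] += 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def l1_distance(h1, h2):
    """L1 distance between sparse normalized histograms (0 = identical, 2 = disjoint)."""
    keys = h1.keys() | h2.keys()
    return sum(abs(h1.get(k, 0.0) - h2.get(k, 0.0)) for k in keys)


def match_displaced(candidate, known, threshold=0.5):
    """Id of the known object closest in appearance to `candidate`, or None if no
    distance falls below `threshold` (the region is then treated as a new object)."""
    best_id, best_d = None, threshold
    for obj_id, hist in known.items():
        d = l1_distance(candidate, hist)
        if d < best_d:
            best_id, best_d = obj_id, d
    return best_id


# Hypothetical objects: the brown box reappears elsewhere, partly mixed with gray floor.
known = {
    "brown": color_histogram([(150, 90, 40)] * 100),
    "red": color_histogram([(220, 30, 30)] * 100),
}
moved = color_histogram([(152, 92, 41)] * 90 + [(100, 100, 100)] * 10)
print(match_displaced(moved, known))  # brown
```

Under these assumptions, a match means the existing layer is relocated rather than a new object being created, which is the false detection STM3 is described as avoiding.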
balance accuracy, speed, and power. Due to its inherent parallelism, robust adaptive background modeling, such as the Gaussian mixture model (GMM), has been implemented on graphics processing units (GPUs) with significant performance improvements over CPUs.
However, these implementations are infeasible for embedded applications because the targeted general-purpose NVIDIA GeForce GPU platforms have high power ratings, in the range of 100 watts. The third contribution focuses on how data- and thread-level parallelism are exploited, and memory access patterns optimized, to map a low-cost, robust, adaptive background modeling algorithm, multimodal mean (MMM), onto the low-power NVIDIA ION GPU, which has a thermal design power (TDP) of only 12 watts. MMM has accuracy comparable to GMM at a lower computational cost. Accelerating this technique is also important because it is at the core of our spatio-temporal multi-layer background modeling algorithms, TM3 and STM3. We achieved a frame rate of 392 fps on full VGA resolution (640x480) frames on the NVIDIA ION GPU. This is a 20x speed-up of the MMM algorithm on the GPU compared to the embedded Intel Atom CPU platform of comparable TDP. Moreover, our GPU implementation of MMM outperforms the GPU implementation of GMM with a 6x speed-up. Subsequently, we extended the MMM GPU implementation to our multi-layer background modeling algorithm TM3 and achieved a 5x speed-up over the Atom CPU implementation.
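For readers unfamiliar with multimodal mean, the sketch below shows the general shape of a per-pixel multimodal-mean-style update: each pixel keeps a few modes, each storing running per-channel sums and a hit count, so a mode's mean is a sum divided by a count rather than a Gaussian estimate. The parameter names (`k`, `match_eps`, `bg_count`, `decay_period`) and the specific decay and recycling policy are assumptions for illustration; the published MMM may differ in these details.

```python
class MultimodalMeanPixel:
    """Sketch of a multimodal-mean-style model for a single pixel.

    Up to `k` cells (modes) each keep per-channel running sums and a hit count;
    a cell's mean is sum / count. Parameter names and the decay/recycling policy
    are illustrative assumptions, not the published MMM settings.
    """

    def __init__(self, k=4, match_eps=30, bg_count=20, decay_period=100):
        self.k = k                        # maximum number of modes per pixel
        self.match_eps = match_eps        # per-channel tolerance around a mode's mean
        self.bg_count = bg_count          # hits needed before a mode counts as background
        self.decay_period = decay_period  # frames between halving of sums and counts
        self.cells = []                   # each cell: [sum_r, sum_g, sum_b, count]
        self.frames = 0

    def update(self, rgb):
        """Absorb one pixel value; return True if it is classified as background."""
        self.frames += 1
        is_bg = False
        for cell in self.cells:
            count = cell[3]
            if all(abs(v - cell[c] / count) <= self.match_eps for c, v in enumerate(rgb)):
                is_bg = count >= self.bg_count
                for c, v in enumerate(rgb):
                    cell[c] += v
                cell[3] += 1
                break
        else:
            # no mode matched: start a new one, recycling the weakest if all k are in use
            new_cell = [rgb[0], rgb[1], rgb[2], 1]
            if len(self.cells) < self.k:
                self.cells.append(new_cell)
            else:
                weakest = min(range(len(self.cells)), key=lambda j: self.cells[j][3])
                self.cells[weakest] = new_cell
        if self.frames % self.decay_period == 0:
            # halve sums and counts so stale modes fade; drop modes that reach zero
            self.cells = [[x // 2 for x in cell] for cell in self.cells]
            self.cells = [cell for cell in self.cells if cell[3] > 0]
        return is_bg


pixel = MultimodalMeanPixel(bg_count=5)
labels = [pixel.update((100, 100, 100)) for _ in range(8)]
print(labels)                       # [False, False, False, False, False, True, True, True]
print(pixel.update((250, 10, 10)))  # False: a new, not-yet-trusted mode
```

Because the update uses only additions, comparisons, and one division per channel, it is cheap per pixel, and pixels are independent, which is what makes a one-thread-per-pixel (or a few pixels per thread) GPU mapping natural. For scale, at the 392 fps reported above for 640x480 frames, an update of this kind runs 640 × 480 × 392 ≈ 120.4 million times per second, and the 20x figure implies roughly 19.6 fps for MMM on the Atom CPU.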
The demand for video surveillance systems in public places and industry has increased dramatically. A recent survey shows that an estimated 1.85 million surveillance cameras have been deployed in the United Kingdom alone. Many modern cities now have networks of surveillance cameras deployed across metropolitan regions by multiple coordinated public and private agencies. These cameras are used in places such as streets, airports, subway stations, malls, and offices to detect abnormal activity. This enables many public safety applications, including intruder detection, abandoned object detection, people counting, and traffic violation detection. Cameras are also extensively deployed in industry for process monitoring and product inspection, and in health facilities for improved patient care, such as fall detection.
Requiring human operators to monitor video feeds is tedious, error-prone, and simply infeasible. Advances in video technology have made automated video surveillance systems attractive for reducing the burden and tedium of manual monitoring. The desire for portable, low-cost automated video surveillance systems, for example in outdoor settings, has led to the emergence of embedded smart surveillance cameras. These cameras have limited power and computational resources and therefore demand efficient, low-cost algorithms.
A core problem in automated visual surveillance is background modeling: separating salient, moving foreground from uninteresting, stationary background. Traditional background modeling divides a given scene into foreground and background regions. However, the real world can be much more complex than this simple classification, and object appearance events often occur over varying time scales. Objects may appear in the scene at different points in time and become stationary; these objects can be occluded by one another, can change position, or can be removed from the scene. Failing to handle such scenarios involving midground objects results in errors such as ghost objects (newly revealed background, exposed by the removal of an object, that is mistaken for a new midground object), missed detection of overlapping objects, and aliasing caused by objects that have left the scene but have not been removed from the model. Modeling temporal layers of multiple objects can overcome these errors and enables the surveillance of scenes containing multiple midground objects.
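The ghost error can be reproduced with the simplest two-class model. The sketch below uses a textbook exponential running-average background, not any of the methods in this dissertation, on a hypothetical one-dimensional scene; the function names and the parameters `threshold` and `alpha` are assumptions chosen for illustration.

```python
def foreground_mask(background, frame, threshold=25):
    """Indices of pixels that differ from the background model by more than threshold."""
    return [i for i, (b, f) in enumerate(zip(background, frame)) if abs(b - f) > threshold]


def update_background(background, frame, alpha=0.05):
    """Exponential running-average update of a single-layer background model."""
    return [(1 - alpha) * b + alpha * f for b, f in zip(background, frame)]


# A 1-D "scene": a wall of intensity 50 with a box (intensity 200) at pixels 3-5
# that is present from the start, so the model learns it as background.
scene_with_box = [50, 50, 50, 200, 200, 200, 50, 50, 50, 50]
background = list(scene_with_box)

# The box is carried away: nothing is moving, yet the revealed wall is flagged.
empty_scene = [50] * 10
print(foreground_mask(background, empty_scene))  # [3, 4, 5] -> a ghost

# The ghost fades only as the model slowly absorbs the revealed wall.
for _ in range(40):
    background = update_background(background, empty_scene)
print(foreground_mask(background, empty_scene))  # []
```

A single-layer model has no record that the region behind the box was ever wall, so it cannot tell newly revealed background from a newly arrived object; it can only wait for the difference to decay.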
This dissertation focuses on modeling temporal layers of multiple objects and specifically targets embedded surveillance systems, which require real-time, energy-efficient, low-cost solutions. One approach is to model these multiple midground objects using a tracking algorithm, but its computational cost is prohibitively high for a resource-constrained embedded environment. This dissertation pursues the goal of efficiently modeling multiple midground objects using layers of low-cost background modeling and discusses the challenges that arise in achieving this goal.
A few existing pixel-based approaches attempt to address this challenge by maintaining multiple layers. However, pixel-based modeling is unable to deal with 1) long-term occlusions and 2) ghost objects created by the movement of objects in the original background. At the pixel level, one can delete object pixels that have not been seen for a long time, but if that object later reappears, it will be detected as a new object.
Conversely, if an occluded pixel is retained even after a long occlusion and the occluded object then leaves the scene, the stale pixel remains in the model, consuming extra memory and causing aliasing with overlapping objects. In addition, at the pixel level it is difficult to reason about the order of occlusion among objects and to suppress ghost objects created by the movement of objects in the original background. Moreover, if an original background object is moved to a different location in a scene, existing multi-layer background modeling techniques will detect a new object at the new location in addition to a ghost object at the original location.
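The pixel-level deletion dilemma can be illustrated for a single pixel. In this hypothetical sketch, each pixel stores its layer intensities together with the frame each was last seen, and layers unseen for more than `max_age` frames are pruned. `observe`, `replay`, `max_age`, and `eps` are illustrative names, and the intensities and frame counts are made up.

```python
def observe(layers, value, frame, max_age, eps=10):
    """Update one pixel's layers; return True if `value` matched a stored layer.

    `layers` maps a stored intensity to the last frame it was seen. Layers unseen
    for more than `max_age` frames are deleted first (hypothetical pruning rule).
    """
    stale = [v for v, seen in layers.items() if frame - seen > max_age]
    for v in stale:
        del layers[v]
    for v in layers:
        if abs(v - value) <= eps:
            layers[v] = frame
            return True
    layers[value] = frame  # unmatched: treated as a brand-new object at this pixel
    return False


def replay(max_age, final_value):
    """Object A (100) sits at the pixel, object B (200) occludes it for 500 frames,
    then B leaves and the pixel shows `final_value`."""
    layers = {}
    for t in range(10):
        observe(layers, 100, t, max_age)
    for t in range(10, 510):
        observe(layers, 200, t, max_age)
    matched = observe(layers, final_value, 510, max_age)
    return matched, sorted(layers)


print(replay(100, 100))           # (False, [100, 200]): A returns but looks new
print(replay(float("inf"), 100))  # (True, [100, 200]): A is recognized
print(replay(float("inf"), 50))   # (False, [50, 100, 200]): A left, its layer lingers
```

Neither setting of `max_age` is right in both cases, because a single pixel cannot tell whether A is still hidden behind B or has gone; resolving that requires reasoning about objects and their occlusion order rather than isolated pixels.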