CSE/EE 486:  Computer Vision I

Computer Project Report # : Project 5

CAMSHIFT Tracking Algorithm

Group #4: Isaac Gerg, Adam Ickes, Jamie McCulloch

Date: December 7, 2003


A.  Objectives
  1. Become familiar with object and feature tracking
  2. Study the effects of motion on optical flow.
  3. Study tracking techniques utilizing object hue.
  4. Implement the CAMSHIFT algorithm (described in the Intel OpenCV article).
  5. Become familiar with Matlab programming and the Image Processing Toolbox.
B. Methods
There are two 'M' files for this project.

All coordinate references are defined as follows:

-X

^
|
|
|
+---------> Y

Where the '+' marks the top-left corner of the image.

part1.m

1. Converts the frames of 14 consecutive image pairs of a video sequence to grayscale.
2. Computes the absolute difference of each pair, yielding one difference image per pair.
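The project scripts themselves are MATLAB; the grayscale conversion and absolute-differencing step could be sketched in NumPy as follows (the frames and luminance weights here are illustrative, not taken from part1.m):

```python
import numpy as np

def to_gray(rgb):
    # Weighted sum over the RGB channels (standard luminance weights).
    return rgb @ np.array([0.2989, 0.5870, 0.1140])

def abs_difference(frames):
    # Absolute difference of each consecutive pair of grayscale frames.
    gray = [to_gray(f) for f in frames]
    return [np.abs(gray[i + 1] - gray[i]) for i in range(len(gray) - 1)]

# Two tiny synthetic 2x2 RGB frames: static background, one moving bright pixel.
f0 = np.zeros((2, 2, 3))
f1 = np.zeros((2, 2, 3))
f0[0, 0] = [1.0, 1.0, 1.0]
f1[0, 1] = [1.0, 1.0, 1.0]

diff = abs_difference([f0, f1])[0]
print(diff)  # nonzero only where the pixel moved from and to
```

Note that the static background cells difference to zero, which is exactly why this method only sees transient motion.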

camshift.m

1. Implementation of the CAMSHIFT algorithm for tracking a hand in a video sequence.

Executing this project from within Matlab

At the command prompt enter:

>>part1

>>camshift

 

C. Results

The video sequence analyzed in this experiment is located here.

Results are described in the order of the Methods section above.

The first 14 pairs of frames in the video sequence were absolutely differenced to study the effect. It appears that, for this video sequence, tracking using this method would not yield good results. The hand object is not "closed" in all of the differenced scenes. Furthermore, thresholding would have to be done to remove some of the background noise, which is caused by small movements of the person and by differences in illumination and reflection.

In a tracking method such as this, one may want to utilize morphological techniques in an attempt to bound the object. Once an object is bounded, its centroid can be computed, which enables us to track the object. In busy scenes with much variation in movement and light, this method becomes nearly useless.

This method is really only useful for transient optical flow. If the pixel intensities do not change, no motion will be visible in the difference frames.

[Images: 001-000.png through 015-014.png]
Figure 1 - Absolute difference of consecutive frame pairs. Frames 0-15 were utilized. The frames are in sequence from left to right across each row.

 

CAMSHIFT Algorithm

The CAMSHIFT algorithm is based on the MEAN SHIFT algorithm. MEAN SHIFT works well on static probability distributions but not on dynamic ones, such as those in a movie. CAMSHIFT retains the principles of MEAN SHIFT but adds a mechanism to account for these dynamically changing distributions.

CAMSHIFT is able to handle dynamic distributions by readjusting the search window size for the next frame based on the zeroth moment of the current frame's distribution. This allows the algorithm to anticipate object movement and quickly track the object in the next scene. Even during quick movements of an object, CAMSHIFT is still able to track correctly.


CAMSHIFT works by tracking the hue of an object, in this case, flesh color. The movie frames were all converted to HSV space before individual analysis.

CAMSHIFT was implemented as follows:
1. The initial location of the 2D search window was computed.
2. The color probability distribution was calculated for a region slightly bigger than the mean shift search window.
3. Mean shift was performed on the area until suitable convergence. The zeroth moment and centroid coordinates were computed and stored.
4. The search window for the next frame was centered on the centroid, and its size was scaled by a function of the zeroth moment.
5. Go to step 2.
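The per-frame loop above can be sketched as follows. This is a hedged NumPy translation, not the project's MATLAB code: the function names, the convergence test, and the integer rounding are all illustrative choices, while the 1.1 and 1.2 scaling factors are the ones described later in this report. The sketch assumes a per-frame skin-probability image `prob` has already been computed and that the window stays inside the image.

```python
import numpy as np

def centroid_and_mass(prob, x0, y0, w, h):
    # Zeroth moment and centroid of the probability mass inside the window.
    patch = prob[y0:y0 + h, x0:x0 + w]
    m00 = patch.sum()
    ys, xs = np.mgrid[y0:y0 + h, x0:x0 + w]
    return m00, (xs * patch).sum() / m00, (ys * patch).sum() / m00

def camshift_step(prob, window, threshold=1.0):
    # One CAMSHIFT iteration: mean shift to convergence, then rescale the window.
    x0, y0, w, h = window
    while True:
        m00, xc, yc = centroid_and_mass(prob, x0, y0, w, h)
        # Re-center the window on the centroid (steps 3-4 of the list above).
        nx, ny = int(round(xc - w / 2)), int(round(yc - h / 2))
        if abs(nx - x0) < threshold and abs(ny - y0) < threshold:
            break
        x0, y0 = max(nx, 0), max(ny, 0)
    # New window size from the zeroth moment; hand is ~1.2x wider than tall.
    s = 1.1 * np.sqrt(m00)
    w2, h2 = int(round(1.2 * s)), int(round(s))
    return (int(round(xc - w2 / 2)), int(round(yc - h2 / 2)), w2, h2), (xc, yc)
```

A caller would run `camshift_step` once per frame, feeding each returned window into the next frame's probability image.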

The initial search window was determined by inspection. Adobe Photoshop was used to determine its location and size. The initial window size was just big enough to fit most of the hand inside of it. A window size too big may fool the tracker into tracking another flesh-colored object. A window too small will usually quickly expand to an object of constant hue; however, for quick motion, the tracker may lock onto another object or the background. For this reason, a hue threshold should be utilized to help ensure the object is properly tracked, so that, in the event that an object with a mean hue not of the correct color is being tracked, some operation can be performed to correct the error.

For each frame, its hue information was extracted. We noted that the hue of human flesh has a high angle value. This simplified our tracking algorithm, as the probability that a pixel belonged to the hand decreased as its hue angle did. Hue thresholding was also performed to help filter out the background and make the flesh color more prominent in the distributions.
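That hue model (score rising with hue angle, with low-hue pixels thresholded away) could be sketched as below. This is an illustrative NumPy/colorsys version, not the project's MATLAB code, and the threshold value 0.9 is a made-up example, not the one used in the project:

```python
import numpy as np
import colorsys

def hue_probability(rgb, hue_thresh=0.9):
    # Map each RGB pixel to a skin-likeness score equal to its hue angle
    # (normalized to [0, 1)), zeroing out pixels below the threshold.
    h, w, _ = rgb.shape
    prob = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            hue, _, _ = colorsys.rgb_to_hsv(*rgb[i, j])
            prob[i, j] = hue if hue >= hue_thresh else 0.0
    return prob

# Reddish (flesh-like) pixel scores high; green background pixel is zeroed.
demo = np.zeros((1, 2, 3))
demo[0, 0] = [1.0, 0.0, 0.1]
demo[0, 1] = [0.0, 1.0, 0.0]
print(hue_probability(demo))
```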

The zeroth moment and the first moments in x and y were all calculated. The centroid was then calculated from these values:

xc = M10 / M00; yc = M01 / M00
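In NumPy terms (a sketch; the project code is MATLAB), the moments and centroid over a probability patch are:

```python
import numpy as np

# Toy 3x3 probability patch with all mass in one cell (row y=2, column x=1).
prob = np.zeros((3, 3))
prob[2, 1] = 2.0

ys, xs = np.mgrid[0:3, 0:3]
M00 = prob.sum()            # zeroth moment (total probability mass)
M10 = (xs * prob).sum()     # first moment in x
M01 = (ys * prob).sum()     # first moment in y
xc, yc = M10 / M00, M01 / M00
print(xc, yc)  # 1.0 2.0
```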

The search window was then shifted to center on the centroid, and the mean shift was computed again. The convergence threshold used was T = 1. This ensured a good track on each of the frames. The search window was also expanded by 5 pixels in each direction to help track movement.

Once the convergent values of the mean and centroid were computed, we computed the new window size, which was based on the area of the probability distribution. The scaling factor was calculated by:

s = 1.1 * sqrt(M00)

The 1.1 factor was chosen after experimentation. A desirable factor is one that neither grows nor shrinks the window size too quickly. Since the distribution is 2D, we use the square root of M00 to get the proper length in one dimension.

The new window size was computed with this scaling factor. It was noted that the width of the hand object was 1.2 times its height, so the new window size was computed as:

W = [ (s)   (1.2*s) ]
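As a small numeric sketch of the rescaling rule (the 1.1 and 1.2 factors are from this report; the moment value 400 is made up for illustration):

```python
import numpy as np

def new_window_size(M00):
    # Side length from the zeroth moment; sqrt converts 2D area to 1D length.
    s = 1.1 * np.sqrt(M00)
    # Hand is about 1.2x wider than tall, so widen the window accordingly.
    return s, 1.2 * s  # (height, width)

h, w = new_window_size(400.0)
print(round(h, 1), round(w, 1))  # 22.0 26.4
```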

The window is centered around the centroid and the computation of the next frame is started.

Figure 2 - Probability distribution of skin. High intensity values represent high probability of skin. The search window and centroid are also superimposed on each frame. The frames are in sequence from top to bottom in each row. The frames displayed are 0, 19, 39, 59, 79.

 

Figure 3 - Actual frames from the movie. The search window and centroid are also superimposed on each frame. The frames are in sequence from top to bottom in each row. The frames displayed are 0, 19, 39, 59, 79.

 

Figure 4 - Motion plots of centroid. The image on the right has the first frame of the movie superimposed on the plot.

 

Frame: Coordinates
1: 111, 181
2: 109, 176
3: 106, 173
4: 104, 172
5: 110, 168
6: 112, 168
7: 115, 166
8: 118, 165
9: 118, 166
10: 123, 164
11: 126, 167
12: 130, 167
13: 131, 168
14: 131, 170
15: 134, 175
16: 137, 176
17: 137, 176
18: 140, 181
19: 144, 176
20: 145, 175
21: 146, 177
22: 146, 179
23: 145, 179
24: 145, 180
25: 145, 182
26: 145, 184
27: 149, 185
28: 151, 189
29: 151, 191
30: 150, 194
31: 148, 197
32: 149, 195
33: 149, 200
34: 143, 205
35: 142, 209
36: 144, 211
37: 145, 212
38: 144, 216
39: 141, 217
40: 142, 222
41: 142, 223
42: 140, 224
43: 139, 230
44: 140, 229
45: 139, 232
46: 138, 233
47: 137, 237
48: 136, 236
49: 134, 239
50: 132, 241
51: 132, 243
52: 129, 245
53: 130, 246
54: 130, 248
55: 127, 250
56: 123, 249
57: 124, 249
58: 124, 249
59: 123, 252
60: 120, 255
61: 120, 253
62: 119, 253
63: 124, 257
64: 120, 257
65: 112, 253
66: 111, 252
67: 110, 254
68: 109, 254
69: 106, 252
70: 109, 251
71: 108, 247
72: 106, 245
73: 104, 243
74: 108, 242
75: 109, 239
76: 108, 237
77: 109, 237
78: 109, 235
79: 110, 223
80: 109, 223
81: 111, 222
82: 111, 221
83: 112, 217
84: 113, 215
85: 113, 213
86: 113, 210
87: 115, 209
88: 116, 207
89: 116, 205
90: 114, 204
91: 115, 201
92: 119, 198
93: 117, 196
94: 120, 195
95: 121, 192
96: 120, 193
97: 123, 189
98: 123, 189
99: 124, 185
Figure 5 - Centroid coordinates of each frame.

Summary
All results were as expected in the experiment.

 

D. Conclusions

Object tracking is a very useful tool. Objects can be tracked in many ways, including by color or by other features.

Tracking objects by difference frames is not always robust enough to work in every situation. There must be a static background and constant illumination to get good results. With this method, objects can be tracked only in situations with transient optical flow. If the pixel values do not change, no motion is detected.

CAMSHIFT is a more robust way to track an object based on its color or hue. It is based on the MEAN SHIFT algorithm, and improves upon MEAN SHIFT by accounting for dynamic probability distributions: it scales the search window size for the next frame by a function of the zeroth moment. In this way, CAMSHIFT is very robust for tracking objects.

There are many variables in CAMSHIFT. One must decide on suitable thresholds and search window scaling factors, and must also take into account the uncertainty in hue when a color has little intensity. Knowing the distributions well helps one pick scaling values that track the correct object.

In any case, CAMSHIFT works well in tracking flesh-colored objects. These objects can be occluded or move quickly, and CAMSHIFT usually corrects itself.

   
E. Appendix

Source Code

Movies

The hand tracking movies have the following format parameters:

Fps: 15.0000
Compression: 'Indeo3'
Quality: 75
KeyFramePerSec: 2.1429

Automatically updated parameters:
TotalFrames: 99
Width: 320
Height: 240
Length: 0
ImageType: 'Truecolor'
CurrentState: 'Closed'

 

Time Management
Isaac spent ten hours working on this project. Adam spent two hours working on this project. Jamie spent two hours working on this project.