CSE/EE 486:  Computer Vision I

Computer Project Report # : Project 4

Disparity Maps and Depth Computations

Group #4: Isaac Gerg, Adam Ickes, Jamie McCulloch

Date: November 13, 2003


A.  Objectives
  1. Become familiar with stereo vision and its uses.
  2. Study epipolar geometry and its inherent constraints on the correspondence problem.
  3. Study how interesting points are determined.
  4. Study how depth is calculated and the problems with computation complexity associated with determining information for such computations.
  5. Implement a stereo vision depth analysis system.
  6. Become familiar with Matlab programming and the Image Processing Toolbox.
B. Methods
There are six 'M' files for this project.

All coordinate references are defined as follows:

-Y

^
|
|
|
+---------> X

Where the + is the top left-hand corner of the image.

corner_code.m

1. Determines interesting points by detecting corners using eigenvalues and eigenvectors.
2. Produces a 'mat' file named 'X-Y_interesting_points' where X is the scene name and Y is 'right' or 'left' depending on which image is analyzed. A matrix, goodpts, is stored in this file in the format: y, x
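The corner measure can be sketched as follows. This is an illustrative Python/NumPy version, not the course's corner_code.m (which is Matlab): the smallest eigenvalue of the 2x2 structure tensor, summed over a small window, is large only where the image gradient varies in two directions, i.e. at corners.

```python
import numpy as np

def min_eig_corner_map(img, win=3):
    """Smallest eigenvalue of the windowed structure tensor at each pixel;
    large values indicate corners."""
    Iy, Ix = np.gradient(img.astype(float))  # gradients via central differences
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy
    h, w = img.shape
    r = win // 2
    R = np.zeros((h, w))
    for y in range(r, h - r):
        for x in range(r, w - r):
            # Structure tensor [[a, b], [b, c]] summed over the window.
            a = Ixx[y - r:y + r + 1, x - r:x + r + 1].sum()
            b = Ixy[y - r:y + r + 1, x - r:x + r + 1].sum()
            c = Iyy[y - r:y + r + 1, x - r:x + r + 1].sum()
            # Smallest eigenvalue of a symmetric 2x2 matrix, in closed form.
            R[y, x] = 0.5 * (a + c - np.sqrt((a - c) ** 2 + 4.0 * b * b))
    return R

def interesting_points(img, threshold):
    """Return corner locations as rows of (y, x), the goodpts format."""
    R = min_eig_corner_map(img)
    ys, xs = np.nonzero(R > threshold)
    return np.column_stack([ys, xs])
```

The thresholds quoted in the Results (150,000 and 200,000) belong to corner_code.m's own measure and would not transfer directly to this sketch.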


DisparityOfInterestingPoints.m

1. Reads in the two images of a scene along with the interesting points for both the right and left images of the scene.
2. Computes a mapping of interesting points from the left image to the right image. These points are saved in a mat file (e.g. InitialDisparityMapCastle) in the format: xl, yl, xr, yr

allpoints.m

1. Reads in the two images of a scene along with the disparity map output by DisparityOfInterestingPoints.m
2. Computes a disparity mapping of all points in the image to the other image. Outputs a mapping matrix in the format: xl, yl, xr, yr. The file is named in the format "CastleDisparityXAllPoints".
3. Creates an image of the disparity in the X direction (e.g. CastleDisparityX.png)

calculate_depth.m

1. Analyzes the mat file output by allpoints.m. Computes depth information and outputs it to an image (e.g. CastleDepth.png).

FundementalMatrix.m

1. Reads in the output mat file from DisparityOfInterestingPoints.m.
2. Computes a normalized fundamental matrix and outputs the matrix and the null space of the matrix.

CorrelationFunction.m

1. Contains the SSD correlation function used during correlation.
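As a sketch of what a function like CorrelationFunction.m computes (in Python rather than the project's Matlab): SSD compares two equally sized windows, and a lower score means a better match.

```python
import numpy as np

def ssd(window_left, window_right):
    """Sum of squared differences between two equally sized image windows.
    Zero for identical windows; larger values mean less similar."""
    d = window_left.astype(float) - window_right.astype(float)
    return np.sum(d * d)
```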

 

Executing this project from Matlab

If you have a file entitled "file.m", enter at the command prompt:

>>file

***For best results and ease of use, the files should be run in the order they are described above. To switch between images and scenes, the variables in the files must be renamed appropriately. This allows much flexibility in using this code in future projects.

C. Results

Results are described in the order of the Methods section above.

*Note: Images will not be labeled by a figure number. The images in this experiment are meant to be inspected and not directly referenced. The discussion of each image appears adjacent to it in the flow of the document.

Original Images (Top scene is Castle, bottom scene is Villa)

Finding Interesting Points

The corner_code.m file that was given to us for this project was utilized. The threshold for the Villa scene was 200,000 and for the Castle scene was 150,000.

Interesting Points (top is Castle scene, bottom is Villa scene)

Disparity of Interesting Points (Mapping from left to right image)

From the interesting points previously computed, an algorithm was constructed to find corresponding points. It was found that a large correlation window with a tight epipolar constraint yielded the greatest number of corresponding points. Had we used a looser epipolar constraint to account for error, we would have needed an even larger correlation window, and the computation would have taken an unreasonable amount of time. Below is a table of the parameters used for each scene.

Scene    Search Window (width, height)    Correlation Window (width, height)
Villa    (65, 1)                          (9, 9)
Castle   (125, 1)                         (7, 7)

The width of the search window was determined by computing the maximum disparity of each scene, mostly by inspection.

While this code is running, it will output the percentage complete and the number of points found. This helped gauge the time it would take to compute the mappings with different search window and correlation window parameters.

The Sum of Squared Differences (SSD) was used as the correlation function.
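The matching step above can be sketched as follows (a Python illustration of the idea, not the project's Matlab code): for a left-image point, slide a correlation window along a one-pixel-high epipolar band in the right image, searching only leftward as the report assumes, and keep the SSD minimizer.

```python
import numpy as np

def match_point(left, right, yl, xl, search_w=65, corr=9):
    """Best-matching right-image column for left-image point (yl, xl),
    searching up to search_w pixels to the left on the same scanline
    using an SSD correlation window of size corr x corr."""
    r = corr // 2
    tmpl = left[yl - r:yl + r + 1, xl - r:xl + r + 1].astype(float)
    best_x, best_ssd = None, np.inf
    for xr in range(max(r, xl - search_w), xl + 1):
        cand = right[yl - r:yl + r + 1, xr - r:xr + r + 1].astype(float)
        if cand.shape != tmpl.shape:   # window fell off the image edge
            continue
        s = np.sum((tmpl - cand) ** 2)
        if s < best_ssd:
            best_ssd, best_x = s, xr
    return best_x
```

The (width, 1) search windows in the table above correspond to search_w with a one-row epipolar band.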

Disparity of All Points (Mapping of all points in left to all points in right)

From the mappings already created in the previously explained code, we compute a mapping for all points using the following constraints:

  • Pixels between corresponding interesting points in one image map to pixels between corresponding interesting points in the other image.

  • Restrict pixels to map along the epipolar lines.

Because the epipolar lines in these two scenes are close to parallel with the image top and bottom (the epipoles are close to infinity because both cameras face 'straight ahead'), we could assume that points in the left image have a high probability of mapping to positions further left in the right image. It was as if all the points in a given depth plane were shifted to the left in the right image. This limited our search window to half of the one previously used, which made our code run much faster without sacrificing accuracy. We continued to use the tight epipolar constraint of a one-pixel-high search window; we found this yielded the best results when combined with a large correlation window.
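The first constraint above (pixels between matched interesting points map between the corresponding points) can be sketched on a single scanline; this is a simplified Python illustration that linearly interpolates disparity between anchor matches, whereas the actual allpoints.m re-correlates within the bracketed ranges.

```python
import numpy as np

def interpolate_scanline_disparity(width, anchors):
    """anchors: matched interesting points on one scanline as (xl, xr)
    pairs. Every pixel between two anchors gets a disparity interpolated
    from the anchors' disparities; pixels outside any bracket stay NaN."""
    anchors = sorted(anchors)
    disp = np.full(width, np.nan)
    for (x0, xr0), (x1, xr1) in zip(anchors, anchors[1:]):
        d0, d1 = x0 - xr0, x1 - xr1   # disparity at each anchor
        for x in range(x0, x1 + 1):
            t = (x - x0) / max(x1 - x0, 1)
            disp[x] = (1 - t) * d0 + t * d1
    return disp
```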

Below are the images displaying the disparities in the X direction; the higher the disparity, the higher the intensity in the image. No Y disparity component was computed: with such a tight epipolar constraint, the Y-disparity image would be entirely black.

Castle Villa

Notice how the different depth planes of each structure correspond to a different disparity. This indicates that these structures are not round, but flat along the plane.

Depth Computation

With the disparities computed for all pixels by the previous code, the depth could be computed given some camera parameters (the focal length f and the baseline T). These parameters for each scene are in the table below.

Scene    f (focal length)    T (baseline)
Castle   148.11 mm           20.00 mm
Villa    57.44 mm            19.32 mm

The equation for depth was: Depth = f*(T/disparity)
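In code this is a one-liner; the sketch below (Python, for illustration) just makes the inverse relationship explicit.

```python
def depth_from_disparity(f, T, disparity):
    """Depth = f * (T / disparity): depth is inversely proportional to
    disparity, so nearby points (large disparity) come out shallow.
    Strictly, disparity in pixels should be converted to the same units
    as f and T (mm); the report works directly with pixel disparities."""
    return f * (T / disparity)
```

For example, with the Castle parameters (f = 148.11, T = 20.00) a disparity of 10 gives a depth of 296.22.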

The images realized from these depth calculations did not show anything particularly useful. This is because the imagesc function in Matlab scales linearly, while the depth equation is non-linear: linearly normalizing all the points in the image 'compresses' many of the depths to values close to 0 or close to 1. A better approach would be to discretize the different disparities and then display those values, which is exactly what the disparity mapping images of the previous section show.

To get a true sense of depth, the depth values should be scaled by a non-linear transform (C/x or a similar equation, perhaps). This would convey true depth rather than just information about the different image planes.
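A minimal sketch of such a C/x-style display transform (in Python; the project would do this in Matlab before calling imagesc) maps reciprocal depth back to a linear [0, 1] range:

```python
import numpy as np

def display_scale(depth, eps=1e-6):
    """Reciprocal (C/x-style) rescaling of a depth map to [0, 1] so that a
    linear display does not crush most depths toward the extremes.
    eps guards against division by zero."""
    inv = 1.0 / (depth + eps)
    return (inv - inv.min()) / (inv.max() - inv.min())
```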

The output of the depths is shown below.

Castle Villa

Fundamental Matrix Computation

To ensure the fundamental matrix, F, was computed as accurately as possible, the disparities (mappings) used in the computation were the ones derived from the interesting points. Had we used all the points, there would have been considerable error, since many of those points mapped incorrectly due to constraints on processing power and time.

F was computed using a normalization method to prevent numerical instabilities. This was done by multiplying the points in each image of a scene, left and right, by a transformation matrix which translated and scaled the points. The points were translated such that their center was around the origin and scaled so that their average distance from the center was 1.
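The normalizing transform just described can be sketched as follows (Python illustration; note that Hartley's classic formulation scales the average distance to sqrt(2), while this follows the report's choice of 1):

```python
import numpy as np

def normalizing_transform(pts):
    """3x3 homogeneous transform that translates (x, y) points so their
    centroid is at the origin and scales them so their average distance
    from the origin is 1."""
    pts = np.asarray(pts, float)
    c = pts.mean(axis=0)                              # centroid
    d = np.mean(np.linalg.norm(pts - c, axis=1))      # mean distance
    s = 1.0 / d
    return np.array([[s, 0.0, -s * c[0]],
                     [0.0, s, -s * c[1]],
                     [0.0, 0.0, 1.0]])
```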

F was then computed using the EIGHT_POINT algorithm demonstrated on page 156 of Trucco and Verri (Introductory Techniques for 3-D Computer Vision).

After F was computed, it was denormalized by multiplying it on either side by the right and left normalization transforms:
F' = Tr' * F * Tl
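The core linear estimation can be sketched as below. This is an illustrative Python version of the standard eight-point scheme, not the project's Matlab code; normalization and the denormalization step above are assumed to happen outside this function.

```python
import numpy as np

def eight_point(left_pts, right_pts):
    """Linear eight-point estimate of F from >= 8 correspondences.
    Each pair contributes one row of the design matrix via the epipolar
    constraint x_r' F x_l = 0; F is the singular vector of the smallest
    singular value, with the rank-2 constraint enforced afterwards."""
    A = []
    for (xl, yl), (xr, yr) in zip(left_pts, right_pts):
        A.append([xr * xl, xr * yl, xr, yr * xl, yr * yl, yr, xl, yl, 1.0])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    F = Vt[-1].reshape(3, 3)
    # Enforce rank 2 by zeroing the smallest singular value of F.
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    return U @ np.diag(S) @ Vt
```

For a purely translational stereo rig like these scenes, the recovered F has a null vector along (1, 0, 0), i.e. an epipole at infinity, matching the null(F) output below.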

F is then output along with its null space (null(F)) to verify that the epipole lies at infinity.

The output from this operation is below.

Castle
F =

-0.0000 -0.0000 0.0000
0.0000 0.0000 0.0029
-0.0000 -0.0026 -0.0713

>> null(F)

ans =

-1.0000
0.0000
0.0000
Villa F =

0.0000 0.0000 -0.0000
-0.0000 0.0000 -0.0032
0.0000 0.0031 0.0331

>> null(F)

ans =

-1.0000
0.0000
0.0000

As shown, the null space of F in each case is (-1, 0, 0)', so the epipoles lie at infinity.

 

Summary
All results were as expected in the experiment.

 

D. Conclusions

Stereo vision is a fundamental concept in computer vision. Moreover, depth computation can be extremely useful and, in many applications, necessary.

In the real world, the algorithms utilized in this experiment would be heavily optimized and probably executed on special hardware, with the code modified to tightly match that hardware's abilities.

The better the algorithm for finding interesting points, the better the depth computations. Ideally, a sophisticated method of computing correspondences would be utilized. Such a system might use object recognition and chain codes to better detect correspondences between the left and right images.

With our algorithm, though, it would be ideal if the search window could be much larger in height and the correlation window much larger in both dimensions. This would yield highly accurate correspondences. However, due to the amount of time it would take to run such an algorithm, it is not feasible.

As mentioned in the Results section, a different non-linear scaling scheme should be used for the values computed by the depth algorithm.

The fundamental matrix computation as stated in the book (Trucco and Verri) can suffer from numerical instabilities. These instabilities can, however, be removed with normalization techniques. There may also be other techniques for determining such a matrix. Ideally, one would want a transform T such that multiplying a point in one image by T yields the corresponding point in the other image; this would make depth computations very easy. However, such a matrix is not realizable with elementary techniques at a reasonable computational cost.

   
E. Appendix

Source Code

See directory: source

Time Management
Isaac spent twenty hours working on this project. Adam spent zero hours working on this project. Jamie spent zero hours working on this project.