Project 3: [Auto]Stitching Photo Mosaics

Project Overview

This project implements automatic stitching of photo mosaics. For Part A, the stitching is done by manually picking correspondence points between the images and using a homography matrix to project the images onto a common plane.

Part A.1: Shoot the Pictures

This part includes sets of pictures shot with projective transformations, i.e., rotating the camera while keeping the COP (center of projection) fixed. Later parts of the project use these pictures.

During Lecture (left)
During Lecture (middle)
During Lecture (right)
Outside Soda (left)
Outside Soda (middle)
Outside Soda (right)
Inside Sky Lab (left)
Inside Sky Lab (middle)
Inside Sky Lab (right)

Part A.2: Recover Homographies

This part explains how to recover the homography matrix from a set of correspondence points. Note that the homography matrix has 8 degrees of freedom and each correspondence contributes two equations, so we need at least 4 point pairs to recover it. More points make the recovery more robust.

homography matrix
Homography Matrix

Since the bottom-right entry of the homography matrix is fixed to 1, we cannot solve for the matrix directly. Therefore, we expand the system of equations and use least squares to solve for the remaining eight entries.

system of equations
System of Equations to Recover the Homography Matrix
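
The expansion above can be sketched in NumPy: each correspondence (x, y) → (x', y') contributes two rows to an 8-unknown linear system, which least squares then solves. This is a minimal sketch under my own naming and point format, not the exact project code:

```python
import numpy as np

def compute_homography(pts1, pts2):
    """Recover H such that pts2 ~ H @ pts1 in homogeneous coordinates.

    pts1, pts2: (N, 2) arrays of corresponding (x, y) points, N >= 4.
    The bottom-right entry of H is fixed to 1, leaving 8 unknowns,
    so each correspondence contributes two equations.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(pts1, pts2):
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    # Least squares handles the overdetermined case (N > 4) as well.
    h, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)
```

With more than 4 points the system is overdetermined, which is exactly where the least-squares solve earns its keep.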

An example of manually picked correspondences and the recovered homography matrix are shown below.

lecture_1_correspondence
During Lecture (left) Correspondences
lecture_2_correspondence
During Lecture (middle) Correspondences
recovered_H_matrix
The Recovered Homography Matrix

Part A.3: Warp the Images

This part introduces two methods to warp the images after recovering the homography matrix: nearest neighbor and bilinear interpolation. Both use inverse mapping, where each output pixel is mapped back into the original image. The nearest-neighbor method is the simplest: it takes the value of the closest source pixel. Bilinear interpolation instead takes a weighted average of the four nearest source pixels. Here are some examples of the warped images (rectification) with both methods.
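
Both resampling strategies can be sketched with inverse mapping. This is a minimal version assuming a grayscale float image and a given output canvas size; names are illustrative:

```python
import numpy as np

def inverse_warp(im, H, out_shape):
    """Inverse-warp grayscale image `im` by homography H.

    Every output pixel is mapped back through H^-1 into the source
    image and sampled there, so the result has no holes.
    Returns (nearest-neighbor result, bilinear result).
    """
    out_h, out_w = out_shape
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(out_h * out_w)])
    src = np.linalg.inv(H) @ pts
    sx, sy = src[0] / src[2], src[1] / src[2]

    h, w = im.shape
    valid = (sx >= 0) & (sx <= w - 1) & (sy >= 0) & (sy <= h - 1)

    # Nearest neighbor: round to the closest source pixel.
    nn = np.zeros(out_h * out_w)
    nn[valid] = im[np.rint(sy[valid]).astype(int),
                   np.rint(sx[valid]).astype(int)]

    # Bilinear: weighted average of the 4 surrounding source pixels.
    x0 = np.floor(sx[valid]).astype(int)
    y0 = np.floor(sy[valid]).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    fx, fy = sx[valid] - x0, sy[valid] - y0
    bl = np.zeros(out_h * out_w)
    bl[valid] = ((1 - fx) * (1 - fy) * im[y0, x0] + fx * (1 - fy) * im[y0, x1]
                 + (1 - fx) * fy * im[y1, x0] + fx * fy * im[y1, x1])
    return nn.reshape(out_shape), bl.reshape(out_shape)
```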

channing_court
Channing Court at 7 AM
channing_court_nearest_neighbor
Warped Channing Court (Nearest Neighbor)
channing_court_bilinear_interpolation
Warped Channing Court (Bilinear Interpolation)
roomates
Roommates
roomates_nearest_neighbor
Warped Roommates (Nearest Neighbor)
roomates_bilinear_interpolation
Warped Roommates (Bilinear Interpolation)

What? Lines on a tennis court are not straight? That is probably due to lens distortion in the original image.

Note that nearest-neighbor warping produces more artifacts than bilinear interpolation; an example can be seen at the top of the warped Channing Court image.

Part A.4: Blend the Images into a Mosaic

This part of the project blends the images into a mosaic. The procedure is simple:

  1. Manually pick correspondences between the images
  2. Pad the images with thick black borders so that we can have a better view of the whole stitched image
  3. Use the picked points to recover a homography matrix
  4. Warp the images to the same plane
  5. Add a mask used for blending
  6. Blend the images into a mosaic using a weighted average calculated with the mask. To be more specific, I take the average of the images where the masks overlap with each other.
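
Step 6 can be sketched as follows, assuming single-channel float images already warped onto a shared canvas, with binary masks marking where each image has content. Where the masks overlap, the result is the plain average described above:

```python
import numpy as np

def blend(warped, masks):
    """Blend warped images using their masks.

    Each output pixel is the average of all images whose masks cover
    it: the single image where only one mask is set, the mean where
    masks overlap, and zero where no mask is set.
    """
    num = sum(im * m for im, m in zip(warped, masks))
    den = sum(masks)
    out = np.zeros_like(num)
    np.divide(num, den, out=out, where=den > 0)  # avoid 0/0 outside all masks
    return out
```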

Below are some examples of the stitched images. The first set of images will be used as an example to demonstrate the procedure, and the intermediate results for the other two sets of images will not be shown here for brevity.

lecture_1_correspondence
During Lecture (left) Correspondences
lecture_2_correspondence
During Lecture (middle) Correspondences
lecture_2_for_3_correspondence
During Lecture (middle for right) Correspondences
lecture_3_correspondence
During Lecture (right) Correspondences
warped_lec_1
Warped Lecture (left)
mask_lec_1
Mask for Lecture (left)
warped_lec_2
Warped Lecture (middle)
mask_lec_2
Mask for Lecture (middle)
warped_lec_3
Warped Lecture (right)
mask_lec_3
Mask for Lecture (right)
during_lecture
During Lecture
soda_1
Outside Soda (left)
soda_2
Outside Soda (middle)
soda_3
Outside Soda (right)
outside_soda
Outside Soda
sky_1
Inside Sky Lab (left)
sky_2
Inside Sky Lab (middle)
sky_3
Inside Sky Lab (right)
inside_sky_lab
Inside Sky Lab

Part B.1: Harris Corner Detection

In this section, I use the Harris corner detection algorithm to detect corners in the images. However, naively choosing the strongest corners leads to an uneven spatial distribution of the features. Therefore, I applied Adaptive Non-Maximal Suppression (ANMS) to choose the corners. The following images show the results of Harris corner detection and of ANMS.

ANMS is implemented by the following steps:

  1. Detect the corners of the image with the Harris corner detector and sort them by Harris response score
  2. Maintain a list of already selected corners, initialized as empty
  3. For the i-th corner, calculate the minimum distance to the already selected corners that are sufficiently stronger, using a response-ratio threshold of 0.9
  4. Sort the corners by this minimum distance and select the top k corners
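
The steps above can be sketched as follows. This is a minimal version in which the 0.9 factor plays the role of the response-ratio threshold; the exact details of the implementation may differ:

```python
import numpy as np

def anms(coords, scores, k, c_robust=0.9):
    """Adaptive Non-Maximal Suppression (sketch).

    coords: (N, 2) corner positions; scores: (N,) Harris responses.
    Each corner's suppression radius is its distance to the nearest
    sufficiently stronger corner (score_i < c_robust * score_j).
    Keep the k corners with the largest radii, which spreads the
    selected corners evenly across the image.
    """
    order = np.argsort(-scores)                    # strongest first
    coords, scores = coords[order], scores[order]
    radii = np.full(len(coords), np.inf)           # strongest corner: infinite radius
    for i in range(1, len(coords)):
        # Corners before index i are stronger; apply the ratio threshold.
        mask = scores[i] < c_robust * scores[:i]
        if mask.any():
            d = np.linalg.norm(coords[:i][mask] - coords[i], axis=1)
            radii[i] = d.min()
    keep = np.argsort(-radii)[:k]
    return coords[keep]
```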
lecture_1_harris_corners
Lecture 1 with all corners detected by Harris corner detection
lecture_1_strongest_500_harris_corners
Lecture 1 with 500 strongest corners
lecture_1_ANMS_500_harris_corners
Lecture 1 with 500 corners selected by ANMS

Part B.2: Feature Descriptor Extraction

This part of the project implements feature descriptor extraction. Each descriptor is an 8 by 8 square patch, downsampled from a 40 by 40 window at the original scale. All the descriptors here are axis-aligned, since no rotation is involved in taking the pictures. The following images show some examples of the feature descriptors from the left lecture image.
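
A minimal sketch of the extraction: sample every 5th pixel of the 40 by 40 window to get the 8 by 8 patch. The bias/gain normalization at the end is a common extra step and an assumption on my part, not stated above; in practice the image should also be blurred first to avoid aliasing.

```python
import numpy as np

def extract_descriptor(im, y, x, spacing=5, size=8):
    """Extract an axis-aligned size x size descriptor around (y, x).

    Samples every `spacing`-th pixel of the (size*spacing)-wide window
    centered at (y, x), then bias/gain-normalizes the patch
    (an assumption: zero mean, unit variance).
    """
    half = size * spacing // 2                     # 20 pixels for 8 x 5
    patch = im[y - half:y + half:spacing, x - half:x + half:spacing]
    return (patch - patch.mean()) / (patch.std() + 1e-8)
```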

lecture_1_descriptor_patch_0
Lecture Descriptor Patch Example 1
lecture_1_descriptor_patch_1
Lecture Descriptor Patch Example 2
lecture_1_descriptor_patch_2
Lecture Descriptor Patch Example 3

Part B.3: Feature Matching

This part of the project implements feature matching. We iterate through all the feature descriptors in one image; for each descriptor, we find its best and second-best match in the other image and compute the ratio of their errors. If the ratio is less than 0.7, we keep the pair as a good match.
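
The ratio test can be sketched as follows, assuming descriptors are flattened into row vectors; the function name and distance choice (squared error) are illustrative:

```python
import numpy as np

def match_features(desc1, desc2, ratio=0.7):
    """Match descriptors with the best/second-best ratio test.

    desc1: (N, D) and desc2: (M, D) flattened descriptors, M >= 2.
    A pair is kept only if the best match's error is less than
    `ratio` times the second-best match's error.
    Returns a list of (i, j) index pairs.
    """
    # Pairwise squared distances via the expansion |a-b|^2 = |a|^2 + |b|^2 - 2ab.
    d1 = (desc1 ** 2).sum(1)[:, None]
    d2 = (desc2 ** 2).sum(1)[None, :]
    dists = d1 + d2 - 2 * desc1 @ desc2.T
    matches = []
    for i, row in enumerate(dists):
        j1, j2 = np.argsort(row)[:2]               # best and second best
        if row[j1] / row[j2] < ratio:
            matches.append((i, j1))
    return matches
```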

lecture_1_2_matches
Lecture Matches

Part B.4: RANSAC for Robust Homography

This part of the project implements the RANSAC algorithm to robustly estimate the homography matrix. The RANSAC algorithm is implemented by the following steps:

  1. Randomly sample 4 points from the matches
  2. Estimate the homography matrix using the 4 points
  3. Count the number of inliers
  4. Repeat the process for a certain number of iterations
  5. Re-estimate the homography matrix using all inliers from the sample that produced the most inliers
  6. Return the homography matrix
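
The loop above can be sketched as follows. This is a minimal version with an illustrative inlier threshold of 2 pixels; the inner homography fit follows the least-squares expansion from Part A.2:

```python
import numpy as np

def ransac_homography(pts1, pts2, n_iters=1000, eps=2.0, seed=0):
    """Robustly estimate H mapping pts1 to pts2 with RANSAC.

    pts1, pts2: (N, 2) matched points. Repeatedly fit H to a random
    4-point sample, count points whose reprojection error is below
    `eps` pixels, then refit on the largest inlier set.
    """
    def fit(p1, p2):
        A, b = [], []
        for (x, y), (xp, yp) in zip(p1, p2):
            A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
            A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
            b.extend([xp, yp])
        h, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float),
                                rcond=None)
        return np.append(h, 1.0).reshape(3, 3)

    def inliers(H):
        proj = H @ np.column_stack([pts1, np.ones(len(pts1))]).T
        proj = (proj[:2] / proj[2]).T
        return np.linalg.norm(proj - pts2, axis=1) < eps

    rng = np.random.default_rng(seed)
    best = np.zeros(len(pts1), bool)
    for _ in range(n_iters):
        idx = rng.choice(len(pts1), 4, replace=False)
        inl = inliers(fit(pts1[idx], pts2[idx]))
        if inl.sum() > best.sum():
            best = inl
    return fit(pts1[best], pts2[best])               # refit on all inliers
```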

The following images show some examples of the auto-stitched images, compared with the results stitched from manually picked correspondences.

During Lecture (Stitched Manually)
During Lecture (Stitched with RANSAC)
Outside Soda (Stitched Manually)
Outside Soda (Stitched with RANSAC)
Inside Sky Lab (Stitched Manually)
Inside Sky Lab (Stitched with RANSAC)