To get a feel for keypoint detection, I started off with just detecting nose tips. Here are a few examples from the training data.
I trained a simple convolutional neural net with four convolutional layers and two fully connected layers to predict the location of the nose keypoint.
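A minimal sketch of such a training loop, assuming the model regresses an (x, y) nose coordinate and is trained with MSE loss and Adam (the optimizer, hyperparameters, and `train_nose_detector` helper are assumptions for illustration, not the exact setup used):

```python
import torch
import torch.nn as nn

def train_nose_detector(model, train_loader, val_loader, epochs=15, lr=1e-3):
    """Train a keypoint regressor: MSE between predicted and true (x, y) coords."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.MSELoss()
    train_losses, val_losses = [], []
    for epoch in range(epochs):
        model.train()
        for images, keypoints in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), keypoints)
            loss.backward()
            optimizer.step()
        # Record the loss of the last batch of the epoch.
        train_losses.append(loss.item())
        model.eval()
        with torch.no_grad():
            val_loss = sum(criterion(model(x), y).item()
                           for x, y in val_loader) / len(val_loader)
        val_losses.append(val_loss)
    return train_losses, val_losses
```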
Training and Validation Losses:
Some good predictions of my model:
Some weak predictions of my model:
Notably, in the images where my model predicts poorly, the person's face is tilted or shifted; these are the "out of distribution" examples in my set.
This part aims to detect the full facial keypoint structure of faces. To widen the breadth of the training distribution, random rotations (±15 degrees), translations (±10 pixels), and pixel jittering are applied. Below are the results and visualizations from the network.
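The augmentations could be sketched as follows. This is a sketch only: the `augment` helper, the use of `scipy.ndimage`, and the rotation sign convention (which should be checked against your display orientation) are all assumptions, not the exact pipeline used.

```python
import numpy as np
from scipy import ndimage

def augment(image, keypoints, angle=0.0, shift=(0.0, 0.0), jitter=0.0, rng=None):
    """Rotate (degrees), translate (dx, dy in pixels), and jitter an image,
    applying the matching geometric transform to its (N, 2) (x, y) keypoints."""
    rng = rng if rng is not None else np.random.default_rng()
    h, w = image.shape[:2]
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    # Rotate the image about its center; reshape=False keeps the original size.
    out = ndimage.rotate(image, angle, reshape=False, mode="nearest")
    # Rotate keypoints about the same center (array y grows downward, so the
    # sign convention below may need flipping for a different display setup).
    t = np.deg2rad(angle)
    x, y = keypoints[:, 0] - cx, keypoints[:, 1] - cy
    kps = np.stack([cx + x * np.cos(t) + y * np.sin(t),
                    cy - x * np.sin(t) + y * np.cos(t)], axis=1)
    # Translate: ndimage.shift takes (row, col) = (dy, dx) order.
    dx, dy = shift
    out = ndimage.shift(out, (dy, dx), mode="nearest")
    kps = kps + np.array([dx, dy])
    # Pixel jitter: small uniform noise, clipped to the valid [0, 1] range.
    if jitter > 0:
        out = np.clip(out + rng.uniform(-jitter, jitter, out.shape), 0.0, 1.0)
    return out, kps
```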
The network is a convolutional neural network (CNN) designed for facial keypoints detection. Below are the details of its architecture:
The input image has spatial dimensions h × w.

| Layer | Input Channels | Output Channels | Kernel Size | Stride | Padding | Output Dimensions |
|---|---|---|---|---|---|---|
| Conv1 | 1 | 8 | 7×7 | 1 | 3 | h/2 × w/2 |
| Conv2 | 8 | 14 | 5×5 | 1 | 2 | h/4 × w/4 |
| Conv3 | 14 | 20 | 3×3 | 1 | 1 | h/8 × w/8 |
| Conv4 | 20 | 26 | 3×3 | 1 | 1 | h/16 × w/16 |
| Conv5 | 26 | 32 | 3×3 | 1 | 1 | h/16 × w/16 |
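A PyTorch sketch of the architecture in the table, assuming (as the halving output sizes suggest) that each of the first four convolutions is followed by ReLU and 2×2 max pooling, with none after Conv5. The fully connected head sizes and keypoint count are illustrative assumptions, not part of the table.

```python
import torch
import torch.nn as nn

class KeypointNet(nn.Module):
    """Five convolutional layers per the table, then a small regression head
    that predicts 2 coordinates per keypoint."""
    def __init__(self, h, w, num_keypoints=58):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, 7, stride=1, padding=3), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 14, 5, stride=1, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(14, 20, 3, stride=1, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(20, 26, 3, stride=1, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(26, 32, 3, stride=1, padding=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            # After four 2x2 pools the feature map is (h/16) x (w/16).
            nn.Linear(32 * (h // 16) * (w // 16), 256), nn.ReLU(),
            nn.Linear(256, num_keypoints * 2),
        )

    def forward(self, x):
        return self.head(self.features(x))
```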
The plotted losses are computed on the last image of each epoch, hence the high variance in the curve. Still, there is a clear downward trajectory and some sort of convergence.
In the images where detection was very poor, there are prominent features, such as creases, that the model associates with the keypoints it was trained on. For example, in Sample 13, the lower portion of the woman's smile is conflated with her chin. A larger dataset would likely remedy this.
Finally, we try full facial keypoint detection with a larger dataset. We use the same augmentation techniques as in the previous part.
We adapt a pre-trained ResNet18 for facial keypoint detection. Below are the modifications:
The following plot illustrates the training and validation loss across iterations:
Below are examples of keypoint predictions on the testing set:
Sample 9: The left side of the image is significantly underexposed, degrading the keypoint detection there.
Sample 10: Much of the face is out of the camera's view. The incorrectly predicted keypoints correspond to the portion of the face that is cut off by the image boundary.
Here are the results of running the trained model on three personal images:
Observations:
Here are the results of running the model on provided test images:
Observations:
The response function $g(Z)$ describes the logarithmic relationship between pixel values ($Z$) and exposure ($X = E \cdot \Delta t$). From Equation (2) in the paper, we derive:

$$g(Z_{ij}) = \ln E_i + \ln \Delta t_j$$

Where:
- $Z_{ij}$ is the value of pixel $i$ in image $j$,
- $E_i$ is the irradiance at pixel $i$, and
- $\Delta t_j$ is the exposure time of image $j$.
To solve for $g$, we minimize a quadratic objective function:

$$\mathcal{O} = \sum_{i=1}^{N} \sum_{j=1}^{P} \left\{ w(Z_{ij}) \left[ g(Z_{ij}) - \ln E_i - \ln \Delta t_j \right] \right\}^2 + \lambda \sum_{z=Z_{\min}+1}^{Z_{\max}-1} \left[ w(z)\, g''(z) \right]^2$$

Here, $\lambda$ is a regularization parameter controlling smoothness. The weighting function $w(Z)$ emphasizes values in the middle of the intensity range:

$$w(z) = \begin{cases} z - Z_{\min} & \text{for } z \le \tfrac{1}{2}(Z_{\min} + Z_{\max}) \\ Z_{\max} - z & \text{for } z > \tfrac{1}{2}(Z_{\min} + Z_{\max}) \end{cases}$$
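Because $g$ takes only 256 discrete values for 8-bit pixels, the objective can be minimized as a linear least-squares problem, in the style of Debevec and Malik's `gsolve`. This sketch assumes 8-bit pixels and adds 1 to the hat weights so no row of the system is all zeros (a small deviation from the paper's $w$):

```python
import numpy as np

def gsolve(Z, log_t, lam=50.0):
    """Recover the response curve g and log irradiances by least squares.
    Z: (N, P) int pixel values for N sampled locations over P exposures;
    log_t: (P,) log exposure times. Returns (g of shape (256,), ln E of (N,))."""
    n = 256
    N, P = Z.shape
    w = lambda z: np.minimum(z, 255 - z) + 1   # hat weighting, never zero
    A = np.zeros((N * P + n - 1, n + N))
    b = np.zeros(A.shape[0])
    k = 0
    for i in range(N):                          # data-fitting rows
        for j in range(P):
            wij = w(Z[i, j])
            A[k, Z[i, j]] = wij                 # w * g(Z_ij)
            A[k, n + i] = -wij                  # w * (-ln E_i)
            b[k] = wij * log_t[j]               # w * ln dt_j
            k += 1
    A[k, 128] = 1.0                             # gauge constraint: g(128) = 0
    k += 1
    for z in range(1, n - 1):                   # smoothness: lam * w(z) * g''(z)
        wz = lam * w(z)
        A[k, z - 1], A[k, z], A[k, z + 1] = wz, -2 * wz, wz
        k += 1
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x[:n], x[n:]
```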
The recovered $g$ curves are shown below.
Once $g$ is recovered, we compute the logarithmic irradiance $\ln(E_i)$ for each pixel:

$$\ln E_i = \frac{\sum_{j=1}^{P} w(Z_{ij}) \left( g(Z_{ij}) - \ln \Delta t_j \right)}{\sum_{j=1}^{P} w(Z_{ij})}$$
This combines information across exposures, reducing noise and artifacts.
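A sketch of the merge, assuming an 8-bit image stack and the same hat-shaped weighting as above (the epsilon guarding against a zero denominator is an implementation assumption):

```python
import numpy as np

def radiance_map(images, g, log_t):
    """Merge P exposures into a log-irradiance map via the weighted average.
    images: (P, H, W) uint8 stack; g: (256,) response curve; log_t: (P,)."""
    Z = images.astype(int)
    w = np.minimum(Z, 255 - Z) + 1e-8           # hat weights; eps avoids 0/0
    num = np.sum(w * (g[Z] - log_t[:, None, None]), axis=0)
    den = np.sum(w, axis=0)
    return num / den                            # ln E, shape (H, W)
```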
The bilateral filtering process decomposes the HDR image into base and detail layers using the following steps:
First, we convert the HDR radiance map to the logarithmic domain:
The bilateral filter is applied with the following parameters:
The bilateral filter preserves edges while smoothing the image by combining domain and range filtering:

$$BF[I]_p = \frac{1}{W_p} \sum_{q \in S} G_{\sigma_s}(\lVert p - q \rVert)\; G_{\sigma_r}(|I_p - I_q|)\; I_q$$

Where:
- $G_{\sigma_s}$ is a spatial (domain) Gaussian that discounts distant pixels,
- $G_{\sigma_r}$ is a range Gaussian that discounts pixels with dissimilar intensities, and
- $W_p$ is the sum of the weights, normalizing the filter.
The image is decomposed into:
- a base layer, the bilateral-filtered log intensity, carrying the large-scale variation, and
- a detail layer, the residual (log intensity minus the base), carrying texture and edges.
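A sketch of the decomposition, using a brute-force bilateral filter (fine for small images; real pipelines use a fast approximation) followed by Durand-and-Dorsey-style contrast compression of the base layer. The filter parameters and the `compression` factor are assumptions:

```python
import numpy as np

def bilateral_filter(img, sigma_s=2.0, sigma_r=0.4, radius=4):
    """Edge-preserving smoothing: each output pixel is a spatially- and
    range-weighted average of its (2*radius+1)^2 neighborhood."""
    h, w = img.shape
    out = np.zeros_like(img)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    g_s = np.exp(-(xs**2 + ys**2) / (2 * sigma_s**2))   # spatial Gaussian
    pad = np.pad(img, radius, mode="edge")
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            # Range Gaussian: discount pixels with dissimilar intensity.
            wgt = g_s * np.exp(-(patch - img[i, j])**2 / (2 * sigma_r**2))
            out[i, j] = np.sum(wgt * patch) / np.sum(wgt)
    return out

def durand_decompose(ln_E, compression=0.5):
    """Compress the base layer's contrast while preserving detail."""
    base = bilateral_filter(ln_E)
    detail = ln_E - base
    return base * compression + detail
```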
We compare three different tone mapping approaches:
Global tone mapping with automatic exposure adjustment:
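Since the exact operator isn't specified here, a Reinhard-style global operator is a reasonable stand-in sketch: scale the radiance so its log-average luminance maps to a target `key` (the "automatic exposure" step), then compress with $L / (1 + L)$. The `key` value and epsilon are assumptions.

```python
import numpy as np

def global_tonemap(E, key=0.18):
    """Global tone mapping with automatic exposure adjustment."""
    log_avg = np.exp(np.mean(np.log(E + 1e-8)))   # geometric mean luminance
    L = key * E / log_avg                         # exposure adjustment
    Ld = L / (1.0 + L)                            # compress to [0, 1)
    return np.clip(Ld, 0.0, 1.0)
```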
In summary, both projects were super fun! Facial keypoint detection gave me the opportunity to mess around with neural networks, while HDR taught me some nuances of image exposure.