# Emotion Recognition

Authors: Greg Szrom, Ben Shealy

This notebook shows how to build a neural network for emotion recognition. We use a dataset of face images from the UCI Machine Learning Repository. It contains photos of 20 individuals, each showing four different emotions: "angry", "happy", "neutral", and "sad". We train a convolutional neural network (CNN) on this data using the emotions as labels.

This project is a work in progress, so anyone is welcome to pick it up and try to improve the results!

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import os
import seaborn as sns
import skimage
import skimage.io  # the io submodule is not always loaded by `import skimage` alone
import sklearn
import sklearn.metrics
import sklearn.model_selection
import sklearn.preprocessing
from tensorflow import keras

## Prepare the Data

The first step is to acquire and process the data so that it can be fed to the CNN. There are several preprocessing steps which we must perform, including:

- Remove bad data samples
- Convert images from PGM to PPM, which can be read by `skimage`
- Crop images from 128x120 to 120x120 so that they are square
- Generate 90, 180, and 270 degree rotations of each image
- Generate horizontal flips of each image (including rotated copies)
- Reorganize images according to emotion instead of individual

All of these tasks are easier to do in Bash than in Python, so we will perform each step as a Bash script.
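
For intuition, the crop and augmentation steps in the list above can be sketched in NumPy. This is only an illustration under stated assumptions, not the pipeline used in this notebook: the helper names `center_crop` and `augment` are hypothetical, and the input is a random stand-in array rather than a real face image. Note that `np.rot90` turns counter-clockwise, so individual rotations may not match the files written by the Bash cells one-to-one, but the resulting set of eight variants is the same.

```python
import numpy as np

def center_crop(image, size):
    """Return the central size x size window of an (H, W[, C]) array."""
    height, width = image.shape[:2]
    top = (height - size) // 2
    left = (width - size) // 2
    return image[top:top + size, left:left + size]

def augment(image):
    """Return 8 variants: 4 quarter-turn rotations, each plain and mirrored."""
    variants = []
    for quarter_turns in range(4):
        rotated = np.rot90(image, quarter_turns)
        variants.append(rotated)
        variants.append(np.fliplr(rotated))  # horizontal flip, like `-flop`
    return variants

# random stand-in for one grayscale image: 120 rows x 128 columns
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(120, 128), dtype=np.uint8)

cropped = center_crop(image, 120)   # drops 4 columns on each side
variants = augment(cropped)
print(cropped.shape, len(variants))
```

Each original image therefore contributes eight samples to the final dataset.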

In [None]:
%%script bash

# remove existing dataset files
rm -rf faces.tar.gz faces

# download and extract dataset from UCI repository
wget -q http://archive.ics.uci.edu/ml/machine-learning-databases/faces-mld/faces.tar.gz
tar -xf faces.tar.gz

# remove some artifacts from dataset
rm -rf faces/.anonr faces/**/*.bad faces/**/*_2.pgm faces/**/*_4.pgm

# print the number of images in the dataset
ls faces/**/*.pgm | wc -l

In [None]:
%%script bash

# convert pgm files to ppm
for f in faces/**/*.pgm; do
    convert $f "$(dirname $f)/$(basename $f .pgm).ppm"
done

rm -rf faces/**/*.pgm

In [None]:
%%script bash

# crop images to be square (120x120)
for f in faces/**/*.ppm; do
    convert $f -resize 120x120^ -gravity center -extent 120x120 $f
done

In [None]:
%%script bash

# generate rotations of each image at 90, 180, and 270
for f in faces/**/*.ppm; do
    convert $f -rotate  90 "$(dirname $f)/090_$(basename $f)"
    convert $f -rotate 180 "$(dirname $f)/180_$(basename $f)"
    convert $f -rotate 270 "$(dirname $f)/270_$(basename $f)"
    mv $f "$(dirname $f)/000_$(basename $f)"
done

In [None]:
%%script bash

# generate horizontal flips of each image
for f in faces/**/*.ppm; do
    convert $f -flop "$(dirname $f)/flop_$(basename $f)"
done

In [None]:
%%script bash

# rearrange faces into subfolders by emotion
EMOTIONS="angry happy neutral sad"

mv faces faces-old

for EMOTION in ${EMOTIONS}; do
    mkdir -p faces/${EMOTION}
    mv faces-old/**/*_${EMOTION}_*.ppm faces/${EMOTION}
done

rm -rf faces-old

## Load the Data

Now that the dataset is ready, we can load it into Python. In particular, we'll create two numpy arrays: `X` contains the images, and `y` contains the numerical label for each image.

In [None]:
# infer class names from the sub-directory names
# (sorted so that label k always maps to the same emotion: angry, happy, neutral, sad)
classes = sorted(os.listdir("faces"))

# initialize empty data array and label array
# 624 cleaned images x 4 rotations x 2 (original + horizontal flip)
num_samples = 8 * 624
X = np.empty((num_samples, 120, 120, 3), dtype=np.uint8)
y = np.empty((num_samples,), dtype=np.int64)

# iterate through sub-directories
i = 0

for k, class_name in enumerate(classes):
    # get list of images in class k
    filenames = os.listdir("faces/%s" % class_name)
    filenames = ["faces/%s/%s" % (class_name, f) for f in filenames]
    
    # load each image into numpy array
    for fname in filenames:
        X[i] = skimage.io.imread(fname)
        y[i] = k
        i += 1

print("X:", X.shape)
print("y:", y.shape)

## Visualize the Data

Before we do anything else with the data, let's visualize a few examples to make sure that everything looks good so far.

In [None]:
# define the size of the grid
rows = 4
cols = 4

# select several distinct random samples from dataset
indices = np.random.choice(len(X), rows * cols, replace=False)

# plot the images in a grid
plt.figure(figsize=(3 * cols, 3 * rows))

for i in range(rows * cols):
    index = indices[i]
    
    ax = plt.subplot(rows, cols, i + 1)
    plt.imshow(X[index], cmap="gray")
    plt.title("label = %d" % y[index])
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

plt.show()

## Create Train / Test Sets

To train the neural network, we need to split the dataset into a training set and a test set. The training set is used to fit the network, and the test set is used to evaluate how well the network recognizes emotions in images it was not trained on. We'll also rescale the pixel values from [0, 255] to [0, 1], which generally helps training, and convert the integer labels to one-hot vectors, as expected by the categorical cross-entropy loss.

In [None]:
# normalize the data
X = X.astype("float32") / 255.

# convert labels into one-hot labels
y = keras.utils.to_categorical(y, num_classes=4)

# split dataset into train set and test set
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, test_size=0.2)

# print shapes of train/test sets
print("X_train: %s" % str(X_train.shape))
print("y_train: %s" % str(y_train.shape))
print("X_test: %s" % str(X_test.shape))
print("y_test: %s" % str(y_test.shape))
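
One caveat with a purely random split: the augmentation ran before the split, so rotated or flipped copies of the same photo can land in both the training set and the test set. A minimal sketch of a group-aware split follows, assuming you keep a list of file names parallel to `X` (the loading cell above does not do this). The helpers `source_id` and `grouped_split` and the example file names are hypothetical; only the `flop_` and three-digit rotation prefixes come from the Bash cells.

```python
import re
import numpy as np

def source_id(filename):
    """Strip the `flop_` and rotation prefixes added during augmentation."""
    return re.sub(r"^(flop_)?\d{3}_", "", filename)

def grouped_split(groups, test_size=0.2, seed=0):
    """Return (train_idx, test_idx) such that no group spans both sets."""
    groups = np.asarray(groups)
    unique = np.random.default_rng(seed).permutation(np.unique(groups))
    n_test = max(1, int(round(test_size * len(unique))))
    test_mask = np.isin(groups, unique[:n_test])
    return np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# hypothetical file names that follow the naming produced by the Bash cells
prefixes = ["000_", "090_", "180_", "270_",
            "flop_000_", "flop_090_", "flop_180_", "flop_270_"]
originals = ["personA_left_angry_open.ppm", "personA_left_happy_open.ppm",
             "personB_up_sad_open.ppm", "personB_up_neutral_open.ppm",
             "personC_right_sad_sunglasses.ppm"]
filenames = [p + name for name in originals for p in prefixes]

groups = [source_id(f) for f in filenames]
train_idx, test_idx = grouped_split(groups)
print(len(train_idx), len(test_idx))
```

With real data, the index arrays would be applied as `X[train_idx]`, `X[test_idx]`, and likewise for `y`. scikit-learn's `GroupShuffleSplit` implements the same idea if you prefer a library routine.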

## Train the CNN

Now that our data is ready, we can create and train our neural network. There are several hyperparameters to experiment with, including:

- Number of layers
- Number of filters per layer
- Optimizer
- Batch size
- Epochs

See if you can improve the accuracy of the network by tweaking these parameters...

In [None]:
# create a basic convolutional neural network
cnn = keras.models.Sequential() 
cnn.add(keras.layers.Conv2D(64, (3,3), padding="same", activation="relu", input_shape=(120,120,3)))
cnn.add(keras.layers.MaxPooling2D(2, 2))
cnn.add(keras.layers.Conv2D(128, (3,3), padding="same", activation="relu"))
cnn.add(keras.layers.MaxPooling2D(2, 2))
cnn.add(keras.layers.Conv2D(128, (3,3), padding="same", activation="relu"))
cnn.add(keras.layers.MaxPooling2D(2, 2))
cnn.add(keras.layers.Conv2D(128, (3,3), padding="same", activation="relu"))
cnn.add(keras.layers.MaxPooling2D(2, 2))
cnn.add(keras.layers.Conv2D(128, (3,3), padding="same", activation="relu"))
cnn.add(keras.layers.Flatten())
cnn.add(keras.layers.Dense(1024, activation="relu"))
cnn.add(keras.layers.Dense(4, activation="softmax"))

cnn.summary()
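
To see where the parameters in this architecture come from, the following sketch recomputes the layer shapes and parameter counts by hand. It assumes the layer list above (3x3 convolutions with `"same"` padding, 2x2 max pooling that rounds down) and should agree with the totals reported by `cnn.summary()`; the helper names are hypothetical.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square convolution kernel."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return inputs * units + units

size, channels = 120, 3   # input images are 120x120 with 3 channels
total = 0

# integers are conv filter counts; "pool" is a 2x2 max-pooling layer
layers = [64, "pool", 128, "pool", 128, "pool", 128, "pool", 128]

for layer in layers:
    if layer == "pool":
        size //= 2                      # 120 -> 60 -> 30 -> 15 -> 7
    else:
        total += conv_params(3, channels, layer)
        channels = layer                # "same" padding keeps the spatial size

flat = size * size * channels           # length of the flattened feature vector
total += dense_params(flat, 1024) + dense_params(1024, 4)

print("feature map: %dx%dx%d" % (size, size, channels))
print("flattened:   %d" % flat)
print("parameters:  %d" % total)
```

Most of the parameters sit in the first dense layer, so the width of that layer and the number of pooling stages before it are the settings that change the model size the most.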

In [None]:
# train the convolutional neural network
cnn.compile(optimizer="sgd", loss="categorical_crossentropy", metrics=["accuracy"])

history = cnn.fit(X_train, y_train, batch_size=32, epochs=20, validation_split=0.2)
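
The `history` object returned by `fit` records the metrics for every epoch, which helps when comparing hyperparameter settings. Below is a minimal sketch of summarizing such a record. It assumes the dictionary keys `"accuracy"` and `"val_accuracy"`, which recent Keras versions use for `metrics=["accuracy"]` (older releases used `"acc"` and `"val_acc"`). The helper `summarize_history` is hypothetical, and the numbers are placeholders, not results from this notebook.

```python
def summarize_history(history, metric="accuracy"):
    """Report the best validation epoch and the final train/validation gap."""
    train = history[metric]
    val = history["val_" + metric]
    best = max(range(len(val)), key=val.__getitem__)
    return {
        "best_epoch": best + 1,            # epochs are reported 1-based
        "best_val": val[best],
        "final_gap": train[-1] - val[-1],  # a large gap suggests overfitting
    }

# placeholder values for illustration only
example = {
    "accuracy": [0.25, 0.30, 0.40],
    "val_accuracy": [0.25, 0.28, 0.27],
}

summary = summarize_history(example)
print(summary)
```

With the trained model, the call would be `summarize_history(history.history)`.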

## Evaluate the CNN

The final step is to evaluate our network using the test set. The network has not seen any of the data in the test set, so it should be a good way to tell whether the network has truly learned how to recognize the four emotions in this dataset.

The most basic metric, which we'll use first, is accuracy: the percentage of test images that the network classifies correctly. However, if the network doesn't reach a high accuracy, we need to dig deeper into what exactly it got wrong. For this, we'll create a confusion matrix, which shows, for each true class, how many samples the network assigned to each predicted class.
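
As a small worked example of these two metrics, the sketch below builds a confusion matrix with plain NumPy. The labels are made up for illustration and are not predictions from this network; the helper `confusion_matrix` is a hypothetical stand-in for the scikit-learn function of the same name.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Count samples by (true class, predicted class)."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)   # rows: true label, columns: prediction
    return cm

# made-up labels for illustration (0=angry, 1=happy, 2=neutral, 3=sad)
y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3])
y_pred = np.array([0, 1, 1, 1, 2, 0, 3, 2])

cm = confusion_matrix(y_true, y_pred, num_classes=4)

# accuracy is the share of samples on the diagonal
accuracy = np.trace(cm) / cm.sum()

# dividing each row by its total gives per-class rates that sum to 1
row_normalized = cm / cm.sum(axis=1, keepdims=True)

print(cm)
print("accuracy:", accuracy)
```

Off-diagonal entries show which emotions are confused with each other. For instance, `cm[0, 1]` counts "angry" samples that were predicted as "happy".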

In [None]:
# print the raw predictions of the cnn on the test set
# each value corresponds to how confident the network is that a sample belongs to a particular class
y_pred = cnn.predict(X_test, verbose=0) 

print("%12s %12s %12s %12s" % ("Confidence", "Predicted", "Actual", "Num correct"))

n_correct = 0

for i in range(len(y_test)):
    confidence = np.amax(y_pred[i])
    y_pred_i = np.argmax(y_pred[i])
    y_test_i = np.argmax(y_test[i])

    if y_pred_i == y_test_i:
        n_correct += 1
        
    print("%12.3f %12d %12d %12d" % (confidence, y_pred_i, y_test_i, n_correct))
        
print("Accuracy: %0.3f" % (n_correct / len(y_test)))

In [None]:
def plot_confusion_matrix(y_true, y_pred, classes,
                          normalize=False,
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    # Compute confusion matrix
    cm = sklearn.metrics.confusion_matrix(y_true, y_pred)

    # apply normalization if specified
    if normalize:
        title = "Confusion matrix (normalized)"
        # divide each row by its total so that every row sums to 1
        cm = cm.astype("float32") / cm.sum(axis=1, keepdims=True)
    else:
        title = "Confusion matrix (not normalized)"

    fig, ax = plt.subplots()
    im = ax.imshow(cm, interpolation="nearest", cmap=cmap)
    ax.figure.colorbar(im, ax=ax)

    # We want to show all ticks...
    ax.set(xticks=np.arange(cm.shape[1]),
           yticks=np.arange(cm.shape[0]),
           # ... and label them with the respective list entries
           xticklabels=classes,
           yticklabels=classes,
           title=title,
           ylabel="True label",
           xlabel="Predicted label")

    # Rotate the tick labels and set their alignment.
    plt.setp(ax.get_xticklabels(), rotation=45, ha="right", rotation_mode="anchor")

    # Loop over data dimensions and create text annotations.
    fmt = ".2f" if normalize else "d"
    thresh = cm.max() / 2.
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            ax.text(j, i, format(cm[i, j], fmt),
                    ha="center", va="center",
                    color="white" if cm[i, j] > thresh else "black")
    fig.tight_layout()
    return ax



# plot confusion matrix to better understand the results
np.set_printoptions(precision=2)

classes = ["angry", "happy", "neutral", "sad"]
y_test2 = np.argmax(y_test, axis=1)
y_pred2 = np.argmax(y_pred, axis=1)

# plot non-normalized confusion matrix
plot_confusion_matrix(y_test2, y_pred2, classes=classes)

# Plot normalized confusion matrix
plot_confusion_matrix(y_test2, y_pred2, classes=classes, normalize=True)

plt.show()

## Conclusions

At the time of this writing, the CNN only achieves a test accuracy of ~25%, which is about the same as random guessing. This remains the case even after extensive experimentation with the dataset (using data augmentation) and the CNN (adjusting the number of layers, etc.). We suspect that the dataset we chose may not be large enough for the CNN to be able to learn the four emotions, or that it may not be informative enough. For example, the images still contain a lot of background noise and the facial features themselves are very small in the image. Therefore, some future directions would be acquiring more training data, removing background noise, and possibly experimenting with other network architectures or different models entirely.