Training the CNN
This section describes the code behind training our convolutional neural network for recognizing hot dogs vs. carrots vs. bananas. We assume that we've already validated and taken a look at the images we are working with. (Check out the Get Data section.)
Importing libraries and defining paths and constants¶
import os
import argparse
import simplejson
import matplotlib.pyplot as plt
import cv2
from PIL import Image
import numpy as np
import pandas as pd
import glob
import tensorflow as tf
import keras
from keras.models import Sequential
from keras.regularizers import l2
from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
from keras.layers import Dense, Activation, Conv2D, Flatten, Dropout, MaxPooling2D
from keras.callbacks import ModelCheckpoint
from keras.applications.inception_v3 import InceptionV3, preprocess_input
keras.backend.clear_session()
Like before, we define the data file path and some important constants, including the batch size and the image size ($299 \times 299$ pixels).
#define paths and constants
cwd = os.getcwd()
data_path = os.path.join(cwd, 'data')
#data_path = "/Users/victorialiu/git/creatica/code/data/"
batch_size = 16
TARGET_SIZE = 299
Command Line Argument Parser¶
We want to be able to run our code from the command line (at least in the .py version of this notebook), so we use an argument parser to translate command-line arguments. We require a model name when running the script, and optionally a regularizer strength (defaulting to $0$). As we'll see later, having a model name makes it much easier to compare models.
## Parse command line arguments
def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('-m', '--model-name',
                        help='prefix for file to save trained model to ' +
                             '(e.g. dense_arch1, conv_regularize05, etc.)',
                        required=True)
    parser.add_argument('-r', '--regularizer-strength',
                        help='strength of l2 regularization to use',
                        type=float, default=0.00)
    return parser.parse_args()
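For example, running the command-line version of this notebook might look like the following sketch (the filename train.py and the argument values are just illustrative, not part of the repo):
# From the shell (hypothetical filename):
#   python train.py --model-name conv_reg001 --regularizer-strength 0.001
# Inside the script, parse_args() then gives us:
args = parse_args()
print(args.model_name)            # 'conv_reg001'
print(args.regularizer_strength)  # 0.001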
Image Pre-processing with InceptionV3 net¶
Next, we do data augmentation in order to "create" more data to train from. Data augmentation shifts the image in small ways so that the same image can be trained on from multiple perspectives (i.e. a rotated image of a hot dog is still a hot dog, and now we have more training data). We also normalize each image by dividing by the maximum pixel value of $255$. Finally, we write helper functions that we can easily call depending on whether we are working with training or testing data.
def image_data_augment(rescale=1/255, shear_range=False, zoom_range=False, horizontal_flip=False):
    #declare ImageDataGenerator class for augmenting images using shear, zoom, and flips
    #normalize with 1./255
    return (ImageDataGenerator(
        rescale=rescale,
        shear_range=shear_range,
        zoom_range=zoom_range,
        horizontal_flip=horizontal_flip))
def dataframe_categories(train_or_test):
    #define new data_path to make life easier
    data_subpath = os.path.join(data_path, train_or_test)
    #initiate lists for dataframe
    images_names = []
    categories = []
    for category in os.listdir(data_subpath):
        #no .DS_Store omg I hate .DS_Store
        if category != '.DS_Store':
            for image in os.listdir(
                os.path.join(data_subpath, category)
            ):
                #only use jpg
                if image[-4:] == '.jpg':
                    images_names.append(f'{category}/{image}')
                    categories.append(category)
    images_names = np.array(images_names)
    categories = np.array(categories)
    df = pd.DataFrame({
        'filename' : images_names,
        'class' : categories
    })
    return df
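As a quick sanity check, we can peek at the resulting dataframe (a sketch; the exact filenames and counts depend on your data folder):
train_df = dataframe_categories('train')
print(train_df.head())                   # 'filename' (e.g. 'hotdog/xyz.jpg') and 'class' columns
print(train_df['class'].value_counts())  # number of images per category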
def get_images(train_or_test):
    #augment images
    if train_or_test == 'train':
        datagen = image_data_augment(shear_range = 0.2, zoom_range = 0.2, horizontal_flip = True)
    else:
        datagen = image_data_augment()
    df = dataframe_categories(train_or_test)
    generator = datagen.flow_from_dataframe(
        df,
        directory = os.path.join(data_path, train_or_test),
        target_size=(TARGET_SIZE, TARGET_SIZE),
        batch_size=batch_size,
        class_mode='categorical',
        shuffle=False,
        validate_filenames=False
    )
    return generator
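To confirm that the generators produce what we expect, we can pull one batch from the training generator (a sketch, assuming the default batch size of $16$ defined above):
train_generator = get_images('train')
x_batch, y_batch = next(train_generator)
print(x_batch.shape)  # (16, 299, 299, 3): batch_size x height x width x RGB channels
print(y_batch.shape)  # (16, 3): one-hot labels for the three categories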
In the interest of saving training time (which could otherwise take all night and use up precious time) and making the model as accurate as possible, we use transfer learning, where we piggy-back on a pre-trained CNN that extracts the important features of our data through the bottleneck of the network. We use InceptionV3, so the image pre-processing pipeline always includes feeding the images through InceptionV3 first. This makes prediction a little slower, but the accuracy and the significantly decreased training time (on the order of minutes) are well worth it. Lastly, we write the function get_data(), which calls all the previous image pre-processing helper functions to streamline the process. Notably, we use one-hot vectors for our data labels.
def preprocess_inception():
    """
    inception for transfer learning
    """
    #transfer learning with InceptionV3, a pre-trained cnn
    model = InceptionV3(
        weights='imagenet',
        include_top=False,
        input_shape=(TARGET_SIZE, TARGET_SIZE, 3),
        classes = 3
    )
    for train_or_test in [
        'train',
        'test'
    ]:
        generator = get_images(train_or_test)
        #run every batch through InceptionV3 to extract bottleneck features
        bottleneck_features = model.predict(generator, verbose=1)
        #save the bottleneck features for later use
        np.savez(f'inception_features_{train_or_test}', features=bottleneck_features)
    return True
def get_data():
    #augment images and use inception net
    preprocess_inception()
    #load training data and define labels 0, 1, 2 for the three categories
    train_data = np.load('inception_features_train.npz')['features']
    #requires the number of samples in each category to be exactly the same
    train_data_type_count = int(len(train_data) / 3)
    train_labels = np.array(
        [0] * train_data_type_count +
        [1] * train_data_type_count +
        [2] * train_data_type_count
    )
    #load testing data and define labels 0, 1, 2 for the three categories
    test_data = np.load('inception_features_test.npz')['features']
    #requires the number of samples in each category to be exactly the same
    test_data_type_count = int(len(test_data) / 3)
    test_labels = np.array(
        [0] * test_data_type_count +
        [1] * test_data_type_count +
        [2] * test_data_type_count
    )
    # Convert labels to one-hot vectors (probability distributions w/
    # probability 1 assigned to the correct label)
    train_labels = keras.utils.to_categorical(train_labels)
    test_labels = keras.utils.to_categorical(test_labels)
    return (train_data, train_labels, test_data, test_labels)
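A quick sketch of what get_data() hands back (the exact counts come from our dataset; the $8 \times 8 \times 2048$ feature shape is InceptionV3's bottleneck output for $299 \times 299$ inputs):
(train_data, train_labels, test_data, test_labels) = get_data()
print(train_data.shape)    # e.g. (2508, 8, 8, 2048) bottleneck features
print(train_labels.shape)  # e.g. (2508, 3) one-hot labels
print(train_labels[0])     # e.g. [1. 0. 0.] for an image in the first category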
Convolutional Neural Network¶
Now, we get to the heart of our machine learning model! After being passed through InceptionV3, the feature-extracted version of each image is passed through our own convolutional network. This CNN was largely taken from J-Yash's open-source code, but we added a regularizer (reg_param), which notably can be set from the command line. There are five main convolutional blocks, followed by three fully connected layers. All of our filters are $3 \times 3$ matrices. In the first convolutional block, we train $16$ filters; in the second block, we train $32$ filters, max-pool, and regularize via dropout. In the third block, we train $64$ filters and apply dropout. In the fourth block, we train $128$ filters and apply dropout. In the fifth block, we train two layers of $256$ filters each, before max-pooling and applying dropout. We then flatten the data and feed it into the fully connected layers, which ultimately output three values, each one representing a different food category. We use a softmax activation for the final output, and ReLU for all other layers.
def build_conv_net(reg_param, train_data_shape):
    #train_data_shape = train_data.shape[1:] = (8, 8, 2048)
    model = Sequential()
    #convolutional layer with 16 3x3 trainable filters, using rectified linear units.
    #Padding to result in the same shape as the original picture.
    #use l2 regularization with strength reg_param (set from the command line)
    model.add(Conv2D(
        16, (3, 3),
        activation='relu',
        input_shape=train_data_shape,
        padding='same',
        kernel_regularizer=l2(reg_param)
    ))
    # # max pooling for noise reduction
    # model.add(MaxPooling2D(pool_size=(2, 2)))
    # second block
    model.add(Conv2D(
        32, (3, 3),
        activation='relu',
        padding='same',
        kernel_regularizer=l2(reg_param)
    ))
    # max pooling for noise reduction
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # dropout for more regularization
    model.add(Dropout(0.25))
    # third block
    model.add(Conv2D(
        64, (3, 3),
        activation='relu',
        padding='same',
        kernel_regularizer=l2(reg_param)
    ))
    # model.add(Conv2D(
    #     64, (3, 3),
    #     activation='relu',
    #     padding='same',
    #     kernel_regularizer=l2(reg_param)
    # ))
    # # max pooling for noise reduction
    # model.add(MaxPooling2D(pool_size=(2, 2)))
    # dropout for more regularization
    model.add(Dropout(0.25))
    # fourth block
    model.add(Conv2D(
        128, (3, 3),
        activation='relu',
        padding='same',
        kernel_regularizer=l2(reg_param)
    ))
    # model.add(Conv2D(
    #     128, (3, 3),
    #     activation='relu',
    #     padding='same',
    #     kernel_regularizer=l2(reg_param)
    # ))
    # # max pooling for noise reduction
    # model.add(MaxPooling2D(pool_size=(2, 2)))
    # dropout for more regularization
    model.add(Dropout(0.25))
    #fifth block
    model.add(Conv2D(
        256, (3, 3),
        activation='relu',
        padding='same',
        kernel_regularizer=l2(reg_param)
    ))
    model.add(Conv2D(
        256, (3, 3),
        activation='relu',
        padding='same',
        kernel_regularizer=l2(reg_param)
    ))
    # model.add(Conv2D(
    #     256, (3, 3),
    #     activation='relu',
    #     padding='same',
    #     kernel_regularizer=l2(reg_param)
    # ))
    # max pooling for noise reduction
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # dropout for more regularization
    model.add(Dropout(0.25))
    # fully connected layers
    model.add(Flatten())
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(0.6))
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(0.5))
    #3 categories, softmax output
    model.add(Dense(3, activation='softmax'))
    return model
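For reference, the network can be instantiated directly against InceptionV3's bottleneck shape (a sketch using an arbitrary regularizer strength):
model = build_conv_net(reg_param=0.001, train_data_shape=(8, 8, 2048))
model.summary()  # prints the layer stack and parameter counts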
Ready to go¶
The main() function is mainly for the command line, but it can also be run from Jupyter if the right lines are commented out. Notably, we use the Adam optimizer, we calculate loss based on categorical cross-entropy, and we record the MSE so we can plot it. We also have a checkpointer, which saves the model weights as we train over multiple epochs; when comparing models, the best one is the one with the lowest validation MSE, not the lowest training error. We use an 80/20 training vs. validation split, which is generally a good rule of thumb, and we choose $8$ epochs since the accuracy generally peaks within the first $5$ epochs. The trained model is saved as a .h5 and .json file under creatica/code/model, so that we can load it in another module. Finally, we output a graph of the loss and MSE metrics, compared between the training and validation sets. The image is saved to the creatica/code/img folder, but it also pops up when the script is run from the command line.
def main():
    # comment this out when running from command line!
    model_name = 'demo'
    regularizer_strength = .00001
    # # comment out when not running from cmdline
    # ## get cmdline args
    # args = parse_args()
    # model_name = args.model_name
    # # get regularization strength, if defined. Otherwise, it is 0
    # regularizer_strength = args.regularizer_strength
    # Remove src from cwd if necessary
    cwd = os.getcwd()
    if os.path.basename(cwd) == 'src': cwd = os.path.dirname(cwd)
    # Create img directory to save images if needed
    os.makedirs(os.path.join(cwd, 'img'), exist_ok=True)
    plot_fname = os.path.join(cwd, 'img', '%s_learn.png' % model_name)
    # Create model directory to save models if needed
    os.makedirs(os.path.join(cwd, 'model'), exist_ok=True)
    model_weights_fname = os.path.join(cwd, 'model', model_name + '.h5')
    model_json_fname = os.path.join(cwd, 'model', model_name + '.json')
    # Import the food image dataset (may take a few seconds)
    (train_data, train_labels, test_data, test_labels) = get_data()
    train_data_shape = train_data.shape[1:]
    # build model
    model = build_conv_net(regularizer_strength, train_data_shape)
    # Print a summary of the layers and weights in the model
    model.summary()
    # Have our model minimize the categorical cross-entropy loss with the adam
    # optimizer (fancier stochastic gradient descent that converges faster),
    # and track MSE as the metric to plot
    model.compile(
        loss='categorical_crossentropy',
        optimizer='adam',
        metrics=['MSE'])
    # set checkpointer to use in callback, saving the model weights each epoch
    # (note: user-specific path; point this at a scratch directory of your own)
    checkpointer = ModelCheckpoint(
        filepath='/Users/victorialiu/git/creatica/tmp',
        verbose=1,
        save_weights_only=True,
    )
    #time to fit
    history = model.fit(train_data, train_labels,
                        epochs=8,
                        batch_size=batch_size,
                        validation_split=0.2,
                        validation_data=(train_data, train_labels),
                        verbose=2,
                        callbacks=[checkpointer],
                        shuffle=True)
    #load the checkpointed weights
    model.load_weights('/Users/victorialiu/git/creatica/tmp')
    # Save model weights and json spec describing the model's architecture
    model.save(model_weights_fname)
    model_json = model.to_json()
    with open(model_json_fname, 'w') as f:
        f.write(simplejson.dumps(simplejson.loads(model_json), indent=4))
    # # Plot accuracy learning curve (use when metrics include 'accuracy')
    # ax1 = plt.subplot(2, 1, 1)
    # plt.plot(history.history['accuracy'])
    # plt.plot(history.history['val_accuracy'])
    # plt.title('%s accuracy' % model_name)
    # plt.ylabel('Accuracy')
    # plt.xlabel('Epoch')
    # plt.legend(['Train', 'Validation'], loc='lower right')
    # plt.savefig(plot_fname)
    # # Plot loss learning curve
    # plt.subplot(2, 1, 2, sharex=ax1)
    # plt.plot(history.history['loss'])
    # plt.plot(history.history['val_loss'])
    # plt.title('%s loss' % model_name)
    # plt.ylabel('Loss')
    # plt.xlabel('Epoch')
    # plt.legend(['Train', 'Validation'], loc='upper right')
    # Plot MSE learning curve
    ax1 = plt.subplot(2, 1, 1)
    plt.plot(history.history['MSE'])
    plt.plot(history.history['val_MSE'])
    plt.title('%s MSE' % model_name)
    plt.ylabel('MSE')
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Validation'], loc='lower right')
    plt.savefig(plot_fname)
    # Plot loss learning curve
    plt.subplot(2, 1, 2, sharex=ax1)
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('%s loss' % model_name)
    plt.ylabel('Loss')
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Validation'], loc='upper right')
    plt.tight_layout()
    plt.savefig(plot_fname)
    plt.show()
    return True
main()
Found 2508 non-validated image filenames belonging to 3 classes.
157/157 [==============================] - 100s 640ms/step
Found 150 non-validated image filenames belonging to 3 classes.
10/10 [==============================] - 5s 494ms/step
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_188 (Conv2D)          (None, 8, 8, 16)          294928
_________________________________________________________________
conv2d_189 (Conv2D)          (None, 8, 8, 32)          4640
_________________________________________________________________
max_pooling2d_8 (MaxPooling2 (None, 4, 4, 32)          0
_________________________________________________________________
dropout (Dropout)            (None, 4, 4, 32)          0
_________________________________________________________________
conv2d_190 (Conv2D)          (None, 4, 4, 64)          18496
_________________________________________________________________
dropout_1 (Dropout)          (None, 4, 4, 64)          0
_________________________________________________________________
conv2d_191 (Conv2D)          (None, 4, 4, 128)         73856
_________________________________________________________________
dropout_2 (Dropout)          (None, 4, 4, 128)         0
_________________________________________________________________
conv2d_192 (Conv2D)          (None, 4, 4, 256)         295168
_________________________________________________________________
conv2d_193 (Conv2D)          (None, 4, 4, 256)         590080
_________________________________________________________________
max_pooling2d_9 (MaxPooling2 (None, 2, 2, 256)         0
_________________________________________________________________
dropout_3 (Dropout)          (None, 2, 2, 256)         0
_________________________________________________________________
flatten (Flatten)            (None, 1024)              0
_________________________________________________________________
dense (Dense)                (None, 512)               524800
_________________________________________________________________
dropout_4 (Dropout)          (None, 512)               0
_________________________________________________________________
dense_1 (Dense)              (None, 256)               131328
_________________________________________________________________
dropout_5 (Dropout)          (None, 256)               0
_________________________________________________________________
dense_2 (Dense)              (None, 3)                 771
=================================================================
Total params: 1,934,067
Trainable params: 1,934,067
Non-trainable params: 0
_________________________________________________________________
Epoch 1/8
Epoch 00001: saving model to /Users/victorialiu/git/creatica/tmp
126/126 - 7s - loss: 0.6362 - MSE: 0.1225 - val_loss: 1.3662 - val_MSE: 0.3387
Epoch 2/8
Epoch 00002: saving model to /Users/victorialiu/git/creatica/tmp
126/126 - 7s - loss: 0.5295 - MSE: 0.1032 - val_loss: 1.4071 - val_MSE: 0.3562
Epoch 3/8
Epoch 00003: saving model to /Users/victorialiu/git/creatica/tmp
126/126 - 7s - loss: 0.5986 - MSE: 0.1080 - val_loss: 1.0744 - val_MSE: 0.2654
Epoch 4/8
Epoch 00004: saving model to /Users/victorialiu/git/creatica/tmp
126/126 - 7s - loss: 0.4734 - MSE: 0.0935 - val_loss: 1.1686 - val_MSE: 0.2983
Epoch 5/8
Epoch 00005: saving model to /Users/victorialiu/git/creatica/tmp
126/126 - 8s - loss: 0.4455 - MSE: 0.0902 - val_loss: 1.3958 - val_MSE: 0.3653
Epoch 6/8
Epoch 00006: saving model to /Users/victorialiu/git/creatica/tmp
126/126 - 7s - loss: 0.4440 - MSE: 0.0911 - val_loss: 1.2252 - val_MSE: 0.3160
Epoch 7/8
Epoch 00007: saving model to /Users/victorialiu/git/creatica/tmp
126/126 - 7s - loss: 0.4698 - MSE: 0.0933 - val_loss: 1.3832 - val_MSE: 0.3559
Epoch 8/8
Epoch 00008: saving model to /Users/victorialiu/git/creatica/tmp
126/126 - 7s - loss: 0.4304 - MSE: 0.0886 - val_loss: 1.3510 - val_MSE: 0.3537
True
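Since the full model is saved under creatica/code/model, another module can load it back in. A minimal sketch, assuming the 'demo' model name used above and adjusting the path to wherever creatica/code/model lives on your machine:
import os
from keras.models import load_model

#load the full model (architecture + weights) that model.save() wrote above
reloaded = load_model(os.path.join(os.getcwd(), 'model', 'demo.h5'))
reloaded.summary()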
Authors: Victoria Liu and Gloria Liu
Last modified: November 2020
Description: A script to train and save a neural net to recognize hot dogs vs. carrots vs. bananas.
Credits: Parts of the code were originally part of a Caltech extra credit assignment (CS 156a), where Aadyot Bhatnagar wrote the parse_args() and main() functions. The conv-net code is heavily modified from J-Yash's open-source code. All markdown is by Victoria / Gloria.