Using TensorFlow2 and Keras to perform Binary Classification (Cats vs Dogs)
The "Hello World" program of Deep learning is the classification of the Cat and Dog and in this article we would be going through each and every step of successfully creating a Binary Classifier. So, let's get started!
First of all we need a dataset to perform the classification and for that purpose we would go to Kaggle and search for one. The dataset which we are going to use can be found at: https://www.kaggle.com/chetankv/dogs-cats-images
After downloading the dataset and extracting the contents of the zip file, we will create a Python file (.py) and start with the coding part.
We will follow a three-phase approach to complete the coding part: Exploration, Training and Testing. In the Exploration phase we will go through the data we have downloaded and make any changes needed. After that we will move on to the Training phase, where we will train our model with the help of Keras. Finally, in the Testing phase we will test our model against unseen images and check how accurately it classifies dogs and cats.
Exploring The Data
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.layers import Activation, Dropout, Flatten, Dense
from tensorflow.keras import backend as K
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing import image
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.image import imread
import os
import math
from sklearn.metrics import classification_report, confusion_matrix
After importing these libraries, we will specify the path to the data directory, as well as the paths to the test data and training data.
data_dir = 'PATH TO DATA'
os.listdir(data_dir)
test_p = data_dir+'/test_set/'
train_p = data_dir+'/training_set/'
We need to make sure that all the images have the same dimensions. For that, we will first initialise two empty lists where we will store the dimensions of each image, and then check whether all the dimensions are the same.
dimension1 = []
dimension2 = []
for i_file in os.listdir(test_p+'cats'):
    img = imread(test_p+'cats/'+i_file)
    x, y, colors = img.shape
    dimension1.append(x)
    dimension2.append(y)
To get a single input size for all the images, we take the mean of the collected dimensions with np.mean() and store the result in the image_shape variable, which every image will later be resized to. One thing you should note is the last value of image_shape, which we have set to 3: it means we are working with coloured images made of RGB channels. If we were working with black-and-white images, we would have used 1 instead.
print(np.mean(dimension1))
print(np.mean(dimension2))
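The paragraph above says we should check whether all the dimensions are the same, but the snippet only prints their means. A minimal sketch of such a check, using a hypothetical helper and made-up dimension lists, could look like:

```python
# Hypothetical helper: True only when every stored height and width agree
def check_dimensions(heights, widths):
    return len(set(heights)) == 1 and len(set(widths)) == 1

print(check_dimensions([375, 375, 375], [500, 500, 500]))  # True
print(check_dimensions([375, 280, 375], [500, 500, 500]))  # False
```

In practice the cat and dog photos come in many different sizes, so this check fails and we resize everything to the mean dimensions instead.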
image_shape = (math.ceil(np.mean(dimension1)), math.ceil(np.mean(dimension2)), 3)
We also want our final model to be robust enough to deal with images it hasn't seen before, and for that purpose we will use data augmentation, which includes resizing, rotating and scaling our images. For this we will use ImageDataGenerator, which allows us to generate more data automatically without having to grab more data from different sources.
image_gen = ImageDataGenerator(rotation_range=10, width_shift_range=0.10, height_shift_range=0.10, rescale=1/255)
print(image_gen.flow_from_directory(train_p))
print(image_gen.flow_from_directory(test_p))
The first parameter we have defined is rotation_range, which rotates the images up to a certain limit. We also used width_shift_range, which shifts the image horizontally by a specified fraction of its width, and height_shift_range, which does the same vertically. Finally, the rescale argument rescales pixel values from the 0-255 range down to values between 0 and 1. There are other options too, such as zoom_range, which zooms into the image, and shear_range, which applies a shearing transformation to the image.
We can have a look at the effect by calling random_transform() on image_gen. For the next step, since we already have all the images in separate folders representing each class, we can go ahead with flow_from_directory(), which is responsible for generating batches of the augmented data.
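As a quick sketch of random_transform(), using a random array in place of a real photo (the method expects a single 3D image tensor):

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

image_gen = ImageDataGenerator(rotation_range=10, width_shift_range=0.10,
                               height_shift_range=0.10, rescale=1/255)

# A random 100x100 RGB array stands in for a real cat or dog photo
dummy_img = np.random.rand(100, 100, 3)
augmented = image_gen.random_transform(dummy_img)
print(augmented.shape)  # the transform preserves the input shape
```

Each call applies a fresh random rotation and shift, which is how the generator produces a different-looking image every epoch.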
Training Phase
Finally, we can now define and train our model. To start with this, we will have to define the type of model and in this case we are going to use the Sequential model from Keras which is just a list of layers we define.
model = Sequential()
After specifying the model, we will start adding the layers. First of all we will add a Conv2D layer, where we specify four main parameters:
- filters: The number of filters is usually chosen based on the complexity of the task you are performing. It is generally a power of 2, i.e. 32, 64, 128 etc. For now we will go with 32.
- kernel_size: This also depends on the type of data you are working with. A typical recommendation is to start with a small kernel such as (3,3), which is what we use here.
- input_shape: This determines the shape of the input image and we will assign the image_shape variable which we had defined earlier.
- activation: We need to specify the activation function we are going to use, and for this purpose we will use 'relu', the Rectified Linear Unit.
Next layer would be MaxPool2D() where we have only one parameter to define which is pool size.
model.add(Conv2D(filters=32, kernel_size=(3,3),input_shape=image_shape, activation='relu',))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(filters=64, kernel_size=(3,3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(filters=64, kernel_size=(3,3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
After this series of Conv2D and MaxPooling2D layers (note that input_shape is only needed on the first layer), we have to flatten the feature maps into a single array of data points and add a Dense layer of 128 neurons with the 'relu' activation function.
model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
To prevent overfitting we will make use of a Dropout layer, which randomly turns off half of the neurons during training, and after that add a final Dense layer with 1 neuron and a sigmoid activation function, since we have a single output.
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))
Now we are ready to compile the model, choosing 'binary_crossentropy' as the loss and 'adam' as our optimiser. We also need to make sure that our model doesn't overfit during the iterative process of training, and for that purpose we will use EarlyStopping, defined in the variable early_stop.
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
early_stop = EarlyStopping(monitor='val_loss', patience=2)
After this we will define the batch_size, which in our case is 16, and then create two generators, train_image_gen and test_image_gen, using the flow_from_directory method. We can now advance to the final step, calling model.fit, which will train our model; we then save the model so we can make predictions afterwards.
batch_size = 16
train_image_gen = image_gen.flow_from_directory(train_p, target_size=image_shape[:2], color_mode='rgb', batch_size=batch_size, class_mode='binary')
test_image_gen = image_gen.flow_from_directory(test_p, target_size=image_shape[:2], color_mode='rgb', batch_size=batch_size, class_mode='binary',shuffle=False)
train_image_gen.class_indices
results = model.fit(train_image_gen, epochs=20, validation_data=test_image_gen, callbacks=[early_stop])
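Since matplotlib is already imported, we can also plot the loss curves recorded during training. The sketch below uses hypothetical per-epoch numbers in place of the real results.history dict returned by the fit call:

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen so this also runs without a display
import matplotlib.pyplot as plt

# Hypothetical values standing in for results.history from the fit call
history = {'loss': [0.69, 0.60, 0.52, 0.47],
           'val_loss': [0.68, 0.62, 0.58, 0.57]}
plt.plot(history['loss'], label='training loss')
plt.plot(history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.savefig('loss_curves.png')
```

If the validation loss starts rising while the training loss keeps falling, the model is overfitting, which is exactly the situation EarlyStopping guards against.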
model.save('catdog.h5')
Testing Phase
In order to evaluate the performance of our model, we can reload the saved model with load_model if we are working in a different file.
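The save-and-reload round trip is sketched below with a tiny stand-in model rather than the full network (the article's own file is 'catdog.h5'):

```python
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense

# A tiny stand-in model; the real article saves the trained CNN as 'catdog.h5'
tiny = Sequential([Dense(1, input_shape=(4,), activation='sigmoid')])
tiny.save('tiny_demo.h5')

restored = load_model('tiny_demo.h5')
print(len(restored.layers))  # 1
```

load_model restores the architecture, the trained weights and the compile settings, so the reloaded model can predict immediately.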
pred_probabilities = model.predict(test_image_gen)
predictions = pred_probabilities > 0.5
print(classification_report(test_image_gen.classes,predictions))
print(confusion_matrix(test_image_gen.classes,predictions))
Here pred_probabilities holds the model's output for each test image, which is just a probability between 0 and 1, so we threshold it at 0.5 to obtain class labels. We don't have to worry about scoring these by hand, because sklearn provides classification_report and confusion_matrix, which give us a detailed report of the model's performance.
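To classify a single new image rather than a whole directory, we can reuse the image module imported at the top. prepare_image below is a hypothetical helper, and 'some_dog.jpg' a hypothetical filename:

```python
import numpy as np
from tensorflow.keras.preprocessing import image

# Hypothetical helper: load one image, resize it to the model's input size,
# scale pixels to [0, 1] and add a batch dimension
def prepare_image(path, target_size):
    img = image.load_img(path, target_size=target_size)
    arr = image.img_to_array(img) / 255.0
    return np.expand_dims(arr, axis=0)

# batch = prepare_image('some_dog.jpg', image_shape[:2])
# model.predict(batch) returns a probability; a value above 0.5 means 'dogs',
# because flow_from_directory assigns labels alphabetically (cats=0, dogs=1)
```

The class_indices mapping printed earlier confirms which label each index corresponds to.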
I hope you had a good time understanding all the things!