
Convolutional Neural Network

What is a Convolutional Neural Network?

A Convolutional Neural Network (CNN) is a type of neural network whose architecture is particularly well suited to image processing. It consists of several different kinds of layers, one of which is called "convolutional", though the other layers are important as well. CNNs are used in tasks where image "understanding" matters, such as detecting and labeling objects in an image, or manipulating and transforming images. Practical examples include security cameras that automatically find a wanted person in a crowd, cameras on self-driving cars that maintain the car's awareness of its surroundings in real time, and medical image processing that identifies disease on X-ray images.

It is not easy to explain such a complicated topic as CNNs in one article, so I advise you to watch some introductory YouTube videos for a better understanding.

 

 

Let’s create a CNN!

Since CNNs are used for image processing, we need an image database (dataset). The most common image dataset is the so-called "MNIST dataset", a collection of hand-written digits from 0 to 9. There are 60000 images for network training and 10000 images for testing; the images are grayscale and 28×28 pixels each. Each image also has a label that represents its value in plain text, which is needed for classification, training, and testing of the CNN.

The MNIST dataset is already included in many ML libraries, so in our example you don’t have to download it separately: if you use the "keras" ML library, the dataset is already available on your PC and you just have to load it in your code.
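
For instance, here is a quick sketch (not part of the main script below) that loads the dataset and prints its dimensions, confirming the numbers mentioned above:

from keras.datasets import mnist

# load the bundled MNIST data and inspect its shape
(X_train, y_train), (X_test, y_test) = mnist.load_data()
print(X_train.shape)  # (60000, 28, 28) -- 60000 training images, 28x28 pixels each
print(X_test.shape)   # (10000, 28, 28) -- 10000 test images
print(y_train[:10])   # integer labels (0..9) of the first ten training images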

Show me the code!

Here is the full code of our Convolutional Neural Network, which is trained on 60000 hand-written digits and is able to correctly classify new hand-written digits with close to 99% accuracy. (The code below uses only 5 epochs for faster training; even with only 5 epochs it achieves almost 99% accuracy.)


from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.optimizers import Adam
from keras.utils import to_categorical

# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Reshaping to format which CNN expects (batch, height, width, channels)
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], X_train.shape[2], 1).astype('float32')
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], X_test.shape[2], 1).astype('float32')

# normalize inputs from 0-255 to 0-1
X_train /= 255
X_test /= 255

# one hot encode
number_of_classes = 10
y_train = to_categorical(y_train, number_of_classes)
y_test = to_categorical(y_test, number_of_classes)

# create model
model = Sequential()
model.add(Conv2D(32, (5, 5), input_shape=(X_train.shape[1], X_train.shape[2], 1), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(number_of_classes, activation='softmax'))

# Compile model
model.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=['accuracy'])

# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=5, batch_size=500)

# Final evaluation of the model
metrics = model.evaluate(X_test, y_test, verbose=0)

print("Metrics(Test loss & Test Accuracy): ")
print(metrics)


So let’s break the code into small pieces and understand each one separately.

The block below loads into our Python script all the libraries that actually perform the work; within our code we just operate with the high-level API provided by the libraries we load here:

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.optimizers import Adam
from keras.utils import to_categorical

——————-

The line below loads the MNIST images into our code for further processing. X represents the images themselves, y represents the label of each image. As you can probably guess, (X_train, y_train) is for CNN training, while (X_test, y_test) is for testing, i.e. during "testing" we feed new images into the trained network and evaluate how well it recognizes them; the recognized digits are compared with the existing labels of the test dataset so we can measure the correctness of recognition on data the network has never seen.

(X_train, y_train), (X_test, y_test) = mnist.load_data()

——————-

The two lines below reshape the images into the dimensions that are used for processing later. The input shape that a CNN expects is a 4D array (batch, height, width, channels). Channels signify whether the image is grayscale or colored: in our case we are using grayscale images, so we pass 1 for channels; for colored images we would pass 3 (RGB). Below is the code for reshaping our inputs.

X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], X_train.shape[2], 1).astype('float32')
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], X_test.shape[2], 1).astype('float32')
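
As an illustrative check (not part of the original script), you can print the shape again to see the extra channel axis:

print(X_train.shape)  # (60000, 28, 28, 1) -- a channel axis of size 1 was added
print(X_test.shape)   # (10000, 28, 28, 1)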

——————-

The two code lines below normalize the pixel values. The original grayscale encoding uses numbers from 0 to 255 to represent "256 shades of gray", from pure black (0) to pure white (255). By dividing by 255, we rescale the whole range of numbers to 0…1 while still keeping all the shades, just on a 0…1 scale instead of a 0…255 scale. Keeping inputs in a small, consistent range generally helps gradient-based training converge faster and more reliably. Strictly speaking this step is optional and can be skipped, but it is highly recommended.

X_train /= 255
X_test /= 255
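
A quick sanity check (again, just an illustration) shows that the pixel values now lie in the 0…1 range:

print(X_train.min(), X_train.max())  # 0.0 1.0 -- all shades preserved, just rescaled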

——————-

The three code lines below serve two purposes: 1) to define the number of output classes, i.e. 10 in our example, representing the digits from 0 to 9, and 2) to convert those classes into so-called "one hot encoding".

number_of_classes = 10
y_train = to_categorical(y_train, number_of_classes)
y_test = to_categorical(y_test, number_of_classes)


“One hot encoding” is a technique used in machine learning to represent output labels not as "words" or "decimal digits", but as vectors of 0s and 1s; note that one hot encoding has nothing to do with "normal" binary format. You can imagine the one hot encoding of an output label as a row of 0s, where the number of positions equals the number of labels, and the position that corresponds to the given label is marked with a "1" instead of a "0", like the following:

label   one-hot encoding
0       1 0 0 0 0 0 0 0 0 0
1       0 1 0 0 0 0 0 0 0 0
2       0 0 1 0 0 0 0 0 0 0
3       0 0 0 1 0 0 0 0 0 0
4       0 0 0 0 1 0 0 0 0 0
5       0 0 0 0 0 1 0 0 0 0
6       0 0 0 0 0 0 1 0 0 0
7       0 0 0 0 0 0 0 1 0 0
8       0 0 0 0 0 0 0 0 1 0
9       0 0 0 0 0 0 0 0 0 1
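
You can reproduce this table directly with the same to_categorical helper used in our script; this small sketch encodes a few example labels:

from keras.utils import to_categorical

# one-hot encode the labels 0, 3 and 9 into vectors of length 10
print(to_categorical([0, 3, 9], 10))
# [[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
#  [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
#  [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]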

——————-

The nine lines of code below are the CNN architecture itself. Exactly this code defines what our network looks like and what kind of manipulations we perform on images inside our CNN.

model = Sequential()
model.add(Conv2D(32, (5, 5), input_shape=(X_train.shape[1], X_train.shape[2], 1), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(number_of_classes, activation='softmax'))


I will NOT explain the network architecture in this article; instead I will provide a detailed breakdown of the architecture in a separate article, just to keep the article size within reasonable limits, because otherwise a single article might grow so huge that most people would not read it.
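
That said, even without a full explanation you can inspect the architecture yourself; Keras can print the layer stack, output shapes, and parameter counts with a single call:

# print each layer, its output shape and the number of trainable parameters
model.summary()
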
——————-

The one line of code below sets the properties of the learning process. We set metrics=['accuracy'], which means the network reports the 'accuracy' metric during training (note that the optimizer actually minimizes the loss; metrics are only reported for monitoring). We set optimizer=Adam(), one of the most commonly used optimizers. We also set loss='categorical_crossentropy', which tells Keras that there are several output labels (not just 0/1 labels as in the "binary" case) and uses a "cross-entropy" loss function, which is essentially a logarithmic loss.

model.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=['accuracy'])
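
To give some intuition for what categorical cross-entropy actually computes, here is a minimal sketch for a single sample, assuming a one-hot true label and a hypothetical softmax output (the numbers are made up for illustration):

import numpy as np

y_true = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])  # one-hot label for the digit 2
y_pred = np.array([0.01, 0.02, 0.90, 0.01, 0.01,
                   0.01, 0.01, 0.01, 0.01, 0.01])  # hypothetical softmax output
loss = -np.sum(y_true * np.log(y_pred))
print(loss)  # ~0.105 -- the loss shrinks as the probability of the true class grows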

——————-

This one simple line of code starts the training of our CNN. In this command we pass the input data for training (X_train), the training labels (y_train, i.e. the one-hot encoded 0…9 digits for each picture), and the validation dataset ((X_test, y_test)) that is used to measure the accuracy of the network during training. We also tell the CNN to iterate over the full training set five times (epochs=5), while batch_size=500 defines how many images we pass through the network in one step.

model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=5, batch_size=500)
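
As a side note, model.fit returns a History object whose .history dictionary records loss and accuracy per epoch, which is handy for plotting learning curves. A sketch, using the same call as above (in older Keras versions the keys are 'acc' and 'val_acc' instead):

history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=5, batch_size=500)
print(history.history['accuracy'])      # training accuracy after each epoch
print(history.history['val_accuracy'])  # validation accuracy after each epoch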

——————-

The code below evaluates the trained network on the test dataset, i.e. shows us how good the network is on data it has never seen; model.evaluate returns the test loss and test accuracy.

metrics = model.evaluate(X_test, y_test, verbose=0)
print("Metrics(Test loss & Test Accuracy): ")
print(metrics)
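
Once trained, the network can also classify individual images via model.predict; here is a short sketch (not part of the original script) that classifies the first test image:

import numpy as np

probs = model.predict(X_test[:1])  # shape (1, 10): one probability per digit class
print(np.argmax(probs[0]))         # predicted digit
print(np.argmax(y_test[0]))        # true digit (y_test is one-hot encoded here)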

——————-

That’s it: you have a fully working CNN that recognizes hand-written digits 0…9, and it is able to do so with ~99% accuracy with the current settings.
