Introduction to Computer Vision with Deep Learning
MNIST Data: A real-life test
Sooooooo!!! We all see the MNIST dataset as the "Hello World" of data science, but I wanted to take it a step further by creating a small tutorial for beginners to see tangible results from their model.
The above video is accompanied by a small breakdown of the code. If you want to download the code, it is right here (https://github.com/anmolchawla/MNSIT-Webcam-Tutorial). If you just want to play with and test the code for yourself, it is hosted on a free cloud server by Google (https://colab.research.google.com/drive/1KTYo80VpJ6VCwa558NZxbdP1UIkv3OGV); just click the link and it will open in your browser for you to play around with.
Let’s get cracking.
Step 1: Import the modules needed
import tensorflow as tf
import keras
from tensorflow.keras.callbacks import TensorBoard
import time
import numpy as np
import matplotlib.pyplot as plt
import cv2
TensorFlow: TensorFlow™ is an open source software library for high-performance numerical computation. Its flexible architecture allows easy deployment of computation across a variety of platforms (CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile and edge devices. (Think of it as the brains behind all the complex computation.)
Keras: Keras is an open source neural network library written in Python. It is capable of running on top of TensorFlow, Microsoft Cognitive Toolkit, or Theano. (Think of it as a library which saves you from writing repetitive code and makes your life easy by making the code simpler and readable)
Numpy: NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
Matplotlib: Matplotlib allows you to view images and plots of your data (it lets you put a visual element to all those numbers), and cv2 (OpenCV) allows you to read an image as an array.
mnist = tf.keras.datasets.mnist  # 28x28 images of handwritten digits 0-9
(x_train, y_train), (x_test, y_test) = mnist.load_data()
We just loaded the data; MNIST is one of the many datasets available for us to play around with.
We split it up into train and test. Train: the data that the model uses to learn. Test: the data used to check the model's performance on examples it has never seen.
print("Training Data Shape is {}".format(x_train.shape))
print("Training Labels Shape is {}".format(y_train.shape))
print("Testing Data Shape is {}".format(x_test.shape))
print("Testing Labels Shape is {}".format(y_test.shape))
So all the images are in a 28x28 format (28 rows and 28 columns), and each data point has a label associated with it (anything from 0–9). We call x the feature set (i.e. the data) and y the target (i.e. the label associated with it).
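If you want to sanity check this for yourself, here is a small sketch (run it after loading the data above; the exact label depends on which sample you pick):
print(y_train[0])                           # the target label for the first training image
print(x_train[0].shape)                     # each image is a 28x28 array
print(x_train[0].min(), x_train[0].max())   # raw pixel values lie in the 0-255 range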
for i in range(0, 20):
    plt.imshow(x_train[i], cmap=plt.cm.binary)
    plt.show()
Just plot a few data points to see what we are actually dealing with.
x_train = tf.keras.utils.normalize(x_train, axis = 1)
x_test = tf.keras.utils.normalize(x_test,axis = 1)
Here we normalize the data. Why do we normalize, you ask? Good question; here is an article which goes into depth (https://www.quora.com/Why-do-we-normalize-the-data). For the lazy people out there: normalizing basically means taking something which could range from, say, 0–1000 and putting it in the range 0–1. For example, assume your input dataset contains one column with values ranging from 0 to 1, and another column with values ranging from 10,000 to 100,000. The great difference in the scale of the numbers could cause problems when you attempt to combine the values as features during modelling.
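One note here: tf.keras.utils.normalize actually performs an L2 normalisation along the given axis. A simpler, very common alternative for image data (just a sketch, if you wanted to swap it in for the two lines above, starting from the raw 0–255 pixel values) is to divide by 255 so everything lands in the 0–1 range:
# Alternative scaling: map raw 0-255 pixel values into the 0-1 range
x_train_scaled = x_train / 255.0
x_test_scaled = x_test / 255.0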
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Flatten())                            # 28x28 image -> 784-value vector
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(10, activation=tf.nn.softmax))  # softmax for a probability distribution over the 10 digits
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
tensorboard = TensorBoard(log_dir="logs/{}".format(time.time()))  # the TensorBoard callback the fit() call below expects
model.fit(x_train, y_train, epochs=3, callbacks=[tensorboard])
predictions = model.predict(x_test)
So this is where the heart of the system comes in.
Sequential: Basically a series of layers, one after the other, in the order specified.
Flatten: Taking an N-dimensional array and converting it into a single long continuous 1-D array.
Dense Layer: A linear operation in which every input is connected to every output by a weight (so there are n_inputs * n_outputs weights — which can be a lot!).
Activation function: So what does an artificial neuron do? Simply put, it calculates a “weighted sum” of its inputs, adds a bias, and then decides whether it should be “fired” or not (yes, the activation function does this; selecting the right one is the challenge). See the small numerical sketch after these notes.
Compile: Basically binds the model together and sets some parameters, like the optimizer, the loss, and the metrics.
Fit: Basically introduces the data to the model; epochs decides how many times it gets introduced.
Prediction: So now that our model is trained, we need to show it some new data to see if it is able to predict the appropriate target.
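Here is the small numerical sketch promised above: a made-up, illustrative example of the weighted sum plus activation idea, and of softmax turning scores into a probability distribution (these numbers are not from the model; they are just for intuition):
# One artificial neuron: weighted sum of the inputs, plus a bias, passed through ReLU
inputs = np.array([0.5, 0.2, 0.1])
weights = np.array([0.4, 0.7, 0.2])
bias = 0.1
z = np.dot(inputs, weights) + bias   # the weighted sum plus bias
relu_out = max(0.0, z)               # ReLU "fires" only when the sum is positive
print(relu_out)

# Softmax turns a vector of raw scores into probabilities that sum to 1
scores = np.array([2.0, 1.0, 0.1])
softmax = np.exp(scores) / np.sum(np.exp(scores))
print(softmax, softmax.sum())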
I promise you this is the bare-bones version and explanation of this Deep Learning application. Choosing the number of layers and the other parameter settings is where the actual skill comes into play. There are a number of approaches to getting them right (trial and error, reading relevant research papers and, of course, having a strong conceptual understanding of the basics).
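Before moving on to your own images, it is worth a quick sanity check of the trained model on the held-out test set. A minimal sketch, assuming the model, test data, and predictions from the steps above:
# Evaluate the trained model on data it has never seen
val_loss, val_acc = model.evaluate(x_test, y_test)
print("Test loss:", val_loss, "Test accuracy:", val_acc)

# Compare the model's guess for the first test image against its true label
print("Predicted:", np.argmax(predictions[0]), "Actual:", y_test[0])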
from PIL import Image
user_test = filename                 # filename is the path to the image you saved in this directory
col = Image.open(user_test)          # open the colour image
gray = col.convert('L')              # convert it to grayscale
bw = gray.point(lambda x: 0 if x < 100 else 255, '1')  # threshold to pure black and white
bw.save("bw_image.jpg")
bw                                   # display the result (works in a notebook cell)
img_array = cv2.imread("bw_image.jpg", cv2.IMREAD_GRAYSCALE)  # read the image back as a grayscale array
img_array = cv2.bitwise_not(img_array)                        # invert it: MNIST digits are white on a black background
print(img_array.size)
plt.imshow(img_array, cmap=plt.cm.binary)
plt.show()
img_size = 28
new_array = cv2.resize(img_array, (img_size, img_size))       # resize to the 28x28 format the model expects
plt.imshow(new_array, cmap=plt.cm.binary)
plt.show()
user_test = tf.keras.utils.normalize(new_array, axis=1)   # normalize the same way as the training data
predicted = model.predict(np.array([user_test]))           # add a batch dimension: shape (1, 28, 28)
for i in range(0, 10):
    b = predicted[0][i]
    print("Probability Distribution for", i, b)
print("The Predicted Value is", np.argmax(predicted[0]))
The above code comes into play after you have taken an image (from your webcam or your phone) and stored it in the same directory.
It takes the image and converts it into black and white, since the digits have nothing to do with colour in our use case. Then we read the image back so that we get it in an array format, resize it into the 28x28 format, and flatten it. Now that we have a neat and tidy input, we ask the model to predict on it.
The output of the model, in this case, will always be a probability distribution, so the probability is spread over all the possible classes. The model says something like this: “Hey dude, I think what you just showed me was a 1 with 80 per cent probability, a 5 with 0.01 per cent probability, and so on and so forth, until the cumulative probabilities add up to 1.”
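You can check this yourself; a quick sketch using the predicted array from above:
# The ten class probabilities should add up to (approximately) 1
print(np.sum(predicted[0]))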
So, there you are. Congratulations on making it to the end of the article.
Please feel free to leave comments here and on the YouTube video.
Let’s get in touch
Email: anmol.chawlatrojan@gmail.com
Adios! Until next time.