Theodore Murphy

Hands-On Deep Learning For Images With TensorFl...

In this codelab, you will learn how to build and train a neural network that recognises handwritten digits. Along the way, as you enhance your neural network to achieve 99% accuracy, you will also discover the tools of the trade that deep learning professionals use to train their models efficiently.


The neural network we will build classifies handwritten digits into 10 classes (0, ..., 9). It does so based on internal parameters that need to have correct values for the classification to work well. These "correct values" are learned through a training process which requires a "labeled dataset" of images and their associated correct answers.

Data is stored in matrices. A 28x28 pixel grayscale image fits into a 28x28 two-dimensional matrix. But for a color image, we need more dimensions. There are 3 color values per pixel (Red, Green, Blue), so a three-dimensional table will be needed with dimensions [28, 28, 3]. And to store a batch of 128 color images, a four-dimensional table is needed with dimensions [128, 28, 28, 3].
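To make the shapes above concrete, here is a minimal sketch in plain Python that counts the dimensions and the number of stored values for each case. The helper name `describe` and the dictionary keys are illustrative only; the shapes themselves come from the paragraph above.

```python
import math

# Shapes taken from the paragraph above; the keys are illustrative labels.
shapes = {
    "grayscale image": (28, 28),
    "color image": (28, 28, 3),
    "batch of 128 color images": (128, 28, 28, 3),
}


def describe(shape):
    """Return (number of dimensions, total number of stored values)."""
    return len(shape), math.prod(shape)


for name, shape in shapes.items():
    rank, size = describe(shape)
    print(f"{name}: {rank} dimensions, {size} values")
```

Each extra dimension multiplies the storage: one color image holds 2,352 values, and a batch of 128 of them holds 301,056.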

If all the terms in bold in the next paragraph are already known to you, you can move to the next exercise. If you are just starting in deep learning then welcome, and please read on.

A neural network classifier is made of several layers of neurons. For image classification these can be dense or, more frequently, convolutional layers. They are typically activated with the relu activation function. The last layer uses as many neurons as there are classes and is activated with softmax. For classification, cross-entropy is the most commonly used loss function, comparing the one-hot encoded labels (i.e. correct answers) with the probabilities predicted by the neural network. To minimize the loss, it is best to choose an optimizer with momentum, for example Adam, and to train on batches of training images and labels.
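As a rough illustration of how softmax, one-hot encoding and cross-entropy fit together, here is a plain-Python sketch for a single example. The logits are made-up numbers and the function names are chosen for this sketch; in practice a framework would compute all of this on whole batches.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(label, num_classes):
    """Encode a class index as a vector with a single 1."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def cross_entropy(target, probs, eps=1e-12):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))


# Hypothetical last-layer outputs for one image, one value per digit class.
logits = [0.1, 3.0, -1.0, 0.0, 0.2, -0.5, 0.3, 0.0, -2.0, 0.4]
probs = softmax(logits)
predicted = probs.index(max(probs))

loss_if_label_is_1 = cross_entropy(one_hot(1, 10), probs)
loss_if_label_is_8 = cross_entropy(one_hot(8, 10), probs)
print(predicted, loss_if_label_is_1, loss_if_label_is_8)
```

The loss is small when the network puts most of its probability on the correct class and grows quickly when it does not, which is what the optimizer then tries to reduce.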

batch or mini-batch: training is always performed on batches of training data and labels. Doing so helps the algorithm converge. The "batch" dimension is typically the first dimension of data tensors. For example a tensor of shape [100, 192, 192, 3] contains 100 images of 192x192 pixels with three values per pixel (RGB).

.cache caches the dataset in RAM. This is a tiny dataset so it will work. .shuffle shuffles it with a buffer of 5000 elements. It is important that training data are well shuffled. .repeat loops the dataset. We will be training on it multiple times (multiple epochs). .batch pulls multiple images and labels together into a mini-batch. Finally, .prefetch can use the CPU to prepare the next batch while the current batch is being trained on the GPU.
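To show what the shuffle, repeat and batch stages do to the stream of examples, here is a plain-Python stand-in built on generators. It is not the tf.data API: the function names are made up for this sketch, the in-memory list plays the role of the cached dataset, and prefetching (which only overlaps work in time) is left out.

```python
import itertools
import random


def shuffle_buffer(items, buffer_size, rng):
    """Yield every item once, in an order randomized through a fixed-size buffer."""
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # fill the buffer first
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]        # emit a random buffered item...
        buffer[idx] = item       # ...and put the new item in its slot
    rng.shuffle(buffer)          # drain what is left at the end of the pass
    yield from buffer


def make_batches(items, batch_size):
    """Group a stream of items into lists of batch_size elements."""
    iterator = iter(items)
    while True:
        batch = list(itertools.islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def pipeline(cached, buffer_size, batch_size, seed=0):
    """cache -> shuffle -> repeat -> batch, as an endless stream of batches."""
    rng = random.Random(seed)

    def epochs():
        while True:  # repeat: loop over the data forever, reshuffling each pass
            yield from shuffle_buffer(cached, buffer_size, rng)

    return make_batches(epochs(), batch_size)


# Ten fake examples stand in for the dataset held in RAM.
for batch in itertools.islice(pipeline(list(range(10)), buffer_size=5, batch_size=4), 5):
    print(batch)
```

Note that a buffer smaller than the dataset only mixes nearby elements, which is why a buffer of 5000 is comfortable for a small dataset but a tiny buffer would shuffle it poorly.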

It turns out that deep neural networks with many layers (20, 50, even 100 today) can work really well, provided a couple of mathematical dirty tricks are used to make them converge. The discovery of these simple tricks is one of the reasons for the renaissance of deep learning in the 2010s.

Increase the learning rate from its default value of 0.001 to 0.01. For that, you will have to replace the 'adam' predefined optimizer with an actual instance of an Adam optimizer so that you have access to its configuration parameters:
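One way to write this, assuming the Keras API that ships with TensorFlow 2 (`tf.keras`). The single-layer model, the loss and the metric below are placeholders so that the snippet runs on its own; only the optimizer line is the point.

```python
import tensorflow as tf

# Placeholder model (an assumption for this sketch), only here so that
# compile() has something to configure.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28 * 28,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# An explicit optimizer instance exposes the learning rate (default 0.001).
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)

model.compile(
    optimizer=optimizer,  # previously: optimizer="adam"
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
```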

Dropout is one of the oldest regularization techniques in deep learning. At each training iteration, it drops random neurons from the network with a probability p (typically 25% to 50%). In practice, neuron outputs are set to 0. The net result is that these neurons will not participate in the loss computation this time around and they will not get weight updates. Different neurons will be dropped at each training iteration.
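The mechanism can be sketched in a few lines of plain Python. One detail goes beyond the description above: the surviving outputs are scaled by 1/(1-p), the "inverted dropout" convention that Keras's Dropout layer also follows, so that the expected sum of activations stays the same. The function name and the sample values are made up for this sketch.

```python
import random


def dropout(activations, p, rng, training=True):
    """Zero each activation with probability p and rescale the survivors."""
    if not training or p == 0.0:
        return list(activations)  # dropout is switched off at evaluation time
    keep = 1.0 - p
    # Addition not in the text above: scale survivors by 1/keep (inverted dropout).
    return [a / keep if rng.random() >= p else 0.0 for a in activations]


rng = random.Random(0)
layer_output = [0.2, 1.5, 0.7, 0.0, 2.1, 0.9]  # made-up neuron outputs

print(dropout(layer_output, p=0.5, rng=rng))   # one training iteration
print(dropout(layer_output, p=0.5, rng=rng))   # a different random mask next time
print(dropout(layer_output, p=0.5, rng=rng, training=False))  # unchanged
```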

TensorFlow is a free and open-source software library for machine learning and artificial intelligence. It can be used across a range of tasks but has a particular focus on training and inference of deep neural networks.[4][5]

Starting in 2011, Google Brain built DistBelief as a proprietary machine learning system based on deep learning neural networks. Its use grew rapidly across diverse Alphabet companies in both research and commercial applications.[12][13] Google assigned multiple computer scientists, including Jeff Dean, to simplify and refactor the codebase of DistBelief into a faster, more robust application-grade library, which became TensorFlow.[14] In 2009, the team, led by Geoffrey Hinton, had implemented generalized backpropagation and other improvements which allowed generation of neural networks with substantially higher accuracy, for instance a 25% reduction in errors in speech recognition.[15]

In order to assess the performance of machine learning models, TensorFlow gives API access to commonly used metrics. Examples include various accuracy metrics (binary, categorical, sparse categorical) along with other metrics such as Precision, Recall, and Intersection-over-Union (IoU).[37]
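For readers who have not met these metrics, here is what they compute for a binary problem, written in plain Python rather than through TensorFlow's metrics API. The labels and predictions are invented, and the function name is chosen for this sketch.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and IoU for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "iou": tp / (tp + fp + fn) if tp + fp + fn else 0.0,
    }


# Invented ground truth and predictions for six examples.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 1, 1, 0, 0, 1]
print(binary_metrics(y_true, y_pred))
```

Here three positives are found, one is missed and one negative is flagged by mistake, so precision and recall are both 0.75 and IoU is 0.6.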

Google JAX is a machine learning framework for transforming numerical functions.[69][70][71] It is described as bringing together a modified version of autograd (automatic obtaining of the gradient function through differentiation of a function) and TensorFlow's XLA (Accelerated Linear Algebra). It is designed to follow the structure and workflow of NumPy as closely as possible and works with TensorFlow as well as other frameworks such as PyTorch. The primary functions of JAX are:[69]

This guide trains a neural network model to classify images of clothing, like sneakers and shirts. It's okay if you don't understand all the details; this is a fast-paced overview of a complete TensorFlow program with the details explained as you go.

Created by the Google Brain team and initially released to the public in 2015, TensorFlow is an open source library for numerical computation and large-scale machine learning. TensorFlow bundles together a slew of machine learning and deep learning models and algorithms (aka neural networks) and makes them useful by way of common programmatic metaphors. It uses Python or JavaScript to provide a convenient front-end API for building applications, while executing those applications in high-performance C++.

TensorFlow, which competes with frameworks such as PyTorch and Apache MXNet, can train and run deep neural networks for handwritten digit classification, image recognition, word embeddings, recurrent neural networks, sequence-to-sequence models for machine translation, natural language processing, and PDE (partial differential equation)-based simulations. Best of all, TensorFlow supports production prediction at scale, with the same models used for training.

Python is the most popular language for working with TensorFlow and machine learning generally. But JavaScript is now also a first-class language for TensorFlow, and one of JavaScript's massive advantages is that it runs anywhere there's a web browser.

The single biggest benefit TensorFlow provides for machine learning development is abstraction. Instead of dealing with the nitty-gritty details of implementing algorithms, or figuring out proper ways to hitch the output of one function to the input of another, the developer can focus on the overall application logic. TensorFlow takes care of the details behind the scenes.

The TensorBoard visualization suite lets you inspect and profile the way graphs run by way of an interactive, web-based dashboard. A service hosted by Google lets you host and share machine learning experiments written in TensorFlow. It's free to use, with storage for up to 100M scalars, 1GB of tensor data, and 1GB of binary object data. (Note that any data hosted there is public, so don't use it for sensitive projects.)

TensorFlow competes with a slew of other machine learning frameworks. PyTorch, CNTK, and MXNet are three major frameworks that address many of the same needs. Let's close with a quick look at where they stand out and come up short against TensorFlow:

Avro2TF is just one of the tools the company has donated to the community as part of its internal deep learning initiatives that have been applied to its recommendation and search artificial intelligence (AI) systems. LinkedIn said it's on a mission to democratize machine learning.

"One of the important lessons we have learned from this journey is the importance of providing good deep learning platforms that help our modeling engineers become more efficient and productive," the LinkedIn engineering team said in a blog post today (April 4). "Avro2TF is part of this effort to reduce the complexity of data processing and improve the velocity of advanced modeling."

"Based on the feedback from our users on the LinkedIn ML vertical teams, we needed a scalable solution focused on scalable data conversion," the team said. "More specifically, we needed a solution that converted our LinkedIn data types (e.g., sparse vector, dense vector, etc.) into a deep learning format (i.e., tensors)."

You can work with an editor and the command line, and you will often want to do that, but Jupyter notebooks are great for doing machine learning development work. In order to get Jupyter notebook to work the way you want with this new TensorFlow environment, you will need to add a "kernel" for it.

That MNIST digits training example was a model with 1.2 million training parameters and a dataset with 60,000 images. **It took 80 seconds utilizing the NVIDIA GTX 980 on my old test system! For reference it took 1345 seconds using all cores at 100% on the Intel i7-4770 CPU in that machine. That's a 17-fold speedup on the GPU. That's why you use GPUs for this stuff!**
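The speedup figure follows directly from the two timings quoted above:

```python
gpu_seconds = 80     # NVIDIA GTX 980, timing from the text
cpu_seconds = 1345   # Intel i7-4770, timing from the text

speedup = cpu_seconds / gpu_seconds
print(f"GPU speedup: {speedup:.1f}x")  # prints "GPU speedup: 16.8x"
```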

Unfortunately, properly explaining how and why a convolutional neural net works would make this post twice as long. If you want to understand how convnets work, I suggest checking out cs231n and then colah. For any non-DL people who are reading this, the best summary I can give of a CNN is this: An image is a 3D array of pixels. A convolutional layer is where you have a neuron connected to a tiny subgrid of pixels or neurons, and use copies of that neuron across all parts of the image/block to make another 3D array of neuron activations. A max pooling layer makes a block of activations spatially smaller. Lots of these stacked on top of one another can be trained with gradient descent and are really good at learning from images.
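If it helps, that summary can be turned into a toy example in plain Python. This is only a sketch under simplifying assumptions: a single-channel image, one hand-picked 2x2 filter instead of learned weights, no padding, and no activation function. As in deep learning frameworks, the "convolution" here slides the filter without flipping it.

```python
def conv2d(image, kernel):
    """Slide one small filter over a 2D image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(fmap, size=2):
    """Keep the largest activation in each non-overlapping size x size block."""
    return [
        [
            max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


# A 5x5 image that is dark on the left and bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# The same tiny "neuron" is applied everywhere; this one responds to vertical edges.
kernel = [[-1, 1],
          [-1, 1]]

activations = conv2d(image, kernel)  # 4x4 grid of activations
pooled = max_pool2d(activations)     # 2x2 grid, spatially smaller
print(activations)
print(pooled)
```

The filter fires only where the dark-to-bright edge sits, and pooling keeps that response while shrinking the grid from 4x4 to 2x2.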



