I bet it's more powerful than you think…
The question comes up again and again: what actually is deep learning? To understand it, let's start with a simple use case to frame our explanation around: a self-driving car.
A self-driving car is an autonomous vehicle that drives itself without any human intervention. Driving breaks down into three primary activities: see, decide, act. For example: I see a red light, I decide to stop, I put on the brakes. In this case, though, the car is doing everything on its own. So where does deep learning come into it? Let's focus on the "see" by explaining how deep learning works in the context of image recognition.
Image Recognition
When you look at something, your brain processes all the information it receives from your eyes and converts it into an image. Deep learning is a subfield of machine learning built on many-layered neural networks; applied to computer vision, it attempts to mimic the human brain's ability to recognise objects in images.
We can look at images in a two-dimensional plane, but an object's three-dimensional shape and position in space can be just as useful. Because of this, visual recognition can be broken up into classification and localisation. Classification identifies an object, while localisation finds the position of the object in the image.
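To make that distinction concrete, here's a minimal sketch of what those two outputs might look like in code. Everything in it (the `Detection` class, the field names, the numbers) is hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str                      # classification: what the object is
    confidence: float               # how sure the model is, from 0 to 1
    box: tuple[int, int, int, int]  # localisation: (x, y, width, height) in pixels

# A hypothetical result for a dog photo:
dog = Detection(label="dog", confidence=0.91, box=(120, 80, 200, 160))
print(f"Found a {dog.label} at {dog.box} with confidence {dog.confidence}")
```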
For the sake of this explanation, we'll focus on localisation.
Here's an example image of a dog.

In this image, the dog's position within the frame matters: an object's position relative to where it is normally found plays an important role in recognising it.
The nose is a region of high contrast that helps the computer identify the object. The ears and eyes also have some contrast, making them easier to identify, and the shape and texture of the fur play a role as well. The two main components that come into play when the computer needs to figure out the position of the object in the image are landmark data and scale.
Landmark data is a region of the image that contains multiple, distinct features that can be used to identify the location of the object. The eyes are a good landmark, as are the ears. The distance between these landmarks is used in combination with the scale of the image to determine the location of the object.
Here's another example of landmark data. In this image, the computer has determined that the right eye is at (5, 3), the nose is at (4, 5), and the left eye is at (1, 1).

By taking measurements between the landmarks in the image, we can calculate a robust location for the nose.
Once the computer has located the nose, it uses this information to determine the scale of the image. In this case, the dog is about one third of the way into the photo from the left side. This provides a good starting point from which to measure the size of other areas of the image.
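Using the hypothetical coordinates from the example above, a back-of-the-envelope version of that calculation might look like this (normalising by the eye span is an assumption on my part, not something the article specifies):

```python
import math

# Hypothetical landmark coordinates from the example image
right_eye = (5, 3)
left_eye = (1, 1)
nose = (4, 5)

def distance(a, b):
    """Euclidean distance between two landmarks."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

# The distance between the eyes gives us a scale reference...
eye_span = distance(left_eye, right_eye)

# ...and the nose can be located relative to the midpoint of the eyes.
midpoint = ((left_eye[0] + right_eye[0]) / 2, (left_eye[1] + right_eye[1]) / 2)
nose_offset = distance(midpoint, nose)

print(f"Eye span: {eye_span:.2f}, nose offset from eye midpoint: {nose_offset:.2f}")
# Expressing the offset as a fraction of the eye span makes the
# measurement robust to the overall scale of the image.
print(f"Normalised offset: {nose_offset / eye_span:.2f}")
```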
The same approach works for other objects; let's look at how we might recognise a red truck.

Dog Classification
In order to classify the image, the computer needs additional information. The most important piece is a set of parameters that describe what the computer believes a dog looks like. These parameters are referred to as a classifier. A good classifier will have high accuracy, be fast to train, and have small memory and processing requirements.
Here's an example classifier. This classifier has a precision of 0.67 and a recall of 0.33. This means that of all the images it labelled as containing a dog, 67% actually did, and of all the images that really did contain a dog, it found only 33%.
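In code, those two numbers come straight from counts of true positives, false positives and false negatives. The counts below are made up to reproduce the example figures:

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision: of everything labelled 'dog', how much really was a dog.
    Recall: of every real dog, how many we actually found."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall

# Hypothetical counts chosen to match the example classifier:
# 2 dogs found correctly, 1 false alarm, 4 dogs missed.
p, r = precision_recall(true_positives=2, false_positives=1, false_negatives=4)
print(f"precision = {p:.2f}, recall = {r:.2f}")  # precision = 0.67, recall = 0.33
```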

The computer's ability to classify an image as either a dog or not is based on what the classifier believes a dog looks like. In this example, the computer believes a dog to be some combination of four features: the shape of the ears, the curve of the back, the width of the forehead, and the rotation of the neck. These features are combined to give a value between 0 and 1, with 0 meaning definitely not a dog and 1 meaning definitely a dog. The classifier in this example treats any value between 0.67 and 1.0 as a dog; such a permissive threshold makes it susceptible to false positives.
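As a toy illustration of combining those four features into a single score, here's a hand-weighted sum. The weights, the input values, and even the use of a weighted sum are assumptions for the sake of the sketch; a real classifier would learn its weights from data:

```python
def dog_score(ear_shape, back_curve, forehead_width, neck_rotation):
    """Combine four hypothetical features, each already scaled to 0..1,
    into a single dog-ness score between 0 and 1."""
    weights = {"ears": 0.4, "back": 0.3, "forehead": 0.2, "neck": 0.1}
    return (weights["ears"] * ear_shape
            + weights["back"] * back_curve
            + weights["forehead"] * forehead_width
            + weights["neck"] * neck_rotation)

score = dog_score(ear_shape=0.9, back_curve=0.8, forehead_width=0.7, neck_rotation=0.6)
print(f"score = {score:.2f}, is_dog = {score >= 0.67}")  # score = 0.80, is_dog = True
```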
So what does this mean for self-driving cars? Self-driving cars use many cameras to classify and localise the objects around them: other cars, trucks, traffic lights, trees, pedestrians and road markings. Many systems also use a form of optical character recognition (OCR) to read text on the road. Markings such as "ONE WAY", "STOP" and "BUS LANE" can be hard to read with OCR alone, and are better handled by a visual system that can both see and understand the words. The cameras feed a classifier that detects these markings, and if the car's classifier detects a label it understands, it takes that information into account when identifying and localising other objects in the image.

Once the car has identified the objects in the image, this information is combined with other data, such as its speed and distance to those objects, before a decision is made and an action is taken. For example, if a self-driving car's classifier detects a pedestrian crossing the road, it might predict that a collision is likely to occur.
If it does, additional rules in its programming will make the decision to stop, and the car will act by applying the brakes, hopefully in a way that is not dangerous to other road users. In the future, self-driving cars may predict that an upcoming hazard could cause the car to swerve out of control; in that case, the car may decide it needs to execute an evasive manoeuvre to avoid a dangerous situation. This prediction would be based on the information it has about the car's height, speed, direction of travel and distance from potential hazards.
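A drastically simplified see-decide-act rule might look like the sketch below. The labels, the thresholds and the whole interface are hypothetical; a real system would be far more involved:

```python
def decide(detections, speed_mps):
    """Turn a list of (label, distance_m) detections into an action."""
    for label, distance_m in detections:
        # Time to reach the object at the current speed.
        time_to_impact = distance_m / max(speed_mps, 0.1)
        if label == "pedestrian" and time_to_impact < 3.0:
            return "brake"
        if label == "red_light" and time_to_impact < 5.0:
            return "brake"
    return "continue"

# See: the perception system reports a pedestrian 20 m ahead at 15 m/s.
action = decide([("pedestrian", 20.0)], speed_mps=15.0)
print(action)  # decide -> "brake"; act: apply the brakes
```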
There are many different ways to classify and localise an object, each with its own advantages and disadvantages. Some popular techniques are:
- Classification: determines what object a photo depicts using some combination of appearance, location, movement and other attributes.
- Feature detection: identifies the presence of a certain feature, such as a human face, in an image.
- Feature localisation: classifies a feature and determines its position in some coordinate system, for example the position of a face in 3D space.
Out of all these techniques, feature detection and classification have had the most success in self-driving cars. They can be used separately or in any combination by the software that powers the car.
The feature detection and classification techniques can be subdivided into two groups: supervised learning and unsupervised learning. In supervised learning, the machine learns by example. It is fed a set of examples, each containing data about an object along with the class of that object. The machine is then given new data and asked whether it matches a known class. For example, if the known classes are "cat", "dog" and "fish", the machine can be asked whether a new image contains a cat, and the answer will be either yes or no.
The machine improves over time as it is fed more and more examples. This method works, but it can be very slow, and the machine can still make mistakes, for example confusing a wolf with a dog.
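Here's "learning by example" in miniature: a nearest-neighbour classifier. The two-number feature descriptions of each animal are made up to stand in for real image features:

```python
import math

# Supervised learning: each training example pairs features with a label.
# The two hypothetical features could be e.g. ear pointiness and body length.
training_data = [
    ((0.9, 0.3), "cat"),
    ((0.8, 0.4), "cat"),
    ((0.4, 0.7), "dog"),
    ((0.3, 0.8), "dog"),
    ((0.1, 0.2), "fish"),
]

def classify(features):
    """Label new data with the class of the closest known example."""
    nearest = min(training_data, key=lambda ex: math.dist(ex[0], features))
    return nearest[1]

print(classify((0.85, 0.35)))  # -> "cat"
```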
In unsupervised learning, the machine doesn't learn from labelled examples. Instead, it looks for structure in the data on its own, grouping similar things together without being told what they are. This avoids the cost of labelling data, but the groups it discovers don't come with names attached, so it is more likely to make mistakes about what things actually are. In practice, self-driving cars lean heavily on supervised learning for recognising objects, with unsupervised techniques helping to make sense of raw sensor data. That data is interpreted and combined in different ways to detect objects, spaces, and the relationships between them.
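And here's the unsupervised counterpart: a tiny k-means clustering sketch that groups unlabelled points (standing in for raw sensor readings) without ever being told what they are:

```python
import math
import random

random.seed(0)  # reproducible example

# Unlabelled sensor-like readings: two obvious groups, but no labels.
points = [(1.0, 1.1), (1.2, 0.9), (0.9, 1.0),
          (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]

def kmeans(points, k, steps=10):
    """Group points into k clusters; no labels are ever provided."""
    centres = random.sample(points, k)
    for _ in range(steps):
        # Assign every point to its nearest centre...
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        # ...then move each centre to the mean of its cluster.
        centres = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centres[i]
            for i, c in enumerate(clusters)
        ]
    return centres

print(kmeans(points, k=2))  # two cluster centres, discovered without labels
```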
In conclusion, self-driving cars use many different techniques to detect their surroundings. Some of these techniques, such as vision, work better than others; unsupervised learning is used despite its flaws; and sensors can only detect features at a certain scale. The self-driving car industry is still in its infancy, and there are many problems left to solve. But the experiments being performed today could put a driverless car on the market during your lifetime.
And finally, how powerful is deep learning? Powerful enough that a deep neural network wrote everything in the article you have just read that isn't in italics :) It's called GPT-3, developed by OpenAI. Check it out!