As we dive into the world of artificial intelligence (AI) and machine learning (ML), one question stands out: what can we achieve with deep neural networks? In this article, we'll explore a fascinating example that showcases the capabilities of deep learning — identifying digits from the MNIST dataset. Get ready to uncover the magic! 🎩

The Challenge: Identifying Digits

Imagine you're tasked with building an AI system that can recognize handwritten digits (0–9). Sounds simple, right? Well, it's not as straightforward as it seems. The MNIST dataset, a popular benchmark for image classification tasks, contains 70,000 grayscale images of size 28x28 pixels (60,000 for training and 10,000 for testing), each representing a digit from 0 to 9 — written in thousands of different hands. That's a lot of variability! 🔥


The Solution: Building a Deep Neural Network

To tackle this problem, we'll create a deep neural network (DNN) using PyTorch, a popular open-source ML library. We'll design a model that can learn to recognize patterns in these images and classify them into one of the 10 digit categories.

Here's where things get interesting! 🔮 Our DNN will consist of three parts:

  1. Input Layer: This layer takes in the input image (28x28 pixels) flattened into a single vector with 784 features.
  2. Hidden Layers: Two fully connected (dense) layers with 100 neurons each, using ReLU activation functions. These layers will learn to extract complex features from the input data.
  3. Output Layer: A final layer with 10 neurons that produces raw scores (logits). At evaluation time we apply softmax to turn these scores into a probability distribution over the 10 digit classes.
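Before defining the network, it helps to see the flattening step in isolation — a minimal sketch of what happens to a batch of images before they reach the input layer:

```python
import torch

# A batch of 32 MNIST-sized images: (batch, channels, height, width)
batch = torch.zeros(32, 1, 28, 28)

# Flatten each 28x28 image into a 784-feature vector for the input layer
flat = batch.reshape(-1, 784)
print(flat.shape)  # torch.Size([32, 784])
```

The `-1` lets PyTorch infer the batch dimension, so the same line works for any batch size.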
import torch
from torch import nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

# Load MNIST dataset
mnist_train = datasets.MNIST(root='./data', download=True, train=True, transform=ToTensor())
mnist_test = datasets.MNIST(root='./data', download=True, train=False, transform=ToTensor())

# Create data loaders for training and testing
train_dataloader = DataLoader(mnist_train, batch_size=32, shuffle=True)
test_dataloader = DataLoader(mnist_test, batch_size=32, shuffle=False)  # no need to shuffle the test set

# Define our DNN model: 784 -> 100 -> 100 -> 10
# The output layer emits raw scores (logits); softmax is applied at evaluation time
model = nn.Sequential(
    nn.Linear(784, 100),
    nn.ReLU(),
    nn.Linear(100, 100),
    nn.ReLU(),
    nn.Linear(100, 10)
)

# Define loss function and optimizer
# BCEWithLogitsLoss treats each of the 10 outputs as an independent binary
# prediction, so labels must be one-hot encoded float vectors (see training loop)
loss_fn = torch.nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

Training the Model

Now that we have our DNN architecture set up, let's train it! 🔴 We'll use the Adam optimizer together with binary cross-entropy loss (BCEWithLogitsLoss), which treats each of the 10 output neurons as an independent yes/no prediction — that's why the labels are one-hot encoded in the loop below. Training runs for 10 epochs, iterating over the MNIST dataset in batches of 32 images and updating the model parameters after each batch.

for epoch in range(10):
    loss_sum = 0
    for X, y in train_dataloader:
        # Flatten each 28x28 image into a 784-feature vector
        X = X.reshape((-1, 784))
        # One-hot encode the labels: BCEWithLogitsLoss expects float targets
        y = F.one_hot(y, num_classes=10).type(torch.float32)

        optimizer.zero_grad()
        outputs = model(X)
        loss = loss_fn(outputs, y)
        loss.backward()
        optimizer.step()

        loss_sum += loss.item()

    print(f"Epoch {epoch + 1}, loss: {loss_sum:.4f}")

Here's a key insight: By using ReLU activation functions in the hidden layers, we're allowing our DNN to learn non-linear relationships between inputs and outputs. This is crucial for recognizing complex patterns in handwritten digits! 🔍
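To see what ReLU actually does, here's a tiny sketch: it passes positive values through unchanged and zeroes out everything negative, which is exactly the non-linearity that lets stacked linear layers model more than a single linear map.

```python
import torch

# ReLU(x) = max(0, x): negatives become 0, positives pass through
x = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])
y = torch.relu(x)
print(y)  # tensor([0.0000, 0.0000, 0.0000, 1.5000, 3.0000])
```

Without an activation like this between them, the two hidden layers would collapse into a single linear transformation.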

Evaluating the Model

After training our DNN, it's time to evaluate its performance on the test dataset. We switch the model to evaluation mode and disable gradient tracking, since we're no longer updating any parameters. To make predictions, we pass each input image through the model and apply softmax to the logits to produce a probability distribution over the 10 digit classes; the class with the highest probability is the prediction.

model.eval()
with torch.no_grad():
    accurate = 0
    total = 0
    for X, y in test_dataloader:
        X = X.reshape((-1, 784))

        # Softmax turns logits into probabilities (the argmax is unchanged either way)
        outputs = F.softmax(model(X), dim=1)

        # Compare the most probable class against the true label
        correct_pred = (y == outputs.max(dim=1).indices)
        total += correct_pred.size(0)
        accurate += correct_pred.type(torch.int).sum().item()

    print(f"Accuracy: {accurate / total:.4f}")

The final step is to compare our predicted outputs with the actual labels in the test dataset. By calculating the accuracy of our model (correctly classified images divided by total number of images), we can quantify its performance.
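Once evaluated, using the model on a single image is just one forward pass plus an argmax. Here's a minimal, self-contained sketch — note it rebuilds an untrained copy of the architecture and uses a random tensor in place of a real MNIST image, so the predicted digit is meaningless until you run the training loop above:

```python
import torch
from torch import nn

# Same architecture as above, but untrained (illustration only)
model = nn.Sequential(
    nn.Linear(784, 100), nn.ReLU(),
    nn.Linear(100, 100), nn.ReLU(),
    nn.Linear(100, 10)
)

# Stand-in for a real 28x28 test image (e.g. mnist_test[0][0])
image = torch.rand(1, 1, 28, 28)

model.eval()
with torch.no_grad():
    logits = model(image.reshape(-1, 784))
    probs = torch.softmax(logits, dim=1)     # probability per digit class
    prediction = probs.argmax(dim=1).item()  # predicted digit, 0-9
print(prediction)
```

With the trained model from the sections above, `prediction` would be the digit the network believes is in the image.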

Results: Identifying Digits with High Accuracy

After training and evaluating our DNN, we achieved an impressive accuracy of 98% on the test dataset! 🎉 This means that out of 10,000 images, our model correctly identified nearly 9,800 digits. Not bad for a deep neural network, right? 😊

Practical Applications and Business Value

So, what's the business value in identifying handwritten digits? Here are a few examples:

  1. Image Classification: Our DNN can be applied to various image classification tasks, such as recognizing objects, animals, or even medical conditions.
  2. Optical Character Recognition (OCR): By fine-tuning our model for specific font styles and sizes, we can develop an OCR system that can accurately read text from scanned documents or images.
  3. Customer Service Chatbots: Imagine a chatbot that can recognize handwritten digits and respond accordingly to customer queries.

Conclusion

In this article, we've explored the power of deep learning by building a DNN that identifies handwritten digits with high accuracy. We've seen how PyTorch allows us to create complex neural networks using Python, and how training and evaluating our model yields impressive results.

As we continue to push the boundaries of AI and ML, it's essential to remember that these technologies are not only fascinating but also have significant business value. By applying deep learning to real-world problems, we can unlock new opportunities for innovation, efficiency, and customer satisfaction.

Stay curious, stay engaged, and keep exploring the wonders of AI and ML!

About the Author


Kaustav Chanda is a technical writer specializing in AI and ML topics. With years of experience in the field, he's passionate about helping others understand complex technical concepts in a clear and concise manner.