Overview

CLIP (Contrastive Language–Image Pre-training) is a neural network introduced by OpenAI in early 2021. There is a detailed paper describing it, which is worth reading if you are interested. OpenAI claims that CLIP's performance is much more representative of how a model will fare on datasets that measure accuracy in different, non-ImageNet settings.

Image from CLIP

ResNext is a simple, highly modularized network architecture for image classification. The network is constructed by repeating a building block that aggregates a set of transformations with the same topology.

Without going into any theory, I am going to use both models and run some image classification tests in a Jupyter notebook on Google Colab.

Google Colab

From Google Colab, open the notebook available in this repository.

Google Colab — Open Notebook from GitHub

For better performance, ensure that you change the runtime type to GPU or TPU under Runtime -> Change runtime type in Google Colab.

Google Colab — GPU Runtime
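To confirm that the notebook actually sees the accelerator after switching the runtime type, a quick check with PyTorch (pre-installed on Colab) can be run in a cell; this is a minimal sketch, not part of the original notebook:

```python
import torch

# True on a Colab GPU runtime; False on the default CPU runtime
print("CUDA available:", torch.cuda.is_available())

if torch.cuda.is_available():
    # Name of the GPU assigned to the VM, e.g. "Tesla T4"
    print("Device:", torch.cuda.get_device_name(0))
```

If this prints `False` on a GPU runtime, the runtime usually just needs to be restarted after changing the type.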

Project and Library Setup

Let's install the Python libraries and clone the repository to download additional Python files and the images that will be used for the testing.

Since the Colab virtual machine comes with PyTorch and cudatoolkit pre-installed, I will not be installing them again.

Project and Library Setup

As you can see from the screenshot above, the current CUDA version is 10.1.
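The setup cell itself is shown only as a screenshot. Assuming the standard installation instructions from OpenAI's CLIP repository, it likely resembles the following; the repository URL for the article's own files is a placeholder, not the actual value:

```shell
# Install CLIP's dependencies and the CLIP package itself
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git

# Clone the article's repository for the helper files and test images
# (replace <repo-url> with the actual repository URL)
# git clone <repo-url>
```

PyTorch and cudatoolkit are skipped here because, as noted above, Colab ships with them pre-installed.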

Import Libraries and Pre-trained Models

Let's import the required libraries used by the notebook.

For the CLIP pre-trained model, I download it using the code snippet provided in OpenAI's CLIP repository.

For the ResNext pre-trained model, I use the model from PyTorch Hub.

Import Libraries and Pre-trained Models
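The loading code appears only in the screenshot, but a sketch using the standard `clip.load` and `torch.hub.load` APIs might look like this; the specific model variants (`ViT-B/32`, `resnext101_32x8d`) are assumptions based on common choices, not confirmed from the notebook:

```python
def load_models(device=None):
    """Download the CLIP and ResNext pre-trained models.

    Imports are kept inside the function so merely defining it
    does not trigger the (large) weight downloads.
    """
    import torch
    import clip  # installed via: pip install git+https://github.com/openai/CLIP.git

    device = device or ("cuda" if torch.cuda.is_available() else "cpu")

    # CLIP model plus its matching image preprocessing transform
    clip_model, clip_preprocess = clip.load("ViT-B/32", device=device)

    # ResNext from PyTorch Hub, pre-trained on ImageNet
    resnext = torch.hub.load("pytorch/vision:v0.6.0",
                             "resnext101_32x8d", pretrained=True)
    resnext.to(device).eval()

    return clip_model, clip_preprocess, resnext
```

Calling `load_models()` once at the top of the notebook keeps both models in memory for the prediction methods that follow.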

Prediction using CLIP and ResNext on ImageNet Classes

I implemented two methods, predict_clip and predict_resnext, which classify an image against the 1,000 ImageNet classes. Both methods return the top 5 most probable classes.

Prediction Methods in CLIP and ResNext
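The two methods are only visible as a screenshot, but a sketch of how they might work, assuming the standard CLIP and torchvision APIs, is shown below; `top5` is a hypothetical helper introduced here for illustration:

```python
import torch

def top5(probs, class_names):
    """Return the 5 most probable (class, probability) pairs."""
    values, indices = torch.as_tensor(probs).topk(5)
    return [(class_names[i], float(v)) for v, i in zip(values, indices)]

def predict_resnext(model, preprocess, image, class_names):
    """Classify a PIL image with ResNext over the 1000 ImageNet classes."""
    x = preprocess(image).unsqueeze(0)              # add batch dimension
    with torch.no_grad():
        probs = model(x).softmax(dim=-1)[0]
    return top5(probs, class_names)

def predict_clip(model, preprocess, image, class_names, device="cpu"):
    """Zero-shot classification: score the image against a text prompt
    for each ImageNet class name."""
    import clip  # lazy import; see the setup section
    text = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)
    x = preprocess(image).unsqueeze(0).to(device)
    with torch.no_grad():
        logits_per_image, _ = model(x, text)
        probs = logits_per_image.softmax(dim=-1)[0]
    return top5(probs, class_names)
```

The key difference is visible here: ResNext has a fixed 1,000-way output layer, while CLIP scores the image against free-form text prompts, one per class name.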

Image Classification Testing

Using a variety of images, I tested both prediction methods.

Using a simple panda image, both models predict the class correctly.

Prediction made by CLIP and ResNext

And here are the results for the other images.

CLIP and ResNext Image Classification Test

From this quick test and the observed results, CLIP appears to make better predictions for unseen object categories. However, CLIP takes much longer to produce a prediction.

Also check out the following articles to see how we can host machine learning models, including ResNext, using Streamlit and FastAPI.

And more articles below on practical usages of machine learning.