Overview
CLIP (Contrastive Language–Image Pre-training) is a neural network introduced by OpenAI. There is a very detailed paper about it that you can read if you are interested. OpenAI claims that CLIP's performance is much more representative of how a model will fare on datasets that measure accuracy in settings other than ImageNet.

ResNeXt is a simple, highly modularized network architecture for image classification. The network is constructed by repeating a building block that aggregates a set of transformations with the same topology.
Without going into the theory, I am going to run both models through some image classification tests using a Jupyter notebook on Google Colab.
Google Colab
From Google Colab, open the notebook available in this repository.

For better performance, ensure that you change the runtime type to GPU or TPU under Runtime -> Change runtime type in Google Colab.

Project and Library Setup
Let's install the required Python libraries and clone the repository to download the additional Python files and the test images.
Since the Colab virtual machine comes with PyTorch and the CUDA toolkit pre-installed, I will not be installing them again.
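The setup roughly amounts to the following Colab cell. The CLIP dependency list and install URL come from OpenAI's CLIP README; the clone URL is a placeholder for this article's repository, which is linked above.

```shell
# Install CLIP's dependencies and the CLIP package itself
# (as documented in OpenAI's CLIP README)
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git

# Clone this article's repository for the helper files and test images
# (placeholder URL -- use the repository linked from this article)
git clone https://github.com/<user>/<repo>.git
```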

As you can see from the screenshot above, the current CUDA version is 10.1.
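You can confirm the runtime's CUDA setup directly from the notebook; a minimal check, assuming PyTorch is pre-installed on the Colab VM:

```python
import torch

# Report whether a GPU is visible and which CUDA version PyTorch was built with
print("CUDA available:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)
if torch.cuda.is_available():
    # Name of the GPU assigned to this Colab session
    print("GPU:", torch.cuda.get_device_name(0))
```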
Import Libraries and Pre-trained Models
Let's import the required libraries used by the notebook.
For the CLIP pre-trained model, I download it from OpenAI using the code snippet provided in the CLIP repository.
For the ResNeXt pre-trained model, I use the model from PyTorch Hub.

Prediction using CLIP and ResNext on ImageNet Classes
I implemented two methods, predict_clip and predict_resnext, using the 1,000 ImageNet classes. Both methods return the top five most probable classes.

Image Classification Testing
Using a combination of different images, I performed a test using both prediction methods.
Using a simple panda image, both models predict correctly.

And here is the test for other images.

From this quick test and the observed results, CLIP appears to make better predictions for unseen object categories. However, CLIP takes much longer to produce a prediction.
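To put a number on that latency difference, you can wrap either prediction method in a small timer; a minimal sketch, shown here with a stand-in function:

```python
import time

def time_prediction(predict_fn, *args, **kwargs):
    """Return (result, elapsed_seconds) for a single prediction call."""
    start = time.perf_counter()
    result = predict_fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Example with a stand-in function; in the notebook you would pass
# predict_clip or predict_resnext with their arguments instead.
result, elapsed = time_prediction(lambda: "panda")
print(f"predicted {result!r} in {elapsed:.4f}s")
```

CLIP's zero-shot path also encodes all 1,000 text prompts, which is one reason it is slower per image than a plain ResNeXt forward pass; caching the text features would narrow the gap.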
Also check out the following articles to see how machine learning models, including ResNeXt, can be hosted using Streamlit and FastAPI.
And more articles below on practical usages of machine learning.