Almost anyone who has built a deep neural network has used Keras, and most neural network architectures written in TensorFlow are built on top of it. Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow. It was developed with a focus on enabling fast experimentation.

With the new Keras Core, also known as Keras 3.0, some massive changes are coming to the library. In this article, we delve into what's new and exciting in Keras 3.0 and look at how it works behind the scenes.

Table Of Contents

  • Making Keras multi-backend again
  • Keras vs. TensorFlow Core
  • Keras Architecture
  • A detailed look at Layer and Conv2D class internal structure
  • What has Keras 3.0 achieved?

Making Keras multi-backend again

Keras Core, or Keras 3.0, is a full rewrite of the Keras codebase that rebases it on top of a modular backend architecture. It makes it possible to run Keras workflows on top of arbitrary frameworks, starting with TensorFlow, JAX, and PyTorch.

Keras Core is also a drop-in replacement for tf.keras, with near-full backward compatibility with tf.keras code when using the TensorFlow backend. In the vast majority of cases, we can simply import keras_core as keras in place of from tensorflow import keras, and the existing code will run with no issue — generally with slightly improved performance, thanks to XLA compilation.
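In practice (assuming the keras-core package is installed, e.g. via pip install keras-core), the migration can be as small as swapping one import:

```python
# Before: the Keras bundled with TensorFlow
# from tensorflow import keras

# After: Keras Core as a drop-in replacement
# (assumes the `keras-core` package is installed)
import keras_core as keras
```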

Not so long ago, Keras could run on top of Theano, TensorFlow, and CNTK (even MXNet!). In 2018, the decision was made to refocus Keras development exclusively on TensorFlow. At the time, TensorFlow was the only viable option: development of Theano and CNTK had been discontinued, and the added cost of supporting multiple backends was simply no longer worth it.

In 2023, this is no longer true. According to large-scale developer surveys such as the 2023 StackOverflow Developer Survey and the 2022 Kaggle Machine Learning & Data Science Survey (as well as other adoption metrics such as PyPI downloads, Conda downloads, and Colab import statistics, which all paint the same picture), TensorFlow has between 55% and 60% market share and is the top choice for production ML, while PyTorch has between 40% and 45% market share and is the top choice for ML research. At the same time, JAX, while having a much smaller market share, has been embraced by top players in generative AI such as Google DeepMind, Midjourney, Cohere, and more.

Keras vs. TensorFlow Core

The example below makes clear just how much Keras simplifies TensorFlow: it compares a basic neural network written in Keras with the same network written in TensorFlow Core.

Although TensorFlow Core gives much finer control over each variable, Keras gives ease of use and fast prototyping.

TensorFlow Core Implementation

# TensorFlow 1.x-style graph code. On TensorFlow 2.x, run it via the
# compatibility module (tf.placeholder and friends no longer exist in TF 2):
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

# Placeholder for input and output
x = tf.placeholder(tf.float32, shape=[None, 28, 28, 1])
y = tf.placeholder(tf.float32, shape=[None, 10])

# First Convolutional Layer
W_conv1 = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))
b_conv1 = tf.Variable(tf.constant(0.1, shape=[32]))
h_conv1 = tf.nn.relu(tf.nn.conv2d(x, W_conv1, strides=[1, 1, 1, 1], padding='SAME') + b_conv1)
h_pool1 = tf.nn.max_pool(h_conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

# Second Convolutional Layer
W_conv2 = tf.Variable(tf.truncated_normal([5, 5, 32, 64], stddev=0.1))
b_conv2 = tf.Variable(tf.constant(0.1, shape=[64]))
h_conv2 = tf.nn.relu(tf.nn.conv2d(h_pool1, W_conv2, strides=[1, 1, 1, 1], padding='SAME') + b_conv2)
h_pool2 = tf.nn.max_pool(h_conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

# Densely Connected Layer
W_fc1 = tf.Variable(tf.truncated_normal([7 * 7 * 64, 1024], stddev=0.1))
b_fc1 = tf.Variable(tf.constant(0.1, shape=[1024]))
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

# Dropout
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

# Readout Layer
W_fc2 = tf.Variable(tf.truncated_normal([1024, 10], stddev=0.1))
b_fc2 = tf.Variable(tf.constant(0.1, shape=[10]))
y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2

# Define loss and optimizer
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# Session to run the training
# ...

Keras Implementation

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential([
    Conv2D(32, kernel_size=(5, 5), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(64, kernel_size=(5, 5), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(1024, activation='relu'),
    Dropout(0.5),
    Dense(10, activation='softmax')
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Training the model is as simple as calling model.fit with training data
# ...

We can clearly see how Keras makes our lives so much easier :) Whole blocks of low-level operations collapse into a single line each, removing the hassle of dimension mismatches. Believe me, keeping track of dimensions is one of the biggest challenges for beginner and intermediate AI developers, especially as networks grow bigger.

The amount of time saved and complexity removed is massive; that's why Keras is one of the most loved and most important AI libraries out there.

Keras Architecture

To understand the mechanism and architecture of Keras, we need to delve into the design of the Sequential and Model classes and how they function internally. These classes serve as the core of model building in Keras, providing a framework for assembling layers and defining the computational graph.

`Sequential` Model Class

The Sequential model is a linear stack of layers. It's a subclass of the Model class and is designed for simple cases where the model consists of a linear stack of layers with one input and one output.

[Figure: the Keras Sequential class]

Key Characteristics:

1. Simplicity: The primary feature of Sequential is its simplicity. You only need to list the layers in the order you want them to be executed.

2. Automated Forward Pass: When you add layers to a Sequential model, Keras automatically connects the output of each layer to the input of the next, creating the forward pass without requiring manual intervention.

3. Internal State Management: The Sequential model manages the state of its layers (like weights and biases) and the computational graph. When you call compile on the model, it configures the learning process by specifying the optimizer, loss function, and metrics.

4. Training and Inference: The Sequential class provides methods like fit, evaluate, and predict for training, evaluating, and making predictions with the model, respectively. These methods handle the training loop and inference process internally.
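The automated forward pass described above can be illustrated with a toy pure-Python sketch (an illustration of the idea only, not actual Keras code — all class names here are hypothetical):

```python
# Toy sketch of a Sequential container: hold layers in order and thread
# each layer's output into the next layer's input.

class Scale:
    """A 'layer' that multiplies every element by a factor."""
    def __init__(self, factor):
        self.factor = factor
    def __call__(self, x):
        return [v * self.factor for v in x]

class Shift:
    """A 'layer' that adds an offset to every element."""
    def __init__(self, offset):
        self.offset = offset
    def __call__(self, x):
        return [v + self.offset for v in x]

class ToySequential:
    def __init__(self, layers=None):
        self.layers = list(layers or [])
    def add(self, layer):
        self.layers.append(layer)
    def __call__(self, x):
        # Automated forward pass: output of layer i becomes input of layer i+1.
        for layer in self.layers:
            x = layer(x)
        return x

model = ToySequential([Scale(2), Shift(1)])
print(model([1, 2, 3]))  # each element becomes x * 2 + 1 -> [3, 5, 7]
```

The real Sequential does the same wiring, but for layers that carry trainable weights and for tensors flowing through a backend-compiled graph.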

`Model` Class

The Model class, used with the functional API, offers more flexibility than Sequential. It's designed for more complex architectures, including models with multiple inputs or outputs, shared layers, and non-linear topology.

[Figure: the Keras Model class]

Key Characteristics:

1. Graph of Layers: Instead of a linear stack, Model allows you to create a graph of layers. This means you can define models where layers connect to more than just the previous and next layers.

2. Explicit Input and Output Management: In the functional API, you explicitly define the input and outputs of your model. This allows for more complex architectures compared to the Sequential model.

3. Flexibility in Connectivity: The Model class can handle models with branches, multiple inputs and outputs, and shared layers, making it suitable for a wide range of applications beyond simple feed-forward networks.

4. State and Training Management: Similar to Sequential, the Model class manages the state (weights, biases) of all its layers and the training process. However, it provides more control over how layers are connected and how data flows through the model.
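The "graph of layers" idea can be sketched in a few lines of pure Python (illustrative only, not the real functional API — the Node class is hypothetical): each node records its inputs, so branches and merges come for free.

```python
# Toy sketch of a layer graph: a node remembers which nodes feed it,
# so evaluation walks the graph rather than a linear chain.

class Node:
    def __init__(self, fn, *inputs):
        self.fn = fn          # computation at this node
        self.inputs = inputs  # upstream nodes

    def evaluate(self, feed):
        if self in feed:      # an input placeholder: value supplied by caller
            return feed[self]
        values = [node.evaluate(feed) for node in self.inputs]
        return self.fn(*values)

inp = Node(None)                                 # explicit input
double = Node(lambda v: v * 2, inp)              # branch 1
square = Node(lambda v: v * v, inp)              # branch 2
out = Node(lambda a, b: a + b, double, square)   # merge point

print(out.evaluate({inp: 3}))  # 3*2 + 3*3 -> 15
```

This is exactly what Sequential cannot express: two branches reading the same input and a node with two inputs.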

Underlying Mechanism

Both Sequential and Model classes rely on the following mechanisms:

1. Layer Registration: When you add a layer to these models, the layer is registered internally, and its parameters are added to the model's list of parameters.

2. Automatic Differentiation: During training, Keras uses automatic differentiation (through the backend engine, e.g., TensorFlow) to compute gradients. This process is abstracted away from the user.

3. Backend Execution: The actual computations (like matrix multiplications, activations, etc.) are handled by the backend engine, which executes the computational graph defined by the model.

4. Serialization and Deserialization: These classes include methods for saving and loading models, which involve serializing the model's architecture and weights.
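The layer-registration mechanism can be sketched in pure Python (a hypothetical toy, not Keras source): when a layer is attached to a model, its parameters are folded into the model's own parameter list.

```python
# Toy sketch of layer registration and parameter collection.

class ToyParamLayer:
    def __init__(self, weights):
        self.weights = list(weights)

class ToyModel:
    def __init__(self):
        self._layers = []

    def register(self, layer):
        # Registering a layer makes its parameters visible to the model.
        self._layers.append(layer)
        return layer

    @property
    def parameters(self):
        # Gather every registered layer's weights in registration order,
        # which is what the optimizer will later update.
        return [w for layer in self._layers for w in layer.weights]

m = ToyModel()
m.register(ToyParamLayer([0.1, 0.2]))
m.register(ToyParamLayer([0.3]))
print(m.parameters)  # [0.1, 0.2, 0.3]
```

Keras does this automatically via attribute tracking, so `model.trainable_weights` always reflects every layer the model owns.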

In essence, the Sequential and Model classes in Keras abstract away much of the complexity involved in defining and managing a computational graph, allowing users to focus on the architecture of the neural network rather than the underlying computational mechanics. They handle the intricate details of how layers are interconnected, how data flows through the network, and how training and inference operations are conducted.

[Figure: computation graphs for automatic differentiation]

A detailed look at Layer and Conv2D class internal structure

The implementation of the Layer class and the Conv2D class in Keras is quite comprehensive and involves a deep dive into the Keras source code. Due to the complexity and length of the code, here's a high-level overview and key excerpts from the implementation.

Layer Class in Keras

The Layer class is a base class for all layers in Keras. It provides a framework within which all other layers (like Conv2D, Dense, etc.) operate. Here are some key aspects of its implementation:

  1. Initialization: The Layer class has an __init__ method that initializes a layer, setting various attributes like trainable, name, etc.
  2. Build Method: The build method is where the layer can create its weights. This method is called automatically the first time the layer is used. Layers that inherit from Layer need to implement this method to set up their weights.
  3. Call Method: The call method defines the layer's forward pass computation. It's where the actual computation for the layer happens.
  4. Weights Management: The Layer class provides mechanisms to add and manage weights.
# Simplified sketch; the real implementation imports from many more modules.
from tensorflow.keras import backend as K

class Layer:
    def __init__(self, trainable=True, name=None, dtype=None, dynamic=False, **kwargs):
        self.trainable = trainable
        self.built = False
        self.dtype = dtype or K.floatx()
        self.dynamic = dynamic
        if not name:
            prefix = self.__class__.__name__.lower()
            name = prefix + '_' + str(K.get_uid(prefix))
        self.name = name
        self._trainable_weights = []
        self._non_trainable_weights = []
        # ...

    def build(self, input_shape):
        self.built = True

    def call(self, inputs, **kwargs):
        raise NotImplementedError

    def compute_output_shape(self, input_shape):
        raise NotImplementedError

    def add_weight(self, name=None, shape=None, dtype=None, initializer=None, regularizer=None, trainable=True, constraint=None):
        if dtype is None:
            dtype = self.dtype
        weight = K.variable(initializer(shape), dtype=dtype, name=name, constraint=constraint)
        if regularizer is not None:
            self._handle_weight_regularization(name, weight, regularizer)
        if trainable:
            self._trainable_weights.append(weight)
        else:
            self._non_trainable_weights.append(weight)
        return weight

    # ... Additional methods and properties ...

Note: this is a simplified excerpt. The real Layer class imports helpers from many other modules (the backend module imported as K is just one of them), and the actual code is far larger.

Conv2D Class in Keras

The Conv2D class, which inherits from the Layer class, is specific to 2D convolution operations. Key aspects of its implementation include:

  1. Initialization: In addition to the base initialization from Layer, Conv2D initializes convolution-specific parameters like the number of filters, kernel size, strides, padding, etc.
  2. Build Method: The Conv2D class overrides the build method to create the convolution kernels (weights) and biases.
  3. Call Method: The call method in Conv2D handles the application of the convolution operation to the input data. The call method of the Conv2D class implements the forward pass of the layer. It is where the actual convolution operation is performed using the layer's weights.
  4. Utilizing Backend Operations: The convolution operation itself is performed using functions provided by the backend (like TensorFlow). This is where the abstraction comes into play — Keras defines what to compute, and the backend determines how to compute it.
from tensorflow.keras.layers import Layer
from tensorflow.keras.initializers import glorot_uniform
from tensorflow.python.ops import nn  # internal TensorFlow module providing the conv ops

class Conv2D(Layer):
    def __init__(self, filters, kernel_size, strides=(1, 1), padding='valid',
                 data_format=None, dilation_rate=(1, 1), activation=None,
                 use_bias=True, kernel_initializer=glorot_uniform(),
                 bias_initializer='zeros', kernel_regularizer=None,
                 bias_regularizer=None, activity_regularizer=None,
                 kernel_constraint=None, bias_constraint=None, **kwargs):
        super(Conv2D, self).__init__(**kwargs)
        self.filters = filters
        self.kernel_size = kernel_size
        self.strides = strides
        self.padding = padding.upper()
        self.data_format = data_format
        self.dilation_rate = dilation_rate
        self.activation = activation
        self.use_bias = use_bias
        self.kernel_initializer = kernel_initializer
        self.bias_initializer = bias_initializer
        self.kernel_regularizer = kernel_regularizer
        self.bias_regularizer = bias_regularizer
        self.activity_regularizer = activity_regularizer
        self.kernel_constraint = kernel_constraint
        self.bias_constraint = bias_constraint
        # ...

    def build(self, input_shape):
        if self.data_format == 'channels_first':
            channel_axis = 1
        else:
            channel_axis = -1
        if input_shape[channel_axis] is None:
            raise ValueError('The channel dimension of the inputs should be defined. Found `None`.')
        input_dim = input_shape[channel_axis]
        kernel_shape = self.kernel_size + (input_dim, self.filters)

        self.kernel = self.add_weight(shape=kernel_shape, initializer=self.kernel_initializer, name='kernel', regularizer=self.kernel_regularizer, constraint=self.kernel_constraint)
        if self.use_bias:
            self.bias = self.add_weight(shape=(self.filters,), initializer=self.bias_initializer, name='bias', regularizer=self.bias_regularizer, constraint=self.bias_constraint)
        else:
            self.bias = None
        self.built = True

    def call(self, inputs):
        outputs = nn.convolution(inputs, self.kernel, strides=self.strides, padding=self.padding, data_format=self.data_format, dilations=self.dilation_rate)
        if self.use_bias:
            outputs = nn.bias_add(outputs, self.bias, data_format=self.data_format)
        if self.activation is not None:
            return self.activation(outputs)
        return outputs

    # ... Additional methods and properties ...

Just look at the complexity of the Conv2D layer. Imagine having to write an entire network like this; for most of us, building neural networks would be nearly impossible without a library like Keras.
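To make concrete what the backend ultimately computes inside `call`, here is a naive pure-Python "valid" 2-D convolution (strictly, a cross-correlation, which is what deep learning libraries compute) over a single-channel input. The real nn.convolution adds batching, channels, padding modes, strides, and hardware dispatch on top of this arithmetic:

```python
# Naive single-channel 2-D convolution with 'valid' padding:
# slide the kernel over the image and sum the element-wise products.

def conv2d_valid(image, kernel):
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):          # every valid vertical position
        row = []
        for j in range(iw - kw + 1):      # every valid horizontal position
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]  # sums each pixel with its lower-right neighbour

print(conv2d_valid(image, kernel))  # [[6.0, 8.0], [12.0, 14.0]]
```

A Conv2D layer runs this same sliding-window sum for every filter and every input channel, which is why the kernel shape in `build` is `kernel_size + (input_dim, filters)`.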

[Figure: using Conv2D in Keras]

What has Keras 3.0 achieved?

Keras 3.0 has introduced several significant features and improvements, making it an even more versatile and efficient tool for deep learning development. Here are some key highlights of Keras 3.0:

  1. Multi-Backend Support: Keras 3.0 acts as a connector, allowing seamless use of TensorFlow, JAX, and PyTorch together. This flexibility enables developers to mix and match tools for specific tasks without changing the code.
  • Write a low-level JAX training loop to train a Keras model using an optax optimizer, jax.grad, jax.jit, jax.pmap.
  • Write a low-level TensorFlow training loop to train a Keras model using tf.GradientTape and tf.distribute.
  • Write a low-level PyTorch training loop to train a Keras model using a torch.optim optimizer, a torch loss function, and the torch.nn.parallel.DistributedDataParallel wrapper.
  • Use a Keras layer or model as part of a torch.nn.Module. This means that PyTorch users can start leveraging Keras models whether or not they use Keras APIs! You can treat a Keras model just like any other PyTorch Module.
[Figure: Keras running TensorFlow, PyTorch, and JAX]
[Figure: custom components from different backends]

2. Performance Optimization: By default, Keras 3.0 leverages XLA (Accelerated Linear Algebra) compilation, which optimizes mathematical computations for faster execution on hardware like GPUs and TPUs. It dynamically selects the best backend for AI models, ensuring optimal efficiency.

3. Expanded Ecosystem Surface: Models built in Keras can be used as PyTorch Modules, TensorFlow SavedModels, or within JAX's TPU training infrastructure, providing flexibility and allowing users to take advantage of the strengths of each framework.

4. Cross-Framework Low-Level Language: The introduction of the keras_core.ops namespace is a groundbreaking feature. It allows custom operations to be written once and used across different deep learning frameworks. The keras_core.ops namespace provides tools and functions resembling the popular NumPy API.

  • A near-full implementation of the NumPy API. Not something "NumPy-like" — just literally the NumPy API, with the same functions and the same arguments. You get ops.matmul, ops.sum, ops.stack, ops.einsum, etc.
  • A set of neural network-specific functions that are absent from NumPy, such as ops.softmax, ops.binary_crossentropy, ops.conv, etc.
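The dispatch idea behind such a namespace can be sketched in pure Python (illustrative only; the real keras_core.ops routes each call to TensorFlow, JAX, or PyTorch kernels, and the registry shown here is hypothetical):

```python
# Toy sketch of a cross-framework ops layer: one public function name,
# several backend implementations, selected at runtime.

import math

_BACKENDS = {
    "pure_python": {
        "matmul": lambda a, b: [[sum(x * y for x, y in zip(row, col))
                                 for col in zip(*b)] for row in a],
        "softmax": lambda v: [math.exp(x - max(v)) /
                              sum(math.exp(y - max(v)) for y in v) for x in v],
    },
    # "tensorflow": {...}, "jax": {...}, "torch": {...}  # real backends
}

_ACTIVE = "pure_python"  # in Keras Core this comes from the KERAS_BACKEND setting

def matmul(a, b):
    return _BACKENDS[_ACTIVE]["matmul"](a, b)

def softmax(v):
    return _BACKENDS[_ACTIVE]["softmax"](v)

print(matmul([[1, 2]], [[3], [4]]))       # [[11]]
print(sum(softmax([0.0, 1.0, 2.0])))      # ≈ 1.0 (softmax normalizes)
```

User code calls `matmul` and `softmax` once and runs unchanged on whichever backend is active, which is exactly the portability keras_core.ops offers.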

5. Progressive Disclosure of Complexity: Keras 3.0 adopts a design approach that presents the simplest workflows to beginners and progressively unveils advanced features and low-level functionalities.

6. Stateless API for Layers, Models, Metrics, and Optimizers: Embracing the statelessness principle of JAX, Keras 3.0 allows components like layers, models, metrics, and optimizers to be designed in a stateless manner, enhancing compatibility in modern AI development.

All stateful objects in Keras (i.e. objects that own numerical variables that get updated during training or evaluation) now have a stateless API, making it possible to use them in JAX functions (which are required to be fully stateless):

  • All layers and models have a stateless_call() method which mirrors __call__().
  • All optimizers have a stateless_apply() method which mirrors apply().
  • All metrics have a stateless_update_state() method which mirrors update_state() and a stateless_result() method which mirrors result().

These methods have no side effects whatsoever: they take as input the current values of the state variables of the target object and return the updated values as part of their outputs, e.g.:

outputs, updated_non_trainable_variables = layer.stateless_call(
    trainable_variables,
    non_trainable_variables,
    inputs,
)
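The pattern is easy to see in a toy pure-Python running-mean "metric" (hypothetical, not Keras code): the update takes the current state as an argument and returns new state, with no mutation — exactly the shape JAX transformations require.

```python
# Stateless running mean: state is an explicit (total, count) tuple that is
# passed in and returned, never mutated in place.

def stateless_update_state(state, value):
    total, count = state
    return (total + value, count + 1)   # fresh state; the old tuple is untouched

def stateless_result(state):
    total, count = state
    return total / count

state = (0.0, 0)                        # initial metric state
for v in [2.0, 4.0, 6.0]:
    state = stateless_update_state(state, v)

print(stateless_result(state))  # 4.0
```

Because the functions are pure, they compose cleanly with function transformations such as jit or grad, which is why Keras 3.0 mirrors its stateful APIs with stateless counterparts.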

Conclusion

Overall, this article aimed to provide an in-depth understanding of Keras's internal structure and of how Keras 3.0 brings back multi-backend support. Keras 3.0 is a massive step toward unifying the scattered landscape of AI development and will definitely help all of us streamline our work.

Writing such articles is very time-consuming; show some love and respect by clapping and sharing the article. Happy learning ❤ I hope you had fun. Don't forget to check out other awesome AI articles on my profile.

And if you want to up your AI game, please check my new book on AI, which covers a lot of AI optimizations and hands-on code:
