What is an activation function and why use them?

An activation function decides whether a neuron should be activated by transforming the weighted sum of its inputs plus a bias term. The purpose of the activation function is to introduce non-linearity into the output of a neuron.

Explanation: A neural network is made up of neurons, each defined by its weights, a bias, and an activation function. During training, the weights and biases are updated based on the error at the output; this process is known as back-propagation. Differentiable activation functions make back-propagation possible, since their gradients are propagated along with the error to update the weights and biases.
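
As a concrete illustration (not from the original post, and with made-up input, weight, and bias values), here is a minimal sketch of a single neuron in TensorFlow: a weighted sum of the inputs plus a bias, passed through ReLU.

```python
import tensorflow as tf

# Made-up values for a single neuron with three inputs.
x = tf.constant([1.0, -2.0, 0.5])   # inputs
w = tf.constant([0.4, 0.3, -0.6])   # weights
b = tf.constant(0.1)                # bias

z = tf.reduce_sum(w * x) + b        # weighted sum plus bias: z = -0.4
a = tf.nn.relu(z)                   # activation decides the output: ReLU(-0.4) = 0.0

print(z.numpy(), a.numpy())
```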

Why do we need a Non-linear activation function?

A neural network without an activation function is essentially just a linear regression model: no matter how many layers are stacked, a composition of linear transformations is still linear. The activation function applies a non-linear transformation to the input, making the network capable of learning and performing more complex tasks.
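
A quick sketch of this point (the layer sizes below are arbitrary and only for illustration): two Dense layers with no activation compose into a single linear map, while adding a non-linear activation between them breaks that collapse.

```python
import tensorflow as tf

# Without activations, stacked Dense layers are still one linear transformation:
# y = W2 @ (W1 @ x + b1) + b2 = (W2 @ W1) @ x + (W2 @ b1 + b2)
linear_stack = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation=None, input_shape=(4,)),
    tf.keras.layers.Dense(1, activation=None),
])

# With a non-linear activation, the network can represent non-linear functions.
nonlinear_stack = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
```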

Here's a list of activation functions available in TensorFlow v2.11, each with a brief explanation; a short usage sketch follows the list:

  1. ReLU (Rectified Linear Unit): A simple activation function that sets all negative values to zero and leaves all positive values as they are. This function is commonly used in deep neural networks as it can help prevent the vanishing gradient problem.
  2. Leaky ReLU: A variation of ReLU that allows a slight gradient when the input is negative. This helps to avoid "dead" neurons that have a zero output and never activate again.
  3. ELU (Exponential Linear Unit): Another variation of ReLU that uses the exponential function for negative inputs, resulting in a smoother transition from negative to positive values.
  4. SELU (Scaled Exponential Linear Unit): A self-normalizing version of the ELU activation function that helps to prevent exploding or vanishing gradients in deep neural networks.
  5. Softmax: Used to normalize a set of values into a probability distribution. Softmax is commonly used in the output layer of a neural network for multi-class classification problems.
  6. Sigmoid: Another activation function used in the output layer for binary classification problems. Sigmoid maps any input value to a value between 0 and 1.
  7. Tanh (Hyperbolic Tangent): Similar to sigmoid, but maps input values to a range between -1 and 1. Because its output is zero-centered, tanh is often preferred over sigmoid in hidden layers.
  8. Swish: A relatively new activation function defined as x multiplied by sigmoid(x). It behaves like ReLU for large positive inputs but is smooth and non-monotonic near zero.
  9. GELU (Gaussian Error Linear Unit): Weights its input by the Gaussian cumulative distribution function, producing a smooth curve similar in shape to Swish. GELU is widely used in Transformer models and is known to improve the performance of neural networks on various tasks.
  10. CReLU (Concatenated Rectified Linear Unit): Concatenates ReLU(x) and ReLU(-x) for every input feature map along the channel dimension, doubling the number of output feature maps. This activation function is commonly used in computer vision tasks.
  11. Softplus: A smooth approximation of ReLU, defined as log(1 + exp(x)); unlike ReLU, it has a non-zero gradient everywhere.
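
Here is a small usage sketch (the model structure and layer sizes are illustrative, not prescriptive): most of the functions above can be passed to a Keras layer by name, or called directly via tf.keras.activations and tf.nn.

```python
import tensorflow as tf

# By-name usage inside Keras layers (illustrative sizes).
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="elu"),
    tf.keras.layers.Dense(64, activation="gelu"),
    tf.keras.layers.Dense(10, activation="softmax"),  # multi-class output layer
])

# Direct functional usage on a tensor.
x = tf.linspace(-3.0, 3.0, 7)
print(tf.keras.activations.swish(x))        # Swish: x * sigmoid(x)
print(tf.keras.activations.softplus(x))     # Softplus: log(1 + exp(x))
print(tf.nn.leaky_relu(x, alpha=0.2))       # Leaky ReLU with a small negative slope
print(tf.nn.crelu(tf.reshape(x, (1, 7))))   # CReLU doubles the channel dimension
```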

These are some of the most commonly used activation functions in TensorFlow v2.11. If you find anything wrong with this post please comment and let me know.

Thank you for reading this post.

Have a great day!