GELU

Tags: Activation Function

The Gaussian Error Linear Unit (GELU) is an activation function that has gained wide adoption in deep learning, particularly in natural language processing (NLP). It was first introduced in the paper "Gaussian Error Linear Units (GELUs)" by Dan Hendrycks and Kevin Gimpel. Like other rectified units such as ReLU, Leaky ReLU, and ELU, GELU introduces non-linearity, but it does so differently: it can be read as the expected value of a stochastic regularizer that keeps or zeroes each input with a probability determined by the input itself, giving it a probabilistic interpretation that purely deterministic rectifiers lack.

Formula

The GELU activation function is defined as:

\text{GELU}(x) = x \, \Phi(x)

where Φ(x) is the cumulative distribution function (CDF) of the standard Gaussian distribution. In simpler terms, Φ(x) represents the probability that a random variable with a standard normal distribution takes on a value less than or equal to x.

An approximate formulation of the GELU function, which is computationally more efficient, is given by:

\text{GELU}(x) \approx 0.5\, x \left(1 + \tanh\!\left[\sqrt{\frac{2}{\pi}} \left(x + 0.044715\, x^3\right)\right]\right)

This approximation makes it easier to implement and compute in practice while maintaining similar characteristics to the exact formulation.
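To make the two forms concrete, here is a minimal, framework-free Python sketch (the function names are illustrative, not from any library). It evaluates the exact definition via the error function, using Φ(x) = 0.5·(1 + erf(x/√2)), alongside the tanh approximation:

from math import erf, sqrt, tanh, pi

def gelu_exact(x):
    # Exact definition: GELU(x) = x * Phi(x), with Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return x * 0.5 * (1.0 + erf(x / sqrt(2.0)))

def gelu_tanh(x):
    # Tanh-based approximation from the formula above
    return 0.5 * x * (1.0 + tanh(sqrt(2.0 / pi) * (x + 0.044715 * x ** 3)))

for x in (-1.0, 0.0, 1.0):
    print(f"x={x:+.1f}  exact={gelu_exact(x):+.6f}  approx={gelu_tanh(x):+.6f}")

For these inputs the two forms agree to within about 10⁻³, which is why the approximation is widely treated as a drop-in replacement.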

Characteristics and Advantages

Unlike ReLU, which gates inputs by their sign, GELU weights each input by its value: x is scaled by Φ(x), the probability mass of a standard normal below x. The resulting curve is smooth and differentiable everywhere, is non-monotonic for small negative inputs, and lets small negative values pass through rather than zeroing them out, which tends to help gradient flow.
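As a quick illustration (assuming PyTorch is installed; any framework's GELU behaves the same), the following snippet compares GELU and ReLU on a few negative inputs:

import torch
import torch.nn.functional as F

# GELU is smooth and passes small negative values through; ReLU clamps them to zero
x = torch.linspace(-3.0, 0.0, steps=7)
print("x:   ", x)
print("relu:", F.relu(x))
print("gelu:", F.gelu(x))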
Applications

The GELU activation function has seen significant adoption in state-of-the-art models, especially in NLP. For instance, it is the activation used in the feed-forward layers of Transformer-based models such as GPT (Generative Pretrained Transformer) and BERT (Bidirectional Encoder Representations from Transformers). Its success in these models highlights its effectiveness in handling complex patterns and sequences in data.
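As a sketch of how this typically looks in such models (the class name FeedForward is ours, and the dimensions mirror BERT-base but are purely illustrative), a Transformer-style position-wise feed-forward block places GELU between two linear projections:

import torch
import torch.nn as nn

class FeedForward(nn.Module):
    # Position-wise feed-forward block with GELU, in the style of BERT/GPT
    def __init__(self, d_model=768, d_hidden=3072):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.GELU(),                     # GELU between the two projections
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, x):
        return self.net(x)

block = FeedForward()
tokens = torch.randn(2, 16, 768)           # (batch, sequence length, d_model)
print(block(tokens).shape)                 # torch.Size([2, 16, 768])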

Implementing GELU in Python

Most deep learning frameworks, such as TensorFlow and PyTorch, include built-in support for the GELU activation function. Here's how you can use it in PyTorch:

import torch
import torch.nn.functional as F

# Apply GELU element-wise to a small example tensor
x = torch.tensor([-1.0, 0.0, 1.0])
y = F.gelu(x)  # the module form torch.nn.GELU() is equivalent

print(y)  # tensor([-0.1587, 0.0000, 0.8413])

And in TensorFlow:

import tensorflow as tf

# Apply GELU element-wise to a small example tensor
x = tf.constant([-1.0, 0.0, 1.0])
y = tf.nn.gelu(x)  # pass approximate=True to use the tanh approximation instead

print(y)  # values ≈ [-0.1587, 0.0, 0.8413]

These examples show how to apply GELU to a tensor in PyTorch and TensorFlow; in both frameworks it is a single built-in call. The adoption of GELU in prominent models underscores its effectiveness as an activation function in modern neural network architectures.