Development of a fast ReLU activation function algorithm for deep learning problems


Properties of the Activation Function




  • Non-linearity: The main purpose of the activation function is to introduce non-linearity into the network, so that the network is capable of learning more complex patterns.

  • Continuously differentiable: The activation function should be continuously differentiable so that gradient-based optimization methods can compute gradients with respect to the weights.

  • Monotonic: A monotonic activation function helps the neural network converge more easily to a more accurate model.

What is ReLU?


The ReLU function is one of the most widely used hidden-layer activation functions. It is non-linear, monotonic, and continuous, and it is differentiable everywhere except at z = 0, where the left derivative is 0 and the right derivative is 1.
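As a concrete reference, here is a minimal NumPy sketch (not taken from the article) of the rectifier and its derivative, using the common convention that the derivative at z = 0 is taken to be 0:

```python
import numpy as np

def relu(z):
    # ReLU(z) = max(0, z), applied element-wise.
    return np.maximum(0.0, z)

def relu_derivative(z):
    # 0 for z < 0 and 1 for z > 0; at z = 0 the function is not
    # differentiable, and this sketch follows the common convention
    # of returning 0 there.
    return (z > 0).astype(z.dtype)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))             # [0.  0.  0.  0.5 2. ]
print(relu_derivative(z))  # [0. 0. 0. 1. 1.]
```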


  • ReLU is called a piecewise linear function, or hinge function, because it is linear (the identity) for the positive half of the input domain and constant (zero) for the negative half; taken together, the two pieces form a non-linear function.

  • A ReLU layer is applied element-wise and does not change the size of its input.

  • ReLU does not activate all neurons: any negative input is mapped to zero, which makes the network's activations sparse and its computation efficient (see the sketch after this list).

  • ReLU is not smooth (its derivative jumps at 0) and is typically used only in the hidden layers, not in the output layer.
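The sparsity property above can be seen with a small, hypothetical example (the layer size and random seed below are arbitrary choices, not from the article):

```python
import numpy as np

# Hypothetical pre-activations of one hidden layer for a single input
# (the layer size and random seed are arbitrary, for illustration only).
rng = np.random.default_rng(0)
pre_activations = rng.standard_normal(1000)

activations = np.maximum(0.0, pre_activations)  # ReLU

# Roughly half of the units output exactly zero, so the layer's output is
# sparse and the zeroed units add nothing to the downstream computation.
fraction_inactive = np.mean(activations == 0.0)
print(f"fraction of inactive neurons: {fraction_inactive:.2f}")  # ~0.5
```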

Why is ReLU popular?


Observe the figure below to see how the activation functions differ.


  • Other activation functions such as sigmoid and tanh suffer from the vanishing gradient problem: both ends of their curves are almost horizontal, so the gradients in those regions are very small or have effectively vanished. As a result, the network stops learning or learns drastically slowly (see the sketch after this list).

  • Rectifiers are faster simply because they involve simpler mathematical operations: they require no normalization or exponential computation (such as those in the sigmoid or tanh activation functions).

  • Training a neural network with ReLU can be up to 6 times faster than with other activation functions.
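To make the vanishing-gradient comparison concrete, the following sketch (illustrative only; the sample points are arbitrary) evaluates the derivatives of sigmoid, tanh, and ReLU at a few inputs:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, -5.0, 0.0, 5.0, 10.0])  # arbitrary sample points

# Sigmoid gradient sigma(z) * (1 - sigma(z)) is at most 0.25 and is
# nearly zero for large |z| -- the "almost horizontal" tails.
sigmoid_grad = sigmoid(z) * (1.0 - sigmoid(z))

# tanh gradient 1 - tanh(z)^2 also collapses toward zero in the tails.
tanh_grad = 1.0 - np.tanh(z) ** 2

# ReLU gradient is exactly 1 for every positive input, so it never
# shrinks as the pre-activation grows.
relu_grad = (z > 0).astype(float)

print(sigmoid_grad)  # approx [5e-05, 6.6e-03, 0.25, 6.6e-03, 5e-05]
print(tanh_grad)     # approx [8e-09, 1.8e-04, 1.0,  1.8e-04, 8e-09]
print(relu_grad)     # [0. 0. 0. 1. 1.]
```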

However, the rectifier has another problem: the dying ReLU problem. For arguments below 0 the gradient vanishes, so neurons that enter that state stop responding to changes in the input or the error (at a gradient of 0, nothing is updated).
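A minimal sketch of a "dead" unit, assuming a hypothetical hidden neuron whose bias has been pushed to a large negative value (the weights, bias, and mini-batch below are made up for illustration):

```python
import numpy as np

# One hypothetical hidden unit whose bias ended up far negative during training.
w = np.array([0.5, -0.3])
b = -100.0

X = np.random.default_rng(0).standard_normal((64, 2))  # a mini-batch of inputs
z = X @ w + b                       # pre-activations: all far below zero
a = np.maximum(0.0, z)              # ReLU output
local_grad = (z > 0).astype(float)  # dReLU/dz

# The unit outputs zero for every sample and its local gradient is zero
# everywhere, so backpropagation can no longer update w or b: it has "died".
print(a.max(), local_grad.max())  # 0.0 0.0
```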
