- Non-linearity: The main purpose of an activation function is to introduce non-linearity into the network so that it can learn more complex patterns.
- Continuously differentiable: The activation function should be continuously differentiable with respect to the weights so that gradient-based optimization methods can be used.
- Monotonic: A monotonic activation function helps the neural network converge more easily to a more precise model.
What is ReLU?
The ReLU function is one of the most widely used hidden-layer activation functions; it is non-linear, monotonic, and continuously differentiable almost everywhere (the exception is z = 0, where the left derivative is 0 and the right derivative is 1).
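For reference, here is a minimal NumPy sketch of ReLU and its derivative (the names relu and relu_grad are illustrative, not from any particular library):

```python
import numpy as np

def relu(z):
    # ReLU(z) = max(0, z), applied element-wise
    return np.maximum(0.0, z)

def relu_grad(z):
    # Derivative is 0 for z < 0 and 1 for z > 0.
    # At z = 0 the function is not differentiable; a common convention
    # (used here) is to take the gradient as 0 at that point.
    return (z > 0).astype(float)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(z))  # [0. 0. 0. 1. 1.]
```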
- ReLU is called a piecewise linear function, or hinge function, because it is linear (the identity) for half of its input domain and zero for the other half; the kink between the two pieces is what makes it non-linear overall.
- A ReLU layer does not change the size of its input.
- ReLU does not activate all neurons: negative inputs are mapped to zero, which makes the network sparse, efficient, and cheap to compute (the sketch after this list illustrates this).
- ReLU is not smooth (its derivative is discontinuous at 0), so it is typically used only in hidden layers.
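A small sketch, using made-up layer sizes and random pre-activations, illustrates the sparsity and size-preservation points above:

```python
import numpy as np

rng = np.random.default_rng(42)
pre_activations = rng.normal(size=(32, 256))    # hypothetical batch of layer outputs
activations = np.maximum(0.0, pre_activations)  # ReLU, element-wise

print(activations.shape)                            # (32, 256): same size as the input
print(round(float((activations == 0).mean()), 2))   # ~0.5: about half the units are zeroed out
```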
Why is ReLU popular?
The figure below compares these activation functions.
- Other activation functions such as sigmoid and tanh suffer from the vanishing gradient problem: both ends of their curves are almost horizontal, so the gradients in those regions are very small or have effectively vanished. As a result, the network stops learning or learns drastically slowly (see the gradient sketch after this list).
- Rectifiers are faster simply because they involve simpler mathematical operations; they require no normalization or exponential computation (such as those in the sigmoid or tanh activation functions).
- Training a neural network with ReLU can be up to six times faster than with other activation functions.
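The vanishing-gradient point can be made concrete with a short sketch that evaluates the gradients of sigmoid, tanh, and ReLU at increasingly large inputs (the helper names are illustrative):

```python
import numpy as np

def sigmoid_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)            # sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))

def tanh_grad(z):
    return 1.0 - np.tanh(z) ** 2    # tanh'(z) = 1 - tanh(z)^2

def relu_grad(z):
    return (z > 0).astype(float)    # 1 for z > 0, 0 otherwise

z = np.array([0.0, 2.0, 5.0, 10.0])
print(sigmoid_grad(z))  # ~[0.25, 0.105, 0.0066, 0.000045] -> shrinks toward 0
print(tanh_grad(z))     # ~[1.0, 0.071, 0.00018, 8.2e-09]  -> shrinks toward 0
print(relu_grad(z))     # [0. 1. 1. 1.]                    -> stays 1 for z > 0
```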
However, the rectifier has its own problem, the dying ReLU problem: for arguments below 0 the gradient is zero, so neurons that fall into this state stop responding to changes in the input or the error (with a gradient of 0, nothing changes).
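A minimal sketch of a "dead" unit, assuming a single hypothetical ReLU neuron a = relu(w*x + b) whose bias has drifted far negative:

```python
import numpy as np

def relu_grad(z):
    return (z > 0).astype(float)

rng = np.random.default_rng(0)
x = rng.normal(size=1000)       # hypothetical inputs
w, b = 0.5, -10.0               # large negative bias: pre-activation is always < 0

z = w * x + b
print(int((z > 0).sum()))       # 0   -> the neuron never fires
print(relu_grad(z).mean())      # 0.0 -> zero gradient for w and b, so the neuron never recovers
```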