[Article] The Swish Activation Function

> The best discovered activation function, which we call Swish, is f(x) = x ⋅ sigmoid(βx), where β is a constant or trainable parameter.

As β increases, the sigmoid factor approaches a 0-1 step function, so Swish increasingly resembles ReLU.
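A minimal NumPy sketch of the formula may make this concrete (the function name `swish` and the `beta` default are illustrative choices, not from the paper):

```python
import numpy as np

def swish(x, beta=1.0):
    """Swish activation: f(x) = x * sigmoid(beta * x).

    Since sigmoid(z) = 1 / (1 + exp(-z)), this simplifies to
    x / (1 + exp(-beta * x)). With beta=1 this is the fixed
    variant also known as SiLU; beta can instead be learned.
    """
    return x / (1.0 + np.exp(-beta * x))

x = np.linspace(-5.0, 5.0, 11)
print(swish(x, beta=1.0))   # smooth, non-monotonic near zero
print(swish(x, beta=10.0))  # already close to max(0, x), i.e. ReLU
```

Comparing the two printouts shows the claim above: at large β the sigmoid term saturates to 0 for negative inputs and 1 for positive inputs, so the output approaches ReLU.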