[Article] The Swish Activation Function
By - hellopaperspace
Not trying to discount the research here, but isn't it logical that any 'complex' non-linear function would in general make a great activation function, and that the only reason we use ReLU-like functions in place of a very complex activation function is computational speed and efficient resource utilization?
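For reference, Swish as defined in the paper is f(x) = x · sigmoid(βx), which is only marginally more expensive than ReLU's simple max. A minimal sketch in plain Python (the function names and the β default are illustrative, not from the original code release):

```python
import math

def sigmoid(x):
    # Standard logistic function: 1 / (1 + e^(-x))
    return 1.0 / (1.0 + math.exp(-x))

def swish(x, beta=1.0):
    # Swish from the paper: f(x) = x * sigmoid(beta * x).
    # With beta = 1 this is also known as SiLU.
    return x * sigmoid(beta * x)

def relu(x):
    # ReLU for comparison: cheap, piecewise-linear
    return max(0.0, x)
```

The cost comparison in the comment is visible here: ReLU is a single comparison, while Swish needs an exponential, a division, and a multiply, yet both are trivial next to the matrix multiplies that dominate a forward pass.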
Code for https://arxiv.org/abs/1710.05941 found at: https://github.com/swordgeek/SR
[Paper link](https://arxiv.org/abs/1710.05941) | [List of all code implementations](https://www.catalyzex.com/paper/arxiv:1710.05941/code)