Super-convergence in deep learning is a term coined by researcher Leslie N. Smith to describe a phenomenon where deep neural networks can be trained an order of magnitude faster than with traditional techniques. The technique has led to some phenomenal results in the DAWNBench project, producing some of the cheapest and fastest training entries in the competition.
Choosing a good learning rate is the most important hyper-parameter decision when training a deep neural network (assuming a gradient-based optimization algorithm is used). A learning rate that's too small leads to extremely long training times, whereas a learning rate that's too large can overshoot the optimum and cause the loss to diverge.
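To make this concrete, here's a minimal sketch (plain Python, using the hypothetical toy function f(x) = x² rather than a real network) of the gradient descent update x ← x − lr·f′(x), showing how the step size determines whether optimization crawls, converges, or diverges:

```python
# Toy illustration: minimize f(x) = x^2 with plain gradient descent.
# The gradient is f'(x) = 2x and the optimum is at x = 0.
def gradient_descent(lr, steps=20, x0=5.0):
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x  # update: x <- x - lr * f'(x)
    return x

for lr in (0.01, 0.1, 1.1):
    print(f"lr={lr:<5} final x = {gradient_descent(lr):.4f}")

# lr=0.01 -> x is still far from 0 (too small: painfully slow progress)
# lr=0.1  -> x is close to 0 (a reasonable step size)
# lr=1.1  -> x blows up (too large: each update overshoots and diverges)
```

Each update multiplies x by (1 − 2·lr), so any lr above 1.0 makes the iterates grow instead of shrink; the same overshoot-and-diverge behavior occurs, in a messier form, in the loss landscape of a real network.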