Second order optimization methods and Neural Networks
Published:
Optimization of deep neural networks is usually done with first-order, gradient-based methods such as mini-batch gradient descent and its extensions like Momentum, RMSprop, and Adam. Second-order optimization methods such as Newton's method, BFGS, etc., are widely used in other areas of statistics and machine learning. Why are these methods not popular in deep learning?
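To make the distinction concrete, here is a minimal NumPy sketch (the toy quadratic objective and variable names are my own illustration, not from any particular library) contrasting one gradient descent step with one Newton step. The Newton update rescales the gradient by the inverse Hessian, so on a quadratic it jumps straight to the minimizer, while gradient descent only moves a small step along the negative gradient.

```python
import numpy as np

# Toy quadratic objective f(x) = 0.5 * x^T A x - b^T x (illustrative only).
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])   # symmetric positive definite matrix, i.e. the Hessian
b = np.array([1.0, -2.0])

def grad(x):
    return A @ x - b          # gradient of the quadratic

def hess(x):
    return A                  # constant Hessian for a quadratic

x_gd = np.zeros(2)
x_newton = np.zeros(2)
lr = 0.1                      # learning rate for the first-order step

# First-order update: small step along the negative gradient.
x_gd = x_gd - lr * grad(x_gd)

# Second-order (Newton) update: solve H d = g and step by -d.
x_newton = x_newton - np.linalg.solve(hess(x_newton), grad(x_newton))

print("after one GD step:    ", x_gd)
print("after one Newton step:", x_newton)  # equals A^{-1} b, the exact minimizer
```

Note that the Newton step requires forming and solving against the Hessian, which for a model with n parameters is an n-by-n matrix; the sketch works only because n = 2 here.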