Lec_6_Training Neural Networks, Part II
Happy Moment: kung fu mouse: "watch my chokehold move!"
Parameter Updates
The common method: vanilla (stochastic) gradient descent, i.e. step along the negative gradient
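A minimal numpy sketch of a single vanilla update step; the names `x`, `dx`, and `learning_rate` are illustrative stand-ins for the parameters, their gradient from backprop, and the step-size hyperparameter:

```python
import numpy as np

x = np.random.randn(10)    # parameters (stand-in for real network weights)
dx = np.random.randn(10)   # gradient of the loss w.r.t. x (from backprop in practice)
learning_rate = 1e-2       # step-size hyperparameter

# vanilla update: step along the negative gradient
x += -learning_rate * dx
```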
Update problem 1: TOO SLOW (vanilla SGD makes very slow progress along shallow directions of the loss and jitters across steep ones)
Momentum update
- Physical interpretation: a ball rolling down the loss surface, with friction (coefficient mu).
- mu is usually ~0.5, 0.9, or 0.99 (sometimes annealed over time, e.g. from 0.5 to 0.99); a minimal sketch follows after this list.
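A minimal sketch of one momentum update step under the same toy setup (variable names are illustrative; `v` is the velocity):

```python
import numpy as np

x = np.random.randn(10)    # parameters
dx = np.random.randn(10)   # gradient (from backprop in practice)
learning_rate = 1e-2
mu = 0.9                   # "friction" coefficient
v = np.zeros_like(x)       # velocity, initialized at zero

# momentum update: build up velocity along persistent gradient directions, then step
v = mu * v - learning_rate * dx
x += v
```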
Nesterov Momentum update
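Nesterov momentum evaluates the gradient at the "looked-ahead" position x + mu*v instead of at x. A minimal sketch of the common rewritten form that only needs the gradient at the current parameters (same illustrative variables as above):

```python
import numpy as np

x = np.random.randn(10)    # parameters
dx = np.random.randn(10)   # gradient at the current x (from backprop in practice)
learning_rate, mu = 1e-2, 0.9
v = np.zeros_like(x)       # velocity

# Nesterov momentum, rewritten to use the gradient at x
v_prev = v.copy()
v = mu * v - learning_rate * dx
x += -mu * v_prev + (1 + mu) * v
```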
AdaGrad update
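AdaGrad keeps a per-parameter running sum of squared gradients and divides the step by its square root, so frequently updated parameters take smaller steps. A minimal sketch (illustrative variables):

```python
import numpy as np

x = np.random.randn(10)    # parameters
dx = np.random.randn(10)   # gradient
learning_rate, eps = 1e-2, 1e-8
cache = np.zeros_like(x)   # per-parameter sum of squared gradients

# AdaGrad update: adaptive per-parameter step sizes
cache += dx ** 2
x += -learning_rate * dx / (np.sqrt(cache) + eps)
```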
RMSProp update
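RMSProp replaces AdaGrad's ever-growing sum with a leaky (exponentially decaying) average, so the effective step size does not shrink to zero. A minimal sketch:

```python
import numpy as np

x = np.random.randn(10)
dx = np.random.randn(10)
learning_rate, eps = 1e-2, 1e-8
decay_rate = 0.99          # typical values: 0.9, 0.99, 0.999
cache = np.zeros_like(x)   # leaky running average of squared gradients

# RMSProp update: like AdaGrad, but the cache "leaks"
cache = decay_rate * cache + (1 - decay_rate) * dx ** 2
x += -learning_rate * dx / (np.sqrt(cache) + eps)
```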
Adam update
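Adam combines momentum (a smoothed gradient, the first moment) with RMSProp-style scaling (the second moment), plus a bias correction for the early steps. A minimal sketch with typical default hyperparameters:

```python
import numpy as np

x = np.random.randn(10)
dx = np.random.randn(10)
learning_rate, eps = 1e-3, 1e-8
beta1, beta2 = 0.9, 0.999        # typical defaults
m = np.zeros_like(x)             # first moment (momentum-like)
v = np.zeros_like(x)             # second moment (RMSProp-like)
t = 1                            # timestep, incremented on every update

# Adam update with bias correction
m = beta1 * m + (1 - beta1) * dx
v = beta2 * v + (1 - beta2) * (dx ** 2)
m_hat = m / (1 - beta1 ** t)     # corrects the zero-initialization bias
v_hat = v / (1 - beta2 ** t)
x += -learning_rate * m_hat / (np.sqrt(v_hat) + eps)
```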
Update problem 2: A HYPERPARAMETER IS STILL NEEDED (all of the methods above require choosing a learning rate, usually decayed over time)
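The usual answer is to decay the learning rate over time (step decay, exponential decay, 1/t decay). A minimal sketch of step decay; the numbers are illustrative, not recommendations:

```python
base_lr = 1e-2          # initial learning rate (illustrative)
decay_factor = 0.5      # multiply the rate by this ...
decay_every = 10        # ... every this many epochs

for epoch in range(30):
    # step decay; alternatives: exponential decay lr0*exp(-k*t), or 1/t decay lr0/(1+k*t)
    learning_rate = base_lr * decay_factor ** (epoch // decay_every)
```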
Second order optimization methods
L-BFGS
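For reference, the second-order (Newton) update scales the gradient by the inverse Hessian and needs no learning-rate hyperparameter, but the Hessian is far too large to form or invert for deep networks; quasi-Newton methods such as L-BFGS work with an approximation instead:

```latex
% Newton parameter update: no learning-rate hyperparameter
\theta \leftarrow \theta - \left[\nabla^{2} f(\theta)\right]^{-1} \nabla f(\theta)
```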
Summary for update problems 1+2
IN PRACTICE
- Adam is a good default choice in most cases
- If you can afford to do full batch updates then try out L-BFGS (and don’t forget to disable all sources of noise)
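As an illustration only (the lecture does not use scipy), a toy full-batch, noise-free problem optimized with scipy's L-BFGS implementation; the quadratic `loss`/`grad` functions are made up for the example:

```python
import numpy as np
from scipy.optimize import minimize

A = np.diag([1.0, 10.0])          # toy full-batch objective: 0.5 * x^T A x
def loss(x):
    return 0.5 * x @ A @ x
def grad(x):
    return A @ x

x0 = np.array([3.0, -2.0])
res = minimize(loss, x0, jac=grad, method='L-BFGS-B')
print(res.x)                      # should be close to the minimum at [0, 0]
```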
Evaluation: Model Ensembles
- Train multiple independent models
- At test time average their results
- Enjoy 2% extra performance
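A minimal sketch of test-time averaging; the random arrays are just stand-ins for the class probabilities produced by several independently trained models:

```python
import numpy as np

# shape: (num_models, num_test_examples, num_classes)
model_probs = np.random.rand(5, 100, 10)
model_probs /= model_probs.sum(axis=2, keepdims=True)   # make each row a distribution

# ensemble: average the predicted probabilities across models, then take the argmax
avg_probs = model_probs.mean(axis=0)
pred = avg_probs.argmax(axis=1)
```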
Regularization (Dropout)
Regularization: Dropout “randomly set some neurons to zero in the forward pass”
The purpose of dropout is to prevent overfitting (a minimal forward-pass sketch follows below).
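A minimal sketch of the forward pass with "inverted" dropout, where the rescaling by the keep probability `p` happens at train time so test time needs no change (function and variable names are illustrative):

```python
import numpy as np

def dropout_forward(x, p=0.5, train=True):
    """Inverted dropout: drop units with probability 1-p and rescale the survivors."""
    if train:
        mask = (np.random.rand(*x.shape) < p) / p   # zero out units, scale the rest by 1/p
        return x * mask
    return x                                        # test time: use all units unchanged

h = np.maximum(0, np.random.randn(4, 20))           # toy hidden-layer activations (ReLU)
h_train = dropout_forward(h, p=0.5, train=True)
h_test = dropout_forward(h, train=False)
```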
Gradient Checking
- see notes
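A minimal sketch of a numerical gradient check: compare a centered finite-difference gradient against the analytic one using relative error (the toy function and names are illustrative):

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Centered finite differences: (f(x+h) - f(x-h)) / (2h), one dimension at a time."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        old = x.flat[i]
        x.flat[i] = old + h
        fp = f(x)
        x.flat[i] = old - h
        fm = f(x)
        x.flat[i] = old                      # restore the original value
        grad.flat[i] = (fp - fm) / (2 * h)
    return grad

f = lambda x: np.sum(x ** 2)                 # toy loss with known analytic gradient 2x
x = np.random.randn(5)
num = numerical_gradient(f, x)
ana = 2 * x
rel_error = np.abs(num - ana) / np.maximum(1e-8, np.abs(num) + np.abs(ana))
print(rel_error.max())                       # should be tiny (e.g. < 1e-7)
```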
Convolutional Neural Networks
A bit of history:
- 1959 Hubel & Wiesel: experiments on the cat visual cortex (https://youtu.be/8VdFf3egwfg?t=1m10s)
- 1980 Fukushima: Neocognitron
- 1998 LeCun: LeNet, gradient-based learning applied to document recognition
- 2012 Krizhevsky et al.: AlexNet wins the ImageNet challenge
- Fast-forward to today: ConvNets are everywhere
Tomorrow is the weekend. Have a good rest!