public interface BatchNorm
Batch Normalization [1] determines the mean and standard deviation (stdev) of each input element individually using the training data. It then applies a transform (subtract the mean, divide by the stdev) to each individual element so that it has zero mean and a standard deviation of 1 across the training set. This alleviates many problems with choosing appropriate initial parameters for inputs across all layers.
During training, batch norm computes a mean and variance for each input element. The mean and stdev are computed by finding the mean and stdev of a mini-batch and then applying a decaying average. For evaluation, the previously computed mean and stdev are fixed and applied to each input element in an element-wise fashion; see below. The final mean and stdev can be computed from the decaying mean/stdev or from the true mean/stdev across the entire dataset, depending on the implementation.
It can optionally also learn two parameters, gamma and beta, which can be used to learn to undo batch normalization if helpful. The complete transformation is shown below:
output[i] = ((x[i]-mean[i])/sqrt(variance[i]+EPS))*gamma[i] + beta[i]
where 'i' is an element in the tensor. EPS is a small number used to prevent divide-by-zero errors and is a tuning hyperparameter. By default, EPS is 1e-9 for double and 1e-5 for float.
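The element-wise transformation above can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the class `BatchNormForwardSketch` and its `normalize` method are hypothetical names, not part of the `BatchNorm` interface, and flat `double[]` arrays stand in for whatever tensor type an implementation actually uses. Passing `null` for gamma or beta is an assumed convention here for the case where those parameters are not learned.

```java
// Hypothetical helper for illustration; not part of the BatchNorm interface.
class BatchNormForwardSketch {
    /**
     * Applies output[i] = ((x[i]-mean[i])/sqrt(variance[i]+EPS))*gamma[i] + beta[i].
     * gamma and beta may be null (assumed convention), in which case they
     * default to 1 and 0 and only the normalization is applied.
     */
    static double[] normalize(double[] x, double[] mean, double[] variance,
                              double[] gamma, double[] beta, double EPS) {
        double[] output = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            // EPS keeps the denominator away from zero when the variance is tiny
            double normalized = (x[i] - mean[i]) / Math.sqrt(variance[i] + EPS);
            double g = (gamma == null) ? 1.0 : gamma[i];
            double b = (beta == null) ? 0.0 : beta[i];
            output[i] = normalized * g + b;
        }
        return output;
    }
}
```

For example, an input of 3.0 with mean 1.0 and variance 4.0 normalizes to approximately 1.0 when EPS is very small, and gamma = 2.0 with beta = 0.5 would then map it to approximately 2.5.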
Training Update:
mean[i+1] = learn_rate*mean + (1.0-learn_rate)*mean[i]
stdev[i+1] = learn_rate*stdev + (1.0-learn_rate)*stdev[i]
TODO change to variance?
where (mean,stdev) with no index refers to the statistics from the current mini-batch being trained on. Here the index 'i' counts update steps rather than tensor elements. learn_rate determines how quickly the mean and stdev are adjusted and can have a value from 0 to 1. Higher values give faster but less stable learning, e.g. 0 = no learning and 1 = old results discarded.
Notes:
SpatialBatchNorm
[1] Sergey Ioffe, Christian Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift" 11 Feb 2015, http://arxiv.org/abs/1502.03167
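The decaying-average rule from the Training Update section can be sketched as a single function. The class `BatchNormRunningStatsSketch` and its `update` method are hypothetical names chosen for this illustration, and the range check on `learnRate` is an assumption based on the stated 0 to 1 range rather than documented behavior. The same function would be applied to the mean and to the stdev of every element.

```java
// Hypothetical helper for illustration; not part of the BatchNorm interface.
class BatchNormRunningStatsSketch {
    /**
     * Blends the statistic from the current mini-batch into the running value:
     * next = learnRate*batchValue + (1.0-learnRate)*previous
     */
    static double update(double previous, double batchValue, double learnRate) {
        // Assumed guard: the documentation only states that the value lies in [0, 1]
        if (learnRate < 0.0 || learnRate > 1.0)
            throw new IllegalArgumentException("learn_rate must be between 0 and 1");
        return learnRate * batchValue + (1.0 - learnRate) * previous;
    }
}
```

With a running mean of 2.0 and a mini-batch mean of 4.0, a learn_rate of 0 leaves the running mean at 2.0, a learn_rate of 1 replaces it with 4.0, and a learn_rate of 0.25 moves it to 2.5.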
| Modifier and Type | Method and Description |
|---|---|
| double | getEPS() |
| boolean | hasGammaBeta() - If it returns true then it expects a second set of parameters that defines gamma and beta. |
| void | setEPS(double EPS) - Used to specify the EPS value. |
boolean hasGammaBeta()
double getEPS()
void setEPS(double EPS)
EPS - Value of EPS
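A minimal sketch of how the three methods fit together is shown below. To keep the example self-contained it declares a local stand-in interface, `BatchNormLike`, that mirrors only the three documented signatures; the real `BatchNorm` interface may declare or inherit more. The class `SimpleBatchNormConfig` is likewise hypothetical, and it starts from the documented default EPS for double.

```java
// Local stand-in mirroring the three documented methods (hypothetical name);
// the real BatchNorm interface may declare or inherit additional methods.
interface BatchNormLike {
    boolean hasGammaBeta();
    double getEPS();
    void setEPS(double EPS);
}

// Hypothetical holder for the configuration exposed by the interface.
class SimpleBatchNormConfig implements BatchNormLike {
    private final boolean gammaBeta;
    private double EPS = 1e-9; // documented default for double

    SimpleBatchNormConfig(boolean gammaBeta) {
        this.gammaBeta = gammaBeta;
    }

    // true means a second set of parameters defining gamma and beta is expected
    @Override public boolean hasGammaBeta() { return gammaBeta; }

    @Override public double getEPS() { return EPS; }

    @Override public void setEPS(double EPS) { this.EPS = EPS; }
}
```

A caller might construct `new SimpleBatchNormConfig(true)`, call `setEPS(1e-5)` to match the float default, and check `hasGammaBeta()` before supplying the gamma and beta parameters.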