The picture depicts BatchNorm correctly.
In BatchNorm, we compute the mean and variance using the spatial feature maps of the same channel across the whole batch. The picture you attached may look confusing because its data is single-channel, which means each grid/matrix represents 1 data sample. Color images, by contrast, need 3 such grids/matrices to represent 1 data sample, since they have 3 channels (RGB) per sample. So in your picture, you could think of taking the same element (index) from each of the m grids/matrices and then calculating their mean and variance.
So your picture does show the computation of mean and variance for BatchNorm correctly. However, when you think of multi-channel data you might get confused, as the picture is only good for understanding single-channel data. To make the multi-channel case a bit clearer, think of a color image dataset. Every batch contains a number of images, and each image has 3 channels: RED, GREEN, and BLUE (to visualize, think of RED as a matrix, GREEN as a matrix, and BLUE as a matrix, so 3 matrices per image). In BatchNorm, assuming a batch size of 32, you would take all 32 matrices of the RED channel and calculate their mean and variance, then repeat the process for the GREEN and BLUE channels. That's how you'd do it for multi-channel data.
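If it helps, here is a minimal NumPy sketch of the multi-channel case described above. It is only an illustration under some assumptions of mine: the batch is laid out as (N, C, H, W), the function names `batchnorm_channel_stats` and `batchnorm_normalize` are made up for this answer, the 8x8 image size and `eps` value are arbitrary, and the learnable scale/shift and running statistics that real BatchNorm layers usually carry are left out.

```python
import numpy as np

def batchnorm_channel_stats(batch):
    """Per-channel mean and variance for a batch shaped (N, C, H, W)."""
    # Reduce over the batch axis (0) and both spatial axes (2, 3),
    # which leaves one mean and one variance per channel.
    mean = batch.mean(axis=(0, 2, 3))
    var = batch.var(axis=(0, 2, 3))  # biased (population) variance
    return mean, var

def batchnorm_normalize(batch, eps=1e-5):
    """Normalize each channel with its own batch statistics (no scale/shift)."""
    mean, var = batchnorm_channel_stats(batch)
    # Reshape stats to (1, C, 1, 1) so they broadcast over every image and pixel.
    mean = mean[None, :, None, None]
    var = var[None, :, None, None]
    return (batch - mean) / np.sqrt(var + eps)

# Assumed toy data: 32 RGB images of size 8x8.
rng = np.random.default_rng(0)
images = rng.normal(loc=5.0, scale=2.0, size=(32, 3, 8, 8))

mean, var = batchnorm_channel_stats(images)
normalized = batchnorm_normalize(images)
print(mean.shape, var.shape)  # (3,) (3,)
```

Here `mean[0]` is computed from all 32 RED matrices together, `mean[1]` from the GREEN ones, and `mean[2]` from the BLUE ones, so after normalization each channel has roughly zero mean and unit variance across the batch.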