# Computer Vision

• The structure of images is complex
  • invariances
    • scale
    • translation
    • cropping
    • dilation
    • homogeneity
• Perceptual sensitivity
  • color
  • edges
  • orientations
• Extracting semantics is challenging
  • occlusion
  • deformation
  • illumination
  • viewpoint
  • object pose

# Convolutional Network

• Translation invariance (strictly, convolution itself is translation-equivariant; pooling adds invariance)
• A convolutional kernel is a spatially localized receptive field whose weights are shared across spatial locations.

Padding: what to do at the image borders, either "valid" (no padding, so the output shrinks) or "same" (zero-pad so the output keeps the input's spatial size).

Stride: the number of pixels the kernel shifts between successive applications.
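A minimal single-channel sketch of these two knobs; NumPy, the function name, and the shapes here are illustrative assumptions, not from the slides:

```python
import numpy as np

def conv2d(image, kernel, stride=1, padding="valid"):
    """Sketch of a single-channel 2D convolution with padding and stride."""
    kh, kw = kernel.shape
    if padding == "same":
        # Zero-pad so that with stride 1 the output matches the input size.
        ph, pw = kh // 2, kw // 2
        image = np.pad(image, ((ph, ph), (pw, pw)))
    ih, iw = image.shape
    oh = (ih - kh) // stride + 1
    ow = (iw - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # The same kernel weights are applied at every spatial location
            # (weight sharing), which is what gives translation equivariance.
            patch = image[i * stride : i * stride + kh,
                          j * stride : j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out
```

With `padding="same"` and `stride=1` the output keeps the input's spatial size; `"valid"` shrinks each dimension by the kernel size minus one.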

The number of model parameters is independent of image size.

• How many parameters are in a single layer?
  • $(\text{filter width} \times \text{filter height}) \times (\text{input depth}) \times (\text{output depth})$
• How much computation does a single layer cost? (both counting rules are sketched in code below)
  • $(\text{filter width} \times \text{filter height}) \times (\text{input depth}) \times (\text{output depth}) \times (\text{input width} / \text{stride}) \times (\text{input height} / \text{stride})$
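A minimal sketch of the two counting rules above; the function and argument names (`cin` = input depth, `cout` = output depth) are assumptions for illustration:

```python
def conv_params(kw, kh, cin, cout):
    # (filter width x filter height) x input depth x output depth
    return kw * kh * cin * cout

def conv_cost(kw, kh, cin, cout, in_w, in_h, stride=1):
    # The parameters are reused at every output position; the output grid
    # is roughly (input width / stride) x (input height / stride).
    return conv_params(kw, kh, cin, cout) * (in_w // stride) * (in_h // stride)

# Example: a 3x3 conv, 64 -> 128 channels, on a 224x224 input at stride 1:
# 73,728 parameters, but ~3.7e9 multiply-adds.
```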
• Scaling to high-resolution images
  • Computational demand grows quadratically with image size
  • Spatial pooling builds invariance across spatial dimensions (a minimal sketch follows this list)
  • Regularization mitigates overfitting (e.g. weight decay, dropout)
  • Normalization empirically accelerates training and yields better models
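As referenced above, a minimal sketch of the pooling step (2×2 max pooling with stride 2; the single-channel NumPy input is an assumption):

```python
import numpy as np

def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling: halves each spatial dimension and
    adds invariance to small local translations."""
    h, w = x.shape
    # Crop to even dimensions, group pixels into 2x2 windows, take the max.
    return (x[: h // 2 * 2, : w // 2 * 2]
            .reshape(h // 2, 2, w // 2, 2)
            .max(axis=(1, 3)))
```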

Convolutional layers account for most of the computation, but fully connected layers hold most of the parameters.
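To make that concrete, a back-of-the-envelope comparison; the layer sizes are AlexNet-like assumptions, not figures from the slides:

```python
# A 3x3 conv, 256 -> 256 channels, on a 13x13 feature map, versus a fully
# connected layer from a flattened 6x6x256 feature map to 4096 units.
conv_params = 3 * 3 * 256 * 256      # ~0.6M parameters
conv_macs = conv_params * 13 * 13    # ~100M multiply-adds (reused weights)
fc_params = 6 * 6 * 256 * 4096       # ~37.7M parameters
fc_macs = fc_params                  # one multiply-add per weight
```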

• Trends in network architecture

• Normalization methods are an important ingredient for achieving state-of-the-art performance
• Many variations, none of which is strictly biological
• Almost all vision models employ some form of normalization throughout a network representation
• Deeper and larger networks lead to better predictive performance
• Multi-scale architectures provide strong predictive performance while minimizing computational demand.
• Normalization styles
  • Batch Norm (normalize each channel over the batch and spatial dimensions)
  • Layer Norm (normalize each sample over all of its channels and spatial dimensions)
  • Instance Norm (normalize each channel of each sample over the spatial dimensions)
  • Group Norm (normalize each sample over groups of channels and spatial dimensions)
    • Calculate the mean $\mu$ and variance $\sigma^2$ within each group $G$ of channels and normalize
      • $\mu = \frac{1}{|G|}\sum_{i\in G}x_i$
      • $\sigma^2 = \frac{1}{|G|}\sum_{i\in G}(x_i - \mu)^2$
      • $\bar{x}_c = \frac{x_c - \mu}{\sqrt{\sigma^2 + \epsilon}}$
    • Learn a per-channel scale and shift $(\gamma, \beta)$ as parameters
      • $y_c = \gamma \bar{x}_c + \beta$
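A minimal NumPy sketch of Group Norm for a single sample of shape (C, H, W); the parameter names (`num_groups`, `eps`) are illustrative assumptions:

```python
import numpy as np

def group_norm(x, num_groups, gamma, beta, eps=1e-5):
    """Group Norm for one sample x of shape (C, H, W); num_groups must divide C."""
    c, h, w = x.shape
    g = x.reshape(num_groups, c // num_groups, h, w)
    # Mean and variance over each group of channels and all spatial positions.
    mu = g.mean(axis=(1, 2, 3), keepdims=True)
    var = g.var(axis=(1, 2, 3), keepdims=True)
    x_hat = ((g - mu) / np.sqrt(var + eps)).reshape(c, h, w)
    # Learned per-channel scale and shift restore representational flexibility.
    return gamma.reshape(c, 1, 1) * x_hat + beta.reshape(c, 1, 1)
```

Setting `num_groups = 1` recovers Layer Norm over the sample, and `num_groups = C` recovers Instance Norm, which is why the same equations cover that whole family (Batch Norm differs because it also averages over the batch axis).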

Normalization stabilizes activations during training
