# Regularisation

Applying regularisation to model parameters is straightforward. We just need to
apply an appropriate regulariser, such as `norm`, to each model parameter and
add the result to the overall loss.

For example, say we have a simple regression.

```julia
using Flux
using Flux: crossentropy

m = Dense(10, 5)                             # a single dense layer
loss(x, y) = crossentropy(softmax(m(x)), y)  # plain, unregularised loss
```
We can regularise this by taking the L2 norm of the parameters, `m.W` and `m.b`.

```julia
using LinearAlgebra

penalty() = norm(m.W) + norm(m.b)                        # L2 penalty on both parameters
loss(x, y) = crossentropy(softmax(m(x)), y) + penalty()  # add it to the data loss
```
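The regulariser itself is interchangeable. Here is a minimal sketch of an
entrywise L1 penalty instead, which encourages sparse weights; the
`penalty_l1` and `loss_l1` names are illustrative, not part of Flux:

```julia
# `sum(abs, xs)` computes the entrywise L1 norm of an array.
penalty_l1() = sum(abs, m.W) + sum(abs, m.b)
loss_l1(x, y) = crossentropy(softmax(m(x)), y) + penalty_l1()
```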
|
|
|
|
|
|
|
|
|
|
When working with layers, Flux provides the `params` function to grab all
|
2018-08-29 22:34:41 +00:00
|
|
|
|
parameters at once. We can easily penalise everything with `sum(norm, params)`.
|
2018-02-09 19:00:26 +00:00
|
|
|
|
|
|
|
|
|
```julia
|
|
|
|
|
julia> params(m)
|
|
|
|
|
2-element Array{Any,1}:
|
|
|
|
|
param([0.355408 0.533092; … 0.430459 0.171498])
|
|
|
|
|
param([0.0, 0.0, 0.0, 0.0, 0.0])
|
|
|
|
|
|
2018-08-29 22:34:41 +00:00
|
|
|
|
julia> sum(norm, params(m))
|
2018-02-09 19:00:26 +00:00
|
|
|
|
26.01749952921026 (tracked)
|
|
|
|
|
```
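In practice the penalty is usually scaled by a small coefficient so that it
does not swamp the data term. A minimal sketch; the value of `λ` here is an
assumption, to be tuned for the problem at hand:

```julia
λ = 0.01  # hypothetical regularisation strength
loss(x, y) = crossentropy(softmax(m(x)), y) + λ * sum(norm, params(m))
```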
Here's a larger example with a multi-layer perceptron.

```julia
m = Chain(
  Dense(28^2, 128, relu),
  Dense(128, 32, relu),
  Dense(32, 10), softmax)

loss(x, y) = crossentropy(m(x), y) + sum(norm, params(m))

loss(rand(28^2), rand(10))  # evaluate once on dummy data
```
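This regularised loss drops straight into Flux's training loop. A minimal
sketch; the optimiser choice and the single dummy batch are assumptions for
illustration only:

```julia
using Flux

opt = ADAM()
data = [(rand(28^2), rand(10))]          # one dummy (input, target) pair
Flux.train!(loss, params(m), data, opt)  # one pass over `data`
```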
One can also easily add per-layer regularisation via the `activations` function:

```julia
julia> using Flux: activations

julia> c = Chain(Dense(10, 5, σ), Dense(5, 2), softmax)
Chain(Dense(10, 5, σ), Dense(5, 2), softmax)

julia> activations(c, rand(10))
3-element Array{Any,1}:
 Float32[0.84682214, 0.6704139, 0.42177814, 0.257832, 0.36255655]
 Float32[0.1501253, 0.073269576]
 Float32[0.5192045, 0.48079553]

julia> sum(norm, ans)
2.1166067f0
```
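Building on this, each layer's activations can be given its own penalty
weight. A minimal sketch; `ws` and the weighting scheme are illustrative
assumptions, not part of Flux:

```julia
ws = [0.1f0, 0.01f0, 0.0f0]  # hypothetical weight per layer of `c`
penalty(x) = sum(w * norm(a) for (w, a) in zip(ws, activations(c, x)))
loss(x, y) = crossentropy(c(x), y) + penalty(x)
```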