# Regularisation

Applying regularisation to model parameters is straightforward. We just need to
apply an appropriate regulariser, such as `norm`, to each model parameter and
add the result to the overall loss.

For example, say we have a simple regression.

```julia
using Flux
using Flux: crossentropy

m = Dense(10, 5)                             # a single dense layer
loss(x, y) = crossentropy(softmax(m(x)), y)  # plain, unregularised loss
```
We can regularise this by taking the L2 norm of the parameters, `m.W` and `m.b`.

```julia
using LinearAlgebra

penalty() = norm(m.W) + norm(m.b)                        # L2 penalty on both parameters
loss(x, y) = crossentropy(softmax(m(x)), y) + penalty()  # add it to the data loss
```
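The regulariser itself is interchangeable. Here is a minimal sketch of an
entrywise L1 penalty instead, which encourages sparse weights; the
`penalty_l1` and `loss_l1` names are illustrative, not part of Flux:

```julia
# `sum(abs, xs)` computes the entrywise L1 norm of an array.
penalty_l1() = sum(abs, m.W) + sum(abs, m.b)
loss_l1(x, y) = crossentropy(softmax(m(x)), y) + penalty_l1()
```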
|
|
|
|
|
|
|
|
|
|
When working with layers, Flux provides the `params` function to grab all
|
2018-08-29 22:34:41 +00:00
|
|
|
|
parameters at once. We can easily penalise everything with `sum(norm, params)`.
|
2018-02-09 19:00:26 +00:00
|
|
|
|
|
|
|
|
|
```julia
|
|
|
|
|
julia> params(m)
|
|
|
|
|
2-element Array{Any,1}:
|
|
|
|
|
param([0.355408 0.533092; … 0.430459 0.171498])
|
|
|
|
|
param([0.0, 0.0, 0.0, 0.0, 0.0])
|
|
|
|
|
|
2018-08-29 22:34:41 +00:00
|
|
|
|
julia> sum(norm, params(m))
|
2018-02-09 19:00:26 +00:00
|
|
|
|
26.01749952921026 (tracked)
|
|
|
|
|
```
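In practice the penalty is usually scaled by a small coefficient so that it
does not swamp the data term. A minimal sketch; the value of `λ` here is an
assumption, to be tuned for the problem at hand:

```julia
λ = 0.01  # hypothetical regularisation strength
loss(x, y) = crossentropy(softmax(m(x)), y) + λ * sum(norm, params(m))
```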
Here's a larger example with a multi-layer perceptron.

```julia
m = Chain(
  Dense(28^2, 128, relu),
  Dense(128, 32, relu),
  Dense(32, 10), softmax)

loss(x, y) = crossentropy(m(x), y) + sum(norm, params(m))

loss(rand(28^2), rand(10))  # evaluate once on dummy data
```
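This regularised loss drops straight into Flux's training loop. A minimal
sketch; the optimiser choice and the single dummy batch are assumptions for
illustration only:

```julia
using Flux

opt = ADAM()
data = [(rand(28^2), rand(10))]          # one dummy (input, target) pair
Flux.train!(loss, params(m), data, opt)  # one pass over `data`
```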
One can also easily add per-layer regularisation via the `activations` function:

```julia
julia> using Flux: activations

julia> c = Chain(Dense(10, 5, σ), Dense(5, 2), softmax)
Chain(Dense(10, 5, σ), Dense(5, 2), softmax)

julia> activations(c, rand(10))
3-element Array{Any,1}:
 Float32[0.84682214, 0.6704139, 0.42177814, 0.257832, 0.36255655]
 Float32[0.1501253, 0.073269576]
 Float32[0.5192045, 0.48079553]

julia> sum(norm, ans)
2.1166067f0
```
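Building on this, each layer's activations can be given its own penalty
weight. A minimal sketch; `ws` and the weighting scheme are illustrative
assumptions, not part of Flux:

```julia
ws = [0.1f0, 0.01f0, 0.0f0]  # hypothetical weight per layer of `c`
penalty(x) = sum(w * norm(a) for (w, a) in zip(ws, activations(c, x)))
loss(x, y) = crossentropy(c(x), y) + penalty(x)
```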