# Optimisers
Consider a [simple linear regression](../models/basics.md). We create some dummy data, calculate a loss, and backpropagate to calculate gradients for the parameters `W` and `b`.
```julia
using Flux

W = rand(2, 5)
b = rand(2)

predict(x) = (W * x) .+ b
loss(x, y) = sum((predict(x) .- y).^2)

x, y = rand(5), rand(2) # Dummy data
l = loss(x, y) # ~ 3

θ = params(W, b) # collect the trainable parameters
grads = gradient(() -> loss(x, y), θ) # gradients with respect to W and b
```
We want to update each parameter, using the gradient, in order to improve (reduce) the loss. Here's one way to do that:
```julia
η = 0.1 # Learning Rate
for p in (W, b)
  p .-= η * grads[p] # step each parameter against its gradient
end
```
Running this will alter the parameters `W` and `b` and our loss should go down. Flux provides a more general way to do optimiser updates like this.
```julia
using Flux: update!

opt = Descent(0.1) # Gradient descent with learning rate 0.1

for p in (W, b)
  update!(opt, p, grads[p])
end
```
An optimiser `update!` accepts a parameter and a gradient, and updates the parameter according to the chosen rule. We can also pass `opt` to our [training loop](training.md), which will update all parameters of the model in a loop. However, we can now easily replace `Descent` with a more advanced optimiser such as `ADAM`.
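For example, only the line that constructs the optimiser needs to change; the update loop stays the same. A minimal sketch, reusing `W`, `b` and `grads` from above (the learning rate `0.001` is just an illustrative choice):

```julia
opt = ADAM(0.001) # adaptive optimiser; 0.001 is an illustrative learning rate

for p in (W, b)
  update!(opt, p, grads[p])
end
```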
## Optimiser Reference
All optimisers return an object that, when passed to `train!`, will update the parameters passed to it.
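As a rough sketch of that pattern, reusing `loss`, `θ`, `x` and `y` from above with a single dummy batch (any optimiser from the list below could stand in for `Descent`):

```julia
data = [(x, y)] # one (input, target) pair serves as the dataset

Flux.train!(loss, θ, data, Descent(0.1)) # one pass over data, updating W and b
```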
```@docs
Descent
Momentum
Nesterov
RMSProp
ADAM
AdaMax
ADAGrad
ADADelta
AMSGrad
NADAM
ADAMW
```