# Optimisers
Consider a [simple linear regression](../models/basics.md). We create some dummy data, calculate a loss, and backpropagate to compute gradients for the parameters `W` and `b`.

```julia
using Flux.Tracker

W = param(rand(2, 5)) # Tracked parameters: operations on them are recorded
b = param(rand(2))

predict(x) = W*x .+ b
loss(x, y) = sum((predict(x) .- y).^2)

x, y = rand(5), rand(2) # Dummy data
l = loss(x, y) # ~ 3

params = Params([W, b])
grads = Tracker.gradient(() -> loss(x, y), params) # Gradients of the loss w.r.t. W and b
```

We want to update each parameter, using the gradient, in order to improve (reduce) the loss. Here's one way to do that:

```julia
using Flux.Tracker: update!

η = 0.1 # Learning rate

for p in (W, b)
  update!(p, -η * grads[p]) # Step each parameter a small amount against its gradient
end
```

Running this will alter the parameters `W` and `b`, and our loss should go down. Flux provides a more general way to do optimiser updates like this:

```julia
opt = Descent(0.1) # Gradient descent with learning rate 0.1

for p in (W, b)
  update!(opt, p, grads[p]) # The optimiser applies the learning rate (and the sign) itself
end
```

An optimiser's `update!` accepts a parameter and a gradient, and updates the parameter according to the optimiser's rule. We can also pass `opt` to our [training loop](training.md), which will update all parameters of the model in a loop. Either way, we can now easily replace `Descent` with a more advanced optimiser such as `ADAM`.
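
For example, here is a rough sketch of the same manual update loop with `ADAM` swapped in; the `0.001` learning rate is just an illustrative choice:

```julia
opt = ADAM(0.001) # Adaptive optimiser; 0.001 is an illustrative step size

for p in (W, b)
  update!(opt, p, grads[p]) # Same call as before, only the update rule changes
end
```
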
## Optimiser Reference
Each optimiser constructor returns an object that, when passed to `train!`, will update the parameters passed to it.
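
As a rough sketch of that usage, assuming the `loss`, `params`, `x` and `y` defined above, the four-argument `train!(loss, params, data, opt)` form described in the [training loop](training.md) docs, and a placeholder `dataset` standing in for a real collection of batches:

```julia
using Flux

dataset = [(x, y)] # Placeholder: a real dataset would be an iterator of (input, target) batches
opt = Descent(0.1)

# train! evaluates loss(x, y) on each batch and lets opt update every parameter in params
Flux.train!(loss, params, dataset, opt)
```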
2017-10-18 11:22:45 +00:00
```@docs
Descent
Momentum
Nesterov
ADAM
```