From 81e5551256f7d1261de5adb65aa70d967df055d4 Mon Sep 17 00:00:00 2001
From: Mike J Innes
Date: Thu, 10 Jan 2019 11:01:57 +0000
Subject: [PATCH 1/2] tweaks

---
 docs/src/training/optimisers.md | 30 ++++++++----------------------
 1 file changed, 8 insertions(+), 22 deletions(-)

diff --git a/docs/src/training/optimisers.md b/docs/src/training/optimisers.md
index 58854a8f..1fc49fca 100644
--- a/docs/src/training/optimisers.md
+++ b/docs/src/training/optimisers.md
@@ -23,37 +23,23 @@ We want to update each parameter, using the gradient, in order to improve (reduc
 ```julia
 using Flux.Tracker: grad, update!
 
-function sgd()
-  η = 0.1 # Learning Rate
-  for p in (W, b)
-    update!(p, -η * grads[p])
-  end
+η = 0.1 # Learning Rate
+for p in (W, b)
+  update!(p, -η * grads[p])
 end
 ```
 
-If we call `sgd`, the parameters `W` and `b` will change and our loss should go down.
-
-There are two pieces here: one is that we need a list of trainable parameters for the model (`[W, b]` in this case), and the other is the update step. In this case the update is simply gradient descent (`x .-= η .* Δ`), but we might choose to do something more advanced, like adding momentum.
-
-In this case, getting the variables is trivial, but you can imagine it'd be more of a pain with some complex stack of layers.
-
-```julia
-m = Chain(
-  Dense(10, 5, σ),
-  Dense(5, 2), softmax)
-```
-
-Instead of having to write `[m[1].W, m[1].b, ...]`, Flux provides a params function `params(m)` that returns a list of all parameters in the model for you.
-
-For the update step, there's nothing whatsoever wrong with writing the loop above – it'll work just fine – but Flux provides various *optimisers* that make it more convenient.
+Running this will alter the parameters `W` and `b`, and our loss should go down. Flux provides a more general way to do optimiser updates like this.
 
 ```julia
 opt = Descent(0.1) # Gradient descent with learning rate 0.1
 
-Optimise.update!(opt, [W, b]) # Carry out the update, modifying `W` and `b`.
+for p in (W, b)
+  update!(opt, p, grads[p])
+end
 ```
 
-An optimiser takes a parameter list and returns a object that holds the current values in the optimiser. We can pass `opt` to our [training loop](training.md), which will then run the `update!` step for the optimiser after every mini-batch of data.
+An optimiser's `update!` accepts a parameter and a gradient, and updates the parameter according to the chosen rule. We can also pass `opt` to our [training loop](training.md), which will update all parameters of the model in a loop. However, we can now easily replace `Descent` with a more advanced optimiser such as `ADAM`.
 
 ## Optimiser Reference

From e6f925f9770beeebfe8013a5bdddf716857abe36 Mon Sep 17 00:00:00 2001
From: Mike J Innes
Date: Thu, 10 Jan 2019 11:05:21 +0000
Subject: [PATCH 2/2] train docstring simplification

---
 src/optimise/train.jl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/optimise/train.jl b/src/optimise/train.jl
index 571627a1..40baa5fb 100644
--- a/src/optimise/train.jl
+++ b/src/optimise/train.jl
@@ -45,7 +45,7 @@ function stop()
 end
 
 """
-    train!(loss, params, data, opt; cb = () -> ())
+    train!(loss, params, data, opt; cb)
 
 For each datapoint `d` in `data` computes the gradient of `loss(d...)` through
 backpropagation and calls the optimizer `opt`.
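For context beyond the patches above, here is a minimal end-to-end sketch of the workflow the revised docs page describes, written against the Tracker-era Flux API these patches target. The model (`W`, `b`), `predict`, `loss`, and the sample data are illustrative stand-ins, not part of the patch, and `Flux.Optimise.update!(opt, p, g)` is assumed to be the method the new docs text refers to:

```julia
using Flux
using Flux.Tracker: gradient

# A toy linear model; `param` marks arrays as tracked for AD.
W = param(rand(2, 5))
b = param(rand(2))

predict(x) = W*x .+ b
loss(x, y) = sum((predict(x) .- y).^2)

x, y = rand(5), rand(2)

# Gradients with respect to W and b, keyed by parameter.
grads = gradient(() -> loss(x, y), params(W, b))

# The optimiser owns the learning rate, so we pass the raw gradient.
opt = Descent(0.1)
for p in (W, b)
  Flux.Optimise.update!(opt, p, grads[p])
end
```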
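Similarly, a sketch of how the simplified `train!` signature from the second patch is typically invoked, continuing from the sketch above; the dataset and `evalcb` are hypothetical, and `throttle` is assumed available from Flux:

```julia
using Flux: throttle

# Hypothetical dataset of (x, y) pairs; train! iterates over it once.
data = [(rand(5), rand(2)) for _ in 1:100]

# A simple progress callback, rate-limited to once every 10 seconds.
evalcb() = @show loss(data[1]...)

Flux.train!(loss, params(W, b), data, opt; cb = throttle(evalcb, 10))
```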