basic training docs

parent 33a5d26e57
commit 17e40b1f76
docs/make.jl
@@ -7,10 +7,13 @@ makedocs(modules=[Flux],
         sitename = "Flux",
         assets = ["../flux.css"],
         pages = ["Home" => "index.md",
-                 "Models" =>
+                 "Building Models" =>
                    ["Basics" => "models/basics.md",
                     "Recurrence" => "models/recurrence.md",
                     "Layer Reference" => "models/layers.md"],
+                 "Training Models" =>
+                   ["Optimisers" => "training/optimisers.md",
+                    "Training" => "training/training.md"],
                  "Contributing & Help" => "contributing.md"])

deploydocs(
docs/src/training/optimisers.md
@@ -0,0 +1,54 @@

# Optimisers

Consider a [simple linear regression](../models/basics.html). We create some dummy data, calculate a loss, and backpropagate to calculate gradients for the parameters `W` and `b`.

```julia
W = param(rand(2, 5))
b = param(rand(2))

predict(x) = W*x .+ b
loss(x, y) = sum((predict(x) .- y).^2)

x, y = rand(5), rand(2) # Dummy data
l = loss(x, y) # ~ 3

back!(l)
```

We want to update each parameter, using the gradient, in order to improve (reduce) the loss. Here's one way to do that:

```julia
using Flux.Tracker: data, grad

function update()
  η = 0.1 # Learning Rate
  for p in (W, b)
    x, Δ = data(p), grad(p)
    x .-= η .* Δ # Apply the update
    Δ .= 0 # Clear the gradient
  end
end
```

If we call `update`, the parameters `W` and `b` will change and our loss should go down.
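
For example, calling it once and re-evaluating the loss on the same dummy batch:

```julia
update()
loss(x, y) # should now be lower than the original `l`
```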

There are two pieces here: one is that we need a list of trainable parameters for the model (`[W, b]` in this case), and the other is the update step. In this case the update is simply gradient descent (`x .-= η .* Δ`), but we might choose to do something more advanced, like adding momentum.
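
As a rough sketch of what a more advanced rule could look like, here is the same loop with momentum added by hand. The `velocities` dictionary and `update_momentum` are made up for this illustration; in practice you'd reach for one of Flux's built-in optimisers, described below.

```julia
velocities = Dict(p => zero(data(p)) for p in (W, b))

function update_momentum()
  η, ρ = 0.1, 0.9 # learning rate and momentum coefficient
  for p in (W, b)
    x, Δ, v = data(p), grad(p), velocities[p]
    v .= ρ .* v .+ η .* Δ # accumulate a velocity from past gradients
    x .-= v # apply the accumulated step
    Δ .= 0 # clear the gradient
  end
end
```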

In this case, getting the variables is trivial, but you can imagine it'd be more of a pain with some complex stack of layers.

```julia
m = Chain(
  Dense(10, 5, σ),
  Dense(5, 2), softmax)
```

Instead of having to write `[m[1].W, m[1].b, ...]`, Flux provides `params(m)`, which returns a list of all parameters in the model for you.
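
For instance, with the `Chain` above, a quick sketch of what this gives you (each `Dense` layer contributes a weight matrix and a bias vector):

```julia
ps = params(m) # all trainable parameters of `m`
length(ps) # 4: a weight matrix and a bias vector for each Dense layer
```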

For the update step, there's nothing whatsoever wrong with writing the loop above – it'll work just fine – but Flux provides various *optimisers* that make it more convenient.

```julia
opt = SGD([W, b], 0.1) # Gradient descent with learning rate 0.1

opt() # Carry out the update, modifying `W` and `b`
```

An optimiser takes a parameter list and returns a function that does the same thing as `update` above. We can pass either `opt` or `update` to our [training loop](training.html), which will then run the optimiser after every mini-batch of data.
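
Roughly speaking, a full training loop just repeats this pattern over a dataset; the sketch below assumes `dataset` is some iterator of `(x, y)` pairs, and is essentially what `Flux.train!` automates for you.

```julia
for (x, y) in dataset
  l = loss(x, y)
  back!(l) # compute the gradients
  opt() # run the update, just as `update()` did above
end
```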
|
|
@ -0,0 +1,4 @@
|
|||
```julia
|
||||
Flux.train!(loss, repeated((x,y), 1000), SGD(params(m), 0.1),
|
||||
cb = throttle(() -> @show(loss(x, y)), 5))
|
||||
```
|