cleanup

parent b08c949b99, commit b6926f07a5

@@ -3,7 +3,7 @@

Consider a [simple linear regression](../models/basics.md). We create some dummy data, calculate a loss, and backpropagate to calculate gradients for the parameters `W` and `b`.

```julia
using Flux  # was: using Flux, Flux.Zygote

W = rand(2, 5)
b = rand(2)
```

@@ -58,78 +58,3 @@ AMSGrad

```
NADAM
ADAMW
```

## Optimiser Interface

Flux's optimisers are built around a `struct` that holds all the optimiser parameters, along with a definition of how to apply the associated update rule. This is done via the `apply!` function, which takes the optimiser as its first argument, followed by a parameter and its corresponding gradient.

This design also lets you create custom optimisers that work seamlessly with the rest of Flux. Let's work through a simple example.

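```julia
mutable struct Momentum{T,S,D}
  eta::T       # learning rate η
  rho::S       # momentum coefficient ρ
  velocity::D  # optimiser state: maps each parameter to its velocity
end

# Convenience constructor that starts with an empty per-parameter state
Momentum(eta::Real, rho::Real) = Momentum(eta, rho, IdDict())
```
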
The `Momentum` type will act as our optimiser in this case. Notice that we have added all the hyperparameters (`eta` and `rho`) as fields, along with the velocity, which we will use as our state. **Note that this behaviour is set to change in subsequent versions of Flux**. We can now define the rule applied when this optimiser is invoked.

```julia
function Flux.Optimise.apply!(o::Momentum, x, Δ)
  η, ρ = o.eta, o.rho
  v = get!(o.velocity, x, zero(x))::typeof(x)  # velocity of x, created on first use
  @. v = ρ * v - η * Δ                          # accumulate the velocity in place
  @. Δ = -v                                     # step that `update!` subtracts from x
end
```

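This is the basic definition of a Momentum update rule, where $\Delta$ is the gradient of the loss with respect to the parameter $w$:

$v \leftarrow \rho v - \eta \Delta$

$w \leftarrow w + v$

Because `apply!` returns $-v$ and Flux subtracts the returned value from the parameter, this is exactly the update that ends up being applied.
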
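To see how the velocity builds up, the sketch below iterates these two lines for a single scalar weight in plain Julia, with no Flux required. The helper `momentum_trajectory`, its default values of η and ρ, and the constant gradient are assumptions chosen for illustration. Under a constant gradient, each step is larger than the one before it.

```julia
# Hypothetical helper (not part of Flux): iterate v ← ρv − ηΔ and w ← w + v
# for a scalar weight, recording the weight after every step.
function momentum_trajectory(w0, grads; η = 0.1, ρ = 0.9)
  w, v = w0, zero(w0)
  ws = typeof(w0)[]
  for Δ in grads
    v = ρ * v - η * Δ   # accumulate velocity
    w = w + v           # move the weight along the velocity
    push!(ws, w)
  end
  return ws
end

# Constant gradient of 1: steps of 0.1, 0.19, 0.271, ...
ws = momentum_trajectory(0.0, fill(1.0, 3))   # ws ≈ [-0.1, -0.29, -0.561]
```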
`apply!` defines the update rule for an optimiser `opt`, given a parameter and its gradient, and usually returns the modified gradient. Here, the velocity `v` of each parameter `x` is retrieved from the optimiser's running state (and initialised to `zero(x)` the first time `x` is seen). It is then updated in place, and its negation is written into `Δ` and returned.

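The role of the running state can be illustrated without Flux. The sketch below re-implements the same rule on a hypothetical `SketchMomentum` type, with a hypothetical `sketch_apply!` function; neither is Flux's `Momentum` or `apply!`. It shows that each parameter array gets its own velocity. Because the state is keyed by object identity in an `IdDict`, two arrays with equal contents are still tracked separately.

```julia
# Stand-alone sketch of the Momentum rule; `SketchMomentum` and
# `sketch_apply!` are hypothetical names, not part of Flux.
struct SketchMomentum
  eta::Float64
  rho::Float64
  velocity::IdDict{Any,Any}  # parameter array => its velocity
end
SketchMomentum(eta, rho) = SketchMomentum(eta, rho, IdDict{Any,Any}())

function sketch_apply!(o::SketchMomentum, x, Δ)
  η, ρ = o.eta, o.rho
  v = get!(o.velocity, x, zero(x))::typeof(x)  # created on first use
  @. v = ρ * v - η * Δ                          # update the state in place
  @. Δ = -v                                     # step for the caller to subtract
end

opt = SketchMomentum(0.1, 0.9)
a, b = zeros(2), zeros(2)                 # equal contents, but distinct arrays
sketch_apply!(opt, a, ones(2))
step_a = sketch_apply!(opt, a, ones(2))   # second step for `a`
step_b = sketch_apply!(opt, b, ones(2))   # first step for `b`
```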
Flux calls this function internally from `update!`. `update!` shares its signature with `apply!`, but it also handles collections of parameters gracefully and subtracts the step returned by `apply!` from each parameter. In the future, it will also delegate immutable update operations.

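A rough sketch of this division of labour, in plain Julia: a hypothetical `sketch_update!` runs a per-parameter rule over every parameter in a collection, skips parameters without a gradient, and subtracts the returned step. The names `SketchDescent`, `sketch_apply!` and `sketch_update!` are assumptions for illustration. Flux's own `update!` works on `Params` and Zygote gradients rather than the `IdDict` used here.

```julia
# Hypothetical plain gradient-descent rule (not Flux's `Descent`).
struct SketchDescent
  eta::Float64
end

# Per-parameter rule: turn the gradient into the step to subtract (in place).
sketch_apply!(o::SketchDescent, x, Δ) = (Δ .*= o.eta)

# One parameter: subtract the step computed by the rule.
sketch_update!(opt, x::AbstractArray, Δ) = (x .-= sketch_apply!(opt, x, Δ))

# Many parameters: gradients are looked up by parameter identity, and
# parameters that have no gradient are skipped.
function sketch_update!(opt, xs::Vector, gs::IdDict)
  for x in xs
    g = get(gs, x, nothing)
    g === nothing && continue
    sketch_update!(opt, x, g)
  end
  return xs
end

w, b = ones(2), ones(2)
gs = IdDict{Any,Any}()
gs[w] = [1.0, 2.0]          # no gradient recorded for `b`
sketch_update!(SketchDescent(0.1), [w, b], gs)
```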
## Composing Optimisers

Flux defines a special kind of optimiser, simply called `Optimiser`, which takes an arbitrary number of optimisers as input. It behaves like any other optimiser, except that it works by calling the optimisers it contains in sequence. Each one produces a modified gradient that is fed into the next, and the resulting update is applied to the parameter as usual. A classic use case is adding a decay to an optimiser. Flux defines some basic decays, including `ExpDecay` and `InvDecay`.

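The sequential behaviour described above can be sketched in a few lines of plain Julia. The hypothetical `SketchChain` holds a tuple of rules and passes the gradient through each one in turn. `Scale`, `Clamp` and `rule!` are made-up illustrative names, not Flux optimisers, and Flux's own `Optimiser` may differ in detail.

```julia
# Hypothetical rules: each maps a gradient to a modified gradient (in place).
struct Scale
  factor::Float64
end
struct Clamp
  limit::Float64
end

rule!(o::Scale, x, Δ) = (Δ .*= o.factor)
rule!(o::Clamp, x, Δ) = (Δ .= clamp.(Δ, -o.limit, o.limit))

# A composed rule: feed the output of each rule into the next one.
struct SketchChain{T<:Tuple}
  rules::T
end
SketchChain(rules...) = SketchChain(rules)

function rule!(o::SketchChain, x, Δ)
  for r in o.rules
    Δ = rule!(r, x, Δ)
  end
  return Δ
end

x = zeros(3)
opt = SketchChain(Scale(0.5), Clamp(1.0))
update_step = rule!(opt, x, [4.0, -1.0, 0.5])   # scale, then clamp
x .-= update_step                               # apply the final step as usual
```

The order matters. Passing `Clamp(1.0)` before `Scale(0.5)` would clamp first and then scale, giving `[0.5, -0.5, 0.25]` instead of `[1.0, -0.5, 0.25]`.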
```julia
# the gradient passes through ExpDecay first, then through Descent
opt = Optimiser(ExpDecay(0.001, 0.1, 1000, 1e-4), Descent())
```

Here we apply exponential decay to the `Descent` optimiser. The arguments passed to `ExpDecay` are its defaults. The learning rate starts at `0.001` and is multiplied by `0.1` every `1000` steps, down to a minimum of `1e-4`. The composed optimiser is then applied like any other optimiser.

```julia
w = randn(10, 10)
w1 = randn(10, 10)
ps = Params([w, w1])

loss(x) = Flux.mse(w * x, w1 * x)

loss(rand(10)) # around 9

for t = 1:10^5
  # gradients of the loss w.r.t. `w` and `w1`, then one step of `opt`
  θ̄ = gradient(() -> loss(rand(10)), ps)
  Flux.Optimise.update!(opt, ps, θ̄)
end

loss(rand(10)) # around 0.9
```

In this way, optimisers can be composed for added flexibility.

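The `ExpDecay(0.001, 0.1, 1000, 1e-4)` configuration used above can be summarised as a closed-form learning-rate schedule. The sketch assumes the rate is multiplied by `decay` once for every completed block of `decay_step` steps and is clipped from below at `clip`. `exp_decay_rate` is a hypothetical helper for illustration, not part of Flux. Under this assumption, the values above reach the clip at the very first decay. A gentler factor such as `0.5` decays more gradually.

```julia
# Hypothetical helper: learning rate after `n` steps, assuming the rate is
# multiplied by `decay` every `decay_step` steps and never drops below `clip`.
exp_decay_rate(n; η = 0.001, decay = 0.1, decay_step = 1000, clip = 1e-4) =
  max(η * decay^(n ÷ decay_step), clip)

rates = [exp_decay_rate(n) for n in (999, 1000, 5000)]
gentler = exp_decay_rate(2500; decay = 0.5)   # two halvings of 0.001
```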
## Decays

Similar to optimisers, Flux also defines some simple decays that can be used in conjunction with other optimisers, or standalone.

```@docs
ExpDecay
InvDecay
WeightDecay
```
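
The other two decays listed above can be illustrated with plain-Julia sketches. The first is inverse-time decay, which scales the step at iteration `n` by `1 / (1 + γn)`. The second is weight decay, which adds `wd * x` to the gradient, the gradient of an L2 penalty. The helper names `inv_decay_factor` and `weight_decay!` and their default values are assumptions for illustration; the docstrings above describe Flux's exact behaviour.

```julia
# Hypothetical helpers illustrating the two decay rules (not Flux's code).

# Inverse-time decay: the step at iteration `n` is scaled by 1 / (1 + γ n).
inv_decay_factor(n; γ = 0.001) = 1 / (1 + γ * n)

# Weight decay: add wd * x to the gradient, i.e. the gradient of the
# L2 penalty (wd / 2) * sum(abs2, x).
weight_decay!(Δ, x; wd = 0.01) = (Δ .+= wd .* x)

factors = [inv_decay_factor(n) for n in (0, 1000, 3000)]
Δ = [0.5, 0.5]
weight_decay!(Δ, [1.0, -2.0]; wd = 0.1)
```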