[WIP] add docstrings and doc improvements

parent 07397bc950
commit 1ea8c5a293
@@ -59,18 +59,20 @@ An optimiser takes a parameter list and returns a function that does the same thing
 
 All optimisers return a `struct` that, when called with their `update!`, will update the parameters passed to it.
 
-* [Descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent)
-* [Momentum](https://arxiv.org/abs/1712.09677)
-* [Nesterov](https://arxiv.org/abs/1607.01981)
-* [RMSProp](http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf)
-* [ADAM](https://arxiv.org/abs/1412.6980v8)
-* [AdaMax](https://arxiv.org/abs/1412.6980v9)
-* [ADAGrad](http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf)
-* [ADADelta](http://arxiv.org/abs/1212.5701)
-* [AMSGrad](https://openreview.net/forum?id=ryQu7f-RZ)
-* [NADAM](http://cs229.stanford.edu/proj2015/054_report.pdf)
-* [ADAMW](https://arxiv.org/abs/1711.05101)
-* InvDecay
-* ExpDecay
-* WeightDecay
+```@docs
+Descent
+Momentum
+Nesterov
+RMSProp
+ADAM
+AdaMax
+ADAGrad
+ADADelta
+AMSGrad
+NADAM
+```
 
 ## Optimiser API
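As a quick usage sketch of the `update!` API documented above (not part of the commit; `W` and `grad` are illustrative stand-ins):

```julia
using Flux.Optimise: Descent, update!

W = rand(2, 5)         # a parameter array
opt = Descent(0.1)     # any optimiser listed above works the same way
grad = ones(2, 5)      # stand-in for a gradient of some loss w.r.t. W
update!(opt, W, grad)  # mutates W in place: W .-= 0.1 .* grad
```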
@@ -100,13 +102,13 @@ opt.eta = 0.2 # valid statement, useful for annealing/scaling
 
 The `ExpDecay` function defined within Flux takes advantage of this flexibility. It can be used as a way of scheduling the learning rate: it makes it easy to scale the learning rate every `n` epochs. Additionally, it is easy to specify a `clip`, a bound beyond which the learning rate is held fixed for the remainder of training.
 
 ```julia
-mutable struct ExpDecay
-  eta::Float64
-  decay::Float64
-  step::Int64
-  clip::Float64
-  current::IdDict
-end
+ExpDecay(eta = 0.001, decay = 0.1, decay_step = 1000, clip = 1e-4)
 ```
+
+The above takes an initial learning rate of `0.001` and decays it by a factor of `0.1` every `1000` steps, until it reaches a minimum of `1e-4`. It can be applied to any optimiser like so:
+
+```julia
+Optimiser(ExpDecay(...), Descent(...))
+```
 
 ## Optimiser
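A concrete version of that composition, as a sketch (the values are the defaults from the snippet above; `ExpDecay` rescales the gradient over time and `Descent` then applies the step):

```julia
using Flux.Optimise: Optimiser, ExpDecay, Descent

# the decay runs first, shrinking the gradient, before Descent's update
opt = Optimiser(ExpDecay(0.001, 0.1, 1000, 1e-4), Descent())
```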
@@ -257,6 +257,14 @@ function update!(o::Optimiser, x, Δ)
   return Δ
 end
 
+"""
+    InvDecay(γ)
+
+Apply inverse time decay to an optimiser:
+```julia
+Optimiser(InvDecay(..), Opt(..))
+```
+"""
 mutable struct InvDecay
   gamma::Float64
   state::IdDict
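A usage sketch of the docstring's pattern (`ADAM` here is just one choice of inner optimiser, and `0.001` stands in for `γ`):

```julia
using Flux.Optimise: Optimiser, InvDecay, ADAM

# at step n the gradient is scaled by 1 / (1 + γ*n), so steps shrink over time
opt = Optimiser(InvDecay(0.001), ADAM())
```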
@@ -272,6 +280,16 @@ function update!(o::InvDecay, x, Δ)
   return Δ
 end
 
+"""
+    ExpDecay(eta, decay, decay_step, clip)
+
+Schedule the learning rate `eta`, multiplying it by `decay` every `decay_step` steps until it reaches a minimum of `clip`.
+
+To apply exponential decay to an optimiser:
+```julia
+Optimiser(ExpDecay(..), Opt(..))
+```
+"""
 mutable struct ExpDecay
   eta::Float64
   decay::Float64
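To make the schedule concrete, a hypothetical helper (not part of Flux) that computes the learning rate the docstring describes after `n` steps, using the defaults from the docs hunk above:

```julia
# hypothetical illustration, not Flux API
eta_at(n; eta = 0.001, decay = 0.1, decay_step = 1000, clip = 1e-4) =
    max(eta * decay^(n ÷ decay_step), clip)

eta_at(0)     # 0.001
eta_at(1000)  # 0.0001
eta_at(2000)  # 0.0001  (1.0e-5, clipped at the 1e-4 minimum)
```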
@@ -292,6 +310,11 @@ function update!(o::ExpDecay, x, Δ)
   @. Δ *= decay
 end
 
+"""
+    WeightDecay(wd)
+
+Decay the weight parameter by `wd`.
+"""
 mutable struct WeightDecay
   wd::Real
 end
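A sketch of how `WeightDecay` composes with another optimiser (assuming, per the Flux implementation, that it adds `wd .* x` to each gradient, i.e. an L2 penalty on the parameters):

```julia
using Flux.Optimise: Optimiser, WeightDecay, ADAM

# the gradient is augmented with wd .* x before ADAM applies its update
opt = Optimiser(WeightDecay(1e-4), ADAM())
```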