diff --git a/latest/models/layers.html b/latest/models/layers.html index 14abe6c8..8888c305 100644 --- a/latest/models/layers.html +++ b/latest/models/layers.html @@ -6,9 +6,9 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) ga('create', 'UA-36890222-9', 'auto'); ga('send', 'pageview'); -

Layer Reference

Model Layers

Flux.ChainType.
Chain(layers...)

Chain multiple layers / functions together, so that they are called in sequence on a given input.

m = Chain(x -> x^2, x -> x+1)
+

Layer Reference

Model Layers

These core layers form the foundation of almost all neural networks.

Flux.ChainType.
Chain(layers...)

Chain multiple layers / functions together, so that they are called in sequence on a given input.

m = Chain(x -> x^2, x -> x+1)
 m(5) == 26
 
 m = Chain(Dense(10, 5), Dense(5, 2))
 x = rand(10)
-m(x) == m[2](m[1](x))

Chain also supports indexing and slicing, e.g. m[2] or m[1:end-1]. m[1:3](x) will calculate the output of the first three layers.

source
Flux.DenseType.
Dense(in::Integer, out::Integer, σ = identity)

Creates a traditional Dense layer with parameters W and b.

y = σ.(W * x .+ b)

The input x must be a vector of length in, or a batch of vectors represented as an in × N matrix. The output y will be a vector or batch of length out.

source
+m(x) == m[2](m[1](x))

Chain also supports indexing and slicing, e.g. m[2] or m[1:end-1]. m[1:3](x) will calculate the output of the first three layers.
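The calling, indexing, and slicing behaviour described above can be sketched in plain Julia without Flux. `MiniChain` is a hypothetical stand-in written for illustration, not Flux's actual `Chain` implementation; it only assumes that a chain stores its layers in a tuple.

```julia
# Hypothetical stand-in for Chain (not Flux's implementation).
struct MiniChain{T<:Tuple}
    layers::T
    MiniChain(layers...) = new{typeof(layers)}(layers)
end

# Calling the chain threads the input through every layer in order.
(c::MiniChain)(x) = foldl((acc, f) -> f(acc), c.layers; init = x)

# Integer indexing returns a single layer; range indexing returns a sub-chain.
Base.getindex(c::MiniChain, i::Integer) = c.layers[i]
Base.getindex(c::MiniChain, r::AbstractUnitRange) = MiniChain(c.layers[r]...)
Base.lastindex(c::MiniChain) = length(c.layers)

m = MiniChain(x -> x^2, x -> x + 1, x -> 2x)
m(5)            # (5^2 + 1) * 2 == 52
m[1:end-1](5)   # first two layers only: 5^2 + 1 == 26
```

Because slicing returns another chain, a slice can be called directly on an input, which is what makes `m[1:3](x)` work in the text above.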

source
Flux.DenseType.
Dense(in::Integer, out::Integer, σ = identity)

Creates a traditional Dense layer with parameters W and b.

y = σ.(W * x .+ b)

The input x must be a vector of length in, or a batch of vectors represented as an in × N matrix. The output y will be a vector or batch of length out.
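To make the shapes concrete, here is a minimal sketch of the formula y = σ.(W * x .+ b) in plain Julia. `MiniDense` is a hypothetical type written for illustration, and the `randn`/`zeros` initialisation is an assumption, not necessarily what Flux uses.

```julia
# Hypothetical stand-in for Dense (not Flux's implementation).
struct MiniDense{F}
    W::Matrix{Float64}   # out × in weight matrix
    b::Vector{Float64}   # bias of length out
    σ::F                 # activation, applied elementwise
end

# Assumed initialisation: random weights, zero bias.
MiniDense(in::Integer, out::Integer, σ = identity) =
    MiniDense(randn(out, in), zeros(out), σ)

# The forward pass from the docstring: y = σ.(W * x .+ b)
(d::MiniDense)(x) = d.σ.(d.W * x .+ d.b)

d = MiniDense(10, 5, tanh)
size(d(rand(10)))      # (5,)   one vector in, one vector out
size(d(rand(10, 3)))   # (5, 3) a batch of 3 column vectors
```

The bias is broadcast across columns, so the same definition handles a single vector and an in × N batch.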

source
diff --git a/latest/search_index.js b/latest/search_index.js index adf62e53..18646cf8 100644 --- a/latest/search_index.js +++ b/latest/search_index.js @@ -141,7 +141,7 @@ var documenterSearchIndex = {"docs": [ "page": "Layer Reference", "title": "Model Layers", "category": "section", - "text": "Chain\nDense" + "text": "These core layers form the foundation of almost all neural networks.Chain\nDense" }, { diff --git a/latest/training/optimisers.html b/latest/training/optimisers.html index a0662e36..4aec9378 100644 --- a/latest/training/optimisers.html +++ b/latest/training/optimisers.html @@ -27,4 +27,4 @@ end

If we call update, the parameters W Dense(10, 5, σ), Dense(5, 2), softmax)

Instead of having to write [m[1].W, m[1].b, ...], Flux provides a params function params(m) that returns a list of all parameters in the model for you.

For the update step, there's nothing whatsoever wrong with writing the loop above – it'll work just fine – but Flux provides various optimisers that make it more convenient.

opt = SGD([W, b], 0.1) # Gradient descent with learning rate 0.1
 
-opt() # Carry out the update, modifying `W` and `b`.

An optimiser takes a parameter list and returns a function that does the same thing as update above. We can pass either opt or update to our training loop, which will then run the optimiser after every mini-batch of data.

Optimiser Reference

All optimisers return a function that, when called, will update the parameters passed to it.

Flux.Optimise.SGDFunction.
SGD(params, η = 1; decay = 0)

Classic gradient descent optimiser. For each parameter p and its gradient δp, this runs p -= η*δp.

Supports learning rate decay if the decay argument is provided.

source
Flux.Optimise.MomentumFunction.
Momentum(params, ρ, decay = 0)

SGD with momentum ρ and optional learning rate decay.

source
Flux.Optimise.NesterovFunction.
Nesterov(params, ρ, decay = 0)

SGD with Nesterov momentum ρ and optional learning rate decay.

source
Flux.Optimise.RMSPropFunction.
RMSProp(params; η = 0.001, ρ = 0.9, ϵ = 1e-8, decay = 0)

RMSProp optimiser. Parameters other than learning rate don't need tuning. Often a good choice for recurrent networks.

source
Flux.Optimise.ADAMFunction.
ADAM(params; η = 0.001, β1 = 0.9, β2 = 0.999, ϵ = 1e-08, decay = 0)

ADAM optimiser.

source
Flux.Optimise.ADAGradFunction.
ADAGrad(params; η = 0.01, ϵ = 1e-8, decay = 0)

ADAGrad optimiser. Parameters don't need tuning.

source
Flux.Optimise.ADADeltaFunction.
ADADelta(params; η = 0.01, ρ = 0.95, ϵ = 1e-8, decay = 0)

ADADelta optimiser. Parameters don't need tuning.

source
+opt() # Carry out the update, modifying `W` and `b`.

An optimiser takes a parameter list and returns a function that does the same thing as update above. We can pass either opt or update to our training loop, which will then run the optimiser after every mini-batch of data.

Optimiser Reference

All optimisers return a function that, when called, will update the parameters passed to it.

Flux.Optimise.SGDFunction.
SGD(params, η = 1; decay = 0)

Classic gradient descent optimiser. For each parameter p and its gradient δp, this runs p -= η*δp.

Supports learning rate decay if the decay argument is provided.
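The update rule p -= η*δp, and the idea that an optimiser returns a function which performs the update when called, can be sketched as follows. `mini_sgd` is a hypothetical helper, not Flux's `SGD`; in particular, it assumes the gradients are passed in explicitly as a second list, and it does not model the decay option.

```julia
# Hypothetical sketch: returns a zero-argument closure that applies
# p -= η*δp in place to each (parameter, gradient) pair.
function mini_sgd(params, grads, η = 1.0)
    return function ()
        for (p, δp) in zip(params, grads)
            p .-= η .* δp   # in-place update, so callers see the change
        end
        return nothing
    end
end

W  = [1.0 2.0; 3.0 4.0]
b  = [0.5, -0.5]
δW = ones(2, 2)             # made-up gradients for illustration
δb = [1.0, 2.0]

opt = mini_sgd([W, b], [δW, δb], 0.1)
opt()                       # W and b are modified in place
```

After one call, every entry of `W` has decreased by 0.1 and `b` has moved by `-0.1 .* δb`.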

source
Flux.Optimise.MomentumFunction.
Momentum(params, ρ, decay = 0)

SGD with momentum ρ and optional learning rate decay.

source
Flux.Optimise.NesterovFunction.
Nesterov(params, ρ, decay = 0)

SGD with Nesterov momentum ρ and optional learning rate decay.

source
Flux.Optimise.RMSPropFunction.
RMSProp(params; η = 0.001, ρ = 0.9, ϵ = 1e-8, decay = 0)

RMSProp optimiser. Parameters other than learning rate don't need tuning. Often a good choice for recurrent networks.

source
Flux.Optimise.ADAMFunction.
ADAM(params; η = 0.001, β1 = 0.9, β2 = 0.999, ϵ = 1e-08, decay = 0)

ADAM optimiser.

source
Flux.Optimise.ADAGradFunction.
ADAGrad(params; η = 0.01, ϵ = 1e-8, decay = 0)

ADAGrad optimiser. Parameters don't need tuning.

source
Flux.Optimise.ADADeltaFunction.
ADADelta(params; η = 0.01, ρ = 0.95, ϵ = 1e-8, decay = 0)

ADADelta optimiser. Parameters don't need tuning.

source