build based on 7426faf
parent 253df58fab, commit dc14435678
@@ -11,4 +11,4 @@ m(5) == 26
m = Chain(Dense(10, 5), Dense(5, 2))
x = rand(10)
m(x) == m[2](m[1](x))</code></pre><p><code>Chain</code> also supports indexing and slicing, e.g. <code>m[2]</code> or <code>m[1:end-1]</code>. <code>m[1:3](x)</code> will calculate the output of the first three layers.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/b26f77489e2b7176ab59330eb3bc2ddb00ce26bd/src/layers/basic.jl#L1-L16">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Dense" href="#Flux.Dense"><code>Flux.Dense</code></a> — <span class="docstring-category">Type</span>.</div><div><pre><code class="language-none">Dense(in::Integer, out::Integer, σ = identity)</code></pre><p>Creates a traditional <code>Dense</code> layer with parameters <code>W</code> and <code>b</code>.</p><pre><code class="language-none">y = σ.(W * x .+ b)</code></pre><p>The input <code>x</code> must be a vector of length <code>in</code>, or a batch of vectors represented as an <code>in × N</code> matrix. The output <code>y</code> will be a vector of length <code>out</code>, or a batch represented as an <code>out × N</code> matrix.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/b26f77489e2b7176ab59330eb3bc2ddb00ce26bd/src/layers/basic.jl#L38-L47">source</a></section><footer><hr/><a class="previous" href="recurrence.html"><span class="direction">Previous</span><span class="title">Recurrence</span></a><a class="next" href="../training/optimisers.html"><span class="direction">Next</span><span class="title">Optimisers</span></a></footer></article></body></html>
m(x) == m[2](m[1](x))</code></pre><p><code>Chain</code> also supports indexing and slicing, e.g. <code>m[2]</code> or <code>m[1:end-1]</code>. <code>m[1:3](x)</code> will calculate the output of the first three layers.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/7426faf37dc2eb75ea56ebb6312c248e487fdee9/src/layers/basic.jl#L1-L16">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Dense" href="#Flux.Dense"><code>Flux.Dense</code></a> — <span class="docstring-category">Type</span>.</div><div><pre><code class="language-none">Dense(in::Integer, out::Integer, σ = identity)</code></pre><p>Creates a traditional <code>Dense</code> layer with parameters <code>W</code> and <code>b</code>.</p><pre><code class="language-none">y = σ.(W * x .+ b)</code></pre><p>The input <code>x</code> must be a vector of length <code>in</code>, or a batch of vectors represented as an <code>in × N</code> matrix. The output <code>y</code> will be a vector of length <code>out</code>, or a batch represented as an <code>out × N</code> matrix.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/7426faf37dc2eb75ea56ebb6312c248e487fdee9/src/layers/basic.jl#L38-L47">source</a></section><footer><hr/><a class="previous" href="recurrence.html"><span class="direction">Previous</span><span class="title">Recurrence</span></a><a class="next" href="../training/optimisers.html"><span class="direction">Next</span><span class="title">Optimisers</span></a></footer></article></body></html>
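A quick sketch of the indexing behaviour the Chain docstring describes; the layer sizes here are arbitrary examples.

using Flux

m = Chain(Dense(10, 5, σ), Dense(5, 2), softmax)
x = rand(10)
m(x) == m[3](m[2](m[1](x)))  # a Chain is just function composition
m[1:2](x) == m[2](m[1](x))   # slicing yields a Chain over the first two layers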
@@ -160,6 +160,70 @@ var documenterSearchIndex = {"docs": [
"text": "Consider a simple linear regression. We create some dummy data, calculate a loss, and backpropagate to calculate gradients for the parameters W and b.W = param(rand(2, 5))\nb = param(rand(2))\n\npredict(x) = W*x .+ b\nloss(x, y) = sum((predict(x) .- y).^2)\n\nx, y = rand(5), rand(2) # Dummy data\nl = loss(x, y) # ~ 3\nback!(l)We want to update each parameter, using the gradient, in order to improve (reduce) the loss. Here's one way to do that:using Flux.Tracker: data, grad\n\nfunction update()\n η = 0.1 # Learning Rate\n for p in (W, b)\n x, Δ = data(p), grad(p)\n x .-= η .* Δ # Apply the update\n Δ .= 0 # Clear the gradient\n end\nendIf we call update, the parameters W and b will change and our loss should go down.There are two pieces here: one is that we need a list of trainable parameters for the model ([W, b] in this case), and the other is the update step. In this case the update is simply gradient descent (x .-= η .* Δ), but we might choose to do something more advanced, like adding momentum.In this case, getting the variables is trivial, but you can imagine it'd be more of a pain with some complex stack of layers.m = Chain(\n Dense(10, 5, σ),\n Dense(5, 2), softmax)Instead of having to write [m[1].W, m[1].b, ...], Flux provides a params function params(m) that returns a list of all parameters in the model for you.For the update step, there's nothing whatsoever wrong with writing the loop above – it'll work just fine – but Flux provides various optimisers that make it more convenient.opt = SGD([W, b], 0.1) # Gradient descent with learning rate 0.1\n\nopt()An optimiser takes a parameter list and returns a function that does the same thing as update above. We can pass either opt or update to our training loop, which will then run the optimiser after every mini-batch of data."
},
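A minimal runnable sketch of the optimiser-as-closure pattern the entry above describes, using this Flux version's Tracker API; make_sgd is an illustrative name, not part of Flux.

using Flux
using Flux.Tracker: data, grad

W = param(rand(2, 5))
b = param(rand(2))
predict(x) = W*x .+ b
loss(x, y) = sum((predict(x) .- y).^2)

# An optimiser is a closure over a parameter list: calling it applies one
# update step and clears the gradients, just like `update` above.
make_sgd(ps, η) = function ()
    for p in ps
        x, Δ = data(p), grad(p)
        x .-= η .* Δ  # apply the update
        Δ .= 0        # clear the gradient
    end
end

opt = make_sgd([W, b], 0.1)
x, y = rand(5), rand(2)
back!(loss(x, y))
opt()  # loss(x, y) should now be lower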
{
"location": "training/optimisers.html#Flux.Optimise.SGD",
"page": "Optimisers",
"title": "Flux.Optimise.SGD",
"category": "Function",
"text": "SGD(params, η = 1; decay = 0)\n\nClassic gradient descent optimiser. For each parameter p and its gradient δp, this runs p -= η*δp.\n\nSupports decayed learning rate decay if the decay argument is provided.\n\n\n\n"
},
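The docstring's rule p -= η*δp written out in plain Julia; sgd_step! is an illustrative name, and the decay schedule is not modelled here.

function sgd_step!(p, δp; η = 1.0)
    p .-= η .* δp  # step against the gradient
    return p
end

p = [1.0, 2.0]
sgd_step!(p, [0.5, -0.5]; η = 0.1)  # p is now [0.95, 2.05]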
{
"location": "training/optimisers.html#Flux.Optimise.Momentum",
"page": "Optimisers",
"title": "Flux.Optimise.Momentum",
"category": "Function",
"text": "Momentum(params, ρ, decay = 0)\n\nSGD with momentum ρ and optional learning rate decay.\n\n\n\n"
},
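For reference, the textbook momentum update; Flux's internal implementation may differ in detail, and momentum_step! is illustrative only.

function momentum_step!(p, v, δp; η = 0.01, ρ = 0.9)
    v .= ρ .* v .- η .* δp  # velocity: a decaying sum of past gradient steps
    p .+= v
    return p
end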
{
"location": "training/optimisers.html#Flux.Optimise.Nesterov",
"page": "Optimisers",
"title": "Flux.Optimise.Nesterov",
"category": "Function",
"text": "Nesterov(params, ρ, decay = 0)\n\nSGD with Nesterov momentum ρ and optional learning rate decay.\n\n\n\n"
},
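Nesterov momentum evaluates the gradient at a look-ahead point; a sketch of the commonly used rewritten form, as a hypothetical nesterov_step!, follows.

function nesterov_step!(p, v, δp; η = 0.01, ρ = 0.9)
    v_prev = copy(v)
    v .= ρ .* v .- η .* δp
    p .+= -ρ .* v_prev .+ (1 + ρ) .* v  # look-ahead correction term
    return p
end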
{
"location": "training/optimisers.html#Flux.Optimise.RMSProp",
"page": "Optimisers",
"title": "Flux.Optimise.RMSProp",
"category": "Function",
"text": "RMSProp(params; η = 0.001, ρ = 0.9, ϵ = 1e-8, decay = 0)\n\nRMSProp optimiser. Parameters other than learning rate don't need tuning. Often a good choice for recurrent networks.\n\n\n\n"
},
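The textbook rule behind this docstring, as a hypothetical rmsprop_step!: each parameter's step is scaled by a running average of its own squared gradients.

function rmsprop_step!(p, acc, δp; η = 0.001, ρ = 0.9, ϵ = 1e-8)
    acc .= ρ .* acc .+ (1 - ρ) .* δp .^ 2  # running mean of squared gradients
    p .-= η .* δp ./ (sqrt.(acc) .+ ϵ)     # recent large gradients shrink the step
    return p
end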
{
"location": "training/optimisers.html#Flux.Optimise.ADAM",
"page": "Optimisers",
"title": "Flux.Optimise.ADAM",
"category": "Function",
"text": "ADAM(params; η = 0.001, β1 = 0.9, β2 = 0.999, ϵ = 1e-08, decay = 0)\n\nADAM optimiser.\n\n\n\n"
},
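The textbook ADAM update with the β1/β2 bias correction, as a hypothetical adam_step!; t is the step count, starting at 1. This sketches the published rule, not Flux's internal code.

function adam_step!(p, m, v, δp, t; η = 0.001, β1 = 0.9, β2 = 0.999, ϵ = 1e-8)
    m .= β1 .* m .+ (1 - β1) .* δp       # first moment: mean of gradients
    v .= β2 .* v .+ (1 - β2) .* δp .^ 2  # second moment: uncentred variance
    mhat = m ./ (1 - β1^t)               # bias correction for early steps
    vhat = v ./ (1 - β2^t)
    p .-= η .* mhat ./ (sqrt.(vhat) .+ ϵ)
    return p
end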
{
"location": "training/optimisers.html#Flux.Optimise.ADAGrad",
"page": "Optimisers",
"title": "Flux.Optimise.ADAGrad",
"category": "Function",
"text": "ADAGrad(params; η = 0.01, ϵ = 1e-8, decay = 0)\n\nADAGrad optimiser. Parameters don't need tuning.\n\n\n\n"
},
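The textbook ADAGrad rule, as a hypothetical adagrad_step!: squared gradients accumulate without decay, so each parameter's effective rate only shrinks, which is why the docstring says the parameters don't need tuning.

function adagrad_step!(p, acc, δp; η = 0.01, ϵ = 1e-8)
    acc .+= δp .^ 2                     # lifetime sum of squared gradients
    p .-= η .* δp ./ (sqrt.(acc) .+ ϵ)  # per-parameter steps shrink over time
    return p
end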
{
"location": "training/optimisers.html#Flux.Optimise.ADADelta",
"page": "Optimisers",
"title": "Flux.Optimise.ADADelta",
"category": "Function",
"text": "ADADelta(params; η = 0.01, ρ = 0.95, ϵ = 1e-8, decay = 0)\n\nADADelta optimiser. Parameters don't need tuning.\n\n\n\n"
},
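The textbook ADADelta rule, as a hypothetical adadelta_step!; note the classic formulation derives its step size entirely from running averages, whereas the docstring above also lists an η argument.

function adadelta_step!(p, acc_g, acc_Δ, δp; ρ = 0.95, ϵ = 1e-8)
    acc_g .= ρ .* acc_g .+ (1 - ρ) .* δp .^ 2
    Δ = δp .* sqrt.(acc_Δ .+ ϵ) ./ sqrt.(acc_g .+ ϵ)  # unit-consistent step
    p .-= Δ
    acc_Δ .= ρ .* acc_Δ .+ (1 - ρ) .* Δ .^ 2          # track squared updates too
    return p
end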
{
"location": "training/optimisers.html#Optimiser-Reference-1",
"page": "Optimisers",
"title": "Optimiser Reference",
"category": "section",
"text": "SGD\nMomentum\nNesterov\nRMSProp\nADAM\nADAGrad\nADADelta"
},
{
"location": "training/training.html#",
"page": "Training",
File diff suppressed because one or more lines are too long