build based on 7426faf

This commit is contained in:
autodocs 2017-10-18 11:15:49 +00:00
parent 253df58fab
commit dc14435678
3 changed files with 67 additions and 3 deletions

View File

@@ -11,4 +11,4 @@ m(5) == 26
m = Chain(Dense(10, 5), Dense(5, 2))
x = rand(10)
m(x) == m[2](m[1](x))</code></pre><p><code>Chain</code> also supports indexing and slicing, e.g. <code>m[2]</code> or <code>m[1:end-1]</code>. <code>m[1:3](x)</code> will calculate the output of the first three layers.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/b26f77489e2b7176ab59330eb3bc2ddb00ce26bd/src/layers/basic.jl#L1-L16">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Dense" href="#Flux.Dense"><code>Flux.Dense</code></a><span class="docstring-category">Type</span>.</div><div><pre><code class="language-none">Dense(in::Integer, out::Integer, σ = identity)</code></pre><p>Creates a traditional <code>Dense</code> layer with parameters <code>W</code> and <code>b</code>.</p><pre><code class="language-none">y = σ.(W * x .+ b)</code></pre><p>The input <code>x</code> must be a vector of length <code>in</code>, or a batch of vectors represented as an <code>in × N</code> matrix. The output <code>y</code> will be a vector or batch of length <code>out</code>.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/b26f77489e2b7176ab59330eb3bc2ddb00ce26bd/src/layers/basic.jl#L38-L47">source</a></section><footer><hr/><a class="previous" href="recurrence.html"><span class="direction">Previous</span><span class="title">Recurrence</span></a><a class="next" href="../training/optimisers.html"><span class="direction">Next</span><span class="title">Optimisers</span></a></footer></article></body></html>
m(x) == m[2](m[1](x))</code></pre><p><code>Chain</code> also supports indexing and slicing, e.g. <code>m[2]</code> or <code>m[1:end-1]</code>. <code>m[1:3](x)</code> will calculate the output of the first three layers.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/7426faf37dc2eb75ea56ebb6312c248e487fdee9/src/layers/basic.jl#L1-L16">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Dense" href="#Flux.Dense"><code>Flux.Dense</code></a><span class="docstring-category">Type</span>.</div><div><pre><code class="language-none">Dense(in::Integer, out::Integer, σ = identity)</code></pre><p>Creates a traditional <code>Dense</code> layer with parameters <code>W</code> and <code>b</code>.</p><pre><code class="language-none">y = σ.(W * x .+ b)</code></pre><p>The input <code>x</code> must be a vector of length <code>in</code>, or a batch of vectors represented as an <code>in × N</code> matrix. The output <code>y</code> will be a vector or batch of length <code>out</code>.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/7426faf37dc2eb75ea56ebb6312c248e487fdee9/src/layers/basic.jl#L38-L47">source</a></section><footer><hr/><a class="previous" href="recurrence.html"><span class="direction">Previous</span><span class="title">Recurrence</span></a><a class="next" href="../training/optimisers.html"><span class="direction">Next</span><span class="title">Optimisers</span></a></footer></article></body></html>
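
The docstrings in the diff above can be exercised with a short sketch. The layer sizes and the batch width of 7 are arbitrary; only the behaviour the docstrings describe (indexing, slicing, vector vs. `in × N` batch input) is assumed.

```julia
using Flux

m = Chain(Dense(10, 5), Dense(5, 2))   # Dense(in, out, σ = identity)

x = rand(10)                 # a single input vector of length `in`
m(x) == m[2](m[1](x))        # indexing a Chain applies the selected layers
m[1:2](x) == m(x)            # slicing returns a sub-Chain; here it is the whole model

X = rand(10, 7)              # a batch of 7 inputs as an `in × N` matrix
size(m(X)) == (2, 7)         # the output is an `out × N` matrix
```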

View File

@@ -160,6 +160,70 @@ var documenterSearchIndex = {"docs": [
"text": "Consider a simple linear regression. We create some dummy data, calculate a loss, and backpropagate to calculate gradients for the parameters W and b.W = param(rand(2, 5))\nb = param(rand(2))\n\npredict(x) = W*x .+ b\nloss(x, y) = sum((predict(x) .- y).^2)\n\nx, y = rand(5), rand(2) # Dummy data\nl = loss(x, y) # ~ 3\nback!(l)We want to update each parameter, using the gradient, in order to improve (reduce) the loss. Here's one way to do that:using Flux.Tracker: data, grad\n\nfunction update()\n η = 0.1 # Learning Rate\n for p in (W, b)\n x, Δ = data(p), grad(p)\n x .-= η .* Δ # Apply the update\n Δ .= 0 # Clear the gradient\n end\nendIf we call update, the parameters W and b will change and our loss should go down.There are two pieces here: one is that we need a list of trainable parameters for the model ([W, b] in this case), and the other is the update step. In this case the update is simply gradient descent (x .-= η .* Δ), but we might choose to do something more advanced, like adding momentum.In this case, getting the variables is trivial, but you can imagine it'd be more of a pain with some complex stack of layers.m = Chain(\n Dense(10, 5, σ),\n Dense(5, 2), softmax)Instead of having to write [m[1].W, m[1].b, ...], Flux provides a params function params(m) that returns a list of all parameters in the model for you.For the update step, there's nothing whatsoever wrong with writing the loop above it'll work just fine but Flux provides various optimisers that make it more convenient.opt = SGD([W, b], 0.1) # Gradient descent with learning rate 0.1\n\nopt()An optimiser takes a parameter list and returns a function that does the same thing as update above. We can pass either opt or update to our training loop, which will then run the optimiser after every mini-batch of data."
},
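
Collected into one runnable sketch, the snippets in the entry above look like this. The names, values, the Tracker-era `param`/`back!`/`data`/`grad` API and the `SGD` call are all taken from the text itself; only gathering them into one script is new.

```julia
using Flux
using Flux.Tracker: back!, data, grad

W = param(rand(2, 5))
b = param(rand(2))

predict(x) = W*x .+ b
loss(x, y) = sum((predict(x) .- y).^2)

x, y = rand(5), rand(2)   # dummy data
l = loss(x, y)            # ~ 3
back!(l)                  # populate grad(W) and grad(b)

# One manual gradient-descent step, as in the text.
function update()
  η = 0.1                 # learning rate
  for p in (W, b)
    x, Δ = data(p), grad(p)
    x .-= η .* Δ          # apply the update
    Δ .= 0                # clear the gradient
  end
end
update()

# The built-in optimiser does the same job:
opt = SGD([W, b], 0.1)    # gradient descent with learning rate 0.1
back!(loss(x, y))
opt()
```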
{
"location": "training/optimisers.html#Flux.Optimise.SGD",
"page": "Optimisers",
"title": "Flux.Optimise.SGD",
"category": "Function",
"text": "SGD(params, η = 1; decay = 0)\n\nClassic gradient descent optimiser. For each parameter p and its gradient δp, this runs p -= η*δp.\n\nSupports decayed learning rate decay if the decay argument is provided.\n\n\n\n"
},
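
A brief construction sketch for the SGD signature above. The model is hypothetical, `params(m)` is the helper described in the page text, and the decay value is an arbitrary choice.

```julia
using Flux

m = Chain(Dense(10, 5, σ), Dense(5, 2), softmax)    # hypothetical model

opt = SGD(params(m), 0.1)                      # each call runs p .-= 0.1 .* δp
opt_decay = SGD(params(m), 0.1; decay = 1e-4)  # additionally decays the learning rate
```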
{
"location": "training/optimisers.html#Flux.Optimise.Momentum",
"page": "Optimisers",
"title": "Flux.Optimise.Momentum",
"category": "Function",
"text": "Momentum(params, ρ, decay = 0)\n\nSGD with momentum ρ and optional learning rate decay.\n\n\n\n"
},
{
"location": "training/optimisers.html#Flux.Optimise.Nesterov",
"page": "Optimisers",
"title": "Flux.Optimise.Nesterov",
"category": "Function",
"text": "Nesterov(params, ρ, decay = 0)\n\nSGD with Nesterov momentum ρ and optional learning rate decay.\n\n\n\n"
},
{
"location": "training/optimisers.html#Flux.Optimise.RMSProp",
"page": "Optimisers",
"title": "Flux.Optimise.RMSProp",
"category": "Function",
"text": "RMSProp(params; η = 0.001, ρ = 0.9, ϵ = 1e-8, decay = 0)\n\nRMSProp optimiser. Parameters other than learning rate don't need tuning. Often a good choice for recurrent networks.\n\n\n\n"
},
{
"location": "training/optimisers.html#Flux.Optimise.ADAM",
"page": "Optimisers",
"title": "Flux.Optimise.ADAM",
"category": "Function",
"text": "ADAM(params; η = 0.001, β1 = 0.9, β2 = 0.999, ϵ = 1e-08, decay = 0)\n\nADAM optimiser.\n\n\n\n"
},
{
"location": "training/optimisers.html#Flux.Optimise.ADAGrad",
"page": "Optimisers",
"title": "Flux.Optimise.ADAGrad",
"category": "Function",
"text": "ADAGrad(params; η = 0.01, ϵ = 1e-8, decay = 0)\n\nADAGrad optimiser. Parameters don't need tuning.\n\n\n\n"
},
{
"location": "training/optimisers.html#Flux.Optimise.ADADelta",
"page": "Optimisers",
"title": "Flux.Optimise.ADADelta",
"category": "Function",
"text": "ADADelta(params; η = 0.01, ρ = 0.95, ϵ = 1e-8, decay = 0)\n\nADADelta optimiser. Parameters don't need tuning.\n\n\n\n"
},
{
"location": "training/optimisers.html#Optimiser-Reference-1",
"page": "Optimisers",
"title": "Optimiser Reference",
"category": "section",
"text": "SGD\nMomentum\nNesterov\nRMSProp\nADAM\nADAGrad\nADADelta"
},
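
For reference, a sketch constructing the remaining optimisers listed above with their documented signatures. `m` is a hypothetical model, `params(m)` comes from the page text, the momentum value 0.9 is an arbitrary choice, and the commented hyperparameters are simply the documented defaults.

```julia
using Flux
using Flux.Optimise: Momentum, Nesterov, RMSProp, ADAM, ADAGrad, ADADelta

m  = Chain(Dense(10, 5, σ), Dense(5, 2), softmax)   # hypothetical model
ps = params(m)                                       # its trainable parameters

opt_momentum = Momentum(ps, 0.9)    # SGD with momentum ρ (0.9 chosen arbitrarily)
opt_nesterov = Nesterov(ps, 0.9)    # SGD with Nesterov momentum ρ
opt_rmsprop  = RMSProp(ps)          # defaults: η = 0.001, ρ = 0.9, ϵ = 1e-8
opt_adam     = ADAM(ps)             # defaults: η = 0.001, β1 = 0.9, β2 = 0.999
opt_adagrad  = ADAGrad(ps)          # default:  η = 0.01
opt_adadelta = ADADelta(ps)         # defaults: η = 0.01, ρ = 0.95

# Each optimiser is a callable that applies one update to `ps`,
# playing the role of the manual update step described in the page text.
```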
{
"location": "training/training.html#",
"page": "Training",

File diff suppressed because one or more lines are too long