build based on 7426faf
parent 253df58fab, commit dc14435678
@@ -11,4 +11,4 @@ m(5) == 26
m = Chain(Dense(10, 5), Dense(5, 2))
x = rand(10)
m(x) == m[2](m[1](x))</code></pre><p><code>Chain</code> also supports indexing and slicing, e.g. <code>m[2]</code> or <code>m[1:end-1]</code>. <code>m[1:3](x)</code> will calculate the output of the first three layers.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/b26f77489e2b7176ab59330eb3bc2ddb00ce26bd/src/layers/basic.jl#L1-L16">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Dense" href="#Flux.Dense"><code>Flux.Dense</code></a> — <span class="docstring-category">Type</span>.</div><div><pre><code class="language-none">Dense(in::Integer, out::Integer, σ = identity)</code></pre><p>Creates a traditional <code>Dense</code> layer with parameters <code>W</code> and <code>b</code>.</p><pre><code class="language-none">y = σ.(W * x .+ b)</code></pre><p>The input <code>x</code> must be a vector of length <code>in</code>, or a batch of vectors represented as an <code>in × N</code> matrix. The output <code>y</code> will be a vector of length <code>out</code>, or a batch represented as an <code>out × N</code> matrix.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/b26f77489e2b7176ab59330eb3bc2ddb00ce26bd/src/layers/basic.jl#L38-L47">source</a></section><footer><hr/><a class="previous" href="recurrence.html"><span class="direction">Previous</span><span class="title">Recurrence</span></a><a class="next" href="../training/optimisers.html"><span class="direction">Next</span><span class="title">Optimisers</span></a></footer></article></body></html>
m(x) == m[2](m[1](x))</code></pre><p><code>Chain</code> also supports indexing and slicing, e.g. <code>m[2]</code> or <code>m[1:end-1]</code>. <code>m[1:3](x)</code> will calculate the output of the first three layers.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/7426faf37dc2eb75ea56ebb6312c248e487fdee9/src/layers/basic.jl#L1-L16">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Dense" href="#Flux.Dense"><code>Flux.Dense</code></a> — <span class="docstring-category">Type</span>.</div><div><pre><code class="language-none">Dense(in::Integer, out::Integer, σ = identity)</code></pre><p>Creates a traditional <code>Dense</code> layer with parameters <code>W</code> and <code>b</code>.</p><pre><code class="language-none">y = σ.(W * x .+ b)</code></pre><p>The input <code>x</code> must be a vector of length <code>in</code>, or a batch of vectors represented as an <code>in × N</code> matrix. The output <code>y</code> will be a vector of length <code>out</code>, or a batch represented as an <code>out × N</code> matrix.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/7426faf37dc2eb75ea56ebb6312c248e487fdee9/src/layers/basic.jl#L38-L47">source</a></section><footer><hr/><a class="previous" href="recurrence.html"><span class="direction">Previous</span><span class="title">Recurrence</span></a><a class="next" href="../training/optimisers.html"><span class="direction">Next</span><span class="title">Optimisers</span></a></footer></article></body></html>
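A quick sketch of the indexing behaviour the Chain docstring describes; the layer sizes here are arbitrary examples.

using Flux

m = Chain(Dense(10, 5, σ), Dense(5, 2), softmax)
x = rand(10)
m(x) == m[3](m[2](m[1](x)))  # a Chain is just function composition
m[1:2](x) == m[2](m[1](x))   # slicing yields a Chain over the first two layers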
@@ -160,6 +160,70 @@ var documenterSearchIndex = {"docs": [
"text": "Consider a simple linear regression. We create some dummy data, calculate a loss, and backpropagate to calculate gradients for the parameters W and b.W = param(rand(2, 5))\nb = param(rand(2))\n\npredict(x) = W*x .+ b\nloss(x, y) = sum((predict(x) .- y).^2)\n\nx, y = rand(5), rand(2) # Dummy data\nl = loss(x, y) # ~ 3\nback!(l)We want to update each parameter, using the gradient, in order to improve (reduce) the loss. Here's one way to do that:using Flux.Tracker: data, grad\n\nfunction update()\n η = 0.1 # Learning Rate\n for p in (W, b)\n x, Δ = data(p), grad(p)\n x .-= η .* Δ # Apply the update\n Δ .= 0 # Clear the gradient\n end\nendIf we call update, the parameters W and b will change and our loss should go down.There are two pieces here: one is that we need a list of trainable parameters for the model ([W, b] in this case), and the other is the update step. In this case the update is simply gradient descent (x .-= η .* Δ), but we might choose to do something more advanced, like adding momentum.In this case, getting the variables is trivial, but you can imagine it'd be more of a pain with some complex stack of layers.m = Chain(\n Dense(10, 5, σ),\n Dense(5, 2), softmax)Instead of having to write [m[1].W, m[1].b, ...], Flux provides a params function params(m) that returns a list of all parameters in the model for you.For the update step, there's nothing whatsoever wrong with writing the loop above – it'll work just fine – but Flux provides various optimisers that make it more convenient.opt = SGD([W, b], 0.1) # Gradient descent with learning rate 0.1\n\nopt()An optimiser takes a parameter list and returns a function that does the same thing as update above. We can pass either opt or update to our training loop, which will then run the optimiser after every mini-batch of data."
},
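A minimal runnable sketch of the optimiser-as-closure pattern the entry above describes, using this Flux version's Tracker API; make_sgd is an illustrative name, not part of Flux.

using Flux
using Flux.Tracker: data, grad

W = param(rand(2, 5))
b = param(rand(2))
predict(x) = W*x .+ b
loss(x, y) = sum((predict(x) .- y).^2)

# An optimiser is a closure over a parameter list: calling it applies one
# update step and clears the gradients, just like `update` above.
make_sgd(ps, η) = function ()
    for p in ps
        x, Δ = data(p), grad(p)
        x .-= η .* Δ  # apply the update
        Δ .= 0        # clear the gradient
    end
end

opt = make_sgd([W, b], 0.1)
x, y = rand(5), rand(2)
back!(loss(x, y))
opt()  # loss(x, y) should now be lower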
{
"location": "training/optimisers.html#Flux.Optimise.SGD",
"page": "Optimisers",
"title": "Flux.Optimise.SGD",
"category": "Function",
"text": "SGD(params, η = 1; decay = 0)\n\nClassic gradient descent optimiser. For each parameter p and its gradient δp, this runs p -= η*δp.\n\nSupports decayed learning rate decay if the decay argument is provided.\n\n\n\n"
},
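The docstring's rule p -= η*δp written out in plain Julia; sgd_step! is an illustrative name, and the decay schedule is not modelled here.

function sgd_step!(p, δp; η = 1.0)
    p .-= η .* δp  # step against the gradient
    return p
end

p = [1.0, 2.0]
sgd_step!(p, [0.5, -0.5]; η = 0.1)  # p is now [0.95, 2.05]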
{
"location": "training/optimisers.html#Flux.Optimise.Momentum",
"page": "Optimisers",
"title": "Flux.Optimise.Momentum",
"category": "Function",
"text": "Momentum(params, ρ, decay = 0)\n\nSGD with momentum ρ and optional learning rate decay.\n\n\n\n"
},
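For reference, the textbook momentum update; Flux's internal implementation may differ in detail, and momentum_step! is illustrative only.

function momentum_step!(p, v, δp; η = 0.01, ρ = 0.9)
    v .= ρ .* v .- η .* δp  # velocity: a decaying sum of past gradient steps
    p .+= v
    return p
end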
{
"location": "training/optimisers.html#Flux.Optimise.Nesterov",
"page": "Optimisers",
"title": "Flux.Optimise.Nesterov",
"category": "Function",
"text": "Nesterov(params, ρ, decay = 0)\n\nSGD with Nesterov momentum ρ and optional learning rate decay.\n\n\n\n"
},
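Nesterov momentum evaluates the gradient at a look-ahead point; a sketch of the commonly used rewritten form, as a hypothetical nesterov_step!, follows.

function nesterov_step!(p, v, δp; η = 0.01, ρ = 0.9)
    v_prev = copy(v)
    v .= ρ .* v .- η .* δp
    p .+= -ρ .* v_prev .+ (1 + ρ) .* v  # look-ahead correction term
    return p
end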
{
"location": "training/optimisers.html#Flux.Optimise.RMSProp",
"page": "Optimisers",
"title": "Flux.Optimise.RMSProp",
"category": "Function",
"text": "RMSProp(params; η = 0.001, ρ = 0.9, ϵ = 1e-8, decay = 0)\n\nRMSProp optimiser. Parameters other than learning rate don't need tuning. Often a good choice for recurrent networks.\n\n\n\n"
},
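The textbook rule behind this docstring, as a hypothetical rmsprop_step!: each parameter's step is scaled by a running average of its own squared gradients.

function rmsprop_step!(p, acc, δp; η = 0.001, ρ = 0.9, ϵ = 1e-8)
    acc .= ρ .* acc .+ (1 - ρ) .* δp .^ 2  # running mean of squared gradients
    p .-= η .* δp ./ (sqrt.(acc) .+ ϵ)     # recent large gradients shrink the step
    return p
end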
{
"location": "training/optimisers.html#Flux.Optimise.ADAM",
"page": "Optimisers",
"title": "Flux.Optimise.ADAM",
"category": "Function",
"text": "ADAM(params; η = 0.001, β1 = 0.9, β2 = 0.999, ϵ = 1e-08, decay = 0)\n\nADAM optimiser.\n\n\n\n"
},
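The textbook ADAM update with the β1/β2 bias correction, as a hypothetical adam_step!; t is the step count, starting at 1. This sketches the published rule, not Flux's internal code.

function adam_step!(p, m, v, δp, t; η = 0.001, β1 = 0.9, β2 = 0.999, ϵ = 1e-8)
    m .= β1 .* m .+ (1 - β1) .* δp       # first moment: mean of gradients
    v .= β2 .* v .+ (1 - β2) .* δp .^ 2  # second moment: uncentred variance
    mhat = m ./ (1 - β1^t)               # bias correction for early steps
    vhat = v ./ (1 - β2^t)
    p .-= η .* mhat ./ (sqrt.(vhat) .+ ϵ)
    return p
end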
{
"location": "training/optimisers.html#Flux.Optimise.ADAGrad",
"page": "Optimisers",
"title": "Flux.Optimise.ADAGrad",
"category": "Function",
"text": "ADAGrad(params; η = 0.01, ϵ = 1e-8, decay = 0)\n\nADAGrad optimiser. Parameters don't need tuning.\n\n\n\n"
},
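The textbook ADAGrad rule, as a hypothetical adagrad_step!: squared gradients accumulate without decay, so each parameter's effective rate only shrinks, which is why the docstring says the parameters don't need tuning.

function adagrad_step!(p, acc, δp; η = 0.01, ϵ = 1e-8)
    acc .+= δp .^ 2                     # lifetime sum of squared gradients
    p .-= η .* δp ./ (sqrt.(acc) .+ ϵ)  # per-parameter steps shrink over time
    return p
end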
{
"location": "training/optimisers.html#Flux.Optimise.ADADelta",
"page": "Optimisers",
"title": "Flux.Optimise.ADADelta",
"category": "Function",
"text": "ADADelta(params; η = 0.01, ρ = 0.95, ϵ = 1e-8, decay = 0)\n\nADADelta optimiser. Parameters don't need tuning.\n\n\n\n"
},
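The textbook ADADelta rule, as a hypothetical adadelta_step!; note the classic formulation derives its step size entirely from running averages, whereas the docstring above also lists an η argument.

function adadelta_step!(p, acc_g, acc_Δ, δp; ρ = 0.95, ϵ = 1e-8)
    acc_g .= ρ .* acc_g .+ (1 - ρ) .* δp .^ 2
    Δ = δp .* sqrt.(acc_Δ .+ ϵ) ./ sqrt.(acc_g .+ ϵ)  # unit-consistent step
    p .-= Δ
    acc_Δ .= ρ .* acc_Δ .+ (1 - ρ) .* Δ .^ 2          # track squared updates too
    return p
end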
{
"location": "training/optimisers.html#Optimiser-Reference-1",
"page": "Optimisers",
"title": "Optimiser Reference",
"category": "section",
"text": "SGD\nMomentum\nNesterov\nRMSProp\nADAM\nADAGrad\nADADelta"
},
{
"location": "training/training.html#",
"page": "Training",
File diff suppressed because one or more lines are too long