From b173894f077c39c1bc31427d23c5026b6b975ecc Mon Sep 17 00:00:00 2001
From: autodocs <autodocs>
Date: Thu, 19 Oct 2017 10:29:43 +0000
Subject: [PATCH] build based on e5c8f6d

---
 latest/models/layers.html       |  6 ++--
 latest/search_index.js          | 56 ---------------------------------
 latest/training/optimisers.html |  8 ++++-
 3 files changed, 10 insertions(+), 60 deletions(-)
diff --git a/latest/models/layers.html b/latest/models/layers.html
index 9bf6917b..1f936a77 100644
--- a/latest/models/layers.html
+++ b/latest/models/layers.html
@@ -11,16 +11,16 @@ m(5) == 26
 
 m = Chain(Dense(10, 5), Dense(5, 2))
 x = rand(10)
-m(x) == m[2](m[1](x))</code></pre><p><code>Chain</code> also supports indexing and slicing, e.g. <code>m[2]</code> or <code>m[1:end-1]</code>. <code>m[1:3](x)</code> will calculate the output of the first three layers.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/5b6a5667ed31d23c7413cca6f149344f9e56c10b/src/layers/basic.jl#L1-L18">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Dense" href="#Flux.Dense"><code>Flux.Dense</code></a> — <span class="docstring-category">Type</span>.</div><div><pre><code class="language-none">Dense(in::Integer, out::Integer, σ = identity)</code></pre><p>Creates a traditional <code>Dense</code> layer with parameters <code>W</code> and <code>b</code>.</p><pre><code class="language-none">y = σ.(W * x .+ b)</code></pre><p>The input <code>x</code> must be a vector of length <code>in</code>, or a batch of vectors represented as an <code>in × N</code> matrix. The out <code>y</code> will be a vector or batch of length <code>out</code>.</p><pre><code class="language-julia">julia&gt; d = Dense(5, 2)
+m(x) == m[2](m[1](x))</code></pre><p><code>Chain</code> also supports indexing and slicing, e.g. <code>m[2]</code> or <code>m[1:end-1]</code>. <code>m[1:3](x)</code> will calculate the output of the first three layers.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/e5c8f6d835fbb22857c2126ff76064077b106659/src/layers/basic.jl#L1-L18">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Dense" href="#Flux.Dense"><code>Flux.Dense</code></a> — <span class="docstring-category">Type</span>.</div><div><pre><code class="language-none">Dense(in::Integer, out::Integer, σ = identity)</code></pre><p>Creates a traditional <code>Dense</code> layer with parameters <code>W</code> and <code>b</code>.</p><pre><code class="language-none">y = σ.(W * x .+ b)</code></pre><p>The input <code>x</code> must be a vector of length <code>in</code>, or a batch of vectors represented as an <code>in × N</code> matrix. The output <code>y</code> will be a vector or batch of length <code>out</code>.</p><pre><code class="language-julia">julia&gt; d = Dense(5, 2)
 Dense(5, 2)
 
 julia&gt; d(rand(5))
 Tracked 2-element Array{Float64,1}:
   0.00257447
-  -0.00449443</code></pre></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/5b6a5667ed31d23c7413cca6f149344f9e56c10b/src/layers/basic.jl#L40-L59">source</a></section><h2><a class="nav-anchor" id="Recurrent-Cells-1" href="#Recurrent-Cells-1">Recurrent Cells</a></h2><p>Much like the core layers above, but can be used to process sequence data (as well as other kinds of structured data).</p><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.RNN" href="#Flux.RNN"><code>Flux.RNN</code></a> — <span class="docstring-category">Function</span>.</div><div><pre><code class="language-none">RNN(in::Integer, out::Integer, σ = tanh)</code></pre><p>The most basic recurrent layer; essentially acts as a <code>Dense</code> layer, but with the output fed back into the input each time step.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/5b6a5667ed31d23c7413cca6f149344f9e56c10b/src/layers/recurrent.jl#L75-L80">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.LSTM" href="#Flux.LSTM"><code>Flux.LSTM</code></a> — <span class="docstring-category">Function</span>.</div><div><pre><code class="language-none">LSTM(in::Integer, out::Integer, σ = tanh)</code></pre><p>Long Short Term Memory recurrent layer. 
Behaves like an RNN but generally exhibits a longer memory span over sequences.</p><p>See <a href="http://colah.github.io/posts/2015-08-Understanding-LSTMs/">this article</a> for a good overview of the internals.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/5b6a5667ed31d23c7413cca6f149344f9e56c10b/src/layers/recurrent.jl#L120-L128">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Recur" href="#Flux.Recur"><code>Flux.Recur</code></a> — <span class="docstring-category">Type</span>.</div><div><pre><code class="language-none">Recur(cell)</code></pre><p><code>Recur</code> takes a recurrent cell and makes it stateful, managing the hidden state in the background. <code>cell</code> should be a model of the form:</p><pre><code class="language-none">h, y = cell(h, x...)</code></pre><p>For example, here&#39;s a recurrent network that keeps a running total of its inputs.</p><pre><code class="language-julia">accum(h, x) = (h+x, x)
+  -0.00449443</code></pre></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/e5c8f6d835fbb22857c2126ff76064077b106659/src/layers/basic.jl#L40-L59">source</a></section><h2><a class="nav-anchor" id="Recurrent-Cells-1" href="#Recurrent-Cells-1">Recurrent Cells</a></h2><p>Much like the core layers above, but can be used to process sequence data (as well as other kinds of structured data).</p><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.RNN" href="#Flux.RNN"><code>Flux.RNN</code></a> — <span class="docstring-category">Function</span>.</div><div><pre><code class="language-none">RNN(in::Integer, out::Integer, σ = tanh)</code></pre><p>The most basic recurrent layer; essentially acts as a <code>Dense</code> layer, but with the output fed back into the input each time step.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/e5c8f6d835fbb22857c2126ff76064077b106659/src/layers/recurrent.jl#L75-L80">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.LSTM" href="#Flux.LSTM"><code>Flux.LSTM</code></a> — <span class="docstring-category">Function</span>.</div><div><pre><code class="language-none">LSTM(in::Integer, out::Integer, σ = tanh)</code></pre><p>Long Short Term Memory recurrent layer. 
Behaves like an RNN but generally exhibits a longer memory span over sequences.</p><p>See <a href="http://colah.github.io/posts/2015-08-Understanding-LSTMs/">this article</a> for a good overview of the internals.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/e5c8f6d835fbb22857c2126ff76064077b106659/src/layers/recurrent.jl#L120-L128">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Recur" href="#Flux.Recur"><code>Flux.Recur</code></a> — <span class="docstring-category">Type</span>.</div><div><pre><code class="language-none">Recur(cell)</code></pre><p><code>Recur</code> takes a recurrent cell and makes it stateful, managing the hidden state in the background. <code>cell</code> should be a model of the form:</p><pre><code class="language-none">h, y = cell(h, x...)</code></pre><p>For example, here&#39;s a recurrent network that keeps a running total of its inputs.</p><pre><code class="language-julia">accum(h, x) = (h+x, x)
 rnn = Flux.Recur(accum, 0)
 rnn(2) # 2
 rnn(3) # 3
 rnn.state # 5
 rnn.(1:10) # apply to a sequence
-rnn.state # 60</code></pre></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/5b6a5667ed31d23c7413cca6f149344f9e56c10b/src/layers/recurrent.jl#L6-L25">source</a></section><h2><a class="nav-anchor" id="Activation-Functions-1" href="#Activation-Functions-1">Activation Functions</a></h2><p>Non-linearities that go between layers of your model. Most of these functions are defined in <a href="https://github.com/FluxML/NNlib.jl">NNlib</a> but are available by default in Flux.</p><p>Note that, unless otherwise stated, activation functions operate on scalars. To apply them to an array you can call <code>σ.(xs)</code>, <code>relu.(xs)</code> and so on.</p><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.σ" href="#NNlib.σ"><code>NNlib.σ</code></a> — <span class="docstring-category">Function</span>.</div><div><pre><code class="language-none">σ(x) = 1 / (1 + exp(-x))</code></pre><p>Classic <a href="https://en.wikipedia.org/wiki/Sigmoid_function">sigmoid</a> activation function.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/e4b48c1f41b2786ae5d1efef1ba54ff82eeeb49c/src/activation.jl#L1-L6">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.relu" href="#NNlib.relu"><code>NNlib.relu</code></a> — <span class="docstring-category">Function</span>.</div><div><pre><code class="language-none">relu(x) = max(0, x)</code></pre><p><a href="https://en.wikipedia.org/wiki/Rectifier_(neural_networks)">Rectified Linear Unit</a> activation function.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/e4b48c1f41b2786ae5d1efef1ba54ff82eeeb49c/src/activation.jl#L12-L17">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.leakyrelu" href="#NNlib.leakyrelu"><code>NNlib.leakyrelu</code></a> — <span 
class="docstring-category">Function</span>.</div><div><pre><code class="language-none">leakyrelu(x) = max(0.01x, x)</code></pre><p>Leaky <a href="https://en.wikipedia.org/wiki/Rectifier_(neural_networks)">Rectified Linear Unit</a> activation function.</p><p>You can also specify the coefficient explicitly, e.g. <code>leakyrelu(x, 0.01)</code>.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/e4b48c1f41b2786ae5d1efef1ba54ff82eeeb49c/src/activation.jl#L20-L27">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.elu" href="#NNlib.elu"><code>NNlib.elu</code></a> — <span class="docstring-category">Function</span>.</div><div><pre><code class="language-none">elu(x; α = 1) = x &gt; 0 ? x : α * (exp(x) - one(x)</code></pre><p>Exponential Linear Unit activation function. See <a href="https://arxiv.org/abs/1511.07289">Fast and Accurate Deep Network Learning by Exponential Linear Units</a></p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/e4b48c1f41b2786ae5d1efef1ba54ff82eeeb49c/src/activation.jl#L30-L35">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.swish" href="#NNlib.swish"><code>NNlib.swish</code></a> — <span class="docstring-category">Function</span>.</div><div><pre><code class="language-none">swish(x) = x * σ(x)</code></pre><p>Self-gated actvation function.</p><p>See <a href="https://arxiv.org/pdf/1710.05941.pdf">Swish: a Self-Gated Activation Function</a>.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/e4b48c1f41b2786ae5d1efef1ba54ff82eeeb49c/src/activation.jl#L38-L44">source</a></section><footer><hr/><a class="previous" href="recurrence.html"><span class="direction">Previous</span><span class="title">Recurrence</span></a><a class="next" href="../training/optimisers.html"><span 
class="direction">Next</span><span class="title">Optimisers</span></a></footer></article></body></html>
+rnn.state # 60</code></pre></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/e5c8f6d835fbb22857c2126ff76064077b106659/src/layers/recurrent.jl#L6-L25">source</a></section><h2><a class="nav-anchor" id="Activation-Functions-1" href="#Activation-Functions-1">Activation Functions</a></h2><p>Non-linearities that go between layers of your model. Most of these functions are defined in <a href="https://github.com/FluxML/NNlib.jl">NNlib</a> but are available by default in Flux.</p><p>Note that, unless otherwise stated, activation functions operate on scalars. To apply them to an array you can call <code>σ.(xs)</code>, <code>relu.(xs)</code> and so on.</p><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.σ" href="#NNlib.σ"><code>NNlib.σ</code></a> — <span class="docstring-category">Function</span>.</div><div><pre><code class="language-none">σ(x) = 1 / (1 + exp(-x))</code></pre><p>Classic <a href="https://en.wikipedia.org/wiki/Sigmoid_function">sigmoid</a> activation function.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/e4b48c1f41b2786ae5d1efef1ba54ff82eeeb49c/src/activation.jl#L1-L6">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.relu" href="#NNlib.relu"><code>NNlib.relu</code></a> — <span class="docstring-category">Function</span>.</div><div><pre><code class="language-none">relu(x) = max(0, x)</code></pre><p><a href="https://en.wikipedia.org/wiki/Rectifier_(neural_networks)">Rectified Linear Unit</a> activation function.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/e4b48c1f41b2786ae5d1efef1ba54ff82eeeb49c/src/activation.jl#L12-L17">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.leakyrelu" href="#NNlib.leakyrelu"><code>NNlib.leakyrelu</code></a> — <span 
class="docstring-category">Function</span>.</div><div><pre><code class="language-none">leakyrelu(x) = max(0.01x, x)</code></pre><p>Leaky <a href="https://en.wikipedia.org/wiki/Rectifier_(neural_networks)">Rectified Linear Unit</a> activation function.</p><p>You can also specify the coefficient explicitly, e.g. <code>leakyrelu(x, 0.01)</code>.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/e4b48c1f41b2786ae5d1efef1ba54ff82eeeb49c/src/activation.jl#L20-L27">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.elu" href="#NNlib.elu"><code>NNlib.elu</code></a> — <span class="docstring-category">Function</span>.</div><div><pre><code class="language-none">elu(x; α = 1) = x &gt; 0 ? x : α * (exp(x) - one(x))</code></pre><p>Exponential Linear Unit activation function. See <a href="https://arxiv.org/abs/1511.07289">Fast and Accurate Deep Network Learning by Exponential Linear Units</a>.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/e4b48c1f41b2786ae5d1efef1ba54ff82eeeb49c/src/activation.jl#L30-L35">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.swish" href="#NNlib.swish"><code>NNlib.swish</code></a> — <span class="docstring-category">Function</span>.</div><div><pre><code class="language-none">swish(x) = x * σ(x)</code></pre><p>Self-gated activation function.</p><p>See <a href="https://arxiv.org/pdf/1710.05941.pdf">Swish: a Self-Gated Activation Function</a>.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/NNlib.jl/blob/e4b48c1f41b2786ae5d1efef1ba54ff82eeeb49c/src/activation.jl#L38-L44">source</a></section><footer><hr/><a class="previous" href="recurrence.html"><span class="direction">Previous</span><span class="title">Recurrence</span></a><a class="next" href="../training/optimisers.html"><span 
class="direction">Next</span><span class="title">Optimisers</span></a></footer></article></body></html>
diff --git a/latest/search_index.js b/latest/search_index.js
index 34a8237a..8fc1c962 100644
--- a/latest/search_index.js
+++ b/latest/search_index.js
@@ -240,62 +240,6 @@ var documenterSearchIndex = {"docs": [
     "text": "Consider a simple linear regression. We create some dummy data, calculate a loss, and backpropagate to calculate gradients for the parameters W and b.W = param(rand(2, 5))\nb = param(rand(2))\n\npredict(x) = W*x .+ b\nloss(x, y) = sum((predict(x) .- y).^2)\n\nx, y = rand(5), rand(2) # Dummy data\nl = loss(x, y) # ~ 3\nback!(l)We want to update each parameter, using the gradient, in order to improve (reduce) the loss. Here's one way to do that:function update()\n  η = 0.1 # Learning Rate\n  for p in (W, b)\n    p.data .-= η .* p.grad # Apply the update\n    p.grad .= 0            # Clear the gradient\n  end\nendIf we call update, the parameters W and b will change and our loss should go down.There are two pieces here: one is that we need a list of trainable parameters for the model ([W, b] in this case), and the other is the update step. In this case the update is simply gradient descent (x .-= η .* Δ), but we might choose to do something more advanced, like adding momentum.In this case, getting the variables is trivial, but you can imagine it'd be more of a pain with some complex stack of layers.m = Chain(\n  Dense(10, 5, σ),\n  Dense(5, 2), softmax)Instead of having to write [m[1].W, m[1].b, ...], Flux provides a params function params(m) that returns a list of all parameters in the model for you.For the update step, there's nothing whatsoever wrong with writing the loop above – it'll work just fine – but Flux provides various optimisers that make it more convenient.opt = SGD([W, b], 0.1) # Gradient descent with learning rate 0.1\n\nopt() # Carry out the update, modifying `W` and `b`.An optimiser takes a parameter list and returns a function that does the same thing as update above. We can pass either opt or update to our training loop, which will then run the optimiser after every mini-batch of data."
 },
 
-{
-    "location": "training/optimisers.html#Flux.Optimise.SGD",
-    "page": "Optimisers",
-    "title": "Flux.Optimise.SGD",
-    "category": "Function",
-    "text": "SGD(params, η = 1; decay = 0)\n\nClassic gradient descent optimiser. For each parameter p and its gradient δp, this runs p -= η*δp.\n\nSupports decayed learning rate decay if the decay argument is provided.\n\n\n\n"
-},
-
-{
-    "location": "training/optimisers.html#Flux.Optimise.Momentum",
-    "page": "Optimisers",
-    "title": "Flux.Optimise.Momentum",
-    "category": "Function",
-    "text": "Momentum(params, ρ, decay = 0)\n\nSGD with momentum ρ and optional learning rate decay.\n\n\n\n"
-},
-
-{
-    "location": "training/optimisers.html#Flux.Optimise.Nesterov",
-    "page": "Optimisers",
-    "title": "Flux.Optimise.Nesterov",
-    "category": "Function",
-    "text": "Nesterov(params, ρ, decay = 0)\n\nSGD with Nesterov momentum ρ and optional learning rate decay.\n\n\n\n"
-},
-
-{
-    "location": "training/optimisers.html#Flux.Optimise.RMSProp",
-    "page": "Optimisers",
-    "title": "Flux.Optimise.RMSProp",
-    "category": "Function",
-    "text": "RMSProp(params; η = 0.001, ρ = 0.9, ϵ = 1e-8, decay = 0)\n\nRMSProp optimiser. Parameters other than learning rate don't need tuning. Often a good choice for recurrent networks.\n\n\n\n"
-},
-
-{
-    "location": "training/optimisers.html#Flux.Optimise.ADAM",
-    "page": "Optimisers",
-    "title": "Flux.Optimise.ADAM",
-    "category": "Function",
-    "text": "ADAM(params; η = 0.001, β1 = 0.9, β2 = 0.999, ϵ = 1e-08, decay = 0)\n\nADAM optimiser.\n\n\n\n"
-},
-
-{
-    "location": "training/optimisers.html#Flux.Optimise.ADAGrad",
-    "page": "Optimisers",
-    "title": "Flux.Optimise.ADAGrad",
-    "category": "Function",
-    "text": "ADAGrad(params; η = 0.01, ϵ = 1e-8, decay = 0)\n\nADAGrad optimiser. Parameters don't need tuning.\n\n\n\n"
-},
-
-{
-    "location": "training/optimisers.html#Flux.Optimise.ADADelta",
-    "page": "Optimisers",
-    "title": "Flux.Optimise.ADADelta",
-    "category": "Function",
-    "text": "ADADelta(params; η = 0.01, ρ = 0.95, ϵ = 1e-8, decay = 0)\n\nADADelta optimiser. Parameters don't need tuning.\n\n\n\n"
-},
-
 {
     "location": "training/optimisers.html#Optimiser-Reference-1",
     "page": "Optimisers",
diff --git a/latest/training/optimisers.html b/latest/training/optimisers.html
index 7b9ca8ff..19e97ab8 100644
--- a/latest/training/optimisers.html
+++ b/latest/training/optimisers.html
@@ -24,4 +24,10 @@ end</code></pre><p>If we call <code>update</code>, the parameters <code>W</code>
   Dense(10, 5, σ),
   Dense(5, 2), softmax)</code></pre><p>Instead of having to write <code>[m[1].W, m[1].b, ...]</code>, Flux provides a params function <code>params(m)</code> that returns a list of all parameters in the model for you.</p><p>For the update step, there&#39;s nothing whatsoever wrong with writing the loop above – it&#39;ll work just fine – but Flux provides various <em>optimisers</em> that make it more convenient.</p><pre><code class="language-julia">opt = SGD([W, b], 0.1) # Gradient descent with learning rate 0.1
 
-opt() # Carry out the update, modifying `W` and `b`.</code></pre><p>An optimiser takes a parameter list and returns a function that does the same thing as <code>update</code> above. We can pass either <code>opt</code> or <code>update</code> to our <a href="training.html">training loop</a>, which will then run the optimiser after every mini-batch of data.</p><h2><a class="nav-anchor" id="Optimiser-Reference-1" href="#Optimiser-Reference-1">Optimiser Reference</a></h2><p>All optimisers return a function that, when called, will update the parameters passed to it.</p><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.SGD" href="#Flux.Optimise.SGD"><code>Flux.Optimise.SGD</code></a> — <span class="docstring-category">Function</span>.</div><div><pre><code class="language-none">SGD(params, η = 1; decay = 0)</code></pre><p>Classic gradient descent optimiser. For each parameter <code>p</code> and its gradient <code>δp</code>, this runs <code>p -= η*δp</code>.</p><p>Supports decayed learning rate decay if the <code>decay</code> argument is provided.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/5b6a5667ed31d23c7413cca6f149344f9e56c10b/src/optimise/interface.jl#L12-L19">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.Momentum" href="#Flux.Optimise.Momentum"><code>Flux.Optimise.Momentum</code></a> — <span class="docstring-category">Function</span>.</div><div><pre><code class="language-none">Momentum(params, ρ, decay = 0)</code></pre><p>SGD with momentum <code>ρ</code> and optional learning rate decay.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/5b6a5667ed31d23c7413cca6f149344f9e56c10b/src/optimise/interface.jl#L23-L27">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.Nesterov" 
href="#Flux.Optimise.Nesterov"><code>Flux.Optimise.Nesterov</code></a> — <span class="docstring-category">Function</span>.</div><div><pre><code class="language-none">Nesterov(params, ρ, decay = 0)</code></pre><p>SGD with Nesterov momentum <code>ρ</code> and optional learning rate decay.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/5b6a5667ed31d23c7413cca6f149344f9e56c10b/src/optimise/interface.jl#L31-L35">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.RMSProp" href="#Flux.Optimise.RMSProp"><code>Flux.Optimise.RMSProp</code></a> — <span class="docstring-category">Function</span>.</div><div><pre><code class="language-none">RMSProp(params; η = 0.001, ρ = 0.9, ϵ = 1e-8, decay = 0)</code></pre><p><a href="http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf">RMSProp</a> optimiser. Parameters other than learning rate don&#39;t need tuning. Often a good choice for recurrent networks.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/5b6a5667ed31d23c7413cca6f149344f9e56c10b/src/optimise/interface.jl#L39-L45">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.ADAM" href="#Flux.Optimise.ADAM"><code>Flux.Optimise.ADAM</code></a> — <span class="docstring-category">Function</span>.</div><div><pre><code class="language-none">ADAM(params; η = 0.001, β1 = 0.9, β2 = 0.999, ϵ = 1e-08, decay = 0)</code></pre><p><a href="https://arxiv.org/abs/1412.6980v8">ADAM</a> optimiser.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/5b6a5667ed31d23c7413cca6f149344f9e56c10b/src/optimise/interface.jl#L49-L53">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.ADAGrad" 
href="#Flux.Optimise.ADAGrad"><code>Flux.Optimise.ADAGrad</code></a> — <span class="docstring-category">Function</span>.</div><div><pre><code class="language-none">ADAGrad(params; η = 0.01, ϵ = 1e-8, decay = 0)</code></pre><p><a href="http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf">ADAGrad</a> optimiser. Parameters don&#39;t need tuning.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/5b6a5667ed31d23c7413cca6f149344f9e56c10b/src/optimise/interface.jl#L57-L62">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.ADADelta" href="#Flux.Optimise.ADADelta"><code>Flux.Optimise.ADADelta</code></a> — <span class="docstring-category">Function</span>.</div><div><pre><code class="language-none">ADADelta(params; η = 0.01, ρ = 0.95, ϵ = 1e-8, decay = 0)</code></pre><p><a href="http://arxiv.org/abs/1212.5701">ADADelta</a> optimiser. Parameters don&#39;t need tuning.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/5b6a5667ed31d23c7413cca6f149344f9e56c10b/src/optimise/interface.jl#L66-L71">source</a></section><footer><hr/><a class="previous" href="../models/layers.html"><span class="direction">Previous</span><span class="title">Model Reference</span></a><a class="next" href="training.html"><span class="direction">Next</span><span class="title">Training</span></a></footer></article></body></html>
+opt() # Carry out the update, modifying `W` and `b`.</code></pre><p>An optimiser takes a parameter list and returns a function that does the same thing as <code>update</code> above. We can pass either <code>opt</code> or <code>update</code> to our <a href="training.html">training loop</a>, which will then run the optimiser after every mini-batch of data.</p><h2><a class="nav-anchor" id="Optimiser-Reference-1" href="#Optimiser-Reference-1">Optimiser Reference</a></h2><p>All optimisers return a function that, when called, will update the parameters passed to it.</p><pre><code class="language-none">SGD
+Momentum
+Nesterov
+RMSProp
+ADAM
+ADAGrad
+ADADelta</code></pre><footer><hr/><a class="previous" href="../models/layers.html"><span class="direction">Previous</span><span class="title">Model Reference</span></a><a class="next" href="training.html"><span class="direction">Next</span><span class="title">Training</span></a></footer></article></body></html>