build based on 99f98ca

2019-11-28 16:20:11 +00:00 · 2019-11-28 16:20:11 +00:00 · 3c7e03bb0a
commit 3c7e03bb0a
parent 6c9157cfb3
2 changed files with 22 additions and 22 deletions
--- a/dev/models/layers/index.html
+++ b/dev/models/layers/index.html
@ -11,34 +11,34 @@ m(5) == 26

 m = Chain(Dense(10, 5), Dense(5, 2))
 x = rand(10)
-m(x) == m[2](m[1](x))</code></pre><p><code>Chain</code> also supports indexing and slicing, e.g. <code>m[2]</code> or <code>m[1:end-1]</code>. <code>m[1:3](x)</code> will calculate the output of the first three layers.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ab450477f31e16a6ca001ce88ba9e7a353904835/src/layers/basic.jl#L1-L18">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Dense" href="#Flux.Dense"><code>Flux.Dense</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">Dense(in::Integer, out::Integer, σ = identity)</code></pre><p>Creates a traditional <code>Dense</code> layer with parameters <code>W</code> and <code>b</code>.</p><pre><code class="language-none">y = σ.(W * x .+ b)</code></pre><p>The input <code>x</code> must be a vector of length <code>in</code>, or a batch of vectors represented as an <code>in × N</code> matrix. The out <code>y</code> will be a vector or batch of length <code>out</code>.</p><pre><code class="language-julia">julia&gt; d = Dense(5, 2)
+m(x) == m[2](m[1](x))</code></pre><p><code>Chain</code> also supports indexing and slicing, e.g. <code>m[2]</code> or <code>m[1:end-1]</code>. <code>m[1:3](x)</code> will calculate the output of the first three layers.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/99f98ca800ff959ab8a5e9c34758eb2a6f3ad00d/src/layers/basic.jl#L1-L18">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Dense" href="#Flux.Dense"><code>Flux.Dense</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">Dense(in::Integer, out::Integer, σ = identity)</code></pre><p>Creates a traditional <code>Dense</code> layer with parameters <code>W</code> and <code>b</code>.</p><pre><code class="language-none">y = σ.(W * x .+ b)</code></pre><p>The input <code>x</code> must be a vector of length <code>in</code>, or a batch of vectors represented as an <code>in × N</code> matrix. The out <code>y</code> will be a vector or batch of length <code>out</code>.</p><pre><code class="language-julia">julia&gt; d = Dense(5, 2)
 Dense(5, 2)

 julia&gt; d(rand(5))
 Tracked 2-element Array{Float64,1}:
  0.00257447
-  -0.00449443</code></pre></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ab450477f31e16a6ca001ce88ba9e7a353904835/src/layers/basic.jl#L65-L84">source</a></section><h2><a class="nav-anchor" id="Convolution-and-Pooling-Layers-1" href="#Convolution-and-Pooling-Layers-1">Convolution and Pooling Layers</a></h2><p>These layers are used to build convolutional neural networks (CNNs).</p><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Conv" href="#Flux.Conv"><code>Flux.Conv</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">Conv(size, in=&gt;out)
+  -0.00449443</code></pre></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/99f98ca800ff959ab8a5e9c34758eb2a6f3ad00d/src/layers/basic.jl#L65-L84">source</a></section><h2><a class="nav-anchor" id="Convolution-and-Pooling-Layers-1" href="#Convolution-and-Pooling-Layers-1">Convolution and Pooling Layers</a></h2><p>These layers are used to build convolutional neural networks (CNNs).</p><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Conv" href="#Flux.Conv"><code>Flux.Conv</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">Conv(size, in=&gt;out)
 Conv(size, in=&gt;out, relu)</code></pre><p>Standard convolutional layer. <code>size</code> should be a tuple like <code>(2, 2)</code>. <code>in</code> and <code>out</code> specify the number of input and output channels respectively.</p><p>Example: Applying Conv layer to a 1-channel input using a 2x2 window size,          giving us a 16-channel output. Output is activated with ReLU.</p><pre><code class="language-none">size = (2,2)
 in = 1
 out = 16
-Conv((2, 2), 1=&gt;16, relu)</code></pre><p>Data should be stored in WHCN order (width, height, # channels, # batches). In other words, a 100×100 RGB image would be a <code>100×100×3×1</code> array, and a batch of 50 would be a <code>100×100×3×50</code> array.</p><p>Takes the keyword arguments <code>pad</code>, <code>stride</code> and <code>dilation</code>.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ab450477f31e16a6ca001ce88ba9e7a353904835/src/layers/conv.jl#L5-L25">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.MaxPool" href="#Flux.MaxPool"><code>Flux.MaxPool</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">MaxPool(k)</code></pre><p>Max pooling layer. <code>k</code> stands for the size of the window for each dimension of the input.</p><p>Takes the keyword arguments <code>pad</code> and <code>stride</code>.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ab450477f31e16a6ca001ce88ba9e7a353904835/src/layers/conv.jl#L278-L284">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.MeanPool" href="#Flux.MeanPool"><code>Flux.MeanPool</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">MeanPool(k)</code></pre><p>Mean pooling layer. <code>k</code> stands for the size of the window for each dimension of the input.</p><p>Takes the keyword arguments <code>pad</code> and <code>stride</code>.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ab450477f31e16a6ca001ce88ba9e7a353904835/src/layers/conv.jl#L307-L313">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.DepthwiseConv" href="#Flux.DepthwiseConv"><code>Flux.DepthwiseConv</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">DepthwiseConv(size, in=&gt;out)
-DepthwiseConv(size, in=&gt;out, relu)</code></pre><p>Depthwise convolutional layer. <code>size</code> should be a tuple like <code>(2, 2)</code>. <code>in</code> and <code>out</code> specify the number of input and output channels respectively. Note that <code>out</code> must be an integer multiple of <code>in</code>.</p><p>Data should be stored in WHCN order. In other words, a 100×100 RGB image would be a <code>100×100×3</code> array, and a batch of 50 would be a <code>100×100×3×50</code> array.</p><p>Takes the keyword arguments <code>pad</code>, <code>stride</code> and <code>dilation</code>.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ab450477f31e16a6ca001ce88ba9e7a353904835/src/layers/conv.jl#L143-L155">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.ConvTranspose" href="#Flux.ConvTranspose"><code>Flux.ConvTranspose</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">ConvTranspose(size, in=&gt;out)
-ConvTranspose(size, in=&gt;out, relu)</code></pre><p>Standard convolutional transpose layer. <code>size</code> should be a tuple like <code>(2, 2)</code>. <code>in</code> and <code>out</code> specify the number of input and output channels respectively.</p><p>Data should be stored in WHCN order. In other words, a 100×100 RGB image would be a <code>100×100×3</code> array, and a batch of 50 would be a <code>100×100×3×50</code> array.</p><p>Takes the keyword arguments <code>pad</code>, <code>stride</code> and <code>dilation</code>.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ab450477f31e16a6ca001ce88ba9e7a353904835/src/layers/conv.jl#L71-L82">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.CrossCor" href="#Flux.CrossCor"><code>Flux.CrossCor</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">CrossCor(size, in=&gt;out)
+Conv((2, 2), 1=&gt;16, relu)</code></pre><p>Data should be stored in WHCN order (width, height, # channels, # batches). In other words, a 100×100 RGB image would be a <code>100×100×3×1</code> array, and a batch of 50 would be a <code>100×100×3×50</code> array.</p><p>Takes the keyword arguments <code>pad</code>, <code>stride</code> and <code>dilation</code>.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/99f98ca800ff959ab8a5e9c34758eb2a6f3ad00d/src/layers/conv.jl#L5-L25">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.MaxPool" href="#Flux.MaxPool"><code>Flux.MaxPool</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">MaxPool(k)</code></pre><p>Max pooling layer. <code>k</code> stands for the size of the window for each dimension of the input.</p><p>Takes the keyword arguments <code>pad</code> and <code>stride</code>.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/99f98ca800ff959ab8a5e9c34758eb2a6f3ad00d/src/layers/conv.jl#L278-L284">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.MeanPool" href="#Flux.MeanPool"><code>Flux.MeanPool</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">MeanPool(k)</code></pre><p>Mean pooling layer. <code>k</code> stands for the size of the window for each dimension of the input.</p><p>Takes the keyword arguments <code>pad</code> and <code>stride</code>.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/99f98ca800ff959ab8a5e9c34758eb2a6f3ad00d/src/layers/conv.jl#L307-L313">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.DepthwiseConv" href="#Flux.DepthwiseConv"><code>Flux.DepthwiseConv</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">DepthwiseConv(size, in=&gt;out)
+DepthwiseConv(size, in=&gt;out, relu)</code></pre><p>Depthwise convolutional layer. <code>size</code> should be a tuple like <code>(2, 2)</code>. <code>in</code> and <code>out</code> specify the number of input and output channels respectively. Note that <code>out</code> must be an integer multiple of <code>in</code>.</p><p>Data should be stored in WHCN order. In other words, a 100×100 RGB image would be a <code>100×100×3</code> array, and a batch of 50 would be a <code>100×100×3×50</code> array.</p><p>Takes the keyword arguments <code>pad</code>, <code>stride</code> and <code>dilation</code>.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/99f98ca800ff959ab8a5e9c34758eb2a6f3ad00d/src/layers/conv.jl#L143-L155">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.ConvTranspose" href="#Flux.ConvTranspose"><code>Flux.ConvTranspose</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">ConvTranspose(size, in=&gt;out)
+ConvTranspose(size, in=&gt;out, relu)</code></pre><p>Standard convolutional transpose layer. <code>size</code> should be a tuple like <code>(2, 2)</code>. <code>in</code> and <code>out</code> specify the number of input and output channels respectively.</p><p>Data should be stored in WHCN order. In other words, a 100×100 RGB image would be a <code>100×100×3</code> array, and a batch of 50 would be a <code>100×100×3×50</code> array.</p><p>Takes the keyword arguments <code>pad</code>, <code>stride</code> and <code>dilation</code>.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/99f98ca800ff959ab8a5e9c34758eb2a6f3ad00d/src/layers/conv.jl#L71-L82">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.CrossCor" href="#Flux.CrossCor"><code>Flux.CrossCor</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">CrossCor(size, in=&gt;out)
 CrossCor(size, in=&gt;out, relu)</code></pre><p>Standard cross convolutional layer. <code>size</code> should be a tuple like <code>(2, 2)</code>. <code>in</code> and <code>out</code> specify the number of input and output channels respectively.</p><p>Example: Applying CrossCor layer to a 1-channel input using a 2x2 window size,          giving us a 16-channel output. Output is activated with ReLU.</p><pre><code class="language-none">size = (2,2)
 in = 1
 out = 16
-CrossCor((2, 2), 1=&gt;16, relu)</code></pre><p>Data should be stored in WHCN order (width, height, # channels, # batches). In other words, a 100×100 RGB image would be a <code>100×100×3×1</code> array, and a batch of 50 would be a <code>100×100×3×50</code> array.</p><p>Takes the keyword arguments <code>pad</code>, <code>stride</code> and <code>dilation</code>.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ab450477f31e16a6ca001ce88ba9e7a353904835/src/layers/conv.jl#L207-L227">source</a></section><h2><a class="nav-anchor" id="Recurrent-Layers-1" href="#Recurrent-Layers-1">Recurrent Layers</a></h2><p>Much like the core layers above, but can be used to process sequence data (as well as other kinds of structured data).</p><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.RNN" href="#Flux.RNN"><code>Flux.RNN</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-julia">RNN(in::Integer, out::Integer, σ = tanh)</code></pre><p>The most basic recurrent layer; essentially acts as a <code>Dense</code> layer, but with the output fed back into the input each time step.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ab450477f31e16a6ca001ce88ba9e7a353904835/src/layers/recurrent.jl#L91-L96">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.LSTM" href="#Flux.LSTM"><code>Flux.LSTM</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-julia">LSTM(in::Integer, out::Integer)</code></pre><p>Long Short Term Memory recurrent layer. Behaves like an RNN but generally exhibits a longer memory span over sequences.</p><p>See <a href="https://colah.github.io/posts/2015-08-Understanding-LSTMs/">this article</a> for a good overview of the internals.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ab450477f31e16a6ca001ce88ba9e7a353904835/src/layers/recurrent.jl#L136-L144">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.GRU" href="#Flux.GRU"><code>Flux.GRU</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-julia">GRU(in::Integer, out::Integer)</code></pre><p>Gated Recurrent Unit layer. Behaves like an RNN but generally exhibits a longer memory span over sequences.</p><p>See <a href="https://colah.github.io/posts/2015-08-Understanding-LSTMs/">this article</a> for a good overview of the internals.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ab450477f31e16a6ca001ce88ba9e7a353904835/src/layers/recurrent.jl#L177-L185">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Recur" href="#Flux.Recur"><code>Flux.Recur</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">Recur(cell)</code></pre><p><code>Recur</code> takes a recurrent cell and makes it stateful, managing the hidden state in the background. <code>cell</code> should be a model of the form:</p><pre><code class="language-none">h, y = cell(h, x...)</code></pre><p>For example, here&#39;s a recurrent network that keeps a running total of its inputs.</p><pre><code class="language-julia">accum(h, x) = (h+x, x)
+CrossCor((2, 2), 1=&gt;16, relu)</code></pre><p>Data should be stored in WHCN order (width, height, # channels, # batches). In other words, a 100×100 RGB image would be a <code>100×100×3×1</code> array, and a batch of 50 would be a <code>100×100×3×50</code> array.</p><p>Takes the keyword arguments <code>pad</code>, <code>stride</code> and <code>dilation</code>.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/99f98ca800ff959ab8a5e9c34758eb2a6f3ad00d/src/layers/conv.jl#L207-L227">source</a></section><h2><a class="nav-anchor" id="Recurrent-Layers-1" href="#Recurrent-Layers-1">Recurrent Layers</a></h2><p>Much like the core layers above, but can be used to process sequence data (as well as other kinds of structured data).</p><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.RNN" href="#Flux.RNN"><code>Flux.RNN</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-julia">RNN(in::Integer, out::Integer, σ = tanh)</code></pre><p>The most basic recurrent layer; essentially acts as a <code>Dense</code> layer, but with the output fed back into the input each time step.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/99f98ca800ff959ab8a5e9c34758eb2a6f3ad00d/src/layers/recurrent.jl#L91-L96">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.LSTM" href="#Flux.LSTM"><code>Flux.LSTM</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-julia">LSTM(in::Integer, out::Integer)</code></pre><p>Long Short Term Memory recurrent layer. Behaves like an RNN but generally exhibits a longer memory span over sequences.</p><p>See <a href="https://colah.github.io/posts/2015-08-Understanding-LSTMs/">this article</a> for a good overview of the internals.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/99f98ca800ff959ab8a5e9c34758eb2a6f3ad00d/src/layers/recurrent.jl#L136-L144">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.GRU" href="#Flux.GRU"><code>Flux.GRU</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-julia">GRU(in::Integer, out::Integer)</code></pre><p>Gated Recurrent Unit layer. Behaves like an RNN but generally exhibits a longer memory span over sequences.</p><p>See <a href="https://colah.github.io/posts/2015-08-Understanding-LSTMs/">this article</a> for a good overview of the internals.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/99f98ca800ff959ab8a5e9c34758eb2a6f3ad00d/src/layers/recurrent.jl#L177-L185">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Recur" href="#Flux.Recur"><code>Flux.Recur</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">Recur(cell)</code></pre><p><code>Recur</code> takes a recurrent cell and makes it stateful, managing the hidden state in the background. <code>cell</code> should be a model of the form:</p><pre><code class="language-none">h, y = cell(h, x...)</code></pre><p>For example, here&#39;s a recurrent network that keeps a running total of its inputs.</p><pre><code class="language-julia">accum(h, x) = (h+x, x)
 rnn = Flux.Recur(accum, 0)
 rnn(2) # 2
 rnn(3) # 3
 rnn.state # 5
 rnn.(1:10) # apply to a sequence
-rnn.state # 60</code></pre></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ab450477f31e16a6ca001ce88ba9e7a353904835/src/layers/recurrent.jl#L7-L26">source</a></section><h2><a class="nav-anchor" id="Other-General-Purpose-Layers-1" href="#Other-General-Purpose-Layers-1">Other General Purpose Layers</a></h2><p>These are marginally more obscure than the Basic Layers. But in contrast to the layers described in the other sections are not readily grouped around a particular purpose (e.g. CNNs or RNNs).</p><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Maxout" href="#Flux.Maxout"><code>Flux.Maxout</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">Maxout(over)</code></pre><p><code>Maxout</code> is a neural network layer, which has a number of internal layers, which all have the same input, and the maxout returns the elementwise maximium of the internal layers&#39; outputs.</p><p>Maxout over linear dense layers satisfies the univeral approximation theorem.</p><p>Reference: Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, and Yoshua Bengio.</p><ol><li>Maxout networks.</li></ol><p>In Proceedings of the 30th International Conference on International Conference on Machine Learning - Volume 28 (ICML&#39;13), Sanjoy Dasgupta and David McAllester (Eds.), Vol. 28. JMLR.org III-1319-III-1327. https://arxiv.org/pdf/1302.4389.pdf</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ab450477f31e16a6ca001ce88ba9e7a353904835/src/layers/basic.jl#L149-L164">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.SkipConnection" href="#Flux.SkipConnection"><code>Flux.SkipConnection</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">SkipConnection(layers, connection)</code></pre><p>Creates a Skip Connection, of a layer or <code>Chain</code> of consecutive layers plus a shortcut connection. The connection function will combine the result of the layers with the original input, to give the final output.</p><p>The simplest &#39;ResNet&#39;-type connection is just <code>SkipConnection(layer, +)</code>, and requires the output of the layers to be the same shape as the input. Here is a more complicated example:</p><pre><code class="language-none">m = Conv((3,3), 4=&gt;7, pad=(1,1))
+rnn.state # 60</code></pre></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/99f98ca800ff959ab8a5e9c34758eb2a6f3ad00d/src/layers/recurrent.jl#L7-L26">source</a></section><h2><a class="nav-anchor" id="Other-General-Purpose-Layers-1" href="#Other-General-Purpose-Layers-1">Other General Purpose Layers</a></h2><p>These are marginally more obscure than the Basic Layers. But in contrast to the layers described in the other sections are not readily grouped around a particular purpose (e.g. CNNs or RNNs).</p><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Maxout" href="#Flux.Maxout"><code>Flux.Maxout</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">Maxout(over)</code></pre><p><code>Maxout</code> is a neural network layer, which has a number of internal layers, which all have the same input, and the maxout returns the elementwise maximium of the internal layers&#39; outputs.</p><p>Maxout over linear dense layers satisfies the univeral approximation theorem.</p><p>Reference: Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, and Yoshua Bengio.</p><ol><li>Maxout networks.</li></ol><p>In Proceedings of the 30th International Conference on International Conference on Machine Learning - Volume 28 (ICML&#39;13), Sanjoy Dasgupta and David McAllester (Eds.), Vol. 28. JMLR.org III-1319-III-1327. https://arxiv.org/pdf/1302.4389.pdf</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/99f98ca800ff959ab8a5e9c34758eb2a6f3ad00d/src/layers/basic.jl#L149-L164">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.SkipConnection" href="#Flux.SkipConnection"><code>Flux.SkipConnection</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">SkipConnection(layers, connection)</code></pre><p>Creates a Skip Connection, of a layer or <code>Chain</code> of consecutive layers plus a shortcut connection. The connection function will combine the result of the layers with the original input, to give the final output.</p><p>The simplest &#39;ResNet&#39;-type connection is just <code>SkipConnection(layer, +)</code>, and requires the output of the layers to be the same shape as the input. Here is a more complicated example:</p><pre><code class="language-none">m = Conv((3,3), 4=&gt;7, pad=(1,1))
 x = ones(5,5,4,10);
 size(m(x)) == (5, 5, 7, 10)

 sm = SkipConnection(m, (mx, x) -&gt; cat(mx, x, dims=3))
-size(sm(x)) == (5, 5, 11, 10)</code></pre></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ab450477f31e16a6ca001ce88ba9e7a353904835/src/layers/basic.jl#L196-L214">source</a></section><h2><a class="nav-anchor" id="Activation-Functions-1" href="#Activation-Functions-1">Activation Functions</a></h2><p>Non-linearities that go between layers of your model. Most of these functions are defined in <a href="https://github.com/FluxML/NNlib.jl">NNlib</a> but are available by default in Flux.</p><p>Note that, unless otherwise stated, activation functions operate on scalars. To apply them to an array you can call <code>σ.(xs)</code>, <code>relu.(xs)</code> and so on.</p><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.σ" href="#NNlib.σ"><code>NNlib.σ</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-julia">σ(x) = 1 / (1 + exp(-x))</code></pre><p>Classic <a href="https://en.wikipedia.org/wiki/Sigmoid_function">sigmoid</a> activation function.</p></div></div></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.relu" href="#NNlib.relu"><code>NNlib.relu</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-julia">relu(x) = max(0, x)</code></pre><p><a href="https://en.wikipedia.org/wiki/Rectifier_(neural_networks)">Rectified Linear Unit</a> activation function.</p></div></div></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.leakyrelu" href="#NNlib.leakyrelu"><code>NNlib.leakyrelu</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-julia">leakyrelu(x) = max(0.01x, x)</code></pre><p>Leaky <a href="https://en.wikipedia.org/wiki/Rectifier_(neural_networks)">Rectified Linear Unit</a> activation function. You can also specify the coefficient explicitly, e.g. <code>leakyrelu(x, 0.01)</code>.</p></div></div></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.elu" href="#NNlib.elu"><code>NNlib.elu</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-julia">elu(x, α = 1) =
+size(sm(x)) == (5, 5, 11, 10)</code></pre></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/99f98ca800ff959ab8a5e9c34758eb2a6f3ad00d/src/layers/basic.jl#L196-L214">source</a></section><h2><a class="nav-anchor" id="Activation-Functions-1" href="#Activation-Functions-1">Activation Functions</a></h2><p>Non-linearities that go between layers of your model. Most of these functions are defined in <a href="https://github.com/FluxML/NNlib.jl">NNlib</a> but are available by default in Flux.</p><p>Note that, unless otherwise stated, activation functions operate on scalars. To apply them to an array you can call <code>σ.(xs)</code>, <code>relu.(xs)</code> and so on.</p><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.σ" href="#NNlib.σ"><code>NNlib.σ</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-julia">σ(x) = 1 / (1 + exp(-x))</code></pre><p>Classic <a href="https://en.wikipedia.org/wiki/Sigmoid_function">sigmoid</a> activation function.</p></div></div></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.relu" href="#NNlib.relu"><code>NNlib.relu</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-julia">relu(x) = max(0, x)</code></pre><p><a href="https://en.wikipedia.org/wiki/Rectifier_(neural_networks)">Rectified Linear Unit</a> activation function.</p></div></div></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.leakyrelu" href="#NNlib.leakyrelu"><code>NNlib.leakyrelu</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-julia">leakyrelu(x) = max(0.01x, x)</code></pre><p>Leaky <a href="https://en.wikipedia.org/wiki/Rectifier_(neural_networks)">Rectified Linear Unit</a> activation function. You can also specify the coefficient explicitly, e.g. <code>leakyrelu(x, 0.01)</code>.</p></div></div></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.elu" href="#NNlib.elu"><code>NNlib.elu</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-julia">elu(x, α = 1) =
  x &gt; 0 ? x : α * (exp(x) - 1)</code></pre><p>Exponential Linear Unit activation function. See <a href="https://arxiv.org/abs/1511.07289">Fast and Accurate Deep Network Learning by Exponential Linear Units</a>. You can also specify the coefficient explicitly, e.g. <code>elu(x, 1)</code>.</p></div></div></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.swish" href="#NNlib.swish"><code>NNlib.swish</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-julia">swish(x) = x * σ(x)</code></pre><p>Self-gated actvation function. See <a href="https://arxiv.org/pdf/1710.05941.pdf">Swish: a Self-Gated Activation Function</a>.</p></div></div></section><h2><a class="nav-anchor" id="Normalisation-and-Regularisation-1" href="#Normalisation-and-Regularisation-1">Normalisation &amp; Regularisation</a></h2><p>These layers don&#39;t affect the structure of the network but may improve training times or reduce overfitting.</p><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.BatchNorm" href="#Flux.BatchNorm"><code>Flux.BatchNorm</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">BatchNorm(channels::Integer, σ = identity;
          initβ = zeros, initγ = ones,
          ϵ = 1e-8, momentum = .1)</code></pre><p>Batch Normalization layer. The <code>channels</code> input should be the size of the channel dimension in your data (see below).</p><p>Given an array with <code>N</code> dimensions, call the <code>N-1</code>th the channel dimension. (For a batch of feature vectors this is just the data dimension, for <code>WHCN</code> images it&#39;s the usual channel dimension.)</p><p><code>BatchNorm</code> computes the mean and variance for each each <code>W×H×1×N</code> slice and shifts them to have a new mean and variance (corresponding to the learnable, per-channel <code>bias</code> and <code>scale</code> parameters).</p><p>See <a href="https://arxiv.org/pdf/1502.03167.pdf">Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift</a>.</p><p>Example:</p><pre><code class="language-julia">m = Chain(
@ -46,7 +46,7 @@ size(sm(x)) == (5, 5, 11, 10)</code></pre></div></div><a class="source-link" tar
  BatchNorm(64, relu),
  Dense(64, 10),
  BatchNorm(10),
-  softmax)</code></pre></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ab450477f31e16a6ca001ce88ba9e7a353904835/src/layers/normalise.jl#L93-L121">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Dropout" href="#Flux.Dropout"><code>Flux.Dropout</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">Dropout(p, dims = :)</code></pre><p>A Dropout layer. For each input, either sets that input to <code>0</code> (with probability <code>p</code>) or scales it by <code>1/(1-p)</code>. The <code>dims</code> argument is to specified the unbroadcasted  dimensions, i.e. <code>dims=1</code> does dropout along columns and <code>dims=2</code> along rows. This is  used as a regularisation, i.e. it reduces overfitting during training. see also <a href="models/@ref"><code>dropout</code></a>.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ab450477f31e16a6ca001ce88ba9e7a353904835/src/layers/normalise.jl#L18-L25">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.AlphaDropout" href="#Flux.AlphaDropout"><code>Flux.AlphaDropout</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">AlphaDropout(p)</code></pre><p>A dropout layer. It is used in Self-Normalizing Neural Networks. (https://papers.nips.cc/paper/6698-self-normalizing-neural-networks.pdf) The AlphaDropout layer ensures that mean and variance of activations remains the same as before.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ab450477f31e16a6ca001ce88ba9e7a353904835/src/layers/normalise.jl#L44-L49">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.LayerNorm" href="#Flux.LayerNorm"><code>Flux.LayerNorm</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">LayerNorm(h::Integer)</code></pre><p>A <a href="https://arxiv.org/pdf/1607.06450.pdf">normalisation layer</a> designed to be used with recurrent hidden states of size <code>h</code>. Normalises the mean/stddev of each input before applying a per-neuron gain/bias.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ab450477f31e16a6ca001ce88ba9e7a353904835/src/layers/normalise.jl#L71-L77">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.GroupNorm" href="#Flux.GroupNorm"><code>Flux.GroupNorm</code></a> — <span class="docstring-category">Type</span>.</div><div><div><p>Group Normalization. This layer can outperform Batch-Normalization and Instance-Normalization.</p><pre><code class="language-none">GroupNorm(chs::Integer, G::Integer, λ = identity;
+  softmax)</code></pre></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/99f98ca800ff959ab8a5e9c34758eb2a6f3ad00d/src/layers/normalise.jl#L93-L121">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Dropout" href="#Flux.Dropout"><code>Flux.Dropout</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">Dropout(p, dims = :)</code></pre><p>A Dropout layer. For each input, either sets that input to <code>0</code> (with probability <code>p</code>) or scales it by <code>1/(1-p)</code>. The <code>dims</code> argument is to specified the unbroadcasted  dimensions, i.e. <code>dims=1</code> does dropout along columns and <code>dims=2</code> along rows. This is  used as a regularisation, i.e. it reduces overfitting during training. see also <a href="models/@ref"><code>dropout</code></a>.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/99f98ca800ff959ab8a5e9c34758eb2a6f3ad00d/src/layers/normalise.jl#L18-L25">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.AlphaDropout" href="#Flux.AlphaDropout"><code>Flux.AlphaDropout</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">AlphaDropout(p)</code></pre><p>A dropout layer. It is used in Self-Normalizing Neural Networks. (https://papers.nips.cc/paper/6698-self-normalizing-neural-networks.pdf) The AlphaDropout layer ensures that mean and variance of activations remains the same as before.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/99f98ca800ff959ab8a5e9c34758eb2a6f3ad00d/src/layers/normalise.jl#L44-L49">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.LayerNorm" href="#Flux.LayerNorm"><code>Flux.LayerNorm</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">LayerNorm(h::Integer)</code></pre><p>A <a href="https://arxiv.org/pdf/1607.06450.pdf">normalisation layer</a> designed to be used with recurrent hidden states of size <code>h</code>. Normalises the mean/stddev of each input before applying a per-neuron gain/bias.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/99f98ca800ff959ab8a5e9c34758eb2a6f3ad00d/src/layers/normalise.jl#L71-L77">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.GroupNorm" href="#Flux.GroupNorm"><code>Flux.GroupNorm</code></a> — <span class="docstring-category">Type</span>.</div><div><div><p>Group Normalization. This layer can outperform Batch-Normalization and Instance-Normalization.</p><pre><code class="language-none">GroupNorm(chs::Integer, G::Integer, λ = identity;
          initβ = (i) -&gt; zeros(Float32, i), initγ = (i) -&gt; ones(Float32, i),
          ϵ = 1f-5, momentum = 0.1f0)</code></pre><p><span>$chs$</span> is the number of channels, the channel dimension of your input. For an array of N dimensions, the (N-1)th index is the channel dimension.</p><p><span>$G$</span> is the number of groups along which the statistics would be computed. The number of channels must be an integer multiple of the number of groups.</p><p>Example:</p><pre><code class="language-none">m = Chain(Conv((3,3), 1=&gt;32, leakyrelu;pad = 1),
-          GroupNorm(32,16)) # 32 channels, 16 groups (G = 16), thus 2 channels per group used</code></pre><p>Link : https://arxiv.org/pdf/1803.08494.pdf</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ab450477f31e16a6ca001ce88ba9e7a353904835/src/layers/normalise.jl#L272-L293">source</a></section><footer><hr/><a class="previous" href="../regularisation/"><span class="direction">Previous</span><span class="title">Regularisation</span></a><a class="next" href="../../training/optimisers/"><span class="direction">Next</span><span class="title">Optimisers</span></a></footer></article></body></html>
+          GroupNorm(32,16)) # 32 channels, 16 groups (G = 16), thus 2 channels per group used</code></pre><p>Link : https://arxiv.org/pdf/1803.08494.pdf</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/99f98ca800ff959ab8a5e9c34758eb2a6f3ad00d/src/layers/normalise.jl#L272-L293">source</a></section><footer><hr/><a class="previous" href="../regularisation/"><span class="direction">Previous</span><span class="title">Regularisation</span></a><a class="next" href="../../training/optimisers/"><span class="direction">Next</span><span class="title">Optimisers</span></a></footer></article></body></html>
--- a/dev/training/optimisers/index.html
+++ b/dev/training/optimisers/index.html
@ -37,23 +37,23 @@ gs = gradient(ps) do
  loss(x, y)
 end

-Flux.Optimise.update!(opt, ps, gs)</code></pre></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ab450477f31e16a6ca001ce88ba9e7a353904835/src/optimise/optimisers.jl#L9-L32">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.Momentum" href="#Flux.Optimise.Momentum"><code>Flux.Optimise.Momentum</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">Momentum(η, ρ)</code></pre><p>Gradient descent with learning rate <code>η</code> and momentum <code>ρ</code>.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (<code>η</code>): Amount by which gradients are discounted before updating the weights. Defaults to <code>0.01</code>.</li><li>Momentum (<code>ρ</code>): Parameter that accelerates descent in the relevant direction and dampens oscillations. Defaults to <code>0.9</code>.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = Momentum() # uses defaults of η = 0.01 and ρ = 0.9
+Flux.Optimise.update!(opt, ps, gs)</code></pre></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/99f98ca800ff959ab8a5e9c34758eb2a6f3ad00d/src/optimise/optimisers.jl#L9-L32">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.Momentum" href="#Flux.Optimise.Momentum"><code>Flux.Optimise.Momentum</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">Momentum(η, ρ)</code></pre><p>Gradient descent with learning rate <code>η</code> and momentum <code>ρ</code>.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (<code>η</code>): Amount by which gradients are discounted before updating the weights. Defaults to <code>0.01</code>.</li><li>Momentum (<code>ρ</code>): Parameter that accelerates descent in the relevant direction and dampens oscillations. Defaults to <code>0.9</code>.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = Momentum() # uses defaults of η = 0.01 and ρ = 0.9

-opt = Momentum(0.01, 0.99)</code></pre></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ab450477f31e16a6ca001ce88ba9e7a353904835/src/optimise/optimisers.jl#L43-L58">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.Nesterov" href="#Flux.Optimise.Nesterov"><code>Flux.Optimise.Nesterov</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">Nesterov(η, ρ)</code></pre><p>Gradient descent with learning rate  <code>η</code> and Nesterov momentum <code>ρ</code>.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (η): Amount by which the gradients are dicsounted berfore updating the weights. Defaults to <code>0.001</code>.</li><li>Nesterov Momentum (ρ): Paramters controlling the amount of nesterov momentum to be applied. Defaults to <code>0.9</code>.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = Nesterov() # uses defaults η = 0.001 and ρ = 0.9
+opt = Momentum(0.01, 0.99)</code></pre></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/99f98ca800ff959ab8a5e9c34758eb2a6f3ad00d/src/optimise/optimisers.jl#L43-L58">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.Nesterov" href="#Flux.Optimise.Nesterov"><code>Flux.Optimise.Nesterov</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">Nesterov(η, ρ)</code></pre><p>Gradient descent with learning rate  <code>η</code> and Nesterov momentum <code>ρ</code>.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (η): Amount by which the gradients are dicsounted berfore updating the weights. Defaults to <code>0.001</code>.</li><li>Nesterov Momentum (ρ): Paramters controlling the amount of nesterov momentum to be applied. Defaults to <code>0.9</code>.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = Nesterov() # uses defaults η = 0.001 and ρ = 0.9

-opt = Nesterov(0.003, 0.95)</code></pre></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ab450477f31e16a6ca001ce88ba9e7a353904835/src/optimise/optimisers.jl#L74-L89">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.RMSProp" href="#Flux.Optimise.RMSProp"><code>Flux.Optimise.RMSProp</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">RMSProp(η, ρ)</code></pre><p>Implements the RMSProp algortihm. Often a good choice for recurrent networks. Paramters other than learning rate generally don&#39;t need tuning.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (η): Defaults to <code>0.001</code>.</li><li>Rho (ρ): Defaults to <code>0.9</code>.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = RMSProp() # uses default η = 0.001 and ρ = 0.9
+opt = Nesterov(0.003, 0.95)</code></pre></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/99f98ca800ff959ab8a5e9c34758eb2a6f3ad00d/src/optimise/optimisers.jl#L74-L89">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.RMSProp" href="#Flux.Optimise.RMSProp"><code>Flux.Optimise.RMSProp</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">RMSProp(η, ρ)</code></pre><p>Implements the RMSProp algortihm. Often a good choice for recurrent networks. Paramters other than learning rate generally don&#39;t need tuning.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (η): Defaults to <code>0.001</code>.</li><li>Rho (ρ): Defaults to <code>0.9</code>.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = RMSProp() # uses default η = 0.001 and ρ = 0.9

-opt = RMSProp(0.002, 0.95)</code></pre><p><strong>References</strong></p><p><a href="https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf">RMSProp</a></p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ab450477f31e16a6ca001ce88ba9e7a353904835/src/optimise/optimisers.jl#L106-L124">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.ADAM" href="#Flux.Optimise.ADAM"><code>Flux.Optimise.ADAM</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">ADAM(η, β::Tuple)</code></pre><p>Implements the ADAM optimiser.</p><p><strong>Paramters</strong></p><ul><li>Learning Rate (<code>η</code>): Defaults to <code>0.001</code>.</li><li>Beta (<code>β::Tuple</code>): The first element refers to β1 and the second to β2. Defaults to <code>(0.9, 0.999)</code>.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = ADAM() # uses the default η = 0.001 and β = (0.9, 0.999)
+opt = RMSProp(0.002, 0.95)</code></pre><p><strong>References</strong></p><p><a href="https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf">RMSProp</a></p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/99f98ca800ff959ab8a5e9c34758eb2a6f3ad00d/src/optimise/optimisers.jl#L106-L124">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.ADAM" href="#Flux.Optimise.ADAM"><code>Flux.Optimise.ADAM</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">ADAM(η, β::Tuple)</code></pre><p>Implements the ADAM optimiser.</p><p><strong>Paramters</strong></p><ul><li>Learning Rate (<code>η</code>): Defaults to <code>0.001</code>.</li><li>Beta (<code>β::Tuple</code>): The first element refers to β1 and the second to β2. Defaults to <code>(0.9, 0.999)</code>.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = ADAM() # uses the default η = 0.001 and β = (0.9, 0.999)

-opt = ADAM(0.001, (0.9, 0.8))</code></pre><p><strong>References</strong></p><p><a href="https://arxiv.org/abs/1412.6980v8">ADAM</a> optimiser.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ab450477f31e16a6ca001ce88ba9e7a353904835/src/optimise/optimisers.jl#L140-L158">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.AdaMax" href="#Flux.Optimise.AdaMax"><code>Flux.Optimise.AdaMax</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">AdaMax(η, β::Tuple)</code></pre><p>Variant of ADAM based on ∞-norm.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (η): Defaults to <code>0.001</code></li><li>Beta (β::Tuple): The first element refers to β1 and the second to β2. Defaults to <code>(0.9, 0.999)</code>.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = AdaMax() # uses default η and β
+opt = ADAM(0.001, (0.9, 0.8))</code></pre><p><strong>References</strong></p><p><a href="https://arxiv.org/abs/1412.6980v8">ADAM</a> optimiser.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/99f98ca800ff959ab8a5e9c34758eb2a6f3ad00d/src/optimise/optimisers.jl#L140-L158">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.AdaMax" href="#Flux.Optimise.AdaMax"><code>Flux.Optimise.AdaMax</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">AdaMax(η, β::Tuple)</code></pre><p>Variant of ADAM based on ∞-norm.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (η): Defaults to <code>0.001</code></li><li>Beta (β::Tuple): The first element refers to β1 and the second to β2. Defaults to <code>(0.9, 0.999)</code>.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = AdaMax() # uses default η and β

-opt = AdaMax(0.001, (0.9, 0.995))</code></pre><p><strong>References</strong></p><p><a href="https://arxiv.org/abs/1412.6980v9">AdaMax</a> optimiser.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ab450477f31e16a6ca001ce88ba9e7a353904835/src/optimise/optimisers.jl#L222-L239">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.ADAGrad" href="#Flux.Optimise.ADAGrad"><code>Flux.Optimise.ADAGrad</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">ADAGrad(η)</code></pre><p>Implements AdaGrad. It has parameter specific learning rates based on how frequently it is updated.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (η): Defaults to <code>0.1</code></li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = ADAGrad() # uses default η = 0.1
+opt = AdaMax(0.001, (0.9, 0.995))</code></pre><p><strong>References</strong></p><p><a href="https://arxiv.org/abs/1412.6980v9">AdaMax</a> optimiser.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/99f98ca800ff959ab8a5e9c34758eb2a6f3ad00d/src/optimise/optimisers.jl#L222-L239">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.ADAGrad" href="#Flux.Optimise.ADAGrad"><code>Flux.Optimise.ADAGrad</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">ADAGrad(η)</code></pre><p>Implements AdaGrad. It has parameter specific learning rates based on how frequently it is updated.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (η): Defaults to <code>0.1</code></li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = ADAGrad() # uses default η = 0.1

-opt = ADAGrad(0.001)</code></pre><p><strong>References</strong></p><p><a href="http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf">ADAGrad</a> optimiser. Parameters don&#39;t need tuning.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ab450477f31e16a6ca001ce88ba9e7a353904835/src/optimise/optimisers.jl#L258-L276">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.ADADelta" href="#Flux.Optimise.ADADelta"><code>Flux.Optimise.ADADelta</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">ADADelta(ρ)</code></pre><p>Version of ADAGrad that adapts learning rate based on a window of past gradient updates. Parameters don&#39;t need tuning.</p><p><strong>Parameters</strong></p><ul><li>Rho (ρ): Factor by which gradient is decayed at each time step. Defaults to <code>0.9</code>.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = ADADelta() # uses default ρ = 0.9
-opt = ADADelta(0.89)</code></pre><p><strong>References</strong></p><p><a href="https://arxiv.org/abs/1212.5701">ADADelta</a> optimiser.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ab450477f31e16a6ca001ce88ba9e7a353904835/src/optimise/optimisers.jl#L291-L307">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.AMSGrad" href="#Flux.Optimise.AMSGrad"><code>Flux.Optimise.AMSGrad</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">AMSGrad(η, β::Tuple)</code></pre><p>Implements AMSGrad version of the ADAM optimiser. Parameters don&#39;t need tuning.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (η): Defaults to <code>0.001</code>.</li><li>Beta (β::Tuple): The first element refers to β1 and the second to β2. Defaults to <code>(0.9, 0.999)</code>.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = AMSGrad() # uses default η and β
-opt = AMSGrad(0.001, (0.89, 0.995))</code></pre><p><strong>References</strong></p><p><a href="https://openreview.net/forum?id=ryQu7f-RZ">AMSGrad</a> optimiser.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ab450477f31e16a6ca001ce88ba9e7a353904835/src/optimise/optimisers.jl#L324-L341">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.NADAM" href="#Flux.Optimise.NADAM"><code>Flux.Optimise.NADAM</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">NADAM(η, β::Tuple)</code></pre><p>Nesterov variant of ADAM. Parameters don&#39;t need tuning.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (η): Defaults to <code>0.001</code>.</li><li>Beta (β::Tuple): The first element refers to β1 and the second to β2. Defaults to <code>(0.9, 0.999)</code>.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = NADAM() # uses default η and β
-opt = NADAM(0.002, (0.89, 0.995))</code></pre><p><strong>References</strong></p><p><a href="http://cs229.stanford.edu/proj2015/054_report.pdf">NADAM</a> optimiser.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ab450477f31e16a6ca001ce88ba9e7a353904835/src/optimise/optimisers.jl#L359-L376">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.ADAMW" href="#Flux.Optimise.ADAMW"><code>Flux.Optimise.ADAMW</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-julia">ADAMW(η, β::Tuple, decay)</code></pre><p>Variant of ADAM defined by fixing weight decay regularization.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (η): Defaults to <code>0.001</code>.</li><li>Beta (β::Tuple): The first element refers to β1 and the second to β2. Defaults to (0.9, 0.999).</li><li>decay: Decay applied to weights during optimisation. Defaults to 0.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = ADAMW() # uses default η, β and decay
-opt = ADAMW(0.001, (0.89, 0.995), 0.1)</code></pre><p><strong>References</strong></p><p><a href="https://arxiv.org/abs/1711.05101">ADAMW</a></p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ab450477f31e16a6ca001ce88ba9e7a353904835/src/optimise/optimisers.jl#L395-L413">source</a></section><h2><a class="nav-anchor" id="Optimiser-Interface-1" href="#Optimiser-Interface-1">Optimiser Interface</a></h2><p>Flux&#39;s optimsers are built around a <code>struct</code> that holds all the optimiser parameters along with a definition of how to apply the update rule associated with it. We do this via the <code>apply!</code> function which takes the optimiser as the first argument followed by the parameter and its corresponding gradient.</p><p>In this manner Flux also allows one to create custom optimisers to be used seamlessly. Let&#39;s work this with a simple example.</p><pre><code class="language-julia">mutable struct Momentum
+opt = ADAGrad(0.001)</code></pre><p><strong>References</strong></p><p><a href="http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf">ADAGrad</a> optimiser. Parameters don&#39;t need tuning.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/99f98ca800ff959ab8a5e9c34758eb2a6f3ad00d/src/optimise/optimisers.jl#L258-L276">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.ADADelta" href="#Flux.Optimise.ADADelta"><code>Flux.Optimise.ADADelta</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">ADADelta(ρ)</code></pre><p>Version of ADAGrad that adapts learning rate based on a window of past gradient updates. Parameters don&#39;t need tuning.</p><p><strong>Parameters</strong></p><ul><li>Rho (ρ): Factor by which gradient is decayed at each time step. Defaults to <code>0.9</code>.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = ADADelta() # uses default ρ = 0.9
+opt = ADADelta(0.89)</code></pre><p><strong>References</strong></p><p><a href="https://arxiv.org/abs/1212.5701">ADADelta</a> optimiser.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/99f98ca800ff959ab8a5e9c34758eb2a6f3ad00d/src/optimise/optimisers.jl#L291-L307">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.AMSGrad" href="#Flux.Optimise.AMSGrad"><code>Flux.Optimise.AMSGrad</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">AMSGrad(η, β::Tuple)</code></pre><p>Implements AMSGrad version of the ADAM optimiser. Parameters don&#39;t need tuning.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (η): Defaults to <code>0.001</code>.</li><li>Beta (β::Tuple): The first element refers to β1 and the second to β2. Defaults to <code>(0.9, 0.999)</code>.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = AMSGrad() # uses default η and β
+opt = AMSGrad(0.001, (0.89, 0.995))</code></pre><p><strong>References</strong></p><p><a href="https://openreview.net/forum?id=ryQu7f-RZ">AMSGrad</a> optimiser.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/99f98ca800ff959ab8a5e9c34758eb2a6f3ad00d/src/optimise/optimisers.jl#L324-L341">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.NADAM" href="#Flux.Optimise.NADAM"><code>Flux.Optimise.NADAM</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">NADAM(η, β::Tuple)</code></pre><p>Nesterov variant of ADAM. Parameters don&#39;t need tuning.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (η): Defaults to <code>0.001</code>.</li><li>Beta (β::Tuple): The first element refers to β1 and the second to β2. Defaults to <code>(0.9, 0.999)</code>.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = NADAM() # uses default η and β
+opt = NADAM(0.002, (0.89, 0.995))</code></pre><p><strong>References</strong></p><p><a href="http://cs229.stanford.edu/proj2015/054_report.pdf">NADAM</a> optimiser.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/99f98ca800ff959ab8a5e9c34758eb2a6f3ad00d/src/optimise/optimisers.jl#L359-L376">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.ADAMW" href="#Flux.Optimise.ADAMW"><code>Flux.Optimise.ADAMW</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-julia">ADAMW(η, β::Tuple, decay)</code></pre><p>Variant of ADAM defined by fixing weight decay regularization.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (η): Defaults to <code>0.001</code>.</li><li>Beta (β::Tuple): The first element refers to β1 and the second to β2. Defaults to (0.9, 0.999).</li><li>decay: Decay applied to weights during optimisation. Defaults to 0.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = ADAMW() # uses default η, β and decay
+opt = ADAMW(0.001, (0.89, 0.995), 0.1)</code></pre><p><strong>References</strong></p><p><a href="https://arxiv.org/abs/1711.05101">ADAMW</a></p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/99f98ca800ff959ab8a5e9c34758eb2a6f3ad00d/src/optimise/optimisers.jl#L395-L413">source</a></section><h2><a class="nav-anchor" id="Optimiser-Interface-1" href="#Optimiser-Interface-1">Optimiser Interface</a></h2><p>Flux&#39;s optimsers are built around a <code>struct</code> that holds all the optimiser parameters along with a definition of how to apply the update rule associated with it. We do this via the <code>apply!</code> function which takes the optimiser as the first argument followed by the parameter and its corresponding gradient.</p><p>In this manner Flux also allows one to create custom optimisers to be used seamlessly. Let&#39;s work this with a simple example.</p><pre><code class="language-julia">mutable struct Momentum
  eta
  rho
  velocity
@ -81,4 +81,4 @@ end

 loss(rand(10)) # around 0.9</code></pre><p>In this manner it is possible to compose optimisers for some added flexibility.</p><h2><a class="nav-anchor" id="Decays-1" href="#Decays-1">Decays</a></h2><p>Similar to optimisers, Flux also defines some simple decays that can be used in conjunction with other optimisers, or standalone.</p><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.ExpDecay" href="#Flux.Optimise.ExpDecay"><code>Flux.Optimise.ExpDecay</code></a> — <span class="docstring-category">Type</span>.</div><div><div><p>ExpDecay(eta, decay, decay_step, clip)</p><p>Discount the learning rate <code>eta</code> by <code>decay</code> every <code>decay_step</code> till a minimum of <code>clip</code>.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (eta): Defaults to <code>0.001</code>.</li><li>decay: Factor by which the learning rate is discounted. Defaults to <code>0.1</code>.</li><li>decay_step: Schedules decay operations by setting number of steps between two decay operations. Defaults to <code>1000</code>.</li><li>clip: Minimum value of learning rate. Defaults to <code>1e-4</code>.</li></ul><p><strong>Example</strong></p><p>To apply exponential decay to an optimiser:</p><pre><code class="language-julia">  Optimiser(ExpDecay(..), Opt(..))

-  opt = Optimiser(ExpDecay(), ADAM())</code></pre></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ab450477f31e16a6ca001ce88ba9e7a353904835/src/optimise/optimisers.jl#L472-L490">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.InvDecay" href="#Flux.Optimise.InvDecay"><code>Flux.Optimise.InvDecay</code></a> — <span class="docstring-category">Type</span>.</div><div><div><p>InvDecay(γ)</p><p>Applies inverse time decay to an optimiser</p><p><strong>Parameters</strong></p><ul><li>gamma (γ): Defaults to <code>0.001</code></li></ul><p><strong>Example</strong></p><pre><code class="language-julia">  Optimiser(InvDecay(..), Opt(..))</code></pre></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ab450477f31e16a6ca001ce88ba9e7a353904835/src/optimise/optimisers.jl#L444-L456">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.WeightDecay" href="#Flux.Optimise.WeightDecay"><code>Flux.Optimise.WeightDecay</code></a> — <span class="docstring-category">Type</span>.</div><div><div><p>WeightDecay(wd)</p><p>Decays the weight by <code>wd</code></p><p><strong>Parameters</strong></p><ul><li>weight decay (wd): 0</li></ul></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ab450477f31e16a6ca001ce88ba9e7a353904835/src/optimise/optimisers.jl#L511-L518">source</a></section><footer><hr/><a class="previous" href="../../models/layers/"><span class="direction">Previous</span><span class="title">Model Reference</span></a><a class="next" href="../training/"><span class="direction">Next</span><span class="title">Training</span></a></footer></article></body></html>
+  opt = Optimiser(ExpDecay(), ADAM())</code></pre></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/99f98ca800ff959ab8a5e9c34758eb2a6f3ad00d/src/optimise/optimisers.jl#L472-L490">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.InvDecay" href="#Flux.Optimise.InvDecay"><code>Flux.Optimise.InvDecay</code></a> — <span class="docstring-category">Type</span>.</div><div><div><p>InvDecay(γ)</p><p>Applies inverse time decay to an optimiser</p><p><strong>Parameters</strong></p><ul><li>gamma (γ): Defaults to <code>0.001</code></li></ul><p><strong>Example</strong></p><pre><code class="language-julia">  Optimiser(InvDecay(..), Opt(..))</code></pre></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/99f98ca800ff959ab8a5e9c34758eb2a6f3ad00d/src/optimise/optimisers.jl#L444-L456">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.WeightDecay" href="#Flux.Optimise.WeightDecay"><code>Flux.Optimise.WeightDecay</code></a> — <span class="docstring-category">Type</span>.</div><div><div><p>WeightDecay(wd)</p><p>Decays the weight by <code>wd</code></p><p><strong>Parameters</strong></p><ul><li>weight decay (wd): 0</li></ul></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/99f98ca800ff959ab8a5e9c34758eb2a6f3ad00d/src/optimise/optimisers.jl#L511-L518">source</a></section><footer><hr/><a class="previous" href="../../models/layers/"><span class="direction">Previous</span><span class="title">Model Reference</span></a><a class="next" href="../training/"><span class="direction">Next</span><span class="title">Training</span></a></footer></article></body></html>