# Model Reference

## Basic Layers

These core layers form the foundation of almost all neural networks.

### Flux.Chain — Type

```julia
Chain(layers...)
```

Chain multiple layers / functions together, so that they are called in sequence on a given input.

```julia
m = Chain(x -> x^2, x -> x+1)
m(x) == m[2](m[1](x))
```

`Chain` also supports indexing and slicing, e.g. `m[2]` or `m[1:end-1]`. `m[1:3](x)` will calculate the output of the first three layers.

[source](https://github.com/FluxML/Flux.jl/blob/ebf50f4e1c7c73920b38e88a0960fd99a56db0cb/src/layers/basic.jl#L1-L18)
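As a quick illustration of the chaining and slicing behaviour described above (a minimal sketch; the layer sizes are arbitrary):

```julia
using Flux

m = Chain(Dense(10, 5, relu), Dense(5, 2), softmax)

x = rand(10)
y = m(x)          # full forward pass
h = m[1:2](x)     # output of the first two layers only
m[3](h) == y      # the slices compose back to the full model
```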
### Flux.Dense — Type

```julia
Dense(in::Integer, out::Integer, σ = identity)
```

Creates a traditional `Dense` layer with parameters `W` and `b`.

```julia
y = σ.(W * x .+ b)
```

The input `x` must be a vector of length `in`, or a batch of vectors represented as an `in × N` matrix. The output `y` will be a vector or batch of length `out`.

```julia
julia> d = Dense(5, 2)
Dense(5, 2)
```
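To make the batch convention concrete, here is a small sketch (sizes chosen arbitrarily) contrasting a single input vector with an `in × N` batch:

```julia
using Flux

d = Dense(5, 2, relu)

d(rand(5))       # a single sample: 2-element output
d(rand(5, 64))   # a batch of 64 samples: 2×64 output
```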
### Flux.Conv — Type

```julia
Conv(size, in=>out)
Conv(size, in=>out, relu)
```

Standard convolutional layer. `size` should be a tuple like `(2, 2)`. `in` and `out` specify the number of input and output channels respectively.

Data should be stored in WHCN order. In other words, a 100×100 RGB image would be a `100×100×3×1` array, and a batch of 50 would be a `100×100×3×50` array.

Takes the keyword arguments `pad`, `stride` and `dilation`.

[source](https://github.com/FluxML/Flux.jl/blob/ebf50f4e1c7c73920b38e88a0960fd99a56db0cb/src/layers/conv.jl#L8-L19)

### Flux.MaxPool — Type

```julia
MaxPool(k)
```

Max pooling layer. `k` stands for the size of the window for each dimension of the input.

Takes the keyword arguments `pad` and `stride`.

[source](https://github.com/FluxML/Flux.jl/blob/ebf50f4e1c7c73920b38e88a0960fd99a56db0cb/src/layers/conv.jl#L159-L165)

### Flux.MeanPool — Type

```julia
MeanPool(k)
```

Mean pooling layer. `k` stands for the size of the window for each dimension of the input.

Takes the keyword arguments `pad` and `stride`.

[source](https://github.com/FluxML/Flux.jl/blob/ebf50f4e1c7c73920b38e88a0960fd99a56db0cb/src/layers/conv.jl#L181-L187)
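The WHCN convention is easiest to see with shapes. A minimal sketch chaining a convolution and a pooling layer (filter and channel counts are arbitrary):

```julia
using Flux

m = Chain(
  Conv((3, 3), 3=>16, relu),   # 3 input channels, 16 output channels
  MaxPool((2, 2)))

x = rand(100, 100, 3, 1)       # one 100×100 RGB image in WHCN order
size(m(x))                     # (49, 49, 16, 1)
```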
## Additional Convolution Layers

### Flux.DepthwiseConv — Type

```julia
DepthwiseConv(size, in)
DepthwiseConv(size, in=>mul, relu)
```

Depthwise convolutional layer. `size` should be a tuple like `(2, 2)`. `in` and `mul` specify the number of input channels and the channel multiplier respectively. If `mul` is not specified, it defaults to 1.

Data should be stored in WHCN order. In other words, a 100×100 RGB image would be a `100×100×3×1` array, and a batch of 50 would be a `100×100×3×50` array.

Takes the keyword arguments `pad` and `stride`.

[source](https://github.com/FluxML/Flux.jl/blob/ebf50f4e1c7c73920b38e88a0960fd99a56db0cb/src/layers/conv.jl#L108-L121)
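With a channel multiplier of 2, each of the 3 input channels gets its own pair of filters, so the output has 3 × 2 = 6 channels. A minimal sketch:

```julia
using Flux

d = DepthwiseConv((3, 3), 3=>2, relu)   # 3 input channels, channel multiplier 2

x = rand(100, 100, 3, 1)
size(d(x))                              # (98, 98, 6, 1)
```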
### Flux.ConvTranspose — Type

```julia
ConvTranspose(size, in=>out)
ConvTranspose(size, in=>out, relu)
```

Standard convolutional transpose layer. `size` should be a tuple like `(2, 2)`. `in` and `out` specify the number of input and output channels respectively.

Data should be stored in WHCN order. In other words, a 100×100 RGB image would be a `100×100×3×1` array, and a batch of 50 would be a `100×100×3×50` array.

Takes the keyword arguments `pad`, `stride` and `dilation`.

[source](https://github.com/FluxML/Flux.jl/blob/ebf50f4e1c7c73920b38e88a0960fd99a56db0cb/src/layers/conv.jl#L60-L69)

## Recurrent Layers

Much like the core layers above, but can be used to process sequence data (as well as other kinds of structured data).

### Flux.RNN — Function

```julia
RNN(in::Integer, out::Integer, σ = tanh)
```

The most basic recurrent layer; essentially acts as a `Dense` layer, but with the output fed back into the input each time step.

[source](https://github.com/FluxML/Flux.jl/blob/ebf50f4e1c7c73920b38e88a0960fd99a56db0cb/src/layers/recurrent.jl#L105-L110)

### Flux.LSTM — Function

```julia
LSTM(in::Integer, out::Integer)
```

Long Short Term Memory recurrent layer. Behaves like an RNN but generally exhibits a longer memory span over sequences.

See [this article](http://colah.github.io/posts/2015-08-Understanding-LSTMs/) for a good overview of the internals.

[source](https://github.com/FluxML/Flux.jl/blob/ebf50f4e1c7c73920b38e88a0960fd99a56db0cb/src/layers/recurrent.jl#L150-L158)

### Flux.GRU — Function

```julia
GRU(in::Integer, out::Integer)
```

Gated Recurrent Unit layer. Behaves like an RNN but generally exhibits a longer memory span over sequences.

See [this article](http://colah.github.io/posts/2015-08-Understanding-LSTMs/) for a good overview of the internals.

[source](https://github.com/FluxML/Flux.jl/blob/ebf50f4e1c7c73920b38e88a0960fd99a56db0cb/src/layers/recurrent.jl#L191-L199)
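These layers are stateful: calling them on successive elements of a sequence carries the hidden state forward between calls. A minimal sketch (assuming `Flux.reset!`, which clears the hidden state between sequences):

```julia
using Flux

rnn = LSTM(10, 5)              # maps length-10 inputs to length-5 outputs

seq  = [rand(10) for t = 1:7]  # a sequence of 7 time steps
outs = rnn.(seq)               # the hidden state is carried across the calls

Flux.reset!(rnn)               # start fresh before the next sequence
```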
### Flux.Recur — Type

```julia
Recur(cell)
```

`Recur` takes a recurrent cell and makes it stateful, managing the hidden state in the background. `cell` should be a model of the form:

```julia
h, y = cell(h, x...)
```

For example, here's a recurrent network that keeps a running total of its inputs.

```julia
accum(h, x) = (h+x, x)
rnn = Flux.Recur(accum, 0)
rnn(2)      # 2
rnn(3)      # 3
rnn.state   # 5
rnn.(1:10)  # apply to a sequence
rnn.state   # 60
```

[source](https://github.com/FluxML/Flux.jl/blob/ebf50f4e1c7c73920b38e88a0960fd99a56db0cb/src/layers/recurrent.jl#L7-L26)
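The built-in recurrent layers above are themselves cells wrapped in `Recur`; for instance `RNN(in, out)` essentially constructs `Recur(Flux.RNNCell(in, out))`. A sketch using the cell directly (assuming the `RNNCell` constructor of this Flux version):

```julia
using Flux

cell = Flux.RNNCell(10, 5)   # the underlying cell: (h, x) -> (h′, y)
rnn  = Flux.Recur(cell)      # Recur manages the hidden state for us

y = rnn(rand(10))            # behaves like RNN(10, 5)
```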
## Activation Functions

Non-linearities that go between layers of your model. Most of these functions are defined in [NNlib](https://github.com/FluxML/NNlib.jl) but are available by default in Flux.

Note that, unless otherwise stated, activation functions operate on scalars. To apply them to an array you can call `σ.(xs)`, `relu.(xs)` and so on.

### NNlib.σ — Function

```julia
σ(x) = 1 / (1 + exp(-x))
```

Classic [sigmoid](https://en.wikipedia.org/wiki/Sigmoid_function) activation function.

### NNlib.relu — Function

```julia
relu(x) = max(0, x)
```

[Rectified Linear Unit](https://en.wikipedia.org/wiki/Rectifier_(neural_networks)) activation function.

### NNlib.leakyrelu — Function

```julia
leakyrelu(x) = max(0.01x, x)
```

Leaky [Rectified Linear Unit](https://en.wikipedia.org/wiki/Rectifier_(neural_networks)) activation function. You can also specify the coefficient explicitly, e.g. `leakyrelu(x, 0.01)`.
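For instance, `relu` clamps negative inputs to zero, while `leakyrelu` keeps a small negative slope (the default coefficient is `0.01`):

```julia
using Flux

relu(-2.0)             # 0.0
leakyrelu(-2.0)        # ≈ -0.02
leakyrelu(-2.0, 0.2)   # ≈ -0.4
```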
### NNlib.elu — Function

```julia
elu(x, α = 1) =
  x > 0 ? x : α * (exp(x) - 1)
```

Exponential Linear Unit activation function. See [Fast and Accurate Deep Network Learning by Exponential Linear Units](https://arxiv.org/abs/1511.07289). You can also specify the coefficient explicitly, e.g. `elu(x, 1)`.

### NNlib.swish — Function

```julia
swish(x) = x * σ(x)
```

Self-gated activation function. See [Swish: a Self-Gated Activation Function](https://arxiv.org/pdf/1710.05941.pdf).
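As noted above, these functions act on scalars; broadcast them to apply them element-wise to an array:

```julia
using Flux

xs = randn(3, 4)
σ.(xs)      # element-wise sigmoid
relu.(xs)   # element-wise ReLU
```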
## Normalisation & Regularisation

These layers don't affect the structure of the network but may improve training times or reduce overfitting.

### Flux.testmode! — Function

```julia
testmode!(m)
testmode!(m, false)
```

Put layers like `Dropout` and `BatchNorm` into testing mode (or back to training mode with `false`).

[source](https://github.com/FluxML/Flux.jl/blob/ebf50f4e1c7c73920b38e88a0960fd99a56db0cb/src/layers/normalise.jl#L1-L7)
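For example, before evaluating a model that contains `Dropout` you would switch it into test mode, and back again before resuming training (a minimal sketch):

```julia
using Flux

m = Chain(Dense(10, 5, relu), Dropout(0.5), Dense(5, 2))

Flux.testmode!(m)          # Dropout now passes inputs through unchanged
y = m(rand(10))

Flux.testmode!(m, false)   # back to training mode
```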
### Flux.BatchNorm — Type

```julia
BatchNorm(channels::Integer, σ = identity;
          ϵ = 1e-8, momentum = .1)
```

Batch Normalization layer. The `channels` input should be the size of the channel dimension in your data (see below).

Given an array with `N` dimensions, call the `N-1`th the channel dimension. (For a batch of feature vectors this is just the data dimension, for `WHCN` images it's the usual channel dimension.)

`BatchNorm` computes the mean and variance for each `W×H×1×N` slice and shifts them to have a new mean and variance (corresponding to the learnable, per-channel `bias` and `scale` parameters).

See [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/pdf/1502.03167.pdf).

Example:

```julia
m = Chain(
  Dense(28^2, 64),
  BatchNorm(64, relu),
  Dense(64, 10),
  BatchNorm(10),
  softmax)
```

[source](https://github.com/FluxML/Flux.jl/blob/ebf50f4e1c7c73920b38e88a0960fd99a56db0cb/src/layers/normalise.jl#L68-L96)
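With image data the channel dimension is the third of the WHCN axes, so `channels` must match the output channels of the preceding layer (a sketch):

```julia
using Flux

m = Chain(Conv((3, 3), 3=>16), BatchNorm(16, relu))

size(m(rand(100, 100, 3, 1)))   # (98, 98, 16, 1)
```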
### Flux.Dropout — Type

```julia
Dropout(p)
```

A Dropout layer. For each input, either sets that input to `0` (with probability `p`) or scales it by `1/(1-p)`. This is used as a regularisation technique, i.e. it reduces overfitting during training.

Does nothing to the input once in `testmode!`.

[source](https://github.com/FluxML/Flux.jl/blob/ebf50f4e1c7c73920b38e88a0960fd99a56db0cb/src/layers/normalise.jl#L15-L23)

### Flux.LayerNorm — Type

```julia
LayerNorm(h::Integer)
```

A [normalisation layer](https://arxiv.org/pdf/1607.06450.pdf) designed to be used with recurrent hidden states of size `h`. Normalises the mean/stddev of each input before applying a per-neuron gain/bias.

[source](https://github.com/FluxML/Flux.jl/blob/ebf50f4e1c7c73920b38e88a0960fd99a56db0cb/src/layers/normalise.jl#L46-L52)
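As a closing sketch, `LayerNorm` is typically paired with a recurrent layer of the same hidden size (the sizes here are arbitrary):

```julia
using Flux

m = Chain(LSTM(10, 32), LayerNorm(32))

h = m(rand(10))   # a normalised 32-element output for this time step
```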