build based on e1cac76

zeptodoctor 2019-01-28 14:36:26 +00:00
parent 4bb0d054eb
commit bacbecc76d
4 changed files with 14 additions and 15 deletions


@@ -21,16 +21,15 @@ d2f(x) = Tracker.gradient(df, x; nest = true)[1]
d2f(2) # 6.0 (tracked)</code></pre><p>(We&#39;ll learn more about why these numbers show up as <code>(tracked)</code> below.)</p><p>When a function has many parameters, we can pass them all in explicitly:</p><pre><code class="language-julia">f(W, b, x) = W * x + b
Tracker.gradient(f, 2, 3, 4)
(4.0 (tracked), 1.0 (tracked), 2.0 (tracked))</code></pre><p>But machine learning models can have <em>hundreds</em> of parameters! Flux offers a nice way to handle this. We can tell Flux to treat something as a parameter via <code>param</code>. Then we can collect these together and tell <code>gradient</code> to collect the gradients of all of them at once.</p><pre><code class="language-julia">W = param(2) # 2.0 (tracked)
# (4.0 (tracked), 1.0 (tracked), 2.0 (tracked))</code></pre><p>But machine learning models can have <em>hundreds</em> of parameters! Flux offers a nice way to handle this. We can tell Flux to treat something as a parameter via <code>param</code>. Then we can collect these together and tell <code>gradient</code> to collect the gradients of all <code>params</code> at once.</p><pre><code class="language-julia">W = param(2) # 2.0 (tracked)
b = param(3) # 3.0 (tracked)
f(x) = W * x + b
params = Params([W, b])
grads = Tracker.gradient(() -&gt; f(4), params)
grads = Tracker.gradient(() -&gt; f(4), params(W, b))
grads[W] # 4.0
grads[b] # 1.0</code></pre><p>There are a few things to notice here. Firstly, <code>W</code> and <code>b</code> now show up as <em>tracked</em>. Tracked things behave like normal numbers or arrays, but keep records of everything you do with them, allowing Flux to calculate their gradients. <code>gradient</code> takes a zero-argument function; no arguments are necessary because the <code>Params</code> tell it what to differentiate.</p><p>This will come in really handy when dealing with big, complicated models. For now, though, let&#39;s start with something simple.</p><h2><a class="nav-anchor" id="Simple-Models-1" href="#Simple-Models-1">Simple Models</a></h2><p>Consider a simple linear regression, which tries to predict an output array <code>y</code> from an input <code>x</code>.</p><pre><code class="language-julia">W = rand(2, 5)
grads[b] # 1.0</code></pre><p>There are a few things to notice here. Firstly, <code>W</code> and <code>b</code> now show up as <em>tracked</em>. Tracked things behave like normal numbers or arrays, but keep records of everything you do with them, allowing Flux to calculate their gradients. <code>gradient</code> takes a zero-argument function; no arguments are necessary because the <code>params</code> tell it what to differentiate.</p><p>This will come in really handy when dealing with big, complicated models. For now, though, let&#39;s start with something simple.</p><h2><a class="nav-anchor" id="Simple-Models-1" href="#Simple-Models-1">Simple Models</a></h2><p>Consider a simple linear regression, which tries to predict an output array <code>y</code> from an input <code>x</code>.</p><pre><code class="language-julia">W = rand(2, 5)
b = rand(2)
predict(x) = W*x .+ b
@@ -46,7 +45,7 @@ loss(x, y) # ~ 3</code></pre><p>To improve the prediction we can take the gradie
W = param(W)
b = param(b)
gs = Tracker.gradient(() -&gt; loss(x, y), Params([W, b]))</code></pre><p>Now that we have gradients, we can pull them out and update <code>W</code> to train the model. The <code>update!(W, Δ)</code> function applies <code>W = W + Δ</code>, which we can use for gradient descent.</p><pre><code class="language-julia">using Flux.Tracker: update!
gs = Tracker.gradient(() -&gt; loss(x, y), params(W, b))</code></pre><p>Now that we have gradients, we can pull them out and update <code>W</code> to train the model. The <code>update!(W, Δ)</code> function applies <code>W = W + Δ</code>, which we can use for gradient descent.</p><pre><code class="language-julia">using Flux.Tracker: update!
Δ = gs[W]
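
The update step described in this hunk can be sketched in plain Julia without Flux. This is a minimal illustration, not the code from the docs: the loss definition is not visible in the hunk, so a squared-error loss is assumed here, and the gradients are derived by hand rather than taken from `Tracker.gradient`. The data values, the step size `η`, and the names `W_new` and `b_new` are all illustrative. Because `update!(W, Δ)` is described as applying `W = W + Δ`, gradient descent passes a negatively scaled gradient, which is what the subtraction below mirrors.

```julia
# Plain-Julia sketch of one gradient-descent step; no Flux required.
# Assumption: squared-error loss (the loss definition is not visible in this hunk).
x = [0.1, 0.2, 0.3, 0.4, 0.5]   # illustrative input
y = [1.0, -1.0]                 # illustrative target
W = fill(0.5, 2, 5)             # illustrative weights
b = zeros(2)                    # illustrative bias

predict(W, b, x) = W * x .+ b
loss(W, b, x, y) = sum((predict(W, b, x) .- y) .^ 2)

# Hand-derived gradients of the assumed loss.
r  = predict(W, b, x) .- y      # residual
ΔW = (2 .* r) * x'              # ∂loss/∂W, a 2×5 matrix
Δb = 2 .* r                     # ∂loss/∂b

η = 0.1                         # step size (illustrative)
before = loss(W, b, x, y)
W_new = W .- η .* ΔW            # same effect as adding -η * ΔW
b_new = b .- η .* Δb
after = loss(W_new, b_new, x, y)

println("loss before: ", before, ", after: ", after)
```

With these inputs the loss after the step is smaller than before, which is the behaviour the docs rely on when repeating the update to train the model.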


@@ -11,28 +11,28 @@ m(5) == 26
m = Chain(Dense(10, 5), Dense(5, 2))
x = rand(10)
m(x) == m[2](m[1](x))</code></pre><p><code>Chain</code> also supports indexing and slicing, e.g. <code>m[2]</code> or <code>m[1:end-1]</code>. <code>m[1:3](x)</code> will calculate the output of the first three layers.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/bf0b5c5ceff59ab4a8f9a877b4cf30384b780edd/src/layers/basic.jl#L1-L18">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Dense" href="#Flux.Dense"><code>Flux.Dense</code></a><span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-none">Dense(in::Integer, out::Integer, σ = identity)</code></pre><p>Creates a traditional <code>Dense</code> layer with parameters <code>W</code> and <code>b</code>.</p><pre><code class="language-none">y = σ.(W * x .+ b)</code></pre><p>The input <code>x</code> must be a vector of length <code>in</code>, or a batch of vectors represented as an <code>in × N</code> matrix. The out <code>y</code> will be a vector or batch of length <code>out</code>.</p><pre><code class="language-julia">julia&gt; d = Dense(5, 2)
m(x) == m[2](m[1](x))</code></pre><p><code>Chain</code> also supports indexing and slicing, e.g. <code>m[2]</code> or <code>m[1:end-1]</code>. <code>m[1:3](x)</code> will calculate the output of the first three layers.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/e1cac76a34b22ea5c921c06fbe684229062ecd64/src/layers/basic.jl#L1-L18">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Dense" href="#Flux.Dense"><code>Flux.Dense</code></a><span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-none">Dense(in::Integer, out::Integer, σ = identity)</code></pre><p>Creates a traditional <code>Dense</code> layer with parameters <code>W</code> and <code>b</code>.</p><pre><code class="language-none">y = σ.(W * x .+ b)</code></pre><p>The input <code>x</code> must be a vector of length <code>in</code>, or a batch of vectors represented as an <code>in × N</code> matrix. The out <code>y</code> will be a vector or batch of length <code>out</code>.</p><pre><code class="language-julia">julia&gt; d = Dense(5, 2)
Dense(5, 2)
julia&gt; d(rand(5))
Tracked 2-element Array{Float64,1}:
0.00257447
-0.00449443</code></pre></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/bf0b5c5ceff59ab4a8f9a877b4cf30384b780edd/src/layers/basic.jl#L45-L64">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Conv" href="#Flux.Conv"><code>Flux.Conv</code></a><span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-none">Conv(size, in=&gt;out)
Conv(size, in=&gt;out, relu)</code></pre><p>Standard convolutional layer. <code>size</code> should be a tuple like <code>(2, 2)</code>. <code>in</code> and <code>out</code> specify the number of input and output channels respectively.</p><p>Data should be stored in WHCN order. In other words, a 100×100 RGB image would be a <code>100×100×3</code> array, and a batch of 50 would be a <code>100×100×3×50</code> array.</p><p>Takes the keyword arguments <code>pad</code>, <code>stride</code> and <code>dilation</code>.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/bf0b5c5ceff59ab4a8f9a877b4cf30384b780edd/src/layers/conv.jl#L8-L19">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.MaxPool" href="#Flux.MaxPool"><code>Flux.MaxPool</code></a><span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-none">MaxPool(k)</code></pre><p>Max pooling layer. <code>k</code> stands for the size of the window for each dimension of the input.</p><p>Takes the keyword arguments <code>pad</code> and <code>stride</code>.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/bf0b5c5ceff59ab4a8f9a877b4cf30384b780edd/src/layers/conv.jl#L111-L117">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.MeanPool" href="#Flux.MeanPool"><code>Flux.MeanPool</code></a><span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-none">MeanPool(k)</code></pre><p>Mean pooling layer. 
<code>k</code> stands for the size of the window for each dimension of the input.</p><p>Takes the keyword arguments <code>pad</code> and <code>stride</code>.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/bf0b5c5ceff59ab4a8f9a877b4cf30384b780edd/src/layers/conv.jl#L133-L139">source</a></section><h2><a class="nav-anchor" id="Additional-Convolution-Layers-1" href="#Additional-Convolution-Layers-1">Additional Convolution Layers</a></h2><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.DepthwiseConv" href="#Flux.DepthwiseConv"><code>Flux.DepthwiseConv</code></a><span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-none">DepthwiseConv(size, in)
-0.00449443</code></pre></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/e1cac76a34b22ea5c921c06fbe684229062ecd64/src/layers/basic.jl#L45-L64">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Conv" href="#Flux.Conv"><code>Flux.Conv</code></a><span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-none">Conv(size, in=&gt;out)
Conv(size, in=&gt;out, relu)</code></pre><p>Standard convolutional layer. <code>size</code> should be a tuple like <code>(2, 2)</code>. <code>in</code> and <code>out</code> specify the number of input and output channels respectively.</p><p>Data should be stored in WHCN order. In other words, a 100×100 RGB image would be a <code>100×100×3</code> array, and a batch of 50 would be a <code>100×100×3×50</code> array.</p><p>Takes the keyword arguments <code>pad</code>, <code>stride</code> and <code>dilation</code>.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/e1cac76a34b22ea5c921c06fbe684229062ecd64/src/layers/conv.jl#L8-L19">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.MaxPool" href="#Flux.MaxPool"><code>Flux.MaxPool</code></a><span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-none">MaxPool(k)</code></pre><p>Max pooling layer. <code>k</code> stands for the size of the window for each dimension of the input.</p><p>Takes the keyword arguments <code>pad</code> and <code>stride</code>.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/e1cac76a34b22ea5c921c06fbe684229062ecd64/src/layers/conv.jl#L111-L117">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.MeanPool" href="#Flux.MeanPool"><code>Flux.MeanPool</code></a><span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-none">MeanPool(k)</code></pre><p>Mean pooling layer. 
<code>k</code> stands for the size of the window for each dimension of the input.</p><p>Takes the keyword arguments <code>pad</code> and <code>stride</code>.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/e1cac76a34b22ea5c921c06fbe684229062ecd64/src/layers/conv.jl#L133-L139">source</a></section><h2><a class="nav-anchor" id="Additional-Convolution-Layers-1" href="#Additional-Convolution-Layers-1">Additional Convolution Layers</a></h2><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.DepthwiseConv" href="#Flux.DepthwiseConv"><code>Flux.DepthwiseConv</code></a><span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-none">DepthwiseConv(size, in)
DepthwiseConv(size, in=&gt;mul)
DepthwiseConv(size, in=&gt;mul, relu)</code></pre><p>Depthwise convolutional layer. <code>size</code> should be a tuple like <code>(2, 2)</code>. <code>in</code> and <code>mul</code> specify the number of input channels and channel multiplier respectively. In case the <code>mul</code> is not specified it is taken as 1.</p><p>Data should be stored in WHCN order. In other words, a 100×100 RGB image would be a <code>100×100×3</code> array, and a batch of 50 would be a <code>100×100×3×50</code> array.</p><p>Takes the keyword arguments <code>pad</code> and <code>stride</code>.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/bf0b5c5ceff59ab4a8f9a877b4cf30384b780edd/src/layers/conv.jl#L60-L73">source</a></section><h2><a class="nav-anchor" id="Recurrent-Layers-1" href="#Recurrent-Layers-1">Recurrent Layers</a></h2><p>Much like the core layers above, but can be used to process sequence data (as well as other kinds of structured data).</p><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.RNN" href="#Flux.RNN"><code>Flux.RNN</code></a><span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">RNN(in::Integer, out::Integer, σ = tanh)</code></pre><p>The most basic recurrent layer; essentially acts as a <code>Dense</code> layer, but with the output fed back into the input each time step.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/bf0b5c5ceff59ab4a8f9a877b4cf30384b780edd/src/layers/recurrent.jl#L105-L110">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.LSTM" href="#Flux.LSTM"><code>Flux.LSTM</code></a><span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">LSTM(in::Integer, out::Integer)</code></pre><p>Long Short Term Memory recurrent layer. 
Behaves like an RNN but generally exhibits a longer memory span over sequences.</p><p>See <a href="http://colah.github.io/posts/2015-08-Understanding-LSTMs/">this article</a> for a good overview of the internals.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/bf0b5c5ceff59ab4a8f9a877b4cf30384b780edd/src/layers/recurrent.jl#L150-L158">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.GRU" href="#Flux.GRU"><code>Flux.GRU</code></a><span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">GRU(in::Integer, out::Integer)</code></pre><p>Gated Recurrent Unit layer. Behaves like an RNN but generally exhibits a longer memory span over sequences.</p><p>See <a href="http://colah.github.io/posts/2015-08-Understanding-LSTMs/">this article</a> for a good overview of the internals.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/bf0b5c5ceff59ab4a8f9a877b4cf30384b780edd/src/layers/recurrent.jl#L191-L199">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Recur" href="#Flux.Recur"><code>Flux.Recur</code></a><span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-none">Recur(cell)</code></pre><p><code>Recur</code> takes a recurrent cell and makes it stateful, managing the hidden state in the background. <code>cell</code> should be a model of the form:</p><pre><code class="language-none">h, y = cell(h, x...)</code></pre><p>For example, here&#39;s a recurrent network that keeps a running total of its inputs.</p><pre><code class="language-julia">accum(h, x) = (h+x, x)
DepthwiseConv(size, in=&gt;mul, relu)</code></pre><p>Depthwise convolutional layer. <code>size</code> should be a tuple like <code>(2, 2)</code>. <code>in</code> and <code>mul</code> specify the number of input channels and channel multiplier respectively. In case the <code>mul</code> is not specified it is taken as 1.</p><p>Data should be stored in WHCN order. In other words, a 100×100 RGB image would be a <code>100×100×3</code> array, and a batch of 50 would be a <code>100×100×3×50</code> array.</p><p>Takes the keyword arguments <code>pad</code> and <code>stride</code>.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/e1cac76a34b22ea5c921c06fbe684229062ecd64/src/layers/conv.jl#L60-L73">source</a></section><h2><a class="nav-anchor" id="Recurrent-Layers-1" href="#Recurrent-Layers-1">Recurrent Layers</a></h2><p>Much like the core layers above, but can be used to process sequence data (as well as other kinds of structured data).</p><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.RNN" href="#Flux.RNN"><code>Flux.RNN</code></a><span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">RNN(in::Integer, out::Integer, σ = tanh)</code></pre><p>The most basic recurrent layer; essentially acts as a <code>Dense</code> layer, but with the output fed back into the input each time step.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/e1cac76a34b22ea5c921c06fbe684229062ecd64/src/layers/recurrent.jl#L105-L110">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.LSTM" href="#Flux.LSTM"><code>Flux.LSTM</code></a><span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">LSTM(in::Integer, out::Integer)</code></pre><p>Long Short Term Memory recurrent layer. 
Behaves like an RNN but generally exhibits a longer memory span over sequences.</p><p>See <a href="http://colah.github.io/posts/2015-08-Understanding-LSTMs/">this article</a> for a good overview of the internals.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/e1cac76a34b22ea5c921c06fbe684229062ecd64/src/layers/recurrent.jl#L150-L158">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.GRU" href="#Flux.GRU"><code>Flux.GRU</code></a><span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">GRU(in::Integer, out::Integer)</code></pre><p>Gated Recurrent Unit layer. Behaves like an RNN but generally exhibits a longer memory span over sequences.</p><p>See <a href="http://colah.github.io/posts/2015-08-Understanding-LSTMs/">this article</a> for a good overview of the internals.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/e1cac76a34b22ea5c921c06fbe684229062ecd64/src/layers/recurrent.jl#L191-L199">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Recur" href="#Flux.Recur"><code>Flux.Recur</code></a><span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-none">Recur(cell)</code></pre><p><code>Recur</code> takes a recurrent cell and makes it stateful, managing the hidden state in the background. <code>cell</code> should be a model of the form:</p><pre><code class="language-none">h, y = cell(h, x...)</code></pre><p>For example, here&#39;s a recurrent network that keeps a running total of its inputs.</p><pre><code class="language-julia">accum(h, x) = (h+x, x)
rnn = Flux.Recur(accum, 0)
rnn(2) # 2
rnn(3) # 3
rnn.state # 5
rnn.(1:10) # apply to a sequence
rnn.state # 60</code></pre></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/bf0b5c5ceff59ab4a8f9a877b4cf30384b780edd/src/layers/recurrent.jl#L7-L26">source</a></section><h2><a class="nav-anchor" id="Activation-Functions-1" href="#Activation-Functions-1">Activation Functions</a></h2><p>Non-linearities that go between layers of your model. Most of these functions are defined in <a href="https://github.com/FluxML/NNlib.jl">NNlib</a> but are available by default in Flux.</p><p>Note that, unless otherwise stated, activation functions operate on scalars. To apply them to an array you can call <code>σ.(xs)</code>, <code>relu.(xs)</code> and so on.</p><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.σ" href="#NNlib.σ"><code>NNlib.σ</code></a><span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">σ(x) = 1 / (1 + exp(-x))</code></pre><p>Classic <a href="https://en.wikipedia.org/wiki/Sigmoid_function">sigmoid</a> activation function.</p></div></div></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.relu" href="#NNlib.relu"><code>NNlib.relu</code></a><span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">relu(x) = max(0, x)</code></pre><p><a href="https://en.wikipedia.org/wiki/Rectifier_(neural_networks)">Rectified Linear Unit</a> activation function.</p></div></div></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.leakyrelu" href="#NNlib.leakyrelu"><code>NNlib.leakyrelu</code></a><span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">leakyrelu(x) = max(0.01x, x)</code></pre><p>Leaky <a href="https://en.wikipedia.org/wiki/Rectifier_(neural_networks)">Rectified Linear Unit</a> activation function. You can also specify the coefficient explicitly, e.g. 
<code>leakyrelu(x, 0.01)</code>.</p></div></div></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.elu" href="#NNlib.elu"><code>NNlib.elu</code></a><span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">elu(x, α = 1) =
rnn.state # 60</code></pre></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/e1cac76a34b22ea5c921c06fbe684229062ecd64/src/layers/recurrent.jl#L7-L26">source</a></section><h2><a class="nav-anchor" id="Activation-Functions-1" href="#Activation-Functions-1">Activation Functions</a></h2><p>Non-linearities that go between layers of your model. Most of these functions are defined in <a href="https://github.com/FluxML/NNlib.jl">NNlib</a> but are available by default in Flux.</p><p>Note that, unless otherwise stated, activation functions operate on scalars. To apply them to an array you can call <code>σ.(xs)</code>, <code>relu.(xs)</code> and so on.</p><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.σ" href="#NNlib.σ"><code>NNlib.σ</code></a><span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">σ(x) = 1 / (1 + exp(-x))</code></pre><p>Classic <a href="https://en.wikipedia.org/wiki/Sigmoid_function">sigmoid</a> activation function.</p></div></div></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.relu" href="#NNlib.relu"><code>NNlib.relu</code></a><span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">relu(x) = max(0, x)</code></pre><p><a href="https://en.wikipedia.org/wiki/Rectifier_(neural_networks)">Rectified Linear Unit</a> activation function.</p></div></div></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.leakyrelu" href="#NNlib.leakyrelu"><code>NNlib.leakyrelu</code></a><span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">leakyrelu(x) = max(0.01x, x)</code></pre><p>Leaky <a href="https://en.wikipedia.org/wiki/Rectifier_(neural_networks)">Rectified Linear Unit</a> activation function. You can also specify the coefficient explicitly, e.g. 
<code>leakyrelu(x, 0.01)</code>.</p></div></div></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.elu" href="#NNlib.elu"><code>NNlib.elu</code></a><span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">elu(x, α = 1) =
x &gt; 0 ? x : α * (exp(x) - 1)</code></pre><p>Exponential Linear Unit activation function. See <a href="https://arxiv.org/abs/1511.07289">Fast and Accurate Deep Network Learning by Exponential Linear Units</a>. You can also specify the coefficient explicitly, e.g. <code>elu(x, 1)</code>.</p></div></div></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.swish" href="#NNlib.swish"><code>NNlib.swish</code></a><span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">swish(x) = x * σ(x)</code></pre><p>Self-gated activation function. See <a href="https://arxiv.org/pdf/1710.05941.pdf">Swish: a Self-Gated Activation Function</a>.</p></div></div></section><h2><a class="nav-anchor" id="Normalisation-and-Regularisation-1" href="#Normalisation-and-Regularisation-1">Normalisation &amp; Regularisation</a></h2><p>These layers don&#39;t affect the structure of the network but may improve training times or reduce overfitting.</p><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.testmode!" href="#Flux.testmode!"><code>Flux.testmode!</code></a><span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">testmode!(m)
testmode!(m, false)</code></pre><p>Put layers like <a href="#Flux.Dropout"><code>Dropout</code></a> and <a href="#Flux.BatchNorm"><code>BatchNorm</code></a> into testing mode (or back to training mode with <code>false</code>).</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/bf0b5c5ceff59ab4a8f9a877b4cf30384b780edd/src/layers/normalise.jl#L1-L7">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.BatchNorm" href="#Flux.BatchNorm"><code>Flux.BatchNorm</code></a><span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-none">BatchNorm(channels::Integer, σ = identity;
testmode!(m, false)</code></pre><p>Put layers like <a href="#Flux.Dropout"><code>Dropout</code></a> and <a href="#Flux.BatchNorm"><code>BatchNorm</code></a> into testing mode (or back to training mode with <code>false</code>).</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/e1cac76a34b22ea5c921c06fbe684229062ecd64/src/layers/normalise.jl#L1-L7">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.BatchNorm" href="#Flux.BatchNorm"><code>Flux.BatchNorm</code></a><span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-none">BatchNorm(channels::Integer, σ = identity;
initβ = zeros, initγ = ones,
ϵ = 1e-8, momentum = .1)</code></pre><p>Batch Normalization layer. The <code>channels</code> input should be the size of the channel dimension in your data (see below).</p><p>Given an array with <code>N</code> dimensions, call the <code>N-1</code>th the channel dimension. (For a batch of feature vectors this is just the data dimension, for <code>WHCN</code> images it&#39;s the usual channel dimension.)</p><p><code>BatchNorm</code> computes the mean and variance for each <code>W×H×1×N</code> slice and shifts them to have a new mean and variance (corresponding to the learnable, per-channel <code>bias</code> and <code>scale</code> parameters).</p><p>See <a href="https://arxiv.org/pdf/1502.03167.pdf">Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift</a>.</p><p>Example:</p><pre><code class="language-julia">m = Chain(
Dense(28^2, 64),
BatchNorm(64, relu),
Dense(64, 10),
BatchNorm(10),
softmax)</code></pre></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/bf0b5c5ceff59ab4a8f9a877b4cf30384b780edd/src/layers/normalise.jl#L68-L96">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Dropout" href="#Flux.Dropout"><code>Flux.Dropout</code></a><span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-none">Dropout(p)</code></pre><p>A Dropout layer. For each input, either sets that input to <code>0</code> (with probability <code>p</code>) or scales it by <code>1/(1-p)</code>. This is used as a regularisation, i.e. it reduces overfitting during training.</p><p>Does nothing to the input once in <a href="#Flux.testmode!"><code>testmode!</code></a>.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/bf0b5c5ceff59ab4a8f9a877b4cf30384b780edd/src/layers/normalise.jl#L15-L23">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.LayerNorm" href="#Flux.LayerNorm"><code>Flux.LayerNorm</code></a><span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-none">LayerNorm(h::Integer)</code></pre><p>A <a href="https://arxiv.org/pdf/1607.06450.pdf">normalisation layer</a> designed to be used with recurrent hidden states of size <code>h</code>. 
Normalises the mean/stddev of each input before applying a per-neuron gain/bias.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/bf0b5c5ceff59ab4a8f9a877b4cf30384b780edd/src/layers/normalise.jl#L46-L52">source</a></section><footer><hr/><a class="previous" href="../regularisation/"><span class="direction">Previous</span><span class="title">Regularisation</span></a><a class="next" href="../../training/optimisers/"><span class="direction">Next</span><span class="title">Optimisers</span></a></footer></article></body></html>
softmax)</code></pre></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/e1cac76a34b22ea5c921c06fbe684229062ecd64/src/layers/normalise.jl#L68-L96">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Dropout" href="#Flux.Dropout"><code>Flux.Dropout</code></a><span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-none">Dropout(p)</code></pre><p>A Dropout layer. For each input, either sets that input to <code>0</code> (with probability <code>p</code>) or scales it by <code>1/(1-p)</code>. This is used as a regularisation, i.e. it reduces overfitting during training.</p><p>Does nothing to the input once in <a href="#Flux.testmode!"><code>testmode!</code></a>.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/e1cac76a34b22ea5c921c06fbe684229062ecd64/src/layers/normalise.jl#L15-L23">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.LayerNorm" href="#Flux.LayerNorm"><code>Flux.LayerNorm</code></a><span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-none">LayerNorm(h::Integer)</code></pre><p>A <a href="https://arxiv.org/pdf/1607.06450.pdf">normalisation layer</a> designed to be used with recurrent hidden states of size <code>h</code>. 
Normalises the mean/stddev of each input before applying a per-neuron gain/bias.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/e1cac76a34b22ea5c921c06fbe684229062ecd64/src/layers/normalise.jl#L46-L52">source</a></section><footer><hr/><a class="previous" href="../regularisation/"><span class="direction">Previous</span><span class="title">Regularisation</span></a><a class="next" href="../../training/optimisers/"><span class="direction">Next</span><span class="title">Optimisers</span></a></footer></article></body></html>
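
The activation-function docstrings in this file give each function as a one-line formula and note that they operate on scalars. As a rough illustration, the formulas can be written out locally in plain Julia and broadcast over an array with the dot syntax. These are local re-definitions copied from the docstring formulas, not NNlib's implementations, which may differ in numerical details. The default coefficient argument `a` on `leakyrelu` and the sample vector `xs` are assumptions made for the sketch.

```julia
# Local re-definitions taken from the docstring formulas (not NNlib's own code).
σ(x) = 1 / (1 + exp(-x))
relu(x) = max(0, x)
leakyrelu(x, a = 0.01) = max(a * x, x)   # `a` is an assumed name for the coefficient
elu(x, α = 1) = x > 0 ? x : α * (exp(x) - 1)
swish(x) = x * σ(x)

xs = [-2.0, 0.0, 3.0]                    # illustrative inputs

# Each function is scalar; the dot applies it elementwise.
println(relu.(xs))        # [0.0, 0.0, 3.0]
println(leakyrelu.(xs))   # [-0.02, 0.0, 3.0]
println(σ.(xs))
println(elu.(xs))
println(swish.(xs))
```

Writing the functions on scalars and broadcasting at the call site matches the convention stated in the docs, where `σ.(xs)` and `relu.(xs)` are the suggested way to apply them to arrays.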

View File

@ -53,7 +53,7 @@ var documenterSearchIndex = {"docs": [
"page": "Basics",
"title": "Taking Gradients",
"category": "section",
"text": "Flux\'s core feature is taking gradients of Julia code. The gradient function takes another Julia function f and a set of arguments, and returns the gradient with respect to each argument. (It\'s a good idea to try pasting these examples in the Julia terminal.)using Flux.Tracker\n\nf(x) = 3x^2 + 2x + 1\n\n# df/dx = 6x + 2\ndf(x) = Tracker.gradient(f, x; nest = true)[1]\n\ndf(2) # 14.0 (tracked)\n\n# d²f/dx² = 6\nd2f(x) = Tracker.gradient(df, x; nest = true)[1]\n\nd2f(2) # 6.0 (tracked)(We\'ll learn more about why these numbers show up as (tracked) below.)When a function has many parameters, we can pass them all in explicitly:f(W, b, x) = W * x + b\n\nTracker.gradient(f, 2, 3, 4)\n(4.0 (tracked), 1.0 (tracked), 2.0 (tracked))But machine learning models can have hundreds of parameters! Flux offers a nice way to handle this. We can tell Flux to treat something as a parameter via param. Then we can collect these together and tell gradient to collect the gradients of all of them at once.W = param(2) # 2.0 (tracked)\nb = param(3) # 3.0 (tracked)\n\nf(x) = W * x + b\n\nparams = Params([W, b])\ngrads = Tracker.gradient(() -> f(4), params)\n\ngrads[W] # 4.0\ngrads[b] # 1.0There are a few things to notice here. Firstly, W and b now show up as tracked. Tracked things behave like normal numbers or arrays, but keep records of everything you do with them, allowing Flux to calculate their gradients. gradient takes a zero-argument function; no arguments are necessary because the Params tell it what to differentiate.This will come in really handy when dealing with big, complicated models. For now, though, let\'s start with something simple."
"text": "Flux\'s core feature is taking gradients of Julia code. The gradient function takes another Julia function f and a set of arguments, and returns the gradient with respect to each argument. (It\'s a good idea to try pasting these examples in the Julia terminal.)using Flux.Tracker\n\nf(x) = 3x^2 + 2x + 1\n\n# df/dx = 6x + 2\ndf(x) = Tracker.gradient(f, x; nest = true)[1]\n\ndf(2) # 14.0 (tracked)\n\n# d²f/dx² = 6\nd2f(x) = Tracker.gradient(df, x; nest = true)[1]\n\nd2f(2) # 6.0 (tracked)(We\'ll learn more about why these numbers show up as (tracked) below.)When a function has many parameters, we can pass them all in explicitly:f(W, b, x) = W * x + b\n\nTracker.gradient(f, 2, 3, 4)\n# (4.0 (tracked), 1.0 (tracked), 2.0 (tracked))But machine learning models can have hundreds of parameters! Flux offers a nice way to handle this. We can tell Flux to treat something as a parameter via param. Then we can collect these together and tell gradient to collect the gradients of all params at once.W = param(2) # 2.0 (tracked)\nb = param(3) # 3.0 (tracked)\n\nf(x) = W * x + b\n\ngrads = Tracker.gradient(() -> f(4), params(W, b))\n\ngrads[W] # 4.0\ngrads[b] # 1.0There are a few things to notice here. Firstly, W and b now show up as tracked. Tracked things behave like normal numbers or arrays, but keep records of everything you do with them, allowing Flux to calculate their gradients. gradient takes a zero-argument function; no arguments are necessary because the params tell it what to differentiate.This will come in really handy when dealing with big, complicated models. For now, though, let\'s start with something simple."
},
{
@ -61,7 +61,7 @@ var documenterSearchIndex = {"docs": [
"page": "Basics",
"title": "Simple Models",
"category": "section",
"text": "Consider a simple linear regression, which tries to predict an output array y from an input x.W = rand(2, 5)\nb = rand(2)\n\npredict(x) = W*x .+ b\n\nfunction loss(x, y)\n ŷ = predict(x)\n sum((y .- ŷ).^2)\nend\n\nx, y = rand(5), rand(2) # Dummy data\nloss(x, y) # ~ 3To improve the prediction we can take the gradients of the loss with respect to W and b and perform gradient descent. Let\'s tell Flux that W and b are parameters, just like we did above.using Flux.Tracker\n\nW = param(W)\nb = param(b)\n\ngs = Tracker.gradient(() -> loss(x, y), Params([W, b]))Now that we have gradients, we can pull them out and update W to train the model. The update!(W, Δ) function applies W = W + Δ, which we can use for gradient descent.using Flux.Tracker: update!\n\nΔ = gs[W]\n\n# Update the parameter and reset the gradient\nupdate!(W, -0.1Δ)\n\nloss(x, y) # ~ 2.5The loss has decreased a little, meaning that our prediction ŷ is closer to the target y. If we have some data we can already try training the model.All deep learning in Flux, however complex, is a simple generalisation of this example. Of course, models can look very different; they might have millions of parameters or complex control flow. Let\'s see how Flux handles more complex models."
"text": "Consider a simple linear regression, which tries to predict an output array y from an input x.W = rand(2, 5)\nb = rand(2)\n\npredict(x) = W*x .+ b\n\nfunction loss(x, y)\n ŷ = predict(x)\n sum((y .- ŷ).^2)\nend\n\nx, y = rand(5), rand(2) # Dummy data\nloss(x, y) # ~ 3To improve the prediction we can take the gradients of the loss with respect to W and b and perform gradient descent. Let\'s tell Flux that W and b are parameters, just like we did above.using Flux.Tracker\n\nW = param(W)\nb = param(b)\n\ngs = Tracker.gradient(() -> loss(x, y), params(W, b))Now that we have gradients, we can pull them out and update W to train the model. The update!(W, Δ) function applies W = W + Δ, which we can use for gradient descent.using Flux.Tracker: update!\n\nΔ = gs[W]\n\n# Update the parameter and reset the gradient\nupdate!(W, -0.1Δ)\n\nloss(x, y) # ~ 2.5The loss has decreased a little, meaning that our prediction ŷ is closer to the target y. If we have some data we can already try training the model.All deep learning in Flux, however complex, is a simple generalisation of this example. Of course, models can look very different; they might have millions of parameters or complex control flow. Let\'s see how Flux handles more complex models."
},
{

View File

@ -27,4 +27,4 @@ end</code></pre><p>Running this will alter the parameters <code>W</code> and <co
for p in (W, b)
update!(opt, p, -η * grads[p])
end</code></pre><p>An optimiser <code>update!</code> accepts a parameter and a gradient, and updates the parameter according to the chosen rule. We can also pass <code>opt</code> to our <a href="../training/">training loop</a>, which will update all parameters of the model in a loop. However, we can now easily replace <code>Descent</code> with a more advanced optimiser such as <code>ADAM</code>.</p><h2><a class="nav-anchor" id="Optimiser-Reference-1" href="#Optimiser-Reference-1">Optimiser Reference</a></h2><p>All optimisers return an object that, when passed to <code>train!</code>, will update the parameters passed to it.</p><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.Descent" href="#Flux.Optimise.Descent"><code>Flux.Optimise.Descent</code></a><span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-none">Descent(η)</code></pre><p>Classic gradient descent optimiser with learning rate <code>η</code>. 
For each parameter <code>p</code> and its gradient <code>δp</code>, this runs <code>p -= η*δp</code>.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/bf0b5c5ceff59ab4a8f9a877b4cf30384b780edd/src/optimise/optimisers.jl#L9-L14">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.Momentum" href="#Flux.Optimise.Momentum"><code>Flux.Optimise.Momentum</code></a><span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-none">Momentum(params, η = 0.01; ρ = 0.9)</code></pre><p>Gradient descent with learning rate <code>η</code> and momentum <code>ρ</code>.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/bf0b5c5ceff59ab4a8f9a877b4cf30384b780edd/src/optimise/optimisers.jl#L25-L29">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.Nesterov" href="#Flux.Optimise.Nesterov"><code>Flux.Optimise.Nesterov</code></a><span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-none">Nesterov(η, ρ = 0.9)</code></pre><p>Gradient descent with learning rate <code>η</code> and Nesterov momentum <code>ρ</code>.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/bf0b5c5ceff59ab4a8f9a877b4cf30384b780edd/src/optimise/optimisers.jl#L45-L49">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.ADAM" href="#Flux.Optimise.ADAM"><code>Flux.Optimise.ADAM</code></a><span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-none">ADAM(η = 0.001, β = (0.9, 0.999))</code></pre><p><a href="https://arxiv.org/abs/1412.6980v8">ADAM</a> optimiser.</p></div></div><a class="source-link" target="_blank" 
href="https://github.com/FluxML/Flux.jl/blob/bf0b5c5ceff59ab4a8f9a877b4cf30384b780edd/src/optimise/optimisers.jl#L88-L92">source</a></section><footer><hr/><a class="previous" href="../../models/layers/"><span class="direction">Previous</span><span class="title">Model Reference</span></a><a class="next" href="../training/"><span class="direction">Next</span><span class="title">Training</span></a></footer></article></body></html>
end</code></pre><p>An optimiser <code>update!</code> accepts a parameter and a gradient, and updates the parameter according to the chosen rule. We can also pass <code>opt</code> to our <a href="../training/">training loop</a>, which will update all parameters of the model in a loop. However, we can now easily replace <code>Descent</code> with a more advanced optimiser such as <code>ADAM</code>.</p><h2><a class="nav-anchor" id="Optimiser-Reference-1" href="#Optimiser-Reference-1">Optimiser Reference</a></h2><p>All optimisers return an object that, when passed to <code>train!</code>, will update the parameters passed to it.</p><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.Descent" href="#Flux.Optimise.Descent"><code>Flux.Optimise.Descent</code></a><span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-none">Descent(η)</code></pre><p>Classic gradient descent optimiser with learning rate <code>η</code>. 
For each parameter <code>p</code> and its gradient <code>δp</code>, this runs <code>p -= η*δp</code>.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/e1cac76a34b22ea5c921c06fbe684229062ecd64/src/optimise/optimisers.jl#L9-L14">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.Momentum" href="#Flux.Optimise.Momentum"><code>Flux.Optimise.Momentum</code></a><span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-none">Momentum(params, η = 0.01; ρ = 0.9)</code></pre><p>Gradient descent with learning rate <code>η</code> and momentum <code>ρ</code>.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/e1cac76a34b22ea5c921c06fbe684229062ecd64/src/optimise/optimisers.jl#L25-L29">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.Nesterov" href="#Flux.Optimise.Nesterov"><code>Flux.Optimise.Nesterov</code></a><span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-none">Nesterov(η, ρ = 0.9)</code></pre><p>Gradient descent with learning rate <code>η</code> and Nesterov momentum <code>ρ</code>.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/e1cac76a34b22ea5c921c06fbe684229062ecd64/src/optimise/optimisers.jl#L45-L49">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.ADAM" href="#Flux.Optimise.ADAM"><code>Flux.Optimise.ADAM</code></a><span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-none">ADAM(η = 0.001, β = (0.9, 0.999))</code></pre><p><a href="https://arxiv.org/abs/1412.6980v8">ADAM</a> optimiser.</p></div></div><a class="source-link" target="_blank" 
href="https://github.com/FluxML/Flux.jl/blob/e1cac76a34b22ea5c921c06fbe684229062ecd64/src/optimise/optimisers.jl#L88-L92">source</a></section><footer><hr/><a class="previous" href="../../models/layers/"><span class="direction">Previous</span><span class="title">Model Reference</span></a><a class="next" href="../training/"><span class="direction">Next</span><span class="title">Training</span></a></footer></article></body></html>