build based on 193c4de

This commit is contained in:
autodocs 2018-09-05 16:02:11 +00:00
parent f6d9776609
commit c07144fe91
10 changed files with 79 additions and 58 deletions

View File

@@ -39,7 +39,7 @@ parseUri.options = {
requirejs.config({
paths: {
'jquery': 'https://cdnjs.cloudflare.com/ajax/libs/jquery/3.1.1/jquery.min',
'lunr': 'https://cdnjs.cloudflare.com/ajax/libs/lunr.js/2.1.3/lunr.min',
'lunr': 'https://cdnjs.cloudflare.com/ajax/libs/lunr.js/2.3.1/lunr.min',
'lodash': 'https://cdnjs.cloudflare.com/ajax/libs/lodash.js/4.17.4/lodash.min',
}
});

View File

@@ -6,7 +6,7 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
ga('create', 'UA-36890222-9', 'auto');
ga('send', 'pageview');
</script><link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/><link href="https://fonts.googleapis.com/css?family=Lato|Roboto+Mono" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/default.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL=".."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../assets/documenter.js"></script><script src="../siteinfo.js"></script><script src="../../versions.js"></script><link href="../assets/documenter.css" rel="stylesheet" type="text/css"/><link href="../../flux.css" rel="stylesheet" type="text/css"/></head><body><nav class="toc"><h1>Flux</h1><select id="version-selector" onChange="window.location.href=this.value" style="visibility: hidden"></select><form class="search" id="search-form" action="../search.html"><input id="search-query" name="q" type="text" placeholder="Search docs"/></form><ul><li><a class="toctext" href="../index.html">Home</a></li><li><span class="toctext">Building Models</span><ul><li><a class="toctext" href="../models/basics.html">Basics</a></li><li><a class="toctext" href="../models/recurrence.html">Recurrence</a></li><li><a class="toctext" href="../models/regularisation.html">Regularisation</a></li><li><a class="toctext" href="../models/layers.html">Model Reference</a></li></ul></li><li><span class="toctext">Training Models</span><ul><li><a class="toctext" href="../training/optimisers.html">Optimisers</a></li><li><a class="toctext" href="../training/training.html">Training</a></li></ul></li><li class="current"><a class="toctext" href="onehot.html">One-Hot Encoding</a><ul class="internal"><li><a class="toctext" href="#Batches-1">Batches</a></li></ul></li><li><a class="toctext" href="../gpu.html">GPU Support</a></li><li><a class="toctext" href="../saving.html">Saving &amp; Loading</a></li><li><span class="toctext">Internals</span><ul><li><a class="toctext" href="../internals/tracker.html">Backpropagation</a></li></ul></li><li><a class="toctext" href="../community.html">Community</a></li></ul></nav><article id="docs"><header><nav><ul><li><a href="onehot.html">One-Hot Encoding</a></li></ul><a class="edit-page" href="https://github.com/FluxML/Flux.jl/blob/master/docs/src/data/onehot.md"><span class="fa"></span> Edit on GitHub</a></nav><hr/><div id="topbar"><span>One-Hot Encoding</span><a class="fa fa-bars" href="#"></a></div></header><h1><a class="nav-anchor" id="One-Hot-Encoding-1" href="#One-Hot-Encoding-1">One-Hot Encoding</a></h1><p>It&#39;s common to encode categorical variables (like <code>true</code>, <code>false</code> or <code>cat</code>, <code>dog</code>) in &quot;one-of-k&quot; or <a href="https://en.wikipedia.org/wiki/One-hot">&quot;one-hot&quot;</a> form. Flux provides the <code>onehot</code> function to make this easy.</p><pre><code class="language-none">julia&gt; using Flux: onehot
</script><link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/><link href="https://fonts.googleapis.com/css?family=Lato|Roboto+Mono" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/default.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL=".."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../assets/documenter.js"></script><script src="../siteinfo.js"></script><script src="../../versions.js"></script><link href="../assets/documenter.css" rel="stylesheet" type="text/css"/><link href="../../flux.css" rel="stylesheet" type="text/css"/></head><body><nav class="toc"><h1>Flux</h1><select id="version-selector" onChange="window.location.href=this.value" style="visibility: hidden"></select><form class="search" id="search-form" action="../search.html"><input id="search-query" name="q" type="text" placeholder="Search docs"/></form><ul><li><a class="toctext" href="../index.html">Home</a></li><li><span class="toctext">Building Models</span><ul><li><a class="toctext" href="../models/basics.html">Basics</a></li><li><a class="toctext" href="../models/recurrence.html">Recurrence</a></li><li><a class="toctext" href="../models/regularisation.html">Regularisation</a></li><li><a class="toctext" href="../models/layers.html">Model Reference</a></li></ul></li><li><span class="toctext">Training Models</span><ul><li><a class="toctext" href="../training/optimisers.html">Optimisers</a></li><li><a class="toctext" href="../training/training.html">Training</a></li></ul></li><li class="current"><a class="toctext" href="onehot.html">One-Hot Encoding</a><ul class="internal"><li><a class="toctext" href="#Batches-1">Batches</a></li></ul></li><li><a class="toctext" href="../gpu.html">GPU Support</a></li><li><a class="toctext" href="../saving.html">Saving &amp; Loading</a></li><li><span class="toctext">Internals</span><ul><li><a class="toctext" href="../internals/tracker.html">Backpropagation</a></li></ul></li><li><a class="toctext" href="../community.html">Community</a></li></ul></nav><article id="docs"><header><nav><ul><li><a href="onehot.html">One-Hot Encoding</a></li></ul><a class="edit-page" href="https://github.com/FluxML/Flux.jl/blob/master/docs/src/data/onehot.md"><span class="fa"></span> Edit on GitHub</a></nav><hr/><div id="topbar"><span>One-Hot Encoding</span><a class="fa fa-bars" href="#"></a></div></header><h1><a class="nav-anchor" id="One-Hot-Encoding-1" href="#One-Hot-Encoding-1">One-Hot Encoding</a></h1><p>It&#39;s common to encode categorical variables (like <code>true</code>, <code>false</code> or <code>cat</code>, <code>dog</code>) in &quot;one-of-k&quot; or <a href="https://en.wikipedia.org/wiki/One-hot">&quot;one-hot&quot;</a> form. Flux provides the <code>onehot</code> function to make this easy.</p><pre><code class="language-none">julia&gt; using Flux: onehot, onecold
julia&gt; onehot(:b, [:a, :b, :c])
3-element Flux.OneHotVector:
@@ -18,14 +18,14 @@ julia&gt; onehot(:c, [:a, :b, :c])
3-element Flux.OneHotVector:
false
false
true</code></pre><p>The inverse is <code>argmax</code> (which can take a general probability distribution, as well as just booleans).</p><pre><code class="language-julia">julia&gt; argmax(ans, [:a, :b, :c])
true</code></pre><p>The inverse is <code>onecold</code> (which can take a general probability distribution, as well as just booleans).</p><pre><code class="language-julia">julia&gt; onecold(ans, [:a, :b, :c])
:c
julia&gt; argmax([true, false, false], [:a, :b, :c])
julia&gt; onecold([true, false, false], [:a, :b, :c])
:a
julia&gt; argmax([0.3, 0.2, 0.5], [:a, :b, :c])
:c</code></pre><h2><a class="nav-anchor" id="Batches-1" href="#Batches-1">Batches</a></h2><p><code>onehotbatch</code> creates a batch (matrix) of one-hot vectors, and <code>argmax</code> treats matrices as batches.</p><pre><code class="language-julia">julia&gt; using Flux: onehotbatch
julia&gt; onecold([0.3, 0.2, 0.5], [:a, :b, :c])
:c</code></pre><h2><a class="nav-anchor" id="Batches-1" href="#Batches-1">Batches</a></h2><p><code>onehotbatch</code> creates a batch (matrix) of one-hot vectors, and <code>onecold</code> treats matrices as batches.</p><pre><code class="language-julia">julia&gt; using Flux: onehotbatch
julia&gt; onehotbatch([:b, :a, :b], [:a, :b, :c])
3×3 Flux.OneHotMatrix:
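The only change to these snippets is the rename of the two-argument argmax to onecold. A minimal round-trip sketch, assuming a Flux version (0.6-era) where onecold has replaced the two-argument argmax:

using Flux: onehot, onecold

v = onehot(:b, [:a, :b, :c])             # one-hot vector selecting :b
onecold(v, [:a, :b, :c])                 # => :b, the inverse of onehot
onecold([0.1, 0.7, 0.2], [:a, :b, :c])   # => :b, also accepts probabilities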

File diff suppressed because one or more lines are too long

View File

@@ -51,7 +51,7 @@ minus(a::TrackedArray, b::TrackedArray) = Tracker.track(minus, a, b)</code></pre
return minus(data(a),data(b)), Δ -&gt; (Δ, -Δ)
end</code></pre><p>This is essentially just a way of overloading the <code>forward</code> function we saw above. We strip tracking from <code>a</code> and <code>b</code> so that we are calling the original definition of <code>minus</code> (otherwise, we&#39;d just try to track the call again and hit an infinite regress).</p><p>Note that in the backpropagator we don&#39;t call <code>data(a)</code>; we <em>do</em> in fact want to track this, since nested AD will take a derivative through the backpropagator itself. For example, the gradient of <code>*</code> might look like this.</p><pre><code class="language-julia">@grad a * b = data(a)*data(b), Δ -&gt; (Δ*b, a*Δ)</code></pre><p>For multi-argument functions with custom gradients, you likely want to catch not just <code>minus(::TrackedArray, ::TrackedArray)</code> but also <code>minus(::Array, ::TrackedArray)</code> and so on. To do so, just define those extra signatures as needed:</p><pre><code class="language-julia">minus(a::AbstractArray, b::TrackedArray) = Tracker.track(minus, a, b)
minus(a::TrackedArray, b::AbstractArray) = Tracker.track(minus, a, b)</code></pre><h2><a class="nav-anchor" id="Tracked-Internals-1" href="#Tracked-Internals-1">Tracked Internals</a></h2><p>All <code>Tracked*</code> objects (<code>TrackedArray</code>, <code>TrackedReal</code>) are light wrappers around the <code>Tracked</code> type, which you can access via the <code>.tracker</code> field.</p><pre><code class="language-julia">julia&gt; x.tracker
Flux.Tracker.Tracked{Array{Float64,1}}(0x00000000, Flux.Tracker.Call{Void,Tuple{}}(nothing, ()), true, [5.0, 6.0], [-2.0, -2.0])</code></pre><p>The <code>Tracker</code> stores the gradient of a given object, which we&#39;ve seen before.</p><pre><code class="language-julia">julia&gt; x.tracker.grad
Flux.Tracker.Tracked{Array{Float64,1}}(0x00000000, Flux.Tracker.Call{Nothing,Tuple{}}(nothing, ()), true, [5.0, 6.0], [-2.0, -2.0])</code></pre><p>The <code>Tracker</code> stores the gradient of a given object, which we&#39;ve seen before.</p><pre><code class="language-julia">julia&gt; x.tracker.grad
2-element Array{Float64,1}:
-2.0
-2.0</code></pre><p>The tracker also contains a <code>Call</code> object, which simply represents a function call that was made at some point during the forward pass. For example, the <code>+</code> call would look like this:</p><pre><code class="language-julia">julia&gt; Tracker.Call(+, 1, 2)
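Pieced together, the custom-gradient pattern this page documents looks as follows; a minimal sketch for the Flux 0.6-era Tracker, where minus is the docs' running example rather than a Flux API:

using Flux.Tracker
using Flux.Tracker: TrackedArray, track, data, @grad

minus(a, b) = a .- b                                           # plain forward definition
minus(a::TrackedArray, b::TrackedArray) = track(minus, a, b)   # intercept tracked calls
minus(a::AbstractArray, b::TrackedArray) = track(minus, a, b)  # mixed signatures, as above
minus(a::TrackedArray, b::AbstractArray) = track(minus, a, b)

@grad function minus(a, b)
  # strip tracking for the forward value; the backpropagator returns (da, db)
  return minus(data(a), data(b)), Δ -> (Δ, -Δ)
end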

View File

@@ -93,7 +93,7 @@ model(x) = layer3(layer2(layer1(x)))</code></pre><p>For long chains, it might be
layers = [Dense(10, 5, σ), Dense(5, 2), softmax]
model(x) = foldl((x, m) -&gt; m(x), x, layers)
model(x) = foldl((x, m) -&gt; m(x), layers, init = x)
model(rand(10)) # =&gt; 2-element vector</code></pre><p>Handily, this is also provided for in Flux:</p><pre><code class="language-julia">model2 = Chain(
Dense(10, 5, σ),
@@ -104,4 +104,4 @@ model2(rand(10)) # =&gt; 2-element vector</code></pre><p>This quickly starts to
m(rand(10))</code></pre><p>Likewise, <code>Chain</code> will happily work with any Julia function.</p><pre><code class="language-julia">m = Chain(x -&gt; x^2, x -&gt; x+1)
m(5) # =&gt; 26</code></pre><h2><a class="nav-anchor" id="Layer-helpers-1" href="#Layer-helpers-1">Layer helpers</a></h2><p>Flux provides a set of helpers for custom layers, which you can enable by calling</p><pre><code class="language-julia">Flux.treelike(Affine)</code></pre><p>This enables a useful extra set of functionality for our <code>Affine</code> layer, such as <a href="../training/optimisers.html">collecting its parameters</a> or <a href="../gpu.html">moving it to the GPU</a>.</p><footer><hr/><a class="previous" href="../index.html"><span class="direction">Previous</span><span class="title">Home</span></a><a class="next" href="recurrence.html"><span class="direction">Next</span><span class="title">Recurrence</span></a></footer></article></body></html>
m(5) # =&gt; 26</code></pre><h2><a class="nav-anchor" id="Layer-helpers-1" href="#Layer-helpers-1">Layer helpers</a></h2><p>Flux provides a set of helpers for custom layers, which you can enable by calling</p><pre><code class="language-julia">Flux.@treelike Affine</code></pre><p>This enables a useful extra set of functionality for our <code>Affine</code> layer, such as <a href="../training/optimisers.html">collecting its parameters</a> or <a href="../gpu.html">moving it to the GPU</a>.</p><footer><hr/><a class="previous" href="../index.html"><span class="direction">Previous</span><span class="title">Home</span></a><a class="next" href="recurrence.html"><span class="direction">Next</span><span class="title">Recurrence</span></a></footer></article></body></html>
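The Affine layer that Flux.@treelike is applied to is the custom layer built earlier on the basics page; a condensed sketch of that definition, assuming Flux 0.6-era param:

using Flux

struct Affine
  W
  b
end

Affine(in::Integer, out::Integer) =
  Affine(param(randn(out, in)), param(randn(out)))

(m::Affine)(x) = m.W * x .+ m.b   # forward pass

Flux.@treelike Affine             # exposes W and b for params, gpu, etc.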

File diff suppressed because one or more lines are too long

View File

@@ -6,21 +6,21 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
ga('create', 'UA-36890222-9', 'auto');
ga('send', 'pageview');
</script><link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/><link href="https://fonts.googleapis.com/css?family=Lato|Roboto+Mono" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/default.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL=".."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../assets/documenter.js"></script><script src="../siteinfo.js"></script><script src="../../versions.js"></script><link href="../assets/documenter.css" rel="stylesheet" type="text/css"/><link href="../../flux.css" rel="stylesheet" type="text/css"/></head><body><nav class="toc"><h1>Flux</h1><select id="version-selector" onChange="window.location.href=this.value" style="visibility: hidden"></select><form class="search" id="search-form" action="../search.html"><input id="search-query" name="q" type="text" placeholder="Search docs"/></form><ul><li><a class="toctext" href="../index.html">Home</a></li><li><span class="toctext">Building Models</span><ul><li><a class="toctext" href="basics.html">Basics</a></li><li><a class="toctext" href="recurrence.html">Recurrence</a></li><li class="current"><a class="toctext" href="regularisation.html">Regularisation</a><ul class="internal"></ul></li><li><a class="toctext" href="layers.html">Model Reference</a></li></ul></li><li><span class="toctext">Training Models</span><ul><li><a class="toctext" href="../training/optimisers.html">Optimisers</a></li><li><a class="toctext" href="../training/training.html">Training</a></li></ul></li><li><a class="toctext" href="../data/onehot.html">One-Hot Encoding</a></li><li><a class="toctext" href="../gpu.html">GPU Support</a></li><li><a class="toctext" href="../saving.html">Saving &amp; Loading</a></li><li><span class="toctext">Internals</span><ul><li><a class="toctext" href="../internals/tracker.html">Backpropagation</a></li></ul></li><li><a class="toctext" href="../community.html">Community</a></li></ul></nav><article id="docs"><header><nav><ul><li>Building Models</li><li><a href="regularisation.html">Regularisation</a></li></ul><a class="edit-page" href="https://github.com/FluxML/Flux.jl/blob/master/docs/src/models/regularisation.md"><span class="fa"></span> Edit on GitHub</a></nav><hr/><div id="topbar"><span>Regularisation</span><a class="fa fa-bars" href="#"></a></div></header><h1><a class="nav-anchor" id="Regularisation-1" href="#Regularisation-1">Regularisation</a></h1><p>Applying regularisation to model parameters is straightforward. We just need to apply an appropriate regulariser, such as <code>vecnorm</code>, to each model parameter and add the result to the overall loss.</p><p>For example, say we have a simple regression.</p><pre><code class="language-julia">using Flux: crossentropy
</script><link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/><link href="https://fonts.googleapis.com/css?family=Lato|Roboto+Mono" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/default.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL=".."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../assets/documenter.js"></script><script src="../siteinfo.js"></script><script src="../../versions.js"></script><link href="../assets/documenter.css" rel="stylesheet" type="text/css"/><link href="../../flux.css" rel="stylesheet" type="text/css"/></head><body><nav class="toc"><h1>Flux</h1><select id="version-selector" onChange="window.location.href=this.value" style="visibility: hidden"></select><form class="search" id="search-form" action="../search.html"><input id="search-query" name="q" type="text" placeholder="Search docs"/></form><ul><li><a class="toctext" href="../index.html">Home</a></li><li><span class="toctext">Building Models</span><ul><li><a class="toctext" href="basics.html">Basics</a></li><li><a class="toctext" href="recurrence.html">Recurrence</a></li><li class="current"><a class="toctext" href="regularisation.html">Regularisation</a><ul class="internal"></ul></li><li><a class="toctext" href="layers.html">Model Reference</a></li></ul></li><li><span class="toctext">Training Models</span><ul><li><a class="toctext" href="../training/optimisers.html">Optimisers</a></li><li><a class="toctext" href="../training/training.html">Training</a></li></ul></li><li><a class="toctext" href="../data/onehot.html">One-Hot Encoding</a></li><li><a class="toctext" href="../gpu.html">GPU Support</a></li><li><a class="toctext" href="../saving.html">Saving &amp; Loading</a></li><li><span class="toctext">Internals</span><ul><li><a class="toctext" href="../internals/tracker.html">Backpropagation</a></li></ul></li><li><a class="toctext" href="../community.html">Community</a></li></ul></nav><article id="docs"><header><nav><ul><li>Building Models</li><li><a href="regularisation.html">Regularisation</a></li></ul><a class="edit-page" href="https://github.com/FluxML/Flux.jl/blob/master/docs/src/models/regularisation.md"><span class="fa"></span> Edit on GitHub</a></nav><hr/><div id="topbar"><span>Regularisation</span><a class="fa fa-bars" href="#"></a></div></header><h1><a class="nav-anchor" id="Regularisation-1" href="#Regularisation-1">Regularisation</a></h1><p>Applying regularisation to model parameters is straightforward. We just need to apply an appropriate regulariser, such as <code>norm</code>, to each model parameter and add the result to the overall loss.</p><p>For example, say we have a simple regression.</p><pre><code class="language-julia">using Flux: crossentropy
m = Dense(10, 5)
loss(x, y) = crossentropy(softmax(m(x)), y)</code></pre><p>We can regularise this by taking the (L2) norm of the parameters, <code>m.W</code> and <code>m.b</code>.</p><pre><code class="language-julia">penalty() = vecnorm(m.W) + vecnorm(m.b)
loss(x, y) = crossentropy(softmax(m(x)), y) + penalty()</code></pre><p>When working with layers, Flux provides the <code>params</code> function to grab all parameters at once. We can easily penalise everything with <code>sum(vecnorm, params)</code>.</p><pre><code class="language-julia">julia&gt; params(m)
loss(x, y) = crossentropy(softmax(m(x)), y)</code></pre><p>We can regularise this by taking the (L2) norm of the parameters, <code>m.W</code> and <code>m.b</code>.</p><pre><code class="language-julia">penalty() = norm(m.W) + norm(m.b)
loss(x, y) = crossentropy(softmax(m(x)), y) + penalty()</code></pre><p>When working with layers, Flux provides the <code>params</code> function to grab all parameters at once. We can easily penalise everything with <code>sum(norm, params)</code>.</p><pre><code class="language-julia">julia&gt; params(m)
2-element Array{Any,1}:
param([0.355408 0.533092; … 0.430459 0.171498])
param([0.0, 0.0, 0.0, 0.0, 0.0])
julia&gt; sum(vecnorm, params(m))
julia&gt; sum(norm, params(m))
26.01749952921026 (tracked)</code></pre><p>Here&#39;s a larger example with a multi-layer perceptron.</p><pre><code class="language-julia">m = Chain(
Dense(28^2, 128, relu),
Dense(128, 32, relu),
Dense(32, 10), softmax)
loss(x, y) = crossentropy(m(x), y) + sum(vecnorm, params(m))
loss(x, y) = crossentropy(m(x), y) + sum(norm, params(m))
loss(rand(28^2), rand(10))</code></pre><p>One can also easily add per-layer regularisation via the <code>activations</code> function:</p><pre><code class="language-julia">julia&gt; c = Chain(Dense(10,5,σ),Dense(5,2),softmax)
Chain(Dense(10, 5, NNlib.σ), Dense(5, 2), NNlib.softmax)
@@ -31,5 +31,5 @@ julia&gt; activations(c, rand(10))
param([0.0330606, -0.456104])
param([0.61991, 0.38009])
julia&gt; sum(vecnorm, ans)
julia&gt; sum(norm, ans)
2.639678767773633 (tracked)</code></pre><footer><hr/><a class="previous" href="recurrence.html"><span class="direction">Previous</span><span class="title">Recurrence</span></a><a class="next" href="layers.html"><span class="direction">Next</span><span class="title">Model Reference</span></a></footer></article></body></html>
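The vecnorm-to-norm rename in this file tracks Julia 1.0, where Base.vecnorm became LinearAlgebra.norm (the old matrix norm became opnorm). A minimal sketch of the updated penalty, assuming Flux 0.6-era layers and norm brought in from LinearAlgebra:

using LinearAlgebra: norm
using Flux
using Flux: crossentropy

m = Dense(10, 5)
penalty() = norm(m.W) + norm(m.b)                        # L2 norm of each parameter
loss(x, y) = crossentropy(softmax(m(x)), y) + penalty()
loss(rand(10), rand(5))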

View File

@@ -13,7 +13,7 @@ var documenterSearchIndex = {"docs": [
"page": "Home",
"title": "Flux: The Julia Machine Learning Library",
"category": "section",
"text": "Flux is a library for machine learning. It comes \"batteries-included\" with many useful tools built in, but also lets you use the full power of the Julia language where you need it. The whole stack is implemented in clean Julia code (right down to the GPU kernels) and any part can be tweaked to your liking."
"text": "Flux is a library for machine learning. It comes \"batteries-included\" with many useful tools built in, but also lets you use the full power of the Julia language where you need it. We follow a few key principles:Doing the obvious thing. Flux has relatively few explicit APIs for features like regularisation or embeddings. Instead, writing down the mathematical form will work and be fast.\nYou could have written Flux. All of it, from LSTMs to GPU kernels, is straightforward Julia code. When it doubt, its well worth looking at the source. If you need something different, you can easily roll your own.\nPlay nicely with others. Flux works well with Julia libraries from data frames and images to differential equation solvers, so you can easily build complex data processing pipelines that integrate Flux models."
},
{
@@ -21,7 +21,15 @@ var documenterSearchIndex = {"docs": [
"page": "Home",
"title": "Installation",
"category": "section",
"text": "Install Julia 0.6.0 or later, if you haven\'t already.Pkg.add(\"Flux\")\n# Optional but recommended\nPkg.update() # Keep your packages up to date\nPkg.test(\"Flux\") # Check things installed correctlyStart with the basics. The model zoo is also a good starting point for many common kinds of models.See GPU support for more details on installing and using Flux with GPUs."
"text": "Download Julia 1.0 or later, if you haven\'t already. You can add Flux from using Julia\'s package manager, by typing ] add Flux in the Julia prompt.If you have CUDA you can also run ] add CuArrays to get GPU support; see here for more details."
},
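For reference, the Pkg-mode commands that entry describes are entered at the Julia 1.0 REPL after pressing ]:

(v1.0) pkg> add Flux
(v1.0) pkg> add CuArrays   # optional, for CUDA GPU support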
{
"location": "index.html#Learning-Flux-1",
"page": "Home",
"title": "Learning Flux",
"category": "section",
"text": "There are several different ways to learn Flux. If you just want to get started writing models, the model zoo gives good starting points for many common ones. This documentation provides a reference to all of Flux\'s APIs, as well as a from-scratch introduction to Flux\'s take on models and how they work. Once you understand these docs, congratulations, you also understand Flux\'s source code, which is intended to be concise, legible and a good reference for more advanced concepts."
},
{
@@ -69,7 +77,7 @@ var documenterSearchIndex = {"docs": [
"page": "Basics",
"title": "Stacking It Up",
"category": "section",
"text": "It\'s pretty common to write models that look something like:layer1 = Dense(10, 5, σ)\n# ...\nmodel(x) = layer3(layer2(layer1(x)))For long chains, it might be a bit more intuitive to have a list of layers, like this:using Flux\n\nlayers = [Dense(10, 5, σ), Dense(5, 2), softmax]\n\nmodel(x) = foldl((x, m) -> m(x), x, layers)\n\nmodel(rand(10)) # => 2-element vectorHandily, this is also provided for in Flux:model2 = Chain(\n Dense(10, 5, σ),\n Dense(5, 2),\n softmax)\n\nmodel2(rand(10)) # => 2-element vectorThis quickly starts to look like a high-level deep learning library; yet you can see how it falls out of simple abstractions, and we lose none of the power of Julia code.A nice property of this approach is that because \"models\" are just functions (possibly with trainable parameters), you can also see this as simple function composition.m = Dense(5, 2) ∘ Dense(10, 5, σ)\n\nm(rand(10))Likewise, Chain will happily work with any Julia function.m = Chain(x -> x^2, x -> x+1)\n\nm(5) # => 26"
"text": "It\'s pretty common to write models that look something like:layer1 = Dense(10, 5, σ)\n# ...\nmodel(x) = layer3(layer2(layer1(x)))For long chains, it might be a bit more intuitive to have a list of layers, like this:using Flux\n\nlayers = [Dense(10, 5, σ), Dense(5, 2), softmax]\n\nmodel(x) = foldl((x, m) -> m(x), layers, init = x)\n\nmodel(rand(10)) # => 2-element vectorHandily, this is also provided for in Flux:model2 = Chain(\n Dense(10, 5, σ),\n Dense(5, 2),\n softmax)\n\nmodel2(rand(10)) # => 2-element vectorThis quickly starts to look like a high-level deep learning library; yet you can see how it falls out of simple abstractions, and we lose none of the power of Julia code.A nice property of this approach is that because \"models\" are just functions (possibly with trainable parameters), you can also see this as simple function composition.m = Dense(5, 2) ∘ Dense(10, 5, σ)\n\nm(rand(10))Likewise, Chain will happily work with any Julia function.m = Chain(x -> x^2, x -> x+1)\n\nm(5) # => 26"
},
{
@@ -77,7 +85,7 @@ var documenterSearchIndex = {"docs": [
"page": "Basics",
"title": "Layer helpers",
"category": "section",
"text": "Flux provides a set of helpers for custom layers, which you can enable by callingFlux.treelike(Affine)This enables a useful extra set of functionality for our Affine layer, such as collecting its parameters or moving it to the GPU."
"text": "Flux provides a set of helpers for custom layers, which you can enable by callingFlux.@treelike AffineThis enables a useful extra set of functionality for our Affine layer, such as collecting its parameters or moving it to the GPU."
},
{
@@ -141,7 +149,7 @@ var documenterSearchIndex = {"docs": [
"page": "Regularisation",
"title": "Regularisation",
"category": "section",
"text": "Applying regularisation to model parameters is straightforward. We just need to apply an appropriate regulariser, such as vecnorm, to each model parameter and add the result to the overall loss.For example, say we have a simple regression.using Flux: crossentropy\nm = Dense(10, 5)\nloss(x, y) = crossentropy(softmax(m(x)), y)We can regularise this by taking the (L2) norm of the parameters, m.W and m.b.penalty() = vecnorm(m.W) + vecnorm(m.b)\nloss(x, y) = crossentropy(softmax(m(x)), y) + penalty()When working with layers, Flux provides the params function to grab all parameters at once. We can easily penalise everything with sum(vecnorm, params).julia> params(m)\n2-element Array{Any,1}:\n param([0.355408 0.533092; … 0.430459 0.171498])\n param([0.0, 0.0, 0.0, 0.0, 0.0])\n\njulia> sum(vecnorm, params(m))\n26.01749952921026 (tracked)Here\'s a larger example with a multi-layer perceptron.m = Chain(\n Dense(28^2, 128, relu),\n Dense(128, 32, relu),\n Dense(32, 10), softmax)\n\nloss(x, y) = crossentropy(m(x), y) + sum(vecnorm, params(m))\n\nloss(rand(28^2), rand(10))One can also easily add per-layer regularisation via the activations function:julia> c = Chain(Dense(10,5,σ),Dense(5,2),softmax)\nChain(Dense(10, 5, NNlib.σ), Dense(5, 2), NNlib.softmax)\n\njulia> activations(c, rand(10))\n3-element Array{Any,1}:\n param([0.71068, 0.831145, 0.751219, 0.227116, 0.553074])\n param([0.0330606, -0.456104])\n param([0.61991, 0.38009])\n\njulia> sum(vecnorm, ans)\n2.639678767773633 (tracked)"
"text": "Applying regularisation to model parameters is straightforward. We just need to apply an appropriate regulariser, such as norm, to each model parameter and add the result to the overall loss.For example, say we have a simple regression.using Flux: crossentropy\nm = Dense(10, 5)\nloss(x, y) = crossentropy(softmax(m(x)), y)We can regularise this by taking the (L2) norm of the parameters, m.W and m.b.penalty() = norm(m.W) + norm(m.b)\nloss(x, y) = crossentropy(softmax(m(x)), y) + penalty()When working with layers, Flux provides the params function to grab all parameters at once. We can easily penalise everything with sum(norm, params).julia> params(m)\n2-element Array{Any,1}:\n param([0.355408 0.533092; … 0.430459 0.171498])\n param([0.0, 0.0, 0.0, 0.0, 0.0])\n\njulia> sum(norm, params(m))\n26.01749952921026 (tracked)Here\'s a larger example with a multi-layer perceptron.m = Chain(\n Dense(28^2, 128, relu),\n Dense(128, 32, relu),\n Dense(32, 10), softmax)\n\nloss(x, y) = crossentropy(m(x), y) + sum(norm, params(m))\n\nloss(rand(28^2), rand(10))One can also easily add per-layer regularisation via the activations function:julia> c = Chain(Dense(10,5,σ),Dense(5,2),softmax)\nChain(Dense(10, 5, NNlib.σ), Dense(5, 2), NNlib.softmax)\n\njulia> activations(c, rand(10))\n3-element Array{Any,1}:\n param([0.71068, 0.831145, 0.751219, 0.227116, 0.553074])\n param([0.0330606, -0.456104])\n param([0.61991, 0.38009])\n\njulia> sum(norm, ans)\n2.639678767773633 (tracked)"
},
{
@@ -157,7 +165,7 @@ var documenterSearchIndex = {"docs": [
"page": "Model Reference",
"title": "Flux.Chain",
"category": "type",
"text": "Chain(layers...)\n\nChain multiple layers / functions together, so that they are called in sequence on a given input.\n\nm = Chain(x -> x^2, x -> x+1)\nm(5) == 26\n\nm = Chain(Dense(10, 5), Dense(5, 2))\nx = rand(10)\nm(x) == m[2](m[1](x))\n\nChain also supports indexing and slicing, e.g. m[2] or m[1:end-1]. m[1:3](x) will calculate the output of the first three layers.\n\n\n\n"
"text": "Chain(layers...)\n\nChain multiple layers / functions together, so that they are called in sequence on a given input.\n\nm = Chain(x -> x^2, x -> x+1)\nm(5) == 26\n\nm = Chain(Dense(10, 5), Dense(5, 2))\nx = rand(10)\nm(x) == m[2](m[1](x))\n\nChain also supports indexing and slicing, e.g. m[2] or m[1:end-1]. m[1:3](x) will calculate the output of the first three layers.\n\n\n\n\n\n"
},
{
@@ -165,7 +173,7 @@ var documenterSearchIndex = {"docs": [
"page": "Model Reference",
"title": "Flux.Dense",
"category": "type",
"text": "Dense(in::Integer, out::Integer, σ = identity)\n\nCreates a traditional Dense layer with parameters W and b.\n\ny = σ.(W * x .+ b)\n\nThe input x must be a vector of length in, or a batch of vectors represented as an in × N matrix. The out y will be a vector or batch of length out.\n\njulia> d = Dense(5, 2)\nDense(5, 2)\n\njulia> d(rand(5))\nTracked 2-element Array{Float64,1}:\n 0.00257447\n -0.00449443\n\n\n\n"
"text": "Dense(in::Integer, out::Integer, σ = identity)\n\nCreates a traditional Dense layer with parameters W and b.\n\ny = σ.(W * x .+ b)\n\nThe input x must be a vector of length in, or a batch of vectors represented as an in × N matrix. The out y will be a vector or batch of length out.\n\njulia> d = Dense(5, 2)\nDense(5, 2)\n\njulia> d(rand(5))\nTracked 2-element Array{Float64,1}:\n 0.00257447\n -0.00449443\n\n\n\n\n\n"
},
{
@@ -173,7 +181,23 @@ var documenterSearchIndex = {"docs": [
"page": "Model Reference",
"title": "Flux.Conv",
"category": "type",
"text": "Conv(size, in=>out)\nConv(size, in=>out, relu)\n\nStandard convolutional layer. size should be a tuple like (2, 2). in and out specify the number of input and output channels respectively.\n\nData should be stored in WHCN order. In other words, a 100×100 RGB image would be a 100×100×3 array, and a batch of 50 would be a 100×100×3×50 array.\n\nTakes the keyword arguments pad, stride and dilation.\n\n\n\n"
"text": "Conv(size, in=>out)\nConv(size, in=>out, relu)\n\nStandard convolutional layer. size should be a tuple like (2, 2). in and out specify the number of input and output channels respectively.\n\nData should be stored in WHCN order. In other words, a 100×100 RGB image would be a 100×100×3 array, and a batch of 50 would be a 100×100×3×50 array.\n\nTakes the keyword arguments pad, stride and dilation.\n\n\n\n\n\n"
},
{
"location": "models/layers.html#Flux.MaxPool",
"page": "Model Reference",
"title": "Flux.MaxPool",
"category": "type",
"text": "MaxPool(k)\n\nMax pooling layer. k stands for the size of the window for each dimension of the input.\n\nTakes the keyword arguments pad and stride.\n\n\n\n\n\n"
},
{
"location": "models/layers.html#Flux.MeanPool",
"page": "Model Reference",
"title": "Flux.MeanPool",
"category": "type",
"text": "MeanPool(k)\n\nMean pooling layer. k stands for the size of the window for each dimension of the input.\n\nTakes the keyword arguments pad and stride.\n\n\n\n\n\n"
},
{
@@ -181,7 +205,7 @@ var documenterSearchIndex = {"docs": [
"page": "Model Reference",
"title": "Basic Layers",
"category": "section",
"text": "These core layers form the foundation of almost all neural networks.Chain\nDense\nConv"
"text": "These core layers form the foundation of almost all neural networks.Chain\nDense\nConv\nMaxPool\nMeanPool"
},
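The pooling layers documented above slot into a Chain alongside Conv; an illustrative sketch for WHCN image batches, with arbitrary shapes and Flux 0.6-era constructors assumed:

using Flux

m = Chain(
  Conv((2, 2), 3 => 16, relu),   # 3 input channels, 16 output channels
  MaxPool((2, 2)),               # 2×2 max-pooling window
  Conv((2, 2), 16 => 8, relu),
  MeanPool((2, 2)))              # 2×2 mean-pooling window

m(rand(100, 100, 3, 50))         # a batch of 50 100×100 RGB images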
{
@@ -189,7 +213,7 @@ var documenterSearchIndex = {"docs": [
"page": "Model Reference",
"title": "Flux.RNN",
"category": "function",
"text": "RNN(in::Integer, out::Integer, σ = tanh)\n\nThe most basic recurrent layer; essentially acts as a Dense layer, but with the output fed back into the input each time step.\n\n\n\n"
"text": "RNN(in::Integer, out::Integer, σ = tanh)\n\nThe most basic recurrent layer; essentially acts as a Dense layer, but with the output fed back into the input each time step.\n\n\n\n\n\n"
},
{
@@ -197,7 +221,7 @@ var documenterSearchIndex = {"docs": [
"page": "Model Reference",
"title": "Flux.LSTM",
"category": "function",
"text": "LSTM(in::Integer, out::Integer, σ = tanh)\n\nLong Short Term Memory recurrent layer. Behaves like an RNN but generally exhibits a longer memory span over sequences.\n\nSee this article for a good overview of the internals.\n\n\n\n"
"text": "LSTM(in::Integer, out::Integer, σ = tanh)\n\nLong Short Term Memory recurrent layer. Behaves like an RNN but generally exhibits a longer memory span over sequences.\n\nSee this article for a good overview of the internals.\n\n\n\n\n\n"
},
{
@@ -205,7 +229,7 @@ var documenterSearchIndex = {"docs": [
"page": "Model Reference",
"title": "Flux.GRU",
"category": "function",
"text": "GRU(in::Integer, out::Integer, σ = tanh)\n\nGated Recurrent Unit layer. Behaves like an RNN but generally exhibits a longer memory span over sequences.\n\nSee this article for a good overview of the internals.\n\n\n\n"
"text": "GRU(in::Integer, out::Integer, σ = tanh)\n\nGated Recurrent Unit layer. Behaves like an RNN but generally exhibits a longer memory span over sequences.\n\nSee this article for a good overview of the internals.\n\n\n\n\n\n"
},
{
@@ -213,7 +237,7 @@ var documenterSearchIndex = {"docs": [
"page": "Model Reference",
"title": "Flux.Recur",
"category": "type",
"text": "Recur(cell)\n\nRecur takes a recurrent cell and makes it stateful, managing the hidden state in the background. cell should be a model of the form:\n\nh, y = cell(h, x...)\n\nFor example, here\'s a recurrent network that keeps a running total of its inputs.\n\naccum(h, x) = (h+x, x)\nrnn = Flux.Recur(accum, 0)\nrnn(2) # 2\nrnn(3) # 3\nrnn.state # 5\nrnn.(1:10) # apply to a sequence\nrnn.state # 60\n\n\n\n"
"text": "Recur(cell)\n\nRecur takes a recurrent cell and makes it stateful, managing the hidden state in the background. cell should be a model of the form:\n\nh, y = cell(h, x...)\n\nFor example, here\'s a recurrent network that keeps a running total of its inputs.\n\naccum(h, x) = (h+x, x)\nrnn = Flux.Recur(accum, 0)\nrnn(2) # 2\nrnn(3) # 3\nrnn.state # 5\nrnn.(1:10) # apply to a sequence\nrnn.state # 60\n\n\n\n\n\n"
},
{
@@ -229,7 +253,7 @@ var documenterSearchIndex = {"docs": [
"page": "Model Reference",
"title": "NNlib.σ",
"category": "function",
"text": "σ(x) = 1 / (1 + exp(-x))\n\nClassic sigmoid activation function.\n\n\n\n"
"text": "σ(x) = 1 / (1 + exp(-x))\n\nClassic sigmoid activation function.\n\n\n\n\n\n"
},
{
@@ -237,7 +261,7 @@ var documenterSearchIndex = {"docs": [
"page": "Model Reference",
"title": "NNlib.relu",
"category": "function",
"text": "relu(x) = max(0, x)\n\nRectified Linear Unit activation function.\n\n\n\n"
"text": "relu(x) = max(0, x)\n\nRectified Linear Unit activation function.\n\n\n\n\n\n"
},
{
@@ -245,7 +269,7 @@ var documenterSearchIndex = {"docs": [
"page": "Model Reference",
"title": "NNlib.leakyrelu",
"category": "function",
"text": "leakyrelu(x) = max(0.01x, x)\n\nLeaky Rectified Linear Unit activation function. You can also specify the coefficient explicitly, e.g. leakyrelu(x, 0.01).\n\n\n\n"
"text": "leakyrelu(x) = max(0.01x, x)\n\nLeaky Rectified Linear Unit activation function. You can also specify the coefficient explicitly, e.g. leakyrelu(x, 0.01).\n\n\n\n\n\n"
},
{
@@ -253,7 +277,7 @@ var documenterSearchIndex = {"docs": [
"page": "Model Reference",
"title": "NNlib.elu",
"category": "function",
"text": "elu(x, α = 1) =\n x > 0 ? x : α * (exp(x) - 1)\n\nExponential Linear Unit activation function. See Fast and Accurate Deep Network Learning by Exponential Linear Units. You can also specify the coefficient explicitly, e.g. elu(x, 1).\n\n\n\n"
"text": "elu(x, α = 1) =\n x > 0 ? x : α * (exp(x) - 1)\n\nExponential Linear Unit activation function. See Fast and Accurate Deep Network Learning by Exponential Linear Units. You can also specify the coefficient explicitly, e.g. elu(x, 1).\n\n\n\n\n\n"
},
{
@@ -261,7 +285,7 @@ var documenterSearchIndex = {"docs": [
"page": "Model Reference",
"title": "NNlib.swish",
"category": "function",
"text": "swish(x) = x * σ(x)\n\nSelf-gated actvation function. See Swish: a Self-Gated Activation Function.\n\n\n\n"
"text": "swish(x) = x * σ(x)\n\nSelf-gated actvation function. See Swish: a Self-Gated Activation Function.\n\n\n\n\n\n"
},
{
@@ -277,7 +301,7 @@ var documenterSearchIndex = {"docs": [
"page": "Model Reference",
"title": "Flux.testmode!",
"category": "function",
"text": "testmode!(m)\ntestmode!(m, false)\n\nPut layers like Dropout and BatchNorm into testing mode (or back to training mode with false).\n\n\n\n"
"text": "testmode!(m)\ntestmode!(m, false)\n\nPut layers like Dropout and BatchNorm into testing mode (or back to training mode with false).\n\n\n\n\n\n"
},
{
@@ -285,7 +309,7 @@ var documenterSearchIndex = {"docs": [
"page": "Model Reference",
"title": "Flux.BatchNorm",
"category": "type",
"text": "BatchNorm(channels::Integer, σ = identity;\n initβ = zeros, initγ = ones,\n ϵ = 1e-8, momentum = .1)\n\nBatch Normalization layer. The channels input should be the size of the channel dimension in your data (see below).\n\nGiven an array with N dimensions, call the N-1th the channel dimension. (For a batch of feature vectors this is just the data dimension, for WHCN images it\'s the usual channel dimension.)\n\nBatchNorm computes the mean and variance for each each W×H×1×N slice and shifts them to have a new mean and variance (corresponding to the learnable, per-channel bias and scale parameters).\n\nSee Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.\n\nExample:\n\nm = Chain(\n Dense(28^2, 64),\n BatchNorm(64, relu),\n Dense(64, 10),\n BatchNorm(10),\n softmax)\n\n\n\n"
"text": "BatchNorm(channels::Integer, σ = identity;\n initβ = zeros, initγ = ones,\n ϵ = 1e-8, momentum = .1)\n\nBatch Normalization layer. The channels input should be the size of the channel dimension in your data (see below).\n\nGiven an array with N dimensions, call the N-1th the channel dimension. (For a batch of feature vectors this is just the data dimension, for WHCN images it\'s the usual channel dimension.)\n\nBatchNorm computes the mean and variance for each each W×H×1×N slice and shifts them to have a new mean and variance (corresponding to the learnable, per-channel bias and scale parameters).\n\nSee Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.\n\nExample:\n\nm = Chain(\n Dense(28^2, 64),\n BatchNorm(64, relu),\n Dense(64, 10),\n BatchNorm(10),\n softmax)\n\n\n\n\n\n"
},
{
@@ -293,7 +317,7 @@ var documenterSearchIndex = {"docs": [
"page": "Model Reference",
"title": "Flux.Dropout",
"category": "type",
"text": "Dropout(p)\n\nA Dropout layer. For each input, either sets that input to 0 (with probability p) or scales it by 1/(1-p). This is used as a regularisation, i.e. it reduces overfitting during training.\n\nDoes nothing to the input once in testmode!.\n\n\n\n"
"text": "Dropout(p)\n\nA Dropout layer. For each input, either sets that input to 0 (with probability p) or scales it by 1/(1-p). This is used as a regularisation, i.e. it reduces overfitting during training.\n\nDoes nothing to the input once in testmode!.\n\n\n\n\n\n"
},
{
@@ -301,7 +325,7 @@ var documenterSearchIndex = {"docs": [
"page": "Model Reference",
"title": "Flux.LayerNorm",
"category": "type",
"text": "LayerNorm(h::Integer)\n\nA normalisation layer designed to be used with recurrent hidden states of size h. Normalises the mean/stddev of each input before applying a per-neuron gain/bias.\n\n\n\n"
"text": "LayerNorm(h::Integer)\n\nA normalisation layer designed to be used with recurrent hidden states of size h. Normalises the mean/stddev of each input before applying a per-neuron gain/bias.\n\n\n\n\n\n"
},
{
@@ -333,7 +357,7 @@ var documenterSearchIndex = {"docs": [
"page": "Optimisers",
"title": "Flux.Optimise.SGD",
"category": "function",
"text": "SGD(params, η = 0.1; decay = 0)\n\nClassic gradient descent optimiser with learning rate η. For each parameter p and its gradient δp, this runs p -= η*δp.\n\nSupports inverse decaying learning rate if the decay argument is provided.\n\n\n\n"
"text": "SGD(params, η = 0.1; decay = 0)\n\nClassic gradient descent optimiser with learning rate η. For each parameter p and its gradient δp, this runs p -= η*δp.\n\nSupports inverse decaying learning rate if the decay argument is provided.\n\n\n\n\n\n"
},
{
@@ -341,7 +365,7 @@ var documenterSearchIndex = {"docs": [
"page": "Optimisers",
"title": "Flux.Optimise.Momentum",
"category": "function",
"text": "Momentum(params, η = 0.01; ρ = 0.9, decay = 0)\n\nSGD with learning rate η, momentum ρ and optional learning rate inverse decay.\n\n\n\n"
"text": "Momentum(params, η = 0.01; ρ = 0.9, decay = 0)\n\nSGD with learning rate η, momentum ρ and optional learning rate inverse decay.\n\n\n\n\n\n"
},
{
@@ -349,7 +373,7 @@ var documenterSearchIndex = {"docs": [
"page": "Optimisers",
"title": "Flux.Optimise.Nesterov",
"category": "function",
"text": "Nesterov(params, η = 0.01; ρ = 0.9, decay = 0)\n\nSGD with learning rate η, Nesterov momentum ρ and optional learning rate inverse decay.\n\n\n\n"
"text": "Nesterov(params, η = 0.01; ρ = 0.9, decay = 0)\n\nSGD with learning rate η, Nesterov momentum ρ and optional learning rate inverse decay.\n\n\n\n\n\n"
},
{
@@ -357,7 +381,7 @@ var documenterSearchIndex = {"docs": [
"page": "Optimisers",
"title": "Flux.Optimise.ADAM",
"category": "function",
"text": "ADAM(params, η = 0.001; β1 = 0.9, β2 = 0.999, ϵ = 1e-08, decay = 0)\n\nADAM optimiser.\n\n\n\n"
"text": "ADAM(params, η = 0.001; β1 = 0.9, β2 = 0.999, ϵ = 1e-08, decay = 0)\n\nADAM optimiser.\n\n\n\n\n\n"
},
{
@@ -421,7 +445,7 @@ var documenterSearchIndex = {"docs": [
"page": "One-Hot Encoding",
"title": "One-Hot Encoding",
"category": "section",
"text": "It\'s common to encode categorical variables (like true, false or cat, dog) in \"one-of-k\" or \"one-hot\" form. Flux provides the onehot function to make this easy.julia> using Flux: onehot\n\njulia> onehot(:b, [:a, :b, :c])\n3-element Flux.OneHotVector:\n false\n true\n false\n\njulia> onehot(:c, [:a, :b, :c])\n3-element Flux.OneHotVector:\n false\n false\n trueThe inverse is argmax (which can take a general probability distribution, as well as just booleans).julia> argmax(ans, [:a, :b, :c])\n:c\n\njulia> argmax([true, false, false], [:a, :b, :c])\n:a\n\njulia> argmax([0.3, 0.2, 0.5], [:a, :b, :c])\n:c"
"text": "It\'s common to encode categorical variables (like true, false or cat, dog) in \"one-of-k\" or \"one-hot\" form. Flux provides the onehot function to make this easy.julia> using Flux: onehot, onecold\n\njulia> onehot(:b, [:a, :b, :c])\n3-element Flux.OneHotVector:\n false\n true\n false\n\njulia> onehot(:c, [:a, :b, :c])\n3-element Flux.OneHotVector:\n false\n false\n trueThe inverse is onecold (which can take a general probability distribution, as well as just booleans).julia> onecold(ans, [:a, :b, :c])\n:c\n\njulia> onecold([true, false, false], [:a, :b, :c])\n:a\n\njulia> onecold([0.3, 0.2, 0.5], [:a, :b, :c])\n:c"
},
{
@@ -429,7 +453,7 @@ var documenterSearchIndex = {"docs": [
"page": "One-Hot Encoding",
"title": "Batches",
"category": "section",
"text": "onehotbatch creates a batch (matrix) of one-hot vectors, and argmax treats matrices as batches.julia> using Flux: onehotbatch\n\njulia> onehotbatch([:b, :a, :b], [:a, :b, :c])\n3×3 Flux.OneHotMatrix:\n false true false\n true false true\n false false false\n\njulia> onecold(ans, [:a, :b, :c])\n3-element Array{Symbol,1}:\n :b\n :a\n :bNote that these operations returned OneHotVector and OneHotMatrix rather than Arrays. OneHotVectors behave like normal vectors but avoid any unnecessary cost compared to using an integer index directly. For example, multiplying a matrix with a one-hot vector simply slices out the relevant row of the matrix under the hood."
"text": "onehotbatch creates a batch (matrix) of one-hot vectors, and onecold treats matrices as batches.julia> using Flux: onehotbatch\n\njulia> onehotbatch([:b, :a, :b], [:a, :b, :c])\n3×3 Flux.OneHotMatrix:\n false true false\n true false true\n false false false\n\njulia> onecold(ans, [:a, :b, :c])\n3-element Array{Symbol,1}:\n :b\n :a\n :bNote that these operations returned OneHotVector and OneHotMatrix rather than Arrays. OneHotVectors behave like normal vectors but avoid any unnecessary cost compared to using an integer index directly. For example, multiplying a matrix with a one-hot vector simply slices out the relevant row of the matrix under the hood."
},
{
@@ -525,7 +549,7 @@ var documenterSearchIndex = {"docs": [
"page": "Backpropagation",
"title": "Tracked Internals",
"category": "section",
"text": "All Tracked* objects (TrackedArray, TrackedReal) are light wrappers around the Tracked type, which you can access via the .tracker field.julia> x.tracker\nFlux.Tracker.Tracked{Array{Float64,1}}(0x00000000, Flux.Tracker.Call{Void,Tuple{}}(nothing, ()), true, [5.0, 6.0], [-2.0, -2.0])The Tracker stores the gradient of a given object, which we\'ve seen before.julia> x.tracker.grad\n2-element Array{Float64,1}:\n -2.0\n -2.0The tracker also contains a Call object, which simply represents a function call that was made at some point during the forward pass. For example, the + call would look like this:julia> Tracker.Call(+, 1, 2)\nFlux.Tracker.Call{Base.#+,Tuple{Int64,Int64}}(+, (1, 2))In the case of the y we produced above, we can see that it stores the call that produced it that is, W*x.julia> y.tracker.f\nFlux.Tracker.Call{...}(*, (param([1.0 2.0; 3.0 4.0]), param([5.0, 6.0])))Notice that because the arguments to the call may also be tracked arrays, storing their own calls, this means that Tracker ends up forming a data structure that records everything that happened during the forward pass (often known as a tape).When we call back!(y, [1, -1]), the sensitivities [1, -1] simply get forwarded to y\'s call (*), effectively callingTracker.back(*, [1, -1], W, x)which in turn calculates the sensitivities of the arguments (W and x) and back-propagates through their calls. This is recursive, so it will walk the entire program graph and propagate gradients to the original model parameters."
"text": "All Tracked* objects (TrackedArray, TrackedReal) are light wrappers around the Tracked type, which you can access via the .tracker field.julia> x.tracker\nFlux.Tracker.Tracked{Array{Float64,1}}(0x00000000, Flux.Tracker.Call{Nothing,Tuple{}}(nothing, ()), true, [5.0, 6.0], [-2.0, -2.0])The Tracker stores the gradient of a given object, which we\'ve seen before.julia> x.tracker.grad\n2-element Array{Float64,1}:\n -2.0\n -2.0The tracker also contains a Call object, which simply represents a function call that was made at some point during the forward pass. For example, the + call would look like this:julia> Tracker.Call(+, 1, 2)\nFlux.Tracker.Call{Base.#+,Tuple{Int64,Int64}}(+, (1, 2))In the case of the y we produced above, we can see that it stores the call that produced it that is, W*x.julia> y.tracker.f\nFlux.Tracker.Call{...}(*, (param([1.0 2.0; 3.0 4.0]), param([5.0, 6.0])))Notice that because the arguments to the call may also be tracked arrays, storing their own calls, this means that Tracker ends up forming a data structure that records everything that happened during the forward pass (often known as a tape).When we call back!(y, [1, -1]), the sensitivities [1, -1] simply get forwarded to y\'s call (*), effectively callingTracker.back(*, [1, -1], W, x)which in turn calculates the sensitivities of the arguments (W and x) and back-propagates through their calls. This is recursive, so it will walk the entire program graph and propagate gradients to the original model parameters."
},
{

View File

@@ -29,4 +29,4 @@ end</code></pre><p>If we call <code>sgd</code>, the parameters <code>W</code> an
Dense(10, 5, σ),
Dense(5, 2), softmax)</code></pre><p>Instead of having to write <code>[m[1].W, m[1].b, ...]</code>, Flux provides a params function <code>params(m)</code> that returns a list of all parameters in the model for you.</p><p>For the update step, there&#39;s nothing whatsoever wrong with writing the loop above; it&#39;ll work just fine, but Flux provides various <em>optimisers</em> that make it more convenient.</p><pre><code class="language-julia">opt = SGD([W, b], 0.1) # Gradient descent with learning rate 0.1
opt() # Carry out the update, modifying `W` and `b`.</code></pre><p>An optimiser takes a parameter list and returns a function that does the same thing as <code>update</code> above. We can pass either <code>opt</code> or <code>update</code> to our <a href="training.html">training loop</a>, which will then run the optimiser after every mini-batch of data.</p><h2><a class="nav-anchor" id="Optimiser-Reference-1" href="#Optimiser-Reference-1">Optimiser Reference</a></h2><p>All optimisers return a function that, when called, will update the parameters passed to it.</p><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.SGD" href="#Flux.Optimise.SGD"><code>Flux.Optimise.SGD</code></a><span class="docstring-category">Function</span>.</div><div><pre><code class="language-none">SGD(params, η = 0.1; decay = 0)</code></pre><p>Classic gradient descent optimiser with learning rate <code>η</code>. For each parameter <code>p</code> and its gradient <code>δp</code>, this runs <code>p -= η*δp</code>.</p><p>Supports inverse decaying learning rate if the <code>decay</code> argument is provided.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a8ccc79f61e81d38d3235e53650fe9466693cbf9/src/optimise/interface.jl#L14-L21">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.Momentum" href="#Flux.Optimise.Momentum"><code>Flux.Optimise.Momentum</code></a><span class="docstring-category">Function</span>.</div><div><pre><code class="language-none">Momentum(params, η = 0.01; ρ = 0.9, decay = 0)</code></pre><p>SGD with learning rate <code>η</code>, momentum <code>ρ</code> and optional learning rate inverse decay.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a8ccc79f61e81d38d3235e53650fe9466693cbf9/src/optimise/interface.jl#L25-L29">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.Nesterov" href="#Flux.Optimise.Nesterov"><code>Flux.Optimise.Nesterov</code></a><span class="docstring-category">Function</span>.</div><div><pre><code class="language-none">Nesterov(params, η = 0.01; ρ = 0.9, decay = 0)</code></pre><p>SGD with learning rate <code>η</code>, Nesterov momentum <code>ρ</code> and optional learning rate inverse decay.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a8ccc79f61e81d38d3235e53650fe9466693cbf9/src/optimise/interface.jl#L33-L37">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.ADAM" href="#Flux.Optimise.ADAM"><code>Flux.Optimise.ADAM</code></a><span class="docstring-category">Function</span>.</div><div><pre><code class="language-none">ADAM(params, η = 0.001; β1 = 0.9, β2 = 0.999, ϵ = 1e-08, decay = 0)</code></pre><p><a href="https://arxiv.org/abs/1412.6980v8">ADAM</a> optimiser.</p></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/a8ccc79f61e81d38d3235e53650fe9466693cbf9/src/optimise/interface.jl#L51-L55">source</a></section><footer><hr/><a class="previous" href="../models/layers.html"><span class="direction">Previous</span><span class="title">Model Reference</span></a><a class="next" href="training.html"><span class="direction">Next</span><span class="title">Training</span></a></footer></article></body></html>
opt() # Carry out the update, modifying `W` and `b`.</code></pre><p>An optimiser takes a parameter list and returns a function that does the same thing as <code>update</code> above. We can pass either <code>opt</code> or <code>update</code> to our <a href="training.html">training loop</a>, which will then run the optimiser after every mini-batch of data.</p><h2><a class="nav-anchor" id="Optimiser-Reference-1" href="#Optimiser-Reference-1">Optimiser Reference</a></h2><p>All optimisers return a function that, when called, will update the parameters passed to it.</p><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.SGD" href="#Flux.Optimise.SGD"><code>Flux.Optimise.SGD</code></a><span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">SGD(params, η = 0.1; decay = 0)</code></pre><p>Classic gradient descent optimiser with learning rate <code>η</code>. For each parameter <code>p</code> and its gradient <code>δp</code>, this runs <code>p -= η*δp</code>.</p><p>Supports inverse decaying learning rate if the <code>decay</code> argument is provided.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/193c4ded19290197fb27a4b058cffd34891073b6/src/optimise/interface.jl#L14-L21">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.Momentum" href="#Flux.Optimise.Momentum"><code>Flux.Optimise.Momentum</code></a><span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">Momentum(params, η = 0.01; ρ = 0.9, decay = 0)</code></pre><p>SGD with learning rate <code>η</code>, momentum <code>ρ</code> and optional learning rate inverse decay.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/193c4ded19290197fb27a4b058cffd34891073b6/src/optimise/interface.jl#L25-L29">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.Nesterov" href="#Flux.Optimise.Nesterov"><code>Flux.Optimise.Nesterov</code></a><span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">Nesterov(params, η = 0.01; ρ = 0.9, decay = 0)</code></pre><p>SGD with learning rate <code>η</code>, Nesterov momentum <code>ρ</code> and optional learning rate inverse decay.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/193c4ded19290197fb27a4b058cffd34891073b6/src/optimise/interface.jl#L33-L37">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.ADAM" href="#Flux.Optimise.ADAM"><code>Flux.Optimise.ADAM</code></a><span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">ADAM(params, η = 0.001; β1 = 0.9, β2 = 0.999, ϵ = 1e-08, decay = 0)</code></pre><p><a href="https://arxiv.org/abs/1412.6980v8">ADAM</a> optimiser.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/193c4ded19290197fb27a4b058cffd34891073b6/src/optimise/interface.jl#L51-L55">source</a></section><footer><hr/><a class="previous" href="../models/layers.html"><span class="direction">Previous</span><span class="title">Model Reference</span></a><a class="next" href="training.html"><span class="direction">Next</span><span class="title">Training</span></a></footer></article></body></html>
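Putting the "parameter list in, update function out" contract above into a loop, a minimal training-step sketch under this pre-0.7 optimiser API (the single random batch is a stand-in for real data):

using Flux
using Flux.Tracker: back!

m = Dense(10, 5)
loss(x, y) = Flux.mse(m(x), y)

opt = SGD(params(m), 0.1)            # returns a zero-argument update function

for (x, y) in [(rand(10), rand(5))]
  back!(loss(x, y))                  # accumulate gradients into the tracked params
  opt()                              # apply the SGD step
end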

View File

@@ -6,7 +6,7 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
ga('create', 'UA-36890222-9', 'auto');
ga('send', 'pageview');
</script><link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/><link href="https://fonts.googleapis.com/css?family=Lato|Roboto+Mono" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/default.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL=".."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../assets/documenter.js"></script><script src="../siteinfo.js"></script><script src="../../versions.js"></script><link href="../assets/documenter.css" rel="stylesheet" type="text/css"/><link href="../../flux.css" rel="stylesheet" type="text/css"/></head><body><nav class="toc"><h1>Flux</h1><select id="version-selector" onChange="window.location.href=this.value" style="visibility: hidden"></select><form class="search" id="search-form" action="../search.html"><input id="search-query" name="q" type="text" placeholder="Search docs"/></form><ul><li><a class="toctext" href="../index.html">Home</a></li><li><span class="toctext">Building Models</span><ul><li><a class="toctext" href="../models/basics.html">Basics</a></li><li><a class="toctext" href="../models/recurrence.html">Recurrence</a></li><li><a class="toctext" href="../models/regularisation.html">Regularisation</a></li><li><a class="toctext" href="../models/layers.html">Model Reference</a></li></ul></li><li><span class="toctext">Training Models</span><ul><li><a class="toctext" href="optimisers.html">Optimisers</a></li><li class="current"><a class="toctext" href="training.html">Training</a><ul class="internal"><li><a class="toctext" href="#Loss-Functions-1">Loss Functions</a></li><li><a class="toctext" href="#Datasets-1">Datasets</a></li><li><a class="toctext" href="#Callbacks-1">Callbacks</a></li></ul></li></ul></li><li><a class="toctext" href="../data/onehot.html">One-Hot Encoding</a></li><li><a class="toctext" href="../gpu.html">GPU Support</a></li><li><a class="toctext" href="../saving.html">Saving &amp; Loading</a></li><li><span class="toctext">Internals</span><ul><li><a class="toctext" href="../internals/tracker.html">Backpropagation</a></li></ul></li><li><a class="toctext" href="../community.html">Community</a></li></ul></nav><article id="docs"><header><nav><ul><li>Training Models</li><li><a href="training.html">Training</a></li></ul><a class="edit-page" href="https://github.com/FluxML/Flux.jl/blob/master/docs/src/training/training.md"><span class="fa"></span> Edit on GitHub</a></nav><hr/><div id="topbar"><span>Training</span><a class="fa fa-bars" href="#"></a></div></header><h1><a class="nav-anchor" id="Training-1" href="#Training-1">Training</a></h1><p>To actually train a model we need three things:</p><ul><li><p>A <em>objective function</em>, that evaluates how well a model is doing given some input data.</p></li><li><p>A collection of data points that will be provided to the objective function.</p></li><li><p>An <a href="optimisers.html">optimiser</a> that will update the model parameters appropriately.</p></li></ul><p>With these we can call <code>Flux.train!</code>:</p><pre><code class="language-julia">Flux.train!(objective, data, opt)</code></pre><p>There are plenty of examples in the <a href="https://github.com/FluxML/model-zoo">model zoo</a>.</p><h2><a class="nav-anchor" id="Loss-Functions-1" 
href="#Loss-Functions-1">Loss Functions</a></h2><p>The objective function must return a number representing how far the model is from its target the <em>loss</em> of the model. The <code>loss</code> function that we defined in <a href="../models/basics.html">basics</a> will work as an objective. We can also define an objective in terms of some model:</p><pre><code class="language-julia">m = Chain(
</script><link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/><link href="https://fonts.googleapis.com/css?family=Lato|Roboto+Mono" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/default.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL=".."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../assets/documenter.js"></script><script src="../siteinfo.js"></script><script src="../../versions.js"></script><link href="../assets/documenter.css" rel="stylesheet" type="text/css"/><link href="../../flux.css" rel="stylesheet" type="text/css"/></head><body><nav class="toc"><h1>Flux</h1><select id="version-selector" onChange="window.location.href=this.value" style="visibility: hidden"></select><form class="search" id="search-form" action="../search.html"><input id="search-query" name="q" type="text" placeholder="Search docs"/></form><ul><li><a class="toctext" href="../index.html">Home</a></li><li><span class="toctext">Building Models</span><ul><li><a class="toctext" href="../models/basics.html">Basics</a></li><li><a class="toctext" href="../models/recurrence.html">Recurrence</a></li><li><a class="toctext" href="../models/regularisation.html">Regularisation</a></li><li><a class="toctext" href="../models/layers.html">Model Reference</a></li></ul></li><li><span class="toctext">Training Models</span><ul><li><a class="toctext" href="optimisers.html">Optimisers</a></li><li class="current"><a class="toctext" href="training.html">Training</a><ul class="internal"><li><a class="toctext" href="#Loss-Functions-1">Loss Functions</a></li><li><a class="toctext" href="#Datasets-1">Datasets</a></li><li><a class="toctext" href="#Callbacks-1">Callbacks</a></li></ul></li></ul></li><li><a class="toctext" href="../data/onehot.html">One-Hot Encoding</a></li><li><a class="toctext" href="../gpu.html">GPU Support</a></li><li><a class="toctext" href="../saving.html">Saving &amp; Loading</a></li><li><span class="toctext">Internals</span><ul><li><a class="toctext" href="../internals/tracker.html">Backpropagation</a></li></ul></li><li><a class="toctext" href="../community.html">Community</a></li></ul></nav><article id="docs"><header><nav><ul><li>Training Models</li><li><a href="training.html">Training</a></li></ul><a class="edit-page" href="https://github.com/FluxML/Flux.jl/blob/master/docs/src/training/training.md"><span class="fa"></span> Edit on GitHub</a></nav><hr/><div id="topbar"><span>Training</span><a class="fa fa-bars" href="#"></a></div></header><h1><a class="nav-anchor" id="Training-1" href="#Training-1">Training</a></h1><p>To actually train a model we need three things:</p><ul><li>A <em>objective function</em>, that evaluates how well a model is doing given some input data.</li><li>A collection of data points that will be provided to the objective function.</li><li>An <a href="optimisers.html">optimiser</a> that will update the model parameters appropriately.</li></ul><p>With these we can call <code>Flux.train!</code>:</p><pre><code class="language-julia">Flux.train!(objective, data, opt)</code></pre><p>There are plenty of examples in the <a href="https://github.com/FluxML/model-zoo">model zoo</a>.</p><h2><a class="nav-anchor" id="Loss-Functions-1" href="#Loss-Functions-1">Loss 
Functions</a></h2><p>The objective function must return a number representing how far the model is from its target the <em>loss</em> of the model. The <code>loss</code> function that we defined in <a href="../models/basics.html">basics</a> will work as an objective. We can also define an objective in terms of some model:</p><pre><code class="language-julia">m = Chain(
Dense(784, 32, σ),
Dense(32, 10), softmax)
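
# A hedged sketch of how such a model-based objective might then be used;
# `data` and `opt` are assumed to be defined elsewhere (see the pages above).
loss(x, y) = Flux.mse(m(x), y)

Flux.train!(loss, data, opt)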