From 9a75fcd05657a4407bde66b2410134427d5916a7 Mon Sep 17 00:00:00 2001 From: zeptodoctor <44736852+zeptodoctor@users.noreply.github.com> Date: Tue, 19 Feb 2019 15:20:47 +0000 Subject: [PATCH] build based on ebf50f4 --- dev/community/index.html | 2 +- dev/data/onehot/index.html | 2 +- dev/gpu/index.html | 2 +- dev/index.html | 2 +- dev/internals/tracker/index.html | 4 +-- dev/models/basics/index.html | 2 +- dev/models/layers/index.html | 18 ++++++------- dev/models/recurrence/index.html | 2 +- dev/models/regularisation/index.html | 2 +- dev/performance/index.html | 20 ++++++++++++++ dev/saving/index.html | 4 +-- dev/search/index.html | 2 +- dev/search_index.js | 40 ++++++++++++++++++++++++++++ dev/training/optimisers/index.html | 4 +-- dev/training/training/index.html | 2 +- 15 files changed, 84 insertions(+), 24 deletions(-) create mode 100644 dev/performance/index.html diff --git a/dev/community/index.html b/dev/community/index.html index be6a32be..6c6e0709 100644 --- a/dev/community/index.html +++ b/dev/community/index.html @@ -6,4 +6,4 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) ga('create', 'UA-36890222-9', 'auto'); ga('send', 'pageview'); -

Community

Community

All Flux users are welcome to join our community on the Julia forum, Slack (channel #machine-learning), or Flux's Gitter. If you have questions or issues, we'll try to help you out.

If you're interested in hacking on Flux, the source code is open and easy to understand – it's all just the same Julia code you work with normally. You might be interested in our intro issues to get started.

+

Community

Community

All Flux users are welcome to join our community on the Julia forum, Slack (channel #machine-learning), or Flux's Gitter. If you have questions or issues, we'll try to help you out.

If you're interested in hacking on Flux, the source code is open and easy to understand – it's all just the same Julia code you work with normally. You might be interested in our intro issues to get started.

diff --git a/dev/data/onehot/index.html b/dev/data/onehot/index.html index c8a24999..aac2ab23 100644 --- a/dev/data/onehot/index.html +++ b/dev/data/onehot/index.html @@ -6,7 +6,7 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) ga('create', 'UA-36890222-9', 'auto'); ga('send', 'pageview'); -

One-Hot Encoding

One-Hot Encoding

It's common to encode categorical variables (like true, false or cat, dog) in "one-of-k" or "one-hot" form. Flux provides the onehot function to make this easy.

julia> using Flux: onehot, onecold
+

One-Hot Encoding

One-Hot Encoding

It's common to encode categorical variables (like true, false or cat, dog) in "one-of-k" or "one-hot" form. Flux provides the onehot function to make this easy.

julia> using Flux: onehot, onecold
 
 julia> onehot(:b, [:a, :b, :c])
 3-element Flux.OneHotVector:
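(The diff truncates the output of onehot here; it is a sparse vector with true only in the position of :b. As a hedged sketch of the inverse operation, onecold picks the label of the largest entry; exact printing may differ between versions.)

julia> onecold([0.3, 0.2, 0.5], [:a, :b, :c])
:c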
diff --git a/dev/gpu/index.html b/dev/gpu/index.html
index bcde7038..59d8376b 100644
--- a/dev/gpu/index.html
+++ b/dev/gpu/index.html
@@ -6,7 +6,7 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
 
 ga('create', 'UA-36890222-9', 'auto');
 ga('send', 'pageview');
-

GPU Support

GPU Support

Installation

To get GPU support for NVIDIA graphics cards, you need to install CuArrays.jl.

Steps needed

  1. Install the NVIDIA CUDA toolkit
  2. Install the NVIDIA cuDNN library
  3. In the Julia REPL, run ] add CuArrays

GPU Usage

Support for array operations on other hardware backends, like GPUs, is provided by external packages like CuArrays. Flux is agnostic to array types, so we simply need to move model weights and data to the GPU and Flux will handle it.

For example, we can use CuArrays (with the cu converter) to run our basic example on an NVIDIA GPU.

(Note that you need to have CUDA available to use CuArrays – please see the CuArrays.jl instructions for more details.)

using CuArrays
+

GPU Support

GPU Support

Installation

To get GPU support for NVIDIA graphics cards, you need to install CuArrays.jl.

Steps needed

  1. Install the NVIDIA CUDA toolkit
  2. Install the NVIDIA cuDNN library
  3. In the Julia REPL, run ] add CuArrays

GPU Usage

Support for array operations on other hardware backends, like GPUs, is provided by external packages like CuArrays. Flux is agnostic to array types, so we simply need to move model weights and data to the GPU and Flux will handle it.

For example, we can use CuArrays (with the cu converter) to run our basic example on an NVIDIA GPU.

(Note that you need to have CUDA available to use CuArrays – please see the CuArrays.jl instructions for more details.)

using CuArrays
 
 W = cu(rand(2, 5)) # a 2×5 CuArray
 b = cu(rand(2))
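Whole models can be moved in the same way. The sketch below is a hedged illustration using Flux's gpu helper (the assumption here is that gpu falls back to the identity when no GPU is available, so the same code also runs on a CPU-only machine):

using Flux, CuArrays

m = Chain(Dense(10, 5, relu), Dense(5, 2), softmax)

gm = gpu(m)          # move all of the model's parameters to the GPU
gx = gpu(rand(10))   # move a data point too

gm(gx)               # the forward pass now runs on the GPU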
diff --git a/dev/index.html b/dev/index.html
index 7b6afbeb..4c1aba7f 100644
--- a/dev/index.html
+++ b/dev/index.html
@@ -6,4 +6,4 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
 
 ga('create', 'UA-36890222-9', 'auto');
 ga('send', 'pageview');
-

Home

Flux: The Julia Machine Learning Library

Flux is a library for machine learning. It comes "batteries-included" with many useful tools built in, but also lets you use the full power of the Julia language where you need it. We follow a few key principles:

  • Doing the obvious thing. Flux has relatively few explicit APIs for features like regularisation or embeddings. Instead, writing down the mathematical form will work – and be fast.
  • You could have written Flux. All of it, from LSTMs to GPU kernels, is straightforward Julia code. When in doubt, it’s well worth looking at the source. If you need something different, you can easily roll your own.
  • Play nicely with others. Flux works well with Julia libraries from data frames and images to differential equation solvers, so you can easily build complex data processing pipelines that integrate Flux models.

Installation

Download Julia 1.0 or later, if you haven't already. You can add Flux using Julia's package manager, by typing ] add Flux at the Julia prompt.

If you have CUDA you can also run ] add CuArrays to get GPU support; see here for more details.

Learning Flux

There are several different ways to learn Flux. If you just want to get started writing models, the model zoo gives good starting points for many common ones. This documentation provides a reference to all of Flux's APIs, as well as a from-scratch introduction to Flux's take on models and how they work. Once you understand these docs, congratulations, you also understand Flux's source code, which is intended to be concise, legible and a good reference for more advanced concepts.

+

Home

Flux: The Julia Machine Learning Library

Flux is a library for machine learning. It comes "batteries-included" with many useful tools built in, but also lets you use the full power of the Julia language where you need it. We follow a few key principles:

  • Doing the obvious thing. Flux has relatively few explicit APIs for features like regularisation or embeddings. Instead, writing down the mathematical form will work – and be fast.
  • You could have written Flux. All of it, from LSTMs to GPU kernels, is straightforward Julia code. When in doubt, it’s well worth looking at the source. If you need something different, you can easily roll your own.
  • Play nicely with others. Flux works well with Julia libraries from data frames and images to differential equation solvers, so you can easily build complex data processing pipelines that integrate Flux models.

Installation

Download Julia 1.0 or later, if you haven't already. You can add Flux using Julia's package manager, by typing ] add Flux at the Julia prompt.

If you have CUDA you can also run ] add CuArrays to get GPU support; see here for more details.

Learning Flux

There are several different ways to learn Flux. If you just want to get started writing models, the model zoo gives good starting points for many common ones. This documentation provides a reference to all of Flux's APIs, as well as a from-scratch introduction to Flux's take on models and how they work. Once you understand these docs, congratulations, you also understand Flux's source code, which is intended to be concise, legible and a good reference for more advanced concepts.

diff --git a/dev/internals/tracker/index.html b/dev/internals/tracker/index.html index a4bbd412..04a64e31 100644 --- a/dev/internals/tracker/index.html +++ b/dev/internals/tracker/index.html @@ -6,7 +6,7 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) ga('create', 'UA-36890222-9', 'auto'); ga('send', 'pageview'); -

Backpropagation

Flux.Tracker

Backpropagation, or reverse-mode automatic differentiation, is handled by the Flux.Tracker module.

julia> using Flux.Tracker

Here we discuss some more advanced uses of this module, as well as covering its internals.

Taking Gradients

In the basics section we covered basic usage of the gradient function.

using Flux.Tracker
+

Backpropagation

Flux.Tracker

Backpropagation, or reverse-mode automatic differentiation, is handled by the Flux.Tracker module.

julia> using Flux.Tracker

Here we discuss some more advanced uses of this module, as well as covering its internals.

Taking Gradients

In the basics section we covered basic usage of the gradient function.

using Flux.Tracker
 
 Tracker.gradient((a, b) -> a*b, 2, 3) # (3.0 (tracked), 2.0 (tracked))

gradient is actually just a thin wrapper around the backpropagator-based interface, forward.

using Flux.Tracker: forward
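The diff cuts this example short. A hedged sketch of how forward is typically used, based on the API described above: it returns the result together with a backpropagator that maps output sensitivities to input gradients (exact printing of tracked values may vary).

y, back = forward((a, b) -> a*b, 2, 3)   # y == 6 (tracked)
back(1)                                  # (3.0 (tracked), 2.0 (tracked))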
 
@@ -63,4 +63,4 @@ Flux.Tracker.Tracked{Array{Float64,1}}(0x00000000, Flux.Tracker.Call{Nothing,Tup
  -2.0
  -2.0

The tracker also contains a Call object, which simply represents a function call that was made at some point during the forward pass. For example, the + call would look like this:

julia> Tracker.Call(+, 1, 2)
 Flux.Tracker.Call{Base.#+,Tuple{Int64,Int64}}(+, (1, 2))

In the case of the y we produced above, we can see that it stores the call that produced it – that is, W*x.

julia> y.tracker.f
-Flux.Tracker.Call{...}(*, (param([1.0 2.0; 3.0 4.0]), param([5.0, 6.0])))

Notice that because the arguments to the call may also be tracked arrays, storing their own calls, this means that Tracker ends up forming a data structure that records everything that happened during the forward pass (often known as a tape).

When we call back!(y, [1, -1]), the sensitivities [1, -1] simply get forwarded to y's call (*), effectively calling

Tracker.back(*, [1, -1], W, x)

which in turn calculates the sensitivities of the arguments (W and x) and back-propagates through their calls. This is recursive, so it will walk the entire program graph and propagate gradients to the original model parameters.

+Flux.Tracker.Call{...}(*, (param([1.0 2.0; 3.0 4.0]), param([5.0, 6.0])))

Notice that because the arguments to the call may also be tracked arrays, storing their own calls, this means that Tracker ends up forming a data structure that records everything that happened during the forward pass (often known as a tape).

When we call back!(y, [1, -1]), the sensitivities [1, -1] simply get forwarded to y's call (*), effectively calling

Tracker.back(*, [1, -1], W, x)

which in turn calculates the sensitivities of the arguments (W and x) and back-propagates through their calls. This is recursive, so it will walk the entire program graph and propagate gradients to the original model parameters.

diff --git a/dev/models/basics/index.html b/dev/models/basics/index.html index c51f4177..99089dff 100644 --- a/dev/models/basics/index.html +++ b/dev/models/basics/index.html @@ -6,7 +6,7 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) ga('create', 'UA-36890222-9', 'auto'); ga('send', 'pageview'); -

Basics

Model-Building Basics

Taking Gradients

Flux's core feature is taking gradients of Julia code. The gradient function takes another Julia function f and a set of arguments, and returns the gradient with respect to each argument. (It's a good idea to try pasting these examples in the Julia terminal.)

julia> using Flux.Tracker
+

Basics

Model-Building Basics

Taking Gradients

Flux's core feature is taking gradients of Julia code. The gradient function takes another Julia function f and a set of arguments, and returns the gradient with respect to each argument. (It's a good idea to try pasting these examples in the Julia terminal.)

julia> using Flux.Tracker
 
 julia> f(x) = 3x^2 + 2x + 1;
 
diff --git a/dev/models/layers/index.html b/dev/models/layers/index.html
index 77219f85..8ae89f32 100644
--- a/dev/models/layers/index.html
+++ b/dev/models/layers/index.html
@@ -6,34 +6,34 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
 
 ga('create', 'UA-36890222-9', 'auto');
 ga('send', 'pageview');
-

Model Reference

Basic Layers

These core layers form the foundation of almost all neural networks.

Flux.ChainType.
Chain(layers...)

Chain multiple layers / functions together, so that they are called in sequence on a given input.

m = Chain(x -> x^2, x -> x+1)
+

Model Reference

Basic Layers

These core layers form the foundation of almost all neural networks.

Flux.ChainType.
Chain(layers...)

Chain multiple layers / functions together, so that they are called in sequence on a given input.

m = Chain(x -> x^2, x -> x+1)
 m(5) == 26
 
 m = Chain(Dense(10, 5), Dense(5, 2))
 x = rand(10)
-m(x) == m[2](m[1](x))

Chain also supports indexing and slicing, e.g. m[2] or m[1:end-1]. m[1:3](x) will calculate the output of the first three layers.

source
Flux.DenseType.
Dense(in::Integer, out::Integer, σ = identity)

Creates a traditional Dense layer with parameters W and b.

y = σ.(W * x .+ b)

The input x must be a vector of length in, or a batch of vectors represented as an in × N matrix. The output y will be a vector or batch of length out.

julia> d = Dense(5, 2)
+m(x) == m[2](m[1](x))

Chain also supports indexing and slicing, e.g. m[2] or m[1:end-1]. m[1:3](x) will calculate the output of the first three layers.

source
Flux.DenseType.
Dense(in::Integer, out::Integer, σ = identity)

Creates a traditional Dense layer with parameters W and b.

y = σ.(W * x .+ b)

The input x must be a vector of length in, or a batch of vectors represented as an in × N matrix. The output y will be a vector or batch of length out.

julia> d = Dense(5, 2)
 Dense(5, 2)
 
 julia> d(rand(5))
 Tracked 2-element Array{Float64,1}:
   0.00257447
-  -0.00449443
source
Flux.ConvType.
Conv(size, in=>out)
-Conv(size, in=>out, relu)

Standard convolutional layer. size should be a tuple like (2, 2). in and out specify the number of input and output channels respectively.

Data should be stored in WHCN order. In other words, a 100×100 RGB image would be a 100×100×3×1 array, and a batch of 50 would be a 100×100×3×50 array.

Takes the keyword arguments pad, stride and dilation.

source
Flux.MaxPoolType.
MaxPool(k)

Max pooling layer. k stands for the size of the window for each dimension of the input.

Takes the keyword arguments pad and stride.

source
Flux.MeanPoolType.
MeanPool(k)

Mean pooling layer. k stands for the size of the window for each dimension of the input.

Takes the keyword arguments pad and stride.

source

Additional Convolution Layers

DepthwiseConv(size, in)
+  -0.00449443
source
Flux.ConvType.
Conv(size, in=>out)
+Conv(size, in=>out, relu)

Standard convolutional layer. size should be a tuple like (2, 2). in and out specify the number of input and output channels respectively.

Data should be stored in WHCN order. In other words, a 100×100 RGB image would be a 100×100×3×1 array, and a batch of 50 would be a 100×100×3×50 array.

Takes the keyword arguments pad, stride and dilation.

source
Flux.MaxPoolType.
MaxPool(k)

Max pooling layer. k stands for the size of the window for each dimension of the input.

Takes the keyword arguments pad and stride.

source
Flux.MeanPoolType.
MeanPool(k)

Mean pooling layer. k stands for the size of the window for each dimension of the input.

Takes the keyword arguments pad and stride.

source

Additional Convolution Layers

DepthwiseConv(size, in)
 DepthwiseConv(size, in=>mul)
-DepthwiseConv(size, in=>mul, relu)

Depthwise convolutional layer. size should be a tuple like (2, 2). in and mul specify the number of input channels and the channel multiplier respectively. If mul is not specified, it defaults to 1.

Data should be stored in WHCN order. In other words, a 100×100 RGB image would be a 100×100×3×1 array, and a batch of 50 would be a 100×100×3×50 array.

Takes the keyword arguments pad and stride.

source
ConvTranspose(size, in=>out)
-ConvTranspose(size, in=>out, relu)

Standard convolutional transpose layer. size should be a tuple like (2, 2). in and out specify the number of input and output channels respectively. Data should be stored in WHCN order. In other words, a 100×100 RGB image would be a 100×100×3×1 array, and a batch of 50 would be a 100×100×3×50 array. Takes the keyword arguments pad, stride and dilation.

source

Recurrent Layers

Much like the core layers above, but can be used to process sequence data (as well as other kinds of structured data).

Flux.RNNFunction.
RNN(in::Integer, out::Integer, σ = tanh)

The most basic recurrent layer; essentially acts as a Dense layer, but with the output fed back into the input each time step.

source
Flux.LSTMFunction.
LSTM(in::Integer, out::Integer)

Long Short Term Memory recurrent layer. Behaves like an RNN but generally exhibits a longer memory span over sequences.

See this article for a good overview of the internals.

source
Flux.GRUFunction.
GRU(in::Integer, out::Integer)

Gated Recurrent Unit layer. Behaves like an RNN but generally exhibits a longer memory span over sequences.

See this article for a good overview of the internals.

source
Flux.RecurType.
Recur(cell)

Recur takes a recurrent cell and makes it stateful, managing the hidden state in the background. cell should be a model of the form:

h, y = cell(h, x...)

For example, here's a recurrent network that keeps a running total of its inputs.

accum(h, x) = (h+x, x)
+DepthwiseConv(size, in=>mul, relu)

Depthwise convolutional layer. size should be a tuple like (2, 2). in and mul specify the number of input channels and the channel multiplier respectively. If mul is not specified, it defaults to 1.

Data should be stored in WHCN order. In other words, a 100×100 RGB image would be a 100×100×3×1 array, and a batch of 50 would be a 100×100×3×50 array.

Takes the keyword arguments pad and stride.

source
ConvTranspose(size, in=>out)
+ConvTranspose(size, in=>out, relu)

Standard convolutional transpose layer. size should be a tuple like (2, 2). in and out specify the number of input and output channels respectively. Data should be stored in WHCN order. In other words, a 100×100 RGB image would be a 100×100×3×1 array, and a batch of 50 would be a 100×100×3×50 array. Takes the keyword arguments pad, stride and dilation.

source

Recurrent Layers

Much like the core layers above, but can be used to process sequence data (as well as other kinds of structured data).

Flux.RNNFunction.
RNN(in::Integer, out::Integer, σ = tanh)

The most basic recurrent layer; essentially acts as a Dense layer, but with the output fed back into the input each time step.

source
Flux.LSTMFunction.
LSTM(in::Integer, out::Integer)

Long Short Term Memory recurrent layer. Behaves like an RNN but generally exhibits a longer memory span over sequences.

See this article for a good overview of the internals.

source
Flux.GRUFunction.
GRU(in::Integer, out::Integer)

Gated Recurrent Unit layer. Behaves like an RNN but generally exhibits a longer memory span over sequences.

See this article for a good overview of the internals.

source
Flux.RecurType.
Recur(cell)

Recur takes a recurrent cell and makes it stateful, managing the hidden state in the background. cell should be a model of the form:

h, y = cell(h, x...)

For example, here's a recurrent network that keeps a running total of its inputs.

accum(h, x) = (h+x, x)
 rnn = Flux.Recur(accum, 0)
 rnn(2) # 2
 rnn(3) # 3
 rnn.state # 5
 rnn.(1:10) # apply to a sequence
-rnn.state # 60
source

Activation Functions

Non-linearities that go between layers of your model. Most of these functions are defined in NNlib but are available by default in Flux.

Note that, unless otherwise stated, activation functions operate on scalars. To apply them to an array you can call σ.(xs), relu.(xs) and so on.

NNlib.σFunction.
σ(x) = 1 / (1 + exp(-x))

Classic sigmoid activation function.

NNlib.reluFunction.
relu(x) = max(0, x)

Rectified Linear Unit activation function.

NNlib.leakyreluFunction.
leakyrelu(x) = max(0.01x, x)

Leaky Rectified Linear Unit activation function. You can also specify the coefficient explicitly, e.g. leakyrelu(x, 0.01).

NNlib.eluFunction.
elu(x, α = 1) =
+rnn.state # 60
source

Activation Functions

Non-linearities that go between layers of your model. Most of these functions are defined in NNlib but are available by default in Flux.

Note that, unless otherwise stated, activation functions operate on scalars. To apply them to an array you can call σ.(xs), relu.(xs) and so on.

NNlib.σFunction.
σ(x) = 1 / (1 + exp(-x))

Classic sigmoid activation function.

NNlib.reluFunction.
relu(x) = max(0, x)

Rectified Linear Unit activation function.

NNlib.leakyreluFunction.
leakyrelu(x) = max(0.01x, x)

Leaky Rectified Linear Unit activation function. You can also specify the coefficient explicitly, e.g. leakyrelu(x, 0.01).

NNlib.eluFunction.
elu(x, α = 1) =
   x > 0 ? x : α * (exp(x) - 1)

Exponential Linear Unit activation function. See Fast and Accurate Deep Network Learning by Exponential Linear Units. You can also specify the coefficient explicitly, e.g. elu(x, 1).

NNlib.swishFunction.
swish(x) = x * σ(x)

Self-gated activation function. See Swish: a Self-Gated Activation Function.

Normalisation & Regularisation

These layers don't affect the structure of the network but may improve training times or reduce overfitting.

Flux.testmode!Function.
testmode!(m)
-testmode!(m, false)

Put layers like Dropout and BatchNorm into testing mode (or back to training mode with false).

source
Flux.BatchNormType.
BatchNorm(channels::Integer, σ = identity;
+testmode!(m, false)

Put layers like Dropout and BatchNorm into testing mode (or back to training mode with false).

source
Flux.BatchNormType.
BatchNorm(channels::Integer, σ = identity;
           initβ = zeros, initγ = ones,
           ϵ = 1e-8, momentum = .1)

Batch Normalization layer. The channels input should be the size of the channel dimension in your data (see below).

Given an array with N dimensions, call the N-1th the channel dimension. (For a batch of feature vectors this is just the data dimension, for WHCN images it's the usual channel dimension.)

BatchNorm computes the mean and variance for each W×H×1×N slice and shifts them to have a new mean and variance (corresponding to the learnable, per-channel bias and scale parameters).

See Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.

Example:

m = Chain(
   Dense(28^2, 64),
   BatchNorm(64, relu),
   Dense(64, 10),
   BatchNorm(10),
-  softmax)
source
Flux.DropoutType.
Dropout(p)

A Dropout layer. For each input, either sets that input to 0 (with probability p) or scales it by 1/(1-p). This is used as a regularisation, i.e. it reduces overfitting during training.

Does nothing to the input once in testmode!.

source
Flux.LayerNormType.
LayerNorm(h::Integer)

A normalisation layer designed to be used with recurrent hidden states of size h. Normalises the mean/stddev of each input before applying a per-neuron gain/bias.

source
+ softmax)
source
Flux.DropoutType.
Dropout(p)

A Dropout layer. For each input, either sets that input to 0 (with probability p) or scales it by 1/(1-p). This is used as a regularisation, i.e. it reduces overfitting during training.

Does nothing to the input once in testmode!.

source
Flux.LayerNormType.
LayerNorm(h::Integer)

A normalisation layer designed to be used with recurrent hidden states of size h. Normalises the mean/stddev of each input before applying a per-neuron gain/bias.

source
diff --git a/dev/models/recurrence/index.html b/dev/models/recurrence/index.html index 27f7d901..9f2de587 100644 --- a/dev/models/recurrence/index.html +++ b/dev/models/recurrence/index.html @@ -6,7 +6,7 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) ga('create', 'UA-36890222-9', 'auto'); ga('send', 'pageview'); -

Recurrence

Recurrent Models

Recurrent Cells

In the simple feedforward case, our model m is a simple function from various inputs xᵢ to predictions yᵢ. (For example, each x might be an MNIST digit and each y a digit label.) Each prediction is completely independent of any others, and using the same x will always produce the same y.

y₁ = f(x₁)
+

Recurrence

Recurrent Models

Recurrent Cells

In the simple feedforward case, our model m is a simple function from various inputs xᵢ to predictions yᵢ. (For example, each x might be an MNIST digit and each y a digit label.) Each prediction is completely independent of any others, and using the same x will always produce the same y.

y₁ = f(x₁)
 y₂ = f(x₂)
 y₃ = f(x₃)
 # ...

Recurrent networks introduce a hidden state that gets carried over each time we run the model. The model now takes the old h as an input, and produces a new h as output, each time we run it.

h = # ... initial state ...
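The diff cuts this block off after the initial state. As a hedged sketch, the calls that follow thread h through each step, each one consuming the previous hidden state and producing the next:

h, y₁ = f(h, x₁)
h, y₂ = f(h, x₂)
h, y₃ = f(h, x₃)
# ...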
diff --git a/dev/models/regularisation/index.html b/dev/models/regularisation/index.html
index 6646d937..9dd23051 100644
--- a/dev/models/regularisation/index.html
+++ b/dev/models/regularisation/index.html
@@ -6,7 +6,7 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
 
 ga('create', 'UA-36890222-9', 'auto');
 ga('send', 'pageview');
-

Regularisation

Regularisation

Applying regularisation to model parameters is straightforward. We just need to apply an appropriate regulariser, such as norm, to each model parameter and add the result to the overall loss.

For example, say we have a simple regression.

using Flux: crossentropy
+

Regularisation

Regularisation

Applying regularisation to model parameters is straightforward. We just need to apply an appropriate regulariser, such as norm, to each model parameter and add the result to the overall loss.

For example, say we have a simple regression.

using Flux: crossentropy
 m = Dense(10, 5)
 loss(x, y) = crossentropy(softmax(m(x)), y)

We can regularise this by taking the (L2) norm of the parameters, m.W and m.b.

penalty() = norm(m.W) + norm(m.b)
 loss(x, y) = crossentropy(softmax(m(x)), y) + penalty()

When working with layers, Flux provides the params function to grab all parameters at once. We can easily penalise everything with sum(norm, params).

julia> params(m)
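The params(m) output is truncated by the diff. A hedged sketch of the sum(norm, params) pattern described above (note that norm comes from the LinearAlgebra standard library on Julia 1.0):

using LinearAlgebra: norm

penalty() = sum(norm, params(m))
loss(x, y) = crossentropy(softmax(m(x)), y) + penalty()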
diff --git a/dev/performance/index.html b/dev/performance/index.html
new file mode 100644
index 00000000..9a981139
--- /dev/null
+++ b/dev/performance/index.html
@@ -0,0 +1,20 @@
+
+Performance Tips · Flux

Performance Tips

Performance Tips

All the usual Julia performance tips apply. As always, profiling your code is a useful way of finding bottlenecks. Below are some Flux-specific tips and reminders.

Don't use more precision than you need.

Flux works great with all kinds of number types. But often you do not need to work with, say, Float64 (let alone BigFloat). Switching to Float32 can give you a significant speedup, not because the operations themselves are faster, but because memory usage is halved, so allocations are cheaper and you use less memory overall.
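As a hedged illustration, halving precision is usually just a matter of converting the data you feed in; nothing Flux-specific is involved:

x64 = rand(10)        # Float64 by default
x32 = Float32.(x64)   # same values, half the memory per element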

Make sure your custom activation functions preserve the type of their inputs

Not only should your activation functions be type-stable, they should also preserve the type of their inputs.

A very artificial example: using an activation function like

    my_tanh(x) = Float64(tanh(x))

will make performance on Float32 input orders of magnitude slower than the normal tanh would be, because it forces slow mixed-type multiplication in the dense layers.

This means that if you change your data from, say, Float64 to Float32 (which should give a speedup: see above), you will instead see a large slowdown.

This can occur sneakily, because type promotion can be triggered by interaction with numeric literals. For example, the following runs into the same problem as above:

    leaky_tanh(x) = 0.01x + tanh(x)

While you could change the activation function (e.g. to use 0.01f0x) to avoid this whenever your input type changes, the idiomatic (and safe) way is to use oftype:

    leaky_tanh(x) = oftype(x/1, 0.01)*x + tanh(x)
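A quick, hedged way to check the property, assuming the corrected definition above:

leaky_tanh(0.5f0) isa Float32   # true: the 0.01 literal no longer promotes Float32 input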

Evaluate batches as Matrices of features, rather than sequences of Vector features

While it can sometimes be tempting to process your observations (feature vectors) one at a time, e.g.

function loss_total(xs::AbstractVector{<:Vector}, ys::AbstractVector{<:Vector})
+    sum(zip(xs, ys)) do (x, y_target)
+        y_pred = model(x) #  evaluate the model
+        return loss(y_pred, y_target)
+    end
+end

It is much faster to concatenate them into a matrix, since this hits BLAS matrix-matrix multiplication, which outperforms the equivalent sequence of matrix-vector multiplications, even though it means allocating new memory to store them contiguously.

x_batch = reduce(hcat, xs)
+y_batch = reduce(hcat, ys)
+...
+function loss_total(x_batch::Matrix, y_batch::Matrix)
+    y_preds = model(x_batch)
+    sum(loss.(y_preds, y_batch))
+end

When doing this kind of concatenation, use reduce(hcat, xs) rather than hcat(xs...). This avoids the splatting penalty and hits the optimised reduce method.

diff --git a/dev/saving/index.html b/dev/saving/index.html index 0b3c93f4..6f21561b 100644 --- a/dev/saving/index.html +++ b/dev/saving/index.html @@ -6,7 +6,7 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) ga('create', 'UA-36890222-9', 'auto'); ga('send', 'pageview'); -

Saving & Loading

Saving and Loading Models

You may wish to save models so that they can be loaded and run in a later session. The easiest way to do this is via BSON.jl.

Save a model:

julia> using Flux
+

Saving & Loading

Saving and Loading Models

You may wish to save models so that they can be loaded and run in a later session. The easiest way to do this is via BSON.jl.

Save a model:

julia> using Flux
 
 julia> model = Chain(Dense(10,5,relu),Dense(5,2),softmax)
 Chain(Dense(10, 5, NNlib.relu), Dense(5, 2), NNlib.softmax)
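The diff truncates the save example here. A hedged sketch of the BSON round trip (the file name is purely illustrative):

using BSON: @save, @load

@save "mymodel.bson" model    # write the model to disk

# later, possibly in a fresh Julia session:
@load "mymodel.bson" model    # recreates the variable `model`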
@@ -47,4 +47,4 @@ evalcb = throttle(30) do
   # Show loss
   @save "model-checkpoint.bson" model
 end

This will update the "model-checkpoint.bson" file every thirty seconds.

You can get more advanced by saving a series of models throughout training, for example

@save "model-$(now()).bson" model

will produce a series of models like "model-2018-03-06T02:57:10.41.bson". You could also store the current test set loss, so that it's easy to (for example) revert to an older copy of the model if it starts to overfit.

@save "model-$(now()).bson" model loss = testloss()

You can even store optimiser state alongside the model, to resume training exactly where you left off.

opt = ADAM(params(model))
-@save "model-$(now()).bson" model opt
+@save "model-$(now()).bson" model opt
diff --git a/dev/search/index.html b/dev/search/index.html index 8105e38b..fa2bff6a 100644 --- a/dev/search/index.html +++ b/dev/search/index.html @@ -6,4 +6,4 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) ga('create', 'UA-36890222-9', 'auto'); ga('send', 'pageview'); -

Search

Search

Number of results: loading...

    +

    Search

    Search

    Number of results: loading...

      diff --git a/dev/search_index.js b/dev/search_index.js index 1dff79ae..04ab508f 100644 --- a/dev/search_index.js +++ b/dev/search_index.js @@ -544,6 +544,46 @@ var documenterSearchIndex = {"docs": [ "text": "In longer training runs it\'s a good idea to periodically save your model, so that you can resume if training is interrupted (for example, if there\'s a power cut). You can do this by saving the model in the callback provided to train!.using Flux: throttle\nusing BSON: @save\n\nm = Chain(Dense(10,5,relu),Dense(5,2),softmax)\n\nevalcb = throttle(30) do\n # Show loss\n @save \"model-checkpoint.bson\" model\nendThis will update the \"model-checkpoint.bson\" file every thirty seconds.You can get more advanced by saving a series of models throughout training, for example@save \"model-$(now()).bson\" modelwill produce a series of models like \"model-2018-03-06T02:57:10.41.bson\". You could also store the current test set loss, so that it\'s easy to (for example) revert to an older copy of the model if it starts to overfit.@save \"model-$(now()).bson\" model loss = testloss()You can even store optimiser state alongside the model, to resume training exactly where you left off.opt = ADAM(params(model))\n@save \"model-$(now()).bson\" model opt" }, +{ + "location": "performance/#", + "page": "Performance Tips", + "title": "Performance Tips", + "category": "page", + "text": "" +}, + +{ + "location": "performance/#Performance-Tips-1", + "page": "Performance Tips", + "title": "Performance Tips", + "category": "section", + "text": "All the usual Julia performance tips apply. As always profiling your code is generally a useful way of finding bottlenecks. Below follow some Flux specific tips/reminders." +}, + +{ + "location": "performance/#Don\'t-use-more-precision-than-you-need.-1", + "page": "Performance Tips", + "title": "Don\'t use more precision than you need.", + "category": "section", + "text": "Flux works great with all kinds of number types. But often you do not need to be working with say Float64 (let alone BigFloat). Switching to Float32 can give you a significant speed up, not because the operations are faster, but because the memory usage is halved. Which means allocations occur much faster. And you use less memory." +}, + +{ + "location": "performance/#Make-sure-your-custom-activation-functions-preserve-the-type-of-their-inputs-1", + "page": "Performance Tips", + "title": "Make sure your custom activation functions preserve the type of their inputs", + "category": "section", + "text": "Not only should your activation functions be type-stable, they should also preserve the type of their inputs.A very artificial example using an activatioon function like my_tanh(x) = Float64(tanh(x))will result in performance on Float32 input orders of magnitude slower than the normal tanh would, because it results in having to use slow mixed type multiplication in the dense layers.Which means if you change your data say from Float64 to Float32 (which should give a speedup: see above), you will see a large slow-downThis can occur sneakily, because you can cause type-promotion by interacting with a numeric literals. E.g. the following will have run into the same problem as above: leaky_tanh(x) = 0.01x + tanh(x)While one could change your activation function (e.g. to use 0.01f0x) to avoid this when ever your inputs change, the idiomatic (and safe way) is to use oftype. 
leaky_tanh(x) = oftype(x/1, 0.01) + tanh(x)" +}, + +{ + "location": "performance/#Evaluate-batches-as-Matrices-of-features,-rather-than-sequences-of-Vector-features-1", + "page": "Performance Tips", + "title": "Evaluate batches as Matrices of features, rather than sequences of Vector features", + "category": "section", + "text": "While it can sometimes be tempting to process your observations (feature vectors) one at a time e.g.function loss_total(xs::AbstractVector{<:Vector}, ys::AbstractVector{<:Vector})\n sum(zip(xs, ys)) do (x, y_target)\n y_pred = model(x) # evaluate the model\n return loss(y_pred, y_target)\n end\nendIt is much faster to concatenate them into a matrix, as this will hit BLAS matrix-matrix multiplication, which is much faster than the equivalent sequence of matrix-vector multiplications. Even though this means allocating new memory to store them contiguously.x_batch = reduce(hcat, xs)\ny_batch = reduce(hcat, ys)\n...\nfunction loss_total(x_batch::Matrix, y_batch::Matrix)\n y_preds = model(x_batch)\n sum(loss.(y_preds, y_batch))\nendWhen doing this kind of concatenation use reduce(hcat, xs) rather than hcat(xs...). This will avoid the splatting penality, and will hit the optimised reduce method." +}, + { "location": "internals/tracker/#", "page": "Backpropagation", diff --git a/dev/training/optimisers/index.html b/dev/training/optimisers/index.html index 6f9d12fe..513db317 100644 --- a/dev/training/optimisers/index.html +++ b/dev/training/optimisers/index.html @@ -6,7 +6,7 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) ga('create', 'UA-36890222-9', 'auto'); ga('send', 'pageview'); -

      Optimisers

      Optimisers

      Consider a simple linear regression. We create some dummy data, calculate a loss, and backpropagate to calculate gradients for the parameters W and b.

      using Flux, Flux.Tracker
      +

      Optimisers

      Optimisers

      Consider a simple linear regression. We create some dummy data, calculate a loss, and backpropagate to calculate gradients for the parameters W and b.

      using Flux, Flux.Tracker
       
       W = param(rand(2, 5))
       b = param(rand(2))
      @@ -27,4 +27,4 @@ end

      Running this will alter the parameters W and

      An optimiser update! accepts a parameter and a gradient, and updates the parameter according to the chosen rule. We can also pass opt to our training loop, which will update all parameters of the model in a loop. However, we can now easily replace Descent with a more advanced optimiser such as ADAM.
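A hedged sketch of that pattern, assuming the update!(opt, parameter, gradient) form this paragraph describes; loss, x, y, W and b are the quantities defined earlier on this page (truncated by the diff), and the exact module that exports update! has shifted between Flux releases:

opt = Descent(0.1)   # plain gradient descent with learning rate 0.1

θ = Flux.params(W, b)
grads = Tracker.gradient(() -> loss(x, y), θ)

for p in θ
  Flux.Optimise.update!(opt, p, grads[p])   # adjusts p in place according to the rule
end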

      Optimiser Reference

      All optimisers return an object that, when passed to train!, will update the parameters passed to it.

      Descent(η)

      Classic gradient descent optimiser with learning rate η. For each parameter p and its gradient δp, this runs p -= η*δp.

      source
      Momentum(params, η = 0.01; ρ = 0.9)

      Gradient descent with learning rate η and momentum ρ.

      source
      Nesterov(eta, ρ = 0.9)

      Gradient descent with learning rate η and Nesterov momentum ρ.

      source
      ADAM(η = 0.001, β = (0.9, 0.999))

      ADAM optimiser.

      source
      +end

      An optimiser update! accepts a parameter and a gradient, and updates the parameter according to the chosen rule. We can also pass opt to our training loop, which will update all parameters of the model in a loop. However, we can now easily replace Descent with a more advanced optimiser such as ADAM.

      Optimiser Reference

      All optimisers return an object that, when passed to train!, will update the parameters passed to it.

      Descent(η)

      Classic gradient descent optimiser with learning rate η. For each parameter p and its gradient δp, this runs p -= η*δp.

      source
      Momentum(params, η = 0.01; ρ = 0.9)

      Gradient descent with learning rate η and momentum ρ.

      source
      Nesterov(eta, ρ = 0.9)

      Gradient descent with learning rate η and Nesterov momentum ρ.

      source
      ADAM(η = 0.001, β = (0.9, 0.999))

      ADAM optimiser.

      source
      diff --git a/dev/training/training/index.html b/dev/training/training/index.html index 726436b1..7946d5e2 100644 --- a/dev/training/training/index.html +++ b/dev/training/training/index.html @@ -6,7 +6,7 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) ga('create', 'UA-36890222-9', 'auto'); ga('send', 'pageview'); -

      Training

      Training

      To actually train a model we need three things:

      • An objective function that evaluates how well a model is doing given some input data.
      • A collection of data points that will be provided to the objective function.
      • An optimiser that will update the model parameters appropriately.

      With these we can call Flux.train!:

      Flux.train!(objective, params, data, opt)

      There are plenty of examples in the model zoo.

      Loss Functions

      The objective function must return a number representing how far the model is from its target – the loss of the model. The loss function that we defined in basics will work as an objective. We can also define an objective in terms of some model:

      m = Chain(
      +

      Training

      Training

      To actually train a model we need three things:

      • An objective function that evaluates how well a model is doing given some input data.
      • A collection of data points that will be provided to the objective function.
      • An optimiser that will update the model parameters appropriately.

      With these we can call Flux.train!:

      Flux.train!(objective, params, data, opt)

      There are plenty of examples in the model zoo.

      Loss Functions

      The objective function must return a number representing how far the model is from its target – the loss of the model. The loss function that we defined in basics will work as an objective. We can also define an objective in terms of some model:

      m = Chain(
         Dense(784, 32, σ),
         Dense(32, 10), softmax)
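A hedged sketch of wiring this model into train!; the data iterable and optimiser below are placeholders (any iterable of (x, y) pairs and any optimiser such as ADAM() would do):

loss(x, y) = Flux.crossentropy(m(x), y)

opt = ADAM()
data = [(rand(784), Flux.onehot(3, 0:9))]   # a single dummy (x, y) pair, for illustration only

Flux.train!(loss, Flux.params(m), data, opt)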