build based on ddab979
parent de31cb483a
commit 98d55f72c6
@@ -6,4 +6,4 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)

ga('create', 'UA-36890222-9', 'auto');
ga('send', 'pageview');
</script><link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/><link href="https://fonts.googleapis.com/css?family=Lato|Roboto+Mono" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/default.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL=".."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../assets/documenter.js"></script><script src="../siteinfo.js"></script><script src="../../versions.js"></script><link href="../assets/documenter.css" rel="stylesheet" type="text/css"/><link href="../assets/flux.css" rel="stylesheet" type="text/css"/></head><body><nav class="toc"><h1>Flux</h1><select id="version-selector" onChange="window.location.href=this.value" style="visibility: hidden"></select><form class="search" id="search-form" action="../search/"><input id="search-query" name="q" type="text" placeholder="Search docs"/></form><ul><li><a class="toctext" href="../">Home</a></li><li><span class="toctext">Building Models</span><ul><li><a class="toctext" href="../models/basics/">Basics</a></li><li><a class="toctext" href="../models/recurrence/">Recurrence</a></li><li><a class="toctext" href="../models/regularisation/">Regularisation</a></li><li><a class="toctext" href="../models/layers/">Model Reference</a></li></ul></li><li><span class="toctext">Training Models</span><ul><li><a class="toctext" href="../training/optimisers/">Optimisers</a></li><li><a class="toctext" href="../training/training/">Training</a></li></ul></li><li><a class="toctext" href="../data/onehot/">One-Hot Encoding</a></li><li><a class="toctext" href="../gpu/">GPU Support</a></li><li><a class="toctext" href="../saving/">Saving & Loading</a></li><li><a class="toctext" href="../performance/">Performance Tips</a></li><li class="current"><a class="toctext" href>Community</a><ul class="internal"></ul></li></ul></nav><article id="docs"><header><nav><ul><li><a href>Community</a></li></ul><a class="edit-page" href="https://github.com/FluxML/Flux.jl/blob/master/docs/src/community.md"><span class="fa"></span> Edit on GitHub</a></nav><hr/><div id="topbar"><span>Community</span><a class="fa fa-bars" href="#"></a></div></header><h1><a class="nav-anchor" id="Community-1" href="#Community-1">Community</a></h1><p>All Flux users are welcome to join our community on the <a href="https://discourse.julialang.org/">Julia forum</a>, or the <a href="https://discourse.julialang.org/t/announcing-a-julia-slack/4866">slack</a> (channel #machine-learning). If you have questions or issues we'll try to help you out.</p><p>If you're interested in hacking on Flux, the <a href="https://github.com/FluxML/Flux.jl">source code</a> is open and easy to understand – it's all just the same Julia code you work with normally. You might be interested in our <a href="https://github.com/FluxML/Flux.jl/issues?q=is%3Aopen+is%3Aissue+label%3A%22help+wanted%22">intro issues</a> to get started.</p><footer><hr/><a class="previous" href="../performance/"><span class="direction">Previous</span><span class="title">Performance Tips</span></a></footer></article></body></html>
</script><link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/><link href="https://fonts.googleapis.com/css?family=Lato|Roboto+Mono" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/default.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL=".."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../assets/documenter.js"></script><script src="../siteinfo.js"></script><script src="../../versions.js"></script><link href="../assets/documenter.css" rel="stylesheet" type="text/css"/><link href="../assets/flux.css" rel="stylesheet" type="text/css"/></head><body><nav class="toc"><h1>Flux</h1><select id="version-selector" onChange="window.location.href=this.value" style="visibility: hidden"></select><form class="search" id="search-form" action="../search/"><input id="search-query" name="q" type="text" placeholder="Search docs"/></form><ul><li><a class="toctext" href="../">Home</a></li><li><span class="toctext">Building Models</span><ul><li><a class="toctext" href="../models/basics/">Basics</a></li><li><a class="toctext" href="../models/recurrence/">Recurrence</a></li><li><a class="toctext" href="../models/regularisation/">Regularisation</a></li><li><a class="toctext" href="../models/layers/">Model Reference</a></li><li><a class="toctext" href="../models/nnlib/">NNlib</a></li></ul></li><li><span class="toctext">Handling Data</span><ul><li><a class="toctext" href="../data/onehot/">One-Hot Encoding</a></li><li><a class="toctext" href="../data/dataloader/">DataLoader</a></li></ul></li><li><span class="toctext">Training Models</span><ul><li><a class="toctext" href="../training/optimisers/">Optimisers</a></li><li><a class="toctext" href="../training/training/">Training</a></li></ul></li><li><a class="toctext" href="../gpu/">GPU Support</a></li><li><a class="toctext" href="../saving/">Saving & Loading</a></li><li><a class="toctext" href="../ecosystem/">The Julia Ecosystem</a></li><li><a class="toctext" href="../performance/">Performance Tips</a></li><li class="current"><a class="toctext" href>Community</a><ul class="internal"></ul></li></ul></nav><article id="docs"><header><nav><ul><li><a href>Community</a></li></ul><a class="edit-page" href="https://github.com/FluxML/Flux.jl/blob/master/docs/src/community.md"><span class="fa"></span> Edit on GitHub</a></nav><hr/><div id="topbar"><span>Community</span><a class="fa fa-bars" href="#"></a></div></header><h1><a class="nav-anchor" id="Community-1" href="#Community-1">Community</a></h1><p>All Flux users are welcome to join our community on the <a href="https://discourse.julialang.org/">Julia forum</a>, or the <a href="https://discourse.julialang.org/t/announcing-a-julia-slack/4866">slack</a> (channel #machine-learning). If you have questions or issues we'll try to help you out.</p><p>If you're interested in hacking on Flux, the <a href="https://github.com/FluxML/Flux.jl">source code</a> is open and easy to understand – it's all just the same Julia code you work with normally. 
You might be interested in our <a href="https://github.com/FluxML/Flux.jl/issues?q=is%3Aopen+is%3Aissue+label%3A%22help+wanted%22">intro issues</a> to get started.</p><footer><hr/><a class="previous" href="../performance/"><span class="direction">Previous</span><span class="title">Performance Tips</span></a></footer></article></body></html>
30
dev/data/dataloader/index.html
Normal file
@@ -0,0 +1,30 @@
<!DOCTYPE html>
<html lang="en"><head><meta charset="UTF-8"/><meta name="viewport" content="width=device-width, initial-scale=1.0"/><title>DataLoader · Flux</title><script>(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');

ga('create', 'UA-36890222-9', 'auto');
ga('send', 'pageview');
</script><link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/><link href="https://fonts.googleapis.com/css?family=Lato|Roboto+Mono" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/default.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL="../.."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../../assets/documenter.js"></script><script src="../../siteinfo.js"></script><script src="../../../versions.js"></script><link href="../../assets/documenter.css" rel="stylesheet" type="text/css"/><link href="../../assets/flux.css" rel="stylesheet" type="text/css"/></head><body><nav class="toc"><h1>Flux</h1><select id="version-selector" onChange="window.location.href=this.value" style="visibility: hidden"></select><form class="search" id="search-form" action="../../search/"><input id="search-query" name="q" type="text" placeholder="Search docs"/></form><ul><li><a class="toctext" href="../../">Home</a></li><li><span class="toctext">Building Models</span><ul><li><a class="toctext" href="../../models/basics/">Basics</a></li><li><a class="toctext" href="../../models/recurrence/">Recurrence</a></li><li><a class="toctext" href="../../models/regularisation/">Regularisation</a></li><li><a class="toctext" href="../../models/layers/">Model Reference</a></li><li><a class="toctext" href="../../models/nnlib/">NNlib</a></li></ul></li><li><span class="toctext">Handling Data</span><ul><li><a class="toctext" href="../onehot/">One-Hot Encoding</a></li><li class="current"><a class="toctext" href>DataLoader</a><ul class="internal"></ul></li></ul></li><li><span class="toctext">Training Models</span><ul><li><a class="toctext" href="../../training/optimisers/">Optimisers</a></li><li><a class="toctext" href="../../training/training/">Training</a></li></ul></li><li><a class="toctext" href="../../gpu/">GPU Support</a></li><li><a class="toctext" href="../../saving/">Saving & Loading</a></li><li><a class="toctext" href="../../ecosystem/">The Julia Ecosystem</a></li><li><a class="toctext" href="../../performance/">Performance Tips</a></li><li><a class="toctext" href="../../community/">Community</a></li></ul></nav><article id="docs"><header><nav><ul><li>Handling Data</li><li><a href>DataLoader</a></li></ul><a class="edit-page" href="https://github.com/FluxML/Flux.jl/blob/master/docs/src/data/dataloader.md"><span class="fa"></span> Edit on GitHub</a></nav><hr/><div id="topbar"><span>DataLoader</span><a class="fa fa-bars" href="#"></a></div></header><h1><a class="nav-anchor" id="DataLoader-1" href="#DataLoader-1">DataLoader</a></h1><p>Flux provides the <code>DataLoader</code> type in the <code>Flux.Data</code> module to handle iteration over mini-batches of data. </p><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Data.DataLoader" href="#Flux.Data.DataLoader"><code>Flux.Data.DataLoader</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">DataLoader(data...; batchsize=1, shuffle=false, partial=true)</code></pre><p>An object that iterates over mini-batches of <code>data</code>, each mini-batch containing <code>batchsize</code> observations (except possibly the last one). 
</p><p>Takes as input one or more data tensors, e.g. X in unsupervised learning, X and Y in supervised learning. The last dimension in each tensor is considered to be the observation dimension. </p><p>If <code>shuffle=true</code>, shuffles the observations each time iterations are re-started. If <code>partial=false</code>, drops the last mini-batch if it is smaller than the batchsize.</p><p>Example usage:</p><pre><code class="language-none">Xtrain = rand(10, 100)
dtrain = DataLoader(Xtrain, batchsize=2)
# iterate over 50 mini-batches
for x in dtrain
    @assert size(x) == (10, 2)
    ...
end

Xtrain = rand(10, 100)
Ytrain = rand(100)
dtrain = DataLoader(Xtrain, Ytrain, batchsize=2, shuffle=true)
for epoch in 1:100
    for (x, y) in dtrain
        @assert size(x) == (10, 2)
        @assert size(y) == (2,)
        ...
    end
end

# train for 10 epochs
using IterTools: ncycle
Flux.train!(loss, ps, ncycle(dtrain, 10), opt)</code></pre></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ddab979ea9d062acc9e8e700404fd51997581ada/src/data/dataloader.jl#L13-L50">source</a></section><footer><hr/><a class="previous" href="../onehot/"><span class="direction">Previous</span><span class="title">One-Hot Encoding</span></a><a class="next" href="../../training/optimisers/"><span class="direction">Next</span><span class="title">Optimisers</span></a></footer></article></body></html>
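<p>As a quick illustration of the <code>partial</code> keyword described in the docstring above (a minimal sketch, assuming only the documented <code>DataLoader</code> behaviour and the standard iterator interface; the array sizes are arbitrary):</p><pre><code class="language-julia">using Flux.Data: DataLoader

X = rand(10, 5)                     # 5 observations along the last dimension
loader = DataLoader(X, batchsize=2) # batches of 2; the last batch holds 1 observation
length(loader)                      # 3 mini-batches

loader = DataLoader(X, batchsize=2, partial=false)
length(loader)                      # 2 mini-batches; the smaller last batch is dropped</code></pre>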
@@ -6,7 +6,7 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)

ga('create', 'UA-36890222-9', 'auto');
ga('send', 'pageview');
</script><link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/><link href="https://fonts.googleapis.com/css?family=Lato|Roboto+Mono" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/default.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL="../.."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../../assets/documenter.js"></script><script src="../../siteinfo.js"></script><script src="../../../versions.js"></script><link href="../../assets/documenter.css" rel="stylesheet" type="text/css"/><link href="../../assets/flux.css" rel="stylesheet" type="text/css"/></head><body><nav class="toc"><h1>Flux</h1><select id="version-selector" onChange="window.location.href=this.value" style="visibility: hidden"></select><form class="search" id="search-form" action="../../search/"><input id="search-query" name="q" type="text" placeholder="Search docs"/></form><ul><li><a class="toctext" href="../../">Home</a></li><li><span class="toctext">Building Models</span><ul><li><a class="toctext" href="../../models/basics/">Basics</a></li><li><a class="toctext" href="../../models/recurrence/">Recurrence</a></li><li><a class="toctext" href="../../models/regularisation/">Regularisation</a></li><li><a class="toctext" href="../../models/layers/">Model Reference</a></li></ul></li><li><span class="toctext">Training Models</span><ul><li><a class="toctext" href="../../training/optimisers/">Optimisers</a></li><li><a class="toctext" href="../../training/training/">Training</a></li></ul></li><li class="current"><a class="toctext" href>One-Hot Encoding</a><ul class="internal"><li><a class="toctext" href="#Batches-1">Batches</a></li></ul></li><li><a class="toctext" href="../../gpu/">GPU Support</a></li><li><a class="toctext" href="../../saving/">Saving & Loading</a></li><li><a class="toctext" href="../../performance/">Performance Tips</a></li><li><a class="toctext" href="../../community/">Community</a></li></ul></nav><article id="docs"><header><nav><ul><li><a href>One-Hot Encoding</a></li></ul><a class="edit-page" href="https://github.com/FluxML/Flux.jl/blob/master/docs/src/data/onehot.md"><span class="fa"></span> Edit on GitHub</a></nav><hr/><div id="topbar"><span>One-Hot Encoding</span><a class="fa fa-bars" href="#"></a></div></header><h1><a class="nav-anchor" id="One-Hot-Encoding-1" href="#One-Hot-Encoding-1">One-Hot Encoding</a></h1><p>It's common to encode categorical variables (like <code>true</code>, <code>false</code> or <code>cat</code>, <code>dog</code>) in "one-of-k" or <a href="https://en.wikipedia.org/wiki/One-hot">"one-hot"</a> form. Flux provides the <code>onehot</code> function to make this easy.</p><pre><code class="language-none">julia> using Flux: onehot, onecold
</script><link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/><link href="https://fonts.googleapis.com/css?family=Lato|Roboto+Mono" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/default.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL="../.."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../../assets/documenter.js"></script><script src="../../siteinfo.js"></script><script src="../../../versions.js"></script><link href="../../assets/documenter.css" rel="stylesheet" type="text/css"/><link href="../../assets/flux.css" rel="stylesheet" type="text/css"/></head><body><nav class="toc"><h1>Flux</h1><select id="version-selector" onChange="window.location.href=this.value" style="visibility: hidden"></select><form class="search" id="search-form" action="../../search/"><input id="search-query" name="q" type="text" placeholder="Search docs"/></form><ul><li><a class="toctext" href="../../">Home</a></li><li><span class="toctext">Building Models</span><ul><li><a class="toctext" href="../../models/basics/">Basics</a></li><li><a class="toctext" href="../../models/recurrence/">Recurrence</a></li><li><a class="toctext" href="../../models/regularisation/">Regularisation</a></li><li><a class="toctext" href="../../models/layers/">Model Reference</a></li><li><a class="toctext" href="../../models/nnlib/">NNlib</a></li></ul></li><li><span class="toctext">Handling Data</span><ul><li class="current"><a class="toctext" href>One-Hot Encoding</a><ul class="internal"><li><a class="toctext" href="#Batches-1">Batches</a></li></ul></li><li><a class="toctext" href="../dataloader/">DataLoader</a></li></ul></li><li><span class="toctext">Training Models</span><ul><li><a class="toctext" href="../../training/optimisers/">Optimisers</a></li><li><a class="toctext" href="../../training/training/">Training</a></li></ul></li><li><a class="toctext" href="../../gpu/">GPU Support</a></li><li><a class="toctext" href="../../saving/">Saving & Loading</a></li><li><a class="toctext" href="../../ecosystem/">The Julia Ecosystem</a></li><li><a class="toctext" href="../../performance/">Performance Tips</a></li><li><a class="toctext" href="../../community/">Community</a></li></ul></nav><article id="docs"><header><nav><ul><li>Handling Data</li><li><a href>One-Hot Encoding</a></li></ul><a class="edit-page" href="https://github.com/FluxML/Flux.jl/blob/master/docs/src/data/onehot.md"><span class="fa"></span> Edit on GitHub</a></nav><hr/><div id="topbar"><span>One-Hot Encoding</span><a class="fa fa-bars" href="#"></a></div></header><h1><a class="nav-anchor" id="One-Hot-Encoding-1" href="#One-Hot-Encoding-1">One-Hot Encoding</a></h1><p>It's common to encode categorical variables (like <code>true</code>, <code>false</code> or <code>cat</code>, <code>dog</code>) in "one-of-k" or <a href="https://en.wikipedia.org/wiki/One-hot">"one-hot"</a> form. Flux provides the <code>onehot</code> function to make this easy.</p><pre><code class="language-none">julia> using Flux: onehot, onecold

julia> onehot(:b, [:a, :b, :c])
3-element Flux.OneHotVector:
@@ -37,4 +37,4 @@ julia> onecold(ans, [:a, :b, :c])
3-element Array{Symbol,1}:
:b
:a
:b</code></pre><p>Note that these operations returned <code>OneHotVector</code> and <code>OneHotMatrix</code> rather than <code>Array</code>s. <code>OneHotVector</code>s behave like normal vectors but avoid any unnecessary cost compared to using an integer index directly. For example, multiplying a matrix with a one-hot vector simply slices out the relevant row of the matrix under the hood.</p><footer><hr/><a class="previous" href="../../training/training/"><span class="direction">Previous</span><span class="title">Training</span></a><a class="next" href="../../gpu/"><span class="direction">Next</span><span class="title">GPU Support</span></a></footer></article></body></html>
:b</code></pre><p>Note that these operations returned <code>OneHotVector</code> and <code>OneHotMatrix</code> rather than <code>Array</code>s. <code>OneHotVector</code>s behave like normal vectors but avoid any unnecessary cost compared to using an integer index directly. For example, multiplying a matrix with a one-hot vector simply slices out the relevant row of the matrix under the hood.</p><footer><hr/><a class="previous" href="../../models/nnlib/"><span class="direction">Previous</span><span class="title">NNlib</span></a><a class="next" href="../dataloader/"><span class="direction">Next</span><span class="title">DataLoader</span></a></footer></article></body></html>
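<p>The <code>onecold</code> call above inverts <code>onehot</code>; the same pattern extends to whole batches through <code>onehotbatch</code>, which this page's Batches section covers (a minimal sketch of that API):</p><pre><code class="language-julia">using Flux: onehotbatch, onecold

ys = onehotbatch([:b, :a, :b], [:a, :b, :c]) # 3×3 OneHotMatrix, one column per label
onecold(ys, [:a, :b, :c])                    # recovers [:b, :a, :b]</code></pre>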
9
dev/ecosystem/index.html
Normal file
@@ -0,0 +1,9 @@
<!DOCTYPE html>
<html lang="en"><head><meta charset="UTF-8"/><meta name="viewport" content="width=device-width, initial-scale=1.0"/><title>The Julia Ecosystem · Flux</title><script>(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');

ga('create', 'UA-36890222-9', 'auto');
ga('send', 'pageview');
</script><link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/><link href="https://fonts.googleapis.com/css?family=Lato|Roboto+Mono" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/default.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL=".."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../assets/documenter.js"></script><script src="../siteinfo.js"></script><script src="../../versions.js"></script><link href="../assets/documenter.css" rel="stylesheet" type="text/css"/><link href="../assets/flux.css" rel="stylesheet" type="text/css"/></head><body><nav class="toc"><h1>Flux</h1><select id="version-selector" onChange="window.location.href=this.value" style="visibility: hidden"></select><form class="search" id="search-form" action="../search/"><input id="search-query" name="q" type="text" placeholder="Search docs"/></form><ul><li><a class="toctext" href="../">Home</a></li><li><span class="toctext">Building Models</span><ul><li><a class="toctext" href="../models/basics/">Basics</a></li><li><a class="toctext" href="../models/recurrence/">Recurrence</a></li><li><a class="toctext" href="../models/regularisation/">Regularisation</a></li><li><a class="toctext" href="../models/layers/">Model Reference</a></li><li><a class="toctext" href="../models/nnlib/">NNlib</a></li></ul></li><li><span class="toctext">Handling Data</span><ul><li><a class="toctext" href="../data/onehot/">One-Hot Encoding</a></li><li><a class="toctext" href="../data/dataloader/">DataLoader</a></li></ul></li><li><span class="toctext">Training Models</span><ul><li><a class="toctext" href="../training/optimisers/">Optimisers</a></li><li><a class="toctext" href="../training/training/">Training</a></li></ul></li><li><a class="toctext" href="../gpu/">GPU Support</a></li><li><a class="toctext" href="../saving/">Saving & Loading</a></li><li class="current"><a class="toctext" href>The Julia Ecosystem</a><ul class="internal"></ul></li><li><a class="toctext" href="../performance/">Performance Tips</a></li><li><a class="toctext" href="../community/">Community</a></li></ul></nav><article id="docs"><header><nav><ul><li><a href>The Julia Ecosystem</a></li></ul><a class="edit-page" href="https://github.com/FluxML/Flux.jl/blob/master/docs/src/ecosystem.md"><span class="fa"></span> Edit on GitHub</a></nav><hr/><div id="topbar"><span>The Julia Ecosystem</span><a class="fa fa-bars" href="#"></a></div></header><h1><a class="nav-anchor" id="The-Julia-Ecosystem-1" href="#The-Julia-Ecosystem-1">The Julia Ecosystem</a></h1><p>One of the main strengths of Julia lies in an ecosystem of packages globally providing a rich and consistent user experience.</p><p>This is a non-exhaustive list of Julia packages, nicely complementing <code>Flux</code> in typical machine learning and deep learning workflows:</p><ul><li><a href="https://github.com/carlobaldassi/ArgParse.jl">ArgParse.jl</a>: package for parsing command-line arguments to Julia programs.</li><li><a href="https://github.com/Evizero/Augmentor.jl">Augmentor.jl</a>: a fast image augmentation library in Julia for machine learning.</li><li><a href="https://github.com/JuliaIO/BSON.jl">BSON.jl</a>: package for working with the Binary JSON serialisation format</li><li><a 
href="https://github.com/joshday/OnlineStats.jl">DataFrames.jl</a>: in-memory tabular data in Julia</li><li><a href="https://github.com/JuliaDynamics/DrWatson.jl">DrWatson.jl</a>: a scientific project assistant software</li><li><a href="https://github.com/JuliaML/MLDatasets.jl">MLDatasets.jl</a>: utility package for accessing common machine learning datasets</li><li><a href="https://github.com/joshday/OnlineStats.jl">OnlineStats.jl</a>: single-pass algorithms for statistics</li><li><a href="https://github.com/mauro3/Parameters.jl">Parameters.jl</a>: types with default field values, keyword constructors and (un-)pack macros</li><li><a href="https://github.com/timholy/ProgressMeter.jl">ProgressMeters.jl</a>: progress meters for long-running computations</li><li><a href="https://github.com/PhilipVinc/TensorBoardLogger.jl">TensorBoardLogger.jl</a>: easy peasy logging to <a href="https://www.tensorflow.org/tensorboard">tensorboard</a> in Julia</li></ul><p>This tight integration among Julia pakages is shown in some of the examples in the <a href="https://github.com/FluxML/model-zoo">model-zoo</a> repository.</p><footer><hr/><a class="previous" href="../saving/"><span class="direction">Previous</span><span class="title">Saving & Loading</span></a><a class="next" href="../performance/"><span class="direction">Next</span><span class="title">Performance Tips</span></a></footer></article></body></html>
@@ -6,7 +6,7 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)

ga('create', 'UA-36890222-9', 'auto');
ga('send', 'pageview');
</script><link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/><link href="https://fonts.googleapis.com/css?family=Lato|Roboto+Mono" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/default.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL=".."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../assets/documenter.js"></script><script src="../siteinfo.js"></script><script src="../../versions.js"></script><link href="../assets/documenter.css" rel="stylesheet" type="text/css"/><link href="../assets/flux.css" rel="stylesheet" type="text/css"/></head><body><nav class="toc"><h1>Flux</h1><select id="version-selector" onChange="window.location.href=this.value" style="visibility: hidden"></select><form class="search" id="search-form" action="../search/"><input id="search-query" name="q" type="text" placeholder="Search docs"/></form><ul><li><a class="toctext" href="../">Home</a></li><li><span class="toctext">Building Models</span><ul><li><a class="toctext" href="../models/basics/">Basics</a></li><li><a class="toctext" href="../models/recurrence/">Recurrence</a></li><li><a class="toctext" href="../models/regularisation/">Regularisation</a></li><li><a class="toctext" href="../models/layers/">Model Reference</a></li></ul></li><li><span class="toctext">Training Models</span><ul><li><a class="toctext" href="../training/optimisers/">Optimisers</a></li><li><a class="toctext" href="../training/training/">Training</a></li></ul></li><li><a class="toctext" href="../data/onehot/">One-Hot Encoding</a></li><li class="current"><a class="toctext" href>GPU Support</a><ul class="internal"><li><a class="toctext" href="#GPU-Usage-1">GPU Usage</a></li></ul></li><li><a class="toctext" href="../saving/">Saving & Loading</a></li><li><a class="toctext" href="../performance/">Performance Tips</a></li><li><a class="toctext" href="../community/">Community</a></li></ul></nav><article id="docs"><header><nav><ul><li><a href>GPU Support</a></li></ul><a class="edit-page" href="https://github.com/FluxML/Flux.jl/blob/master/docs/src/gpu.md"><span class="fa"></span> Edit on GitHub</a></nav><hr/><div id="topbar"><span>GPU Support</span><a class="fa fa-bars" href="#"></a></div></header><h1><a class="nav-anchor" id="GPU-Support-1" href="#GPU-Support-1">GPU Support</a></h1><p>NVIDIA GPU support should work out of the box on systems with CUDA and CUDNN installed. For more details see the <a href="https://github.com/JuliaGPU/CuArrays.jl">CuArrays</a> readme.</p><h2><a class="nav-anchor" id="GPU-Usage-1" href="#GPU-Usage-1">GPU Usage</a></h2><p>Support for array operations on other hardware backends, like GPUs, is provided by external packages like <a href="https://github.com/JuliaGPU/CuArrays.jl">CuArrays</a>. Flux is agnostic to array types, so we simply need to move model weights and data to the GPU and Flux will handle it.</p><p>For example, we can use <code>CuArrays</code> (with the <code>cu</code> converter) to run our <a href="../models/basics/">basic example</a> on an NVIDIA GPU.</p><p>(Note that you need to have CUDA available to use CuArrays – please see the <a href="https://github.com/JuliaGPU/CuArrays.jl">CuArrays.jl</a> instructions for more details.)</p><pre><code class="language-julia">using CuArrays
</script><link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/><link href="https://fonts.googleapis.com/css?family=Lato|Roboto+Mono" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/default.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL=".."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../assets/documenter.js"></script><script src="../siteinfo.js"></script><script src="../../versions.js"></script><link href="../assets/documenter.css" rel="stylesheet" type="text/css"/><link href="../assets/flux.css" rel="stylesheet" type="text/css"/></head><body><nav class="toc"><h1>Flux</h1><select id="version-selector" onChange="window.location.href=this.value" style="visibility: hidden"></select><form class="search" id="search-form" action="../search/"><input id="search-query" name="q" type="text" placeholder="Search docs"/></form><ul><li><a class="toctext" href="../">Home</a></li><li><span class="toctext">Building Models</span><ul><li><a class="toctext" href="../models/basics/">Basics</a></li><li><a class="toctext" href="../models/recurrence/">Recurrence</a></li><li><a class="toctext" href="../models/regularisation/">Regularisation</a></li><li><a class="toctext" href="../models/layers/">Model Reference</a></li><li><a class="toctext" href="../models/nnlib/">NNlib</a></li></ul></li><li><span class="toctext">Handling Data</span><ul><li><a class="toctext" href="../data/onehot/">One-Hot Encoding</a></li><li><a class="toctext" href="../data/dataloader/">DataLoader</a></li></ul></li><li><span class="toctext">Training Models</span><ul><li><a class="toctext" href="../training/optimisers/">Optimisers</a></li><li><a class="toctext" href="../training/training/">Training</a></li></ul></li><li class="current"><a class="toctext" href>GPU Support</a><ul class="internal"><li><a class="toctext" href="#GPU-Usage-1">GPU Usage</a></li></ul></li><li><a class="toctext" href="../saving/">Saving & Loading</a></li><li><a class="toctext" href="../ecosystem/">The Julia Ecosystem</a></li><li><a class="toctext" href="../performance/">Performance Tips</a></li><li><a class="toctext" href="../community/">Community</a></li></ul></nav><article id="docs"><header><nav><ul><li><a href>GPU Support</a></li></ul><a class="edit-page" href="https://github.com/FluxML/Flux.jl/blob/master/docs/src/gpu.md"><span class="fa"></span> Edit on GitHub</a></nav><hr/><div id="topbar"><span>GPU Support</span><a class="fa fa-bars" href="#"></a></div></header><h1><a class="nav-anchor" id="GPU-Support-1" href="#GPU-Support-1">GPU Support</a></h1><p>NVIDIA GPU support should work out of the box on systems with CUDA and CUDNN installed. For more details see the <a href="https://github.com/JuliaGPU/CuArrays.jl">CuArrays</a> readme.</p><h2><a class="nav-anchor" id="GPU-Usage-1" href="#GPU-Usage-1">GPU Usage</a></h2><p>Support for array operations on other hardware backends, like GPUs, is provided by external packages like <a href="https://github.com/JuliaGPU/CuArrays.jl">CuArrays</a>. 
Flux is agnostic to array types, so we simply need to move model weights and data to the GPU and Flux will handle it.</p><p>For example, we can use <code>CuArrays</code> (with the <code>cu</code> converter) to run our <a href="../models/basics/">basic example</a> on an NVIDIA GPU.</p><p>(Note that you need to have CUDA available to use CuArrays – please see the <a href="https://github.com/JuliaGPU/CuArrays.jl">CuArrays.jl</a> instructions for more details.)</p><pre><code class="language-julia">using CuArrays

W = cu(rand(2, 5)) # a 2×5 CuArray
b = cu(rand(2))
@@ -17,7 +17,7 @@ loss(x, y) = sum((predict(x) .- y).^2)
x, y = cu(rand(5)), cu(rand(2)) # Dummy data
loss(x, y) # ~ 3</code></pre><p>Note that we convert both the parameters (<code>W</code>, <code>b</code>) and the data set (<code>x</code>, <code>y</code>) to cuda arrays. Taking derivatives and training works exactly as before.</p><p>If you define a structured model, like a <code>Dense</code> layer or <code>Chain</code>, you just need to convert the internal parameters. Flux provides <code>fmap</code>, which allows you to alter all parameters of a model at once.</p><pre><code class="language-julia">d = Dense(10, 5, σ)
d = fmap(cu, d)
d.W # Tracked CuArray
d.W # CuArray
d(cu(rand(10))) # CuArray output

m = Chain(Dense(10, 5, σ), Dense(5, 2), softmax)
@@ -34,7 +34,7 @@ julia> x = rand(10) |> gpu
0.511655

julia> m(x)
Tracked 5-element CuArray{Float32,1}:
5-element CuArray{Float32,1}:
-0.30535
⋮
-0.618002</code></pre><p>The analogue <code>cpu</code> is also available for moving models and data back off of the GPU.</p><pre><code class="language-julia">julia> x = rand(10) |> gpu
@@ -47,4 +47,4 @@ julia> x |> cpu
10-element Array{Float32,1}:
0.235164
⋮
0.192538</code></pre><footer><hr/><a class="previous" href="../data/onehot/"><span class="direction">Previous</span><span class="title">One-Hot Encoding</span></a><a class="next" href="../saving/"><span class="direction">Next</span><span class="title">Saving & Loading</span></a></footer></article></body></html>
0.192538</code></pre><footer><hr/><a class="previous" href="../training/training/"><span class="direction">Previous</span><span class="title">Training</span></a><a class="next" href="../saving/"><span class="direction">Next</span><span class="title">Saving & Loading</span></a></footer></article></body></html>
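<p>Putting this page's pieces together, a minimal sketch of moving a whole model and its data with the <code>gpu</code> and <code>cpu</code> helpers (assuming CUDA and CuArrays are set up as noted above):</p><pre><code class="language-julia">using Flux, CuArrays

m = Chain(Dense(10, 5, σ), Dense(5, 2), softmax) |> gpu # move all parameters to the GPU
x = rand(Float32, 10) |> gpu                            # move the input as well
y = m(x)                                                # runs on the GPU, returns a CuArray
y |> cpu                                                # bring the result back as an Array</code></pre>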
File diff suppressed because one or more lines are too long
@@ -6,7 +6,7 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)

ga('create', 'UA-36890222-9', 'auto');
ga('send', 'pageview');
</script><link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/><link href="https://fonts.googleapis.com/css?family=Lato|Roboto+Mono" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/default.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL="../.."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../../assets/documenter.js"></script><script src="../../siteinfo.js"></script><script src="../../../versions.js"></script><link href="../../assets/documenter.css" rel="stylesheet" type="text/css"/><link href="../../assets/flux.css" rel="stylesheet" type="text/css"/></head><body><nav class="toc"><h1>Flux</h1><select id="version-selector" onChange="window.location.href=this.value" style="visibility: hidden"></select><form class="search" id="search-form" action="../../search/"><input id="search-query" name="q" type="text" placeholder="Search docs"/></form><ul><li><a class="toctext" href="../../">Home</a></li><li><span class="toctext">Building Models</span><ul><li class="current"><a class="toctext" href>Basics</a><ul class="internal"><li><a class="toctext" href="#Taking-Gradients-1">Taking Gradients</a></li><li><a class="toctext" href="#Simple-Models-1">Simple Models</a></li><li><a class="toctext" href="#Building-Layers-1">Building Layers</a></li><li><a class="toctext" href="#Stacking-It-Up-1">Stacking It Up</a></li><li><a class="toctext" href="#Layer-helpers-1">Layer helpers</a></li></ul></li><li><a class="toctext" href="../recurrence/">Recurrence</a></li><li><a class="toctext" href="../regularisation/">Regularisation</a></li><li><a class="toctext" href="../layers/">Model Reference</a></li></ul></li><li><span class="toctext">Training Models</span><ul><li><a class="toctext" href="../../training/optimisers/">Optimisers</a></li><li><a class="toctext" href="../../training/training/">Training</a></li></ul></li><li><a class="toctext" href="../../data/onehot/">One-Hot Encoding</a></li><li><a class="toctext" href="../../gpu/">GPU Support</a></li><li><a class="toctext" href="../../saving/">Saving & Loading</a></li><li><a class="toctext" href="../../performance/">Performance Tips</a></li><li><a class="toctext" href="../../community/">Community</a></li></ul></nav><article id="docs"><header><nav><ul><li>Building Models</li><li><a href>Basics</a></li></ul><a class="edit-page" href="https://github.com/FluxML/Flux.jl/blob/master/docs/src/models/basics.md"><span class="fa"></span> Edit on GitHub</a></nav><hr/><div id="topbar"><span>Basics</span><a class="fa fa-bars" href="#"></a></div></header><h1><a class="nav-anchor" id="Model-Building-Basics-1" href="#Model-Building-Basics-1">Model-Building Basics</a></h1><h2><a class="nav-anchor" id="Taking-Gradients-1" href="#Taking-Gradients-1">Taking Gradients</a></h2><p>Flux's core feature is taking gradients of Julia code. The <code>gradient</code> function takes another Julia function <code>f</code> and a set of arguments, and returns the gradient with respect to each argument. (It's a good idea to try pasting these examples in the Julia terminal.)</p><pre><code class="language-julia-repl">julia> using Flux
</script><link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/><link href="https://fonts.googleapis.com/css?family=Lato|Roboto+Mono" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/default.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL="../.."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../../assets/documenter.js"></script><script src="../../siteinfo.js"></script><script src="../../../versions.js"></script><link href="../../assets/documenter.css" rel="stylesheet" type="text/css"/><link href="../../assets/flux.css" rel="stylesheet" type="text/css"/></head><body><nav class="toc"><h1>Flux</h1><select id="version-selector" onChange="window.location.href=this.value" style="visibility: hidden"></select><form class="search" id="search-form" action="../../search/"><input id="search-query" name="q" type="text" placeholder="Search docs"/></form><ul><li><a class="toctext" href="../../">Home</a></li><li><span class="toctext">Building Models</span><ul><li class="current"><a class="toctext" href>Basics</a><ul class="internal"><li><a class="toctext" href="#Taking-Gradients-1">Taking Gradients</a></li><li><a class="toctext" href="#Simple-Models-1">Simple Models</a></li><li><a class="toctext" href="#Building-Layers-1">Building Layers</a></li><li><a class="toctext" href="#Stacking-It-Up-1">Stacking It Up</a></li><li><a class="toctext" href="#Layer-helpers-1">Layer helpers</a></li><li><a class="toctext" href="#Utility-functions-1">Utility functions</a></li></ul></li><li><a class="toctext" href="../recurrence/">Recurrence</a></li><li><a class="toctext" href="../regularisation/">Regularisation</a></li><li><a class="toctext" href="../layers/">Model Reference</a></li><li><a class="toctext" href="../nnlib/">NNlib</a></li></ul></li><li><span class="toctext">Handling Data</span><ul><li><a class="toctext" href="../../data/onehot/">One-Hot Encoding</a></li><li><a class="toctext" href="../../data/dataloader/">DataLoader</a></li></ul></li><li><span class="toctext">Training Models</span><ul><li><a class="toctext" href="../../training/optimisers/">Optimisers</a></li><li><a class="toctext" href="../../training/training/">Training</a></li></ul></li><li><a class="toctext" href="../../gpu/">GPU Support</a></li><li><a class="toctext" href="../../saving/">Saving & Loading</a></li><li><a class="toctext" href="../../ecosystem/">The Julia Ecosystem</a></li><li><a class="toctext" href="../../performance/">Performance Tips</a></li><li><a class="toctext" href="../../community/">Community</a></li></ul></nav><article id="docs"><header><nav><ul><li>Building Models</li><li><a href>Basics</a></li></ul><a class="edit-page" href="https://github.com/FluxML/Flux.jl/blob/master/docs/src/models/basics.md"><span class="fa"></span> Edit on GitHub</a></nav><hr/><div id="topbar"><span>Basics</span><a class="fa fa-bars" href="#"></a></div></header><h1><a class="nav-anchor" id="Model-Building-Basics-1" href="#Model-Building-Basics-1">Model-Building Basics</a></h1><h2><a class="nav-anchor" id="Taking-Gradients-1" href="#Taking-Gradients-1">Taking Gradients</a></h2><p>Flux's core feature is taking gradients of Julia code. 
The <code>gradient</code> function takes another Julia function <code>f</code> and a set of arguments, and returns the gradient with respect to each argument. (It's a good idea to try pasting these examples in the Julia terminal.)</p><pre><code class="language-julia-repl">julia> using Flux

julia> f(x) = 3x^2 + 2x + 1;

@@ -110,4 +110,4 @@ model2(rand(10)) # => 2-element vector</code></pre><p>This quickly starts to

m(rand(10))</code></pre><p>Likewise, <code>Chain</code> will happily work with any Julia function.</p><pre><code class="language-julia">m = Chain(x -> x^2, x -> x+1)

m(5) # => 26</code></pre><h2><a class="nav-anchor" id="Layer-helpers-1" href="#Layer-helpers-1">Layer helpers</a></h2><p>Flux provides a set of helpers for custom layers, which you can enable by calling</p><pre><code class="language-julia">Flux.@functor Affine</code></pre><p>This enables a useful extra set of functionality for our <code>Affine</code> layer, such as <a href="../../training/optimisers/">collecting its parameters</a> or <a href="../../gpu/">moving it to the GPU</a>.</p><footer><hr/><a class="previous" href="../../"><span class="direction">Previous</span><span class="title">Home</span></a><a class="next" href="../recurrence/"><span class="direction">Next</span><span class="title">Recurrence</span></a></footer></article></body></html>
m(5) # => 26</code></pre><h2><a class="nav-anchor" id="Layer-helpers-1" href="#Layer-helpers-1">Layer helpers</a></h2><p>Flux provides a set of helpers for custom layers, which you can enable by calling</p><pre><code class="language-julia">Flux.@functor Affine</code></pre><p>This enables a useful extra set of functionality for our <code>Affine</code> layer, such as <a href="../../training/optimisers/">collecting its parameters</a> or <a href="../../gpu/">moving it to the GPU</a>.</p><h2><a class="nav-anchor" id="Utility-functions-1" href="#Utility-functions-1">Utility functions</a></h2><p>Flux provides some utility functions to help you generate models in an automated fashion.</p><p><code>outdims</code> enables you to calculate the spatial output dimensions of layers like <code>Conv</code> when applied to input images of a given size. Currently limited to the following layers:</p><ul><li><code>Chain</code></li><li><code>Dense</code></li><li><code>Conv</code></li><li><code>Diagonal</code></li><li><code>Maxout</code></li><li><code>ConvTranspose</code></li><li><code>DepthwiseConv</code></li><li><code>CrossCor</code></li><li><code>MaxPool</code></li><li><code>MeanPool</code></li></ul><div class="admonition warning"><div class="admonition-title">Missing docstring.</div><div class="admonition-text"><p>Missing docstring for <code>outdims</code>. Check Documenter's build log for details.</p></div></div><footer><hr/><a class="previous" href="../../"><span class="direction">Previous</span><span class="title">Home</span></a><a class="next" href="../recurrence/"><span class="direction">Next</span><span class="title">Recurrence</span></a></footer></article></body></html>
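<p>Since the <code>outdims</code> docstring failed to render above, here is a minimal sketch of the kind of call it supports, assuming the <code>outdims(layer, inputsize)</code> form for the layers listed (check your Flux version):</p><pre><code class="language-julia">using Flux

outdims(Dense(10, 5), (10,))                      # (5,)
m = Chain(Conv((3, 3), 3 => 16), MaxPool((2, 2)))
outdims(m, (32, 32))                              # (15, 15): 3×3 conv, then 2×2 pooling</code></pre>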
File diff suppressed because one or more lines are too long
23
dev/models/nnlib/index.html
Normal file
@@ -0,0 +1,23 @@
<!DOCTYPE html>
<html lang="en"><head><meta charset="UTF-8"/><meta name="viewport" content="width=device-width, initial-scale=1.0"/><title>NNlib · Flux</title><script>(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');

ga('create', 'UA-36890222-9', 'auto');
ga('send', 'pageview');
</script><link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/><link href="https://fonts.googleapis.com/css?family=Lato|Roboto+Mono" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/default.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL="../.."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../../assets/documenter.js"></script><script src="../../siteinfo.js"></script><script src="../../../versions.js"></script><link href="../../assets/documenter.css" rel="stylesheet" type="text/css"/><link href="../../assets/flux.css" rel="stylesheet" type="text/css"/></head><body><nav class="toc"><h1>Flux</h1><select id="version-selector" onChange="window.location.href=this.value" style="visibility: hidden"></select><form class="search" id="search-form" action="../../search/"><input id="search-query" name="q" type="text" placeholder="Search docs"/></form><ul><li><a class="toctext" href="../../">Home</a></li><li><span class="toctext">Building Models</span><ul><li><a class="toctext" href="../basics/">Basics</a></li><li><a class="toctext" href="../recurrence/">Recurrence</a></li><li><a class="toctext" href="../regularisation/">Regularisation</a></li><li><a class="toctext" href="../layers/">Model Reference</a></li><li class="current"><a class="toctext" href>NNlib</a><ul class="internal"><li><a class="toctext" href="#Activation-Functions-1">Activation Functions</a></li><li><a class="toctext" href="#Softmax-1">Softmax</a></li><li><a class="toctext" href="#Pooling-1">Pooling</a></li><li><a class="toctext" href="#Convolution-1">Convolution</a></li></ul></li></ul></li><li><span class="toctext">Handling Data</span><ul><li><a class="toctext" href="../../data/onehot/">One-Hot Encoding</a></li><li><a class="toctext" href="../../data/dataloader/">DataLoader</a></li></ul></li><li><span class="toctext">Training Models</span><ul><li><a class="toctext" href="../../training/optimisers/">Optimisers</a></li><li><a class="toctext" href="../../training/training/">Training</a></li></ul></li><li><a class="toctext" href="../../gpu/">GPU Support</a></li><li><a class="toctext" href="../../saving/">Saving & Loading</a></li><li><a class="toctext" href="../../ecosystem/">The Julia Ecosystem</a></li><li><a class="toctext" href="../../performance/">Performance Tips</a></li><li><a class="toctext" href="../../community/">Community</a></li></ul></nav><article id="docs"><header><nav><ul><li>Building Models</li><li><a href>NNlib</a></li></ul><a class="edit-page" href="https://github.com/FluxML/Flux.jl/blob/master/docs/src/models/nnlib.md"><span class="fa"></span> Edit on GitHub</a></nav><hr/><div id="topbar"><span>NNlib</span><a class="fa fa-bars" href="#"></a></div></header><h1><a class="nav-anchor" id="NNlib-1" href="#NNlib-1">NNlib</a></h1><p>Flux re-exports all of the functions exported by the <a href="https://github.com/FluxML/NNlib.jl">NNlib</a> package.</p><h2><a class="nav-anchor" id="Activation-Functions-1" href="#Activation-Functions-1">Activation Functions</a></h2><p>Non-linearities that go between layers of your model. Note that, unless otherwise stated, activation functions operate on scalars. 
To apply them to an array you can call <code>σ.(xs)</code>, <code>relu.(xs)</code> and so on.</p><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.elu" href="#NNlib.elu"><code>NNlib.elu</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-julia">elu(x, α = 1) =
x > 0 ? x : α * (exp(x) - 1)</code></pre><p>Exponential Linear Unit activation function. See <a href="https://arxiv.org/abs/1511.07289">Fast and Accurate Deep Network Learning by Exponential Linear Units</a>. You can also specify the coefficient explicitly, e.g. <code>elu(x, 1)</code>.</p></div></div></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.gelu" href="#NNlib.gelu"><code>NNlib.gelu</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-julia">gelu(x) = 0.5x*(1 + tanh(√(2/π)*(x + 0.044715x^3)))</code></pre><p><a href="https://arxiv.org/pdf/1606.08415.pdf">Gaussian Error Linear Unit</a> activation function.</p></div></div></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.leakyrelu" href="#NNlib.leakyrelu"><code>NNlib.leakyrelu</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-julia">leakyrelu(x) = max(0.01x, x)</code></pre><p>Leaky <a href="https://en.wikipedia.org/wiki/Rectifier_(neural_networks)">Rectified Linear Unit</a> activation function. You can also specify the coefficient explicitly, e.g. <code>leakyrelu(x, 0.01)</code>.</p></div></div></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.logcosh" href="#NNlib.logcosh"><code>NNlib.logcosh</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-julia">logcosh(x)</code></pre><p>Return <code>log(cosh(x))</code> which is computed in a numerically stable way.</p></div></div></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.logsigmoid" href="#NNlib.logsigmoid"><code>NNlib.logsigmoid</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-julia">logσ(x)</code></pre><p>Return <code>log(σ(x))</code> which is computed in a numerically stable way.</p><pre><code class="language-none">julia> logσ(0)
-0.6931471805599453
julia> logσ.([-100, -10, 100])
3-element Array{Float64,1}:
-100.0
-10.000045398899218
-3.720075976020836e-44</code></pre></div></div></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.sigmoid" href="#NNlib.sigmoid"><code>NNlib.sigmoid</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-julia">σ(x) = 1 / (1 + exp(-x))</code></pre><p>Classic <a href="https://en.wikipedia.org/wiki/Sigmoid_function">sigmoid</a> activation function.</p></div></div></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.relu" href="#NNlib.relu"><code>NNlib.relu</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-julia">relu(x) = max(0, x)</code></pre><p><a href="https://en.wikipedia.org/wiki/Rectifier_(neural_networks)">Rectified Linear Unit</a> activation function.</p></div></div></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.selu" href="#NNlib.selu"><code>NNlib.selu</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-julia">selu(x) = λ * (x ≥ 0 ? x : α * (exp(x) - 1))

λ ≈ 1.0507
α ≈ 1.6733</code></pre><p>Scaled exponential linear units. See <a href="https://arxiv.org/pdf/1706.02515.pdf">Self-Normalizing Neural Networks</a>.</p></div></div></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.softplus" href="#NNlib.softplus"><code>NNlib.softplus</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-julia">softplus(x) = log(exp(x) + 1)</code></pre><p>See <a href="http://proceedings.mlr.press/v15/glorot11a/glorot11a.pdf">Deep Sparse Rectifier Neural Networks</a>.</p></div></div></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.softsign" href="#NNlib.softsign"><code>NNlib.softsign</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-julia">softsign(x) = x / (1 + |x|)</code></pre><p>See <a href="http://www.iro.umontreal.ca/~lisa/publications2/index.php/attachments/single/205">Quadratic Polynomials Learn Better Image Features</a>.</p></div></div></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.swish" href="#NNlib.swish"><code>NNlib.swish</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-julia">swish(x) = x * σ(x)</code></pre><p>Self-gated activation function. See <a href="https://arxiv.org/pdf/1710.05941.pdf">Swish: a Self-Gated Activation Function</a>.</p></div></div></section><h2><a class="nav-anchor" id="Softmax-1" href="#Softmax-1">Softmax</a></h2><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.softmax" href="#NNlib.softmax"><code>NNlib.softmax</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-julia">softmax(xs) = exp.(xs) ./ sum(exp.(xs))</code></pre><p><a href="https://en.wikipedia.org/wiki/Softmax_function">Softmax</a> takes log-probabilities (any real vector) and returns a probability distribution that sums to 1.</p><p>If given a matrix it will by default (<code>dims=1</code>) treat it as a batch of vectors, with each column independent. Keyword <code>dims=2</code> will instead treat rows independently, etc.</p><pre><code class="language-none">julia> softmax([1,2,3.])
3-element Array{Float64,1}:
0.0900306
0.244728
0.665241</code></pre></div></div></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="NNlib.logsoftmax" href="#NNlib.logsoftmax"><code>NNlib.logsoftmax</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-julia">logsoftmax(xs) = log.(exp.(xs) ./ sum(exp.(xs)))</code></pre><p>Computes the log of softmax in a more numerically stable way than directly taking <code>log.(softmax(xs))</code>. Commonly used in computing cross entropy loss.</p></div></div></section><h2><a class="nav-anchor" id="Pooling-1" href="#Pooling-1">Pooling</a></h2><div class="admonition warning"><div class="admonition-title">Missing docstring.</div><div class="admonition-text"><p>Missing docstring for <code>NNlib.maxpool</code>. Check Documenter's build log for details.</p></div></div><div class="admonition warning"><div class="admonition-title">Missing docstring.</div><div class="admonition-text"><p>Missing docstring for <code>NNlib.meanpool</code>. Check Documenter's build log for details.</p></div></div><h2><a class="nav-anchor" id="Convolution-1" href="#Convolution-1">Convolution</a></h2><div class="admonition warning"><div class="admonition-title">Missing docstring.</div><div class="admonition-text"><p>Missing docstring for <code>NNlib.conv</code>. Check Documenter's build log for details.</p></div></div><div class="admonition warning"><div class="admonition-title">Missing docstring.</div><div class="admonition-text"><p>Missing docstring for <code>NNlib.depthwiseconv</code>. Check Documenter's build log for details.</p></div></div><footer><hr/><a class="previous" href="../layers/"><span class="direction">Previous</span><span class="title">Model Reference</span></a><a class="next" href="../../data/onehot/"><span class="direction">Next</span><span class="title">One-Hot Encoding</span></a></footer></article></body></html>
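The <code>dims</code> behaviour is easiest to see on a matrix. A minimal sketch (values rounded the way the REPL prints them):

julia> softmax([1 2; 3 4.])  # default dims=1: each column sums to 1
2×2 Array{Float64,2}:
 0.119203  0.119203
 0.880797  0.880797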
@ -6,7 +6,7 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
ga('create', 'UA-36890222-9', 'auto');
ga('send', 'pageview');
</script><link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/><link href="https://fonts.googleapis.com/css?family=Lato|Roboto+Mono" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/default.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL="../.."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../../assets/documenter.js"></script><script src="../../siteinfo.js"></script><script src="../../../versions.js"></script><link href="../../assets/documenter.css" rel="stylesheet" type="text/css"/><link href="../../assets/flux.css" rel="stylesheet" type="text/css"/></head><body><nav class="toc"><h1>Flux</h1><select id="version-selector" onChange="window.location.href=this.value" style="visibility: hidden"></select><form class="search" id="search-form" action="../../search/"><input id="search-query" name="q" type="text" placeholder="Search docs"/></form><ul><li><a class="toctext" href="../../">Home</a></li><li><span class="toctext">Building Models</span><ul><li><a class="toctext" href="../basics/">Basics</a></li><li class="current"><a class="toctext" href>Recurrence</a><ul class="internal"><li><a class="toctext" href="#Recurrent-Cells-1">Recurrent Cells</a></li><li><a class="toctext" href="#Stateful-Models-1">Stateful Models</a></li><li><a class="toctext" href="#Sequences-1">Sequences</a></li></ul></li><li><a class="toctext" href="../regularisation/">Regularisation</a></li><li><a class="toctext" href="../layers/">Model Reference</a></li></ul></li><li><span class="toctext">Training Models</span><ul><li><a class="toctext" href="../../training/optimisers/">Optimisers</a></li><li><a class="toctext" href="../../training/training/">Training</a></li></ul></li><li><a class="toctext" href="../../data/onehot/">One-Hot Encoding</a></li><li><a class="toctext" href="../../gpu/">GPU Support</a></li><li><a class="toctext" href="../../saving/">Saving & Loading</a></li><li><a class="toctext" href="../../performance/">Performance Tips</a></li><li><a class="toctext" href="../../community/">Community</a></li></ul></nav><article id="docs"><header><nav><ul><li>Building Models</li><li><a href>Recurrence</a></li></ul><a class="edit-page" href="https://github.com/FluxML/Flux.jl/blob/master/docs/src/models/recurrence.md"><span class="fa"></span> Edit on GitHub</a></nav><hr/><div id="topbar"><span>Recurrence</span><a class="fa fa-bars" href="#"></a></div></header><h1><a class="nav-anchor" id="Recurrent-Models-1" href="#Recurrent-Models-1">Recurrent Models</a></h1><h2><a class="nav-anchor" id="Recurrent-Cells-1" href="#Recurrent-Cells-1">Recurrent Cells</a></h2><p>In the simple feedforward case, our model <code>m</code> is a simple function from various inputs <code>xᵢ</code> to predictions <code>yᵢ</code>. (For example, each <code>x</code> might be an MNIST digit and each <code>y</code> a digit label.) Each prediction is completely independent of any others, and using the same <code>x</code> will always produce the same <code>y</code>.</p><pre><code class="language-julia">y₁ = f(x₁)
</script><link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/><link href="https://fonts.googleapis.com/css?family=Lato|Roboto+Mono" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/default.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL="../.."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../../assets/documenter.js"></script><script src="../../siteinfo.js"></script><script src="../../../versions.js"></script><link href="../../assets/documenter.css" rel="stylesheet" type="text/css"/><link href="../../assets/flux.css" rel="stylesheet" type="text/css"/></head><body><nav class="toc"><h1>Flux</h1><select id="version-selector" onChange="window.location.href=this.value" style="visibility: hidden"></select><form class="search" id="search-form" action="../../search/"><input id="search-query" name="q" type="text" placeholder="Search docs"/></form><ul><li><a class="toctext" href="../../">Home</a></li><li><span class="toctext">Building Models</span><ul><li><a class="toctext" href="../basics/">Basics</a></li><li class="current"><a class="toctext" href>Recurrence</a><ul class="internal"><li><a class="toctext" href="#Recurrent-Cells-1">Recurrent Cells</a></li><li><a class="toctext" href="#Stateful-Models-1">Stateful Models</a></li><li><a class="toctext" href="#Sequences-1">Sequences</a></li></ul></li><li><a class="toctext" href="../regularisation/">Regularisation</a></li><li><a class="toctext" href="../layers/">Model Reference</a></li><li><a class="toctext" href="../nnlib/">NNlib</a></li></ul></li><li><span class="toctext">Handling Data</span><ul><li><a class="toctext" href="../../data/onehot/">One-Hot Encoding</a></li><li><a class="toctext" href="../../data/dataloader/">DataLoader</a></li></ul></li><li><span class="toctext">Training Models</span><ul><li><a class="toctext" href="../../training/optimisers/">Optimisers</a></li><li><a class="toctext" href="../../training/training/">Training</a></li></ul></li><li><a class="toctext" href="../../gpu/">GPU Support</a></li><li><a class="toctext" href="../../saving/">Saving & Loading</a></li><li><a class="toctext" href="../../ecosystem/">The Julia Ecosystem</a></li><li><a class="toctext" href="../../performance/">Performance Tips</a></li><li><a class="toctext" href="../../community/">Community</a></li></ul></nav><article id="docs"><header><nav><ul><li>Building Models</li><li><a href>Recurrence</a></li></ul><a class="edit-page" href="https://github.com/FluxML/Flux.jl/blob/master/docs/src/models/recurrence.md"><span class="fa"></span> Edit on GitHub</a></nav><hr/><div id="topbar"><span>Recurrence</span><a class="fa fa-bars" href="#"></a></div></header><h1><a class="nav-anchor" id="Recurrent-Models-1" href="#Recurrent-Models-1">Recurrent Models</a></h1><h2><a class="nav-anchor" id="Recurrent-Cells-1" href="#Recurrent-Cells-1">Recurrent Cells</a></h2><p>In the simple feedforward case, our model <code>m</code> is a simple function from various inputs <code>xᵢ</code> to predictions <code>yᵢ</code>. (For example, each <code>x</code> might be an MNIST digit and each <code>y</code> a digit label.) 
Each prediction is completely independent of any others, and using the same <code>x</code> will always produce the same <code>y</code>.</p><pre><code class="language-julia">y₁ = f(x₁)
y₂ = f(x₂)
y₃ = f(x₃)
# ...</code></pre><p>Recurrent networks introduce a <em>hidden state</em> that gets carried over each time we run the model. The model now takes the old <code>h</code> as an input, and produces a new <code>h</code> as output, each time we run it.</p><pre><code class="language-julia">h = # ... initial state ...
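# (the diff truncates this block; a minimal sketch of how it continues,
# assuming a recurrent cell `rnn` of the usual (h, x) -> (h, y) form)
h, y = rnn(h, x)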
@ -6,7 +6,7 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
ga('create', 'UA-36890222-9', 'auto');
ga('send', 'pageview');
</script><link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/><link href="https://fonts.googleapis.com/css?family=Lato|Roboto+Mono" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/default.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL="../.."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../../assets/documenter.js"></script><script src="../../siteinfo.js"></script><script src="../../../versions.js"></script><link href="../../assets/documenter.css" rel="stylesheet" type="text/css"/><link href="../../assets/flux.css" rel="stylesheet" type="text/css"/></head><body><nav class="toc"><h1>Flux</h1><select id="version-selector" onChange="window.location.href=this.value" style="visibility: hidden"></select><form class="search" id="search-form" action="../../search/"><input id="search-query" name="q" type="text" placeholder="Search docs"/></form><ul><li><a class="toctext" href="../../">Home</a></li><li><span class="toctext">Building Models</span><ul><li><a class="toctext" href="../basics/">Basics</a></li><li><a class="toctext" href="../recurrence/">Recurrence</a></li><li class="current"><a class="toctext" href>Regularisation</a><ul class="internal"></ul></li><li><a class="toctext" href="../layers/">Model Reference</a></li></ul></li><li><span class="toctext">Training Models</span><ul><li><a class="toctext" href="../../training/optimisers/">Optimisers</a></li><li><a class="toctext" href="../../training/training/">Training</a></li></ul></li><li><a class="toctext" href="../../data/onehot/">One-Hot Encoding</a></li><li><a class="toctext" href="../../gpu/">GPU Support</a></li><li><a class="toctext" href="../../saving/">Saving & Loading</a></li><li><a class="toctext" href="../../performance/">Performance Tips</a></li><li><a class="toctext" href="../../community/">Community</a></li></ul></nav><article id="docs"><header><nav><ul><li>Building Models</li><li><a href>Regularisation</a></li></ul><a class="edit-page" href="https://github.com/FluxML/Flux.jl/blob/master/docs/src/models/regularisation.md"><span class="fa"></span> Edit on GitHub</a></nav><hr/><div id="topbar"><span>Regularisation</span><a class="fa fa-bars" href="#"></a></div></header><h1><a class="nav-anchor" id="Regularisation-1" href="#Regularisation-1">Regularisation</a></h1><p>Applying regularisation to model parameters is straightforward. We just need to apply an appropriate regulariser, such as <code>norm</code>, to each model parameter and add the result to the overall loss.</p><p>For example, say we have a simple regression.</p><pre><code class="language-julia">using Flux: crossentropy
</script><link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/><link href="https://fonts.googleapis.com/css?family=Lato|Roboto+Mono" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/default.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL="../.."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../../assets/documenter.js"></script><script src="../../siteinfo.js"></script><script src="../../../versions.js"></script><link href="../../assets/documenter.css" rel="stylesheet" type="text/css"/><link href="../../assets/flux.css" rel="stylesheet" type="text/css"/></head><body><nav class="toc"><h1>Flux</h1><select id="version-selector" onChange="window.location.href=this.value" style="visibility: hidden"></select><form class="search" id="search-form" action="../../search/"><input id="search-query" name="q" type="text" placeholder="Search docs"/></form><ul><li><a class="toctext" href="../../">Home</a></li><li><span class="toctext">Building Models</span><ul><li><a class="toctext" href="../basics/">Basics</a></li><li><a class="toctext" href="../recurrence/">Recurrence</a></li><li class="current"><a class="toctext" href>Regularisation</a><ul class="internal"></ul></li><li><a class="toctext" href="../layers/">Model Reference</a></li><li><a class="toctext" href="../nnlib/">NNlib</a></li></ul></li><li><span class="toctext">Handling Data</span><ul><li><a class="toctext" href="../../data/onehot/">One-Hot Encoding</a></li><li><a class="toctext" href="../../data/dataloader/">DataLoader</a></li></ul></li><li><span class="toctext">Training Models</span><ul><li><a class="toctext" href="../../training/optimisers/">Optimisers</a></li><li><a class="toctext" href="../../training/training/">Training</a></li></ul></li><li><a class="toctext" href="../../gpu/">GPU Support</a></li><li><a class="toctext" href="../../saving/">Saving & Loading</a></li><li><a class="toctext" href="../../ecosystem/">The Julia Ecosystem</a></li><li><a class="toctext" href="../../performance/">Performance Tips</a></li><li><a class="toctext" href="../../community/">Community</a></li></ul></nav><article id="docs"><header><nav><ul><li>Building Models</li><li><a href>Regularisation</a></li></ul><a class="edit-page" href="https://github.com/FluxML/Flux.jl/blob/master/docs/src/models/regularisation.md"><span class="fa"></span> Edit on GitHub</a></nav><hr/><div id="topbar"><span>Regularisation</span><a class="fa fa-bars" href="#"></a></div></header><h1><a class="nav-anchor" id="Regularisation-1" href="#Regularisation-1">Regularisation</a></h1><p>Applying regularisation to model parameters is straightforward. We just need to apply an appropriate regulariser, such as <code>norm</code>, to each model parameter and add the result to the overall loss.</p><p>For example, say we have a simple regression.</p><pre><code class="language-julia">using Flux: crossentropy
m = Dense(10, 5)
loss(x, y) = crossentropy(softmax(m(x)), y)</code></pre><p>We can regularise this by taking the (L2) norm of the parameters, <code>m.W</code> and <code>m.b</code>.</p><pre><code class="language-julia">using LinearAlgebra
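# (the block is cut off by the diff; a minimal sketch of the elided penalty,
# matching the `penalty()` referenced in the hunk header below)
penalty() = norm(m.W) + norm(m.b)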
@ -17,7 +17,7 @@ loss(x, y) = crossentropy(softmax(m(x)), y) + penalty()</code></pre><p>When work
param([0.0, 0.0, 0.0, 0.0, 0.0])
julia> sum(norm, params(m))
26.01749952921026 (tracked)</code></pre><p>Here's a larger example with a multi-layer perceptron.</p><pre><code class="language-julia">m = Chain(
26.01749952921026</code></pre><p>Here's a larger example with a multi-layer perceptron.</p><pre><code class="language-julia">m = Chain(
Dense(28^2, 128, relu),
Dense(128, 32, relu),
Dense(32, 10), softmax)
@ -26,7 +26,7 @@ loss(x, y) = crossentropy(m(x), y) + sum(norm, params(m))
loss(rand(28^2), rand(10))</code></pre><p>One can also easily add per-layer regularisation via the <code>activations</code> function:</p><pre><code class="language-julia">julia> using Flux: activations
julia> c = Chain(Dense(10,5,σ),Dense(5,2),softmax)
julia> c = Chain(Dense(10, 5, σ), Dense(5, 2), softmax)
Chain(Dense(10, 5, σ), Dense(5, 2), softmax)
julia> activations(c, rand(10))
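# (the REPL output is elided by the diff) — a hedged sketch of how the
# per-layer penalty can then be folded into the loss for the chain `c` above:
loss(x, y) = crossentropy(c(x), y) + sum(norm, activations(c, x))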
File diff suppressed because one or more lines are too long
@ -6,7 +6,7 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
ga('create', 'UA-36890222-9', 'auto');
ga('send', 'pageview');
</script><link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/><link href="https://fonts.googleapis.com/css?family=Lato|Roboto+Mono" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/default.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL=".."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../assets/documenter.js"></script><script src="../siteinfo.js"></script><script src="../../versions.js"></script><link href="../assets/documenter.css" rel="stylesheet" type="text/css"/><link href="../assets/flux.css" rel="stylesheet" type="text/css"/></head><body><nav class="toc"><h1>Flux</h1><select id="version-selector" onChange="window.location.href=this.value" style="visibility: hidden"></select><form class="search" id="search-form" action="../search/"><input id="search-query" name="q" type="text" placeholder="Search docs"/></form><ul><li><a class="toctext" href="../">Home</a></li><li><span class="toctext">Building Models</span><ul><li><a class="toctext" href="../models/basics/">Basics</a></li><li><a class="toctext" href="../models/recurrence/">Recurrence</a></li><li><a class="toctext" href="../models/regularisation/">Regularisation</a></li><li><a class="toctext" href="../models/layers/">Model Reference</a></li></ul></li><li><span class="toctext">Training Models</span><ul><li><a class="toctext" href="../training/optimisers/">Optimisers</a></li><li><a class="toctext" href="../training/training/">Training</a></li></ul></li><li><a class="toctext" href="../data/onehot/">One-Hot Encoding</a></li><li><a class="toctext" href="../gpu/">GPU Support</a></li><li class="current"><a class="toctext" href>Saving & Loading</a><ul class="internal"><li><a class="toctext" href="#Saving-Model-Weights-1">Saving Model Weights</a></li><li><a class="toctext" href="#Checkpointing-1">Checkpointing</a></li></ul></li><li><a class="toctext" href="../performance/">Performance Tips</a></li><li><a class="toctext" href="../community/">Community</a></li></ul></nav><article id="docs"><header><nav><ul><li><a href>Saving & Loading</a></li></ul><a class="edit-page" href="https://github.com/FluxML/Flux.jl/blob/master/docs/src/saving.md"><span class="fa"></span> Edit on GitHub</a></nav><hr/><div id="topbar"><span>Saving & Loading</span><a class="fa fa-bars" href="#"></a></div></header><h1><a class="nav-anchor" id="Saving-and-Loading-Models-1" href="#Saving-and-Loading-Models-1">Saving and Loading Models</a></h1><p>You may wish to save models so that they can be loaded and run in a later session. The easiest way to do this is via <a href="https://github.com/MikeInnes/BSON.jl">BSON.jl</a>.</p><p>Save a model:</p><pre><code class="language-julia">julia> using Flux
</script><link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/><link href="https://fonts.googleapis.com/css?family=Lato|Roboto+Mono" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/default.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL=".."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../assets/documenter.js"></script><script src="../siteinfo.js"></script><script src="../../versions.js"></script><link href="../assets/documenter.css" rel="stylesheet" type="text/css"/><link href="../assets/flux.css" rel="stylesheet" type="text/css"/></head><body><nav class="toc"><h1>Flux</h1><select id="version-selector" onChange="window.location.href=this.value" style="visibility: hidden"></select><form class="search" id="search-form" action="../search/"><input id="search-query" name="q" type="text" placeholder="Search docs"/></form><ul><li><a class="toctext" href="../">Home</a></li><li><span class="toctext">Building Models</span><ul><li><a class="toctext" href="../models/basics/">Basics</a></li><li><a class="toctext" href="../models/recurrence/">Recurrence</a></li><li><a class="toctext" href="../models/regularisation/">Regularisation</a></li><li><a class="toctext" href="../models/layers/">Model Reference</a></li><li><a class="toctext" href="../models/nnlib/">NNlib</a></li></ul></li><li><span class="toctext">Handling Data</span><ul><li><a class="toctext" href="../data/onehot/">One-Hot Encoding</a></li><li><a class="toctext" href="../data/dataloader/">DataLoader</a></li></ul></li><li><span class="toctext">Training Models</span><ul><li><a class="toctext" href="../training/optimisers/">Optimisers</a></li><li><a class="toctext" href="../training/training/">Training</a></li></ul></li><li><a class="toctext" href="../gpu/">GPU Support</a></li><li class="current"><a class="toctext" href>Saving & Loading</a><ul class="internal"><li><a class="toctext" href="#Saving-Model-Weights-1">Saving Model Weights</a></li><li><a class="toctext" href="#Checkpointing-1">Checkpointing</a></li></ul></li><li><a class="toctext" href="../ecosystem/">The Julia Ecosystem</a></li><li><a class="toctext" href="../performance/">Performance Tips</a></li><li><a class="toctext" href="../community/">Community</a></li></ul></nav><article id="docs"><header><nav><ul><li><a href>Saving & Loading</a></li></ul><a class="edit-page" href="https://github.com/FluxML/Flux.jl/blob/master/docs/src/saving.md"><span class="fa"></span> Edit on GitHub</a></nav><hr/><div id="topbar"><span>Saving & Loading</span><a class="fa fa-bars" href="#"></a></div></header><h1><a class="nav-anchor" id="Saving-and-Loading-Models-1" href="#Saving-and-Loading-Models-1">Saving and Loading Models</a></h1><p>You may wish to save models so that they can be loaded and run in a later session. The easiest way to do this is via <a href="https://github.com/MikeInnes/BSON.jl">BSON.jl</a>.</p><p>Save a model:</p><pre><code class="language-julia">julia> using Flux
julia> model = Chain(Dense(10,5,relu),Dense(5,2),softmax)
Chain(Dense(10, 5, NNlib.relu), Dense(5, 2), NNlib.softmax)
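# a sketch of the save step the diff elides, using BSON.jl as introduced above
julia> using BSON: @save

julia> @save "mymodel.bson" model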
@ -47,4 +47,4 @@ evalcb = throttle(30) do
# Show loss
@save "model-checkpoint.bson" model
end</code></pre><p>This will update the <code>"model-checkpoint.bson"</code> file every thirty seconds.</p><p>You can get more advanced by saving a series of models throughout training, for example</p><pre><code class="language-julia">@save "model-$(now()).bson" model</code></pre><p>will produce a series of models like <code>"model-2018-03-06T02:57:10.41.bson"</code>. You could also store the current test set loss, so that it's easy to (for example) revert to an older copy of the model if it starts to overfit.</p><pre><code class="language-julia">@save "model-$(now()).bson" model loss = testloss()</code></pre><p>You can even store optimiser state alongside the model, to resume training exactly where you left off.</p><pre><code class="language-julia">opt = ADAM()
@save "model-$(now()).bson" model opt</code></pre><footer><hr/><a class="previous" href="../gpu/"><span class="direction">Previous</span><span class="title">GPU Support</span></a><a class="next" href="../performance/"><span class="direction">Next</span><span class="title">Performance Tips</span></a></footer></article></body></html>
@save "model-$(now()).bson" model opt</code></pre><footer><hr/><a class="previous" href="../gpu/"><span class="direction">Previous</span><span class="title">GPU Support</span></a><a class="next" href="../ecosystem/"><span class="direction">Next</span><span class="title">The Julia Ecosystem</span></a></footer></article></body></html>
@ -6,4 +6,4 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
ga('create', 'UA-36890222-9', 'auto');
ga('send', 'pageview');
</script><link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/><link href="https://fonts.googleapis.com/css?family=Lato|Roboto+Mono" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/default.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL=".."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../assets/documenter.js"></script><script src="../siteinfo.js"></script><script src="../../versions.js"></script><link href="../assets/documenter.css" rel="stylesheet" type="text/css"/><link href="../assets/flux.css" rel="stylesheet" type="text/css"/></head><body><nav class="toc"><h1>Flux</h1><select id="version-selector" onChange="window.location.href=this.value" style="visibility: hidden"></select><form class="search" id="search-form" action><input id="search-query" name="q" type="text" placeholder="Search docs"/></form><ul><li><a class="toctext" href="../">Home</a></li><li><span class="toctext">Building Models</span><ul><li><a class="toctext" href="../models/basics/">Basics</a></li><li><a class="toctext" href="../models/recurrence/">Recurrence</a></li><li><a class="toctext" href="../models/regularisation/">Regularisation</a></li><li><a class="toctext" href="../models/layers/">Model Reference</a></li></ul></li><li><span class="toctext">Training Models</span><ul><li><a class="toctext" href="../training/optimisers/">Optimisers</a></li><li><a class="toctext" href="../training/training/">Training</a></li></ul></li><li><a class="toctext" href="../data/onehot/">One-Hot Encoding</a></li><li><a class="toctext" href="../gpu/">GPU Support</a></li><li><a class="toctext" href="../saving/">Saving & Loading</a></li><li><a class="toctext" href="../performance/">Performance Tips</a></li><li><a class="toctext" href="../community/">Community</a></li></ul></nav><article><header><nav><ul><li>Search</li></ul></nav><hr/><div id="topbar"><span>Search</span><a class="fa fa-bars" href="#"></a></div></header><h1>Search</h1><p id="search-info">Number of results: <span id="search-results-number">loading...</span></p><ul id="search-results"></ul></article></body><script src="../search_index.js"></script><script src="../assets/search.js"></script></html>
</script><link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/><link href="https://fonts.googleapis.com/css?family=Lato|Roboto+Mono" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/default.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL=".."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../assets/documenter.js"></script><script src="../siteinfo.js"></script><script src="../../versions.js"></script><link href="../assets/documenter.css" rel="stylesheet" type="text/css"/><link href="../assets/flux.css" rel="stylesheet" type="text/css"/></head><body><nav class="toc"><h1>Flux</h1><select id="version-selector" onChange="window.location.href=this.value" style="visibility: hidden"></select><form class="search" id="search-form" action><input id="search-query" name="q" type="text" placeholder="Search docs"/></form><ul><li><a class="toctext" href="../">Home</a></li><li><span class="toctext">Building Models</span><ul><li><a class="toctext" href="../models/basics/">Basics</a></li><li><a class="toctext" href="../models/recurrence/">Recurrence</a></li><li><a class="toctext" href="../models/regularisation/">Regularisation</a></li><li><a class="toctext" href="../models/layers/">Model Reference</a></li><li><a class="toctext" href="../models/nnlib/">NNlib</a></li></ul></li><li><span class="toctext">Handling Data</span><ul><li><a class="toctext" href="../data/onehot/">One-Hot Encoding</a></li><li><a class="toctext" href="../data/dataloader/">DataLoader</a></li></ul></li><li><span class="toctext">Training Models</span><ul><li><a class="toctext" href="../training/optimisers/">Optimisers</a></li><li><a class="toctext" href="../training/training/">Training</a></li></ul></li><li><a class="toctext" href="../gpu/">GPU Support</a></li><li><a class="toctext" href="../saving/">Saving & Loading</a></li><li><a class="toctext" href="../ecosystem/">The Julia Ecosystem</a></li><li><a class="toctext" href="../performance/">Performance Tips</a></li><li><a class="toctext" href="../community/">Community</a></li></ul></nav><article><header><nav><ul><li>Search</li></ul></nav><hr/><div id="topbar"><span>Search</span><a class="fa fa-bars" href="#"></a></div></header><h1>Search</h1><p id="search-info">Number of results: <span id="search-results-number">loading...</span></p><ul id="search-results"></ul></article></body><script src="../search_index.js"></script><script src="../assets/search.js"></script></html>
File diff suppressed because one or more lines are too long
@ -6,7 +6,7 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
ga('create', 'UA-36890222-9', 'auto');
ga('send', 'pageview');
</script><link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/><link href="https://fonts.googleapis.com/css?family=Lato|Roboto+Mono" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/default.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL="../.."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../../assets/documenter.js"></script><script src="../../siteinfo.js"></script><script src="../../../versions.js"></script><link href="../../assets/documenter.css" rel="stylesheet" type="text/css"/><link href="../../assets/flux.css" rel="stylesheet" type="text/css"/></head><body><nav class="toc"><h1>Flux</h1><select id="version-selector" onChange="window.location.href=this.value" style="visibility: hidden"></select><form class="search" id="search-form" action="../../search/"><input id="search-query" name="q" type="text" placeholder="Search docs"/></form><ul><li><a class="toctext" href="../../">Home</a></li><li><span class="toctext">Building Models</span><ul><li><a class="toctext" href="../../models/basics/">Basics</a></li><li><a class="toctext" href="../../models/recurrence/">Recurrence</a></li><li><a class="toctext" href="../../models/regularisation/">Regularisation</a></li><li><a class="toctext" href="../../models/layers/">Model Reference</a></li></ul></li><li><span class="toctext">Training Models</span><ul><li class="current"><a class="toctext" href>Optimisers</a><ul class="internal"><li><a class="toctext" href="#Optimiser-Reference-1">Optimiser Reference</a></li><li><a class="toctext" href="#Optimiser-Interface-1">Optimiser Interface</a></li><li><a class="toctext" href="#Composing-Optimisers-1">Composing Optimisers</a></li><li><a class="toctext" href="#Decays-1">Decays</a></li></ul></li><li><a class="toctext" href="../training/">Training</a></li></ul></li><li><a class="toctext" href="../../data/onehot/">One-Hot Encoding</a></li><li><a class="toctext" href="../../gpu/">GPU Support</a></li><li><a class="toctext" href="../../saving/">Saving & Loading</a></li><li><a class="toctext" href="../../performance/">Performance Tips</a></li><li><a class="toctext" href="../../community/">Community</a></li></ul></nav><article id="docs"><header><nav><ul><li>Training Models</li><li><a href>Optimisers</a></li></ul><a class="edit-page" href="https://github.com/FluxML/Flux.jl/blob/master/docs/src/training/optimisers.md"><span class="fa"></span> Edit on GitHub</a></nav><hr/><div id="topbar"><span>Optimisers</span><a class="fa fa-bars" href="#"></a></div></header><h1><a class="nav-anchor" id="Optimisers-1" href="#Optimisers-1">Optimisers</a></h1><p>Consider a <a href="../../models/basics/">simple linear regression</a>. We create some dummy data, calculate a loss, and backpropagate to calculate gradients for the parameters <code>W</code> and <code>b</code>.</p><pre><code class="language-julia">using Flux
</script><link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/><link href="https://fonts.googleapis.com/css?family=Lato|Roboto+Mono" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/default.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL="../.."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../../assets/documenter.js"></script><script src="../../siteinfo.js"></script><script src="../../../versions.js"></script><link href="../../assets/documenter.css" rel="stylesheet" type="text/css"/><link href="../../assets/flux.css" rel="stylesheet" type="text/css"/></head><body><nav class="toc"><h1>Flux</h1><select id="version-selector" onChange="window.location.href=this.value" style="visibility: hidden"></select><form class="search" id="search-form" action="../../search/"><input id="search-query" name="q" type="text" placeholder="Search docs"/></form><ul><li><a class="toctext" href="../../">Home</a></li><li><span class="toctext">Building Models</span><ul><li><a class="toctext" href="../../models/basics/">Basics</a></li><li><a class="toctext" href="../../models/recurrence/">Recurrence</a></li><li><a class="toctext" href="../../models/regularisation/">Regularisation</a></li><li><a class="toctext" href="../../models/layers/">Model Reference</a></li><li><a class="toctext" href="../../models/nnlib/">NNlib</a></li></ul></li><li><span class="toctext">Handling Data</span><ul><li><a class="toctext" href="../../data/onehot/">One-Hot Encoding</a></li><li><a class="toctext" href="../../data/dataloader/">DataLoader</a></li></ul></li><li><span class="toctext">Training Models</span><ul><li class="current"><a class="toctext" href>Optimisers</a><ul class="internal"><li><a class="toctext" href="#Optimiser-Reference-1">Optimiser Reference</a></li><li><a class="toctext" href="#Optimiser-Interface-1">Optimiser Interface</a></li><li><a class="toctext" href="#Composing-Optimisers-1">Composing Optimisers</a></li><li><a class="toctext" href="#Decays-1">Decays</a></li></ul></li><li><a class="toctext" href="../training/">Training</a></li></ul></li><li><a class="toctext" href="../../gpu/">GPU Support</a></li><li><a class="toctext" href="../../saving/">Saving & Loading</a></li><li><a class="toctext" href="../../ecosystem/">The Julia Ecosystem</a></li><li><a class="toctext" href="../../performance/">Performance Tips</a></li><li><a class="toctext" href="../../community/">Community</a></li></ul></nav><article id="docs"><header><nav><ul><li>Training Models</li><li><a href>Optimisers</a></li></ul><a class="edit-page" href="https://github.com/FluxML/Flux.jl/blob/master/docs/src/training/optimisers.md"><span class="fa"></span> Edit on GitHub</a></nav><hr/><div id="topbar"><span>Optimisers</span><a class="fa fa-bars" href="#"></a></div></header><h1><a class="nav-anchor" id="Optimisers-1" href="#Optimisers-1">Optimisers</a></h1><p>Consider a <a href="../../models/basics/">simple linear regression</a>. We create some dummy data, calculate a loss, and backpropagate to calculate gradients for the parameters <code>W</code> and <code>b</code>.</p><pre><code class="language-julia">using Flux
W = rand(2, 5)
b = rand(2)
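# (the diff elides the model definition; a minimal sketch along the lines the
# docs use — a linear predictor with a squared-error loss)
predict(x) = W*x .+ b
loss(x, y) = sum((predict(x) .- y).^2)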
@ -18,7 +18,7 @@ x, y = rand(5), rand(2) # Dummy data
l = loss(x, y) # ~ 3
θ = Params([W, b])
grads = gradient(() -> loss(x, y), θ)</code></pre><p>We want to update each parameter, using the gradient, in order to improve (reduce) the loss. Here's one way to do that:</p><pre><code class="language-julia">using Flux: update!
grads = gradient(() -> loss(x, y), θ)</code></pre><p>We want to update each parameter, using the gradient, in order to improve (reduce) the loss. Here's one way to do that:</p><pre><code class="language-julia">using Flux.Optimise: update!
η = 0.1 # Learning Rate
for p in (W, b)
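  # (loop body elided by the diff) — the usual gradient step, as a sketch:
  p .-= η .* grads[p]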
@ -27,7 +27,8 @@ end</code></pre><p>Running this will alter the parameters <code>W</code> and <co
for p in (W, b)
update!(opt, p, grads[p])
end</code></pre><p>An optimiser <code>update!</code> accepts a parameter and a gradient, and updates the parameter according to the chosen rule. We can also pass <code>opt</code> to our <a href="../training/">training loop</a>, which will update all parameters of the model in a loop. However, we can now easily replace <code>Descent</code> with a more advanced optimiser such as <code>ADAM</code>.</p><h2><a class="nav-anchor" id="Optimiser-Reference-1" href="#Optimiser-Reference-1">Optimiser Reference</a></h2><p>All optimisers return an object that, when passed to <code>train!</code>, will update the parameters passed to it.</p><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.Descent" href="#Flux.Optimise.Descent"><code>Flux.Optimise.Descent</code></a> — <span class="docstring-category">Type</span>.</div><div><div><p>Descent(η)</p><p>Classic gradient descent optimiser with learning rate <code>η</code>. For each parameter <code>p</code> and its gradient <code>δp</code>, this runs <code>p -= η*δp</code></p><p><strong>Parameters</strong></p><ul><li>Learning Rate (η): The amount by which the gradients are discounted before updating the weights. Defaults to <code>0.1</code>.</li></ul><p><strong>Example</strong></p><pre><code class="language-julia-repl">opt = Descent() # uses default η (0.1)
end</code></pre><p>An optimiser <code>update!</code> accepts a parameter and a gradient, and updates the parameter according to the chosen rule. We can also pass <code>opt</code> to our <a href="../training/">training loop</a>, which will update all parameters of the model in a loop. However, we can now easily replace <code>Descent</code> with a more advanced optimiser such as <code>ADAM</code>.</p><h2><a class="nav-anchor" id="Optimiser-Reference-1" href="#Optimiser-Reference-1">Optimiser Reference</a></h2><p>All optimisers return an object that, when passed to <code>train!</code>, will update the parameters passed to it.</p><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.update!" href="#Flux.Optimise.update!"><code>Flux.Optimise.update!</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-julia">update!(opt, p, g)
update!(opt, ps::Params, gs)</code></pre><p>Perform an update step of the parameters <code>ps</code> (or the single parameter <code>p</code>) according to optimiser <code>opt</code> and the gradients <code>gs</code> (the gradient <code>g</code>).</p><p>As a result, the parameters are mutated and the optimiser's internal state may change.</p><p><code>update!(x, x̄)</code></p><p>Update the array <code>x</code> according to <code>x .-= x̄</code>.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ddab979ea9d062acc9e8e700404fd51997581ada/src/optimise/train.jl#L5-L17">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.Descent" href="#Flux.Optimise.Descent"><code>Flux.Optimise.Descent</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">Descent(η)</code></pre><p>Classic gradient descent optimiser with learning rate <code>η</code>. For each parameter <code>p</code> and its gradient <code>δp</code>, this runs <code>p -= η*δp</code>.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (η): The amount by which the gradients are discounted before updating the weights. Defaults to <code>0.1</code>.</li></ul><p><strong>Example</strong></p><pre><code class="language-julia">opt = Descent() # uses default η (0.1)
opt = Descent(0.3) # use provided η
@ -37,23 +38,23 @@ gs = gradient(ps) do
loss(x, y)
end
Flux.Optimise.update!(opt, ps, gs)</code></pre></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ddc2c20e68919faa41a04ed39ed3cfa08d6d5189/src/optimise/optimisers.jl#L9-L32">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.Momentum" href="#Flux.Optimise.Momentum"><code>Flux.Optimise.Momentum</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">Momentum(η, ρ)</code></pre><p>Gradient descent with learning rate <code>η</code> and momentum <code>ρ</code>.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (<code>η</code>): Amount by which gradients are discounted before updating the weights. Defaults to <code>0.01</code>.</li><li>Momentum (<code>ρ</code>): Parameter that accelerates descent in the relevant direction and dampens oscillations. Defaults to <code>0.9</code>.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = Momentum() # uses defaults of η = 0.01 and ρ = 0.9
Flux.Optimise.update!(opt, ps, gs)</code></pre></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ddab979ea9d062acc9e8e700404fd51997581ada/src/optimise/optimisers.jl#L8-L31">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.Momentum" href="#Flux.Optimise.Momentum"><code>Flux.Optimise.Momentum</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">Momentum(η, ρ)</code></pre><p>Gradient descent with learning rate <code>η</code> and momentum <code>ρ</code>.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (<code>η</code>): Amount by which gradients are discounted before updating the weights. Defaults to <code>0.01</code>.</li><li>Momentum (<code>ρ</code>): Parameter that accelerates descent in the relevant direction and dampens oscillations. Defaults to <code>0.9</code>.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = Momentum() # uses defaults of η = 0.01 and ρ = 0.9
opt = Momentum(0.01, 0.99)</code></pre></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ddc2c20e68919faa41a04ed39ed3cfa08d6d5189/src/optimise/optimisers.jl#L43-L58">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.Nesterov" href="#Flux.Optimise.Nesterov"><code>Flux.Optimise.Nesterov</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">Nesterov(η, ρ)</code></pre><p>Gradient descent with learning rate <code>η</code> and Nesterov momentum <code>ρ</code>.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (η): Amount by which the gradients are discounted before updating the weights. Defaults to <code>0.001</code>.</li><li>Nesterov Momentum (ρ): Parameter controlling the amount of Nesterov momentum to be applied. Defaults to <code>0.9</code>.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = Nesterov() # uses defaults η = 0.001 and ρ = 0.9
opt = Momentum(0.01, 0.99)</code></pre></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ddab979ea9d062acc9e8e700404fd51997581ada/src/optimise/optimisers.jl#L42-L57">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.Nesterov" href="#Flux.Optimise.Nesterov"><code>Flux.Optimise.Nesterov</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">Nesterov(η, ρ)</code></pre><p>Gradient descent with learning rate <code>η</code> and Nesterov momentum <code>ρ</code>.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (η): Amount by which the gradients are discounted before updating the weights. Defaults to <code>0.001</code>.</li><li>Nesterov Momentum (ρ): Parameter controlling the amount of Nesterov momentum to be applied. Defaults to <code>0.9</code>.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = Nesterov() # uses defaults η = 0.001 and ρ = 0.9
opt = Nesterov(0.003, 0.95)</code></pre></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ddc2c20e68919faa41a04ed39ed3cfa08d6d5189/src/optimise/optimisers.jl#L106-L124">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.RMSProp" href="#Flux.Optimise.RMSProp"><code>Flux.Optimise.RMSProp</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">RMSProp(η, ρ)</code></pre><p>Implements the RMSProp algorithm. Often a good choice for recurrent networks. Parameters other than the learning rate generally don't need tuning.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (η): Defaults to <code>0.001</code>.</li><li>Rho (ρ): Defaults to <code>0.9</code>.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = RMSProp() # uses default η = 0.001 and ρ = 0.9
opt = Nesterov(0.003, 0.95)</code></pre></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ddab979ea9d062acc9e8e700404fd51997581ada/src/optimise/optimisers.jl#L73-L88">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.RMSProp" href="#Flux.Optimise.RMSProp"><code>Flux.Optimise.RMSProp</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">RMSProp(η, ρ)</code></pre><p>Implements the RMSProp algorithm. Often a good choice for recurrent networks. Parameters other than the learning rate generally don't need tuning.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (η): Defaults to <code>0.001</code>.</li><li>Rho (ρ): Defaults to <code>0.9</code>.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = RMSProp() # uses default η = 0.001 and ρ = 0.9
opt = RMSProp(0.002, 0.95)</code></pre><p><strong>References</strong></p><p><a href="https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf">RMSProp</a></p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ddc2c20e68919faa41a04ed39ed3cfa08d6d5189/src/optimise/optimisers.jl#L140-L158">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.ADAM" href="#Flux.Optimise.ADAM"><code>Flux.Optimise.ADAM</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">ADAM(η, β::Tuple)</code></pre><p>Implements the ADAM optimiser.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (<code>η</code>): Defaults to <code>0.001</code>.</li><li>Beta (<code>β::Tuple</code>): The first element refers to β1 and the second to β2. Defaults to <code>(0.9, 0.999)</code>.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = ADAM() # uses the default η = 0.001 and β = (0.9, 0.999)
opt = RMSProp(0.002, 0.95)</code></pre><p><strong>References</strong></p><p><a href="https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf">RMSProp</a></p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ddab979ea9d062acc9e8e700404fd51997581ada/src/optimise/optimisers.jl#L105-L123">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.ADAM" href="#Flux.Optimise.ADAM"><code>Flux.Optimise.ADAM</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">ADAM(η, β::Tuple)</code></pre><p>Implements the ADAM optimiser.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (<code>η</code>): Defaults to <code>0.001</code>.</li><li>Beta (<code>β::Tuple</code>): The first element refers to β1 and the second to β2. Defaults to <code>(0.9, 0.999)</code>.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = ADAM() # uses the default η = 0.001 and β = (0.9, 0.999)
opt = ADAM(0.001, (0.9, 0.8))</code></pre><p><strong>References</strong></p><p><a href="https://arxiv.org/abs/1412.6980v8">ADAM</a> optimiser.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ddc2c20e68919faa41a04ed39ed3cfa08d6d5189/src/optimise/optimisers.jl#L140-L158">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.AdaMax" href="#Flux.Optimise.AdaMax"><code>Flux.Optimise.AdaMax</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">AdaMax(η, β::Tuple)</code></pre><p>Variant of ADAM based on ∞-norm.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (η): Defaults to <code>0.001</code></li><li>Beta (β::Tuple): The first element refers to β1 and the second to β2. Defaults to <code>(0.9, 0.999)</code>.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = AdaMax() # uses default η and β
opt = ADAM(0.001, (0.9, 0.8))</code></pre><p><strong>References</strong></p><p><a href="https://arxiv.org/abs/1412.6980v8">ADAM</a> optimiser.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ddab979ea9d062acc9e8e700404fd51997581ada/src/optimise/optimisers.jl#L139-L157">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.AdaMax" href="#Flux.Optimise.AdaMax"><code>Flux.Optimise.AdaMax</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">AdaMax(η, β::Tuple)</code></pre><p>Variant of ADAM based on ∞-norm.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (η): Defaults to <code>0.001</code></li><li>Beta (β::Tuple): The first element refers to β1 and the second to β2. Defaults to <code>(0.9, 0.999)</code>.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = AdaMax() # uses default η and β
opt = AdaMax(0.001, (0.9, 0.995))</code></pre><p><strong>References</strong></p><p><a href="https://arxiv.org/abs/1412.6980v9">AdaMax</a> optimiser.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ddc2c20e68919faa41a04ed39ed3cfa08d6d5189/src/optimise/optimisers.jl#L222-L239">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.ADAGrad" href="#Flux.Optimise.ADAGrad"><code>Flux.Optimise.ADAGrad</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">ADAGrad(η)</code></pre><p>Implements AdaGrad. It has parameter specific learning rates based on how frequently it is updated.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (η): Defaults to <code>0.1</code></li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = ADAGrad() # uses default η = 0.1
opt = AdaMax(0.001, (0.9, 0.995))</code></pre><p><strong>References</strong></p><p><a href="https://arxiv.org/abs/1412.6980v9">AdaMax</a> optimiser.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ddab979ea9d062acc9e8e700404fd51997581ada/src/optimise/optimisers.jl#L221-L238">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.ADAGrad" href="#Flux.Optimise.ADAGrad"><code>Flux.Optimise.ADAGrad</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">ADAGrad(η)</code></pre><p>Implements AdaGrad. It has parameter specific learning rates based on how frequently it is updated.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (η): Defaults to <code>0.1</code></li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = ADAGrad() # uses default η = 0.1
opt = ADAGrad(0.001)</code></pre><p><strong>References</strong></p><p><a href="http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf">ADAGrad</a> optimiser. Parameters don't need tuning.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ddc2c20e68919faa41a04ed39ed3cfa08d6d5189/src/optimise/optimisers.jl#L258-L276">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.ADADelta" href="#Flux.Optimise.ADADelta"><code>Flux.Optimise.ADADelta</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">ADADelta(ρ)</code></pre><p>Version of ADAGrad that adapts learning rate based on a window of past gradient updates. Parameters don't need tuning.</p><p><strong>Parameters</strong></p><ul><li>Rho (ρ): Factor by which gradient is decayed at each time step. Defaults to <code>0.9</code>.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = ADADelta() # uses default ρ = 0.9
opt = ADADelta(0.89)</code></pre><p><strong>References</strong></p><p><a href="https://arxiv.org/abs/1212.5701">ADADelta</a> optimiser.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ddc2c20e68919faa41a04ed39ed3cfa08d6d5189/src/optimise/optimisers.jl#L291-L307">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.AMSGrad" href="#Flux.Optimise.AMSGrad"><code>Flux.Optimise.AMSGrad</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">AMSGrad(η, β::Tuple)</code></pre><p>Implements AMSGrad version of the ADAM optimiser. Parameters don't need tuning.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (η): Defaults to <code>0.001</code>.</li><li>Beta (β::Tuple): The first element refers to β1 and the second to β2. Defaults to <code>(0.9, 0.999)</code>.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = AMSGrad() # uses default η and β
|
||||
opt = AMSGrad(0.001, (0.89, 0.995))</code></pre><p><strong>References</strong></p><p><a href="https://openreview.net/forum?id=ryQu7f-RZ">AMSGrad</a> optimiser.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ddc2c20e68919faa41a04ed39ed3cfa08d6d5189/src/optimise/optimisers.jl#L324-L341">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.NADAM" href="#Flux.Optimise.NADAM"><code>Flux.Optimise.NADAM</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">NADAM(η, β::Tuple)</code></pre><p>Nesterov variant of ADAM. Parameters don't need tuning.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (η): Defaults to <code>0.001</code>.</li><li>Beta (β::Tuple): The first element refers to β1 and the second to β2. Defaults to <code>(0.9, 0.999)</code>.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = NADAM() # uses default η and β
|
||||
opt = NADAM(0.002, (0.89, 0.995))</code></pre><p><strong>References</strong></p><p><a href="http://cs229.stanford.edu/proj2015/054_report.pdf">NADAM</a> optimiser.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ddc2c20e68919faa41a04ed39ed3cfa08d6d5189/src/optimise/optimisers.jl#L359-L376">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.ADAMW" href="#Flux.Optimise.ADAMW"><code>Flux.Optimise.ADAMW</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-julia">ADAMW(η, β::Tuple, decay)</code></pre><p>Variant of ADAM defined by fixing weight decay regularization.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (η): Defaults to <code>0.001</code>.</li><li>Beta (β::Tuple): The first element refers to β1 and the second to β2. Defaults to (0.9, 0.999).</li><li>decay: Decay applied to weights during optimisation. Defaults to 0.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = ADAMW() # uses default η, β and decay
|
||||
opt = ADAMW(0.001, (0.89, 0.995), 0.1)</code></pre><p><strong>References</strong></p><p><a href="https://arxiv.org/abs/1711.05101">ADAMW</a></p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ddc2c20e68919faa41a04ed39ed3cfa08d6d5189/src/optimise/optimisers.jl#L395-L413">source</a></section><h2><a class="nav-anchor" id="Optimiser-Interface-1" href="#Optimiser-Interface-1">Optimiser Interface</a></h2><p>Flux's optimsers are built around a <code>struct</code> that holds all the optimiser parameters along with a definition of how to apply the update rule associated with it. We do this via the <code>apply!</code> function which takes the optimiser as the first argument followed by the parameter and its corresponding gradient.</p><p>In this manner Flux also allows one to create custom optimisers to be used seamlessly. Let's work this with a simple example.</p><pre><code class="language-julia">mutable struct Momentum
|
||||
opt = ADAGrad(0.001)</code></pre><p><strong>References</strong></p><p><a href="http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf">ADAGrad</a> optimiser. Parameters don't need tuning.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ddab979ea9d062acc9e8e700404fd51997581ada/src/optimise/optimisers.jl#L257-L275">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.ADADelta" href="#Flux.Optimise.ADADelta"><code>Flux.Optimise.ADADelta</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">ADADelta(ρ)</code></pre><p>Version of ADAGrad that adapts its learning rate based on a window of past gradient updates. Parameters don't need tuning.</p><p><strong>Parameters</strong></p><ul><li>Rho (ρ): Factor by which the gradient is decayed at each time step. Defaults to <code>0.9</code>.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = ADADelta() # uses default ρ = 0.9
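# ρ controls how quickly older gradient information is forgotten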
opt = ADADelta(0.89)</code></pre><p><strong>References</strong></p><p><a href="https://arxiv.org/abs/1212.5701">ADADelta</a> optimiser.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ddab979ea9d062acc9e8e700404fd51997581ada/src/optimise/optimisers.jl#L290-L306">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.AMSGrad" href="#Flux.Optimise.AMSGrad"><code>Flux.Optimise.AMSGrad</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">AMSGrad(η, β::Tuple)</code></pre><p>Implements the AMSGrad version of the ADAM optimiser. Parameters don't need tuning.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (η): Defaults to <code>0.001</code>.</li><li>Beta (β::Tuple): The first element refers to β1 and the second to β2. Defaults to <code>(0.9, 0.999)</code>.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = AMSGrad() # uses default η and β
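# keeps a running maximum of the second-moment estimate, so the step size never increases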
opt = AMSGrad(0.001, (0.89, 0.995))</code></pre><p><strong>References</strong></p><p><a href="https://openreview.net/forum?id=ryQu7f-RZ">AMSGrad</a> optimiser.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ddab979ea9d062acc9e8e700404fd51997581ada/src/optimise/optimisers.jl#L323-L340">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.NADAM" href="#Flux.Optimise.NADAM"><code>Flux.Optimise.NADAM</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">NADAM(η, β::Tuple)</code></pre><p>Nesterov variant of ADAM. Parameters don't need tuning.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (η): Defaults to <code>0.001</code>.</li><li>Beta (β::Tuple): The first element refers to β1 and the second to β2. Defaults to <code>(0.9, 0.999)</code>.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = NADAM() # uses default η and β
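# adds a Nesterov-style look-ahead to ADAM's first-moment update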
opt = NADAM(0.002, (0.89, 0.995))</code></pre><p><strong>References</strong></p><p><a href="http://cs229.stanford.edu/proj2015/054_report.pdf">NADAM</a> optimiser.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ddab979ea9d062acc9e8e700404fd51997581ada/src/optimise/optimisers.jl#L358-L375">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.ADAMW" href="#Flux.Optimise.ADAMW"><code>Flux.Optimise.ADAMW</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-julia">ADAMW(η, β::Tuple, decay)</code></pre><p>Variant of ADAM defined by fixing weight decay regularization.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (η): Defaults to <code>0.001</code>.</li><li>Beta (β::Tuple): The first element refers to β1 and the second to β2. Defaults to <code>(0.9, 0.999)</code>.</li><li>decay: Decay applied to weights during optimisation. Defaults to <code>0</code>.</li></ul><p><strong>Examples</strong></p><pre><code class="language-julia">opt = ADAMW() # uses default η, β and decay
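# the weight decay is applied separately from ADAM's adaptive gradient step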
opt = ADAMW(0.001, (0.89, 0.995), 0.1)</code></pre><p><strong>References</strong></p><p><a href="https://arxiv.org/abs/1711.05101">ADAMW</a></p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ddab979ea9d062acc9e8e700404fd51997581ada/src/optimise/optimisers.jl#L394-L412">source</a></section><h2><a class="nav-anchor" id="Optimiser-Interface-1" href="#Optimiser-Interface-1">Optimiser Interface</a></h2><p>Flux's optimisers are built around a <code>struct</code> that holds all the optimiser parameters along with a definition of how to apply the associated update rule. We do this via the <code>apply!</code> function, which takes the optimiser as the first argument, followed by the parameter and its corresponding gradient.</p><p>This also allows one to create custom optimisers that can be used seamlessly. Let's work through a simple example.</p><pre><code class="language-julia">mutable struct Momentum
  eta
  rho
  velocity
end

Momentum(eta::Real, rho::Real) = Momentum(eta, rho, IdDict())</code></pre><p>The <code>Momentum</code> type will act as our optimiser in this case. Notice that we have added all the parameters as fields, along with the velocity which we will use as our state dictionary. Each parameter in our models will get an entry in there.</p><pre><code class="language-julia">function Flux.Optimise.apply!(o::Momentum, x, Δ)
  η, ρ = o.eta, o.rho
  v = get!(o.velocity, x, zero(x))
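  # the velocity is stored in o.velocity keyed by the parameter, so state persists across calls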
  @. v = ρ * v - η * Δ
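  # overwrite the gradient with the negated velocity; update! then performs x .-= Δ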
  @. Δ = -v
end</code></pre><p>This is the basic definition of a Momentum update rule given by:</p><div>\[v = ρ * v - η * Δ
w = w - v\]</div><p>The <code>apply!</code> function defines the update rule for an optimiser <code>opt</code>, given a parameter and its gradient. It returns the updated gradient. Here, the running state for every parameter <code>x</code> is retrieved from <code>v</code> and the optimiser's state is updated in place.</p><p>Flux internally calls this function via <code>update!</code>, which shares the API with <code>apply!</code> but ensures that multiple parameters are handled gracefully.</p>
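<p>As a minimal sketch of how the pieces fit together (the matrices and gradient here are illustrative, not part of the original example):</p><pre><code class="language-julia">opt = Momentum(0.01, 0.9)

W = rand(2, 5)
Δ = rand(2, 5)   # a made-up gradient for W

Flux.Optimise.apply!(opt, W, Δ)  # mutates Δ into the momentum step
W .-= Δ                          # apply the step, as update! would</code></pre>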
<h2><a class="nav-anchor" id="Composing-Optimisers-1" href="#Composing-Optimisers-1">Composing Optimisers</a></h2><p>Flux defines a special kind of optimiser, simply called <code>Optimiser</code>, which takes arbitrary optimisers as input. Its behaviour is similar to the usual optimisers, but it acts by calling the optimisers listed in it sequentially. Each optimiser produces a modified gradient that is fed into the next, and the resulting update is applied to the parameter as usual. A classic use case is adding decays: Flux defines some basic decays including <code>ExpDecay</code>, <code>InvDecay</code> etc.</p><pre><code class="language-julia">opt = Optimiser(ExpDecay(0.001, 0.1, 1000, 1e-4), Descent())</code></pre><p>Here we apply exponential decay to the <code>Descent</code> optimiser. The defaults of <code>ExpDecay</code> mean that its learning rate is decayed every 1000 steps. The composition is then applied like any other optimiser.</p><pre><code class="language-julia">w = randn(10, 10)
w1 = randn(10, 10)
ps = Params([w, w1])
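# minimise the mismatch between the two linear maps on random inputs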
loss(x) = Flux.mse(w * x, w1 * x)

loss(rand(10)) # around 9

for t = 1:10^5
  θ = Params([w, w1])
  θ̄ = gradient(() -> loss(rand(10)), θ)
  Flux.Optimise.update!(opt, θ, θ̄)
end
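# after training, the composed ExpDecay and Descent steps have brought the loss down: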
loss(rand(10)) # around 0.9</code></pre><p>In this manner it is possible to compose optimisers for some added flexibility.</p><h2><a class="nav-anchor" id="Decays-1" href="#Decays-1">Decays</a></h2><p>Similar to optimisers, Flux also defines some simple decays that can be used in conjunction with other optimisers, or standalone.</p><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.ExpDecay" href="#Flux.Optimise.ExpDecay"><code>Flux.Optimise.ExpDecay</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">ExpDecay(eta, decay, decay_step, clip)</code></pre><p>Discounts the learning rate <code>eta</code> by a multiplicative factor <code>decay</code> every <code>decay_step</code> steps, until a minimum of <code>clip</code> is reached.</p><p><strong>Parameters</strong></p><ul><li>Learning Rate (eta): Defaults to <code>0.001</code>.</li><li>decay: Factor by which the learning rate is discounted. Defaults to <code>0.1</code>.</li><li>decay_step: Number of steps between two decay operations. Defaults to <code>1000</code>.</li><li>clip: Minimum value of the learning rate. Defaults to <code>1e-4</code>.</li></ul><p><strong>Example</strong></p><p>To apply exponential decay to an optimiser:</p><pre><code class="language-julia">Optimiser(ExpDecay(..), Opt(..))
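# with explicit arguments (hypothetical values): start at 0.01,
# decay by 0.5 every 500 steps, with a floor of 1e-5:
# opt = Optimiser(ExpDecay(0.01, 0.5, 500, 1e-5), ADAM())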
opt = Optimiser(ExpDecay(), ADAM())</code></pre></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ddab979ea9d062acc9e8e700404fd51997581ada/src/optimise/optimisers.jl#L471-L488">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.InvDecay" href="#Flux.Optimise.InvDecay"><code>Flux.Optimise.InvDecay</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">InvDecay(γ)</code></pre><p>Applies inverse time decay to an optimiser, i.e., the effective step size at iteration <code>n</code> is <code>eta / (1 + γ * n)</code> where <code>eta</code> is the initial step size. The wrapped optimiser's step size is not modified.</p><p><strong>Parameters</strong></p><ul><li>gamma (γ): Defaults to <code>0.001</code></li></ul><p><strong>Example</strong></p><pre><code class="language-julia">Optimiser(InvDecay(..), Opt(..))</code></pre></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ddab979ea9d062acc9e8e700404fd51997581ada/src/optimise/optimisers.jl#L443-L455">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.WeightDecay" href="#Flux.Optimise.WeightDecay"><code>Flux.Optimise.WeightDecay</code></a> — <span class="docstring-category">Type</span>.</div><div><div><pre><code class="language-julia">WeightDecay(wd)</code></pre><p>Decays the weights by <code>wd</code>: the term <code>wd .* x</code> is added to the gradient of each parameter <code>x</code>.</p><p><strong>Parameters</strong></p><ul><li>weight decay (wd): Defaults to <code>0</code></li></ul></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/ddab979ea9d062acc9e8e700404fd51997581ada/src/optimise/optimisers.jl#L509-L516">source</a></section><footer><hr/><a class="previous" href="../../data/dataloader/"><span class="direction">Previous</span><span class="title">DataLoader</span></a><a class="next" href="../training/"><span class="direction">Next</span><span class="title">Training</span></a></footer></article></body></html>