# Optimisers

Consider a [simple linear regression](../../models/basics/). We create some dummy data, calculate a loss, and backpropagate to calculate gradients for the parameters `W` and `b`.

```julia
using Flux, Flux.Tracker

W = param(rand(2, 5))
b = param(rand(2))

predict(x) = W*x .+ b
loss(x, y) = sum((predict(x) .- y).^2)

x, y = rand(5), rand(2)
l = loss(x, y) # loss for the dummy data

θ = Params([W, b])
grads = Tracker.gradient(() -> loss(x, y), θ)
```

We want to update each parameter, using the gradient, in order to improve (reduce) the loss. Here's one way to do that:

```julia
using Flux.Tracker: grad, update!

η = 0.1 # learning rate
for p in (W, b)
  update!(p, -η * grads[p])
end
```

Running this will alter the parameters `W` and `b` and our loss should go down. Flux provides a more general way to do optimiser updates like this.

```julia
opt = Descent(0.1) # gradient descent with learning rate 0.1

for p in (W, b)
  update!(opt, p, grads[p])
end
```

An optimiser `update!` accepts a parameter and a gradient, and updates the parameter according to the chosen rule. We can also pass `opt` to our [training loop](../training/), which will update all parameters of the model in a loop. However, we can now easily replace `Descent` with a more advanced optimiser such as `ADAM`.
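For example, only the optimiser object changes when we switch rules. A minimal sketch, continuing from the snippets above (it reuses `W`, `b`, `loss`, `x`, `y` and `θ`; the toy `data = [(x, y)]` set and the `train!(loss, θ, data, opt)` call are illustrative, following the training-loop page rather than anything shown on this page):

```julia
opt = ADAM(0.001)                 # adaptive optimiser in place of plain Descent

grads = Tracker.gradient(() -> loss(x, y), θ)
for p in (W, b)
  update!(opt, p, grads[p])       # same call pattern as with Descent above
end

# Equivalently, let the training loop drive the updates: train! computes
# gradients of `loss` for each (x, y) pair in `data` and applies `opt`.
data = [(x, y)]
Flux.train!(loss, θ, data, opt)
```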
## Optimiser Reference

All optimisers return an object that, when passed to `train!`, will update the parameters passed to it.

### `Flux.Optimise.Descent` — Type

```julia
Descent(η)
```

Classic gradient descent optimiser with learning rate `η`. For each parameter `p` and its gradient `δp`, this runs `p -= η*δp`.

[source](https://github.com/FluxML/Flux.jl/blob/b8e06ef3b750369bbe91309351e90384b3e829f5/src/optimise/optimisers.jl#L9-L14)

### `Flux.Optimise.Momentum` — Type

```julia
Momentum(η = 0.01; ρ = 0.9)
```

Gradient descent with learning rate `η` and momentum `ρ`.

[source](https://github.com/FluxML/Flux.jl/blob/b8e06ef3b750369bbe91309351e90384b3e829f5/src/optimise/optimisers.jl#L25-L29)

### `Flux.Optimise.Nesterov` — Type

```julia
Nesterov(eta, ρ = 0.9)
```

Gradient descent with learning rate `η` and Nesterov momentum `ρ`.

[source](https://github.com/FluxML/Flux.jl/blob/b8e06ef3b750369bbe91309351e90384b3e829f5/src/optimise/optimisers.jl#L45-L49)

### `Flux.Optimise.RMSProp` — Type

```julia
RMSProp(η = 0.001, ρ = 0.9)
```

[RMSProp](https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf) optimiser. Parameters other than learning rate don't need tuning. Often a good choice for recurrent networks.

[source](https://github.com/FluxML/Flux.jl/blob/b8e06ef3b750369bbe91309351e90384b3e829f5/src/optimise/optimisers.jl#L66-L72)

### `Flux.Optimise.ADAM` — Type

```julia
ADAM(η = 0.001, β = (0.9, 0.999))
```

[ADAM](https://arxiv.org/abs/1412.6980v8) optimiser.

[source](https://github.com/FluxML/Flux.jl/blob/b8e06ef3b750369bbe91309351e90384b3e829f5/src/optimise/optimisers.jl#L88-L92)

### `Flux.Optimise.AdaMax` — Type