build based on 08fb9b7
parent e38c9c1e2a
commit 7ccda2d2b5
@@ -11,14 +11,14 @@ ga('send', 'pageview');
 f(x) = 3x^2 + 2x + 1
 
 # df/dx = 6x + 2
-df(x) = Tracker.gradient(f, x)[1]
+f′(x) = Tracker.gradient(f, x)[1]
 
-df(2) # 14.0 (tracked)
+f′(2) # 14.0 (tracked)
 
 # d²f/dx² = 6
-d2f(x) = Tracker.gradient(df, x)[1]
+f′′(x) = Tracker.gradient(f′, x)[1]
 
-d2f(2) # 6.0 (tracked)</code></pre><p>(We'll learn more about why these numbers show up as <code>(tracked)</code> below.)</p><p>When a function has many parameters, we can pass them all in explicitly:</p><pre><code class="language-julia">f(W, b, x) = W * x + b
+f′′(2) # 6.0 (tracked)</code></pre><p>(We'll learn more about why these numbers show up as <code>(tracked)</code> below.)</p><p>When a function has many parameters, we can pass them all in explicitly:</p><pre><code class="language-julia">f(W, b, x) = W * x + b
 
 Tracker.gradient(f, 2, 3, 4)
 (4.0 (tracked), 1.0, 2.0 (tracked))</code></pre><p>But machine learning models can have <em>hundreds</em> of parameters! Flux offers a nice way to handle this. We can tell Flux to treat something as a parameter via <code>param</code>. Then we can collect these together and tell <code>gradient</code> to collect the gradients of all of them at once.</p><pre><code class="language-julia">W = param(2) # 2.0 (tracked)
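The rename above (df → f′, d2f → f′′) does not change behaviour; for reference, the snippet runs as a self-contained session under the pre-0.7 Flux Tracker API roughly as follows. This is a sketch assembled from the docs text, not part of the build output; the three-argument function is renamed g here so it does not overwrite the one-argument f that the docs reuse.

```julia
using Flux.Tracker

f(x) = 3x^2 + 2x + 1

# df/dx = 6x + 2; gradient returns one entry per argument of f
f′(x) = Tracker.gradient(f, x)[1]
f′(2)    # 14.0 (tracked)

# d²f/dx² = 6, i.e. the derivative of f′
f′′(x) = Tracker.gradient(f′, x)[1]
f′′(2)   # 6.0 (tracked)

# With several explicit arguments we get one gradient per argument
# (the docs call this function f as well; renamed to g to keep f intact)
g(W, b, x) = W * x + b
Tracker.gradient(g, 2, 3, 4)   # (4.0 (tracked), 1.0, 2.0 (tracked))
```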
File diff suppressed because one or more lines are too long
@@ -53,7 +53,7 @@ var documenterSearchIndex = {"docs": [
 "page": "Basics",
 "title": "Taking Gradients",
 "category": "section",
-"text": "Flux\'s core feature is taking gradients of Julia code. The gradient function takes another Julia function f and a set of arguments, and returns the gradient with respect to each argument. (It\'s a good idea to try pasting these examples in the Julia terminal.)using Flux.Tracker\n\nf(x) = 3x^2 + 2x + 1\n\n# df/dx = 6x + 2\ndf(x) = Tracker.gradient(f, x)[1]\n\ndf(2) # 14.0 (tracked)\n\n# d²f/dx² = 6\nd2f(x) = Tracker.gradient(df, x)[1]\n\nd2f(2) # 6.0 (tracked)(We\'ll learn more about why these numbers show up as (tracked) below.)When a function has many parameters, we can pass them all in explicitly:f(W, b, x) = W * x + b\n\nTracker.gradient(f, 2, 3, 4)\n(4.0 (tracked), 1.0, 2.0 (tracked))But machine learning models can have hundreds of parameters! Flux offers a nice way to handle this. We can tell Flux to treat something as a parameter via param. Then we can collect these together and tell gradient to collect the gradients of all of them at once.W = param(2) # 2.0 (tracked)\nb = param(3) # 3.0 (tracked)\n\nf(x) = W * x + b\n\nparams = Params([W, b])\ngrads = Tracker.gradient(() -> f(4), params)\n\ngrads[W] # 4.0\ngrads[b] # 1.0There are a few things to notice here. Firstly, W and b now show up as tracked. Tracked things behave like normal numbers or arrays, but keep records of everything you do with them, allowing Flux to calculate their gradients. gradient takes a zero-argument function; no arguments are necessary because the Params tell it what to differentiate.This will come in really handy when dealing with big, complicated models. For now, though, let\'s start with something simple."
+"text": "Flux\'s core feature is taking gradients of Julia code. The gradient function takes another Julia function f and a set of arguments, and returns the gradient with respect to each argument. (It\'s a good idea to try pasting these examples in the Julia terminal.)using Flux.Tracker\n\nf(x) = 3x^2 + 2x + 1\n\n# df/dx = 6x + 2\nf′(x) = Tracker.gradient(f, x)[1]\n\nf′(2) # 14.0 (tracked)\n\n# d²f/dx² = 6\nf′′(x) = Tracker.gradient(f′, x)[1]\n\nf′′(2) # 6.0 (tracked)(We\'ll learn more about why these numbers show up as (tracked) below.)When a function has many parameters, we can pass them all in explicitly:f(W, b, x) = W * x + b\n\nTracker.gradient(f, 2, 3, 4)\n(4.0 (tracked), 1.0, 2.0 (tracked))But machine learning models can have hundreds of parameters! Flux offers a nice way to handle this. We can tell Flux to treat something as a parameter via param. Then we can collect these together and tell gradient to collect the gradients of all of them at once.W = param(2) # 2.0 (tracked)\nb = param(3) # 3.0 (tracked)\n\nf(x) = W * x + b\n\nparams = Params([W, b])\ngrads = Tracker.gradient(() -> f(4), params)\n\ngrads[W] # 4.0\ngrads[b] # 1.0There are a few things to notice here. Firstly, W and b now show up as tracked. Tracked things behave like normal numbers or arrays, but keep records of everything you do with them, allowing Flux to calculate their gradients. gradient takes a zero-argument function; no arguments are necessary because the Params tell it what to differentiate.This will come in really handy when dealing with big, complicated models. For now, though, let\'s start with something simple."
 },
 
 {
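The param/Params workflow described in that "text" entry can be tried on its own; a minimal sketch under the same pre-0.7 Tracker API (the `using` lines are an assumption, since the docs assume Flux is already loaded):

```julia
using Flux, Flux.Tracker

W = param(2)   # 2.0 (tracked)
b = param(3)   # 3.0 (tracked)

f(x) = W * x + b

# Tracked values record every operation applied to them, which is what
# lets Flux work out their gradients later.
ps = Params([W, b])

# gradient takes a zero-argument closure here; the Params say what to
# differentiate with respect to.
grads = Tracker.gradient(() -> f(4), ps)

grads[W]   # 4.0
grads[b]   # 1.0
```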
@@ -29,4 +29,4 @@ end</code></pre><p>If we call <code>sgd</code>, the parameters <code>W</code> an
 Dense(10, 5, σ),
 Dense(5, 2), softmax)</code></pre><p>Instead of having to write <code>[m[1].W, m[1].b, ...]</code>, Flux provides a params function <code>params(m)</code> that returns a list of all parameters in the model for you.</p><p>For the update step, there's nothing whatsoever wrong with writing the loop above – it'll work just fine – but Flux provides various <em>optimisers</em> that make it more convenient.</p><pre><code class="language-julia">opt = SGD([W, b], 0.1) # Gradient descent with learning rate 0.1
 
-opt() # Carry out the update, modifying `W` and `b`.</code></pre><p>An optimiser takes a parameter list and returns a function that does the same thing as <code>update</code> above. We can pass either <code>opt</code> or <code>update</code> to our <a href="training.html">training loop</a>, which will then run the optimiser after every mini-batch of data.</p><h2><a class="nav-anchor" id="Optimiser-Reference-1" href="#Optimiser-Reference-1">Optimiser Reference</a></h2><p>All optimisers return a function that, when called, will update the parameters passed to it.</p><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.SGD" href="#Flux.Optimise.SGD"><code>Flux.Optimise.SGD</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">SGD(params, η = 0.1; decay = 0)</code></pre><p>Classic gradient descent optimiser with learning rate <code>η</code>. For each parameter <code>p</code> and its gradient <code>δp</code>, this runs <code>p -= η*δp</code>.</p><p>Supports inverse decaying learning rate if the <code>decay</code> argument is provided.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/9d4ee1b3aab2c6b2a2da8850062acea66b8a2e33/src/optimise/interface.jl#L14-L21">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.Momentum" href="#Flux.Optimise.Momentum"><code>Flux.Optimise.Momentum</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">Momentum(params, η = 0.01; ρ = 0.9, decay = 0)</code></pre><p>SGD with learning rate <code>η</code>, momentum <code>ρ</code> and optional learning rate inverse decay.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/9d4ee1b3aab2c6b2a2da8850062acea66b8a2e33/src/optimise/interface.jl#L25-L29">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.Nesterov" href="#Flux.Optimise.Nesterov"><code>Flux.Optimise.Nesterov</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">Nesterov(params, η = 0.01; ρ = 0.9, decay = 0)</code></pre><p>SGD with learning rate <code>η</code>, Nesterov momentum <code>ρ</code> and optional learning rate inverse decay.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/9d4ee1b3aab2c6b2a2da8850062acea66b8a2e33/src/optimise/interface.jl#L33-L37">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.ADAM" href="#Flux.Optimise.ADAM"><code>Flux.Optimise.ADAM</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">ADAM(params, η = 0.001; β1 = 0.9, β2 = 0.999, ϵ = 1e-08, decay = 0)</code></pre><p><a href="https://arxiv.org/abs/1412.6980v8">ADAM</a> optimiser.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/9d4ee1b3aab2c6b2a2da8850062acea66b8a2e33/src/optimise/interface.jl#L51-L55">source</a></section><footer><hr/><a class="previous" href="../models/layers.html"><span class="direction">Previous</span><span class="title">Model Reference</span></a><a class="next" href="training.html"><span class="direction">Next</span><span class="title">Training</span></a></footer></article></body></html>
+opt() # Carry out the update, modifying `W` and `b`.</code></pre><p>An optimiser takes a parameter list and returns a function that does the same thing as <code>update</code> above. We can pass either <code>opt</code> or <code>update</code> to our <a href="training.html">training loop</a>, which will then run the optimiser after every mini-batch of data.</p><h2><a class="nav-anchor" id="Optimiser-Reference-1" href="#Optimiser-Reference-1">Optimiser Reference</a></h2><p>All optimisers return a function that, when called, will update the parameters passed to it.</p><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.SGD" href="#Flux.Optimise.SGD"><code>Flux.Optimise.SGD</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">SGD(params, η = 0.1; decay = 0)</code></pre><p>Classic gradient descent optimiser with learning rate <code>η</code>. For each parameter <code>p</code> and its gradient <code>δp</code>, this runs <code>p -= η*δp</code>.</p><p>Supports inverse decaying learning rate if the <code>decay</code> argument is provided.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/08fb9b7df1988818b5a1f77da2ce3cba5d900cd4/src/optimise/interface.jl#L14-L21">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.Momentum" href="#Flux.Optimise.Momentum"><code>Flux.Optimise.Momentum</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">Momentum(params, η = 0.01; ρ = 0.9, decay = 0)</code></pre><p>SGD with learning rate <code>η</code>, momentum <code>ρ</code> and optional learning rate inverse decay.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/08fb9b7df1988818b5a1f77da2ce3cba5d900cd4/src/optimise/interface.jl#L25-L29">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.Nesterov" href="#Flux.Optimise.Nesterov"><code>Flux.Optimise.Nesterov</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">Nesterov(params, η = 0.01; ρ = 0.9, decay = 0)</code></pre><p>SGD with learning rate <code>η</code>, Nesterov momentum <code>ρ</code> and optional learning rate inverse decay.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/08fb9b7df1988818b5a1f77da2ce3cba5d900cd4/src/optimise/interface.jl#L33-L37">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.ADAM" href="#Flux.Optimise.ADAM"><code>Flux.Optimise.ADAM</code></a> — <span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">ADAM(params, η = 0.001; β1 = 0.9, β2 = 0.999, ϵ = 1e-08, decay = 0)</code></pre><p><a href="https://arxiv.org/abs/1412.6980v8">ADAM</a> optimiser.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/08fb9b7df1988818b5a1f77da2ce3cba5d900cd4/src/optimise/interface.jl#L51-L55">source</a></section><footer><hr/><a class="previous" href="../models/layers.html"><span class="direction">Previous</span><span class="title">Model Reference</span></a><a class="next" href="training.html"><span class="direction">Next</span><span class="title">Training</span></a></footer></article></body></html>
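For orientation, the optimiser interface documented in this hunk can be exercised as below. This is a hedged sketch using only the constructor signatures shown above; in a real training loop you would first accumulate gradients onto the tracked parameters (for example with Tracker's back! on a tracked loss) before each opt() call.

```julia
using Flux, Flux.Tracker

W = param(2)   # 2.0 (tracked)
b = param(3)   # 3.0 (tracked)

# Constructor signatures as documented above: each takes the parameter
# list plus a learning rate and returns a function.
opt = SGD([W, b], 0.1)           # classic gradient descent: p -= η*δp
# opt = Momentum([W, b], 0.01)   # SGD with momentum ρ = 0.9 by default
# opt = Nesterov([W, b], 0.01)   # SGD with Nesterov momentum
# opt = ADAM([W, b], 0.001)      # ADAM with default β1, β2, ϵ

# Calling the optimiser carries out one update step, modifying `W` and
# `b` in place using whatever gradients have been accumulated on them.
opt()
```

For a whole model, the docs note that `params(m)` returns every parameter, so `SGD(params(m), 0.1)` covers all layers at once.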