build based on 08fb9b7

This commit is contained in:
autodocs 2018-09-14 19:35:18 +00:00
parent e38c9c1e2a
commit 7ccda2d2b5
4 changed files with 12 additions and 12 deletions

View File

@ -11,14 +11,14 @@ ga('send', 'pageview');
f(x) = 3x^2 + 2x + 1
# df/dx = 6x + 2
df(x) = Tracker.gradient(f, x)[1]
f(x) = Tracker.gradient(f, x)[1]
df(2) # 14.0 (tracked)
f(2) # 14.0 (tracked)
# d²f/dx² = 6
d2f(x) = Tracker.gradient(df, x)[1]
f(x) = Tracker.gradient(f, x)[1]
d2f(2) # 6.0 (tracked)</code></pre><p>(We&#39;ll learn more about why these numbers show up as <code>(tracked)</code> below.)</p><p>When a function has many parameters, we can pass them all in explicitly:</p><pre><code class="language-julia">f(W, b, x) = W * x + b
f(2) # 6.0 (tracked)</code></pre><p>(We&#39;ll learn more about why these numbers show up as <code>(tracked)</code> below.)</p><p>When a function has many parameters, we can pass them all in explicitly:</p><pre><code class="language-julia">f(W, b, x) = W * x + b
Tracker.gradient(f, 2, 3, 4)
(4.0 (tracked), 1.0, 2.0 (tracked))</code></pre><p>But machine learning models can have <em>hundreds</em> of parameters! Flux offers a nice way to handle this. We can tell Flux to treat something as a parameter via <code>param</code>. Then we can collect these together and tell <code>gradient</code> to collect the gradients of all of them at once.</p><pre><code class="language-julia">W = param(2) # 2.0 (tracked)

File diff suppressed because one or more lines are too long

View File

@ -53,7 +53,7 @@ var documenterSearchIndex = {"docs": [
"page": "Basics",
"title": "Taking Gradients",
"category": "section",
"text": "Flux\'s core feature is taking gradients of Julia code. The gradient function takes another Julia function f and a set of arguments, and returns the gradient with respect to each argument. (It\'s a good idea to try pasting these examples in the Julia terminal.)using Flux.Tracker\n\nf(x) = 3x^2 + 2x + 1\n\n# df/dx = 6x + 2\ndf(x) = Tracker.gradient(f, x)[1]\n\ndf(2) # 14.0 (tracked)\n\n# d²f/dx² = 6\nd2f(x) = Tracker.gradient(df, x)[1]\n\nd2f(2) # 6.0 (tracked)(We\'ll learn more about why these numbers show up as (tracked) below.)When a function has many parameters, we can pass them all in explicitly:f(W, b, x) = W * x + b\n\nTracker.gradient(f, 2, 3, 4)\n(4.0 (tracked), 1.0, 2.0 (tracked))But machine learning models can have hundreds of parameters! Flux offers a nice way to handle this. We can tell Flux to treat something as a parameter via param. Then we can collect these together and tell gradient to collect the gradients of all of them at once.W = param(2) # 2.0 (tracked)\nb = param(3) # 3.0 (tracked)\n\nf(x) = W * x + b\n\nparams = Params([W, b])\ngrads = Tracker.gradient(() -> f(4), params)\n\ngrads[W] # 4.0\ngrads[b] # 1.0There are a few things to notice here. Firstly, W and b now show up as tracked. Tracked things behave like normal numbers or arrays, but keep records of everything you do with them, allowing Flux to calculate their gradients. gradient takes a zero-argument function; no arguments are necessary because the Params tell it what to differentiate.This will come in really handy when dealing with big, complicated models. For now, though, let\'s start with something simple."
"text": "Flux\'s core feature is taking gradients of Julia code. The gradient function takes another Julia function f and a set of arguments, and returns the gradient with respect to each argument. (It\'s a good idea to try pasting these examples in the Julia terminal.)using Flux.Tracker\n\nf(x) = 3x^2 + 2x + 1\n\n# df/dx = 6x + 2\nf(x) = Tracker.gradient(f, x)[1]\n\nf(2) # 14.0 (tracked)\n\n# d²f/dx² = 6\nf(x) = Tracker.gradient(f, x)[1]\n\nf(2) # 6.0 (tracked)(We\'ll learn more about why these numbers show up as (tracked) below.)When a function has many parameters, we can pass them all in explicitly:f(W, b, x) = W * x + b\n\nTracker.gradient(f, 2, 3, 4)\n(4.0 (tracked), 1.0, 2.0 (tracked))But machine learning models can have hundreds of parameters! Flux offers a nice way to handle this. We can tell Flux to treat something as a parameter via param. Then we can collect these together and tell gradient to collect the gradients of all of them at once.W = param(2) # 2.0 (tracked)\nb = param(3) # 3.0 (tracked)\n\nf(x) = W * x + b\n\nparams = Params([W, b])\ngrads = Tracker.gradient(() -> f(4), params)\n\ngrads[W] # 4.0\ngrads[b] # 1.0There are a few things to notice here. Firstly, W and b now show up as tracked. Tracked things behave like normal numbers or arrays, but keep records of everything you do with them, allowing Flux to calculate their gradients. gradient takes a zero-argument function; no arguments are necessary because the Params tell it what to differentiate.This will come in really handy when dealing with big, complicated models. For now, though, let\'s start with something simple."
},
{

View File

@ -29,4 +29,4 @@ end</code></pre><p>If we call <code>sgd</code>, the parameters <code>W</code> an
Dense(10, 5, σ),
Dense(5, 2), softmax)</code></pre><p>Instead of having to write <code>[m[1].W, m[1].b, ...]</code>, Flux provides a params function <code>params(m)</code> that returns a list of all parameters in the model for you.</p><p>For the update step, there&#39;s nothing whatsoever wrong with writing the loop above it&#39;ll work just fine but Flux provides various <em>optimisers</em> that make it more convenient.</p><pre><code class="language-julia">opt = SGD([W, b], 0.1) # Gradient descent with learning rate 0.1
opt() # Carry out the update, modifying `W` and `b`.</code></pre><p>An optimiser takes a parameter list and returns a function that does the same thing as <code>update</code> above. We can pass either <code>opt</code> or <code>update</code> to our <a href="training.html">training loop</a>, which will then run the optimiser after every mini-batch of data.</p><h2><a class="nav-anchor" id="Optimiser-Reference-1" href="#Optimiser-Reference-1">Optimiser Reference</a></h2><p>All optimisers return a function that, when called, will update the parameters passed to it.</p><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.SGD" href="#Flux.Optimise.SGD"><code>Flux.Optimise.SGD</code></a><span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">SGD(params, η = 0.1; decay = 0)</code></pre><p>Classic gradient descent optimiser with learning rate <code>η</code>. For each parameter <code>p</code> and its gradient <code>δp</code>, this runs <code>p -= η*δp</code>.</p><p>Supports inverse decaying learning rate if the <code>decay</code> argument is provided.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/9d4ee1b3aab2c6b2a2da8850062acea66b8a2e33/src/optimise/interface.jl#L14-L21">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.Momentum" href="#Flux.Optimise.Momentum"><code>Flux.Optimise.Momentum</code></a><span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">Momentum(params, η = 0.01; ρ = 0.9, decay = 0)</code></pre><p>SGD with learning rate <code>η</code>, momentum <code>ρ</code> and optional learning rate inverse decay.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/9d4ee1b3aab2c6b2a2da8850062acea66b8a2e33/src/optimise/interface.jl#L25-L29">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.Nesterov" href="#Flux.Optimise.Nesterov"><code>Flux.Optimise.Nesterov</code></a><span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">Nesterov(params, η = 0.01; ρ = 0.9, decay = 0)</code></pre><p>SGD with learning rate <code>η</code>, Nesterov momentum <code>ρ</code> and optional learning rate inverse decay.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/9d4ee1b3aab2c6b2a2da8850062acea66b8a2e33/src/optimise/interface.jl#L33-L37">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.ADAM" href="#Flux.Optimise.ADAM"><code>Flux.Optimise.ADAM</code></a><span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">ADAM(params, η = 0.001; β1 = 0.9, β2 = 0.999, ϵ = 1e-08, decay = 0)</code></pre><p><a href="https://arxiv.org/abs/1412.6980v8">ADAM</a> optimiser.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/9d4ee1b3aab2c6b2a2da8850062acea66b8a2e33/src/optimise/interface.jl#L51-L55">source</a></section><footer><hr/><a class="previous" href="../models/layers.html"><span class="direction">Previous</span><span class="title">Model Reference</span></a><a class="next" href="training.html"><span class="direction">Next</span><span class="title">Training</span></a></footer></article></body></html>
opt() # Carry out the update, modifying `W` and `b`.</code></pre><p>An optimiser takes a parameter list and returns a function that does the same thing as <code>update</code> above. We can pass either <code>opt</code> or <code>update</code> to our <a href="training.html">training loop</a>, which will then run the optimiser after every mini-batch of data.</p><h2><a class="nav-anchor" id="Optimiser-Reference-1" href="#Optimiser-Reference-1">Optimiser Reference</a></h2><p>All optimisers return a function that, when called, will update the parameters passed to it.</p><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.SGD" href="#Flux.Optimise.SGD"><code>Flux.Optimise.SGD</code></a><span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">SGD(params, η = 0.1; decay = 0)</code></pre><p>Classic gradient descent optimiser with learning rate <code>η</code>. For each parameter <code>p</code> and its gradient <code>δp</code>, this runs <code>p -= η*δp</code>.</p><p>Supports inverse decaying learning rate if the <code>decay</code> argument is provided.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/08fb9b7df1988818b5a1f77da2ce3cba5d900cd4/src/optimise/interface.jl#L14-L21">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.Momentum" href="#Flux.Optimise.Momentum"><code>Flux.Optimise.Momentum</code></a><span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">Momentum(params, η = 0.01; ρ = 0.9, decay = 0)</code></pre><p>SGD with learning rate <code>η</code>, momentum <code>ρ</code> and optional learning rate inverse decay.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/08fb9b7df1988818b5a1f77da2ce3cba5d900cd4/src/optimise/interface.jl#L25-L29">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.Nesterov" href="#Flux.Optimise.Nesterov"><code>Flux.Optimise.Nesterov</code></a><span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">Nesterov(params, η = 0.01; ρ = 0.9, decay = 0)</code></pre><p>SGD with learning rate <code>η</code>, Nesterov momentum <code>ρ</code> and optional learning rate inverse decay.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/08fb9b7df1988818b5a1f77da2ce3cba5d900cd4/src/optimise/interface.jl#L33-L37">source</a></section><section class="docstring"><div class="docstring-header"><a class="docstring-binding" id="Flux.Optimise.ADAM" href="#Flux.Optimise.ADAM"><code>Flux.Optimise.ADAM</code></a><span class="docstring-category">Function</span>.</div><div><div><pre><code class="language-none">ADAM(params, η = 0.001; β1 = 0.9, β2 = 0.999, ϵ = 1e-08, decay = 0)</code></pre><p><a href="https://arxiv.org/abs/1412.6980v8">ADAM</a> optimiser.</p></div></div><a class="source-link" target="_blank" href="https://github.com/FluxML/Flux.jl/blob/08fb9b7df1988818b5a1f77da2ce3cba5d900cd4/src/optimise/interface.jl#L51-L55">source</a></section><footer><hr/><a class="previous" href="../models/layers.html"><span class="direction">Previous</span><span class="title">Model Reference</span></a><a class="next" href="training.html"><span class="direction">Next</span><span class="title">Training</span></a></footer></article></body></html>