<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8"/>
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
<title>
Model Templates · Flux
</title>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-36890222-9', 'auto');
ga('send', 'pageview');
</script>
<link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/>
<link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.5.0/styles/default.min.css" rel="stylesheet" type="text/css"/>
<link href="https://fonts.googleapis.com/css?family=Lato|Ubuntu+Mono" rel="stylesheet" type="text/css"/>
<link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/>
<link href="../assets/documenter.css" rel="stylesheet" type="text/css"/>
<script>
documenterBaseURL=".."
</script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../assets/documenter.js"></script>
<script src="../../versions.js"></script>
<link href="../../flux.css" rel="stylesheet" type="text/css"/>
</head>
<body>
<nav class="toc">
<h1>
Flux
</h1>
<form class="search" action="../search.html">
<select id="version-selector" onChange="window.location.href=this.value">
<option value="#" selected="selected" disabled="disabled">
Version
</option>
</select>
<input id="search-query" name="q" type="text" placeholder="Search docs"/>
</form>
<ul>
<li>
<a class="toctext" href="../index.html">
Home
</a>
</li>
<li>
<span class="toctext">
Building Models
</span>
<ul>
<li>
<a class="toctext" href="basics.html">
Model Building Basics
</a>
</li>
<li class="current">
<a class="toctext" href="templates.html">
Model Templates
</a>
<ul class="internal">
<li>
<a class="toctext" href="#Models-in-templates-1">
Models in templates
</a>
</li>
<li>
<a class="toctext" href="#Constructors-1">
Constructors
</a>
</li>
<li>
<a class="toctext" href="#Supported-syntax-1">
Supported syntax
</a>
</li>
</ul>
</li>
<li>
<a class="toctext" href="recurrent.html">
Recurrence
</a>
</li>
<li>
<a class="toctext" href="debugging.html">
Debugging
</a>
</li>
</ul>
</li>
<li>
<span class="toctext">
Other APIs
</span>
<ul>
<li>
<a class="toctext" href="../apis/batching.html">
Batching
</a>
</li>
<li>
<a class="toctext" href="../apis/backends.html">
Backends
</a>
</li>
<li>
<a class="toctext" href="../apis/storage.html">
Storing Models
</a>
</li>
</ul>
</li>
<li>
<span class="toctext">
In Action
</span>
<ul>
<li>
<a class="toctext" href="../examples/logreg.html">
Simple MNIST
</a>
</li>
<li>
<a class="toctext" href="../examples/char-rnn.html">
Char RNN
</a>
</li>
</ul>
</li>
<li>
<a class="toctext" href="../contributing.html">
Contributing &amp; Help
</a>
</li>
<li>
<a class="toctext" href="../internals.html">
Internals
</a>
</li>
</ul>
</nav>
<article id="docs">
<header>
<nav>
<ul>
<li>
Building Models
</li>
<li>
<a href="templates.html">
Model Templates
</a>
</li>
</ul>
<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/efcb9650da31c183b94b839f66aa3467d007c33f/docs/src/models/templates.md">
<span class="fa">
</span>
Edit on GitHub
</a>
</nav>
<hr/>
</header>
<h1>
<a class="nav-anchor" id="Model-Templates-1" href="#Model-Templates-1">
Model Templates
</a>
</h1>
<p>
<em>
... Calculating Tax Expenses ...
</em>
</p>
<p>
So how does the
<code>Affine</code>
template work? We don&#39;t want to repeat code like the following every time we need another affine layer:
</p>
<pre><code class="language-julia">W₁, b₁ = randn(...)
affine₁(x) = W₁*x + b₁
W₂, b₂ = randn(...)
affine₂(x) = W₂*x + b₂
model = Chain(affine₁, affine₂)</code></pre>
<p>
Here&#39;s one way we could solve this: just keep the parameters in a Julia type, and define how that type acts as a function:
</p>
<pre><code class="language-julia">type MyAffine
  W
  b
end

# Use the `MyAffine` layer as a model
(l::MyAffine)(x) = l.W * x + l.b

# Convenience constructor
MyAffine(in::Integer, out::Integer) =
  MyAffine(randn(out, in), randn(out))

model = Chain(MyAffine(5, 5), MyAffine(5, 5))

# `x1` is any length-5 input vector
model(x1) # [-1.54458,0.492025,0.88687,1.93834,-4.70062]</code></pre>
<p>
This is much better: we can now make as many affine layers as we want. This is a very common pattern, so to make it more convenient we can use the
<code>@net</code>
macro:
</p>
<pre><code class="language-julia">@net type MyAffine
  W
  b
  x -&gt; x * W + b
end</code></pre>
<p>
The function provided,
<code>x -&gt; x * W + b</code>
, will be used when
<code>MyAffine</code>
is used as a model; it&#39;s just a shorter way of defining the
<code>(::MyAffine)(x)</code>
method above. (You may notice that
<code>W</code>
and
<code>x</code>
have swapped order in the model; this is due to the way batching works, which will be covered in more detail later on.)
</p>
<p>
However,
<code>@net</code>
does not simply save us some keystrokes; it&#39;s the secret sauce that makes everything else in Flux go. For example, it analyses the code for the forward function so that it can differentiate it or convert it to a TensorFlow graph.
</p>
<p>
The
<code>@net</code>
definition above is almost exactly how
<code>Affine</code>
is defined in Flux itself! There&#39;s no difference between &quot;library-level&quot; and &quot;user-level&quot; models, so making your code reusable doesn&#39;t involve a lot of extra complexity. Moreover, much more complex models than
<code>Affine</code>
are equally simple to define.
</p>
<h2>
<a class="nav-anchor" id="Models-in-templates-1" href="#Models-in-templates-1">
Models in templates
</a>
</h2>
<p>
<code>@net</code>
models can contain sub-models as well as just array parameters:
</p>
<pre><code class="language-julia">@net type TLP
  first
  second
  function (x)
    l1 = σ(first(x))
    l2 = softmax(second(l1))
  end
end</code></pre>
<p>
Just as above, this is roughly equivalent to writing:
</p>
<pre><code class="language-julia">type TLP
  first
  second
end

function (self::TLP)(x)
  l1 = σ(self.first(x))
  l2 = softmax(self.second(l1))
end</code></pre>
<p>
Here the
<code>first</code>
and
<code>second</code>
fields are not arrays but models in their own right: each must produce a result when called with an input array
<code>x</code>
. The
<code>Affine</code>
layer fits the bill, so we can instantiate
<code>TLP</code>
with two of them:
</p>
<pre><code class="language-julia">model = TLP(Affine(10, 20),
            Affine(20, 15))
x1 = rand(10) # the first layer takes 10 inputs
model(x1) # [0.057852,0.0409741,0.0609625,0.0575354 ...</code></pre>
<p>
You may recognise this as being equivalent to
</p>
<pre><code class="language-julia">Chain(
  Affine(10, 20), σ,
  Affine(20, 15), softmax)</code></pre>
<p>
given that it&#39;s just a sequence of calls. For simple networks
<code>Chain</code>
is completely fine, although the
<code>@net</code>
version is more powerful as we can (for example) reuse the output
<code>l1</code>
more than once.
</p>
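<p>
As a sketch of that last point, here is a model in the plain-Julia style shown earlier that uses
<code>l1</code>
twice, which a simple sequence of calls does not do. The names
<code>Lin</code>
and
<code>TwoPath</code>
are hypothetical,
<code>σ</code>
is defined locally rather than taken from Flux, and the types are declared with
<code>struct</code>
(on Julia 0.6 and later,
<code>type</code>
is spelled
<code>mutable struct</code>
).
</p>
<pre><code class="language-julia">σ(x) = 1 ./ (1 .+ exp.(-x))   # elementwise logistic function, defined locally

struct Lin                    # hypothetical stand-in for an affine layer
  W
  b
end
(l::Lin)(x) = x * l.W .+ l.b

struct TwoPath                # hypothetical model with two sub-models
  first
  second
end
function (self::TwoPath)(x)
  l1 = σ(self.first(x))
  l2 = σ(self.second(l1))
  l1 .+ l2                    # `l1` feeds `second` and is also added to its output
end

model = TwoPath(Lin(randn(4, 3), randn(1, 3)),
                Lin(randn(3, 3), randn(1, 3)))
y = model(randn(2, 4))        # two input rows in, two output rows of width 3 out</code></pre>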
<h2>
<a class="nav-anchor" id="Constructors-1" href="#Constructors-1">
Constructors
</a>
</h2>
<p>
<code>Affine</code>
has two array parameters,
<code>W</code>
and
<code>b</code>
. Just like any other Julia type, it&#39;s easy to instantiate an
<code>Affine</code>
layer with parameters of our choosing:
</p>
<pre><code class="language-julia">a = Affine(rand(10, 20), rand(1, 20))</code></pre>
<p>
However, for convenience and to avoid errors, we&#39;d probably rather specify the input and output dimension instead:
</p>
<pre><code class="language-julia">a = Affine(10, 20)</code></pre>
<p>
This is easy to implement using the usual Julia syntax for constructors:
</p>
<pre><code class="language-julia">Affine(in::Integer, out::Integer) =
  Affine(randn(in, out), randn(1, out))</code></pre>
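<p>
The shapes chosen here,
<code>in × out</code>
for
<code>W</code>
and
<code>1 × out</code>
for
<code>b</code>
, fit the
<code>x * W + b</code>
forward function when each input is a row. The following is a minimal plain-Julia sketch, assuming (this page does not confirm it) that a batch stores one sample per row; it uses
<code>.+</code>
so that
<code>b</code>
is broadcast across the rows.
</p>
<pre><code class="language-julia"># Assumed layout: one sample per row, so a batch of n inputs is n × in
W = randn(5, 3)   # in × out
b = randn(1, 3)   # 1 × out, broadcast across the rows of the batch
X = randn(4, 5)   # a batch of 4 samples with 5 features each
Y = X * W .+ b    # 4 × 3: one output row per input row

# Row 2 of the batched result matches the layer applied to sample 2 alone
y2 = X[2:2, :] * W .+ b</code></pre>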
<p>
In practice, these constructors tend to take the parameter initialisation function as an argument so that it&#39;s more easily customisable, and use
<code>Flux.initn</code>
by default (which is equivalent to
<code>randn(...)/100</code>
). So
<code>Affine</code>
&#39;s constructor really looks like this:
</p>
<pre><code class="language-julia">Affine(in::Integer, out::Integer; init = initn) =
  Affine(init(in, out), init(1, out))</code></pre>
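<p>
To see the keyword in use without depending on Flux, the sketch below repeats the same constructor pattern on a hypothetical type.
<code>SketchAffine</code>
and
<code>smallrandn</code>
are illustrative names only;
<code>smallrandn</code>
stands in for
<code>Flux.initn</code>
based on the
<code>randn(...)/100</code>
description above, and
<code>struct</code>
is the Julia 0.6+ keyword.
</p>
<pre><code class="language-julia">struct SketchAffine           # hypothetical stand-in for `Affine`
  W
  b
end

# Stand-in for `Flux.initn`, following the `randn(...)/100` description
smallrandn(dims...) = randn(dims...) ./ 100

SketchAffine(in::Integer, out::Integer; init = smallrandn) =
  SketchAffine(init(in, out), init(1, out))

a = SketchAffine(10, 20)                # default initialiser
z = SketchAffine(10, 20, init = zeros)  # any function of the dimensions works</code></pre>
<p>
Because the initialiser is just a function of the dimensions, swapping it does not require touching the layer definition itself.
</p>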
<h2>
<a class="nav-anchor" id="Supported-syntax-1" href="#Supported-syntax-1">
Supported syntax
</a>
</h2>
<p>
The syntax used to define a forward pass like
<code>x -&gt; x*W + b</code>
behaves like ordinary Julia code for the most part. However, it&#39;s important to remember that it defines a dataflow graph, not a general Julia expression. In practice this means that anything with side effects, as well as things like control flow and
<code>println</code>
s, won&#39;t work as expected. In future we&#39;ll continue to expand support for Julia syntax and features.
</p>
<footer>
<hr/>
<a class="previous" href="basics.html">
<span class="direction">
Previous
</span>
<span class="title">
Model Building Basics
</span>
</a>
<a class="next" href="recurrent.html">
<span class="direction">
Next
</span>
<span class="title">
Recurrence
</span>
</a>
</footer>
</article>
</body>
</html>