<!DOCTYPE html>
|
|||
|
<html lang="en">
|
|||
|
<head>
|
|||
|
<meta charset="UTF-8"/>
|
|||
|
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
|
|||
|
<title>
|
|||
|
Model Templates · Flux
|
|||
|
</title>
|
|||
|
<script>
|
|||
|
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
|
|||
|
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
|
|||
|
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
|
|||
|
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
|
|||
|
|
|||
|
ga('create', 'UA-36890222-9', 'auto');
|
|||
|
ga('send', 'pageview');
|
|||
|
|
|||
|
</script>
|
|||
|
<link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/>
|
|||
|
<link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.5.0/styles/default.min.css" rel="stylesheet" type="text/css"/>
|
|||
|
<link href="https://fonts.googleapis.com/css?family=Lato|Ubuntu+Mono" rel="stylesheet" type="text/css"/>
|
|||
|
<link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/>
|
|||
|
<link href="../assets/documenter.css" rel="stylesheet" type="text/css"/>
|
|||
|
<script>
|
|||
|
documenterBaseURL=".."
|
|||
|
</script>
|
|||
|
<script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../assets/documenter.js"></script>
|
|||
|
<script src="../../versions.js"></script>
|
|||
|
<link href="../../flux.css" rel="stylesheet" type="text/css"/>
|
|||
|
</head>
|
|||
|
<body>
|
|||
|
<nav class="toc">
|
|||
|
<h1>
|
|||
|
Flux
|
|||
|
</h1>
|
|||
|
<form class="search" action="../search.html">
|
|||
|
<select id="version-selector" onChange="window.location.href=this.value">
|
|||
|
<option value="#" selected="selected" disabled="disabled">
|
|||
|
Version
|
|||
|
</option>
|
|||
|
</select>
|
|||
|
<input id="search-query" name="q" type="text" placeholder="Search docs"/>
|
|||
|
</form>
|
|||
|
<ul>
|
|||
|
<li>
|
|||
|
<a class="toctext" href="../index.html">
|
|||
|
Home
|
|||
|
</a>
|
|||
|
</li>
|
|||
|
<li>
|
|||
|
<span class="toctext">
|
|||
|
Building Models
|
|||
|
</span>
|
|||
|
<ul>
|
|||
|
<li>
|
|||
|
<a class="toctext" href="basics.html">
|
|||
|
Model Building Basics
|
|||
|
</a>
|
|||
|
</li>
|
|||
|
<li class="current">
|
|||
|
<a class="toctext" href="templates.html">
|
|||
|
Model Templates
|
|||
|
</a>
|
|||
|
<ul class="internal">
|
|||
|
<li>
|
|||
|
<a class="toctext" href="#Models-in-templates-1">
|
|||
|
Models in templates
|
|||
|
</a>
|
|||
|
</li>
|
|||
|
<li>
|
|||
|
<a class="toctext" href="#Constructors-1">
|
|||
|
Constructors
|
|||
|
</a>
|
|||
|
</li>
|
|||
|
<li>
|
|||
|
<a class="toctext" href="#Supported-syntax-1">
|
|||
|
Supported syntax
|
|||
|
</a>
|
|||
|
</li>
|
|||
|
</ul>
|
|||
|
</li>
|
|||
|
<li>
|
|||
|
<a class="toctext" href="recurrent.html">
|
|||
|
Recurrence
|
|||
|
</a>
|
|||
|
</li>
|
|||
|
<li>
|
|||
|
<a class="toctext" href="debugging.html">
|
|||
|
Debugging
|
|||
|
</a>
|
|||
|
</li>
|
|||
|
</ul>
|
|||
|
</li>
|
|||
|
<li>
|
|||
|
<span class="toctext">
|
|||
|
Other APIs
|
|||
|
</span>
|
|||
|
<ul>
|
|||
|
<li>
|
|||
|
<a class="toctext" href="../apis/batching.html">
|
|||
|
Batching
|
|||
|
</a>
|
|||
|
</li>
|
|||
|
<li>
|
|||
|
<a class="toctext" href="../apis/backends.html">
|
|||
|
Backends
|
|||
|
</a>
|
|||
|
</li>
|
|||
|
<li>
|
|||
|
<a class="toctext" href="../apis/storage.html">
|
|||
|
Storing Models
|
|||
|
</a>
|
|||
|
</li>
|
|||
|
</ul>
|
|||
|
</li>
|
|||
|
<li>
|
|||
|
<span class="toctext">
|
|||
|
In Action
|
|||
|
</span>
|
|||
|
<ul>
|
|||
|
<li>
|
|||
|
<a class="toctext" href="../examples/logreg.html">
|
|||
|
Simple MNIST
|
|||
|
</a>
|
|||
|
</li>
|
|||
|
<li>
|
|||
|
<a class="toctext" href="../examples/char-rnn.html">
|
|||
|
Char RNN
|
|||
|
</a>
|
|||
|
</li>
|
|||
|
</ul>
|
|||
|
</li>
|
|||
|
<li>
|
|||
|
<a class="toctext" href="../contributing.html">
|
|||
|
Contributing & Help
|
|||
|
</a>
|
|||
|
</li>
|
|||
|
<li>
|
|||
|
<a class="toctext" href="../internals.html">
|
|||
|
Internals
|
|||
|
</a>
|
|||
|
</li>
|
|||
|
</ul>
|
|||
|
</nav>
|
|||
|
<article id="docs">
|
|||
|
<header>
|
|||
|
<nav>
|
|||
|
<ul>
|
|||
|
<li>
|
|||
|
Building Models
|
|||
|
</li>
|
|||
|
<li>
|
|||
|
<a href="templates.html">
|
|||
|
Model Templates
|
|||
|
</a>
|
|||
|
</li>
|
|||
|
</ul>
|
|||
|
</nav>
|
|||
|
<hr/>
|
|||
|
</header>
|
|||
|
<h1>
|
|||
|
<a class="nav-anchor" id="Model-Templates-1" href="#Model-Templates-1">
|
|||
|
Model Templates
|
|||
|
</a>
|
|||
|
</h1>
|
|||
|
<p>
|
|||
|
<em>
|
|||
|
... Calculating Tax Expenses ...
|
|||
|
</em>
|
|||
|
</p>
|
|||
|
<p>
So how does the <code>Affine</code> template work? We don't want to duplicate the code above whenever we need more than one affine layer:
</p>
|
|||
|
<pre><code class="language-julia">W₁, b₁ = randn(...)
affine₁(x) = W₁*x + b₁
W₂, b₂ = randn(...)
affine₂(x) = W₂*x + b₂
model = Chain(affine₁, affine₂)</code></pre>
|
|||
|
<p>
|
|||
|
Here's one way we could solve this: just keep the parameters in a Julia type, and define how that type acts as a function:
|
|||
|
</p>
|
|||
|
<pre><code class="language-julia">type MyAffine
  W
  b
end

# Use the `MyAffine` layer as a model
(l::MyAffine)(x) = l.W * x + l.b

# Convenience constructor
MyAffine(in::Integer, out::Integer) =
  MyAffine(randn(out, in), randn(out))

model = Chain(MyAffine(5, 5), MyAffine(5, 5))

model(x1) # [-1.54458,0.492025,0.88687,1.93834,-4.70062]</code></pre>
|
|||
|
<p>
This is much better: we can now make as many affine layers as we want. The pattern is common enough that Flux provides the <code>@net</code> macro to make it more convenient:
</p>
|
|||
|
<pre><code class="language-julia">@net type MyAffine
  W
  b
  x -> x * W + b
end</code></pre>
|
|||
|
<p>
The function provided, <code>x -> x * W + b</code>, is what runs when a <code>MyAffine</code> is called as a model; it's just a shorter way of defining the <code>(::MyAffine)(x)</code> method above. (You may notice that <code>W</code> and <code>x</code> have swapped order in the model; this is due to the way batching works, which will be covered in more detail later on.)
</p>
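<p>
As a rough illustration of why the order is swapped, the plain-Julia sketch below (no Flux involved) assumes a batch is stored with one sample per row, so that a single multiplication <code>X * W</code> processes the whole batch. The names <code>X</code> and <code>Y</code>, the sizes, the row-shaped bias and the explicit broadcast <code>.+</code> are assumptions made for this example, not something taken from Flux:
</p>
<pre><code class="language-julia"># Assumed layout: one sample per row, one feature per column.
W = randn(5, 3)        # weights: 5 inputs, 3 outputs
b = randn(1, 3)        # bias stored as a row, one entry per output
X = randn(4, 5)        # a batch of 4 samples with 5 features each

Y = X * W .+ b         # b is broadcast across the 4 rows

size(Y)                # (4, 3): one output row per sample</code></pre>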
|
|||
|
<p>
However, <code>@net</code> does more than save us some keystrokes; it is what makes the rest of Flux work. For example, it analyses the code for the forward function so that it can differentiate it or convert it to a TensorFlow graph.
</p>
|
|||
|
<p>
The above code is almost exactly how <code>Affine</code> is defined in Flux itself! There's no difference between "library-level" and "user-level" models, so making your code reusable doesn't involve a lot of extra complexity. Moreover, much more complex models than <code>Affine</code> are equally simple to define.
</p>
|
|||
|
<h2>
|
|||
|
<a class="nav-anchor" id="Models-in-templates-1" href="#Models-in-templates-1">
|
|||
|
Models in templates
|
|||
|
</a>
|
|||
|
</h2>
|
|||
|
<p>
<code>@net</code> models can contain sub-models as well as array parameters:
</p>
|
|||
|
<pre><code class="language-julia">@net type TLP
  first
  second
  function (x)
    l1 = σ(first(x))
    l2 = softmax(second(l1))
  end
end</code></pre>
|
|||
|
<p>
|
|||
|
Just as above, this is roughly equivalent to writing:
|
|||
|
</p>
|
|||
|
<pre><code class="language-julia">type TLP
  first
  second
end

function (self::TLP)(x)
  l1 = σ(self.first(x))
  l2 = softmax(self.second(l1))
end</code></pre>
|
|||
|
<p>
Clearly, the <code>first</code> and <code>second</code> parameters are not arrays here; they should be models themselves, each producing a result when called with an input array. The <code>Affine</code> layer fits the bill, so we can instantiate <code>TLP</code> with two of them:
</p>
|
|||
|
<pre><code class="language-julia">model = TLP(Affine(10, 20),
            Affine(20, 15))
x1 = rand(10) # the first layer expects 10 inputs
model(x1) # [0.057852,0.0409741,0.0609625,0.0575354 ...</code></pre>
|
|||
|
<p>
|
|||
|
You may recognise this as being equivalent to
|
|||
|
</p>
|
|||
|
<pre><code class="language-julia">Chain(
  Affine(10, 20), σ,
  Affine(20, 15), softmax)</code></pre>
|
|||
|
<p>
given that it's just a sequence of calls. For simple networks <code>Chain</code> is completely fine, although the <code>@net</code> version is more powerful as we can (for example) reuse the output <code>l1</code> more than once.
</p>
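<p>
To make the point about reusing <code>l1</code> concrete, here is a minimal sketch in plain Julia 1.x (so <code>struct</code> rather than <code>type</code>, and no <code>@net</code>). Everything in it is hypothetical and defined locally for illustration: <code>ToyAffine</code>, <code>SkipTLP</code>, and the hand-written <code>sigmoid</code> and <code>softmax</code> are not Flux definitions. The forward function uses <code>l1</code> twice, which a purely sequential chain of calls cannot express:
</p>
<pre><code class="language-julia"># Plain Julia sketch; none of these names come from Flux.
sigmoid(x) = 1 ./ (1 .+ exp.(-x))

function softmax(x)
    e = exp.(x .- maximum(x))   # subtract the max for numerical stability
    e ./ sum(e)
end

# A hypothetical stand-in for an affine layer (column-vector convention).
struct ToyAffine
    W
    b
end
(a::ToyAffine)(x) = a.W * x .+ a.b

# A two-layer model with a skip connection.
struct SkipTLP
    first
    second
end

function (m::SkipTLP)(x)
    l1 = sigmoid(m.first(x))
    l2 = m.second(l1)
    softmax(l1 .+ l2)           # l1 is used a second time here
end

model = SkipTLP(ToyAffine(randn(4, 3), randn(4)),
                ToyAffine(randn(4, 4), randn(4)))
y = model(randn(3))             # 4 probabilities that sum to 1</code></pre>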
|
|||
|
<h2>
|
|||
|
<a class="nav-anchor" id="Constructors-1" href="#Constructors-1">
|
|||
|
Constructors
|
|||
|
</a>
|
|||
|
</h2>
|
|||
|
<p>
<code>Affine</code> has two array parameters, <code>W</code> and <code>b</code>. As with any other Julia type, it's easy to instantiate an <code>Affine</code> layer with parameters of our choosing:
</p>
|
|||
|
<pre><code class="language-julia">a = Affine(rand(10, 20), rand(1, 20))</code></pre>
|
|||
|
<p>
|
|||
|
However, for convenience and to avoid errors, we'd probably rather specify the input and output dimension instead:
|
|||
|
</p>
|
|||
|
<pre><code class="language-julia">a = Affine(10, 20)</code></pre>
|
|||
|
<p>
|
|||
|
This is easy to implement using the usual Julia syntax for constructors:
|
|||
|
</p>
|
|||
|
<pre><code class="language-julia">Affine(in::Integer, out::Integer) =
  Affine(randn(in, out), randn(1, out))</code></pre>
|
|||
|
<p>
In practice, these constructors tend to take the parameter initialisation function as an argument so that it's more easily customisable, and use <code>Flux.initn</code> by default (which is equivalent to <code>randn(...)/100</code>). So <code>Affine</code>'s constructor really looks like this:
</p>
|
|||
|
<pre><code class="language-julia">Affine(in::Integer, out::Integer; init = initn) =
  Affine(init(in, out), init(1, out))</code></pre>
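<p>
The same constructor pattern can be tried out without Flux. The sketch below uses plain Julia 1.x and a hypothetical <code>ToyAffine</code> type; the local <code>initn</code> is written from the description above (<code>randn(...)/100</code>) and is only assumed to match <code>Flux.initn</code>:
</p>
<pre><code class="language-julia"># Hypothetical stand-in for Affine, defined locally for this example.
struct ToyAffine
    W
    b
end

# Local helper based on the description of Flux.initn (an assumption).
initn(dims...) = randn(dims...) / 100

# Convenience constructor: sizes in, parameters out, initialiser swappable.
ToyAffine(in::Integer, out::Integer; init = initn) =
    ToyAffine(init(in, out), init(1, out))

a = ToyAffine(10, 20)                 # small random parameters
z = ToyAffine(10, 20; init = zeros)   # all-zero parameters

size(a.W), size(a.b)                  # ((10, 20), (1, 20))</code></pre>
<p>
Any function that accepts dimensions and returns an array, such as <code>zeros</code> or <code>rand</code>, can be passed as <code>init</code>.
</p>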
|
|||
|
<h2>
|
|||
|
<a class="nav-anchor" id="Supported-syntax-1" href="#Supported-syntax-1">
|
|||
|
Supported syntax
|
|||
|
</a>
|
|||
|
</h2>
|
|||
|
<p>
The syntax used to define a forward pass like <code>x -> x*W + b</code> behaves exactly like Julia code for the most part. However, it's important to remember that it's defining a dataflow graph, not a general Julia expression. In practice this means that control flow and anything with side effects, such as a <code>println</code> call, won't work as expected. In future we'll continue to expand support for Julia syntax and features.
</p>
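<p>
To get a feel for what "defining a dataflow graph" means, the toy below records arithmetic as an expression instead of evaluating it. It is plain Julia 1.x and is not how <code>@net</code> is implemented; the <code>Node</code> type and <code>forward</code> function are hypothetical names used only here:
</p>
<pre><code class="language-julia"># Toy symbolic tracer; an illustration only, not Flux's mechanism.
struct Node
    ex                      # the expression recorded so far
end

# Arithmetic on a Node records the operation instead of computing a number.
Base.:*(a::Node, b) = Node(:($(a.ex) * $b))
Base.:+(a::Node, b) = Node(:($(a.ex) + $b))

forward(x) = x * 2 + 1

forward(3)                  # ordinary evaluation: 7
graph = forward(Node(:x))   # symbolic evaluation builds a graph
graph.ex                    # :(x * 2 + 1)</code></pre>
<p>
In this toy, only the arithmetic ends up in the recorded graph. A <code>println</code> inside <code>forward</code> would run once while the graph is being built and leave no trace in it, and a comparison such as <code>x > 0</code> would fail because <code>Node</code> defines no such method.
</p>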
|
|||
|
<footer>
|
|||
|
<hr/>
|
|||
|
<a class="previous" href="basics.html">
|
|||
|
<span class="direction">
|
|||
|
Previous
|
|||
|
</span>
|
|||
|
<span class="title">
|
|||
|
Model Building Basics
|
|||
|
</span>
|
|||
|
</a>
|
|||
|
<a class="next" href="recurrent.html">
|
|||
|
<span class="direction">
|
|||
|
Next
|
|||
|
</span>
|
|||
|
<span class="title">
|
|||
|
Recurrence
|
|||
|
</span>
|
|||
|
</a>
|
|||
|
</footer>
|
|||
|
</article>
|
|||
|
</body>
|
|||
|
</html>
|