<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8"/>
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
<title>
First Steps · Flux
</title>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-36890222-9', 'auto');
ga('send', 'pageview');
</script>
<link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/>
<link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.5.0/styles/default.min.css" rel="stylesheet" type="text/css"/>
<link href="https://fonts.googleapis.com/css?family=Lato|Ubuntu+Mono" rel="stylesheet" type="text/css"/>
<link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/>
<link href="../assets/documenter.css" rel="stylesheet" type="text/css"/>
<script>
documenterBaseURL=".."
</script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../assets/documenter.js"></script>
<script src="../../versions.js"></script>
<link href="../../flux.css" rel="stylesheet" type="text/css"/>
</head>
<body>
<nav class="toc">
<h1>
Flux
</h1>
<form class="search" action="../search.html">
<select id="version-selector" onChange="window.location.href=this.value">
<option value="#" selected="selected" disabled="disabled">
Version
</option>
</select>
<input id="search-query" name="q" type="text" placeholder="Search docs"/>
</form>
<ul>
<li>
<a class="toctext" href="../index.html">
Home
</a>
</li>
<li>
<span class="toctext">
Building Models
</span>
<ul>
<li class="current">
<a class="toctext" href="basics.html">
First Steps
</a>
<ul class="internal">
<li>
<a class="toctext" href="#The-Model-1">
The Model
</a>
</li>
<li>
<a class="toctext" href="#Combining-Models-1">
Combining Models
</a>
</li>
<li>
<a class="toctext" href="#A-Function-in-Model's-Clothing-1">
A Function in Model&#39;s Clothing
</a>
</li>
<li>
<a class="toctext" href="#The-Template-1">
The Template
</a>
</li>
</ul>
</li>
<li>
<a class="toctext" href="recurrent.html">
Recurrence
</a>
</li>
<li>
<a class="toctext" href="debugging.html">
Debugging
</a>
</li>
</ul>
</li>
<li>
<span class="toctext">
In Action
</span>
<ul>
<li>
<a class="toctext" href="../examples/logreg.html">
Logistic Regression
</a>
</li>
</ul>
</li>
<li>
<a class="toctext" href="../contributing.html">
Contributing &amp; Help
</a>
</li>
<li>
<a class="toctext" href="../internals.html">
Internals
</a>
</li>
</ul>
</nav>
<article id="docs">
<header>
<nav>
<ul>
<li>
Building Models
</li>
<li>
<a href="basics.html">
First Steps
</a>
</li>
</ul>
<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/cb4912c271b8d8377a376894bf3a1421ac118760/docs/src/models/basics.md">
<span class="fa">
</span>
Edit on GitHub
</a>
</nav>
<hr/>
</header>
<h1>
<a class="nav-anchor" id="Model-Building-Basics-1" href="#Model-Building-Basics-1">
Model Building Basics
</a>
</h1>
<h2>
<a class="nav-anchor" id="The-Model-1" href="#The-Model-1">
The Model
</a>
</h2>
<p>
<em>
... Initialising Photon Beams ...
</em>
</p>
<p>
The core concept in Flux is the
<em>
model
</em>
. A model (or &quot;layer&quot;) is simply a function with parameters. For example, in plain Julia code, we could define the following function to represent a logistic regression (or simple neural network):
</p>
<pre><code class="language-julia">W = randn(3,5)
b = randn(3)
affine(x) = W * x + b

x1 = rand(5) # [0.581466,0.606507,0.981732,0.488618,0.415414]
y1 = softmax(affine(x1)) # [0.32676,0.0974173,0.575823]</code></pre>
<p>
<code>affine</code>
is simply a function which takes some vector
<code>x1</code>
and outputs a new one
<code>y1</code>
. For example,
<code>x1</code>
could be data from an image and
<code>y1</code>
could be predictions about the content of that image. However,
<code>affine</code>
isn&#39;t static. It has
<em>
parameters
</em>
<code>W</code>
and
<code>b</code>
, and if we tweak those parameters we&#39;ll change the result, hopefully making the predictions more accurate.
</p>
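<p>
To make the role of the parameters concrete, here is a minimal plain-Julia sketch that is not taken from Flux itself. It assumes Julia 1.x broadcasting syntax, and
<code>softmax_sketch</code>
is a hypothetical stand-in for Flux&#39;s
<code>softmax</code>
so that the snippet runs on its own. Shifting
<code>b</code>
in place changes what
<code>affine</code>
returns for the same input:
</p>
<pre><code class="language-julia"># Hypothetical stand-in for `softmax`: exponentiate, then normalise to sum to 1.
softmax_sketch(v) = exp.(v) ./ sum(exp.(v))

W = randn(3, 5)
b = randn(3)
affine(x) = W * x + b

x1 = rand(5)
before = affine(x1)

b .+= 1.0            # tweak a parameter in place
after = affine(x1)   # same input, different output

after ≈ before .+ 1.0       # true: the shift in `b` carries through to the result
sum(softmax_sketch(after))  # ≈ 1.0: the outputs form a probability vector</code></pre>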
<p>
This is all well and good, but we usually want to have more than one affine layer in our network; writing out the above definition to create new sets of parameters every time would quickly become tedious. For that reason, we want to use a
<em>
template
</em>
which creates these functions for us:
</p>
<pre><code class="language-julia">affine1 = Affine(5, 5)
affine2 = Affine(5, 5)
softmax(affine1(x1)) # [0.167952, 0.186325, 0.176683, 0.238571, 0.23047]
softmax(affine2(x1)) # [0.125361, 0.246448, 0.21966, 0.124596, 0.283935]</code></pre>
<p>
We just created two separate
<code>Affine</code>
layers, and each contains its own version of
<code>W</code>
and
<code>b</code>
, leading to a different result when called with our data. It&#39;s easy to define templates like
<code>Affine</code>
ourselves (see
<a href="basics.html#The-Template-1">
The Template
</a>
), but Flux provides
<code>Affine</code>
out of the box, so we&#39;ll use that for now.
</p>
<h2>
<a class="nav-anchor" id="Combining-Models-1" href="#Combining-Models-1">
Combining Models
</a>
</h2>
<p>
<em>
... Inflating Graviton Zeppelins ...
</em>
</p>
<p>
A more complex model usually involves many basic layers like
<code>affine</code>
, where we use the output of one layer as the input to the next:
</p>
<pre><code class="language-julia">mymodel1(x) = softmax(affine2(σ(affine1(x))))
mymodel1(x1) # [0.187935, 0.232237, 0.169824, 0.230589, 0.179414]</code></pre>
<p>
This syntax is again a little unwieldy for larger networks, so Flux provides another template of sorts to create the function for us:
</p>
<pre><code class="language-julia">mymodel2 = Chain(affine1, σ, affine2, softmax)
mymodel2(x1) # [0.187935, 0.232237, 0.169824, 0.230589, 0.179414]</code></pre>
<p>
<code>mymodel2</code>
is exactly equivalent to
<code>mymodel1</code>
because it simply calls the provided functions in sequence. We don&#39;t have to predefine the affine layers and can also write this as:
</p>
<pre><code class="language-julia">mymodel3 = Chain(
  Affine(5, 5), σ,
  Affine(5, 5), softmax)</code></pre>
<p>
You now know enough to take a look at the
<a href="../examples/logreg.html">
logistic regression
</a>
example, if you haven&#39;t already.
</p>
<h2>
<a class="nav-anchor" id="A-Function-in-Model's-Clothing-1" href="#A-Function-in-Model's-Clothing-1">
A Function in Model&#39;s Clothing
</a>
</h2>
<p>
<em>
... Booting Dark Matter Transmogrifiers ...
</em>
</p>
<p>
We noted above that a &quot;model&quot; is a function with some number of trainable parameters. This goes both ways; a normal Julia function like
<code>exp</code>
is effectively a model with 0 parameters. Flux doesn&#39;t care, and anywhere that you use one, you can use the other. For example,
<code>Chain</code>
will happily work with regular functions:
</p>
<pre><code class="language-julia">foo = Chain(exp, sum, log)
foo([1,2,3]) # ≈ 3.408, the same value as log(sum(exp([1,2,3])))</code></pre>
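<p>
Because a chain simply calls the provided functions in sequence, the idea can be sketched in a few lines of plain Julia.
<code>mychain</code>
below is a hypothetical helper for illustration only and is not how Flux defines
<code>Chain</code>
; it assumes Julia 1.x, where element-wise
<code>exp</code>
is written
<code>exp.(x)</code>
:
</p>
<pre><code class="language-julia"># Hypothetical helper: returns a function that feeds `x` through each `f` in turn.
mychain(fs...) = x -&gt; foldl((acc, f) -&gt; f(acc), fs; init = x)

composed = mychain(x -&gt; exp.(x), sum, log)

# The chained call agrees with the hand-written nesting.
composed([1, 2, 3]) ≈ log(sum(exp.([1, 2, 3])))   # true</code></pre>
<p>
Nothing in this sketch distinguishes a parameterised layer from an ordinary function: anything callable with one argument can take a place in the sequence.
</p>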
<h2>
<a class="nav-anchor" id="The-Template-1" href="#The-Template-1">
The Template
</a>
</h2>
<p>
<em>
... Calculating Tax Expenses ...
</em>
</p>
<p>
So how does the
<code>Affine</code>
template work? We don&#39;t want to duplicate the code above whenever we need more than one affine layer:
</p>
<pre><code class="language-julia">W₁, b₁ = randn(...)
affine₁(x) = W₁*x + b₁
W₂, b₂ = randn(...)
affine₂(x) = W₂*x + b₂
model = Chain(affine₁, affine₂)</code></pre>
<p>
Here&#39;s one way we could solve this: just keep the parameters in a Julia type, and define how that type acts as a function:
</p>
<pre><code class="language-julia">type MyAffine
  W
  b
end

# Use the `MyAffine` layer as a model
(l::MyAffine)(x) = l.W * x + l.b

# Convenience constructor
MyAffine(in::Integer, out::Integer) =
  MyAffine(randn(out, in), randn(out))

model = Chain(MyAffine(5, 5), MyAffine(5, 5))

model(x1) # [-1.54458,0.492025,0.88687,1.93834,-4.70062]</code></pre>
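<p>
The listing above uses the
<code>type</code>
keyword from the Julia version that was current when this page was written; later Julia releases spell it
<code>mutable struct</code>
(or
<code>struct</code>
for an immutable type). As a hedged sketch of the same pattern on Julia 1.x, with the hypothetical names
<code>SketchAffine</code>
,
<code>nin</code>
and
<code>nout</code>
introduced only for illustration:
</p>
<pre><code class="language-julia"># Assumption: Julia 1.x syntax; `SketchAffine` is an illustrative name, not a Flux type.
struct SketchAffine
  W::Matrix{Float64}
  b::Vector{Float64}
end

# Calling an instance applies the affine map with that instance's own parameters.
(l::SketchAffine)(x) = l.W * x + l.b

# Convenience constructor: random parameters for `nin` inputs and `nout` outputs.
SketchAffine(nin::Integer, nout::Integer) =
  SketchAffine(randn(nout, nin), randn(nout))

layer1 = SketchAffine(5, 3)
layer2 = SketchAffine(5, 3)   # a second layer with its own W and b

x = rand(5)
size(layer1(x))   # (3,)</code></pre>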
<p>
This is much better: we can now make as many affine layers as we want. This is a very common pattern, so to make it more convenient we can use the
<code>@net</code>
macro:
</p>
<pre><code class="language-julia">@net type MyAffine
  W
  b
  x -&gt; W * x + b
end</code></pre>
<p>
The function provided,
<code>x -&gt; W * x + b</code>
, will be used when
<code>MyAffine</code>
is used as a model; it&#39;s just a shorter way of defining the
<code>(::MyAffine)(x)</code>
method above.
</p>
<p>
However,
<code>@net</code>
does not simply save us some keystrokes; it&#39;s the secret sauce that makes everything else in Flux go. For example, it analyses the code for the forward function so that it can differentiate it or convert it to a TensorFlow graph.
</p>
<p>
The above code is almost exactly how
<code>Affine</code>
is defined in Flux itself! There&#39;s no difference between &quot;library-level&quot; and &quot;user-level&quot; models, so making your code reusable doesn&#39;t involve a lot of extra complexity. Moreover, much more complex models than
<code>Affine</code>
are equally simple to define, and equally close to the mathematical notation; read on to find out how.
</p>
<footer>
<hr/>
<a class="previous" href="../index.html">
<span class="direction">
Previous
</span>
<span class="title">
Home
</span>
</a>
<a class="next" href="recurrent.html">
<span class="direction">
Next
</span>
<span class="title">
Recurrence
</span>
</a>
</footer>
</article>
</body>
</html>