<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8"/>
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
<title>
Model Building Basics · Flux
</title>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-36890222-9', 'auto');
ga('send', 'pageview');
</script>
<link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/>
<link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.5.0/styles/default.min.css" rel="stylesheet" type="text/css"/>
<link href="https://fonts.googleapis.com/css?family=Lato|Ubuntu+Mono" rel="stylesheet" type="text/css"/>
<link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/>
<link href="../assets/documenter.css" rel="stylesheet" type="text/css"/>
<script>
documenterBaseURL=".."
</script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../assets/documenter.js"></script>
<script src="../../versions.js"></script>
<link href="../../flux.css" rel="stylesheet" type="text/css"/>
</head>
<body>
<nav class="toc">
<h1>
Flux
</h1>
<form class="search" action="../search.html">
<select id="version-selector" onChange="window.location.href=this.value">
<option value="#" selected="selected" disabled="disabled">
Version
</option>
</select>
<input id="search-query" name="q" type="text" placeholder="Search docs"/>
</form>
<ul>
<li>
<a class="toctext" href="../index.html">
Home
</a>
</li>
<li>
<span class="toctext">
Building Models
</span>
<ul>
<li class="current">
<a class="toctext" href="basics.html">
Model Building Basics
</a>
<ul class="internal">
<li>
<a class="toctext" href="#Functions-1">
Functions
</a>
</li>
<li>
<a class="toctext" href="#The-Model-1">
The Model
</a>
</li>
<li>
<a class="toctext" href="#Layers-1">
Layers
</a>
</li>
<li>
<a class="toctext" href="#Combining-Layers-1">
Combining Layers
</a>
</li>
<li>
<a class="toctext" href="#A-Function-in-Model's-Clothing-1">
A Function in Model&#39;s Clothing
</a>
</li>
</ul>
</li>
<li>
<a class="toctext" href="templates.html">
Model Templates
</a>
</li>
<li>
<a class="toctext" href="recurrent.html">
Recurrence
</a>
</li>
<li>
<a class="toctext" href="debugging.html">
Debugging
</a>
</li>
</ul>
</li>
<li>
<span class="toctext">
Other APIs
</span>
<ul>
<li>
<a class="toctext" href="../apis/batching.html">
Batching
</a>
</li>
<li>
<a class="toctext" href="../apis/backends.html">
Backends
</a>
</li>
<li>
<a class="toctext" href="../apis/storage.html">
Storing Models
</a>
</li>
</ul>
</li>
<li>
<span class="toctext">
In Action
</span>
<ul>
<li>
<a class="toctext" href="../examples/logreg.html">
Simple MNIST
</a>
</li>
<li>
<a class="toctext" href="../examples/char-rnn.html">
Char RNN
</a>
</li>
</ul>
</li>
<li>
<a class="toctext" href="../contributing.html">
Contributing &amp; Help
</a>
</li>
<li>
<a class="toctext" href="../internals.html">
Internals
</a>
</li>
</ul>
</nav>
<article id="docs">
<header>
<nav>
<ul>
<li>
Building Models
</li>
<li>
<a href="basics.html">
Model Building Basics
</a>
</li>
</ul>
<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/9b76b307b6e59e4102b12de769122471af208582/docs/src/models/basics.md">
<span class="fa">
</span>
Edit on GitHub
</a>
</nav>
<hr/>
</header>
<h1>
<a class="nav-anchor" id="Model-Building-Basics-1" href="#Model-Building-Basics-1">
Model Building Basics
</a>
</h1>
<h2>
<a class="nav-anchor" id="Functions-1" href="#Functions-1">
Functions
</a>
</h2>
<p>
Flux&#39;s core feature is the
<code>@net</code>
macro, which adds some superpowers to regular ol&#39; Julia functions. Consider this simple function with the
<code>@net</code>
annotation applied:
</p>
<pre><code class="language-julia">@net f(x) = x .* x
f([1,2,3]) == [1,4,9]</code></pre>
<p>
This behaves as expected, but we have some extra features. For example, we can convert the function to run on
<a href="https://www.tensorflow.org/">
TensorFlow
</a>
or
<a href="https://github.com/dmlc/MXNet.jl">
MXNet
</a>
:
</p>
<pre><code class="language-julia">f_mxnet = mxnet(f)
f_mxnet([1,2,3]) == [1.0, 4.0, 9.0]</code></pre>
<p>
Simples! Flux took care of a lot of boilerplate for us and just ran the multiplication on MXNet. MXNet can optimise this code for us, taking advantage of parallelism or running the code on a GPU.
</p>
<p>
Using MXNet, we can get the gradient of the function, too:
</p>
<pre><code class="language-julia">back!(f_mxnet, [1,1,1], [1,2,3]) == ([2.0, 4.0, 6.0])</code></pre>
<p>
At first glance, this may seem broadly similar to building a graph in TensorFlow. The difference is that the Julia code still behaves like Julia code. Error messages continue to give you helpful stacktraces that pinpoint mistakes. You can step through the code in the debugger. The code only runs once when it&#39;s called, as usual, rather than once to build the graph and once to execute it.
</p>
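<p>
The gradient shown earlier can be checked by hand in plain Julia, without Flux or MXNet. This is only a sketch: the names <code>square</code> and <code>square_back</code> are hypothetical, and it assumes that the first array passed to <code>back!</code> is the incoming gradient and the second is the input. Since the derivative of <code>x .* x</code> is <code>2x</code>, the chain rule gives:
</p>
<pre><code class="language-julia"># Plain-Julia sketch; `square` and `square_back` are illustrative names, not Flux API.
square(x) = x .* x

# Chain rule by hand: d(x^2)/dx = 2x, scaled elementwise by the incoming gradient Δ.
square_back(Δ, x) = Δ .* 2 .* x

square([1, 2, 3])                  # [1, 4, 9]
square_back([1, 1, 1], [1, 2, 3])  # [2, 4, 6]</code></pre>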
<h2>
<a class="nav-anchor" id="The-Model-1" href="#The-Model-1">
The Model
</a>
</h2>
<p>
<em>
... Initialising Photon Beams ...
</em>
</p>
<p>
The core concept in Flux is the
<em>
model
</em>
. A model (or &quot;layer&quot;) is simply a function with parameters. For example, in plain Julia code, we could define the following function to represent a logistic regression (or simple neural network):
</p>
<pre><code class="language-julia">W = randn(3,5)
b = randn(3)
affine(x) = W * x + b

x1 = rand(5) # [0.581466,0.606507,0.981732,0.488618,0.415414]
y1 = softmax(affine(x1)) # [0.32676,0.0974173,0.575823]</code></pre>
<p>
<code>affine</code>
is simply a function which takes some vector
<code>x1</code>
and outputs a new one
<code>y1</code>
. For example,
<code>x1</code>
could be data from an image and
<code>y1</code>
could be predictions about the content of that image. However,
<code>affine</code>
isn&#39;t static. It has
<em>
parameters
</em>
<code>W</code>
and
<code>b</code>
, and if we tweak those parameters we&#39;ll tweak the result, hopefully making the predictions more accurate.
</p>
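<p>
To make the effect of tweaking a parameter concrete, here is a minimal sketch that runs in plain Julia without Flux. The function <code>my_softmax</code> is a hypothetical stand-in for <code>softmax</code>, written out only for illustration; the particular change applied to <code>b</code> is likewise an arbitrary choice.
</p>
<pre><code class="language-julia"># Illustrative stand-in for softmax; `my_softmax` is a hypothetical name.
function my_softmax(v)
    e = exp.(v .- maximum(v))  # subtract the maximum for numerical stability
    return e ./ sum(e)
end

W = randn(3, 5)
b = randn(3)
affine(x) = W * x .+ b

x1 = rand(5)
y1 = my_softmax(affine(x1))  # three non-negative values that sum to 1

b .+= [1.0, 0.0, 0.0]        # tweak one parameter in place
y2 = my_softmax(affine(x1))  # the first entry is now larger than in y1</code></pre>
<p>
Raising the first entry of <code>b</code> raises the first output, which is the kind of adjustment that training automates.
</p>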
<h2>
<a class="nav-anchor" id="Layers-1" href="#Layers-1">
Layers
</a>
</h2>
<p>
This is all well and good, but we usually want to have more than one affine layer in our network; writing out the above definition to create new sets of parameters every time would quickly become tedious. For that reason, we want to use a
<em>
template
</em>
which creates these functions for us:
</p>
<pre><code class="language-julia">affine1 = Affine(5, 5)
affine2 = Affine(5, 5)
softmax(affine1(x1)) # [0.167952, 0.186325, 0.176683, 0.238571, 0.23047]
softmax(affine2(x1)) # [0.125361, 0.246448, 0.21966, 0.124596, 0.283935]</code></pre>
<p>
We just created two separate
<code>Affine</code>
layers, and each contains its own (randomly initialised) version of
<code>W</code>
and
<code>b</code>
, leading to a different result when called with our data. It&#39;s easy to define templates like
<code>Affine</code>
ourselves (see
<a href="templates.html">
templates
</a>
), but Flux provides
<code>Affine</code>
out of the box, so we&#39;ll use that for now.
</p>
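<p>
As a rough illustration of what such a template might look like in plain Julia, the sketch below defines a hypothetical <code>MyAffine</code> type. It is not how Flux implements <code>Affine</code>. The field names, the constructor argument order (input size, then output size) and the random initialisation are all assumptions made for the example, and it uses the <code>struct</code> syntax of current Julia versions.
</p>
<pre><code class="language-julia"># Hypothetical stand-in for an affine layer template; not the library implementation.
struct MyAffine
    W::Matrix{Float64}
    b::Vector{Float64}
end

# Build a layer from an input size and an output size, with random parameters.
MyAffine(nin::Integer, nout::Integer) = MyAffine(randn(nout, nin), randn(nout))

# Make instances callable, so that a layer behaves like a function.
(a::MyAffine)(x) = a.W * x .+ a.b

layer1 = MyAffine(5, 5)
layer2 = MyAffine(5, 5)
x = rand(5)
layer1(x)  # differs from layer2(x), since each layer holds its own W and b</code></pre>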
<h2>
<a class="nav-anchor" id="Combining-Layers-1" href="#Combining-Layers-1">
Combining Layers
</a>
</h2>
<p>
<em>
... Inflating Graviton Zeppelins ...
</em>
</p>
<p>
A more complex model usually involves many basic layers like
<code>affine</code>
, where we use the output of one layer as the input to the next:
</p>
<pre><code class="language-julia">mymodel1(x) = softmax(affine2(σ(affine1(x))))
mymodel1(x1) # [0.187935, 0.232237, 0.169824, 0.230589, 0.179414]</code></pre>
<p>
This syntax is again a little unwieldy for larger networks, so Flux provides another template of sorts to create the function for us:
</p>
<pre><code class="language-julia">mymodel2 = Chain(affine1, σ, affine2, softmax)
mymodel2(x1) # [0.187935, 0.232237, 0.169824, 0.230589, 0.179414]</code></pre>
<p>
<code>mymodel2</code>
is exactly equivalent to
<code>mymodel1</code>
because it simply calls the provided functions in sequence. We don&#39;t have to predefine the affine layers and can also write this as:
</p>
<pre><code class="language-julia">mymodel3 = Chain(
Affine(5, 5), σ,
Affine(5, 5), softmax)</code></pre>
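<p>
The idea of calling the provided functions in sequence can be sketched in a few lines of plain Julia. The helper <code>my_chain</code> below is hypothetical and only illustrates the behaviour described here; it is not the definition of <code>Chain</code>, and <code>double</code> and <code>addone</code> are toy functions invented for the example.
</p>
<pre><code class="language-julia"># Hypothetical helper: returns a function that applies each of `fs` in order.
function my_chain(fs...)
    function run(x)
        for f in fs
            x = f(x)  # feed the output of one step into the next
        end
        return x
    end
    return run
end

double(x) = 2 .* x
addone(x) = x .+ 1

pipeline = my_chain(double, addone, sum)
pipeline([1, 2, 3])  # 15, the same as sum(addone(double([1, 2, 3])))</code></pre>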
<p>
You now know enough to take a look at the
<a href="../examples/logreg.html">
logistic regression
</a>
example, if you haven&#39;t already.
</p>
<h2>
<a class="nav-anchor" id="A-Function-in-Model's-Clothing-1" href="#A-Function-in-Model's-Clothing-1">
A Function in Model&#39;s Clothing
</a>
</h2>
<p>
<em>
... Booting Dark Matter Transmogrifiers ...
</em>
</p>
<p>
We noted above that a &quot;model&quot; is a function with some number of trainable parameters. This goes both ways; a normal Julia function like
<code>exp</code>
is effectively a model with 0 parameters. Flux doesn&#39;t care, and anywhere that you use one, you can use the other. For example,
<code>Chain</code>
will happily work with regular functions:
</p>
<pre><code class="language-julia">foo = Chain(exp, sum, log)
foo([1,2,3]) == 3.408 == log(sum(exp([1,2,3])))</code></pre>
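<p>
The same computation can be reproduced without Flux using ordinary function composition. This sketch assumes a recent Julia version, where applying <code>exp</code> to every element of an array needs the broadcasting dot; <code>expall</code> and <code>foo_plain</code> are hypothetical names introduced for the example.
</p>
<pre><code class="language-julia"># Elementwise exponential; hypothetical helper for this sketch.
expall(x) = exp.(x)

# Compose right to left: expall first, then sum, then log.
foo_plain = log ∘ sum ∘ expall

foo_plain([1, 2, 3])  # approximately 3.408</code></pre>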
<footer>
<hr/>
<a class="previous" href="../index.html">
<span class="direction">
Previous
</span>
<span class="title">
Home
</span>
</a>
<a class="next" href="templates.html">
<span class="direction">
Next
</span>
<span class="title">
Model Templates
</span>
</a>
</footer>
</article>
</body>
</html>