<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8"/>
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
<title>
Model Building Basics · Flux
</title>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-36890222-9', 'auto');
ga('send', 'pageview');
</script>
<link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/>
<link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.5.0/styles/default.min.css" rel="stylesheet" type="text/css"/>
<link href="https://fonts.googleapis.com/css?family=Lato|Ubuntu+Mono" rel="stylesheet" type="text/css"/>
<link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/>
<link href="../assets/documenter.css" rel="stylesheet" type="text/css"/>
<script>
documenterBaseURL=".."
</script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../assets/documenter.js"></script>
<script src="../../versions.js"></script>
<link href="../../flux.css" rel="stylesheet" type="text/css"/>
</head>
<body>
<nav class="toc">
<h1>
Flux
</h1>
<form class="search" action="../search.html">
<select id="version-selector" onChange="window.location.href=this.value">
<option value="#" selected="selected" disabled="disabled">
Version
</option>
</select>
<input id="search-query" name="q" type="text" placeholder="Search docs"/>
</form>
<ul>
<li>
<a class="toctext" href="../index.html">
Home
</a>
</li>
<li>
<span class="toctext">
Building Models
</span>
<ul>
<li class="current">
<a class="toctext" href="basics.html">
Model Building Basics
</a>
<ul class="internal">
<li>
<a class="toctext" href="#Net-Functions-1">
Net Functions
</a>
</li>
<li>
<a class="toctext" href="#The-Model-1">
The Model
</a>
</li>
<li>
<a class="toctext" href="#Parameters-1">
Parameters
</a>
</li>
<li>
<a class="toctext" href="#Layers-1">
Layers
</a>
</li>
<li>
<a class="toctext" href="#Combining-Layers-1">
Combining Layers
</a>
</li>
<li>
<a class="toctext" href="#Dressed-like-a-model-1">
Dressed like a model
</a>
</li>
</ul>
</li>
<li>
<a class="toctext" href="templates.html">
Model Templates
</a>
</li>
<li>
<a class="toctext" href="recurrent.html">
Recurrence
</a>
</li>
<li>
<a class="toctext" href="debugging.html">
Debugging
</a>
</li>
</ul>
</li>
<li>
<span class="toctext">
Other APIs
</span>
<ul>
<li>
<a class="toctext" href="../apis/batching.html">
Batching
</a>
</li>
<li>
<a class="toctext" href="../apis/backends.html">
Backends
</a>
</li>
<li>
<a class="toctext" href="../apis/storage.html">
Storing Models
</a>
</li>
</ul>
</li>
<li>
<span class="toctext">
In Action
</span>
<ul>
<li>
<a class="toctext" href="../examples/logreg.html">
Simple MNIST
</a>
</li>
<li>
<a class="toctext" href="../examples/char-rnn.html">
Char RNN
</a>
</li>
</ul>
</li>
<li>
<a class="toctext" href="../contributing.html">
Contributing &amp; Help
</a>
</li>
<li>
<a class="toctext" href="../internals.html">
Internals
</a>
</li>
</ul>
</nav>
<article id="docs">
<header>
<nav>
<ul>
<li>
Building Models
</li>
<li>
<a href="basics.html">
Model Building Basics
</a>
</li>
</ul>
<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/70615ff7f22e88cae79c8d8f88033f589434c39f/docs/src/models/basics.md">
<span class="fa">
</span>
Edit on GitHub
</a>
</nav>
<hr/>
</header>
<h1>
<a class="nav-anchor" id="Model-Building-Basics-1" href="#Model-Building-Basics-1">
Model Building Basics
</a>
</h1>
<h2>
<a class="nav-anchor" id="Net-Functions-1" href="#Net-Functions-1">
Net Functions
</a>
</h2>
<p>
Flux's core feature is the
<code>@net</code>
macro, which adds some superpowers to regular ol' Julia functions. Consider this simple function with the
<code>@net</code>
annotation applied:
</p>
<pre><code class="language-julia">@net f(x) = x .* x
f([1,2,3]) == [1,4,9]</code></pre>
<p>
This behaves as expected, but we have some extra features. For example, we can convert the function to run on
<a href="https://www.tensorflow.org/">
TensorFlow
</a>
or
<a href="https://github.com/dmlc/MXNet.jl">
MXNet
</a>
:
</p>
<pre><code class="language-julia">f_mxnet = mxnet(f)
f_mxnet([1,2,3]) == [1.0, 4.0, 9.0]</code></pre>
<p>
Simples! Flux took care of a lot of boilerplate for us and just ran the multiplication on MXNet. MXNet can optimise this code for us, taking advantage of parallelism or running the code on a GPU.
</p>
<p>
Using MXNet, we can get the gradient of the function, too:
</p>
<pre><code class="language-julia">back!(f_mxnet, [1,1,1], [1,2,3]) == ([2.0, 4.0, 6.0],)</code></pre>
<p>
<code>f</code>
is effectively
<code>x^2</code>
, so the gradient is
<code>2x</code>
as expected.
</p>
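<p>
The
<code>[1,1,1]</code>
argument is the output gradient being propagated backwards, and each element of the result is the corresponding element of
<code>2x</code>
scaled by it. As a quick sanity check of that chain-rule behaviour (a sketch – the output shown is what the maths predicts, not captured output), seeding with a different gradient should scale the result elementwise:
</p>
<pre><code class="language-julia"># The gradient of x .* x seeded with Δ is 2 .* x .* Δ
back!(f_mxnet, [1,2,3], [1,2,3]) == ([2.0, 8.0, 18.0],)</code></pre>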
<h2>
<a class="nav-anchor" id="The-Model-1" href="#The-Model-1">
The Model
</a>
</h2>
<p>
The core concept in Flux is the
<em>
model
</em>
. This corresponds to what might be called a "layer" or "module" in other frameworks. A model is simply a differentiable function with parameters. Given a model
<code>m</code>
we can do things like:
</p>
<pre><code class="language-julia">m(x)           # See what the model does to an input vector `x`
back!(m, Δ, x) # backpropagate the gradient `Δ` through `m`
update!(m, η)  # update the parameters of `m` using the gradient</code></pre>
<p>
We can implement a model however we like as long as it fits this interface. But as hinted above,
<code>@net</code>
is a particularly easy way to do it, because it gives you these functions for free.
</p>
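<p>
To make the interface concrete, here's a rough sketch of what a hand-written model might look like. The type
<code>MyAffine</code>
and its methods are purely illustrative – they aren't part of Flux – and the gradient bookkeeping is the simplest thing that could work:
</p>
<pre><code class="language-julia"># A hand-rolled affine model: y = W*x .+ b, with manual gradient storage.
mutable struct MyAffine
  W; b   # parameters
  ΔW; Δb # accumulated gradients
end

MyAffine(in, out) = MyAffine(randn(out, in), randn(out),
                             zeros(out, in), zeros(out))

# Forward pass: apply the model to an input vector.
(m::MyAffine)(x) = m.W * x .+ m.b

# Backward pass: accumulate parameter gradients, return the input gradient.
function back!(m::MyAffine, Δ, x)
  m.ΔW .+= Δ * x'
  m.Δb .+= Δ
  (m.W' * Δ,)
end

# Gradient descent step with learning rate η, then reset the gradients.
function update!(m::MyAffine, η)
  m.W .-= η .* m.ΔW
  m.b .-= η .* m.Δb
  fill!(m.ΔW, 0); fill!(m.Δb, 0)
  return m
end</code></pre>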
<h2>
<a class="nav-anchor" id="Parameters-1" href="#Parameters-1">
Parameters
</a>
</h2>
<p>
Consider how we'd write a logistic regression. We just take the Julia code and add
<code>@net</code>
.
</p>
<pre><code class="language-julia">@net logistic(W, b, x) = softmax(x * W .+ b)
W = randn(10, 2)
b = randn(1, 2)
x = rand(1, 10) # [0.563 0.346 0.780 …] – fake data
y = [1 0]       # our desired classification of `x`

ŷ = logistic(W, b, x) # [0.46 0.54]</code></pre>
<p>
The network takes a set of 10 features (
<code>x</code>
, a row vector) and produces a classification
<code>ŷ</code>
, equivalent to a probability of true vs false.
<code>softmax</code>
scales the output to sum to one, so that we can interpret it as a probability distribution.
</p>
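<p>
For intuition,
<code>softmax</code>
exponentiates each entry and divides by the total, so the result is positive and sums to one. A small worked example on a plain vector (values rounded):
</p>
<pre><code class="language-julia">softmax([1.0, 2.0]) # [0.269, 0.731], i.e. exp.([1.0, 2.0]) ./ sum(exp.([1.0, 2.0]))</code></pre>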
<p>
We can use MXNet and get gradients:
</p>
<pre><code class="language-julia">logisticm = mxnet(logistic)
logisticm(W, b, x) # [0.46 0.54]
back!(logisticm, [0.1 -0.1], W, b, x) # (dW, db, dx)</code></pre>
<p>
The gradient
<code>[0.1 -0.1]</code>
says that we want to increase
<code>ŷ[1]</code>
and decrease
<code>ŷ[2]</code>
to get closer to
<code>y</code>
.
<code>back!</code>
gives us the tweaks we need to make to each input (
<code>W</code>
,
<code>b</code>
,
<code>x</code>
) in order to do this. If we add these tweaks to
<code>W</code>
and
<code>b</code>
it will predict
<code>ŷ</code>
more accurately.
</p>
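<p>
Concretely, that looks something like the sketch below. The unpacking follows the
<code>(dW, db, dx)</code>
comment above, and the final comment describes what we'd expect to see rather than captured output:
</p>
<pre><code class="language-julia">dW, db, dx = back!(logisticm, [0.1 -0.1], W, b, x)
W .+= dW # nudge the weights ...
b .+= db # ... and the biases in the right direction
logisticm(W, b, x) # ŷ[1] should now be a little higher</code></pre>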
<p>
Treating parameters like
<code>W</code>
and
<code>b</code>
as inputs can get unwieldy in larger networks. Since they are both global we can use them directly:
</p>
<pre><code class="language-julia">@net logistic(x) = softmax(x * W .+ b)</code></pre>
<p>
However, this gives us a problem: how do we get their gradients?
</p>
<p>
Flux solves this with the
<code>Param</code>
wrapper:
</p>
<pre><code class="language-julia">W = param(randn(10, 2))
b = param(randn(1, 2))
@net logistic(x) = softmax(x * W .+ b)</code></pre>
<p>
This works as before, but now
<code>W.x</code>
stores the real value and
<code>W.Δx</code>
stores its gradient, so we don't have to manage it by hand. We can even use
<code>update!</code>
to apply the gradients automatically.
</p>
<pre><code class="language-julia">logisticm = mxnet(logistic) # re-wrap the new one-argument definition
logisticm(x) # [0.46 0.54]
back!(logisticm, [-1 1], x)
update!(logisticm, 0.1)
logisticm(x) # [0.51 0.49]</code></pre>
<p>
Our network got a little closer to the target
<code>y</code>
. Now we just need to repeat this millions of times.
</p>
<p>
<em>
Side note:
</em>
We obviously need a way to calculate the "tweak"
<code>[0.1 -0.1]</code>
automatically. We can use a loss function like
<em>
mean squared error
</em>
for this:
</p>
<pre><code class="language-julia"># How wrong is ŷ?
mse([0.46, 0.54], [1, 0]) == 0.292
# What change to `ŷ` will reduce the wrongness?
back!(mse, -1, [0.46, 0.54], [1, 0]) == [0.54, -0.54]</code></pre>
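<p>
Putting these pieces together, a complete – if simplistic – training loop might look like the sketch below. The learning rate
<code>0.1</code>
and the iteration count are arbitrary choices for illustration:
</p>
<pre><code class="language-julia">for i = 1:1000
  ŷ = logisticm(x)
  Δ = back!(mse, -1, ŷ, y) # how should ŷ change to reduce the loss?
  back!(logisticm, Δ, x)   # push that change back onto the parameters
  update!(logisticm, 0.1)  # take a small gradient step
end</code></pre>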
<h2>
<a class="nav-anchor" id="Layers-1" href="#Layers-1">
Layers
</a>
</h2>
<p>
Bigger networks contain many affine transformations like
<code>W * x + b</code>
. We don't want to write out the definition every time we use it. Instead, we can factor this out by making a function that produces models:
</p>
<pre><code class="language-julia">function create_affine(in, out)
  W = param(randn(out, in))
  b = param(randn(out))
  @net x -> W * x + b
end

affine1 = create_affine(3, 2)
affine1([1,2,3])</code></pre>
<p>
Flux has a
<a href="templates.html">
more powerful syntax
</a>
for this pattern, but also provides a bunch of layers out of the box. So we can instead write:
</p>
<pre><code class="language-julia">affine1 = Affine(5, 5)
affine2 = Affine(5, 5)
x1 = rand(5) # some dummy data

softmax(affine1(x1)) # [0.167952, 0.186325, 0.176683, 0.238571, 0.23047]
softmax(affine2(x1)) # [0.125361, 0.246448, 0.21966, 0.124596, 0.283935]</code></pre>
<h2>
<a class="nav-anchor" id="Combining-Layers-1" href="#Combining-Layers-1">
Combining Layers
</a>
</h2>
<p>
A more complex model usually involves many basic layers like
<code>affine</code>
, where we use the output of one layer as the input to the next:
</p>
<pre><code class="language-julia">mymodel1(x) = softmax(affine2(σ(affine1(x))))
mymodel1(x1) # [0.187935, 0.232237, 0.169824, 0.230589, 0.179414]</code></pre>
<p>
This syntax is again a little unwieldy for larger networks, so Flux provides another template of sorts to create the function for us:
</p>
<pre><code class="language-julia">mymodel2 = Chain(affine1, σ, affine2, softmax)
mymodel2(x1) # [0.187935, 0.232237, 0.169824, 0.230589, 0.179414]</code></pre>
<p>
<code>mymodel2</code>
is exactly equivalent to
<code>mymodel1</code>
because it simply calls the provided functions in sequence. We don't have to predefine the affine layers and can also write this as:
</p>
<pre><code class="language-julia">mymodel3 = Chain(
  Affine(5, 5), σ,
  Affine(5, 5), softmax)</code></pre>
<h2>
<a class="nav-anchor" id="Dressed-like-a-model-1" href="#Dressed-like-a-model-1">
Dressed like a model
</a>
</h2>
<p>
We noted above that a model is a function with trainable parameters. Normal functions like
<code>exp</code>
are actually models too – they just happen to have 0 parameters. Flux doesn't care, and anywhere that you use one, you can use the other. For example,
<code>Chain</code>
will happily work with regular functions:
</p>
<pre><code class="language-julia">foo = Chain(exp, sum, log)
foo([1,2,3]) == 3.408 == log(sum(exp([1,2,3])))</code></pre>
<footer>
<hr/>
<a class="previous" href="../index.html">
<span class="direction">
Previous
</span>
<span class="title">
Home
</span>
</a>
<a class="next" href="templates.html">
<span class="direction">
Next
</span>
<span class="title">
Model Templates
</span>
</a>
</footer>
</article>
</body>
</html>