<!DOCTYPE html>
< html lang = "en" >
< head >
< meta charset = "UTF-8" / >
< meta name = "viewport" content = "width=device-width, initial-scale=1.0" / >
< title >
Model Building Basics · Flux
< / title >
< script >
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-36890222-9', 'auto');
ga('send', 'pageview');
< / script >
< link href = "https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel = "stylesheet" type = "text/css" / >
< link href = "https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.5.0/styles/default.min.css" rel = "stylesheet" type = "text/css" / >
< link href = "https://fonts.googleapis.com/css?family=Lato|Ubuntu+Mono" rel = "stylesheet" type = "text/css" / >
< link href = "https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel = "stylesheet" type = "text/css" / >
< link href = "../assets/documenter.css" rel = "stylesheet" type = "text/css" / >
< script >
documenterBaseURL=".."
< / script >
< script src = "https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main = "../assets/documenter.js" > < / script >
< script src = "../../versions.js" > < / script >
< link href = "../../flux.css" rel = "stylesheet" type = "text/css" / >
< / head >
< body >
< nav class = "toc" >
< h1 >
Flux
< / h1 >
< form class = "search" action = "../search.html" >
< select id = "version-selector" onChange = "window.location.href=this.value" >
< option value = "#" selected = "selected" disabled = "disabled" >
Version
< / option >
< / select >
< input id = "search-query" name = "q" type = "text" placeholder = "Search docs" / >
< / form >
< ul >
< li >
< a class = "toctext" href = "../index.html" >
Home
< / a >
< / li >
< li >
< span class = "toctext" >
Building Models
< / span >
< ul >
< li class = "current" >
< a class = "toctext" href = "basics.html" >
Model Building Basics
< / a >
< ul class = "internal" >
< li >
< a class = "toctext" href = "#Net-Functions-1" >
Net Functions
< / a >
< / li >
< li >
< a class = "toctext" href = "#The-Model-1" >
The Model
< / a >
< / li >
< li >
< a class = "toctext" href = "#Parameters-1" >
Parameters
< / a >
< / li >
< li >
< a class = "toctext" href = "#Layers-1" >
Layers
< / a >
< / li >
< li >
< a class = "toctext" href = "#Combining-Layers-1" >
Combining Layers
< / a >
< / li >
< li >
< a class = "toctext" href = "#Dressed-like-a-model-1" >
Dressed like a model
< / a >
< / li >
< / ul >
< / li >
< li >
< a class = "toctext" href = "templates.html" >
Model Templates
< / a >
< / li >
< li >
< a class = "toctext" href = "recurrent.html" >
Recurrence
< / a >
< / li >
< li >
< a class = "toctext" href = "debugging.html" >
Debugging
< / a >
< / li >
< / ul >
< / li >
< li >
< span class = "toctext" >
Other APIs
< / span >
< ul >
< li >
< a class = "toctext" href = "../apis/batching.html" >
Batching
< / a >
< / li >
< li >
< a class = "toctext" href = "../apis/backends.html" >
Backends
< / a >
< / li >
< li >
< a class = "toctext" href = "../apis/storage.html" >
Storing Models
< / a >
< / li >
< / ul >
< / li >
< li >
< span class = "toctext" >
In Action
< / span >
< ul >
< li >
< a class = "toctext" href = "../examples/logreg.html" >
Simple MNIST
< / a >
< / li >
< li >
< a class = "toctext" href = "../examples/char-rnn.html" >
Char RNN
< / a >
< / li >
< / ul >
< / li >
< li >
< a class = "toctext" href = "../contributing.html" >
Contributing &amp; Help
< / a >
< / li >
< li >
< a class = "toctext" href = "../internals.html" >
Internals
< / a >
< / li >
< / ul >
< / nav >
< article id = "docs" >
< header >
< nav >
< ul >
< li >
Building Models
< / li >
< li >
< a href = "basics.html" >
Model Building Basics
< / a >
< / li >
< / ul >
< a class = "edit-page" href = "https://github.com/MikeInnes/Flux.jl/tree/1bba9631a2b912e69a305fc33446f9a0e29aeb7a/docs/src/models/basics.md" >
< span class = "fa" >
< / span >
Edit on GitHub
< / a >
< / nav >
< hr / >
< / header >
< h1 >
< a class = "nav-anchor" id = "Model-Building-Basics-1" href = "#Model-Building-Basics-1" >
Model Building Basics
< / a >
< / h1 >
< h2 >
< a class = "nav-anchor" id = "Net-Functions-1" href = "#Net-Functions-1" >
Net Functions
< / a >
< / h2 >
< p >
Flux's core feature is the
< code > @net< / code >
macro, which adds some superpowers to regular ol' Julia functions. Consider this simple function with the
< code > @net< / code >
annotation applied:
< / p >
< pre > < code class = "language-julia" > @net f(x) = x .* x
f([1,2,3]) == [1,4,9]< / code > < / pre >
< p >
This behaves as expected, but we have some extra features. For example, we can convert the function to run on
< a href = "https://www.tensorflow.org/" >
TensorFlow
< / a >
or
< a href = "https://github.com/dmlc/MXNet.jl" >
MXNet
< / a >
:
< / p >
< pre > < code class = "language-julia" > f_mxnet = mxnet(f)
f_mxnet([1,2,3]) == [1.0, 4.0, 9.0]< / code > < / pre >
< p >
Simples! Flux took care of a lot of boilerplate for us and just ran the multiplication on MXNet. MXNet can optimise this code for us, taking advantage of parallelism or running the code on a GPU.
< / p >
< p >
Using MXNet, we can get the gradient of the function, too:
< / p >
< pre > < code class = "language-julia" > back!(f_mxnet, [1,1,1], [1,2,3]) == ([2.0, 4.0, 6.0])< / code > < / pre >
< p >
< code > f< / code >
is effectively
< code > x^2< / code >
, so the gradient is
< code > 2x< / code >
as expected.
< / p >
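< p >
As a sanity check that needs neither Flux nor MXNet, the same numbers can be reproduced in plain Julia. This is only a sketch in Julia 1.x syntax:
< code > numeric_gradient< / code >
is a hypothetical helper written for this illustration, not part of Flux, and it estimates each derivative with central differences.
< / p >
< pre > < code class = "language-julia" > # Plain-Julia version of the function from above (no @net needed here).
f(x) = x .* x

# Hypothetical helper: central-difference estimate of d f(x)[i] / d x[i].
# This works entry by entry because f acts on each element independently.
function numeric_gradient(f, x; h = 1e-6)
  x = float.(x)
  (f(x .+ h) .- f(x .- h)) ./ (2h)
end

x = [1, 2, 3]
Δ = [1, 1, 1]                      # incoming gradient, as passed to back! above
g = Δ .* numeric_gradient(f, x)    # chain rule: scale by the incoming gradient

isapprox(g, 2 .* x; atol = 1e-6)   # true: the gradient of x^2 is 2x< / code > < / pre >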
< p >
For TensorFlow users this may seem similar to building a graph as usual. The difference is that Julia code still behaves like Julia code. Error messages give you helpful stacktraces that pinpoint mistakes. You can step through the code in the debugger. The code runs when it's called, as usual, rather than running once to build the graph and then again to execute it.
< / p >
< h2 >
< a class = "nav-anchor" id = "The-Model-1" href = "#The-Model-1" >
The Model
< / a >
< / h2 >
< p >
The core concept in Flux is the
< em >
model
< / em >
. This corresponds to what might be called a "layer" or "module" in other frameworks. A model is simply a differentiable function with parameters. Given a model
< code > m< / code >
we can do things like:
< / p >
< pre > < code class = "language-julia" > m(x) # see what the model does to an input vector `x`
back!(m, Δ, x) # backpropagate the gradient `Δ` through `m`
update!(m, η) # update the parameters of `m` using the gradient< / code > < / pre >
< p >
We can implement a model however we like as long as it fits this interface. But as hinted above,
< code > @net< / code >
is a particularly easy way to do it, as
< code > @net< / code >
functions are models already.
< / p >
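< p >
To make the interface concrete, here is a minimal hand-written sketch of a one-parameter model in plain Julia 1.x syntax. Everything in it is an assumption made for illustration: the
< code > Scale< / code >
type is hypothetical, and
< code > back!< / code >
and
< code > update!< / code >
are standalone functions defined here rather than methods added to Flux's own. In particular, treating
< code > η< / code >
as a plain gradient-descent step size is a guess at the intended behaviour.
< / p >
< pre > < code class = "language-julia" > # Hypothetical model computing w .* x with a single trainable parameter.
mutable struct Scale
  w::Float64     # the parameter
  grad::Float64  # gradient of the parameter, accumulated by back!
end

Scale(w) = Scale(w, 0.0)

# Forward pass: calling the model applies it to an input.
(m::Scale)(x) = m.w .* x

# Backward pass: store the parameter gradient and return the input gradient.
function back!(m::Scale, Δ, x)
  m.grad += sum(Δ .* x)
  Δ .* m.w
end

# Gradient-descent step with learning rate η, then reset the stored gradient.
function update!(m::Scale, η)
  m.w -= η * m.grad
  m.grad = 0.0
  m
end

m = Scale(2.0)
x = [1.0, 2.0, 3.0]
y = m(x)                            # [2.0, 4.0, 6.0]
dx = back!(m, [1.0, 1.0, 1.0], x)   # [2.0, 2.0, 2.0]; m.grad is now 6.0
update!(m, 0.1)                     # m.w is now roughly 2.0 - 0.1 * 6.0 = 1.4< / code > < / pre >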
< h2 >
< a class = "nav-anchor" id = "Parameters-1" href = "#Parameters-1" >
Parameters
< / a >
< / h2 >
< p >
Consider how we'd write a logistic regression. We just take the Julia code and add
< code > @net< / code >
.
< / p >
< pre > < code class = "language-julia" > W = randn(3,5)
b = randn(3)
@net logistic(x) = softmax(W * x + b)

x1 = rand(5) # [0.581466,0.606507,0.981732,0.488618,0.415414]
y1 = logistic(x1) # [0.32676,0.0974173,0.575823]< / code > < / pre >
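< p >
For readers who have not met
< code > softmax< / code >
before: it turns a vector of scores into positive values that sum to one, which is why the three entries of
< code > y1< / code >
can be read as class probabilities. The following plain-Julia sketch (Julia 1.x syntax) shows the idea;
< code > my_softmax< / code >
is a hypothetical name used only here, and Flux's own implementation may differ.
< / p >
< pre > < code class = "language-julia" > # Hypothetical stand-in for softmax: exponentiate, then normalise.
# Subtracting the maximum first avoids overflow and does not change the result.
function my_softmax(v)
  e = exp.(v .- maximum(v))
  e ./ sum(e)
end

W = randn(3,5)
b = randn(3)
x1 = rand(5)

y1 = my_softmax(W * x1 + b)
length(y1) == 3    # one output per row of W
sum(y1) ≈ 1        # the outputs form a probability distribution< / code > < / pre >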
< h2 >
< a class = "nav-anchor" id = "Layers-1" href = "#Layers-1" >
Layers
< / a >
< / h2 >
< p >
Bigger networks contain many affine transformations like
< code > W * x + b< / code >
. We don't want to write out the definition every time we use it. Instead, we can factor this out by making a function that produces models:
< / p >
< pre > < code class = "language-julia" > function create_affine(in, out)
  W = randn(out,in)
  b = randn(out)
  @net x -> W * x + b
end

affine1 = create_affine(3,2)
affine1([1,2,3])< / code > < / pre >
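< p >
The factory pattern itself is ordinary Julia. As a sketch of the shape bookkeeping only, here is the same idea as a plain closure without
< code > @net< / code >
(so without backend conversion or gradients). The names
< code > plain_affine< / code >
,
< code > nin< / code >
and
< code > nout< / code >
are hypothetical and chosen for this illustration.
< / p >
< pre > < code class = "language-julia" > # Plain closure version of the factory: each call captures its own W and b.
function plain_affine(nin, nout)
  W = randn(nout, nin)   # nout rows, nin columns
  b = randn(nout)
  x -> W * x + b
end

a1 = plain_affine(3, 2)
a2 = plain_affine(3, 2)   # a second call draws fresh, independent parameters

x = [1, 2, 3]
length(a1(x)) == 2        # 3 inputs are mapped to 2 outputs
length(a2(x)) == 2        # same shape, but different random W and b< / code > < / pre >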
< p >
Flux has a
< a href = "templates.html" >
more powerful syntax
< / a >
for this pattern, but also provides a bunch of layers out of the box. So we can instead write:
< / p >
< pre > < code class = "language-julia" > affine1 = Affine(5, 5)
affine2 = Affine(5, 5)
softmax(affine1(x1)) # [0.167952, 0.186325, 0.176683, 0.238571, 0.23047]
softmax(affine2(x1)) # [0.125361, 0.246448, 0.21966, 0.124596, 0.283935]< / code > < / pre >
< h2 >
< a class = "nav-anchor" id = "Combining-Layers-1" href = "#Combining-Layers-1" >
Combining Layers
< / a >
< / h2 >
< p >
A more complex model usually involves many basic layers like
< code > affine< / code >
, where we use the output of one layer as the input to the next:
< / p >
< pre > < code class = "language-julia" > mymodel1(x) = softmax(affine2(σ(affine1(x))))
mymodel1(x1) # [0.187935, 0.232237, 0.169824, 0.230589, 0.179414]< / code > < / pre >
< p >
This syntax is again a little unwieldy for larger networks, so Flux provides another template of sorts to create the function for us:
< / p >
< pre > < code class = "language-julia" > mymodel2 = Chain(affine1, σ, affine2, softmax)
mymodel2(x1) # [0.187935, 0.232237, 0.169824, 0.230589, 0.179414]< / code > < / pre >
< p >
< code > mymodel2< / code >
is exactly equivalent to
< code > mymodel1< / code >
because it simply calls the provided functions in sequence. We don't have to predefine the affine layers and can also write this as:
< / p >
< pre > < code class = "language-julia" > mymodel3 = Chain(
  Affine(5, 5), σ,
  Affine(5, 5), softmax)< / code > < / pre >
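< p >
To make "calls the provided functions in sequence" concrete, here is a plain-Julia sketch (Julia 1.x syntax) of that one behaviour.
< code > call_in_sequence< / code >
and
< code > double< / code >
are hypothetical names for this illustration; the sketch says nothing about how
< code > Chain< / code >
is actually implemented or how it handles gradients and backends.
< / p >
< pre > < code class = "language-julia" > # Hypothetical helper: feed x through each function in turn, left to right.
function call_in_sequence(fs...)
  x -> foldl((acc, f) -> f(acc), fs; init = x)
end

double(v) = 2 .* v
total = call_in_sequence(double, sum, sqrt)

# Nested calls and the sequenced version agree: both give 4.0.
total([1, 2, 5]) == sqrt(sum(double([1, 2, 5])))< / code > < / pre >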
< p >
You now know enough to take a look at the
< a href = "../examples/logreg.html" >
logistic regression
< / a >
example, if you haven't already.
< / p >
< h2 >
< a class = "nav-anchor" id = "Dressed-like-a-model-1" href = "#Dressed-like-a-model-1" >
Dressed like a model
< / a >
< / h2 >
< p >
We noted above that a model is a function with trainable parameters. Normal functions like
< code > exp< / code >
are actually models too; they just happen to have zero parameters. Flux doesn't care, and anywhere that you use one, you can use the other. For example,
< code > Chain< / code >
will happily work with regular functions:
< / p >
< pre > < code class = "language-julia" > foo = Chain(exp, sum, log)
foo([1,2,3]) == 3.408 == log(sum(exp([1,2,3])))< / code > < / pre >
< footer >
< hr / >
< a class = "previous" href = "../index.html" >
< span class = "direction" >
Previous
< / span >
< span class = "title" >
Home
< / span >
< / a >
< a class = "next" href = "templates.html" >
< span class = "direction" >
Next
< / span >
< span class = "title" >
Model Templates
< / span >
< / a >
< / footer >
< / article >
< / body >
< / html >