<!DOCTYPE html>
< html lang = "en" >
< head >
< meta charset = "UTF-8" / >
< meta name = "viewport" content = "width=device-width, initial-scale=1.0" / >
< title >
Model Building Basics · Flux
< / title >
< script >
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-36890222-9', 'auto');
ga('send', 'pageview');
< / script >
< link href = "https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel = "stylesheet" type = "text/css" / >
< link href = "https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.5.0/styles/default.min.css" rel = "stylesheet" type = "text/css" / >
< link href = "https://fonts.googleapis.com/css?family=Lato|Ubuntu+Mono" rel = "stylesheet" type = "text/css" / >
< link href = "https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel = "stylesheet" type = "text/css" / >
< link href = "../assets/documenter.css" rel = "stylesheet" type = "text/css" / >
< script >
documenterBaseURL=".."
< / script >
< script src = "https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main = "../assets/documenter.js" > < / script >
< script src = "../../versions.js" > < / script >
< link href = "../../flux.css" rel = "stylesheet" type = "text/css" / >
< / head >
< body >
< nav class = "toc" >
< h1 >
Flux
< / h1 >
< form class = "search" action = "../search.html" >
< select id = "version-selector" onChange = "window.location.href=this.value" >
< option value = "#" selected = "selected" disabled = "disabled" >
Version
< / option >
< / select >
< input id = "search-query" name = "q" type = "text" placeholder = "Search docs" / >
< / form >
< ul >
< li >
< a class = "toctext" href = "../index.html" >
Home
< / a >
< / li >
< li >
< span class = "toctext" >
Building Models
< / span >
< ul >
< li class = "current" >
< a class = "toctext" href = "basics.html" >
Model Building Basics
< / a >
< ul class = "internal" >
< li >
< a class = "toctext" href = "#The-Model-1" >
The Model
< / a >
< / li >
< li >
< a class = "toctext" href = "#Combining-Models-1" >
Combining Models
< / a >
< / li >
< li >
< a class = "toctext" href = "#A-Function-in-Model's-Clothing-1" >
A Function in Model's Clothing
< / a >
< / li >
< / ul >
< / li >
< li >
< a class = "toctext" href = "templates.html" >
Model Templates
< / a >
< / li >
< li >
< a class = "toctext" href = "recurrent.html" >
Recurrence
< / a >
< / li >
< li >
< a class = "toctext" href = "debugging.html" >
Debugging
< / a >
< / li >
< / ul >
< / li >
< li >
< span class = "toctext" >
Other APIs
< / span >
< ul >
< li >
< a class = "toctext" href = "../apis/batching.html" >
Batching
< / a >
< / li >
< li >
< a class = "toctext" href = "../apis/backends.html" >
Backends
< / a >
< / li >
< li >
< a class = "toctext" href = "../apis/storage.html" >
Storing Models
< / a >
< / li >
< / ul >
< / li >
< li >
< span class = "toctext" >
In Action
< / span >
< ul >
< li >
< a class = "toctext" href = "../examples/logreg.html" >
Simple MNIST
< / a >
< / li >
< li >
< a class = "toctext" href = "../examples/char-rnn.html" >
Char RNN
< / a >
< / li >
< / ul >
< / li >
< li >
< a class = "toctext" href = "../contributing.html" >
Contributing & Help
< / a >
< / li >
< li >
< a class = "toctext" href = "../internals.html" >
Internals
< / a >
< / li >
< / ul >
< / nav >
< article id = "docs" >
< header >
< nav >
< ul >
< li >
Building Models
< / li >
< li >
< a href = "basics.html" >
Model Building Basics
< / a >
< / li >
< / ul >
< a class = "edit-page" href = "https://github.com/MikeInnes/Flux.jl/tree/efcb9650da31c183b94b839f66aa3467d007c33f/docs/src/models/basics.md" >
< span class = "fa" >
< / span >
Edit on GitHub
< / a >
< / nav >
< hr / >
< / header >
< h1 >
< a class = "nav-anchor" id = "Model-Building-Basics-1" href = "#Model-Building-Basics-1" >
Model Building Basics
< / a >
< / h1 >
< h2 >
< a class = "nav-anchor" id = "The-Model-1" href = "#The-Model-1" >
The Model
< / a >
< / h2 >
< p >
< em >
... Initialising Photon Beams ...
< / em >
< / p >
< p >
The core concept in Flux is the
< em >
model
< / em >
. A model (or "layer") is simply a function with parameters. For example, in plain Julia code, we could define the following function to represent a logistic regression (or simple neural network):
< / p >
< pre > < code class = "language-julia" > using Flux # softmax is not in Base; here it comes from Flux
W = randn(3,5)
b = randn(3)
affine(x) = W * x + b
x1 = rand(5) # [0.581466,0.606507,0.981732,0.488618,0.415414]
y1 = softmax(affine(x1)) # [0.32676,0.0974173,0.575823]< / code > < / pre >
< p >
< code > affine< / code >
is simply a function which takes some vector
< code > x1< / code >
and outputs a new one
< code > y1< / code >
. For example,
< code > x1< / code >
could be data from an image and
< code > y1< / code >
could be predictions about the content of that image. However,
< code > affine< / code >
isn't static. It has
< em >
parameters
< / em >
< code > W< / code >
and
< code > b< / code >
, and if we tweak those parameters we change the result, hopefully making the predictions more accurate.
< / p >
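< p >
To make the role of the parameters concrete, here is a minimal sketch in plain Julia with no Flux dependency. The helper
< code > my_softmax< / code >
is a hypothetical stand-in for the
< code > softmax< / code >
used above, and the nudge applied to
< code > b< / code >
is arbitrary; the point is only that changing a parameter changes the prediction.
< / p >
< pre > < code class = "language-julia" > # Hypothetical stand-in for softmax, written out so the sketch is self-contained.
function my_softmax(v)
    e = exp.(v .- maximum(v))   # subtract the maximum for numerical stability
    return e ./ sum(e)
end

W = randn(3, 5)
b = randn(3)
affine(x) = W * x + b           # reads the current W and b on every call

x1 = rand(5)
before = my_softmax(affine(x1))

b .+= [1.0, 0.0, 0.0]           # tweak one parameter in place
after = my_softmax(affine(x1))

# Both results still sum to 1, but the first entry is now larger than before.< / code > < / pre >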
< p >
This is all well and good, but we usually want to have more than one affine layer in our network; writing out the above definition to create new sets of parameters every time would quickly become tedious. For that reason, we want to use a
< em >
template
< / em >
which creates these functions for us:
< / p >
< pre > < code class = "language-julia" > affine1 = Affine(5, 5)
affine2 = Affine(5, 5)
softmax(affine1(x1)) # [0.167952, 0.186325, 0.176683, 0.238571, 0.23047]
softmax(affine2(x1)) # [0.125361, 0.246448, 0.21966, 0.124596, 0.283935]< / code > < / pre >
< p >
We just created two separate
< code > Affine< / code >
layers, and each contains its own (randomly initialised) version of
< code > W< / code >
and
< code > b< / code >
, leading to a different result when called with our data. It's easy to define templates like
< code > Affine< / code >
ourselves (see
< a href = "templates.html" >
templates
< / a >
), but Flux provides
< code > Affine< / code >
out of the box, so we'll use that for now.
< / p >
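< p >
As a rough illustration of what such a template does, the sketch below builds one in plain Julia using a callable struct. The name
< code > MyAffine< / code >
and its fields are hypothetical and this is not how Flux defines
< code > Affine< / code >
; it only shows the idea of each instance carrying its own parameters.
< / p >
< pre > < code class = "language-julia" > # Hypothetical plain-Julia template; not Flux's actual definition of Affine.
struct MyAffine
    W::Matrix{Float64}
    b::Vector{Float64}
end

# Convenience constructor: nin inputs, nout outputs, randomly initialised parameters.
MyAffine(nin::Integer, nout::Integer) = MyAffine(randn(nout, nin), randn(nout))

# Make instances callable, so a layer can be used like a function.
(a::MyAffine)(x) = a.W * x + a.b

layer1 = MyAffine(5, 5)
layer2 = MyAffine(5, 5)
x = rand(5)
layer1(x)   # a 5-element vector
layer2(x)   # a different 5-element vector, since layer2 has its own W and b< / code > < / pre >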
< h2 >
< a class = "nav-anchor" id = "Combining-Models-1" href = "#Combining-Models-1" >
Combining Models
< / a >
< / h2 >
< p >
< em >
... Inflating Graviton Zeppelins ...
< / em >
< / p >
< p >
A more complex model usually involves many basic layers like
< code > affine< / code >
, where we use the output of one layer as the input to the next:
< / p >
< pre > < code class = "language-julia" > mymodel1(x) = softmax(affine2(σ(affine1(x))))
mymodel1(x1) # [0.187935, 0.232237, 0.169824, 0.230589, 0.179414]< / code > < / pre >
< p >
This syntax is again a little unwieldy for larger networks, so Flux provides another template of sorts to create the function for us:
< / p >
< pre > < code class = "language-julia" > mymodel2 = Chain(affine1, σ, affine2, softmax)
mymodel2(x1) # [0.187935, 0.232237, 0.169824, 0.230589, 0.179414]< / code > < / pre >
< p >
< code > mymodel2< / code >
is exactly equivalent to
< code > mymodel1< / code >
because it simply calls the provided functions in sequence. We don't have to predefine the affine layers and can also write this as:
< / p >
< pre > < code class = "language-julia" > mymodel3 = Chain(
  Affine(5, 5), σ,
  Affine(5, 5), softmax)< / code > < / pre >
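< p >
Calling functions in sequence can be sketched in a few lines of plain Julia. The helper
< code > my_chain< / code >
below is hypothetical and is not Flux's
< code > Chain< / code >
; it only illustrates the idea of feeding each output into the next function, using two made-up layers
< code > double< / code >
and
< code > addone< / code >
.
< / p >
< pre > < code class = "language-julia" > # Hypothetical helper: returns a function that applies the given layers left to right.
my_chain(layers...) = x -> foldl((acc, layer) -> layer(acc), layers; init = x)

double(v) = 2 .* v
addone(v) = v .+ 1

pipeline = my_chain(double, addone, sum)
pipeline([1, 2, 3])   # same as sum(addone(double([1, 2, 3]))), which is 15< / code > < / pre >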
< p >
You now know enough to take a look at the
< a href = "../examples/logreg.html" >
logistic regression
< / a >
example, if you haven't already.
< / p >
< h2 >
< a class = "nav-anchor" id = "A-Function-in-Model's-Clothing-1" href = "#A-Function-in-Model's-Clothing-1" >
A Function in Model's Clothing
< / a >
< / h2 >
< p >
< em >
... Booting Dark Matter Transmogrifiers ...
< / em >
< / p >
< p >
We noted above that a "model" is a function with some number of trainable parameters. This goes both ways; a normal Julia function like
< code > exp< / code >
is effectively a model with 0 parameters. Flux doesn't care, and anywhere that you use one, you can use the other. For example,
< code > Chain< / code >
will happily work with regular functions:
< / p >
< pre > < code class = "language-julia" > foo = Chain(exp, sum, log)
foo([1,2,3]) # ≈ 3.408, the same value as log(sum(exp([1,2,3])))< / code > < / pre >
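< p >
For comparison, the same pipeline can be written without Flux by composing ordinary functions with Julia's
< code > ∘< / code >
operator. This is a sketch for recent Julia versions, where elementwise
< code > exp< / code >
is written with broadcasting; the names
< code > expall< / code >
and
< code > foo_plain< / code >
are made up for the example.
< / p >
< pre > < code class = "language-julia" > # Plain Julia, no Flux: three ordinary functions treated as parameter-free "models".
expall(v) = exp.(v)               # hypothetical helper: elementwise exp via broadcasting
foo_plain = log ∘ sum ∘ expall    # ∘ composes right to left, so expall runs first

result = foo_plain([1, 2, 3])     # ≈ 3.4076

# Compare floating-point results with isapprox rather than ==.
isapprox(result, log(sum(exp.([1, 2, 3]))))< / code > < / pre >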
< footer >
< hr / >
< a class = "previous" href = "../index.html" >
< span class = "direction" >
Previous
< / span >
< span class = "title" >
Home
< / span >
< / a >
< a class = "next" href = "templates.html" >
< span class = "direction" >
Next
< / span >
< span class = "title" >
Model Templates
< / span >
< / a >
< / footer >
< / article >
< / body >
< / html >