diff --git a/latest/contributing.html b/latest/contributing.html
index 74bddff2..9a9454ff 100644
--- a/latest/contributing.html
+++ b/latest/contributing.html
@@ -97,7 +97,7 @@ Contributing & Help
-
+
diff --git a/latest/examples/logreg.html b/latest/examples/logreg.html
index 9424772f..7c12f1ab 100644
--- a/latest/examples/logreg.html
+++ b/latest/examples/logreg.html
@@ -100,7 +100,7 @@ Logistic Regression
-
+
diff --git a/latest/index.html b/latest/index.html
index 786be3fa..904c655a 100644
--- a/latest/index.html
+++ b/latest/index.html
@@ -97,7 +97,7 @@ Home
-
+
diff --git a/latest/internals.html b/latest/internals.html
index dc94a649..c73eeae6 100644
--- a/latest/internals.html
+++ b/latest/internals.html
@@ -97,7 +97,7 @@ Internals
-
+
diff --git a/latest/manual/basics.html b/latest/manual/basics.html
index da09dc22..bc25f778 100644
--- a/latest/manual/basics.html
+++ b/latest/manual/basics.html
@@ -63,8 +63,18 @@ The Model
  • -An MNIST Example
    +Combining Models
  • +A Function in Model's Clothing
  • +The Template
  •
@@ -113,7 +123,7 @@ First Steps
-
+
@@ -123,8 +133,8 @@ First Steps

-Basic Usage
+First Steps

Installation

+... Charging Ion Capacitors ...

Pkg.clone("https://github.com/MikeInnes/DataFlow.jl")
-Pkg.clone("https://github.com/MikeInnes/Flux.jl")
+Pkg.clone("https://github.com/MikeInnes/Flux.jl")
+using Flux
    The Model @@ -141,31 +157,146 @@ The Model

-Charging Ion Capacitors...
+... Initialising Photon Beams ...

-The core concept in Flux is that of the model. A model is simply a function with parameters. In Julia, we might define the following function:
+The core concept in Flux is the model. A model (or "layer") is simply a function with parameters. For example, in plain Julia code, we could define the following function to represent a logistic regression (or simple neural network):

W = randn(3,5)
b = randn(3)
affine(x) = W*x + b

-x1 = randn(5)
-affine(x1)
-> 3-element Array{Float64,1}:
-   -0.0215644
-   -4.07343
-    0.312591
+x1 = rand(5) # [0.581466, 0.606507, 0.981732, 0.488618, 0.415414]
+y1 = softmax(affine(x1)) # [0.32676, 0.0974173, 0.575823]

+affine is simply a function which takes some vector x1 and outputs a new one y1. For example, x1 could be data from an image and y1 could be predictions about the content of that image. However, affine isn't static. It has parameters W and b, and if we tweak those parameters we'll tweak the result – hopefully to make the predictions more accurate.
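To make that concrete, here's a small illustrative sketch (an addition for clarity, not part of the docs' own example). Since affine closes over the globals W and b, nudging them changes the prediction for the same input:

W += 0.1 * randn(3, 5)  # nudge the weights a little
b += 0.1 * randn(3)     # nudge the bias too
softmax(affine(x1))     # same input x1, a (slightly) different prediction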

+This is all well and good, but we usually want to have more than one affine layer in our network; writing out the above definition to create new sets of parameters every time would quickly become tedious. For that reason, we want to use a template which creates these functions for us:

+affine1 = Affine(5, 5)
+affine2 = Affine(5, 5)
+
+softmax(affine1(x1)) # [0.167952, 0.186325, 0.176683, 0.238571, 0.23047]
+softmax(affine2(x1)) # [0.125361, 0.246448, 0.21966, 0.124596, 0.283935]

+We just created two separate Affine layers, and each contains its own version of W and b, leading to a different result when called with our data. It's easy to define templates like Affine ourselves (see The Template), but Flux provides Affine out of the box.
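As a rough sketch of what such a template might expand to (a hypothetical MyAffine for illustration; Flux's real Affine is generated for you), a layer is just a type that carries its own parameters and can be called like a function:

# Hypothetical sketch, not Flux's actual implementation.
struct MyAffine
  W
  b
end

# Construct with freshly initialised parameters, like Affine(in, out).
MyAffine(in::Integer, out::Integer) = MyAffine(randn(out, in), randn(out))

# Calling the layer applies the affine transform to its input.
(a::MyAffine)(x) = a.W * x + a.b

a = MyAffine(5, 5)
a(x1) # behaves like an Affine(5, 5) layer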

-An MNIST Example
+Combining Models

+... Inflating Graviton Zeppelins ...

+A more complex model usually involves many basic layers like affine, where we use the output of one layer as the input to the next:

+mymodel1(x) = softmax(affine2(σ(affine1(x))))
+mymodel1(x1) # [0.187935, 0.232237, 0.169824, 0.230589, 0.179414]

+This syntax is again a little unwieldy for larger networks, so Flux provides another template of sorts to create the function for us:

+mymodel2 = Chain(affine1, σ, affine2, softmax)
+mymodel2(x1) # [0.187935, 0.232237, 0.169824, 0.230589, 0.179414]

+mymodel2 is exactly equivalent to mymodel1 because it simply calls the provided functions in sequence. We don't have to predefine the affine layers and can also write this as:

+mymodel3 = Chain(
+  Affine(5, 5), σ,
+  Affine(5, 5), softmax)
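In case it's not obvious how Chain can be "exactly equivalent" to the hand-written version, here's a sketch of the underlying idea (a hypothetical mychain, not Flux's actual code): calling the chain simply threads the input through each function in turn.

# Sketch of the idea behind Chain: fold the input through the functions.
mychain(fs...) = x -> foldl((acc, f) -> f(acc), fs; init = x)

mymodel2b = mychain(affine1, σ, affine2, softmax)
mymodel2b(x1) # same result as mymodel2(x1)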

+You now understand enough to take a look at the logistic regression example, if you haven't already.

+A Function in Model's Clothing

+... Booting Dark Matter Transmogrifiers ...

+We noted above that a "model" is just a function with some trainable parameters. This goes both ways; a normal Julia function like exp is really just a model with 0 parameters. Flux doesn't care, and anywhere that you use one, you can use the other. For example, Chain will happily work with regular functions:

+foo = Chain(exp, sum, log)
+foo([1,2,3]) ≈ log(sum(exp([1,2,3]))) # ≈ 3.408
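The reverse composition also works; as an illustrative sketch (a hypothetical mymodel4, reusing x1 from above), a parameterless function can sit happily between trainable layers:

# Hypothetical example: a plain function as a "layer" inside a model.
mymodel4 = Chain(Affine(5, 5), x -> x.^2, softmax)
mymodel4(x1)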

+This unification opens up the floor for some powerful features, which we'll discuss later in the guide.

+The Template

+... Calculating Tax Expenses ...

+[WIP]