diff --git a/latest/apis/backends.html b/latest/apis/backends.html
index 8f76d9d5..c665c96b 100644
--- a/latest/apis/backends.html
+++ b/latest/apis/backends.html
@@ -150,7 +150,7 @@ Backends
-
+
diff --git a/latest/apis/batching.html b/latest/apis/batching.html
index 960ba30c..3ec4f4f6 100644
--- a/latest/apis/batching.html
+++ b/latest/apis/batching.html
@@ -155,7 +155,7 @@ Batching
-
+
diff --git a/latest/apis/storage.html b/latest/apis/storage.html
index 71db3cf0..eca52ecc 100644
--- a/latest/apis/storage.html
+++ b/latest/apis/storage.html
@@ -139,7 +139,7 @@ Storing Models
-
+
diff --git a/latest/contributing.html b/latest/contributing.html
index c00b8b86..5ace262f 100644
--- a/latest/contributing.html
+++ b/latest/contributing.html
@@ -136,7 +136,7 @@ Contributing & Help
-
+
diff --git a/latest/examples/char-rnn.html b/latest/examples/char-rnn.html
index 74a24473..41af6884 100644
--- a/latest/examples/char-rnn.html
+++ b/latest/examples/char-rnn.html
@@ -139,7 +139,7 @@ Char RNN
-
+
diff --git a/latest/examples/logreg.html b/latest/examples/logreg.html
index 621dfa68..ef7c6e7c 100644
--- a/latest/examples/logreg.html
+++ b/latest/examples/logreg.html
@@ -139,7 +139,7 @@ Simple MNIST
-
+
diff --git a/latest/index.html b/latest/index.html
index f89c2c9f..a28871ad 100644
--- a/latest/index.html
+++ b/latest/index.html
@@ -147,7 +147,7 @@ Home
-
+
@@ -204,10 +204,10 @@
The examples
- give a feel for high-level usage. This a great way to start if you're a relative newbie to machine learning or neural networks; you can get up and running running easily.
+ give a feel for high-level usage.
-If you have more experience with ML, or you just don't want to see
+If you want to know why Flux is unique, or just don't want to see
those digits
@@ -215,7 +215,14 @@ those digits
model building guide
- instead. The guide attempts to show how Flux's abstractions are built up and why it's powerful, but it's not all necessary to get started.
+ instead.
+Flux is meant to be played with. These docs have lots of code snippets; try them out in
+Juno
+!
Using MXNet, we can get the gradient of the function, too:
-back!(f_mxnet, [1,1,1], [1,2,3]) == ([2.0, 4.0, 6.0])
+back!(f_mxnet, [1,1,1], [1,2,3]) == ([2.0, 4.0, 6.0],)
f
 is effectively x², so the gradient is 2x as expected.
@@ -225,9 +225,6 @@ Using MXNet, we can get the gradient of the function, too:
-
-For TensorFlow users this may seem similar to building a graph as usual. The difference is that Julia code still behaves like Julia code. Error messages give you helpful stacktraces that pinpoint mistakes. You can step through the code in the debugger. The code runs when it's called, as usual, rather than running once to build the graph and then again to execute it.
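To see the whole round trip in one place, here is a minimal sketch; it assumes f is the element-wise squaring function (consistent with the 2x gradient above), so the @net definition shown is illustrative rather than taken from the diff.
@net f(x) = x .* x            # the model whose gradient we took above

f_mxnet = mxnet(f)            # run it on the MXNet backend
f_mxnet([1, 2, 3])            # [1.0, 4.0, 9.0]

# back! takes the model, the output sensitivity and the original input,
# and returns a tuple with one gradient per input:
back!(f_mxnet, [1, 1, 1], [1, 2, 3])  # ([2.0, 4.0, 6.0],)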
We can implement a model however we like as long as it fits this interface. But as hinted above,
@net
- is a particularly easy way to do it, as
-@net
- functions are models already.
+ is a particularly easy way to do it, because it gives you these functions for free.
@net
.
-W = randn(3,5)
-b = randn(3)
-@net logistic(x) = softmax(W * x + b)
+@net logistic(W, b, x) = softmax(x * W .+ b)
-x1 = rand(5) # [0.581466,0.606507,0.981732,0.488618,0.415414]
-y1 = logistic(x1) # [0.32676,0.0974173,0.575823]
+W = randn(10, 2)
+b = randn(1, 2)
+x = rand(1, 10) # [0.563 0.346 0.780 …] – fake data
+y = [1 0] # our desired classification of `x`
+
+ŷ = logistic(W, b, x) # [0.46 0.54]
-<!-- TODO -->
+The network takes a set of 10 features (
+x
+, a row vector) and produces a classification
+ŷ
+, equivalent to a probability of true vs false.
+softmax
+ scales the output to sum to one, so that we can interpret it as a probability distribution.
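For a quick numeric illustration (nothing Flux-specific, just what softmax does to a row vector):
softmax([1.0 2.0])   # ≈ [0.269 0.731] – positive entries that sum to one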
+We can use MXNet and get gradients:
+logisticm = mxnet(logistic)
+logisticm(W, b, x) # [0.46 0.54]
+back!(logisticm, [0.1 -0.1], W, b, x) # (dW, db, dx)
+
+The gradient
+[0.1 -0.1]
+ says that we want to increase
+ŷ[1]
+ and decrease
+ŷ[2]
+ to get closer to
+y
+.
+back!
+ gives us the tweaks we need to make to each input (
+W
+,
+b
+,
+x
+) in order to do this. If we add these tweaks to
+W
+ and
+b
+ it will predict
+ŷ
+ more accurately.
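Concretely, a rough sketch of applying those tweaks by hand, reusing logisticm, W, b and x from above (taking the full step returned by back! is arbitrary; normally you would scale it down):
dW, db, dx = back!(logisticm, [0.1 -0.1], W, b, x)

W += dW               # nudge the weights in the suggested direction
b += db

logisticm(W, b, x)    # ŷ[1] should now be a little higher than before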
+
+Treating parameters like
+W
+ and
+b
+ as inputs can get unwieldy in larger networks. Since they are both global, we can use them directly:
+
@net logistic(x) = softmax(x * W .+ b)
+However, this gives us a problem: how do we get their gradients?
+
+Flux solves this with the
+Param
+ wrapper:
+
W = param(randn(10, 2))
+b = param(randn(1, 2))
+@net logistic(x) = softmax(x * W .+ b)
+
+This works as before, but now
+W.x
+ stores the real value and
+W.Δx
+ stores its gradient, so we don't have to manage it by hand. We can even use
+update!
+ to apply the gradients automatically.
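For instance, a quick sketch of poking at the wrapper directly (using the W defined just above):
W.x    # the current value of the weights
W.Δx   # their gradient, filled in by back! and consumed by update!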
+
logisticm(x) # [0.46, 0.54]
+
+back!(logisticm, [-1 1], x)
+update!(logisticm, 0.1)
+
+logisticm(x) # [0.51, 0.49]
+
+Our network got a little closer to the target
+y
+. Now we just need to repeat this millions of times.
+
+
+Side note:
+
+ We obviously need a way to calculate the "tweak"
+[0.1, -0.1]
+ automatically. We can use a loss function like
+
+mean squared error
+
+ for this:
+
# How wrong is ŷ?
+mse([0.46, 0.54], [1, 0]) == 0.292
+# What change to `ŷ` will reduce the wrongness?
+back!(mse, -1, [0.46, 0.54], [1, 0]) == [0.54 -0.54]
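Putting those pieces together, a hand-rolled training loop might look roughly like the sketch below. For clarity it reuses the explicit-parameter logistic(W, b, x) from earlier (rather than the Param version), so the update is just ordinary array arithmetic.
@net logistic(W, b, x) = softmax(x * W .+ b)
logisticm = mxnet(logistic)

W, b = randn(10, 2), randn(1, 2)
x, y = rand(1, 10), [1 0]

for i in 1:1000
  ŷ = logisticm(W, b, x)
  Δ = back!(mse, -1, ŷ, y)                   # which way should ŷ move?
  dW, db, _ = back!(logisticm, Δ, W, b, x)   # which way should W and b move?
  W += 0.1 * dW                              # take a small step in that direction
  b += 0.1 * db
end

mse(logisticm(W, b, x), y)   # should now be much smaller than at the start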
function create_affine(in, out)
- W = randn(out,in)
- b = randn(out)
+ W = param(randn(out,in))
+ b = param(randn(out))
@net x -> W * x + b
end
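For example, a quick usage sketch (the sizes are arbitrary; the anonymous @net function returned here is itself a model we can call):
affine = create_affine(5, 3)
affine(rand(5))   # a 3-element output, since W is 3×5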
@@ -304,8 +389,8 @@ more powerful syntax
affine1 = Affine(5, 5)
affine2 = Affine(5, 5)
-softmax(affine1(x1)) # [0.167952, 0.186325, 0.176683, 0.238571, 0.23047]
-softmax(affine2(x1)) # [0.125361, 0.246448, 0.21966, 0.124596, 0.283935]
+softmax(affine1(x)) # [0.167952 0.186325 0.176683 0.238571 0.23047]
+softmax(affine2(x)) # [0.125361 0.246448 0.21966 0.124596 0.283935]
mymodel3 = Chain(
Affine(5, 5), σ,
Affine(5, 5), softmax)
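As a quick sanity check (a sketch; the layers above expect 5 inputs, so we feed a 1×5 row vector):
xs = rand(1, 5)   # hypothetical input row
mymodel3(xs)      # a 1×5 row that sums to one, thanks to the final softmax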
-You now know enough to take a look at the
-logistic regression
- example, if you haven't already.
The only unfamiliar part is that we have to define all of the parameters of the LSTM upfront, which adds a few lines at the beginning.
-Flux's very mathematical notation generalises well to handling more complex models. For example,
-this neural translation model with alignment
- can be fairly straightforwardly, and recognisably, translated from the paper into Flux code:
-# A recurrent model which takes a token and returns a context-dependent
-# annotation.
-
-@net type Encoder
- forward
- backward
- token -> hcat(forward(token), backward(token))
-end
-
-Encoder(in::Integer, out::Integer) =
- Encoder(LSTM(in, out÷2), flip(LSTM(in, out÷2)))
-
-# A recurrent model which takes a sequence of annotations, attends, and returns
-# a predicted output token.
-
-@net type Decoder
- attend
- recur
- state; y; N
- function (anns)
- energies = map(ann -> exp(attend(hcat(state{-1}, ann))[1]), seq(anns, N))
- weights = energies./sum(energies)
- ctx = sum(map((α, ann) -> α .* ann, weights, anns))
- (_, state), y = recur((state{-1},y{-1}), ctx)
- y
- end
-end
-
-Decoder(in::Integer, out::Integer; N = 1) =
- Decoder(Affine(in+out, 1),
- unroll1(LSTM(in, out)),
- param(zeros(1, out)), param(zeros(1, out)), N)
-
-# The model
-
-Nalpha = 5 # The size of the input token vector
-Nphrase = 7 # The length of (padded) phrases
-Nhidden = 12 # The size of the hidden state
-
-encode = Encoder(Nalpha, Nhidden)
-decode = Chain(Decoder(Nhidden, Nhidden, N = Nphrase), Affine(Nhidden, Nalpha), softmax)
-
-model = Chain(
- unroll(encode, Nphrase, stateful = false),
- unroll(decode, Nphrase, stateful = false, seq = false))
-Note that this model exercises some of the more advanced parts of the compiler and isn't stable for general use yet.
-... Calculating Tax Expenses ...
+We mentioned that we could factor out the repetition of defining affine layers with something like:
+function create_affine(in, out)
+ W = param(randn(out,in))
+ b = param(randn(out))
+ @net x -> W * x + b
+end
-So how does the
-Affine
- template work? We don't want to duplicate the code above whenever we need more than one affine layer:
+The
+@net type
+ syntax provides a shortcut for this:
W₁, b₁ = randn(...)
-affine₁(x) = W₁*x + b₁
-W₂, b₂ = randn(...)
-affine₂(x) = W₂*x + b₂
-model = Chain(affine₁, affine₂)
-Here's one way we could solve this: just keep the parameters in a Julia type, and define how that type acts as a function:
-type MyAffine
+@net type MyAffine
W
b
+ x -> x * W + b
end
-# Use the `MyAffine` layer as a model
-(l::MyAffine)(x) = l.W * x + l.b
-
# Convenience constructor
MyAffine(in::Integer, out::Integer) =
MyAffine(randn(out, in), randn(out))
@@ -203,40 +190,44 @@ model = Chain(MyAffine(5, 5), MyAffine(5, 5))
model(x1) # [-1.54458,0.492025,0.88687,1.93834,-4.70062]
-This is much better: we can now make as many affine layers as we want. This is a very common pattern, so to make it more convenient we can use the
-@net
- macro:
-
-@net type MyAffine
- W
- b
- x -> x * W + b
-end
-
-The function provided,
-x -> x * W + b
-, will be used when
-MyAffine
- is used as a model; it's just a shorter way of defining the
-(::MyAffine)(x)
- method above. (You may notice that
-W
- and
-x
- have swapped order in the model; this is due to the way batching works, which will be covered in more detail later on.)
-
-
-However,
-@net
- does not simply save us some keystrokes; it's the secret sauce that makes everything else in Flux go. For example, it analyses the code for the forward function so that it can differentiate it or convert it to a TensorFlow graph.
-
-
-The above code is almost exactly how
+This is almost exactly how
Affine
- is defined in Flux itself! There's no difference between "library-level" and "user-level" models, so making your code reusable doesn't involve a lot of extra complexity. Moreover, much more complex models than
-Affine
- are equally simple to define.
+ is defined in Flux itself. Using
+@net type
+ gives us some extra conveniences:
+
+ -
+
+It creates a default constructor
+MyAffine(::AbstractArray, ::AbstractArray)
+ which initialises
+param
+s for us (see the sketch after this list);
+
+
+ -
+
+It subtypes
+Flux.Model
+ to explicitly mark this as a model;
+
+
+ -
+
+We can easily define custom constructors or instantiate
+Affine
+ with arbitrary weights of our choosing;
+
+
+ -
+
+We can dispatch on the
+Affine
+ type, for example to override how it gets converted to MXNet, or to hook into shape inference.
+
+
+
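For example, a small sketch of the first and third points above (the sizes are arbitrary):
# The generated constructor takes the two fields directly and wraps them
# in `param` for us:
a = MyAffine(randn(5, 10), randn(1, 10))   # weights for 5 inputs and 10 outputs
a(rand(1, 5))                              # returns a 1×10 row vector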
Models in templates
@@ -255,18 +246,6 @@ Models in templates
end
end
-Just as above, this is roughly equivalent to writing:
-type TLP
- first
- second
-end
-
-function (self::TLP)(x)
- l1 = σ(self.first(x))
- l2 = softmax(self.second(l1))
-end
-
Clearly, the
first
and
@@ -289,51 +268,6 @@ You may recognise this as being equivalent to
Chain(
Affine(10, 20), σ,
Affine(20, 15), softmax)
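A quick usage sketch, assuming the TLP template from this section (a 1×10 row vector goes in, a 1×15 row of probabilities comes out):
mlp = TLP(Affine(10, 20), Affine(20, 15))
mlp(rand(1, 10))   # 1×15 row vector summing to one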
-
-given that it's just a sequence of calls. For simple networks
-Chain
- is completely fine, although the
-@net
- version is more powerful as we can (for example) reuse the output
-l1
- more than once.
-
-Affine
- has two array parameters,
-W
- and
-b
-. Just like any other Julia type, it's easy to instantiate an
-Affine
- layer with parameters of our choosing:
-
a = Affine(rand(10, 20), rand(20))
-However, for convenience and to avoid errors, we'd probably rather specify the input and output dimension instead:
-a = Affine(10, 20)
-This is easy to implement using the usual Julia syntax for constructors:
-Affine(in::Integer, out::Integer) =
- Affine(randn(in, out), randn(1, out))
-
-In practice, these constructors tend to take the parameter initialisation function as an argument so that it's more easily customisable, and use
-Flux.initn
- by default (which is equivalent to
-randn(...)/100
-). So
-Affine
-'s constructor really looks like this:
-
Affine(in::Integer, out::Integer; init = initn) =
- Affine(init(in, out), init(1, out))