diff --git a/latest/apis/batching.html b/latest/apis/batching.html
index 82657715..eb013471 100644
--- a/latest/apis/batching.html
+++ b/latest/apis/batching.html
@@ -197,7 +197,7 @@ Batches are represented the way we think
-about them; as an list of data points. We can do all the usual array operations with them, including getting the first with xs[1], iterating over them and so on. The trick is that under the hood, the data is batched into a single array:
+about them; as a list of data points. We can do all the usual array operations with them, including getting the first with xs[1], iterating over them and so on. The trick is that under the hood, the data is batched into a single array:
nunroll = 50
nbatch = 50
-getseqs(chars, alphabet) = sequences((onehot(Float32, char, alphabet) for char in chars), nunroll)
-getbatches(chars, alphabet) = batches((getseqs(part, alphabet) for part in chunk(chars, nbatch))...)
+getseqs(chars, alphabet) =
+ sequences((onehot(Float32, char, alphabet) for char in chars), nunroll)
+getbatches(chars, alphabet) =
+ batches((getseqs(part, alphabet) for part in chunk(chars, nbatch))...)
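Concretely, getseqs one-hot encodes each character and cuts the stream into length-nunroll sequences, while getbatches lays nbatch of those sequences side by side. As a rough plain-Julia picture of how a batch of one-hot vectors can live in a single array (the helper names below are made up for illustration, not Flux's API):
# Illustration only: plain Julia, not Flux's Batch/Seq types.
chars    = collect("the quick brown fox")
alphabet = unique(chars)

onehotvec(c) = Float32.(alphabet .== c)   # one data point: a one-hot vector
batchmat(cs) = hcat(onehotvec.(cs)...)    # a batch: one column per data point

size(batchmat(chars[1:4]))                # (length(alphabet), 4)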
Because we want the RNN to predict the next letter at each iteration, our target data is simply our input data offset by one. For example, if the input is "The quick brown fox", the target will be "he quick brown fox ". Each letter is one-hot encoded and sequences are batched together to create the training data.
-input = readstring("shakespeare_input.txt")
+input = readstring("shakespeare_input.txt");
alphabet = unique(input)
N = length(alphabet)
-Xs, Ys = getbatches(input, alphabet), getbatches(input[2:end], alphabet)
+# An iterator of (input, output) pairs
+train = zip(getbatches(input, alphabet), getbatches(input[2:end], alphabet))
+# We will evaluate the loss on a particular batch to monitor the training.
+eval = tobatch.(first(drop(train, 5)))
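The offset-by-one trick is easy to check in isolation: zipping the text with itself shifted by one character pairs every letter with the letter that follows it (plain Julia, no Flux required):
input    = "The quick brown fox"
examples = collect(zip(input, input[2:end]))  # (letter, next letter) pairs
first(examples)   # ('T', 'h')
last(examples)    # ('o', 'x')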
Creating the model and training it is straightforward:
@@ -196,7 +201,11 @@ Creating the model and training it is straightforward:
m = tf(unroll(model, nunroll))
-@time Flux.train!(m, Xs, Ys, η = 0.1, epoch = 1)
+# Call this to see how the model is doing
+evalcb = () -> @show logloss(m(eval[1]), eval[2])
+
+@time Flux.train!(m, train, η = 0.1, loss = logloss, cb = [evalcb])
+
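The cb keyword hands train! a list of callbacks to run periodically during training, which is what lets evalcb print the loss on the held-out batch as the model learns. A hand-rolled sketch of that pattern (not Flux's actual train!; the gradient step is elided and the names are illustrative):
function toy_train!(step!, data; cb = [])
    for (i, (x, y)) in enumerate(data)
        step!(x, y)                            # one parameter update
        i % 100 == 0 && foreach(f -> f(), cb)  # run the callbacks occasionally
    end
end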
Finally, we can sample the model. For sampling we remove the softmax
@@ -204,9 +213,9 @@ Finally, we can sample the model. For sampling we remove the
function sample(model, n, temp = 1)
s = [rand(alphabet)]
- m = tf(unroll(model, 1))
- for i = 1:n
- push!(s, wsample(alphabet, softmax(m(Seq((onehot(Float32, s[end], alphabet),)))[1]./temp)))
+ m = unroll1(model)
+ for i = 1:n-1
+ push!(s, wsample(alphabet, softmax(m(unsqueeze(onehot(s[end], alphabet)))./temp)[1,:]))
end
return string(s...)
end
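The ./temp inside the softmax is a temperature: values below 1 sharpen the distribution towards the most likely character, values above 1 flatten it. A standalone sketch of that sampling step, using wsample from StatsBase and made-up logits:
using StatsBase: wsample          # weighted sampling from a finite set

softmax(xs) = exp.(xs .- maximum(xs)) ./ sum(exp.(xs .- maximum(xs)))

alphabet = ['a', 'b', 'c']
logits   = [2.0, 1.0, 0.1]        # pretend these came from the model
temp     = 0.5                    # < 1 sharpens, > 1 flattens
wsample(alphabet, softmax(logits ./ temp))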
diff --git a/latest/examples/logreg.html b/latest/examples/logreg.html
index d9b1ceaf..5fd4efc3 100644
--- a/latest/examples/logreg.html
+++ b/latest/examples/logreg.html
@@ -160,6 +160,7 @@ This walkthrough example will take you through writing a multi-layer perceptron
First, we load the data using the MNIST package:
using Flux, MNIST
+using Flux: accuracy
data = [(trainfeatures(i), onehot(trainlabel(i), 0:9)) for i = 1:60_000]
train = data[1:50_000]
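Here onehot(trainlabel(i), 0:9) turns a digit label into a length-10 indicator vector. Conceptually (this is a sketch, not Flux's implementation):
# e.g. the label 5 becomes [0,0,0,0,0,1,0,0,0,0]
myonehot(label, labels = 0:9) = Float64.(labels .== label)
myonehot(5)                   # 1.0 in the sixth position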
@@ -190,7 +191,7 @@ Otherwise, the format of the data is simple enough, it's just a list of tupl
Now we define our model, which will simply be a function from one to the other.
-m = Chain(
+m = @Chain(
Input(784),
Affine(128), relu,
Affine( 64), relu,
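The hunk only shows the first layers of the chain, but the structure is just alternating affine (dense) maps and relu nonlinearities. Written out by hand in plain Julia, the part shown above amounts to something like this (illustrative only):
relu(x)      = max.(0, x)
affine(W, b) = x -> W*x .+ b

layer1 = affine(randn(128, 784), zeros(128))
layer2 = affine(randn(64, 128),  zeros(64))

hidden = relu(layer2(relu(layer1(rand(784)))))
size(hidden)   # (64,)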
@@ -200,7 +201,7 @@ model = mxnet(m) # Convert to MXNet
We can try this out on our data already:
-julia> model(data[1][1])
+julia> model(tobatch(data[1][1]))
10-element Array{Float64,1}:
0.10614
0.0850447
@@ -209,7 +210,8 @@ We can try this out on our data already:
The model gives a probability of about 0.1 to each class – which is a way of saying, "I have no idea". This isn't too surprising as we haven't shown it any data yet. This is easy to fix:
-Flux.train!(model, train, test, η = 1e-4)
+Flux.train!(model, train, η = 1e-3,
+ cb = [()->@show accuracy(m, test)])
The training step takes about 5 minutes (to make it faster we can do smarter things like batching). If you run this code in Juno, you'll see a progress meter, which you can hover over to see the remaining computation time.
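The accuracy(m, test) callback just measures how often the most probable class matches the true label. Roughly, in current Julia syntax (not necessarily Flux's own definition):
using Statistics: mean    # `mean` lives in Base on Julia 0.6

# Fraction of (prediction, one-hot target) pairs whose top class agrees.
myaccuracy(preds, targets) =
    mean(argmax(p) == argmax(t) for (p, t) in zip(preds, targets))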
@@ -231,7 +233,7 @@ Notice the class at 93%, suggesting our model is very confident about this image
julia> onecold(data[1][2], 0:9)
5
-julia> onecold(model(data[1][1]), 0:9)
+julia> onecold(model(tobatch(data[1][1])), 0:9)
5
Success!
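onecold is the inverse of onehot: it takes a probability (or indicator) vector and returns the label of the largest entry, which is why both the target and the prediction above decode to 5. A one-line sketch (myonecold is a stand-in name, not the library function):
myonecold(y, labels = 0:9) = labels[argmax(y)]   # largest entry at position 6 gives label 5
myonecold([0, 0, 0, 0, 0, 1, 0, 0, 0, 0])        # 5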
diff --git a/latest/index.html b/latest/index.html
index ea560455..709b7ce7 100644
--- a/latest/index.html
+++ b/latest/index.html
@@ -169,6 +169,16 @@ Flux aims to be an intuitive and powerful notation, close to the mathematics, th
So what's the catch? Flux is at an early "working prototype" stage; many things work but the API is still in a state of... well, it might change. If you're interested to find out what works, read on!
+
+Note: If you're using Julia v0.5 please see this version of the docs instead.
Pkg.add("MXNet") # or "TensorFlow"
Pkg.test("Flux") # Make sure everything installed properly
+
+Note: TensorFlow integration may not work properly on Julia v0.6 yet.