build based on 854a1e1
This commit is contained in:
parent
9614b7651f
commit
f05f0af66d
@@ -150,7 +150,7 @@ Backends
 </a>
 </li>
 </ul>
-<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/1c317eeefec910170cc72a4fe09ac54e187b3624/docs/src/apis/backends.md">
+<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/854a1e18865742c59b6db6c58e16fee6ef9ef8ce/docs/src/apis/backends.md">
 <span class="fa">
 
 </span>
@@ -185,7 +185,7 @@ This is easy to do. Just call either
 <code>tf</code>
 on a model to convert it to a model of that kind:
 </p>
-<pre><code class="language-julia">mxmodel = mxnet(model, (10, 1))
+<pre><code class="language-julia">mxmodel = mxnet(model)
 mxmodel(xs) #> [0.0650, 0.0655, ...]
 # or
 tfmodel = tf(model)
@@ -155,7 +155,7 @@ Batching
 </a>
 </li>
 </ul>
-<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/1c317eeefec910170cc72a4fe09ac54e187b3624/docs/src/apis/batching.md">
+<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/854a1e18865742c59b6db6c58e16fee6ef9ef8ce/docs/src/apis/batching.md">
 <span class="fa">
 
 </span>
@@ -139,7 +139,7 @@ Storing Models
 </a>
 </li>
 </ul>
-<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/1c317eeefec910170cc72a4fe09ac54e187b3624/docs/src/apis/storage.md">
+<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/854a1e18865742c59b6db6c58e16fee6ef9ef8ce/docs/src/apis/storage.md">
 <span class="fa">
 
 </span>
@@ -136,7 +136,7 @@ Contributing & Help
 </a>
 </li>
 </ul>
-<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/1c317eeefec910170cc72a4fe09ac54e187b3624/docs/src/contributing.md">
+<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/854a1e18865742c59b6db6c58e16fee6ef9ef8ce/docs/src/contributing.md">
 <span class="fa">
 
 </span>
@@ -139,7 +139,7 @@ Char RNN
 </a>
 </li>
 </ul>
-<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/1c317eeefec910170cc72a4fe09ac54e187b3624/docs/src/examples/char-rnn.md">
+<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/854a1e18865742c59b6db6c58e16fee6ef9ef8ce/docs/src/examples/char-rnn.md">
 <span class="fa">
 
 </span>
@@ -139,7 +139,7 @@ Logistic Regression
 </a>
 </li>
 </ul>
-<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/1c317eeefec910170cc72a4fe09ac54e187b3624/docs/src/examples/logreg.md">
+<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/854a1e18865742c59b6db6c58e16fee6ef9ef8ce/docs/src/examples/logreg.md">
 <span class="fa">
 
 </span>
@@ -196,7 +196,7 @@ Now we define our model, which will simply be a function from one to the other.
 Affine( 64), relu,
 Affine( 10), softmax)
 
-model = tf(model)</code></pre>
+model = tf(m)</code></pre>
 <p>
 We can try this out on our data already:
 </p>
@@ -147,7 +147,7 @@ Home
 </a>
 </li>
 </ul>
-<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/1c317eeefec910170cc72a4fe09ac54e187b3624/docs/src/index.md">
+<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/854a1e18865742c59b6db6c58e16fee6ef9ef8ce/docs/src/index.md">
 <span class="fa">
 
 </span>
@@ -218,9 +218,8 @@ Installation
 ... Charging Ion Capacitors ...
 </em>
 </p>
-<pre><code class="language-julia">Pkg.clone("https://github.com/MikeInnes/DataFlow.jl")
-Pkg.clone("https://github.com/MikeInnes/Flux.jl")
-using Flux</code></pre>
+<pre><code class="language-julia">Pkg.update()
+Pkg.add("Flux.jl")</code></pre>
 <p>
 You'll also need a backend to run real training, if you don't have one already. Choose from
 <a href="https://github.com/dmlc/MXNet.jl">
@@ -232,7 +231,8 @@ TensorFlow
 </a>
 (MXNet is the recommended option if you're not sure):
 </p>
-<pre><code class="language-julia">Pkg.add("MXNet") # or "TensorFlow"</code></pre>
+<pre><code class="language-julia">Pkg.add("MXNet") # or "TensorFlow"
+Pkg.test("Flux") # Make sure everything installed properly</code></pre>
 <footer>
 <hr/>
 <a class="next" href="models/basics.html">
@@ -136,7 +136,7 @@ Internals
 </a>
 </li>
 </ul>
-<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/1c317eeefec910170cc72a4fe09ac54e187b3624/docs/src/internals.md">
+<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/854a1e18865742c59b6db6c58e16fee6ef9ef8ce/docs/src/internals.md">
 <span class="fa">
 
 </span>
@@ -155,7 +155,7 @@ Model Building Basics
 </a>
 </li>
 </ul>
-<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/1c317eeefec910170cc72a4fe09ac54e187b3624/docs/src/models/basics.md">
+<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/854a1e18865742c59b6db6c58e16fee6ef9ef8ce/docs/src/models/basics.md">
 <span class="fa">
 
 </span>
@@ -229,15 +229,15 @@ softmax(affine2(x1)) # [0.125361, 0.246448, 0.21966, 0.124596, 0.283935]</code><
 <p>
 We just created two separate
 <code>Affine</code>
-layers, and each contains its own version of
+layers, and each contains its own (randomly initialised) version of
 <code>W</code>
 and
 <code>b</code>
 , leading to a different result when called with our data. It's easy to define templates like
 <code>Affine</code>
 ourselves (see
-<a href="@ref">
-The Template
+<a href="templates.html">
+templates
 </a>
 ), but Flux provides
 <code>Affine</code>
@@ -139,7 +139,7 @@ Debugging
 </a>
 </li>
 </ul>
-<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/1c317eeefec910170cc72a4fe09ac54e187b3624/docs/src/models/debugging.md">
+<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/854a1e18865742c59b6db6c58e16fee6ef9ef8ce/docs/src/models/debugging.md">
 <span class="fa">
 
 </span>
@@ -167,16 +167,18 @@ end
 
 model = TLP(Affine(10, 20), Affine(21, 15))
 
-mxmodel = mxnet(model, (10, 1))</code></pre>
+mxmodel = mxnet(model)
+
+mxmodel(rand(10))</code></pre>
 <p>
 Unfortunately, this model has a (fairly obvious) typo, which means that the code above won't run. Instead we get an error message:
 </p>
-<pre><code class="language-julia">InferShape Error in dot5: [20:37:39] src/operator/./matrix_op-inl.h:271:
-Check failed: (lshape[1]) == (rshape[0]) dot shape error: (15,21) X (20,1)
- in Flux.Affine at affine.jl:8
- in TLP at test.jl:6
- in mxnet(::TLP, ::Tuple{Int64,Int64}) at model.jl:40
- in mxnet(::TLP, ::Vararg{Any,N} where N) at backend.jl:20</code></pre>
+<pre><code class="language-julia">Error in operator dot2: [21:28:21] src/operator/tensor/./matrix_op-inl.h:460:
+Check failed: lshape[1] == rshape[0] (20 vs. 21) dot shape error: (1,20) X (21,15)
+Flux.Affine at affine.jl:8
+TLP at basic.jl:6
+(::Flux.MX.Model)(::Flux.Batch{Array{Float64,1},Array{Float64,2}}) at model.jl:105
+(::Flux.MX.Model)(::Array{Float64,1}) at model.jl:107</code></pre>
 <p>
 Most frameworks would only give the error message here – not so helpful if you have thousands of nodes in your computational graph. However, Flux is able to give good error reports
 <em>
@@ -139,7 +139,7 @@ Recurrence
 </a>
 </li>
 </ul>
-<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/1c317eeefec910170cc72a4fe09ac54e187b3624/docs/src/models/recurrent.md">
+<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/854a1e18865742c59b6db6c58e16fee6ef9ef8ce/docs/src/models/recurrent.md">
 <span class="fa">
 
 </span>
@@ -155,7 +155,7 @@ Model Templates
 </a>
 </li>
 </ul>
-<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/1c317eeefec910170cc72a4fe09ac54e187b3624/docs/src/models/templates.md">
+<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/854a1e18865742c59b6db6c58e16fee6ef9ef8ce/docs/src/models/templates.md">
 <span class="fa">
 
 </span>
@@ -29,7 +29,7 @@ var documenterSearchIndex = {"docs": [
 "page": "Home",
 "title": "Installation",
 "category": "section",
-"text": "... Charging Ion Capacitors ...Pkg.clone(\"https://github.com/MikeInnes/DataFlow.jl\")\nPkg.clone(\"https://github.com/MikeInnes/Flux.jl\")\nusing FluxYou'll also need a backend to run real training, if you don't have one already. Choose from MXNet or TensorFlow (MXNet is the recommended option if you're not sure):Pkg.add(\"MXNet\") # or \"TensorFlow\""
+"text": "... Charging Ion Capacitors ...Pkg.update()\nPkg.add(\"Flux.jl\")You'll also need a backend to run real training, if you don't have one already. Choose from MXNet or TensorFlow (MXNet is the recommended option if you're not sure):Pkg.add(\"MXNet\") # or \"TensorFlow\"\nPkg.test(\"Flux\") # Make sure everything installed properly"
 },
 
 {
@@ -53,7 +53,7 @@ var documenterSearchIndex = {"docs": [
 "page": "Model Building Basics",
 "title": "The Model",
 "category": "section",
-"text": "... Initialising Photon Beams ...The core concept in Flux is the model. A model (or \"layer\") is simply a function with parameters. For example, in plain Julia code, we could define the following function to represent a logistic regression (or simple neural network):W = randn(3,5)\nb = randn(3)\naffine(x) = W * x + b\n\nx1 = rand(5) # [0.581466,0.606507,0.981732,0.488618,0.415414]\ny1 = softmax(affine(x1)) # [0.32676,0.0974173,0.575823]affine is simply a function which takes some vector x1 and outputs a new one y1. For example, x1 could be data from an image and y1 could be predictions about the content of that image. However, affine isn't static. It has parameters W and b, and if we tweak those parameters we'll tweak the result – hopefully to make the predictions more accurate.This is all well and good, but we usually want to have more than one affine layer in our network; writing out the above definition to create new sets of parameters every time would quickly become tedious. For that reason, we want to use a template which creates these functions for us:affine1 = Affine(5, 5)\naffine2 = Affine(5, 5)\n\nsoftmax(affine1(x1)) # [0.167952, 0.186325, 0.176683, 0.238571, 0.23047]\nsoftmax(affine2(x1)) # [0.125361, 0.246448, 0.21966, 0.124596, 0.283935]We just created two separate Affine layers, and each contains its own version of W and b, leading to a different result when called with our data. It's easy to define templates like Affine ourselves (see The Template), but Flux provides Affine out of the box, so we'll use that for now."
+"text": "... Initialising Photon Beams ...The core concept in Flux is the model. A model (or \"layer\") is simply a function with parameters. For example, in plain Julia code, we could define the following function to represent a logistic regression (or simple neural network):W = randn(3,5)\nb = randn(3)\naffine(x) = W * x + b\n\nx1 = rand(5) # [0.581466,0.606507,0.981732,0.488618,0.415414]\ny1 = softmax(affine(x1)) # [0.32676,0.0974173,0.575823]affine is simply a function which takes some vector x1 and outputs a new one y1. For example, x1 could be data from an image and y1 could be predictions about the content of that image. However, affine isn't static. It has parameters W and b, and if we tweak those parameters we'll tweak the result – hopefully to make the predictions more accurate.This is all well and good, but we usually want to have more than one affine layer in our network; writing out the above definition to create new sets of parameters every time would quickly become tedious. For that reason, we want to use a template which creates these functions for us:affine1 = Affine(5, 5)\naffine2 = Affine(5, 5)\n\nsoftmax(affine1(x1)) # [0.167952, 0.186325, 0.176683, 0.238571, 0.23047]\nsoftmax(affine2(x1)) # [0.125361, 0.246448, 0.21966, 0.124596, 0.283935]We just created two separate Affine layers, and each contains its own (randomly initialised) version of W and b, leading to a different result when called with our data. It's easy to define templates like Affine ourselves (see templates), but Flux provides Affine out of the box, so we'll use that for now."
 },
 
 {
@@ -141,7 +141,7 @@ var documenterSearchIndex = {"docs": [
 "page": "Debugging",
 "title": "Debugging Models",
 "category": "section",
-"text": "Let's take our two-layer perceptron as an example again, running on MXNet:@net type TLP\n first\n second\n function (x)\n l1 = σ(first(x))\n l2 = softmax(second(l1))\n end\nend\n\nmodel = TLP(Affine(10, 20), Affine(21, 15))\n\nmxmodel = mxnet(model, (10, 1))Unfortunately, this model has a (fairly obvious) typo, which means that the code above won't run. Instead we get an error message:InferShape Error in dot5: [20:37:39] src/operator/./matrix_op-inl.h:271:\nCheck failed: (lshape[1]) == (rshape[0]) dot shape error: (15,21) X (20,1)\n in Flux.Affine at affine.jl:8\n in TLP at test.jl:6\n in mxnet(::TLP, ::Tuple{Int64,Int64}) at model.jl:40\n in mxnet(::TLP, ::Vararg{Any,N} where N) at backend.jl:20Most frameworks would only give the error message here – not so helpful if you have thousands of nodes in your computational graph. However, Flux is able to give good error reports even when no Julia code has been run, e.g. when running on a backend like MXNet. This enables us to pinpoint the source of the error very quickly even in a large model.In this case, we can immediately see that the error occurred within an Affine layer. There are two such layers, but this one was called from the second line of TLP, so it must be the second Affine layer we defined. The layer expected an input of length 21 but got 20 instead.Of course, often a stack trace isn't enough to figure out the source of an error. Another option is to simply step through the execution of the model using Gallium. While handy, however, stepping isn't always the best way to get a \"bird's eye view\" of the code. For that, Flux provides a macro called @shapes:julia> @shapes model(rand(5,10))\n\n# /Users/mike/test.jl, line 18:\ngull = σ(Affine(10, 20)(Input()[1]::(5,10))::(5,20))::(5,20)\n# /Users/mike/.julia/v0.6/Flux/src/layers/affine.jl, line 8:\nlobster = gull * _::(21,15) + _::(1,15)\n# /Users/mike/test.jl, line 19:\nraven = softmax(lobster)This is a lot like Julia's own code_warntype; but instead of annotating expressions with types, we display their shapes. As a lowered form it has some quirks; input arguments are represented by Input()[N] and parameters by an underscore.This makes the problem fairly obvious. We tried to multiply the output of the first layer (5, 20) by a parameter (21, 15); the inner dimensions should have been equal.Notice that while the first Affine layer is displayed as-is, the second was inlined and we see a reference to where the W * x + b line was defined in Flux's source code. In this way Flux makes it easy to drill down into problem areas, without showing you the full graph of thousands of nodes at once.With the typo fixed, the output of @shapes looks as follows:# /Users/mike/test.jl, line 18:\nopossum = σ(Affine(10, 20)(Input()[1]::(5,10))::(5,20))::(5,20)\n# /Users/mike/test.jl, line 19:\nwren = softmax(Affine(20, 15)(opossum)::(5,15))::(5,15)"
+"text": "Let's take our two-layer perceptron as an example again, running on MXNet:@net type TLP\n first\n second\n function (x)\n l1 = σ(first(x))\n l2 = softmax(second(l1))\n end\nend\n\nmodel = TLP(Affine(10, 20), Affine(21, 15))\n\nmxmodel = mxnet(model)\n\nmxmodel(rand(10))Unfortunately, this model has a (fairly obvious) typo, which means that the code above won't run. Instead we get an error message:Error in operator dot2: [21:28:21] src/operator/tensor/./matrix_op-inl.h:460:\nCheck failed: lshape[1] == rshape[0] (20 vs. 21) dot shape error: (1,20) X (21,15)\nFlux.Affine at affine.jl:8\nTLP at basic.jl:6\n(::Flux.MX.Model)(::Flux.Batch{Array{Float64,1},Array{Float64,2}}) at model.jl:105\n(::Flux.MX.Model)(::Array{Float64,1}) at model.jl:107Most frameworks would only give the error message here – not so helpful if you have thousands of nodes in your computational graph. However, Flux is able to give good error reports even when no Julia code has been run, e.g. when running on a backend like MXNet. This enables us to pinpoint the source of the error very quickly even in a large model.In this case, we can immediately see that the error occurred within an Affine layer. There are two such layers, but this one was called from the second line of TLP, so it must be the second Affine layer we defined. The layer expected an input of length 21 but got 20 instead.Of course, often a stack trace isn't enough to figure out the source of an error. Another option is to simply step through the execution of the model using Gallium. While handy, however, stepping isn't always the best way to get a \"bird's eye view\" of the code. For that, Flux provides a macro called @shapes:julia> @shapes model(rand(5,10))\n\n# /Users/mike/test.jl, line 18:\ngull = σ(Affine(10, 20)(Input()[1]::(5,10))::(5,20))::(5,20)\n# /Users/mike/.julia/v0.6/Flux/src/layers/affine.jl, line 8:\nlobster = gull * _::(21,15) + _::(1,15)\n# /Users/mike/test.jl, line 19:\nraven = softmax(lobster)This is a lot like Julia's own code_warntype; but instead of annotating expressions with types, we display their shapes. As a lowered form it has some quirks; input arguments are represented by Input()[N] and parameters by an underscore.This makes the problem fairly obvious. We tried to multiply the output of the first layer (5, 20) by a parameter (21, 15); the inner dimensions should have been equal.Notice that while the first Affine layer is displayed as-is, the second was inlined and we see a reference to where the W * x + b line was defined in Flux's source code. In this way Flux makes it easy to drill down into problem areas, without showing you the full graph of thousands of nodes at once.With the typo fixed, the output of @shapes looks as follows:# /Users/mike/test.jl, line 18:\nopossum = σ(Affine(10, 20)(Input()[1]::(5,10))::(5,20))::(5,20)\n# /Users/mike/test.jl, line 19:\nwren = softmax(Affine(20, 15)(opossum)::(5,15))::(5,15)"
 },
 
 {
@@ -205,7 +205,7 @@ var documenterSearchIndex = {"docs": [
 "page": "Backends",
 "title": "Basic Usage",
 "category": "section",
-"text": "model = Chain(Affine(10, 20), σ, Affine(20, 15), softmax)\nxs = rand(10)Currently, Flux's pure-Julia backend has no optimisations. This means that callingmodel(rand(10)) #> [0.0650, 0.0655, ...]directly won't have great performance. In order to run a computationally intensive training process, we rely on a backend like MXNet or TensorFlow.This is easy to do. Just call either mxnet or tf on a model to convert it to a model of that kind:mxmodel = mxnet(model, (10, 1))\nmxmodel(xs) #> [0.0650, 0.0655, ...]\n# or\ntfmodel = tf(model)\ntfmodel(xs) #> [0.0650, 0.0655, ...]These new models look and feel exactly like every other model in Flux, including returning the same result when you call them, and can be trained as usual using Flux.train!(). The difference is that the computation is being carried out by a backend, which will usually give a large speedup."
+"text": "model = Chain(Affine(10, 20), σ, Affine(20, 15), softmax)\nxs = rand(10)Currently, Flux's pure-Julia backend has no optimisations. This means that callingmodel(rand(10)) #> [0.0650, 0.0655, ...]directly won't have great performance. In order to run a computationally intensive training process, we rely on a backend like MXNet or TensorFlow.This is easy to do. Just call either mxnet or tf on a model to convert it to a model of that kind:mxmodel = mxnet(model)\nmxmodel(xs) #> [0.0650, 0.0655, ...]\n# or\ntfmodel = tf(model)\ntfmodel(xs) #> [0.0650, 0.0655, ...]These new models look and feel exactly like every other model in Flux, including returning the same result when you call them, and can be trained as usual using Flux.train!(). The difference is that the computation is being carried out by a backend, which will usually give a large speedup."
 },
 
 {
@@ -245,7 +245,7 @@ var documenterSearchIndex = {"docs": [
 "page": "Logistic Regression",
 "title": "Logistic Regression with MNIST",
 "category": "section",
-"text": "This walkthrough example will take you through writing a multi-layer perceptron that classifies MNIST digits with high accuracy.First, we load the data using the MNIST package:using Flux, MNIST\n\ndata = [(trainfeatures(i), onehot(trainlabel(i), 0:9)) for i = 1:60_000]\ntrain = data[1:50_000]\ntest = data[50_001:60_000]The only Flux-specific function here is onehot, which takes a class label and turns it into a one-hot-encoded vector that we can use for training. For example:julia> onehot(:b, [:a, :b, :c])\n3-element Array{Int64,1}:\n 0\n 1\n 0Otherwise, the format of the data is simple enough, it's just a list of tuples from input to output. For example:julia> data[1]\n([0.0,0.0,0.0, … 0.0,0.0,0.0],[0,0,0,0,0,1,0,0,0,0])data[1][1] is a 28*28 == 784 length vector (mostly zeros due to the black background) and data[1][2] is its classification.Now we define our model, which will simply be a function from one to the other.m = Chain(\n Input(784),\n Affine(128), relu,\n Affine( 64), relu,\n Affine( 10), softmax)\n\nmodel = tf(model)We can try this out on our data already:julia> model(data[1][1])\n10-element Array{Float64,1}:\n 0.10614 \n 0.0850447\n 0.101474\n ...The model gives a probability of about 0.1 to each class – which is a way of saying, \"I have no idea\". This isn't too surprising as we haven't shown it any data yet. This is easy to fix:Flux.train!(model, train, test, η = 1e-4)The training step takes about 5 minutes (to make it faster we can do smarter things like batching). If you run this code in Juno, you'll see a progress meter, which you can hover over to see the remaining computation time.Towards the end of the training process, Flux will have reported that the accuracy of the model is now about 90%. We can try it on our data again:10-element Array{Float32,1}:\n ...\n 5.11423f-7\n 0.9354 \n 3.1033f-5 \n 0.000127077\n ...Notice the class at 93%, suggesting our model is very confident about this image. We can use onecold to compare the true and predicted classes:julia> onecold(data[1][2], 0:9)\n5\n\njulia> onecold(model(data[1][1]), 0:9)\n5Success!"
+"text": "This walkthrough example will take you through writing a multi-layer perceptron that classifies MNIST digits with high accuracy.First, we load the data using the MNIST package:using Flux, MNIST\n\ndata = [(trainfeatures(i), onehot(trainlabel(i), 0:9)) for i = 1:60_000]\ntrain = data[1:50_000]\ntest = data[50_001:60_000]The only Flux-specific function here is onehot, which takes a class label and turns it into a one-hot-encoded vector that we can use for training. For example:julia> onehot(:b, [:a, :b, :c])\n3-element Array{Int64,1}:\n 0\n 1\n 0Otherwise, the format of the data is simple enough, it's just a list of tuples from input to output. For example:julia> data[1]\n([0.0,0.0,0.0, … 0.0,0.0,0.0],[0,0,0,0,0,1,0,0,0,0])data[1][1] is a 28*28 == 784 length vector (mostly zeros due to the black background) and data[1][2] is its classification.Now we define our model, which will simply be a function from one to the other.m = Chain(\n Input(784),\n Affine(128), relu,\n Affine( 64), relu,\n Affine( 10), softmax)\n\nmodel = tf(m)We can try this out on our data already:julia> model(data[1][1])\n10-element Array{Float64,1}:\n 0.10614 \n 0.0850447\n 0.101474\n ...The model gives a probability of about 0.1 to each class – which is a way of saying, \"I have no idea\". This isn't too surprising as we haven't shown it any data yet. This is easy to fix:Flux.train!(model, train, test, η = 1e-4)The training step takes about 5 minutes (to make it faster we can do smarter things like batching). If you run this code in Juno, you'll see a progress meter, which you can hover over to see the remaining computation time.Towards the end of the training process, Flux will have reported that the accuracy of the model is now about 90%. We can try it on our data again:10-element Array{Float32,1}:\n ...\n 5.11423f-7\n 0.9354 \n 3.1033f-5 \n 0.000127077\n ...Notice the class at 93%, suggesting our model is very confident about this image. We can use onecold to compare the true and predicted classes:julia> onecold(data[1][2], 0:9)\n5\n\njulia> onecold(model(data[1][1]), 0:9)\n5Success!"
 },
 
 {
|
||||
<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/854a1e18865742c59b6db6c58e16fee6ef9ef8ce/docs/src/models/templates.md">
|
||||
<span class="fa">
|
||||
|
||||
</span>
|
||||
|
|
|
@ -29,7 +29,7 @@ var documenterSearchIndex = {"docs": [
|
|||
"page": "Home",
|
||||
"title": "Installation",
|
||||
"category": "section",
|
||||
"text": "... Charging Ion Capacitors ...Pkg.clone(\"https://github.com/MikeInnes/DataFlow.jl\")\nPkg.clone(\"https://github.com/MikeInnes/Flux.jl\")\nusing FluxYou'll also need a backend to run real training, if you don't have one already. Choose from MXNet or TensorFlow (MXNet is the recommended option if you're not sure):Pkg.add(\"MXNet\") # or \"TensorFlow\""
|
||||
"text": "... Charging Ion Capacitors ...Pkg.update()\nPkg.add(\"Flux.jl\")You'll also need a backend to run real training, if you don't have one already. Choose from MXNet or TensorFlow (MXNet is the recommended option if you're not sure):Pkg.add(\"MXNet\") # or \"TensorFlow\"\nPkg.test(\"Flux\") # Make sure everything installed properly"
|
||||
},
|
||||
|
||||
{
|
||||
|
@ -53,7 +53,7 @@ var documenterSearchIndex = {"docs": [
|
|||
"page": "Model Building Basics",
|
||||
"title": "The Model",
|
||||
"category": "section",
|
||||
"text": "... Initialising Photon Beams ...The core concept in Flux is the model. A model (or \"layer\") is simply a function with parameters. For example, in plain Julia code, we could define the following function to represent a logistic regression (or simple neural network):W = randn(3,5)\nb = randn(3)\naffine(x) = W * x + b\n\nx1 = rand(5) # [0.581466,0.606507,0.981732,0.488618,0.415414]\ny1 = softmax(affine(x1)) # [0.32676,0.0974173,0.575823]affine is simply a function which takes some vector x1 and outputs a new one y1. For example, x1 could be data from an image and y1 could be predictions about the content of that image. However, affine isn't static. It has parameters W and b, and if we tweak those parameters we'll tweak the result – hopefully to make the predictions more accurate.This is all well and good, but we usually want to have more than one affine layer in our network; writing out the above definition to create new sets of parameters every time would quickly become tedious. For that reason, we want to use a template which creates these functions for us:affine1 = Affine(5, 5)\naffine2 = Affine(5, 5)\n\nsoftmax(affine1(x1)) # [0.167952, 0.186325, 0.176683, 0.238571, 0.23047]\nsoftmax(affine2(x1)) # [0.125361, 0.246448, 0.21966, 0.124596, 0.283935]We just created two separate Affine layers, and each contains its own version of W and b, leading to a different result when called with our data. It's easy to define templates like Affine ourselves (see The Template), but Flux provides Affine out of the box, so we'll use that for now."
|
||||
"text": "... Initialising Photon Beams ...The core concept in Flux is the model. A model (or \"layer\") is simply a function with parameters. For example, in plain Julia code, we could define the following function to represent a logistic regression (or simple neural network):W = randn(3,5)\nb = randn(3)\naffine(x) = W * x + b\n\nx1 = rand(5) # [0.581466,0.606507,0.981732,0.488618,0.415414]\ny1 = softmax(affine(x1)) # [0.32676,0.0974173,0.575823]affine is simply a function which takes some vector x1 and outputs a new one y1. For example, x1 could be data from an image and y1 could be predictions about the content of that image. However, affine isn't static. It has parameters W and b, and if we tweak those parameters we'll tweak the result – hopefully to make the predictions more accurate.This is all well and good, but we usually want to have more than one affine layer in our network; writing out the above definition to create new sets of parameters every time would quickly become tedious. For that reason, we want to use a template which creates these functions for us:affine1 = Affine(5, 5)\naffine2 = Affine(5, 5)\n\nsoftmax(affine1(x1)) # [0.167952, 0.186325, 0.176683, 0.238571, 0.23047]\nsoftmax(affine2(x1)) # [0.125361, 0.246448, 0.21966, 0.124596, 0.283935]We just created two separate Affine layers, and each contains its own (randomly initialised) version of W and b, leading to a different result when called with our data. It's easy to define templates like Affine ourselves (see templates), but Flux provides Affine out of the box, so we'll use that for now."
|
||||
},
|
||||
|
||||
{
|
||||
|
@ -141,7 +141,7 @@ var documenterSearchIndex = {"docs": [
|
|||
"page": "Debugging",
|
||||
"title": "Debugging Models",
|
||||
"category": "section",
|
||||
"text": "Let's take our two-layer perceptron as an example again, running on MXNet:@net type TLP\n first\n second\n function (x)\n l1 = σ(first(x))\n l2 = softmax(second(l1))\n end\nend\n\nmodel = TLP(Affine(10, 20), Affine(21, 15))\n\nmxmodel = mxnet(model, (10, 1))Unfortunately, this model has a (fairly obvious) typo, which means that the code above won't run. Instead we get an error message:InferShape Error in dot5: [20:37:39] src/operator/./matrix_op-inl.h:271:\nCheck failed: (lshape[1]) == (rshape[0]) dot shape error: (15,21) X (20,1)\n in Flux.Affine at affine.jl:8\n in TLP at test.jl:6\n in mxnet(::TLP, ::Tuple{Int64,Int64}) at model.jl:40\n in mxnet(::TLP, ::Vararg{Any,N} where N) at backend.jl:20Most frameworks would only give the error message here – not so helpful if you have thousands of nodes in your computational graph. However, Flux is able to give good error reports even when no Julia code has been run, e.g. when running on a backend like MXNet. This enables us to pinpoint the source of the error very quickly even in a large model.In this case, we can immediately see that the error occurred within an Affine layer. There are two such layers, but this one was called from the second line of TLP, so it must be the second Affine layer we defined. The layer expected an input of length 21 but got 20 instead.Of course, often a stack trace isn't enough to figure out the source of an error. Another option is to simply step through the execution of the model using Gallium. While handy, however, stepping isn't always the best way to get a \"bird's eye view\" of the code. 
For that, Flux provides a macro called @shapes:julia> @shapes model(rand(5,10))\n\n# /Users/mike/test.jl, line 18:\ngull = σ(Affine(10, 20)(Input()[1]::(5,10))::(5,20))::(5,20)\n# /Users/mike/.julia/v0.6/Flux/src/layers/affine.jl, line 8:\nlobster = gull * _::(21,15) + _::(1,15)\n# /Users/mike/test.jl, line 19:\nraven = softmax(lobster)This is a lot like Julia's own code_warntype; but instead of annotating expressions with types, we display their shapes. As a lowered form it has some quirks; input arguments are represented by Input()[N] and parameters by an underscore.This makes the problem fairly obvious. We tried to multiply the output of the first layer (5, 20) by a parameter (21, 15); the inner dimensions should have been equal.Notice that while the first Affine layer is displayed as-is, the second was inlined and we see a reference to where the W * x + b line was defined in Flux's source code. In this way Flux makes it easy to drill down into problem areas, without showing you the full graph of thousands of nodes at once.With the typo fixed, the output of @shapes looks as follows:# /Users/mike/test.jl, line 18:\nopossum = σ(Affine(10, 20)(Input()[1]::(5,10))::(5,20))::(5,20)\n# /Users/mike/test.jl, line 19:\nwren = softmax(Affine(20, 15)(opossum)::(5,15))::(5,15)"
|
||||
"text": "Let's take our two-layer perceptron as an example again, running on MXNet:@net type TLP\n first\n second\n function (x)\n l1 = σ(first(x))\n l2 = softmax(second(l1))\n end\nend\n\nmodel = TLP(Affine(10, 20), Affine(21, 15))\n\nmxmodel = mxnet(model)\n\nmxmodel(rand(10))Unfortunately, this model has a (fairly obvious) typo, which means that the code above won't run. Instead we get an error message:Error in operator dot2: [21:28:21] src/operator/tensor/./matrix_op-inl.h:460:\nCheck failed: lshape[1] == rshape[0] (20 vs. 21) dot shape error: (1,20) X (21,15)\nFlux.Affine at affine.jl:8\nTLP at basic.jl:6\n(::Flux.MX.Model)(::Flux.Batch{Array{Float64,1},Array{Float64,2}}) at model.jl:105\n(::Flux.MX.Model)(::Array{Float64,1}) at model.jl:107Most frameworks would only give the error message here – not so helpful if you have thousands of nodes in your computational graph. However, Flux is able to give good error reports even when no Julia code has been run, e.g. when running on a backend like MXNet. This enables us to pinpoint the source of the error very quickly even in a large model.In this case, we can immediately see that the error occurred within an Affine layer. There are two such layers, but this one was called from the second line of TLP, so it must be the second Affine layer we defined. The layer expected an input of length 21 but got 20 instead.Of course, often a stack trace isn't enough to figure out the source of an error. Another option is to simply step through the execution of the model using Gallium. While handy, however, stepping isn't always the best way to get a \"bird's eye view\" of the code. 
For that, Flux provides a macro called @shapes:julia> @shapes model(rand(5,10))\n\n# /Users/mike/test.jl, line 18:\ngull = σ(Affine(10, 20)(Input()[1]::(5,10))::(5,20))::(5,20)\n# /Users/mike/.julia/v0.6/Flux/src/layers/affine.jl, line 8:\nlobster = gull * _::(21,15) + _::(1,15)\n# /Users/mike/test.jl, line 19:\nraven = softmax(lobster)This is a lot like Julia's own code_warntype; but instead of annotating expressions with types, we display their shapes. As a lowered form it has some quirks; input arguments are represented by Input()[N] and parameters by an underscore.This makes the problem fairly obvious. We tried to multiply the output of the first layer (5, 20) by a parameter (21, 15); the inner dimensions should have been equal.Notice that while the first Affine layer is displayed as-is, the second was inlined and we see a reference to where the W * x + b line was defined in Flux's source code. In this way Flux makes it easy to drill down into problem areas, without showing you the full graph of thousands of nodes at once.With the typo fixed, the output of @shapes looks as follows:# /Users/mike/test.jl, line 18:\nopossum = σ(Affine(10, 20)(Input()[1]::(5,10))::(5,20))::(5,20)\n# /Users/mike/test.jl, line 19:\nwren = softmax(Affine(20, 15)(opossum)::(5,15))::(5,15)"
|
||||
},
|
||||
|
||||
{
|
||||
|
@ -205,7 +205,7 @@ var documenterSearchIndex = {"docs": [
|
|||
"page": "Backends",
|
||||
"title": "Basic Usage",
|
||||
"category": "section",
|
||||
"text": "model = Chain(Affine(10, 20), σ, Affine(20, 15), softmax)\nxs = rand(10)Currently, Flux's pure-Julia backend has no optimisations. This means that callingmodel(rand(10)) #> [0.0650, 0.0655, ...]directly won't have great performance. In order to run a computationally intensive training process, we rely on a backend like MXNet or TensorFlow.This is easy to do. Just call either mxnet or tf on a model to convert it to a model of that kind:mxmodel = mxnet(model, (10, 1))\nmxmodel(xs) #> [0.0650, 0.0655, ...]\n# or\ntfmodel = tf(model)\ntfmodel(xs) #> [0.0650, 0.0655, ...]These new models look and feel exactly like every other model in Flux, including returning the same result when you call them, and can be trained as usual using Flux.train!(). The difference is that the computation is being carried out by a backend, which will usually give a large speedup."
|
||||
"text": "model = Chain(Affine(10, 20), σ, Affine(20, 15), softmax)\nxs = rand(10)Currently, Flux's pure-Julia backend has no optimisations. This means that callingmodel(rand(10)) #> [0.0650, 0.0655, ...]directly won't have great performance. In order to run a computationally intensive training process, we rely on a backend like MXNet or TensorFlow.This is easy to do. Just call either mxnet or tf on a model to convert it to a model of that kind:mxmodel = mxnet(model)\nmxmodel(xs) #> [0.0650, 0.0655, ...]\n# or\ntfmodel = tf(model)\ntfmodel(xs) #> [0.0650, 0.0655, ...]These new models look and feel exactly like every other model in Flux, including returning the same result when you call them, and can be trained as usual using Flux.train!(). The difference is that the computation is being carried out by a backend, which will usually give a large speedup."
|
||||
},
|
||||
|
||||
{
|
||||
|
@ -245,7 +245,7 @@ var documenterSearchIndex = {"docs": [
|
|||
"page": "Logistic Regression",
|
||||
"title": "Logistic Regression with MNIST",
|
||||
"category": "section",
|
||||
"text": "This walkthrough example will take you through writing a multi-layer perceptron that classifies MNIST digits with high accuracy.First, we load the data using the MNIST package:using Flux, MNIST\n\ndata = [(trainfeatures(i), onehot(trainlabel(i), 0:9)) for i = 1:60_000]\ntrain = data[1:50_000]\ntest = data[50_001:60_000]The only Flux-specific function here is onehot, which takes a class label and turns it into a one-hot-encoded vector that we can use for training. For example:julia> onehot(:b, [:a, :b, :c])\n3-element Array{Int64,1}:\n 0\n 1\n 0Otherwise, the format of the data is simple enough, it's just a list of tuples from input to output. For example:julia> data[1]\n([0.0,0.0,0.0, … 0.0,0.0,0.0],[0,0,0,0,0,1,0,0,0,0])data[1][1] is a 28*28 == 784 length vector (mostly zeros due to the black background) and data[1][2] is its classification.Now we define our model, which will simply be a function from one to the other.m = Chain(\n Input(784),\n Affine(128), relu,\n Affine( 64), relu,\n Affine( 10), softmax)\n\nmodel = tf(model)We can try this out on our data already:julia> model(data[1][1])\n10-element Array{Float64,1}:\n 0.10614 \n 0.0850447\n 0.101474\n ...The model gives a probability of about 0.1 to each class – which is a way of saying, \"I have no idea\". This isn't too surprising as we haven't shown it any data yet. This is easy to fix:Flux.train!(model, train, test, η = 1e-4)The training step takes about 5 minutes (to make it faster we can do smarter things like batching). If you run this code in Juno, you'll see a progress meter, which you can hover over to see the remaining computation time.Towards the end of the training process, Flux will have reported that the accuracy of the model is now about 90%. We can try it on our data again:10-element Array{Float32,1}:\n ...\n 5.11423f-7\n 0.9354 \n 3.1033f-5 \n 0.000127077\n ...Notice the class at 93%, suggesting our model is very confident about this image. 
We can use onecold to compare the true and predicted classes:julia> onecold(data[1][2], 0:9)\n5\n\njulia> onecold(model(data[1][1]), 0:9)\n5Success!"
|
||||
"text": "This walkthrough example will take you through writing a multi-layer perceptron that classifies MNIST digits with high accuracy.First, we load the data using the MNIST package:using Flux, MNIST\n\ndata = [(trainfeatures(i), onehot(trainlabel(i), 0:9)) for i = 1:60_000]\ntrain = data[1:50_000]\ntest = data[50_001:60_000]The only Flux-specific function here is onehot, which takes a class label and turns it into a one-hot-encoded vector that we can use for training. For example:julia> onehot(:b, [:a, :b, :c])\n3-element Array{Int64,1}:\n 0\n 1\n 0Otherwise, the format of the data is simple enough, it's just a list of tuples from input to output. For example:julia> data[1]\n([0.0,0.0,0.0, … 0.0,0.0,0.0],[0,0,0,0,0,1,0,0,0,0])data[1][1] is a 28*28 == 784 length vector (mostly zeros due to the black background) and data[1][2] is its classification.Now we define our model, which will simply be a function from one to the other.m = Chain(\n Input(784),\n Affine(128), relu,\n Affine( 64), relu,\n Affine( 10), softmax)\n\nmodel = tf(m)We can try this out on our data already:julia> model(data[1][1])\n10-element Array{Float64,1}:\n 0.10614 \n 0.0850447\n 0.101474\n ...The model gives a probability of about 0.1 to each class – which is a way of saying, \"I have no idea\". This isn't too surprising as we haven't shown it any data yet. This is easy to fix:Flux.train!(model, train, test, η = 1e-4)The training step takes about 5 minutes (to make it faster we can do smarter things like batching). If you run this code in Juno, you'll see a progress meter, which you can hover over to see the remaining computation time.Towards the end of the training process, Flux will have reported that the accuracy of the model is now about 90%. We can try it on our data again:10-element Array{Float32,1}:\n ...\n 5.11423f-7\n 0.9354 \n 3.1033f-5 \n 0.000127077\n ...Notice the class at 93%, suggesting our model is very confident about this image. 
We can use onecold to compare the true and predicted classes:julia> onecold(data[1][2], 0:9)\n5\n\njulia> onecold(model(data[1][1]), 0:9)\n5Success!"
|
||||
},
|
||||
|
||||
{
|
||||
|
|
|
@ -0,0 +1,253 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="UTF-8"/>
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
|
||||
<title>
|
||||
Backends · Flux
|
||||
</title>
|
||||
<script>
|
||||
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
|
||||
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
|
||||
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
|
||||
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
|
||||
|
||||
ga('create', 'UA-36890222-9', 'auto');
|
||||
ga('send', 'pageview');
|
||||
|
||||
</script>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.5.0/styles/default.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://fonts.googleapis.com/css?family=Lato|Ubuntu+Mono" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="../assets/documenter.css" rel="stylesheet" type="text/css"/>
|
||||
<script>
|
||||
documenterBaseURL=".."
|
||||
</script>
|
||||
<script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../assets/documenter.js"></script>
|
||||
<script src="../../versions.js"></script>
|
||||
<link href="../../flux.css" rel="stylesheet" type="text/css"/>
|
||||
</head>
|
||||
<body>
|
||||
<nav class="toc">
|
||||
<h1>
|
||||
Flux
|
||||
</h1>
|
||||
<form class="search" action="../search.html">
|
||||
<select id="version-selector" onChange="window.location.href=this.value">
|
||||
<option value="#" selected="selected" disabled="disabled">
|
||||
Version
|
||||
</option>
|
||||
</select>
|
||||
<input id="search-query" name="q" type="text" placeholder="Search docs"/>
|
||||
</form>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="../index.html">
|
||||
Home
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
Building Models
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="../models/basics.html">
|
||||
Model Building Basics
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../models/templates.html">
|
||||
Model Templates
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../models/recurrent.html">
|
||||
Recurrence
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../models/debugging.html">
|
||||
Debugging
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
Other APIs
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="batching.html">
|
||||
Batching
|
||||
</a>
|
||||
</li>
|
||||
<li class="current">
|
||||
<a class="toctext" href="backends.html">
|
||||
Backends
|
||||
</a>
|
||||
<ul class="internal">
|
||||
<li>
|
||||
<a class="toctext" href="#Basic-Usage-1">
|
||||
Basic Usage
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="#Native-Integration-1">
|
||||
Native Integration
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="storage.html">
|
||||
Storing Models
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
In Action
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="../examples/logreg.html">
|
||||
Logistic Regression
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../examples/char-rnn.html">
|
||||
Char RNN
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../contributing.html">
|
||||
Contributing & Help
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../internals.html">
|
||||
Internals
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</nav>
|
||||
<article id="docs">
|
||||
<header>
|
||||
<nav>
|
||||
<ul>
|
||||
<li>
|
||||
Other APIs
|
||||
</li>
|
||||
<li>
|
||||
<a href="backends.html">
|
||||
Backends
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/854a1e18865742c59b6db6c58e16fee6ef9ef8ce/docs/src/apis/backends.md">
|
||||
<span class="fa">
|
||||
|
||||
</span>
|
||||
Edit on GitHub
|
||||
</a>
|
||||
</nav>
|
||||
<hr/>
|
||||
</header>
|
||||
<h1>
|
||||
<a class="nav-anchor" id="Backends-1" href="#Backends-1">
|
||||
Backends
|
||||
</a>
|
||||
</h1>
|
||||
<h2>
|
||||
<a class="nav-anchor" id="Basic-Usage-1" href="#Basic-Usage-1">
|
||||
Basic Usage
|
||||
</a>
|
||||
</h2>
|
||||
<pre><code class="language-julia">model = Chain(Affine(10, 20), σ, Affine(20, 15), softmax)
|
||||
xs = rand(10)</code></pre>
|
||||
<p>
|
||||
Currently, Flux's pure-Julia backend has no optimisations. This means that calling
|
||||
</p>
|
||||
<pre><code class="language-julia">model(rand(10)) #> [0.0650, 0.0655, ...]</code></pre>
|
||||
<p>
|
||||
directly won't have great performance. In order to run a computationally intensive training process, we rely on a backend like MXNet or TensorFlow.
|
||||
</p>
|
||||
<p>
|
||||
This is easy to do. Just call either
|
||||
<code>mxnet</code>
|
||||
or
|
||||
<code>tf</code>
|
||||
on a model to convert it to a model of that kind:
|
||||
</p>
|
||||
<pre><code class="language-julia">mxmodel = mxnet(model)
|
||||
mxmodel(xs) #> [0.0650, 0.0655, ...]
|
||||
# or
|
||||
tfmodel = tf(model)
|
||||
tfmodel(xs) #> [0.0650, 0.0655, ...]</code></pre>
|
||||
<p>
|
||||
These new models look and feel exactly like every other model in Flux, including returning the same result when you call them, and can be trained as usual using
|
||||
<code>Flux.train!()</code>
|
||||
. The difference is that the computation is being carried out by a backend, which will usually give a large speedup.
|
||||
</p>
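|
||||
<p>
|
||||
For example, training a converted model uses the same call as the pure-Julia one would. (A sketch: <code>train</code> stands in for a collection of input/output pairs, as in the logistic regression example.)
|
||||
</p>
|
||||
<pre><code class="language-julia">mxmodel = mxnet(model)
|
||||
Flux.train!(mxmodel, train, η = 1e-4) # computation runs on the MXNet backend</code></pre>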
|
||||
<h2>
|
||||
<a class="nav-anchor" id="Native-Integration-1" href="#Native-Integration-1">
|
||||
Native Integration
|
||||
</a>
|
||||
</h2>
|
||||
<p>
|
||||
Flux aims to provide high-level APIs that work well across backends, but in some cases you may want to take advantage of features specific to a given backend. In these cases it's easy to "drop down" and use the backend's API directly, where appropriate. For example:
|
||||
</p>
|
||||
<pre><code class="language-julia">using MXNet
|
||||
Flux.loadmx()
|
||||
|
||||
mxmodel = mx.FeedForward(model)</code></pre>
|
||||
<p>
|
||||
This returns a standard
|
||||
<code>mx.FeedForward</code>
|
||||
instance, just like you might have created using MXNet's usual API. You can then use this with MXNet's data provider implementation, custom optimisers, or distributed training processes.
|
||||
</p>
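|
||||
<p>
|
||||
For instance, the converted model can be handed straight to MXNet's own training loop. (A hedged sketch using MXNet.jl's standard API; the optimiser settings and stand-in data arrays here are illustrative assumptions.)
|
||||
</p>
|
||||
<pre><code class="language-julia">data = mx.ArrayDataProvider(:data => rand(10, 100), :label => rand(1, 100))
|
||||
mx.fit(mxmodel, mx.SGD(lr = 0.01), data, n_epoch = 10)</code></pre>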
|
||||
<p>
|
||||
Same goes for TensorFlow, where it's easy to create a
|
||||
<code>Tensor</code>
|
||||
object:
|
||||
</p>
|
||||
<pre><code class="language-julia">using TensorFlow
|
||||
Flux.loadtf()
|
||||
|
||||
x = placeholder(Float32)
|
||||
y = Tensor(model, x)</code></pre>
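|
||||
<p>
|
||||
The resulting <code>Tensor</code> can then be evaluated like any other, for example in a session. (A sketch assuming TensorFlow.jl's session API; the variable-initialisation call may differ between versions.)
|
||||
</p>
|
||||
<pre><code class="language-julia">sess = Session()
|
||||
run(sess, global_variables_initializer())
|
||||
run(sess, y, Dict(x => rand(Float32, 10))) # run the model on a random input</code></pre>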
|
||||
<p>
|
||||
This makes it easy to take advantage of Flux's model description and debugging tools while also getting the benefit of the work put into these backends. You can check out how this looks with the integration examples
|
||||
<a href="https://github.com/MikeInnes/Flux.jl/tree/master/examples">
|
||||
here
|
||||
</a>
|
||||
.
|
||||
</p>
|
||||
<footer>
|
||||
<hr/>
|
||||
<a class="previous" href="batching.html">
|
||||
<span class="direction">
|
||||
Previous
|
||||
</span>
|
||||
<span class="title">
|
||||
Batching
|
||||
</span>
|
||||
</a>
|
||||
<a class="next" href="storage.html">
|
||||
<span class="direction">
|
||||
Next
|
||||
</span>
|
||||
<span class="title">
|
||||
Storing Models
|
||||
</span>
|
||||
</a>
|
||||
</footer>
|
||||
</article>
|
||||
</body>
|
||||
</html>
|
|
@ -0,0 +1,392 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="UTF-8"/>
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
|
||||
<title>
|
||||
Batching · Flux
|
||||
</title>
|
||||
<script>
|
||||
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
|
||||
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
|
||||
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
|
||||
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
|
||||
|
||||
ga('create', 'UA-36890222-9', 'auto');
|
||||
ga('send', 'pageview');
|
||||
|
||||
</script>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.5.0/styles/default.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://fonts.googleapis.com/css?family=Lato|Ubuntu+Mono" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="../assets/documenter.css" rel="stylesheet" type="text/css"/>
|
||||
<script>
|
||||
documenterBaseURL=".."
|
||||
</script>
|
||||
<script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../assets/documenter.js"></script>
|
||||
<script src="../../versions.js"></script>
|
||||
<link href="../../flux.css" rel="stylesheet" type="text/css"/>
|
||||
</head>
|
||||
<body>
|
||||
<nav class="toc">
|
||||
<h1>
|
||||
Flux
|
||||
</h1>
|
||||
<form class="search" action="../search.html">
|
||||
<select id="version-selector" onChange="window.location.href=this.value">
|
||||
<option value="#" selected="selected" disabled="disabled">
|
||||
Version
|
||||
</option>
|
||||
</select>
|
||||
<input id="search-query" name="q" type="text" placeholder="Search docs"/>
|
||||
</form>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="../index.html">
|
||||
Home
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
Building Models
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="../models/basics.html">
|
||||
Model Building Basics
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../models/templates.html">
|
||||
Model Templates
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../models/recurrent.html">
|
||||
Recurrence
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../models/debugging.html">
|
||||
Debugging
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
Other APIs
|
||||
</span>
|
||||
<ul>
|
||||
<li class="current">
|
||||
<a class="toctext" href="batching.html">
|
||||
Batching
|
||||
</a>
|
||||
<ul class="internal">
|
||||
<li>
|
||||
<a class="toctext" href="#Basics-1">
|
||||
Basics
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="#Sequences-and-Nesting-1">
|
||||
Sequences and Nesting
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="#Future-Work-1">
|
||||
Future Work
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="backends.html">
|
||||
Backends
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="storage.html">
|
||||
Storing Models
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
In Action
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="../examples/logreg.html">
|
||||
Logistic Regression
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../examples/char-rnn.html">
|
||||
Char RNN
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../contributing.html">
|
||||
Contributing & Help
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../internals.html">
|
||||
Internals
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</nav>
|
||||
<article id="docs">
|
||||
<header>
|
||||
<nav>
|
||||
<ul>
|
||||
<li>
|
||||
Other APIs
|
||||
</li>
|
||||
<li>
|
||||
<a href="batching.html">
|
||||
Batching
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/854a1e18865742c59b6db6c58e16fee6ef9ef8ce/docs/src/apis/batching.md">
|
||||
<span class="fa">
|
||||
|
||||
</span>
|
||||
Edit on GitHub
|
||||
</a>
|
||||
</nav>
|
||||
<hr/>
|
||||
</header>
|
||||
<h1>
|
||||
<a class="nav-anchor" id="Batching-1" href="#Batching-1">
|
||||
Batching
|
||||
</a>
|
||||
</h1>
|
||||
<h2>
|
||||
<a class="nav-anchor" id="Basics-1" href="#Basics-1">
|
||||
Basics
|
||||
</a>
|
||||
</h2>
|
||||
<p>
|
||||
Existing machine learning frameworks and libraries represent batching, and other properties of data, only implicitly. Your machine learning data is a large
|
||||
<code>N</code>
|
||||
-dimensional array, which may have a shape like:
|
||||
</p>
|
||||
<pre><code class="language-julia">100 × 50 × 256 × 256</code></pre>
|
||||
<p>
|
||||
Typically, this might represent a batch of 100 samples, where each sample is a 50-long sequence of 256×256 images. This is great for performance, but array operations often become much more cumbersome as a result. Especially if you manipulate dimensions at runtime as an optimisation, debugging models can become extremely fiddly, with a proliferation of
|
||||
<code>X × Y × Z</code>
|
||||
arrays and no information about where they came from.
|
||||
</p>
|
||||
<p>
|
||||
Flux introduces a new approach where the batch dimension is represented explicitly as part of the data. For example:
|
||||
</p>
|
||||
<pre><code class="language-julia">julia> xs = Batch([[1,2,3], [4,5,6]])
|
||||
2-element Batch of Vector{Int64}:
|
||||
[1,2,3]
|
||||
[4,5,6]</code></pre>
|
||||
<p>
|
||||
Batches are represented the way we
|
||||
<em>
|
||||
think
|
||||
</em>
|
||||
about them: as a list of data points. We can do all the usual array operations with them, including getting the first with
|
||||
<code>xs[1]</code>
|
||||
, iterating over them and so on. The trick is that under the hood, the data is batched into a single array:
|
||||
</p>
|
||||
<pre><code class="language-julia">julia> rawbatch(xs)
|
||||
2×3 Array{Int64,2}:
|
||||
1 2 3
|
||||
4 5 6</code></pre>
|
||||
<p>
|
||||
When we put a
|
||||
<code>Batch</code>
|
||||
object into a model, the model is ultimately working with a single array, which means there's no performance overhead and we get the full benefit of standard batching.
|
||||
</p>
|
||||
<p>
|
||||
Turning a set of vectors into a matrix is fairly easy anyway, so what's the big deal? Well, it gets more interesting as we start working with more complex data. Say we were working with 2×2 images:
|
||||
</p>
|
||||
<pre><code class="language-julia">julia> xs = Batch([[1 2; 3 4], [5 6; 7 8]])
|
||||
2-element Flux.Batch of Array{Int64,2}:
|
||||
[1 2; 3 4]
|
||||
[5 6; 7 8]</code></pre>
|
||||
<p>
|
||||
The raw batch array is much messier, and harder to recognise:
|
||||
</p>
|
||||
<pre><code class="language-julia">julia> rawbatch(xs)
|
||||
2×2×2 Array{Int64,3}:
|
||||
[:, :, 1] =
|
||||
1 3
|
||||
5 7
|
||||
|
||||
[:, :, 2] =
|
||||
2 4
|
||||
6 8</code></pre>
|
||||
<p>
|
||||
Furthermore, because the batch acts like a list of arrays, we can use simple and familiar operations on it:
|
||||
</p>
|
||||
<pre><code class="language-julia">julia> map(flatten, xs)
|
||||
2-element Array{Array{Int64,1},1}:
|
||||
[1,3,2,4]
|
||||
[5,7,6,8]</code></pre>
|
||||
<p>
|
||||
<code>flatten</code>
|
||||
is simple enough over a single data point, but flattening a batched data set is more complex and you end up needing arcane array operations like
|
||||
<code>mapslices</code>
|
||||
. A
|
||||
<code>Batch</code>
|
||||
can just handle this for you for free, and more importantly it ensures that your operations are
|
||||
<em>
|
||||
correct
|
||||
</em>
|
||||
– that you haven't mixed up your batch and data dimensions, or used the wrong array op, and so on.
|
||||
</p>
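As a plain-Julia sketch (no Flux types, so this is only an illustration of the point above), here is the same flatten operation written both ways: once over a list of samples, and once over the raw stacked array, where you have to track which dimension is the batch dimension yourself.

```julia
# Two 2×2 "images" as a plain list of samples.
samples = [[1 2; 3 4], [5 6; 7 8]]

# Over a list of data points, the operation is the obvious per-sample one.
# (Julia arrays are column-major, so vec reads down the columns.)
flat = map(vec, samples)
# flat == [[1, 3, 2, 4], [5, 7, 6, 8]]

# Over the raw batch (samples stacked along dimension 1), the same thing
# needs slicing code and care over which dimension is which.
raw = permutedims(cat(samples...; dims=3), (3, 1, 2))   # 2×2×2, batch first
flat_raw = [vec(raw[i, :, :]) for i in 1:size(raw, 1)]
flat_raw == flat   # true
```

Nothing here is wrong, but the second version silently depends on the stacking convention; a `Batch` keeps that bookkeeping out of user code.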
|
||||
<h2>
|
||||
<a class="nav-anchor" id="Sequences-and-Nesting-1" href="#Sequences-and-Nesting-1">
|
||||
Sequences and Nesting
|
||||
</a>
|
||||
</h2>
|
||||
<p>
|
||||
As well as
|
||||
<code>Batch</code>
|
||||
, there's a structure called
|
||||
<code>Seq</code>
|
||||
which behaves very similarly. Let's say we have two one-hot encoded DNA sequences:
|
||||
</p>
|
||||
<pre><code class="language-julia">julia> x1 = Seq([[0,1,0,0], [1,0,0,0], [0,0,0,1]]) # [A, T, C, G]
|
||||
julia> x2 = Seq([[0,0,1,0], [0,0,0,1], [0,0,1,0]])
|
||||
|
||||
julia> rawbatch(x1)
|
||||
3×4 Array{Int64,2}:
|
||||
0 1 0 0
|
||||
1 0 0 0
|
||||
0 0 0 1</code></pre>
|
||||
<p>
|
||||
This is identical to
|
||||
<code>Batch</code>
|
||||
so far; but where it gets interesting is that you can actually nest these types:
|
||||
</p>
|
||||
<pre><code class="language-julia">julia> xs = Batch([x1, x2])
|
||||
2-element Batch of Seq of Vector{Int64}:
|
||||
[[0,1,0,0],[1,0,0,0],[0,0,0,1]]
|
||||
[[0,0,1,0],[0,0,0,1],[0,0,1,0]]</code></pre>
|
||||
<p>
|
||||
Again, this represents itself intuitively as a list-of-lists-of-lists, but
|
||||
<code>rawbatch</code>
|
||||
shows that the real underlying value is an
|
||||
<code>Array{Int64,3}</code>
|
||||
of shape
|
||||
<code>2×3×4</code>
|
||||
.
|
||||
</p>
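To make the stacking concrete, here is a plain-Julia sketch of what that underlying array looks like: each level of nesting contributes one dimension, stacked on the left.

```julia
# Two sequences, each three one-hot vectors of length 4.
x1 = [[0,1,0,0], [1,0,0,0], [0,0,0,1]]
x2 = [[0,0,1,0], [0,0,0,1], [0,0,1,0]]

# Stack along the left-most dimension at each level of nesting.
raw = zeros(Int, 2, 3, 4)
for (i, seq) in enumerate((x1, x2)), (j, v) in enumerate(seq)
    raw[i, j, :] = v
end

size(raw)      # (2, 3, 4): batch × sequence length × alphabet
raw[1, :, :]   # the 3×4 matrix shown above for rawbatch(x1)
```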
|
||||
<h2>
|
||||
<a class="nav-anchor" id="Future-Work-1" href="#Future-Work-1">
|
||||
Future Work
|
||||
</a>
|
||||
</h2>
|
||||
<p>
|
||||
The design of batching is still at a fairly early stage, though it's used in a few places in the system. For example, all Flux models expect to be given
|
||||
<code>Batch</code>
|
||||
objects which are unwrapped into raw arrays for the computation. Models will convert their arguments if necessary, so it's convenient to call a model with a single data point like
|
||||
<code>f([1,2,3])</code>
|
||||
.
|
||||
</p>
|
||||
<p>
|
||||
Right now, the
|
||||
<code>Batch</code>
|
||||
or
|
||||
<code>Seq</code>
|
||||
types always stack along the left-most dimension. In future, this will be customisable, and Flux will provide implementations of common functions that are generic across the batch dimension. This brings the following benefits:
|
||||
</p>
|
||||
<ul>
|
||||
<li>
|
||||
<p>
|
||||
Code can be written in a batch-agnostic way or be generic across batching strategies.
|
||||
</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>
|
||||
Batching and optimisations, like switching batch dimensions, can be expressed by the programmer with compiler support; fewer code changes are required and optimisations are guaranteed not to break the model.
|
||||
</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>
|
||||
This also opens the door for more automatic optimisations, e.g. having the compiler explore the search space of possible batching combinations.
|
||||
</p>
|
||||
</li>
|
||||
</ul>
|
||||
<p>
|
||||
Here's a more detailed illustration of how it might look for code to be "generic across batching". Take for example a weight matrix
|
||||
<code>W</code>
|
||||
times a vector
|
||||
<code>x</code>
|
||||
, as used in a logistic regression or a simple neural network:
|
||||
</p>
|
||||
<pre><code class="language-julia"> W * x => y
|
||||
(10×28) * (28) => (10)</code></pre>
|
||||
<p>
|
||||
If we want to work with a batch of 50
|
||||
<code>x</code>
|
||||
s, one option is to stack the data into a matrix of size
|
||||
<code>28 × 50</code>
|
||||
.
|
||||
</p>
|
||||
<pre><code class="language-julia"> W * x => y
|
||||
(10×28) * (28×50) => (10×50)</code></pre>
|
||||
<p>
|
||||
This works, but we may find that it's slow or doesn't fit well with the rest of the model, which batches on the first dimension. For that reason we may instead want to put the data in a
|
||||
<code>50 × 28</code>
|
||||
matrix and alter the code as follows:
|
||||
</p>
|
||||
<pre><code class="language-julia"> x * W' => y
|
||||
(50×28) * (28×10) => (50×10)</code></pre>
|
||||
<p>
|
||||
to make the shapes work out. This change is not ideal; in more complex cases it becomes fiddly and error-prone, and it ties the code to a particular implementation strategy, making it less reusable.
|
||||
</p>
|
||||
<p>
|
||||
There's an alternative. We keep the same code, but represent the batched
|
||||
<code>x</code>
|
||||
s as either a
|
||||
<code>Batch{Vector,1}</code>
|
||||
or a
|
||||
<code>Batch{Vector,2}</code>
|
||||
, depending on how the data is stacked. Then we can simply overload
|
||||
<code>*</code>
|
||||
as follows:
|
||||
</p>
|
||||
<pre><code class="language-julia">*(W::Matrix, x::Batch{Vector,1}) = x * W'
|
||||
*(W::Matrix, x::Batch{Vector,2}) = W * x</code></pre>
|
||||
<p>
|
||||
This means that we can always write
|
||||
<code>W*x</code>
|
||||
, and the code is reusable in a larger network regardless of the overall batching approach. Moreover, Julia's type system ensures there's no runtime cost to doing this, and we can compile the code appropriately for backends like TensorFlow as well.
|
||||
</p>
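The dispatch idea can be sketched in a few lines of runnable plain Julia. `VecBatch` below is a hypothetical stand-in, not Flux's actual `Batch` type: the stacking convention is recorded in a type parameter, and multiple dispatch picks the right multiplication.

```julia
# Hypothetical illustration: N = 1 means samples are stacked as rows,
# N = 2 means samples are stacked as columns.
struct VecBatch{N}
    data::Matrix{Float64}
end

Base.:*(W::Matrix, x::VecBatch{1}) = VecBatch{1}(x.data * W')  # rows:    transpose W
Base.:*(W::Matrix, x::VecBatch{2}) = VecBatch{2}(W * x.data)   # columns: plain W * x

W  = randn(10, 28)
xs = randn(50, 28)                 # 50 samples of length 28, stacked as rows

y = W * VecBatch{1}(xs)
size(y.data)                       # (50, 10)

# The same W * x expression works if the data is stacked as columns instead:
y2 = W * VecBatch{2}(permutedims(xs))
size(y2.data)                      # (10, 50)
```

Because the convention lives in the type, `W * x` at the call site never changes, and dispatch resolves statically with no runtime cost.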
|
||||
<footer>
|
||||
<hr/>
|
||||
<a class="previous" href="../models/debugging.html">
|
||||
<span class="direction">
|
||||
Previous
|
||||
</span>
|
||||
<span class="title">
|
||||
Debugging
|
||||
</span>
|
||||
</a>
|
||||
<a class="next" href="backends.html">
|
||||
<span class="direction">
|
||||
Next
|
||||
</span>
|
||||
<span class="title">
|
||||
Backends
|
||||
</span>
|
||||
</a>
|
||||
</footer>
|
||||
</article>
|
||||
</body>
|
||||
</html>
|
|
@ -0,0 +1,207 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="UTF-8"/>
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
|
||||
<title>
|
||||
Storing Models · Flux
|
||||
</title>
|
||||
<script>
|
||||
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
|
||||
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
|
||||
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
|
||||
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
|
||||
|
||||
ga('create', 'UA-36890222-9', 'auto');
|
||||
ga('send', 'pageview');
|
||||
|
||||
</script>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.5.0/styles/default.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://fonts.googleapis.com/css?family=Lato|Ubuntu+Mono" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="../assets/documenter.css" rel="stylesheet" type="text/css"/>
|
||||
<script>
|
||||
documenterBaseURL=".."
|
||||
</script>
|
||||
<script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../assets/documenter.js"></script>
|
||||
<script src="../../versions.js"></script>
|
||||
<link href="../../flux.css" rel="stylesheet" type="text/css"/>
|
||||
</head>
|
||||
<body>
|
||||
<nav class="toc">
|
||||
<h1>
|
||||
Flux
|
||||
</h1>
|
||||
<form class="search" action="../search.html">
|
||||
<select id="version-selector" onChange="window.location.href=this.value">
|
||||
<option value="#" selected="selected" disabled="disabled">
|
||||
Version
|
||||
</option>
|
||||
</select>
|
||||
<input id="search-query" name="q" type="text" placeholder="Search docs"/>
|
||||
</form>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="../index.html">
|
||||
Home
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
Building Models
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="../models/basics.html">
|
||||
Model Building Basics
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../models/templates.html">
|
||||
Model Templates
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../models/recurrent.html">
|
||||
Recurrence
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../models/debugging.html">
|
||||
Debugging
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
Other APIs
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="batching.html">
|
||||
Batching
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="backends.html">
|
||||
Backends
|
||||
</a>
|
||||
</li>
|
||||
<li class="current">
|
||||
<a class="toctext" href="storage.html">
|
||||
Storing Models
|
||||
</a>
|
||||
<ul class="internal"></ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
In Action
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="../examples/logreg.html">
|
||||
Logistic Regression
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../examples/char-rnn.html">
|
||||
Char RNN
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../contributing.html">
|
||||
Contributing & Help
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../internals.html">
|
||||
Internals
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</nav>
|
||||
<article id="docs">
|
||||
<header>
|
||||
<nav>
|
||||
<ul>
|
||||
<li>
|
||||
Other APIs
|
||||
</li>
|
||||
<li>
|
||||
<a href="storage.html">
|
||||
Storing Models
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/854a1e18865742c59b6db6c58e16fee6ef9ef8ce/docs/src/apis/storage.md">
|
||||
<span class="fa">
|
||||
|
||||
</span>
|
||||
Edit on GitHub
|
||||
</a>
|
||||
</nav>
|
||||
<hr/>
|
||||
</header>
|
||||
<h1>
|
||||
<a class="nav-anchor" id="Loading-and-Saving-Models-1" href="#Loading-and-Saving-Models-1">
|
||||
Loading and Saving Models
|
||||
</a>
|
||||
</h1>
|
||||
<pre><code class="language-julia">model = Chain(Affine(10, 20), σ, Affine(20, 15), softmax)</code></pre>
|
||||
<p>
|
||||
Since models are just simple Julia data structures, it's very easy to save and load them using any of Julia's existing serialisation formats. For example, using Julia's built-in
|
||||
<code>serialize</code>
|
||||
:
|
||||
</p>
|
||||
<pre><code class="language-julia">open(io -> serialize(io, model), "model.jls", "w")
|
||||
open(io -> deserialize(io), "model.jls")</code></pre>
|
||||
<p>
|
||||
One issue with
|
||||
<code>serialize</code>
|
||||
is that it doesn't promise compatibility between major Julia versions. For longer-term storage it's good to use a package like
|
||||
<a href="https://github.com/JuliaIO/JLD.jl">
|
||||
JLD
|
||||
</a>
|
||||
.
|
||||
</p>
|
||||
<pre><code class="language-julia">using JLD
|
||||
@save "model.jld" model
|
||||
@load "model.jld"</code></pre>
|
||||
<p>
|
||||
However, JLD will fail for some models, as serialising functions is not supported on Julia 0.5+. You can resolve that by checking out
|
||||
<a href="https://github.com/JuliaIO/JLD.jl/pull/137">
|
||||
this branch
|
||||
</a>
|
||||
.
|
||||
</p>
|
||||
<p>
|
||||
Right now this is the only storage format Flux supports. In future Flux will support loading and saving other model formats (on an as-needed basis).
|
||||
</p>
|
||||
<footer>
|
||||
<hr/>
|
||||
<a class="previous" href="backends.html">
|
||||
<span class="direction">
|
||||
Previous
|
||||
</span>
|
||||
<span class="title">
|
||||
Backends
|
||||
</span>
|
||||
</a>
|
||||
<a class="next" href="../examples/logreg.html">
|
||||
<span class="direction">
|
||||
Next
|
||||
</span>
|
||||
<span class="title">
|
||||
Logistic Regression
|
||||
</span>
|
||||
</a>
|
||||
</footer>
|
||||
</article>
|
||||
</body>
|
||||
</html>
|
|
@ -0,0 +1,403 @@
|
|||
/*
|
||||
* The default CSS style for Documenter.jl generated sites
|
||||
*
|
||||
* Heavily inspired by the Julia Sphinx theme
|
||||
* https://github.com/JuliaLang/JuliaDoc
|
||||
* which extends the sphinx_rtd_theme
|
||||
* https://github.com/snide/sphinx_rtd_theme
|
||||
*
|
||||
* Part of Documenter.jl
|
||||
* https://github.com/JuliaDocs/Documenter.jl
|
||||
*
|
||||
* License: MIT
|
||||
*/
|
||||
|
||||
/* fonts */
|
||||
body, input {
|
||||
font-family: 'Lato', 'Helvetica Neue', Arial, sans-serif;
|
||||
font-size: 16px;
|
||||
color: #222;
|
||||
text-rendering: optimizeLegibility;
|
||||
}
|
||||
|
||||
pre, code {
|
||||
font-family: 'Ubuntu Mono', Monaco, courier, monospace;
|
||||
}
|
||||
|
||||
a {
|
||||
color: #2980b9;
|
||||
text-decoration: none;
|
||||
}
|
||||
|
||||
a:hover {
|
||||
color: #3091d1;
|
||||
}
|
||||
|
||||
a:visited {
|
||||
color: #9b59b6;
|
||||
}
|
||||
|
||||
body {
|
||||
line-height: 1.5;
|
||||
}
|
||||
|
||||
h1 { font-size: 1.75em; }
|
||||
h2 { font-size: 1.50em; }
|
||||
h3 { font-size: 1.25em; }
|
||||
h4 { font-size: 1.15em; }
|
||||
h5 { font-size: 1.10em; }
|
||||
h6 { font-size: 1em; }
|
||||
|
||||
h4, h5, h6 {
|
||||
margin: 1em 0;
|
||||
}
|
||||
|
||||
img {
|
||||
max-width: 100%;
|
||||
}
|
||||
|
||||
table {
|
||||
border-collapse: collapse;
|
||||
margin: 1em 0;
|
||||
}
|
||||
|
||||
th, td {
|
||||
border: 1px solid #e1e4e5;
|
||||
padding: 0.5em 1em;
|
||||
}
|
||||
|
||||
th {
|
||||
border-bottom-width: 2px;
|
||||
}
|
||||
|
||||
tr:nth-child(even) {
|
||||
background-color: #f3f6f6;
|
||||
}
|
||||
|
||||
hr {
|
||||
border: 0;
|
||||
border-top: 1px solid #e5e5e5;
|
||||
}
|
||||
|
||||
/* Inline code and code blocks */
|
||||
|
||||
code {
|
||||
padding: 0.1em;
|
||||
background-color: rgba(0,0,0,.04);
|
||||
border-radius: 3px;
|
||||
}
|
||||
|
||||
pre {
|
||||
background-color: #f5f5f5;
|
||||
border: 1px solid #dddddd;
|
||||
border-radius: 3px;
|
||||
padding: 0.5em;
|
||||
overflow: auto;
|
||||
}
|
||||
|
||||
pre code {
|
||||
padding: 0;
|
||||
background-color: initial;
|
||||
}
|
||||
|
||||
/* Headers in admonitions and docstrings */
|
||||
.admonition h1,
|
||||
article section.docstring h1 {
|
||||
font-size: 1.25em;
|
||||
}
|
||||
|
||||
.admonition h2,
|
||||
article section.docstring h2 {
|
||||
font-size: 1.10em;
|
||||
}
|
||||
|
||||
.admonition h3,
|
||||
.admonition h4,
|
||||
.admonition h5,
|
||||
.admonition h6,
|
||||
article section.docstring h3,
|
||||
article section.docstring h4,
|
||||
article section.docstring h5,
|
||||
article section.docstring h6 {
|
||||
font-size: 1em;
|
||||
}
|
||||
|
||||
/* Navigation */
|
||||
nav.toc {
|
||||
position: fixed;
|
||||
top: 0;
|
||||
left: 0;
|
||||
bottom: 0;
|
||||
width: 20em;
|
||||
overflow-y: auto;
|
||||
padding: 1em 0;
|
||||
background-color: #fcfcfc;
|
||||
box-shadow: inset -14px 0px 5px -12px rgb(210,210,210);
|
||||
}
|
||||
|
||||
nav.toc .logo {
|
||||
margin: 0 auto;
|
||||
display: block;
|
||||
max-height: 6em;
|
||||
max-width: 18em;
|
||||
}
|
||||
|
||||
nav.toc h1 {
|
||||
text-align: center;
|
||||
}
|
||||
|
||||
nav.toc input {
|
||||
display: block;
|
||||
height: 2em;
|
||||
width: 90%;
|
||||
width: calc(100% - 5em);
|
||||
margin: 0 auto;
|
||||
padding: 0 1em;
|
||||
border: 1px solid #c9c9c9;
|
||||
border-radius: 1em;
|
||||
font-size: smaller;
|
||||
}
|
||||
|
||||
nav.toc select {
|
||||
display: block;
|
||||
height: 2em;
|
||||
width: calc(100% - 3em);
|
||||
margin: 5px auto;
|
||||
font-size: smaller;
|
||||
text-align: center;
|
||||
}
|
||||
|
||||
nav.toc > ul * {
|
||||
margin: 0;
|
||||
}
|
||||
|
||||
nav.toc ul {
|
||||
color: #b3b3b3;
|
||||
padding: 0;
|
||||
list-style: none;
|
||||
}
|
||||
|
||||
nav.toc ul .toctext {
|
||||
color: inherit;
|
||||
display: block;
|
||||
}
|
||||
|
||||
nav.toc ul a:hover {
|
||||
background-color: #4e4a4a;
|
||||
}
|
||||
|
||||
nav.toc ul.internal a {
|
||||
color: inherit;
|
||||
display: block;
|
||||
}
|
||||
|
||||
nav.toc ul.internal a:hover {
|
||||
background-color: #d6d6d6;
|
||||
}
|
||||
|
||||
nav.toc ul.internal {
|
||||
color: gray;
|
||||
background-color: #e3e3e3;
|
||||
box-shadow: inset -14px 0px 5px -12px rgb(210,210,210);
|
||||
list-style: none;
|
||||
}
|
||||
|
||||
nav.toc ul.internal li.toplevel {
|
||||
border-top: 1px solid #c9c9c9;
|
||||
font-weight: bold;
|
||||
}
|
||||
|
||||
nav.toc ul.internal li.toplevel:first-child {
|
||||
border-top: none;
|
||||
}
|
||||
|
||||
nav.toc .toctext {
|
||||
padding-top: 0.3em;
|
||||
padding-bottom: 0.3em;
|
||||
padding-right: 1em;
|
||||
}
|
||||
|
||||
nav.toc ul .toctext {
|
||||
padding-left: 1em;
|
||||
}
|
||||
|
||||
nav.toc ul ul .toctext {
|
||||
padding-left: 2em;
|
||||
}
|
||||
|
||||
nav.toc ul ul ul .toctext {
|
||||
padding-left: 3em;
|
||||
}
|
||||
|
||||
nav.toc li.current > .toctext {
|
||||
border-top: 1px solid #c9c9c9;
|
||||
border-bottom: 1px solid #c9c9c9;
|
||||
color: #404040;
|
||||
font-weight: bold;
|
||||
background-color: white;
|
||||
}
|
||||
|
||||
article {
|
||||
margin-left: 20em;
|
||||
min-width: 20em;
|
||||
max-width: 48em;
|
||||
padding: 2em;
|
||||
}
|
||||
|
||||
article > header {}
|
||||
|
||||
article > header nav ul {
|
||||
display: inline-block;
|
||||
list-style: none;
|
||||
margin: 0;
|
||||
padding: 0;
|
||||
}
|
||||
|
||||
article > header nav li {
|
||||
display: inline-block;
|
||||
padding-right: 0.2em;
|
||||
}
|
||||
|
||||
article > header nav li:before {
|
||||
content: "»";
|
||||
padding-right: 0.2em;
|
||||
}
|
||||
|
||||
article > header .edit-page {
|
||||
float: right;
|
||||
}
|
||||
|
||||
article > footer {}
|
||||
|
||||
article > footer a.prev {
|
||||
float: left;
|
||||
}
|
||||
article > footer a.next {
|
||||
float: right;
|
||||
}
|
||||
|
||||
article > footer a .direction:after {
|
||||
content: ": ";
|
||||
}
|
||||
|
||||
article hr {
|
||||
margin: 1em 0;
|
||||
}
|
||||
|
||||
article section.docstring {
|
||||
border: 1px solid #ddd;
|
||||
margin: 0.5em 0;
|
||||
padding: 0.5em;
|
||||
border-radius: 3px;
|
||||
}
|
||||
|
||||
article section.docstring .docstring-header {
|
||||
margin-bottom: 1em;
|
||||
}
|
||||
|
||||
article section.docstring .docstring-binding {
|
||||
color: #333;
|
||||
font-weight: bold;
|
||||
}
|
||||
|
||||
article section.docstring .docstring-category {
|
||||
font-style: italic;
|
||||
}
|
||||
|
||||
article section.docstring a.source-link {
|
||||
float: left;
|
||||
font-weight: bold;
|
||||
}
|
||||
|
||||
.nav-anchor,
|
||||
.nav-anchor:hover,
|
||||
.nav-anchor:visited {
|
||||
color: #333;
|
||||
}
|
||||
|
||||
/*
|
||||
* Admonitions
|
||||
*
|
||||
* Colors (title, body)
|
||||
* warning: #f0b37e #ffedcc (orange)
|
||||
* note: #6ab0de #e7f2fa (blue)
|
||||
* tip: #1abc9c #dbfaf4 (green)
|
||||
*/
|
||||
.admonition {
|
||||
border-radius: 3px;
|
||||
background-color: #eeeeee;
|
||||
}
|
||||
|
||||
.admonition-title {
|
||||
border-radius: 3px 3px 0 0;
|
||||
background-color: #9b9b9b;
|
||||
padding: 0.15em 0.5em;
|
||||
}
|
||||
|
||||
.admonition-text {
|
||||
padding: 0.5em;
|
||||
}
|
||||
|
||||
.admonition-text > :first-child {
|
||||
margin-top: 0;
|
||||
}
|
||||
|
||||
.admonition-text > :last-child {
|
||||
margin-bottom: 0;
|
||||
}
|
||||
|
||||
.admonition > .admonition-title:before {
|
||||
font-family: "FontAwesome";
|
||||
margin-right: 5px;
|
||||
content: "\f06a";
|
||||
}
|
||||
|
||||
.admonition.warning > .admonition-title {
|
||||
background-color: #f0b37e;
|
||||
}
|
||||
|
||||
.admonition.warning {
|
||||
background-color: #ffedcc;
|
||||
}
|
||||
|
||||
.admonition.note > .admonition-title {
|
||||
background-color: #6ab0de;
|
||||
}
|
||||
|
||||
.admonition.note {
|
||||
background-color: #e7f2fa;
|
||||
}
|
||||
|
||||
.admonition.tip > .admonition-title {
|
||||
background-color: #1abc9c;
|
||||
}
|
||||
|
||||
.admonition.tip {
|
||||
background-color: #dbfaf4;
|
||||
}
|
||||
|
||||
|
||||
/* footnotes */
|
||||
.footnote {
|
||||
padding-left: 0.8em;
|
||||
border-left: 2px solid #ccc;
|
||||
}
|
||||
|
||||
/* Search page */
|
||||
#search-results .category {
|
||||
font-size: smaller;
|
||||
}
|
||||
|
||||
#search-results .category:before {
|
||||
content: " ";
|
||||
}
|
||||
|
||||
/* Overriding the <code> block style of highligh.js.
|
||||
* We have to override the padding and the background-color, since we style this
|
||||
* part ourselves. Specifically, we style the <pre> surrounding the <code>, while
|
||||
* highlight.js applies the .hljs style directly to the <code> tag.
|
||||
*/
|
||||
.hljs {
|
||||
background-color: transparent;
|
||||
padding: 0;
|
||||
}
|
|
@ -0,0 +1,65 @@
|
|||
/*
|
||||
* Part of Documenter.jl
|
||||
* https://github.com/JuliaDocs/Documenter.jl
|
||||
*
|
||||
* License: MIT
|
||||
*/
|
||||
|
||||
requirejs.config({
|
||||
paths: {
|
||||
'jquery': 'https://code.jquery.com/jquery-3.1.0.js?',
|
||||
'jqueryui': 'https://cdnjs.cloudflare.com/ajax/libs/jqueryui/1.12.0/jquery-ui.min',
|
||||
'mathjax': 'https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML',
|
||||
'highlight': 'https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.5.0/highlight.min',
|
||||
'highlight-julia': 'https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.5.0/languages/julia.min',
|
||||
},
|
||||
shim: {
|
||||
'mathjax' : {
|
||||
exports: "MathJax"
|
||||
},
|
||||
'highlight-julia': ['highlight']
|
||||
}
|
||||
});
|
||||
|
||||
// Load MathJax
|
||||
require(['mathjax'], function(MathJax) {
|
||||
MathJax.Hub.Config({
|
||||
"tex2jax": {
|
||||
inlineMath: [['$','$'], ['\\(','\\)']],
|
||||
processEscapes: true
|
||||
}
|
||||
});
|
||||
MathJax.Hub.Config({
|
||||
config: ["MMLorHTML.js"],
|
||||
jax: [
|
||||
"input/TeX",
|
||||
"output/HTML-CSS",
|
||||
"output/NativeMML"
|
||||
],
|
||||
extensions: [
|
||||
"MathMenu.js",
|
||||
"MathZoom.js",
|
||||
"TeX/AMSmath.js",
|
||||
"TeX/AMSsymbols.js",
|
||||
"TeX/autobold.js",
|
||||
"TeX/autoload-all.js"
|
||||
]
|
||||
});
|
||||
MathJax.Hub.Config({
|
||||
TeX: { equationNumbers: { autoNumber: "AMS" } }
|
||||
});
|
||||
})
|
||||
|
||||
require(['jquery', 'highlight', 'highlight-julia'], function($, hljs) {
|
||||
$(document).ready(function() {
|
||||
if (typeof DOC_VERSIONS !== 'undefined') {
|
||||
var version_selector = $("#version-selector");
|
||||
DOC_VERSIONS.forEach(function(each) {
|
||||
var option = $("<option value='" + documenterBaseURL + "/../" + each + "'>" + each + "</option>");
|
||||
version_selector.append(option);
|
||||
});
|
||||
}
|
||||
hljs.initHighlighting();
|
||||
})
|
||||
|
||||
})
|
|
@ -0,0 +1,91 @@
|
|||
/*
|
||||
* Part of Documenter.jl
|
||||
* https://github.com/JuliaDocs/Documenter.jl
|
||||
*
|
||||
* License: MIT
|
||||
*/
|
||||
|
||||
// parseUri 1.2.2
|
||||
// (c) Steven Levithan <stevenlevithan.com>
|
||||
// MIT License
|
||||
function parseUri (str) {
|
||||
var o = parseUri.options,
|
||||
m = o.parser[o.strictMode ? "strict" : "loose"].exec(str),
|
||||
uri = {},
|
||||
i = 14;
|
||||
|
||||
while (i--) uri[o.key[i]] = m[i] || "";
|
||||
|
||||
uri[o.q.name] = {};
|
||||
uri[o.key[12]].replace(o.q.parser, function ($0, $1, $2) {
|
||||
if ($1) uri[o.q.name][$1] = $2;
|
||||
});
|
||||
|
||||
return uri;
|
||||
};
|
||||
parseUri.options = {
|
||||
strictMode: false,
|
||||
key: ["source","protocol","authority","userInfo","user","password","host","port","relative","path","directory","file","query","anchor"],
|
||||
q: {
|
||||
name: "queryKey",
|
||||
parser: /(?:^|&)([^&=]*)=?([^&]*)/g
|
||||
},
|
||||
parser: {
|
||||
strict: /^(?:([^:\/?#]+):)?(?:\/\/((?:(([^:@]*)(?::([^:@]*))?)?@)?([^:\/?#]*)(?::(\d*))?))?((((?:[^?#\/]*\/)*)([^?#]*))(?:\?([^#]*))?(?:#(.*))?)/,
|
||||
loose: /^(?:(?![^:@]+:[^:@\/]*@)([^:\/?#.]+):)?(?:\/\/)?((?:(([^:@]*)(?::([^:@]*))?)?@)?([^:\/?#]*)(?::(\d*))?)(((\/(?:[^?#](?![^?#\/]*\.[^?#\/.]+(?:[?#]|$)))*\/?)?([^?#\/]*))(?:\?([^#]*))?(?:#(.*))?)/
|
||||
}
|
||||
};
|
||||
|
||||
requirejs.config({
|
||||
paths: {
|
||||
'jquery': 'https://code.jquery.com/jquery-3.1.0.js?',
|
||||
'lunr': 'https://cdnjs.cloudflare.com/ajax/libs/lunr.js/0.7.1/lunr.min',
|
||||
}
|
||||
});
|
||||
|
||||
var currentScript = document.currentScript;
|
||||
|
||||
require(["jquery", "lunr"], function($, lunr) {
|
||||
var index = lunr(function () {
|
||||
this.ref('location')
|
||||
this.field('title', {boost: 10})
|
||||
this.field('text')
|
||||
})
|
||||
var store = {}
|
||||
|
||||
documenterSearchIndex['docs'].forEach(function(e) {
|
||||
index.add(e)
|
||||
store[e.location] = e
|
||||
})
|
||||
|
||||
$(function(){
|
||||
function update_search(query) {
|
||||
results = index.search(query)
|
||||
$('#search-info').text("Number of results: " + results.length)
|
||||
$('#search-results').empty()
|
||||
results.forEach(function(result) {
|
||||
data = store[result.ref]
|
||||
link = $('<a>')
|
||||
link.text(data.title)
|
||||
link.attr('href', documenterBaseURL+'/'+result.ref)
|
||||
cat = $('<span class="category">('+data.category+')</span>')
|
||||
li = $('<li>').append(link).append(cat)
|
||||
$('#search-results').append(li)
|
||||
})
|
||||
}
|
||||
|
||||
function update_search_box() {
|
||||
query = $('#search-query').val()
|
||||
update_search(query)
|
||||
}
|
||||
|
||||
$('#search-query').keyup(update_search_box)
|
||||
$('#search-query').change(update_search_box)
|
||||
|
||||
search_query = parseUri(window.location).queryKey["q"]
|
||||
if(search_query !== undefined) {
|
||||
$("#search-query").val(search_query)
|
||||
}
|
||||
update_search_box();
|
||||
})
|
||||
})
|
|
@ -0,0 +1,212 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="UTF-8"/>
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
|
||||
<title>
|
||||
Contributing & Help · Flux
|
||||
</title>
|
||||
<script>
|
||||
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
|
||||
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
|
||||
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
|
||||
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
|
||||
|
||||
ga('create', 'UA-36890222-9', 'auto');
|
||||
ga('send', 'pageview');
|
||||
|
||||
</script>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.5.0/styles/default.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://fonts.googleapis.com/css?family=Lato|Ubuntu+Mono" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="assets/documenter.css" rel="stylesheet" type="text/css"/>
|
||||
<script>
|
||||
documenterBaseURL="."
|
||||
</script>
|
||||
<script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="assets/documenter.js"></script>
|
||||
<script src="../versions.js"></script>
|
||||
<link href="../flux.css" rel="stylesheet" type="text/css"/>
|
||||
</head>
|
||||
<body>
|
||||
<nav class="toc">
|
||||
<h1>
|
||||
Flux
|
||||
</h1>
|
||||
<form class="search" action="search.html">
|
||||
<select id="version-selector" onChange="window.location.href=this.value">
|
||||
<option value="#" selected="selected" disabled="disabled">
|
||||
Version
|
||||
</option>
|
||||
</select>
|
||||
<input id="search-query" name="q" type="text" placeholder="Search docs"/>
|
||||
</form>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="index.html">
|
||||
Home
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
Building Models
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="models/basics.html">
|
||||
Model Building Basics
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="models/templates.html">
|
||||
Model Templates
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="models/recurrent.html">
|
||||
Recurrence
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="models/debugging.html">
|
||||
Debugging
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
Other APIs
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="apis/batching.html">
|
||||
Batching
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="apis/backends.html">
|
||||
Backends
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="apis/storage.html">
|
||||
Storing Models
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
In Action
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="examples/logreg.html">
|
||||
Logistic Regression
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="examples/char-rnn.html">
|
||||
Char RNN
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li class="current">
|
||||
<a class="toctext" href="contributing.html">
|
||||
Contributing & Help
|
||||
</a>
|
||||
<ul class="internal"></ul>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="internals.html">
|
||||
Internals
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</nav>
|
||||
<article id="docs">
|
||||
<header>
|
||||
<nav>
|
||||
<ul>
|
||||
<li>
|
||||
<a href="contributing.html">
|
||||
Contributing & Help
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/854a1e18865742c59b6db6c58e16fee6ef9ef8ce/docs/src/contributing.md">
|
||||
<span class="fa">
|
||||
|
||||
</span>
|
||||
Edit on GitHub
|
||||
</a>
|
||||
</nav>
|
||||
<hr/>
|
||||
</header>
|
||||
<h1>
|
||||
<a class="nav-anchor" id="Contributing-1" href="#Contributing-1">
|
||||
Contributing
|
||||
</a>
|
||||
</h1>
|
||||
<p>
|
||||
If you need help, please ask on the
|
||||
<a href="https://discourse.julialang.org/">
|
||||
Julia forum
|
||||
</a>
|
||||
or on Flux's
|
||||
<a href="https://gitter.im/MikeInnes/Flux.jl">
|
||||
Gitter
|
||||
</a>
|
||||
.
|
||||
</p>
|
||||
<p>
|
||||
Right now, the best way to help out is to try out the examples and report any issues or missing features as you find them. The second best way is to help us spread the word, perhaps by
|
||||
<a href="https://github.com/MikeInnes/Flux.jl">
|
||||
starring the repo
|
||||
</a>
|
||||
.
|
||||
</p>
|
||||
<p>
|
||||
If you're interested in hacking on Flux, most of the
|
||||
<a href="https://github.com/MikeInnes/Flux.jl/tree/master/src">
|
||||
code
|
||||
</a>
|
||||
is pretty straightforward. Adding new
|
||||
<a href="https://github.com/MikeInnes/Flux.jl/tree/master/src/layers">
|
||||
layer definitions
|
||||
</a>
|
||||
or cost functions is simple using the Flux DSL itself, and things like data utilities and training processes are all plain Julia code. The
|
||||
<code>compiler</code>
|
||||
directory is a bit more involved and is documented in
|
||||
<a href="interals.html">
|
||||
internals
|
||||
</a>
|
||||
, but most changes won't need to touch that.
|
||||
</p>
|
||||
<p>
|
||||
If you get stuck or need anything, let us know!
|
||||
</p>
|
||||
<footer>
|
||||
<hr/>
|
||||
<a class="previous" href="examples/char-rnn.html">
|
||||
<span class="direction">
|
||||
Previous
|
||||
</span>
|
||||
<span class="title">
|
||||
Char RNN
|
||||
</span>
|
||||
</a>
|
||||
<a class="next" href="internals.html">
|
||||
<span class="direction">
|
||||
Next
|
||||
</span>
|
||||
<span class="title">
|
||||
Internals
|
||||
</span>
|
||||
</a>
|
||||
</footer>
|
||||
</article>
|
||||
</body>
|
||||
</html>
|
|
@ -0,0 +1,269 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="UTF-8"/>
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
|
||||
<title>
|
||||
Char RNN · Flux
|
||||
</title>
|
||||
<script>
|
||||
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
|
||||
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
|
||||
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
|
||||
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
|
||||
|
||||
ga('create', 'UA-36890222-9', 'auto');
|
||||
ga('send', 'pageview');
|
||||
|
||||
</script>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.5.0/styles/default.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://fonts.googleapis.com/css?family=Lato|Ubuntu+Mono" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="../assets/documenter.css" rel="stylesheet" type="text/css"/>
|
||||
<script>
|
||||
documenterBaseURL=".."
|
||||
</script>
|
||||
<script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../assets/documenter.js"></script>
|
||||
<script src="../../versions.js"></script>
|
||||
<link href="../../flux.css" rel="stylesheet" type="text/css"/>
|
||||
</head>
|
||||
<body>
|
||||
<nav class="toc">
|
||||
<h1>
|
||||
Flux
|
||||
</h1>
|
||||
<form class="search" action="../search.html">
|
||||
<select id="version-selector" onChange="window.location.href=this.value">
|
||||
<option value="#" selected="selected" disabled="disabled">
|
||||
Version
|
||||
</option>
|
||||
</select>
|
||||
<input id="search-query" name="q" type="text" placeholder="Search docs"/>
|
||||
</form>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="../index.html">
|
||||
Home
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
Building Models
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="../models/basics.html">
|
||||
Model Building Basics
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../models/templates.html">
|
||||
Model Templates
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../models/recurrent.html">
|
||||
Recurrence
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../models/debugging.html">
|
||||
Debugging
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
Other APIs
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="../apis/batching.html">
|
||||
Batching
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../apis/backends.html">
|
||||
Backends
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../apis/storage.html">
|
||||
Storing Models
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
In Action
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="logreg.html">
|
||||
Logistic Regression
|
||||
</a>
|
||||
</li>
|
||||
<li class="current">
|
||||
<a class="toctext" href="char-rnn.html">
|
||||
Char RNN
|
||||
</a>
|
||||
<ul class="internal"></ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../contributing.html">
|
||||
Contributing & Help
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../internals.html">
|
||||
Internals
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</nav>
|
||||
<article id="docs">
|
||||
<header>
|
||||
<nav>
|
||||
<ul>
|
||||
<li>
|
||||
In Action
|
||||
</li>
|
||||
<li>
|
||||
<a href="char-rnn.html">
|
||||
Char RNN
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/854a1e18865742c59b6db6c58e16fee6ef9ef8ce/docs/src/examples/char-rnn.md">
|
||||
<span class="fa">
|
||||
|
||||
</span>
|
||||
Edit on GitHub
|
||||
</a>
|
||||
</nav>
|
||||
<hr/>
|
||||
</header>
|
||||
<h1>
|
||||
<a class="nav-anchor" id="Char-RNN-1" href="#Char-RNN-1">
|
||||
Char RNN
|
||||
</a>
|
||||
</h1>
|
||||
<p>
|
||||
This walkthrough will take you through a model like that used in
|
||||
<a href="http://karpathy.github.io/2015/05/21/rnn-effectiveness/">
|
||||
Karpathy's 2015 blog post
|
||||
</a>
|
||||
, which can learn to generate text in the style of Shakespeare (or whatever else you may use as input).
|
||||
<code>shakespeare_input.txt</code>
|
||||
is
|
||||
<a href="http://cs.stanford.edu/people/karpathy/char-rnn/shakespeare_input.txt">
|
||||
here
|
||||
</a>
|
||||
.
|
||||
</p>
|
||||
<pre><code class="language-julia">using Flux
|
||||
import StatsBase: wsample</code></pre>
|
||||
<p>
|
||||
First, we define how many steps to unroll the RNN for and how many data points to batch together. Then we create some functions to prepare our data, using Flux's built-in utilities.
|
||||
</p>
|
||||
<pre><code class="language-julia">nunroll = 50
|
||||
nbatch = 50
|
||||
|
||||
getseqs(chars, alphabet) = sequences((onehot(Float32, char, alphabet) for char in chars), nunroll)
|
||||
getbatches(chars, alphabet) = batches((getseqs(part, alphabet) for part in chunk(chars, nbatch))...)</code></pre>
|
||||
<p>
|
||||
Because we want the RNN to predict the next letter at each iteration, our target data is simply our input data offset by one. For example, if the input is "The quick brown fox", the target will be "he quick brown fox ". Each letter is one-hot encoded and sequences are batched together to create the training data.
|
||||
</p>
|
||||
<pre><code class="language-julia">input = readstring("shakespeare_input.txt")
|
||||
alphabet = unique(input)
|
||||
N = length(alphabet)
|
||||
|
||||
Xs, Ys = getbatches(input, alphabet), getbatches(input[2:end], alphabet)</code></pre>
|
||||
<p>
|
||||
Creating the model and training it is straightforward:
|
||||
</p>
|
||||
<pre><code class="language-julia">model = Chain(
|
||||
Input(N),
|
||||
LSTM(N, 256),
|
||||
LSTM(256, 256),
|
||||
Affine(256, N),
|
||||
softmax)
|
||||
|
||||
m = tf(unroll(model, nunroll))
|
||||
|
||||
@time Flux.train!(m, Xs, Ys, η = 0.1, epoch = 1)</code></pre>
|
||||
<p>
|
||||
Finally, we can sample the model. For sampling we remove the
|
||||
<code>softmax</code>
|
||||
from the end of the chain so that we can "sharpen" the resulting probabilities.
|
||||
</p>
|
||||
<pre><code class="language-julia">function sample(model, n, temp = 1)
|
||||
s = [rand(alphabet)]
|
||||
m = tf(unroll(model, 1))
|
||||
for i = 1:n
|
||||
push!(s, wsample(alphabet, softmax(m(Seq((onehot(Float32, s[end], alphabet),)))[1]./temp)))
|
||||
end
|
||||
return string(s...)
|
||||
end
|
||||
|
||||
sample(model[1:end-1], 100)</code></pre>
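<p>
The <code>temp</code> argument controls how sharp the sampling distribution is: the model's outputs are divided by <code>temp</code> before the <code>softmax</code>, so values below 1 concentrate probability on the most likely characters, while values above 1 flatten the distribution. For example (a sketch, using the <code>sample</code> function defined above):
</p>
<pre><code class="language-julia">sample(model[1:end-1], 100, 0.5) # sharper distribution: more conservative, repetitive text
sample(model[1:end-1], 100, 2.0) # flatter distribution: more adventurous, noisier text</code></pre>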
|
||||
<p>
|
||||
<code>sample</code>
|
||||
then produces a string of Shakespeare-like text. This won't produce great results after only a single epoch (though they will be recognisably different from the untrained model). Going for 30 epochs or so produces good results.
|
||||
</p>
|
||||
<p>
|
||||
Trained on
|
||||
<a href="https://gist.githubusercontent.com/MikeInnes/c2d11b57a58d7f2466b8013b88df1f1c/raw/4423f7cb07c71c80bd6458bb94f7bf5338403284/julia.jl">
|
||||
a dataset from base Julia
|
||||
</a>
|
||||
, the network can produce code like:
|
||||
</p>
|
||||
<pre><code class="language-julia">function show(io::IO, md::Githompty)
|
||||
Buffer(jowerTriangular(inals[i], initabs_indices), characters, side, nextfloat(typeof(x)))
|
||||
isnull(r) && return
|
||||
start::I!
|
||||
for j = 1:length(b,1)
|
||||
a = s->cosvect(code)
|
||||
return
|
||||
end
|
||||
indsERenv | maximum(func,lsg))
|
||||
for i = 1:last(Abjelar) && fname (=== nothing)
|
||||
throw(ArgumentError("read is declave non-fast-a/remaining of not descride method names"))
|
||||
end
|
||||
if e.ht === Int
|
||||
# update file to a stroducative, but is decould.
|
||||
# xna i -GB =# [unsafe_color <c *has may num 20<11E 16/s
|
||||
tuple | Expr(:(UnitLowerTriangular(transpose,(repl.ptr)))
|
||||
dims = pipe_read(s,Int(a)...)
|
||||
ex,0 + y.uilid_func & find_finwprevend(msg,:2)
|
||||
ex = stage(c)
|
||||
# uvvalue begin
|
||||
end
|
||||
end</code></pre>
|
||||
<footer>
|
||||
<hr/>
|
||||
<a class="previous" href="logreg.html">
|
||||
<span class="direction">
|
||||
Previous
|
||||
</span>
|
||||
<span class="title">
|
||||
Logistic Regression
|
||||
</span>
|
||||
</a>
|
||||
<a class="next" href="../contributing.html">
|
||||
<span class="direction">
|
||||
Next
|
||||
</span>
|
||||
<span class="title">
|
||||
Contributing & Help
|
||||
</span>
|
||||
</a>
|
||||
</footer>
|
||||
</article>
|
||||
</body>
|
||||
</html>
|
|
@ -0,0 +1,260 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="UTF-8"/>
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
|
||||
<title>
|
||||
Logistic Regression · Flux
|
||||
</title>
|
||||
<script>
|
||||
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
|
||||
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
|
||||
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
|
||||
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
|
||||
|
||||
ga('create', 'UA-36890222-9', 'auto');
|
||||
ga('send', 'pageview');
|
||||
|
||||
</script>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.5.0/styles/default.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://fonts.googleapis.com/css?family=Lato|Ubuntu+Mono" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="../assets/documenter.css" rel="stylesheet" type="text/css"/>
|
||||
<script>
|
||||
documenterBaseURL=".."
|
||||
</script>
|
||||
<script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../assets/documenter.js"></script>
|
||||
<script src="../../versions.js"></script>
|
||||
<link href="../../flux.css" rel="stylesheet" type="text/css"/>
|
||||
</head>
|
||||
<body>
|
||||
<nav class="toc">
|
||||
<h1>
|
||||
Flux
|
||||
</h1>
|
||||
<form class="search" action="../search.html">
|
||||
<select id="version-selector" onChange="window.location.href=this.value">
|
||||
<option value="#" selected="selected" disabled="disabled">
|
||||
Version
|
||||
</option>
|
||||
</select>
|
||||
<input id="search-query" name="q" type="text" placeholder="Search docs"/>
|
||||
</form>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="../index.html">
|
||||
Home
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
Building Models
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="../models/basics.html">
|
||||
Model Building Basics
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../models/templates.html">
|
||||
Model Templates
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../models/recurrent.html">
|
||||
Recurrence
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../models/debugging.html">
|
||||
Debugging
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
Other APIs
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="../apis/batching.html">
|
||||
Batching
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../apis/backends.html">
|
||||
Backends
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../apis/storage.html">
|
||||
Storing Models
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
In Action
|
||||
</span>
|
||||
<ul>
|
||||
<li class="current">
|
||||
<a class="toctext" href="logreg.html">
|
||||
Logistic Regression
|
||||
</a>
|
||||
<ul class="internal"></ul>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="char-rnn.html">
|
||||
Char RNN
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../contributing.html">
|
||||
Contributing & Help
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../internals.html">
|
||||
Internals
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</nav>
|
||||
<article id="docs">
|
||||
<header>
|
||||
<nav>
|
||||
<ul>
|
||||
<li>
|
||||
In Action
|
||||
</li>
|
||||
<li>
|
||||
<a href="logreg.html">
|
||||
Logistic Regression
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/854a1e18865742c59b6db6c58e16fee6ef9ef8ce/docs/src/examples/logreg.md">
|
||||
<span class="fa">
|
||||
|
||||
</span>
|
||||
Edit on GitHub
|
||||
</a>
|
||||
</nav>
|
||||
<hr/>
|
||||
</header>
|
||||
<h1>
|
||||
<a class="nav-anchor" id="Logistic-Regression-with-MNIST-1" href="#Logistic-Regression-with-MNIST-1">
|
||||
Logistic Regression with MNIST
|
||||
</a>
|
||||
</h1>
|
||||
<p>
|
||||
This walkthrough takes you through writing a multi-layer perceptron that classifies MNIST digits with high accuracy.
|
||||
</p>
|
||||
<p>
|
||||
First, we load the data using the MNIST package:
|
||||
</p>
|
||||
<pre><code class="language-julia">using Flux, MNIST
|
||||
|
||||
data = [(trainfeatures(i), onehot(trainlabel(i), 0:9)) for i = 1:60_000]
|
||||
train = data[1:50_000]
|
||||
test = data[50_001:60_000]</code></pre>
|
||||
<p>
|
||||
The only Flux-specific function here is
|
||||
<code>onehot</code>
|
||||
, which takes a class label and turns it into a one-hot-encoded vector that we can use for training. For example:
|
||||
</p>
|
||||
<pre><code class="language-julia">julia> onehot(:b, [:a, :b, :c])
|
||||
3-element Array{Int64,1}:
|
||||
0
|
||||
1
|
||||
0</code></pre>
|
||||
<p>
|
||||
Otherwise, the format of the data is simple enough: it's just a list of tuples mapping inputs to outputs. For example:
|
||||
</p>
|
||||
<pre><code class="language-julia">julia> data[1]
|
||||
([0.0,0.0,0.0, … 0.0,0.0,0.0],[0,0,0,0,0,1,0,0,0,0])</code></pre>
|
||||
<p>
|
||||
<code>data[1][1]</code>
|
||||
is a
|
||||
<code>28*28 == 784</code>
|
||||
length vector (mostly zeros due to the black background) and
|
||||
<code>data[1][2]</code>
|
||||
is its classification.
|
||||
</p>
|
||||
<p>
|
||||
Now we define our model, which will simply be a function from one to the other.
|
||||
</p>
|
||||
<pre><code class="language-julia">m = Chain(
|
||||
Input(784),
|
||||
Affine(128), relu,
|
||||
Affine( 64), relu,
|
||||
Affine( 10), softmax)
|
||||
|
||||
model = tf(m)</code></pre>
|
||||
<p>
|
||||
We can try this out on our data already:
|
||||
</p>
|
||||
<pre><code class="language-julia">julia> model(data[1][1])
|
||||
10-element Array{Float64,1}:
|
||||
0.10614
|
||||
0.0850447
|
||||
0.101474
|
||||
...</code></pre>
|
||||
<p>
|
||||
The model gives a probability of about 0.1 to each class – which is a way of saying, "I have no idea". This isn't too surprising as we haven't shown it any data yet. This is easy to fix:
|
||||
</p>
|
||||
<pre><code class="language-julia">Flux.train!(model, train, test, η = 1e-4)</code></pre>
|
||||
<p>
|
||||
The training step takes about 5 minutes (to make it faster we can do smarter things like batching). If you run this code in Juno, you'll see a progress meter, which you can hover over to see the remaining computation time.
|
||||
</p>
|
||||
<p>
|
||||
Towards the end of training, Flux will report that the model's accuracy is about 90%. We can try it on our data again:
|
||||
</p>
|
||||
<pre><code class="language-julia">10-element Array{Float32,1}:
|
||||
...
|
||||
5.11423f-7
|
||||
0.9354
|
||||
3.1033f-5
|
||||
0.000127077
|
||||
...</code></pre>
|
||||
<p>
|
||||
Notice the class at 93%, suggesting our model is very confident about this image. We can use
|
||||
<code>onecold</code>
|
||||
to compare the true and predicted classes:
|
||||
</p>
|
||||
<pre><code class="language-julia">julia> onecold(data[1][2], 0:9)
|
||||
5
|
||||
|
||||
julia> onecold(model(data[1][1]), 0:9)
|
||||
5</code></pre>
|
||||
<p>
|
||||
Success!
|
||||
</p>
|
||||
<footer>
|
||||
<hr/>
|
||||
<a class="previous" href="../apis/storage.html">
|
||||
<span class="direction">
|
||||
Previous
|
||||
</span>
|
||||
<span class="title">
|
||||
Storing Models
|
||||
</span>
|
||||
</a>
|
||||
<a class="next" href="char-rnn.html">
|
||||
<span class="direction">
|
||||
Next
|
||||
</span>
|
||||
<span class="title">
|
||||
Char RNN
|
||||
</span>
|
||||
</a>
|
||||
</footer>
|
||||
</article>
|
||||
</body>
|
||||
</html>
|
|
@ -0,0 +1,249 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="UTF-8"/>
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
|
||||
<title>
|
||||
Home · Flux
|
||||
</title>
|
||||
<script>
|
||||
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
|
||||
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
|
||||
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
|
||||
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
|
||||
|
||||
ga('create', 'UA-36890222-9', 'auto');
|
||||
ga('send', 'pageview');
|
||||
|
||||
</script>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.5.0/styles/default.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://fonts.googleapis.com/css?family=Lato|Ubuntu+Mono" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="assets/documenter.css" rel="stylesheet" type="text/css"/>
|
||||
<script>
|
||||
documenterBaseURL="."
|
||||
</script>
|
||||
<script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="assets/documenter.js"></script>
|
||||
<script src="../versions.js"></script>
|
||||
<link href="../flux.css" rel="stylesheet" type="text/css"/>
|
||||
</head>
|
||||
<body>
|
||||
<nav class="toc">
|
||||
<h1>
|
||||
Flux
|
||||
</h1>
|
||||
<form class="search" action="search.html">
|
||||
<select id="version-selector" onChange="window.location.href=this.value">
|
||||
<option value="#" selected="selected" disabled="disabled">
|
||||
Version
|
||||
</option>
|
||||
</select>
|
||||
<input id="search-query" name="q" type="text" placeholder="Search docs"/>
|
||||
</form>
|
||||
<ul>
|
||||
<li class="current">
|
||||
<a class="toctext" href="index.html">
|
||||
Home
|
||||
</a>
|
||||
<ul class="internal">
|
||||
<li>
|
||||
<a class="toctext" href="#Where-do-I-start?-1">
|
||||
Where do I start?
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="#Installation-1">
|
||||
Installation
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
Building Models
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="models/basics.html">
|
||||
Model Building Basics
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="models/templates.html">
|
||||
Model Templates
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="models/recurrent.html">
|
||||
Recurrence
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="models/debugging.html">
|
||||
Debugging
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
Other APIs
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="apis/batching.html">
|
||||
Batching
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="apis/backends.html">
|
||||
Backends
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="apis/storage.html">
|
||||
Storing Models
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
In Action
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="examples/logreg.html">
|
||||
Logistic Regression
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="examples/char-rnn.html">
|
||||
Char RNN
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="contributing.html">
|
||||
Contributing & Help
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="internals.html">
|
||||
Internals
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</nav>
|
||||
<article id="docs">
|
||||
<header>
|
||||
<nav>
|
||||
<ul>
|
||||
<li>
|
||||
<a href="index.html">
|
||||
Home
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/854a1e18865742c59b6db6c58e16fee6ef9ef8ce/docs/src/index.md">
|
||||
<span class="fa">
|
||||
|
||||
</span>
|
||||
Edit on GitHub
|
||||
</a>
|
||||
</nav>
|
||||
<hr/>
|
||||
</header>
|
||||
<h1>
|
||||
<a class="nav-anchor" id="Flux-1" href="#Flux-1">
|
||||
Flux
|
||||
</a>
|
||||
</h1>
|
||||
<p>
|
||||
Flux is a high-level interface for machine learning, implemented in Julia.
|
||||
</p>
|
||||
<p>
|
||||
Flux aims to be an intuitive and powerful notation, close to the mathematics, that provides advanced features like auto-unrolling and closures. Simple models are trivial, while the most complex architectures are tractable, taking orders of magnitude less code than in other frameworks. Meanwhile, the Flux compiler provides excellent error messages and tools for debugging when things go wrong.
|
||||
</p>
|
||||
<p>
|
||||
So what's the catch? Flux is at an early "working prototype" stage; many things work but the API is still in a state of... well, it might change. If you're interested to find out what works, read on!
|
||||
</p>
|
||||
<h2>
|
||||
<a class="nav-anchor" id="Where-do-I-start?-1" href="#Where-do-I-start?-1">
|
||||
Where do I start?
|
||||
</a>
|
||||
</h2>
|
||||
<p>
|
||||
The
|
||||
<a href="examples/logreg.html">
|
||||
examples
|
||||
</a>
|
||||
are the best way to get a feel for how Flux looks. This is a great way to start if you're a relative newbie to machine learning or neural networks; you should be able to get the examples running fairly easily.
|
||||
</p>
|
||||
<p>
|
||||
If you have more experience with ML, or you just don't want to see
|
||||
<em>
|
||||
those digits
|
||||
</em>
|
||||
again, check out the
|
||||
<a href="models/basics.html">
|
||||
model building guide
|
||||
</a>
|
||||
instead. The guide attempts to motivate Flux's programming model and approach with examples, but it also gets into advanced usage quickly; you don't need to memorise every detail to use Flux effectively.
|
||||
</p>
|
||||
<p>
|
||||
The sections on
|
||||
<a href="models/recurrent.html">
|
||||
Recurrence
|
||||
</a>
|
||||
,
|
||||
<a href="models/debugging.html">
|
||||
Debugging
|
||||
</a>
|
||||
and
|
||||
<a href="apis/batching.html">
|
||||
Batching
|
||||
</a>
|
||||
best illustrate what makes Flux unique.
|
||||
</p>
|
||||
<h2>
|
||||
<a class="nav-anchor" id="Installation-1" href="#Installation-1">
|
||||
Installation
|
||||
</a>
|
||||
</h2>
|
||||
<p>
|
||||
<em>
|
||||
... Charging Ion Capacitors ...
|
||||
</em>
|
||||
</p>
|
||||
<pre><code class="language-julia">Pkg.update()
|
||||
Pkg.add("Flux.jl")</code></pre>
|
||||
<p>
|
||||
You'll also need a backend to run real training, if you don't have one already. Choose from
|
||||
<a href="https://github.com/dmlc/MXNet.jl">
|
||||
MXNet
|
||||
</a>
|
||||
or
|
||||
<a href="https://github.com/malmaud/TensorFlow.jl">
|
||||
TensorFlow
|
||||
</a>
|
||||
(MXNet is the recommended option if you're not sure):
|
||||
</p>
|
||||
<pre><code class="language-julia">Pkg.add("MXNet") # or "TensorFlow"
|
||||
Pkg.test("Flux") # Make sure everything installed properly</code></pre>
|
||||
<footer>
|
||||
<hr/>
|
||||
<a class="next" href="models/basics.html">
|
||||
<span class="direction">
|
||||
Next
|
||||
</span>
|
||||
<span class="title">
|
||||
Model Building Basics
|
||||
</span>
|
||||
</a>
|
||||
</footer>
|
||||
</article>
|
||||
</body>
|
||||
</html>
|
|
@ -0,0 +1,169 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="UTF-8"/>
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
|
||||
<title>
|
||||
Internals · Flux
|
||||
</title>
|
||||
<script>
|
||||
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
|
||||
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
|
||||
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
|
||||
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
|
||||
|
||||
ga('create', 'UA-36890222-9', 'auto');
|
||||
ga('send', 'pageview');
|
||||
|
||||
</script>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.5.0/styles/default.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://fonts.googleapis.com/css?family=Lato|Ubuntu+Mono" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="assets/documenter.css" rel="stylesheet" type="text/css"/>
|
||||
<script>
|
||||
documenterBaseURL="."
|
||||
</script>
|
||||
<script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="assets/documenter.js"></script>
|
||||
<script src="../versions.js"></script>
|
||||
<link href="../flux.css" rel="stylesheet" type="text/css"/>
|
||||
</head>
|
||||
<body>
|
||||
<nav class="toc">
|
||||
<h1>
|
||||
Flux
|
||||
</h1>
|
||||
<form class="search" action="search.html">
|
||||
<select id="version-selector" onChange="window.location.href=this.value">
|
||||
<option value="#" selected="selected" disabled="disabled">
|
||||
Version
|
||||
</option>
|
||||
</select>
|
||||
<input id="search-query" name="q" type="text" placeholder="Search docs"/>
|
||||
</form>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="index.html">
|
||||
Home
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
Building Models
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="models/basics.html">
|
||||
Model Building Basics
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="models/templates.html">
|
||||
Model Templates
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="models/recurrent.html">
|
||||
Recurrence
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="models/debugging.html">
|
||||
Debugging
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
Other APIs
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="apis/batching.html">
|
||||
Batching
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="apis/backends.html">
|
||||
Backends
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="apis/storage.html">
|
||||
Storing Models
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
In Action
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="examples/logreg.html">
|
||||
Logistic Regression
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="examples/char-rnn.html">
|
||||
Char RNN
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="contributing.html">
|
||||
Contributing & Help
|
||||
</a>
|
||||
</li>
|
||||
<li class="current">
|
||||
<a class="toctext" href="internals.html">
|
||||
Internals
|
||||
</a>
|
||||
<ul class="internal"></ul>
|
||||
</li>
|
||||
</ul>
|
||||
</nav>
|
||||
<article id="docs">
|
||||
<header>
|
||||
<nav>
|
||||
<ul>
|
||||
<li>
|
||||
<a href="internals.html">
|
||||
Internals
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/854a1e18865742c59b6db6c58e16fee6ef9ef8ce/docs/src/internals.md">
|
||||
<span class="fa">
|
||||
|
||||
</span>
|
||||
Edit on GitHub
|
||||
</a>
|
||||
</nav>
|
||||
<hr/>
|
||||
</header>
|
||||
<h1>
|
||||
<a class="nav-anchor" id="Internals-1" href="#Internals-1">
|
||||
Internals
|
||||
</a>
|
||||
</h1>
|
||||
<p>
|
||||
[WIP]
|
||||
</p>
|
||||
<footer>
|
||||
<hr/>
|
||||
<a class="previous" href="contributing.html">
|
||||
<span class="direction">
|
||||
Previous
|
||||
</span>
|
||||
<span class="title">
|
||||
Contributing & Help
|
||||
</span>
|
||||
</a>
|
||||
</footer>
|
||||
</article>
|
||||
</body>
|
||||
</html>
|
|
@ -0,0 +1,324 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="UTF-8"/>
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
|
||||
<title>
|
||||
Model Building Basics · Flux
|
||||
</title>
|
||||
<script>
|
||||
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
|
||||
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
|
||||
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
|
||||
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
|
||||
|
||||
ga('create', 'UA-36890222-9', 'auto');
|
||||
ga('send', 'pageview');
|
||||
|
||||
</script>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.5.0/styles/default.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://fonts.googleapis.com/css?family=Lato|Ubuntu+Mono" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="../assets/documenter.css" rel="stylesheet" type="text/css"/>
|
||||
<script>
|
||||
documenterBaseURL=".."
|
||||
</script>
|
||||
<script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../assets/documenter.js"></script>
|
||||
<script src="../../versions.js"></script>
|
||||
<link href="../../flux.css" rel="stylesheet" type="text/css"/>
|
||||
</head>
|
||||
<body>
|
||||
<nav class="toc">
|
||||
<h1>
|
||||
Flux
|
||||
</h1>
|
||||
<form class="search" action="../search.html">
|
||||
<select id="version-selector" onChange="window.location.href=this.value">
|
||||
<option value="#" selected="selected" disabled="disabled">
|
||||
Version
|
||||
</option>
|
||||
</select>
|
||||
<input id="search-query" name="q" type="text" placeholder="Search docs"/>
|
||||
</form>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="../index.html">
|
||||
Home
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
Building Models
|
||||
</span>
|
||||
<ul>
|
||||
<li class="current">
|
||||
<a class="toctext" href="basics.html">
|
||||
Model Building Basics
|
||||
</a>
|
||||
<ul class="internal">
|
||||
<li>
|
||||
<a class="toctext" href="#The-Model-1">
|
||||
The Model
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="#Combining-Models-1">
|
||||
Combining Models
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="#A-Function-in-Model's-Clothing-1">
|
||||
A Function in Model's Clothing
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="templates.html">
|
||||
Model Templates
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="recurrent.html">
|
||||
Recurrence
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="debugging.html">
|
||||
Debugging
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
Other APIs
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="../apis/batching.html">
|
||||
Batching
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../apis/backends.html">
|
||||
Backends
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../apis/storage.html">
|
||||
Storing Models
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
In Action
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="../examples/logreg.html">
|
||||
Logistic Regression
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../examples/char-rnn.html">
|
||||
Char RNN
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../contributing.html">
|
||||
Contributing & Help
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../internals.html">
|
||||
Internals
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</nav>
|
||||
<article id="docs">
|
||||
<header>
|
||||
<nav>
|
||||
<ul>
|
||||
<li>
|
||||
Building Models
|
||||
</li>
|
||||
<li>
|
||||
<a href="basics.html">
|
||||
Model Building Basics
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/854a1e18865742c59b6db6c58e16fee6ef9ef8ce/docs/src/models/basics.md">
|
||||
<span class="fa">
|
||||
|
||||
</span>
|
||||
Edit on GitHub
|
||||
</a>
|
||||
</nav>
|
||||
<hr/>
|
||||
</header>
|
||||
<h1>
|
||||
<a class="nav-anchor" id="Model-Building-Basics-1" href="#Model-Building-Basics-1">
|
||||
Model Building Basics
|
||||
</a>
|
||||
</h1>
|
||||
<h2>
|
||||
<a class="nav-anchor" id="The-Model-1" href="#The-Model-1">
|
||||
The Model
|
||||
</a>
|
||||
</h2>
|
||||
<p>
|
||||
<em>
|
||||
... Initialising Photon Beams ...
|
||||
</em>
|
||||
</p>
|
||||
<p>
|
||||
The core concept in Flux is the
|
||||
<em>
|
||||
model
|
||||
</em>
|
||||
. A model (or "layer") is simply a function with parameters. For example, in plain Julia code, we could define the following function to represent a logistic regression (or simple neural network):
|
||||
</p>
|
||||
<pre><code class="language-julia">W = randn(3,5)
|
||||
b = randn(3)
|
||||
affine(x) = W * x + b
|
||||
|
||||
x1 = rand(5) # [0.581466,0.606507,0.981732,0.488618,0.415414]
|
||||
y1 = softmax(affine(x1)) # [0.32676,0.0974173,0.575823]</code></pre>
|
||||
<p>
|
||||
<code>affine</code>
|
||||
is simply a function which takes some vector
|
||||
<code>x1</code>
|
||||
and outputs a new one
|
||||
<code>y1</code>
|
||||
. For example,
|
||||
<code>x1</code>
|
||||
could be data from an image and
|
||||
<code>y1</code>
|
||||
could be predictions about the content of that image. However,
|
||||
<code>affine</code>
|
||||
isn't static. It has
|
||||
<em>
|
||||
parameters
|
||||
</em>
|
||||
|
||||
<code>W</code>
|
||||
and
|
||||
<code>b</code>
|
||||
, and if we tweak those parameters we'll tweak the result – hopefully to make the predictions more accurate.
|
||||
</p>
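<p>
To see this concretely, we can tweak <code>W</code> by hand and watch the output change for the same input (a toy illustration, reusing the definitions above):
</p>
<pre><code class="language-julia">before = affine(x1)
W = 2W               # tweak the parameters ...
affine(x1) == before # ... false: the same input now gives a different output</code></pre>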
|
||||
<p>
|
||||
This is all well and good, but we usually want to have more than one affine layer in our network; writing out the above definition to create new sets of parameters every time would quickly become tedious. For that reason, we want to use a
|
||||
<em>
|
||||
template
|
||||
</em>
|
||||
which creates these functions for us:
|
||||
</p>
|
||||
<pre><code class="language-julia">affine1 = Affine(5, 5)
|
||||
affine2 = Affine(5, 5)
|
||||
|
||||
softmax(affine1(x1)) # [0.167952, 0.186325, 0.176683, 0.238571, 0.23047]
|
||||
softmax(affine2(x1)) # [0.125361, 0.246448, 0.21966, 0.124596, 0.283935]</code></pre>
|
||||
<p>
|
||||
We just created two separate
|
||||
<code>Affine</code>
|
||||
layers, and each contains its own (randomly initialised) version of
|
||||
<code>W</code>
|
||||
and
|
||||
<code>b</code>
|
||||
, leading to a different result when called with our data. It's easy to define templates like
|
||||
<code>Affine</code>
|
||||
ourselves (see
|
||||
<a href="templates.html">
|
||||
templates
|
||||
</a>
|
||||
), but Flux provides
|
||||
<code>Affine</code>
|
||||
out of the box, so we'll use that for now.
|
||||
</p>
|
||||
<h2>
|
||||
<a class="nav-anchor" id="Combining-Models-1" href="#Combining-Models-1">
|
||||
Combining Models
|
||||
</a>
|
||||
</h2>
|
||||
<p>
|
||||
<em>
|
||||
... Inflating Graviton Zeppelins ...
|
||||
</em>
|
||||
</p>
|
||||
<p>
|
||||
A more complex model usually involves many basic layers like
|
||||
<code>affine</code>
|
||||
, where we use the output of one layer as the input to the next:
|
||||
</p>
|
||||
<pre><code class="language-julia">mymodel1(x) = softmax(affine2(σ(affine1(x))))
|
||||
mymodel1(x1) # [0.187935, 0.232237, 0.169824, 0.230589, 0.179414]</code></pre>
|
||||
<p>
|
||||
This syntax is again a little unwieldy for larger networks, so Flux provides another template of sorts to create the function for us:
|
||||
</p>
|
||||
<pre><code class="language-julia">mymodel2 = Chain(affine1, σ, affine2, softmax)
|
||||
mymodel2(x1) # [0.187935, 0.232237, 0.169824, 0.230589, 0.179414]</code></pre>
|
||||
<p>
|
||||
<code>mymodel2</code>
|
||||
is exactly equivalent to
|
||||
<code>mymodel1</code>
|
||||
because it simply calls the provided functions in sequence. We don't have to predefine the affine layers and can also write this as:
|
||||
</p>
|
||||
<pre><code class="language-julia">mymodel3 = Chain(
|
||||
Affine(5, 5), σ,
|
||||
Affine(5, 5), softmax)</code></pre>
|
||||
<p>
|
||||
You now know enough to take a look at the
|
||||
<a href="../examples/logreg.html">
|
||||
logistic regression
|
||||
</a>
|
||||
example, if you haven't already.
|
||||
</p>
|
||||
<h2>
|
||||
<a class="nav-anchor" id="A-Function-in-Model's-Clothing-1" href="#A-Function-in-Model's-Clothing-1">
|
||||
A Function in Model's Clothing
|
||||
</a>
|
||||
</h2>
|
||||
<p>
|
||||
<em>
|
||||
... Booting Dark Matter Transmogrifiers ...
|
||||
</em>
|
||||
</p>
|
||||
<p>
|
||||
We noted above that a "model" is a function with some number of trainable parameters. This goes both ways; a normal Julia function like
|
||||
<code>exp</code>
|
||||
is effectively a model with 0 parameters. Flux doesn't care, and anywhere that you use one, you can use the other. For example,
|
||||
<code>Chain</code>
|
||||
will happily work with regular functions:
|
||||
</p>
|
||||
<pre><code class="language-julia">foo = Chain(exp, sum, log)
|
||||
foo([1,2,3]) == log(sum(exp([1,2,3]))) # ≈ 3.408</code></pre>
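<p>
Since functions and models are interchangeable, we can also mix the two freely in a single chain. A small sketch (the anonymous scaling function here is our own illustration):
</p>
<pre><code class="language-julia">mixed = Chain(Affine(5, 5), x -> 2x, softmax)
mixed(rand(5)) # a length-5 vector that sums to 1</code></pre>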
|
||||
<footer>
|
||||
<hr/>
|
||||
<a class="previous" href="../index.html">
|
||||
<span class="direction">
|
||||
Previous
|
||||
</span>
|
||||
<span class="title">
|
||||
Home
|
||||
</span>
|
||||
</a>
|
||||
<a class="next" href="templates.html">
|
||||
<span class="direction">
|
||||
Next
|
||||
</span>
|
||||
<span class="title">
|
||||
Model Templates
|
||||
</span>
|
||||
</a>
|
||||
</footer>
|
||||
</article>
|
||||
</body>
|
||||
</html>
|
|
@ -0,0 +1,262 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="UTF-8"/>
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
|
||||
<title>
|
||||
Debugging · Flux
|
||||
</title>
|
||||
<script>
|
||||
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
|
||||
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
|
||||
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
|
||||
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
|
||||
|
||||
ga('create', 'UA-36890222-9', 'auto');
|
||||
ga('send', 'pageview');
|
||||
|
||||
</script>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.5.0/styles/default.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://fonts.googleapis.com/css?family=Lato|Ubuntu+Mono" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="../assets/documenter.css" rel="stylesheet" type="text/css"/>
|
||||
<script>
|
||||
documenterBaseURL=".."
|
||||
</script>
|
||||
<script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../assets/documenter.js"></script>
|
||||
<script src="../../versions.js"></script>
|
||||
<link href="../../flux.css" rel="stylesheet" type="text/css"/>
|
||||
</head>
|
||||
<body>
|
||||
<nav class="toc">
|
||||
<h1>
|
||||
Flux
|
||||
</h1>
|
||||
<form class="search" action="../search.html">
|
||||
<select id="version-selector" onChange="window.location.href=this.value">
|
||||
<option value="#" selected="selected" disabled="disabled">
|
||||
Version
|
||||
</option>
|
||||
</select>
|
||||
<input id="search-query" name="q" type="text" placeholder="Search docs"/>
|
||||
</form>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="../index.html">
|
||||
Home
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
Building Models
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="basics.html">
|
||||
Model Building Basics
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="templates.html">
|
||||
Model Templates
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="recurrent.html">
|
||||
Recurrence
|
||||
</a>
|
||||
</li>
|
||||
<li class="current">
|
||||
<a class="toctext" href="debugging.html">
|
||||
Debugging
|
||||
</a>
|
||||
<ul class="internal"></ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
Other APIs
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="../apis/batching.html">
|
||||
Batching
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../apis/backends.html">
|
||||
Backends
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../apis/storage.html">
|
||||
Storing Models
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
In Action
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="../examples/logreg.html">
|
||||
Logistic Regression
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../examples/char-rnn.html">
|
||||
Char RNN
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../contributing.html">
|
||||
Contributing & Help
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../internals.html">
|
||||
Internals
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</nav>
|
||||
<article id="docs">
|
||||
<header>
|
||||
<nav>
|
||||
<ul>
|
||||
<li>
|
||||
Building Models
|
||||
</li>
|
||||
<li>
|
||||
<a href="debugging.html">
|
||||
Debugging
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/854a1e18865742c59b6db6c58e16fee6ef9ef8ce/docs/src/models/debugging.md">
|
||||
<span class="fa">
|
||||
|
||||
</span>
|
||||
Edit on GitHub
|
||||
</a>
|
||||
</nav>
|
||||
<hr/>
|
||||
</header>
|
||||
<h1>
|
||||
<a class="nav-anchor" id="Debugging-Models-1" href="#Debugging-Models-1">
|
||||
Debugging Models
|
||||
</a>
|
||||
</h1>
|
||||
<p>
|
||||
Let's take our two-layer perceptron as an example again, running on MXNet:
|
||||
</p>
|
||||
<pre><code class="language-julia">@net type TLP
|
||||
first
|
||||
second
|
||||
function (x)
|
||||
l1 = σ(first(x))
|
||||
l2 = softmax(second(l1))
|
||||
end
|
||||
end
|
||||
|
||||
model = TLP(Affine(10, 20), Affine(21, 15))
|
||||
|
||||
mxmodel = mxnet(model)
|
||||
|
||||
mxmodel(rand(10))</code></pre>
|
||||
<p>
|
||||
Unfortunately, this model has a (fairly obvious) typo, which means that the code above won't run. Instead we get an error message:
|
||||
</p>
|
||||
<pre><code class="language-julia">Error in operator dot2: [21:28:21] src/operator/tensor/./matrix_op-inl.h:460:
|
||||
Check failed: lshape[1] == rshape[0] (20 vs. 21) dot shape error: (1,20) X (21,15)
|
||||
Flux.Affine at affine.jl:8
|
||||
TLP at basic.jl:6
|
||||
(::Flux.MX.Model)(::Flux.Batch{Array{Float64,1},Array{Float64,2}}) at model.jl:105
|
||||
(::Flux.MX.Model)(::Array{Float64,1}) at model.jl:107</code></pre>
|
||||
<p>
|
||||
Most frameworks would only give the error message here – not so helpful if you have thousands of nodes in your computational graph. However, Flux is able to give good error reports
|
||||
<em>
|
||||
even when no Julia code has been run
|
||||
</em>
|
||||
, e.g. when running on a backend like MXNet. This enables us to pinpoint the source of the error very quickly even in a large model.
|
||||
</p>
|
||||
<p>
|
||||
In this case, we can immediately see that the error occurred within an
|
||||
<code>Affine</code>
|
||||
layer. There are two such layers, but this one was called from the second line of
|
||||
<code>TLP</code>
|
||||
, so it must be the second
|
||||
<code>Affine</code>
|
||||
layer we defined. The layer expected an input of length 21 but got 20 instead.
|
||||
</p>
|
||||
<p>
|
||||
Of course, often a stack trace isn't enough to figure out the source of an error. Another option is to simply step through the execution of the model using Gallium. While handy, however, stepping isn't always the best way to get a "bird's eye view" of the code. For that, Flux provides a macro called
|
||||
<code>@shapes</code>
|
||||
:
|
||||
</p>
|
||||
<pre><code class="language-julia">julia> @shapes model(rand(5,10))
|
||||
|
||||
# /Users/mike/test.jl, line 18:
|
||||
gull = σ(Affine(10, 20)(Input()[1]::(5,10))::(5,20))::(5,20)
|
||||
# /Users/mike/.julia/v0.6/Flux/src/layers/affine.jl, line 8:
|
||||
lobster = gull * _::(21,15) + _::(1,15)
|
||||
# /Users/mike/test.jl, line 19:
|
||||
raven = softmax(lobster)</code></pre>
|
||||
<p>
|
||||
This is a lot like Julia's own
|
||||
<code>code_warntype</code>
|
||||
; but instead of annotating expressions with types, we display their shapes. As a lowered form it has some quirks; input arguments are represented by
|
||||
<code>Input()[N]</code>
|
||||
and parameters by an underscore.
|
||||
</p>
|
||||
<p>
|
||||
This makes the problem fairly obvious. We tried to multiply the output of the first layer
|
||||
<code>(5, 20)</code>
|
||||
by a parameter
|
||||
<code>(21, 15)</code>
|
||||
; the inner dimensions should have been equal.
|
||||
</p>
|
||||
<p>
|
||||
Notice that while the first
|
||||
<code>Affine</code>
|
||||
layer is displayed as-is, the second was inlined and we see a reference to where the
|
||||
<code>W * x + b</code>
|
||||
line was defined in Flux's source code. In this way Flux makes it easy to drill down into problem areas, without showing you the full graph of thousands of nodes at once.
|
||||
</p>
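<p>
In other words, the fix is to make the second layer's input size match the first layer's output size, replacing <code>Affine(21, 15)</code> with <code>Affine(20, 15)</code>:
</p>
<pre><code class="language-julia">model = TLP(Affine(10, 20), Affine(20, 15))</code></pre>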
|
||||
<p>
|
||||
With the typo fixed, the output of
|
||||
<code>@shapes</code>
|
||||
looks as follows:
|
||||
</p>
|
||||
<pre><code class="language-julia"># /Users/mike/test.jl, line 18:
|
||||
opossum = σ(Affine(10, 20)(Input()[1]::(5,10))::(5,20))::(5,20)
|
||||
# /Users/mike/test.jl, line 19:
|
||||
wren = softmax(Affine(20, 15)(opossum)::(5,15))::(5,15)</code></pre>
|
||||
<footer>
|
||||
<hr/>
|
||||
<a class="previous" href="recurrent.html">
|
||||
<span class="direction">
|
||||
Previous
|
||||
</span>
|
||||
<span class="title">
|
||||
Recurrence
|
||||
</span>
|
||||
</a>
|
||||
<a class="next" href="../apis/batching.html">
|
||||
<span class="direction">
|
||||
Next
|
||||
</span>
|
||||
<span class="title">
|
||||
Batching
|
||||
</span>
|
||||
</a>
|
||||
</footer>
|
||||
</article>
|
||||
</body>
|
||||
</html>
|
|
@ -0,0 +1,324 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="UTF-8"/>
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
|
||||
<title>
|
||||
Recurrence · Flux
|
||||
</title>
|
||||
<script>
|
||||
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
|
||||
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
|
||||
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
|
||||
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
|
||||
|
||||
ga('create', 'UA-36890222-9', 'auto');
|
||||
ga('send', 'pageview');
|
||||
|
||||
</script>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.5.0/styles/default.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://fonts.googleapis.com/css?family=Lato|Ubuntu+Mono" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="../assets/documenter.css" rel="stylesheet" type="text/css"/>
|
||||
<script>
|
||||
documenterBaseURL=".."
|
||||
</script>
|
||||
<script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../assets/documenter.js"></script>
|
||||
<script src="../../versions.js"></script>
|
||||
<link href="../../flux.css" rel="stylesheet" type="text/css"/>
|
||||
</head>
|
||||
<body>
|
||||
<nav class="toc">
|
||||
<h1>
|
||||
Flux
|
||||
</h1>
|
||||
<form class="search" action="../search.html">
|
||||
<select id="version-selector" onChange="window.location.href=this.value">
|
||||
<option value="#" selected="selected" disabled="disabled">
|
||||
Version
|
||||
</option>
|
||||
</select>
|
||||
<input id="search-query" name="q" type="text" placeholder="Search docs"/>
|
||||
</form>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="../index.html">
|
||||
Home
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
Building Models
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="basics.html">
|
||||
Model Building Basics
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="templates.html">
|
||||
Model Templates
|
||||
</a>
|
||||
</li>
|
||||
<li class="current">
|
||||
<a class="toctext" href="recurrent.html">
|
||||
Recurrence
|
||||
</a>
|
||||
<ul class="internal"></ul>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="debugging.html">
|
||||
Debugging
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
Other APIs
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="../apis/batching.html">
|
||||
Batching
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../apis/backends.html">
|
||||
Backends
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../apis/storage.html">
|
||||
Storing Models
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
In Action
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="../examples/logreg.html">
|
||||
Logistic Regression
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../examples/char-rnn.html">
|
||||
Char RNN
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../contributing.html">
|
||||
Contributing & Help
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../internals.html">
|
||||
Internals
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</nav>
|
||||
<article id="docs">
|
||||
<header>
|
||||
<nav>
|
||||
<ul>
|
||||
<li>
|
||||
Building Models
|
||||
</li>
|
||||
<li>
|
||||
<a href="recurrent.html">
|
||||
Recurrence
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/854a1e18865742c59b6db6c58e16fee6ef9ef8ce/docs/src/models/recurrent.md">
|
||||
<span class="fa">
|
||||
|
||||
</span>
|
||||
Edit on GitHub
|
||||
</a>
|
||||
</nav>
|
||||
<hr/>
|
||||
</header>
|
||||
<h1>
|
||||
<a class="nav-anchor" id="Recurrent-Models-1" href="#Recurrent-Models-1">
|
||||
Recurrent Models
|
||||
</a>
|
||||
</h1>
|
||||
<p>
|
||||
<a href="https://en.wikipedia.org/wiki/Recurrent_neural_network">
|
||||
Recurrence
|
||||
</a>
|
||||
is a first-class feature in Flux and recurrent models are very easy to build and use. Recurrences are often illustrated as cycles or self-dependencies in the graph; they can also be thought of as a hidden output from / input to the network. For example, for a sequence of inputs
|
||||
<code>x1, x2, x3 ...</code>
|
||||
we produce predictions as follows:
|
||||
</p>
|
||||
<pre><code class="language-julia">y1 = f(W, x1) # `f` is the model, `W` represents the parameters
|
||||
y2 = f(W, x2)
|
||||
y3 = f(W, x3)
|
||||
...</code></pre>
|
||||
<p>
|
||||
Each evaluation is independent and the prediction made for a given input will always be the same. That makes a lot of sense for, say, MNIST images, but less sense when predicting a sequence. For that case we introduce the hidden state:
|
||||
</p>
|
||||
<pre><code class="language-julia">y1, s = f(W, x1, s)
|
||||
y2, s = f(W, x2, s)
|
||||
y3, s = f(W, x3, s)
|
||||
...</code></pre>
|
||||
<p>
|
||||
The state
|
||||
<code>s</code>
|
||||
allows the prediction to depend not only on the current input
|
||||
<code>x</code>
|
||||
but also on the history of past inputs.
|
||||
</p>
|
||||
<p>
|
||||
The simplest recurrent network looks as follows in Flux, and it should be familiar if you've seen the equations defining an RNN before:
|
||||
</p>
|
||||
<pre><code class="language-julia">@net type Recurrent
|
||||
Wxy; Wyy; by
|
||||
y
|
||||
function (x)
|
||||
y = tanh( x * Wxy + y{-1} * Wyy + by )
|
||||
end
|
||||
end</code></pre>
|
||||
<p>
|
||||
The only difference from a regular feed-forward layer is that we create a variable
|
||||
<code>y</code>
|
||||
which is defined as depending on itself. The
|
||||
<code>y{-1}</code>
|
||||
syntax means "take the value of
|
||||
<code>y</code>
|
||||
from the previous run of the network".
|
||||
</p>
|
||||
<p>
|
||||
Using recurrent layers is straightforward, and no different from using feed-forward ones in terms of the
|
||||
<code>Chain</code>
|
||||
function and so on. For example:
|
||||
</p>
|
||||
<pre><code class="language-julia">model = Chain(
|
||||
Affine(784, 20), σ,
|
||||
Recurrent(20, 30),
|
||||
Recurrent(30, 15))</code></pre>
|
||||
<p>
|
||||
Before using the model we need to unroll it. This happens with the
|
||||
<code>unroll</code>
|
||||
function:
|
||||
</p>
|
||||
<pre><code class="language-julia">unroll(model, 20)</code></pre>
|
||||
<p>
|
||||
This call creates an unrolled, feed-forward version of the model which accepts N (= 20) inputs and generates N predictions at a time. Essentially, the model is replicated N times and Flux ties the hidden outputs
|
||||
<code>y</code>
|
||||
to hidden inputs.
|
||||
</p>
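<p>
Conceptually, unrolling our simple <code>Recurrent</code> layer over three steps produces something like the following feed-forward sketch (for illustration only; the code Flux actually generates differs):
</p>
<pre><code class="language-julia">function unrolled3(x1, x2, x3, y0)
  y1 = tanh( x1 * Wxy + y0 * Wyy + by )
  y2 = tanh( x2 * Wxy + y1 * Wyy + by )
  y3 = tanh( x3 * Wxy + y2 * Wyy + by )
  (y1, y2, y3)
end</code></pre>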
|
||||
<p>
|
||||
Here's a more complex recurrent layer, an LSTM, and again it should be familiar if you've seen the
|
||||
<a href="https://colah.github.io/posts/2015-08-Understanding-LSTMs/">
|
||||
equations
|
||||
</a>
|
||||
:
|
||||
</p>
|
||||
<pre><code class="language-julia">@net type LSTM
|
||||
Wxf; Wyf; bf
|
||||
Wxi; Wyi; bi
|
||||
Wxo; Wyo; bo
|
||||
Wxc; Wyc; bc
|
||||
y; state
|
||||
function (x)
|
||||
# Gates
|
||||
forget = σ( x * Wxf + y{-1} * Wyf + bf )
|
||||
input = σ( x * Wxi + y{-1} * Wyi + bi )
|
||||
output = σ( x * Wxo + y{-1} * Wyo + bo )
|
||||
# State update and output
|
||||
state′ = tanh( x * Wxc + y{-1} * Wyc + bc )
|
||||
state = forget .* state{-1} + input .* state′
|
||||
y = output .* tanh(state)
|
||||
end
|
||||
end</code></pre>
|
||||
<p>
|
||||
The only unfamiliar part is that we have to define all of the parameters of the LSTM upfront, which adds a few lines at the beginning.
|
||||
</p>
|
||||
<p>
|
||||
Flux's very mathematical notation generalises well to handling more complex models. For example,
|
||||
<a href="https://arxiv.org/abs/1409.0473">
|
||||
this neural translation model with alignment
|
||||
</a>
|
||||
can be fairly straightforwardly, and recognisably, translated from the paper into Flux code:
|
||||
</p>
|
||||
<pre><code class="language-julia"># A recurrent model which takes a token and returns a context-dependent
|
||||
# annotation.
|
||||
|
||||
@net type Encoder
|
||||
forward
|
||||
backward
|
||||
token -> hcat(forward(token), backward(token))
|
||||
end
|
||||
|
||||
Encoder(in::Integer, out::Integer) =
|
||||
Encoder(LSTM(in, out÷2), flip(LSTM(in, out÷2)))
|
||||
|
||||
# A recurrent model which takes a sequence of annotations, attends, and returns
|
||||
# a predicted output token.
|
||||
|
||||
@net type Decoder
|
||||
attend
|
||||
recur
|
||||
state; y; N
|
||||
function (anns)
|
||||
energies = map(ann -> exp(attend(hcat(state{-1}, ann))[1]), seq(anns, N))
|
||||
weights = energies./sum(energies)
|
||||
ctx = sum(map((α, ann) -> α .* ann, weights, anns))
|
||||
(_, state), y = recur((state{-1},y{-1}), ctx)
|
||||
y
|
||||
end
|
||||
end
|
||||
|
||||
Decoder(in::Integer, out::Integer; N = 1) =
|
||||
Decoder(Affine(in+out, 1),
|
||||
unroll1(LSTM(in, out)),
|
||||
param(zeros(1, out)), param(zeros(1, out)), N)
|
||||
|
||||
# The model
|
||||
|
||||
Nalpha = 5 # The size of the input token vector
|
||||
Nphrase = 7 # The length of (padded) phrases
|
||||
Nhidden = 12 # The size of the hidden state
|
||||
|
||||
encode = Encoder(Nalpha, Nhidden)
|
||||
decode = Chain(Decoder(Nhidden, Nhidden, N = Nphrase), Affine(Nhidden, Nalpha), softmax)
|
||||
|
||||
model = Chain(
|
||||
unroll(encode, Nphrase, stateful = false),
|
||||
unroll(decode, Nphrase, stateful = false, seq = false))</code></pre>
|
||||
<p>
|
||||
Note that this model exercises some of the more advanced parts of the compiler and isn't stable for general use yet.
|
||||
</p>
|
||||
<footer>
|
||||
<hr/>
|
||||
<a class="previous" href="templates.html">
|
||||
<span class="direction">
|
||||
Previous
|
||||
</span>
|
||||
<span class="title">
|
||||
Model Templates
|
||||
</span>
|
||||
</a>
|
||||
<a class="next" href="debugging.html">
|
||||
<span class="direction">
|
||||
Next
|
||||
</span>
|
||||
<span class="title">
|
||||
Debugging
|
||||
</span>
|
||||
</a>
|
||||
</footer>
|
||||
</article>
|
||||
</body>
|
||||
</html>
|
|
@ -0,0 +1,370 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="UTF-8"/>
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
|
||||
<title>
|
||||
Model Templates · Flux
|
||||
</title>
|
||||
<script>
|
||||
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
|
||||
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
|
||||
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
|
||||
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
|
||||
|
||||
ga('create', 'UA-36890222-9', 'auto');
|
||||
ga('send', 'pageview');
|
||||
|
||||
</script>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.5.0/styles/default.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://fonts.googleapis.com/css?family=Lato|Ubuntu+Mono" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="../assets/documenter.css" rel="stylesheet" type="text/css"/>
|
||||
<script>
|
||||
documenterBaseURL=".."
|
||||
</script>
|
||||
<script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="../assets/documenter.js"></script>
|
||||
<script src="../../versions.js"></script>
|
||||
<link href="../../flux.css" rel="stylesheet" type="text/css"/>
|
||||
</head>
|
||||
<body>
|
||||
<nav class="toc">
|
||||
<h1>
|
||||
Flux
|
||||
</h1>
|
||||
<form class="search" action="../search.html">
|
||||
<select id="version-selector" onChange="window.location.href=this.value">
|
||||
<option value="#" selected="selected" disabled="disabled">
|
||||
Version
|
||||
</option>
|
||||
</select>
|
||||
<input id="search-query" name="q" type="text" placeholder="Search docs"/>
|
||||
</form>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="../index.html">
|
||||
Home
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
Building Models
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="basics.html">
|
||||
Model Building Basics
|
||||
</a>
|
||||
</li>
|
||||
<li class="current">
|
||||
<a class="toctext" href="templates.html">
|
||||
Model Templates
|
||||
</a>
|
||||
<ul class="internal">
|
||||
<li>
|
||||
<a class="toctext" href="#Models-in-templates-1">
|
||||
Models in templates
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="#Constructors-1">
|
||||
Constructors
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="#Supported-syntax-1">
|
||||
Supported syntax
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="recurrent.html">
|
||||
Recurrence
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="debugging.html">
|
||||
Debugging
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
Other APIs
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="../apis/batching.html">
|
||||
Batching
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../apis/backends.html">
|
||||
Backends
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../apis/storage.html">
|
||||
Storing Models
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
In Action
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="../examples/logreg.html">
|
||||
Logistic Regression
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../examples/char-rnn.html">
|
||||
Char RNN
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../contributing.html">
|
||||
Contributing & Help
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="../internals.html">
|
||||
Internals
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</nav>
|
||||
<article id="docs">
|
||||
<header>
|
||||
<nav>
|
||||
<ul>
|
||||
<li>
|
||||
Building Models
|
||||
</li>
|
||||
<li>
|
||||
<a href="templates.html">
|
||||
Model Templates
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
<a class="edit-page" href="https://github.com/MikeInnes/Flux.jl/tree/854a1e18865742c59b6db6c58e16fee6ef9ef8ce/docs/src/models/templates.md">
|
||||
<span class="fa">
|
||||
|
||||
</span>
|
||||
Edit on GitHub
|
||||
</a>
|
||||
</nav>
|
||||
<hr/>
|
||||
</header>
|
||||
<h1>
|
||||
<a class="nav-anchor" id="Model-Templates-1" href="#Model-Templates-1">
|
||||
Model Templates
|
||||
</a>
|
||||
</h1>
|
||||
<p>
|
||||
<em>
|
||||
... Calculating Tax Expenses ...
|
||||
</em>
|
||||
</p>
|
||||
<p>
|
||||
So how does the
|
||||
<code>Affine</code>
|
||||
template work? We don't want to duplicate the code above whenever we need more than one affine layer:
|
||||
</p>
|
||||
<pre><code class="language-julia">W₁, b₁ = randn(...)
|
||||
affine₁(x) = W₁*x + b₁
|
||||
W₂, b₂ = randn(...)
|
||||
affine₂(x) = W₂*x + b₂
|
||||
model = Chain(affine₁, affine₂)</code></pre>
|
||||
<p>
|
||||
Here's one way we could solve this: just keep the parameters in a Julia type, and define how that type acts as a function:
|
||||
</p>
|
||||
<pre><code class="language-julia">type MyAffine
|
||||
W
|
||||
b
|
||||
end
|
||||
|
||||
# Use the `MyAffine` layer as a model
|
||||
(l::MyAffine)(x) = l.W * x + l.b
|
||||
|
||||
# Convenience constructor
|
||||
MyAffine(in::Integer, out::Integer) =
|
||||
MyAffine(randn(out, in), randn(out))
|
||||
|
||||
model = Chain(MyAffine(5, 5), MyAffine(5, 5))
|
||||
|
||||
model(x1) # [-1.54458,0.492025,0.88687,1.93834,-4.70062]</code></pre>
|
||||
<p>
|
||||
This is much better: we can now make as many affine layers as we want. This is a very common pattern, so to make it more convenient we can use the
|
||||
<code>@net</code>
|
||||
macro:
|
||||
</p>
|
||||
<pre><code class="language-julia">@net type MyAffine
|
||||
W
|
||||
b
|
||||
x -> x * W + b
|
||||
end</code></pre>
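<p>
As a quick check (a hypothetical snippet, assuming the <code>MyAffine</code> definition above), the <code>@net</code> version is constructed and called just like the plain type; note that <code>x</code> is a row vector here, since the forward function computes <code>x * W</code>:
</p>
<pre><code class="language-julia">a = MyAffine(randn(5, 5), randn(1, 5))
a(randn(1, 5)) # a 1×5 row of outputs</code></pre>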
|
||||
<p>
|
||||
The function provided,
|
||||
<code>x -> x * W + b</code>
|
||||
, will be used when
|
||||
<code>MyAffine</code>
|
||||
is used as a model; it's just a shorter way of defining the
|
||||
<code>(::MyAffine)(x)</code>
|
||||
method above. (You may notice that
|
||||
<code>W</code>
|
||||
and
|
||||
<code>x</code>
|
||||
have swapped order in the model; this is due to the way batching works, which will be covered in more detail later on.)
|
||||
</p>
|
||||
<p>
|
||||
However,
|
||||
<code>@net</code>
|
||||
does not simply save us some keystrokes; it's the secret sauce that makes everything else in Flux go. For example, it analyses the code for the forward function so that it can differentiate it or convert it to a TensorFlow graph.
|
||||
</p>
|
||||
<p>
|
||||
The above code is almost exactly how
|
||||
<code>Affine</code>
|
||||
is defined in Flux itself! There's no difference between "library-level" and "user-level" models, so making your code reusable doesn't involve a lot of extra complexity. Moreover, much more complex models than
|
||||
<code>Affine</code>
|
||||
are equally simple to define.
|
||||
</p>
|
||||
<h2>
|
||||
<a class="nav-anchor" id="Models-in-templates-1" href="#Models-in-templates-1">
|
||||
Models in templates
|
||||
</a>
|
||||
</h2>
|
||||
<p>
|
||||
<code>@net</code>
|
||||
models can contain sub-models as well as just array parameters:
|
||||
</p>
|
||||
<pre><code class="language-julia">@net type TLP
|
||||
first
|
||||
second
|
||||
function (x)
|
||||
l1 = σ(first(x))
|
||||
l2 = softmax(second(l1))
|
||||
end
|
||||
end</code></pre>
|
||||
<p>
|
||||
Just as above, this is roughly equivalent to writing:
|
||||
</p>
|
||||
<pre><code class="language-julia">type TLP
|
||||
first
|
||||
second
|
||||
end
|
||||
|
||||
function (self::TLP)(x)
|
||||
l1 = σ(self.first(x))
|
||||
l2 = softmax(self.second(l1))
|
||||
end</code></pre>
|
||||
<p>
|
||||
Clearly, the
|
||||
<code>first</code>
|
||||
and
|
||||
<code>second</code>
|
||||
parameters are not arrays here, but should be models themselves, and produce a result when called with an input array
|
||||
<code>x</code>
|
||||
. The
|
||||
<code>Affine</code>
|
||||
layer fits the bill, so we can instantiate
|
||||
<code>TLP</code>
|
||||
with two of them:
|
||||
</p>
|
||||
<pre><code class="language-julia">model = TLP(Affine(10, 20),
|
||||
Affine(20, 15))
|
||||
x1 = rand(10)
|
||||
model(x1) # [0.057852,0.0409741,0.0609625,0.0575354 ...</code></pre>
|
||||
<p>
|
||||
You may recognise this as being equivalent to
|
||||
</p>
|
||||
<pre><code class="language-julia">Chain(
|
||||
    Affine(10, 20), σ,
|
||||
    Affine(20, 15), softmax)</code></pre>
|
||||
<p>
|
||||
given that it's just a sequence of calls. For simple networks
|
||||
<code>Chain</code>
|
||||
is completely fine, although the
|
||||
<code>@net</code>
|
||||
version is more powerful as we can (for example) reuse the output
|
||||
<code>l1</code>
|
||||
more than once.
|
||||
</p>
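<p>
For example, reusing <code>l1</code> could express a skip connection, which a plain <code>Chain</code> cannot. This is a hypothetical sketch, assuming the layer dimensions are chosen so that the sum is well-defined:
</p>
<pre><code class="language-julia">@net type SkipTLP
  first
  second
  function (x)
    l1 = σ(first(x))
    l2 = second(l1) + l1 # `l1` feeds both `second` and the output
  end
end</code></pre>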
|
||||
<h2>
|
||||
<a class="nav-anchor" id="Constructors-1" href="#Constructors-1">
|
||||
Constructors
|
||||
</a>
|
||||
</h2>
|
||||
<p>
|
||||
<code>Affine</code>
|
||||
has two array parameters,
|
||||
<code>W</code>
|
||||
and
|
||||
<code>b</code>
|
||||
. Just like any other Julia type, it's easy to instantiate an
|
||||
<code>Affine</code>
|
||||
layer with parameters of our choosing:
|
||||
</p>
|
||||
<pre><code class="language-julia">a = Affine(rand(10, 20), rand(20))</code></pre>
|
||||
<p>
|
||||
However, for convenience and to avoid errors, we'd probably rather specify the input and output dimension instead:
|
||||
</p>
|
||||
<pre><code class="language-julia">a = Affine(10, 20)</code></pre>
|
||||
<p>
|
||||
This is easy to implement using the usual Julia syntax for constructors:
|
||||
</p>
|
||||
<pre><code class="language-julia">Affine(in::Integer, out::Integer) =
|
||||
Affine(randn(in, out), randn(1, out))</code></pre>
|
||||
<p>
|
||||
In practice, these constructors tend to take the parameter initialisation function as an argument so that it's more easily customisable, and use
|
||||
<code>Flux.initn</code>
|
||||
by default (which is equivalent to
|
||||
<code>randn(...)/100</code>
|
||||
). So
|
||||
<code>Affine</code>
|
||||
's constructor really looks like this:
|
||||
</p>
|
||||
<pre><code class="language-julia">Affine(in::Integer, out::Integer; init = initn) =
|
||||
Affine(init(in, out), init(1, out))</code></pre>
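<p>
This makes the initialisation easy to swap out; for instance (a hypothetical usage, assuming any function callable as <code>init(dims...)</code>):
</p>
<pre><code class="language-julia">a = Affine(10, 20) # default: small random weights via `initn`
b = Affine(10, 20, init = randn) # standard-normal weights instead</code></pre>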
|
||||
<h2>
|
||||
<a class="nav-anchor" id="Supported-syntax-1" href="#Supported-syntax-1">
|
||||
Supported syntax
|
||||
</a>
|
||||
</h2>
|
||||
<p>
|
||||
The syntax used to define a forward pass like
|
||||
<code>x -> x*W + b</code>
|
||||
behaves exactly like Julia code for the most part. However, it's important to remember that it's defining a dataflow graph, not a general Julia expression. In practice this means that anything side-effectful, or things like control flow and
|
||||
<code>println</code>
|
||||
s, won't work as expected. In future we'll continue to expand support for Julia syntax and features.
|
||||
</p>
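<p>
Concretely, a forward function should be a pure dataflow expression. The first definition below fits that model, while the second (a hypothetical counter-example) relies on a side effect and will not behave as expected:
</p>
<pre><code class="language-julia"># Fine: pure dataflow over the parameters
x -> σ(x * W + b)

# Not supported: `println` is a side effect, not part of the graph
x -> (println(size(x)); x * W + b)</code></pre>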
|
||||
<footer>
|
||||
<hr/>
|
||||
<a class="previous" href="basics.html">
|
||||
<span class="direction">
|
||||
Previous
|
||||
</span>
|
||||
<span class="title">
|
||||
Model Building Basics
|
||||
</span>
|
||||
</a>
|
||||
<a class="next" href="recurrent.html">
|
||||
<span class="direction">
|
||||
Next
|
||||
</span>
|
||||
<span class="title">
|
||||
Recurrence
|
||||
</span>
|
||||
</a>
|
||||
</footer>
|
||||
</article>
|
||||
</body>
|
||||
</html>
|
|
@ -0,0 +1,153 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="UTF-8"/>
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
|
||||
<title>
|
||||
Search · Flux
|
||||
</title>
|
||||
<script>
|
||||
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
|
||||
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
|
||||
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
|
||||
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
|
||||
|
||||
ga('create', 'UA-36890222-9', 'auto');
|
||||
ga('send', 'pageview');
|
||||
|
||||
</script>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/normalize/4.2.0/normalize.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.5.0/styles/default.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://fonts.googleapis.com/css?family=Lato|Ubuntu+Mono" rel="stylesheet" type="text/css"/>
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css"/>
|
||||
<link href="assets/documenter.css" rel="stylesheet" type="text/css"/>
|
||||
<script>
|
||||
documenterBaseURL="."
|
||||
</script>
|
||||
<script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js" data-main="assets/documenter.js"></script>
|
||||
<script src="../versions.js"></script>
|
||||
<link href="../flux.css" rel="stylesheet" type="text/css"/>
|
||||
</head>
|
||||
<body>
|
||||
<nav class="toc">
|
||||
<h1>
|
||||
Flux
|
||||
</h1>
|
||||
<form class="search" action="search.html">
|
||||
<select id="version-selector" onChange="window.location.href=this.value">
|
||||
<option value="#" selected="selected" disabled="disabled">
|
||||
Version
|
||||
</option>
|
||||
</select>
|
||||
<input id="search-query" name="q" type="text" placeholder="Search docs"/>
|
||||
</form>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="index.html">
|
||||
Home
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
Building Models
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="models/basics.html">
|
||||
Model Building Basics
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="models/templates.html">
|
||||
Model Templates
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="models/recurrent.html">
|
||||
Recurrence
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="models/debugging.html">
|
||||
Debugging
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
Other APIs
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="apis/batching.html">
|
||||
Batching
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="apis/backends.html">
|
||||
Backends
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="apis/storage.html">
|
||||
Storing Models
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<span class="toctext">
|
||||
In Action
|
||||
</span>
|
||||
<ul>
|
||||
<li>
|
||||
<a class="toctext" href="examples/logreg.html">
|
||||
Logistic Regression
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="examples/char-rnn.html">
|
||||
Char RNN
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="contributing.html">
|
||||
Contributing & Help
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a class="toctext" href="internals.html">
|
||||
Internals
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</nav>
|
||||
<article>
|
||||
<header>
|
||||
<nav>
|
||||
<ul>
|
||||
<li>
|
||||
Search
|
||||
</li>
|
||||
</ul>
|
||||
</nav>
|
||||
<hr/>
|
||||
</header>
|
||||
<h1>
|
||||
Search
|
||||
</h1>
|
||||
<p id="search-info">
|
||||
Number of results:
|
||||
<span id="search-results-number">
|
||||
loading...
|
||||
</span>
|
||||
</p>
|
||||
<ul id="search-results"></ul>
|
||||
</article>
|
||||
</body>
|
||||
<script src="search_index.js"></script>
|
||||
<script src="assets/search.js"></script>
|
||||
</html>
|
|
@ -0,0 +1,299 @@
|
|||
var documenterSearchIndex = {"docs": [
|
||||
|
||||
{
|
||||
"location": "index.html#",
|
||||
"page": "Home",
|
||||
"title": "Home",
|
||||
"category": "page",
|
||||
"text": ""
|
||||
},
|
||||
|
||||
{
|
||||
"location": "index.html#Flux-1",
|
||||
"page": "Home",
|
||||
"title": "Flux",
|
||||
"category": "section",
|
||||
"text": "Flux is a high-level interface for machine learning, implemented in Julia.Flux aims to be an intuitive and powerful notation, close to the mathematics, that provides advanced features like auto-unrolling and closures. Simple models are trivial, while the most complex architectures are tractable, taking orders of magnitude less code than in other frameworks. Meanwhile, the Flux compiler provides excellent error messages and tools for debugging when things go wrong.So what's the catch? Flux is at an early \"working prototype\" stage; many things work but the API is still in a state of... well, it might change. If you're interested to find out what works, read on!"
|
||||
},
|
||||
|
||||
{
|
||||
"location": "index.html#Where-do-I-start?-1",
|
||||
"page": "Home",
|
||||
"title": "Where do I start?",
|
||||
"category": "section",
|
||||
    "text": "The examples are the best way to get a feel for how Flux looks. This is a great way to start if you're a relative newbie to machine learning or neural networks; you should be able to get the examples running fairly easily.If you have more experience with ML, or you just don't want to see those digits again, check out the model building guide instead. The Guide attempts to motivate Flux's programming model and approach with examples. However, it also gets into advanced usage very quickly; it's not necessary to memorise all the details to use Flux effectively.The sections on Recurrence, Debugging and Batching best illustrate what makes Flux unique."
|
||||
},
|
||||
|
||||
{
|
||||
"location": "index.html#Installation-1",
|
||||
"page": "Home",
|
||||
"title": "Installation",
|
||||
"category": "section",
|
||||
    "text": "... Charging Ion Capacitors ...Pkg.update()\nPkg.add(\"Flux\")You'll also need a backend to run real training, if you don't have one already. Choose from MXNet or TensorFlow (MXNet is the recommended option if you're not sure):Pkg.add(\"MXNet\") # or \"TensorFlow\"\nPkg.test(\"Flux\") # Make sure everything installed properly"
|
||||
},
|
||||
|
||||
{
|
||||
"location": "models/basics.html#",
|
||||
"page": "Model Building Basics",
|
||||
"title": "Model Building Basics",
|
||||
"category": "page",
|
||||
"text": ""
|
||||
},
|
||||
|
||||
{
|
||||
"location": "models/basics.html#Model-Building-Basics-1",
|
||||
"page": "Model Building Basics",
|
||||
"title": "Model Building Basics",
|
||||
"category": "section",
|
||||
"text": ""
|
||||
},
|
||||
|
||||
{
|
||||
"location": "models/basics.html#The-Model-1",
|
||||
"page": "Model Building Basics",
|
||||
"title": "The Model",
|
||||
"category": "section",
|
||||
"text": "... Initialising Photon Beams ...The core concept in Flux is the model. A model (or \"layer\") is simply a function with parameters. For example, in plain Julia code, we could define the following function to represent a logistic regression (or simple neural network):W = randn(3,5)\nb = randn(3)\naffine(x) = W * x + b\n\nx1 = rand(5) # [0.581466,0.606507,0.981732,0.488618,0.415414]\ny1 = softmax(affine(x1)) # [0.32676,0.0974173,0.575823]affine is simply a function which takes some vector x1 and outputs a new one y1. For example, x1 could be data from an image and y1 could be predictions about the content of that image. However, affine isn't static. It has parameters W and b, and if we tweak those parameters we'll tweak the result – hopefully to make the predictions more accurate.This is all well and good, but we usually want to have more than one affine layer in our network; writing out the above definition to create new sets of parameters every time would quickly become tedious. For that reason, we want to use a template which creates these functions for us:affine1 = Affine(5, 5)\naffine2 = Affine(5, 5)\n\nsoftmax(affine1(x1)) # [0.167952, 0.186325, 0.176683, 0.238571, 0.23047]\nsoftmax(affine2(x1)) # [0.125361, 0.246448, 0.21966, 0.124596, 0.283935]We just created two separate Affine layers, and each contains its own (randomly initialised) version of W and b, leading to a different result when called with our data. It's easy to define templates like Affine ourselves (see templates), but Flux provides Affine out of the box, so we'll use that for now."
|
||||
},
|
||||
|
||||
{
|
||||
"location": "models/basics.html#Combining-Models-1",
|
||||
"page": "Model Building Basics",
|
||||
"title": "Combining Models",
|
||||
"category": "section",
|
||||
    "text": "... Inflating Graviton Zeppelins ...A more complex model usually involves many basic layers like affine, where we use the output of one layer as the input to the next:mymodel1(x) = softmax(affine2(σ(affine1(x))))\nmymodel1(x1) # [0.187935, 0.232237, 0.169824, 0.230589, 0.179414]This syntax is again a little unwieldy for larger networks, so Flux provides another template of sorts to create the function for us:mymodel2 = Chain(affine1, σ, affine2, softmax)\nmymodel2(x1) # [0.187935, 0.232237, 0.169824, 0.230589, 0.179414]mymodel2 is exactly equivalent to mymodel1 because it simply calls the provided functions in sequence. We don't have to predefine the affine layers and can also write this as:mymodel3 = Chain(\n    Affine(5, 5), σ,\n    Affine(5, 5), softmax)You now know enough to take a look at the logistic regression example, if you haven't already."
|
||||
},
|
||||
|
||||
{
|
||||
"location": "models/basics.html#A-Function-in-Model's-Clothing-1",
|
||||
"page": "Model Building Basics",
|
||||
"title": "A Function in Model's Clothing",
|
||||
"category": "section",
|
||||
"text": "... Booting Dark Matter Transmogrifiers ...We noted above that a \"model\" is a function with some number of trainable parameters. This goes both ways; a normal Julia function like exp is effectively a model with 0 parameters. Flux doesn't care, and anywhere that you use one, you can use the other. For example, Chain will happily work with regular functions:foo = Chain(exp, sum, log)\nfoo([1,2,3]) == 3.408 == log(sum(exp([1,2,3])))"
|
||||
},
|
||||
|
||||
{
|
||||
"location": "models/templates.html#",
|
||||
"page": "Model Templates",
|
||||
"title": "Model Templates",
|
||||
"category": "page",
|
||||
"text": ""
|
||||
},
|
||||
|
||||
{
|
||||
"location": "models/templates.html#Model-Templates-1",
|
||||
"page": "Model Templates",
|
||||
"title": "Model Templates",
|
||||
"category": "section",
|
||||
"text": "... Calculating Tax Expenses ...So how does the Affine template work? We don't want to duplicate the code above whenever we need more than one affine layer:W₁, b₁ = randn(...)\naffine₁(x) = W₁*x + b₁\nW₂, b₂ = randn(...)\naffine₂(x) = W₂*x + b₂\nmodel = Chain(affine₁, affine₂)Here's one way we could solve this: just keep the parameters in a Julia type, and define how that type acts as a function:type MyAffine\n W\n b\nend\n\n# Use the `MyAffine` layer as a model\n(l::MyAffine)(x) = l.W * x + l.b\n\n# Convenience constructor\nMyAffine(in::Integer, out::Integer) =\n MyAffine(randn(out, in), randn(out))\n\nmodel = Chain(MyAffine(5, 5), MyAffine(5, 5))\n\nmodel(x1) # [-1.54458,0.492025,0.88687,1.93834,-4.70062]This is much better: we can now make as many affine layers as we want. This is a very common pattern, so to make it more convenient we can use the @net macro:@net type MyAffine\n W\n b\n x -> x * W + b\nendThe function provided, x -> x * W + b, will be used when MyAffine is used as a model; it's just a shorter way of defining the (::MyAffine)(x) method above. (You may notice that W and x have swapped order in the model; this is due to the way batching works, which will be covered in more detail later on.)However, @net does not simply save us some keystrokes; it's the secret sauce that makes everything else in Flux go. For example, it analyses the code for the forward function so that it can differentiate it or convert it to a TensorFlow graph.The above code is almost exactly how Affine is defined in Flux itself! There's no difference between \"library-level\" and \"user-level\" models, so making your code reusable doesn't involve a lot of extra complexity. Moreover, much more complex models than Affine are equally simple to define."
|
||||
},
|
||||
|
||||
{
|
||||
"location": "models/templates.html#Models-in-templates-1",
|
||||
"page": "Model Templates",
|
||||
"title": "Models in templates",
|
||||
"category": "section",
|
||||
    "text": "@net models can contain sub-models as well as just array parameters:@net type TLP\n    first\n    second\n    function (x)\n        l1 = σ(first(x))\n        l2 = softmax(second(l1))\n    end\nendJust as above, this is roughly equivalent to writing:type TLP\n    first\n    second\nend\n\nfunction (self::TLP)(x)\n    l1 = σ(self.first(x))\n    l2 = softmax(self.second(l1))\nendClearly, the first and second parameters are not arrays here, but should be models themselves, and produce a result when called with an input array x. The Affine layer fits the bill, so we can instantiate TLP with two of them:model = TLP(Affine(10, 20),\n            Affine(20, 15))\nx1 = rand(10)\nmodel(x1) # [0.057852,0.0409741,0.0609625,0.0575354 ...You may recognise this as being equivalent toChain(\n    Affine(10, 20), σ,\n    Affine(20, 15), softmax)given that it's just a sequence of calls. For simple networks Chain is completely fine, although the @net version is more powerful as we can (for example) reuse the output l1 more than once."
|
||||
},
|
||||
|
||||
{
|
||||
"location": "models/templates.html#Constructors-1",
|
||||
"page": "Model Templates",
|
||||
"title": "Constructors",
|
||||
"category": "section",
|
||||
"text": "Affine has two array parameters, W and b. Just like any other Julia type, it's easy to instantiate an Affine layer with parameters of our choosing:a = Affine(rand(10, 20), rand(20))However, for convenience and to avoid errors, we'd probably rather specify the input and output dimension instead:a = Affine(10, 20)This is easy to implement using the usual Julia syntax for constructors:Affine(in::Integer, out::Integer) =\n Affine(randn(in, out), randn(1, out))In practice, these constructors tend to take the parameter initialisation function as an argument so that it's more easily customisable, and use Flux.initn by default (which is equivalent to randn(...)/100). So Affine's constructor really looks like this:Affine(in::Integer, out::Integer; init = initn) =\n Affine(init(in, out), init(1, out))"
|
||||
},
|
||||
|
||||
{
|
||||
"location": "models/templates.html#Supported-syntax-1",
|
||||
"page": "Model Templates",
|
||||
"title": "Supported syntax",
|
||||
"category": "section",
|
||||
"text": "The syntax used to define a forward pass like x -> x*W + b behaves exactly like Julia code for the most part. However, it's important to remember that it's defining a dataflow graph, not a general Julia expression. In practice this means that anything side-effectful, or things like control flow and printlns, won't work as expected. In future we'll continue to expand support for Julia syntax and features."
|
||||
},
|
||||
|
||||
{
|
||||
"location": "models/recurrent.html#",
|
||||
"page": "Recurrence",
|
||||
"title": "Recurrence",
|
||||
"category": "page",
|
||||
"text": ""
|
||||
},
|
||||
|
||||
{
|
||||
"location": "models/recurrent.html#Recurrent-Models-1",
|
||||
"page": "Recurrence",
|
||||
"title": "Recurrent Models",
|
||||
"category": "section",
|
||||
    "text": "Recurrence is a first-class feature in Flux and recurrent models are very easy to build and use. Recurrences are often illustrated as cycles or self-dependencies in the graph; they can also be thought of as a hidden output from / input to the network. For example, for a sequence of inputs x1, x2, x3 ... we produce predictions as follows:y1 = f(W, x1) # `f` is the model, `W` represents the parameters\ny2 = f(W, x2)\ny3 = f(W, x3)\n...Each evaluation is independent and the prediction made for a given input will always be the same. That makes a lot of sense for, say, MNIST images, but less sense when predicting a sequence. For that case we introduce the hidden state:y1, s = f(W, x1, s)\ny2, s = f(W, x2, s)\ny3, s = f(W, x3, s)\n...The state s allows the prediction to depend not only on the current input x but also on the history of past inputs.The simplest recurrent network looks as follows in Flux, and it should be familiar if you've seen the equations defining an RNN before:@net type Recurrent\n    Wxy; Wyy; by\n    y\n    function (x)\n        y = tanh( x * Wxy + y{-1} * Wyy + by )\n    end\nendThe only difference from a regular feed-forward layer is that we create a variable y which is defined as depending on itself. The y{-1} syntax means \"take the value of y from the previous run of the network\".Using recurrent layers is straightforward and no different from using feed-forward ones in terms of Chain etc. For example:model = Chain(\n    Affine(784, 20), σ,\n    Recurrent(20, 30),\n    Recurrent(30, 15))Before using the model we need to unroll it. This happens with the unroll function:unroll(model, 20)This call creates an unrolled, feed-forward version of the model which accepts N (= 20) inputs and generates N predictions at a time. 
Essentially, the model is replicated N times and Flux ties the hidden outputs y to hidden inputs.Here's a more complex recurrent layer, an LSTM, and again it should be familiar if you've seen the equations:@net type LSTM\n Wxf; Wyf; bf\n Wxi; Wyi; bi\n Wxo; Wyo; bo\n Wxc; Wyc; bc\n y; state\n function (x)\n # Gates\n forget = σ( x * Wxf + y{-1} * Wyf + bf )\n input = σ( x * Wxi + y{-1} * Wyi + bi )\n output = σ( x * Wxo + y{-1} * Wyo + bo )\n # State update and output\n state′ = tanh( x * Wxc + y{-1} * Wyc + bc )\n state = forget .* state{-1} + input .* state′\n y = output .* tanh(state)\n end\nendThe only unfamiliar part is that we have to define all of the parameters of the LSTM upfront, which adds a few lines at the beginning.Flux's very mathematical notation generalises well to handling more complex models. For example, this neural translation model with alignment can be fairly straightforwardly, and recognisably, translated from the paper into Flux code:# A recurrent model which takes a token and returns a context-dependent\n# annotation.\n\n@net type Encoder\n forward\n backward\n token -> hcat(forward(token), backward(token))\nend\n\nEncoder(in::Integer, out::Integer) =\n Encoder(LSTM(in, out÷2), flip(LSTM(in, out÷2)))\n\n# A recurrent model which takes a sequence of annotations, attends, and returns\n# a predicted output token.\n\n@net type Decoder\n attend\n recur\n state; y; N\n function (anns)\n energies = map(ann -> exp(attend(hcat(state{-1}, ann))[1]), seq(anns, N))\n weights = energies./sum(energies)\n ctx = sum(map((α, ann) -> α .* ann, weights, anns))\n (_, state), y = recur((state{-1},y{-1}), ctx)\n y\n end\nend\n\nDecoder(in::Integer, out::Integer; N = 1) =\n Decoder(Affine(in+out, 1),\n unroll1(LSTM(in, out)),\n param(zeros(1, out)), param(zeros(1, out)), N)\n\n# The model\n\nNalpha = 5 # The size of the input token vector\nNphrase = 7 # The length of (padded) phrases\nNhidden = 12 # The size of the hidden state\n\nencode = Encoder(Nalpha, 
Nhidden)\ndecode = Chain(Decoder(Nhidden, Nhidden, N = Nphrase), Affine(Nhidden, Nalpha), softmax)\n\nmodel = Chain(\n    unroll(encode, Nphrase, stateful = false),\n    unroll(decode, Nphrase, stateful = false, seq = false))Note that this model exercises some of the more advanced parts of the compiler and isn't stable for general use yet."
|
||||
},
|
||||
|
||||
{
|
||||
"location": "models/debugging.html#",
|
||||
"page": "Debugging",
|
||||
"title": "Debugging",
|
||||
"category": "page",
|
||||
"text": ""
|
||||
},
|
||||
|
||||
{
|
||||
"location": "models/debugging.html#Debugging-Models-1",
|
||||
"page": "Debugging",
|
||||
"title": "Debugging Models",
|
||||
"category": "section",
|
||||
"text": "Let's take our two-layer perceptron as an example again, running on MXNet:@net type TLP\n first\n second\n function (x)\n l1 = σ(first(x))\n l2 = softmax(second(l1))\n end\nend\n\nmodel = TLP(Affine(10, 20), Affine(21, 15))\n\nmxmodel = mxnet(model)\n\nmxmodel(rand(10))Unfortunately, this model has a (fairly obvious) typo, which means that the code above won't run. Instead we get an error message:Error in operator dot2: [21:28:21] src/operator/tensor/./matrix_op-inl.h:460:\nCheck failed: lshape[1] == rshape[0] (20 vs. 21) dot shape error: (1,20) X (21,15)\nFlux.Affine at affine.jl:8\nTLP at basic.jl:6\n(::Flux.MX.Model)(::Flux.Batch{Array{Float64,1},Array{Float64,2}}) at model.jl:105\n(::Flux.MX.Model)(::Array{Float64,1}) at model.jl:107Most frameworks would only give the error message here – not so helpful if you have thousands of nodes in your computational graph. However, Flux is able to give good error reports even when no Julia code has been run, e.g. when running on a backend like MXNet. This enables us to pinpoint the source of the error very quickly even in a large model.In this case, we can immediately see that the error occurred within an Affine layer. There are two such layers, but this one was called from the second line of TLP, so it must be the second Affine layer we defined. The layer expected an input of length 21 but got 20 instead.Of course, often a stack trace isn't enough to figure out the source of an error. Another option is to simply step through the execution of the model using Gallium. While handy, however, stepping isn't always the best way to get a \"bird's eye view\" of the code. 
For that, Flux provides a macro called @shapes:julia> @shapes model(rand(5,10))\n\n# /Users/mike/test.jl, line 18:\ngull = σ(Affine(10, 20)(Input()[1]::(5,10))::(5,20))::(5,20)\n# /Users/mike/.julia/v0.6/Flux/src/layers/affine.jl, line 8:\nlobster = gull * _::(21,15) + _::(1,15)\n# /Users/mike/test.jl, line 19:\nraven = softmax(lobster)This is a lot like Julia's own code_warntype; but instead of annotating expressions with types, we display their shapes. As a lowered form it has some quirks; input arguments are represented by Input()[N] and parameters by an underscore.This makes the problem fairly obvious. We tried to multiply the output of the first layer (5, 20) by a parameter (21, 15); the inner dimensions should have been equal.Notice that while the first Affine layer is displayed as-is, the second was inlined and we see a reference to where the W * x + b line was defined in Flux's source code. In this way Flux makes it easy to drill down into problem areas, without showing you the full graph of thousands of nodes at once.With the typo fixed, the output of @shapes looks as follows:# /Users/mike/test.jl, line 18:\nopossum = σ(Affine(10, 20)(Input()[1]::(5,10))::(5,20))::(5,20)\n# /Users/mike/test.jl, line 19:\nwren = softmax(Affine(20, 15)(opossum)::(5,15))::(5,15)"
},

{
"location": "apis/batching.html#",
"page": "Batching",
"title": "Batching",
"category": "page",
"text": ""
},

{
"location": "apis/batching.html#Batching-1",
"page": "Batching",
"title": "Batching",
"category": "section",
"text": ""
},

{
"location": "apis/batching.html#Basics-1",
"page": "Batching",
"title": "Basics",
"category": "section",
"text": "Existing machine learning frameworks and libraries represent batching, and other properties of data, only implicitly. Your machine learning data is a large N-dimensional array, which may have a shape like:100 × 50 × 256 × 256Typically, this might represent that you have (say) a batch of 100 samples, where each sample is a 50-long sequence of 256×256 images. This is great for performance, but array operations often become much more cumbersome as a result. Especially if you manipulate dimensions at runtime as an optimisation, debugging models can become extremely fiddly, with a proliferation of X × Y × Z arrays and no information about where they came from.Flux introduces a new approach where the batch dimension is represented explicitly as part of the data. For example:julia> xs = Batch([[1,2,3], [4,5,6]])\n2-element Batch of Vector{Int64}:\n [1,2,3]\n [4,5,6]Batches are represented the way we think about them: as a list of data points. We can do all the usual array operations with them, including getting the first with xs[1], iterating over them and so on. The trick is that under the hood, the data is batched into a single array:julia> rawbatch(xs)\n2×3 Array{Int64,2}:\n 1 2 3\n 4 5 6When we put a Batch object into a model, the model is ultimately working with a single array, which means there's no performance overhead and we get the full benefit of standard batching.Turning a set of vectors into a matrix is fairly easy anyway, so what's the big deal? Well, it gets more interesting as we start working with more complex data. Say we were working with 2×2 images:julia> xs = Batch([[1 2; 3 4], [5 6; 7 8]])\n2-element Flux.Batch of Array{Int64,2}:\n [1 2; 3 4]\n [5 6; 7 8]The raw batch array is much messier, and harder to recognise:julia> rawbatch(xs)\n2×2×2 Array{Int64,3}:\n[:, :, 1] =\n 1 3\n 5 7\n\n[:, :, 2] =\n 2 4\n 6 8Furthermore, because the batch acts like a list of arrays, we can use simple and familiar operations on it:julia> map(flatten, xs)\n2-element Array{Array{Int64,1},1}:\n [1,3,2,4]\n [5,7,6,8]flatten is simple enough over a single data point, but flattening a batched data set is more complex and you end up needing arcane array operations like mapslices. A Batch can just handle this for you for free, and more importantly it ensures that your operations are correct – that you haven't mixed up your batch and data dimensions, or used the wrong array op, and so on."
},

{
"location": "apis/batching.html#Sequences-and-Nesting-1",
"page": "Batching",
"title": "Sequences and Nesting",
"category": "section",
"text": "As well as Batch, there's a structure called Seq which behaves very similarly. Let's say we have two one-hot encoded DNA sequences:julia> x1 = Seq([[0,1,0,0], [1,0,0,0], [0,0,0,1]]) # [A, T, C, G]\njulia> x2 = Seq([[0,0,1,0], [0,0,0,1], [0,0,1,0]])\n\njulia> rawbatch(x1)\n3×4 Array{Int64,2}:\n 0 1 0 0\n 1 0 0 0\n 0 0 0 1This is identical to Batch so far; but where it gets interesting is that you can actually nest these types:julia> xs = Batch([x1, x2])\n2-element Batch of Seq of Vector{Int64}:\n [[0,1,0,0],[1,0,0,0],[0,0,0,1]]\n [[0,0,1,0],[0,0,0,1],[0,0,1,0]]Again, this represents itself intuitively as a list-of-lists-of-lists, but rawbatch shows that the real underlying value is an Array{Int64,3} of shape 2×3×4."
},

{
"location": "apis/batching.html#Future-Work-1",
"page": "Batching",
"title": "Future Work",
"category": "section",
"text": "The design of batching is still an early work in progress, though it's used in a few places in the system. For example, all Flux models expect to be given Batch objects which are unwrapped into raw arrays for the computation. Models will convert their arguments if necessary, so it's convenient to call a model with a single data point like f([1,2,3]).Right now, the Batch or Seq types always stack along the left-most dimension. In future, this will be customisable, and Flux will provide implementations of common functions that are generic across the batch dimension. This brings the following benefits:Code can be written in a batch-agnostic way or be generic across batching strategies.\nBatching and optimisations, like switching batch dimensions, can be expressed by the programmer with compiler support; fewer code changes are required and optimisations are guaranteed not to break the model.\nThis also opens the door for more automatic optimisations, e.g. having the compiler explore the search space of possible batching combinations.Here's a more detailed illustration of how it might look for code to be \"generic across batching\". Take for example a weight matrix W times a vector x, as used in a logistic regression or a simple neural network: W * x => y\n(10×28) * (28) => (10)If we want to work with a batch of 50 xs, one option is to stack the data into a matrix of size 28 × 50. W * x => y\n(10×28) * (28×50) => (10×50)This works, but we may find that it's slow or doesn't fit well with the rest of the model, which batches on the first dimension. For that reason we may instead want to put the data in a 50 × 28 matrix and alter the code as follows: x * W' => y\n(50×28) * (28×10) => (50×10)to make the shapes work out. This code change is not ideal; in more complex cases it can become fiddly and error-prone, and it means that the code is less reusable, tied to a particular implementation strategy.There's an alternative. We keep the same code, but represent the batched xs as either a Batch{Vector,1} or a Batch{Vector,2}, depending on how the data is stacked. Then we can simply overload * as follows:*(W::Matrix, x::Batch{Vector,1}) = x * W'\n*(W::Matrix, x::Batch{Vector,2}) = W * xThis means that we can always write W*x, and the code is reusable in a larger network regardless of the overall batching approach. Moreover, Julia's type system ensures there's no runtime cost to doing this, and we can compile the code appropriately for backends like TensorFlow as well."
},

{
"location": "apis/backends.html#",
"page": "Backends",
"title": "Backends",
"category": "page",
"text": ""
},

{
"location": "apis/backends.html#Backends-1",
"page": "Backends",
"title": "Backends",
"category": "section",
"text": ""
},

{
"location": "apis/backends.html#Basic-Usage-1",
"page": "Backends",
"title": "Basic Usage",
"category": "section",
"text": "model = Chain(Affine(10, 20), σ, Affine(20, 15), softmax)\nxs = rand(10)Currently, Flux's pure-Julia backend has no optimisations. This means that callingmodel(rand(10)) #> [0.0650, 0.0655, ...]directly won't have great performance. In order to run a computationally intensive training process, we rely on a backend like MXNet or TensorFlow.This is easy to do. Just call either mxnet or tf on a model to convert it to a model of that kind:mxmodel = mxnet(model)\nmxmodel(xs) #> [0.0650, 0.0655, ...]\n# or\ntfmodel = tf(model)\ntfmodel(xs) #> [0.0650, 0.0655, ...]These new models look and feel exactly like every other model in Flux, including returning the same result when you call them, and can be trained as usual using Flux.train!(). The difference is that the computation is being carried out by a backend, which will usually give a large speedup."
},

{
"location": "apis/backends.html#Native-Integration-1",
"page": "Backends",
"title": "Native Integration",
"category": "section",
"text": "Flux aims to provide high-level APIs that work well across backends, but in some cases you may want to take advantage of features specific to a given backend. In these cases it's easy to \"drop down\" and use the backend's API directly, where appropriate. For example:using MXNet\nFlux.loadmx()\n\nmxmodel = mx.FeedForward(model)This returns a standard mx.FeedForward instance, just like you might have created using MXNet's usual API. You can then use this with MXNet's data provider implementation, custom optimisers, or distributed training processes.Same goes for TensorFlow, where it's easy to create a Tensor object:using TensorFlow\nFlux.loadtf()\n\nx = placeholder(Float32)\ny = Tensor(model, x)This makes it easy to take advantage of Flux's model description and debugging tools while also getting the benefit of the work put into these backends. You can check out how this looks with the integration examples here."
},

{
"location": "apis/storage.html#",
"page": "Storing Models",
"title": "Storing Models",
"category": "page",
"text": ""
},

{
"location": "apis/storage.html#Loading-and-Saving-Models-1",
"page": "Storing Models",
"title": "Loading and Saving Models",
"category": "section",
"text": "model = Chain(Affine(10, 20), σ, Affine(20, 15), softmax)Since models are just simple Julia data structures, it's very easy to save and load them using any of Julia's existing serialisation formats. For example, using Julia's built-in serialize:open(io -> serialize(io, model), \"model.jls\", \"w\")\nopen(io -> deserialize(io), \"model.jls\")One issue with serialize is that it doesn't promise compatibility between major Julia versions. For longer-term storage it's good to use a package like JLD.using JLD\n@save \"model.jld\" model\n@load \"model.jld\"However, JLD will break for some models as functions are not supported on 0.5+. You can resolve that by checking out this branch.Right now this is the only storage format Flux supports. In future Flux will support loading and saving other model formats (on an as-needed basis)."
},

{
"location": "examples/logreg.html#",
"page": "Logistic Regression",
"title": "Logistic Regression",
"category": "page",
"text": ""
},

{
"location": "examples/logreg.html#Logistic-Regression-with-MNIST-1",
"page": "Logistic Regression",
"title": "Logistic Regression with MNIST",
"category": "section",
"text": "This walkthrough will take you through writing a multi-layer perceptron that classifies MNIST digits with high accuracy.First, we load the data using the MNIST package:using Flux, MNIST\n\ndata = [(trainfeatures(i), onehot(trainlabel(i), 0:9)) for i = 1:60_000]\ntrain = data[1:50_000]\ntest = data[50_001:60_000]The only Flux-specific function here is onehot, which takes a class label and turns it into a one-hot-encoded vector that we can use for training. For example:julia> onehot(:b, [:a, :b, :c])\n3-element Array{Int64,1}:\n 0\n 1\n 0Otherwise, the format of the data is simple enough: it's just a list of tuples from input to output. For example:julia> data[1]\n([0.0,0.0,0.0, … 0.0,0.0,0.0],[0,0,0,0,0,1,0,0,0,0])data[1][1] is a 28*28 == 784 length vector (mostly zeros due to the black background) and data[1][2] is its classification.Now we define our model, which will simply be a function from one to the other.m = Chain(\n Input(784),\n Affine(128), relu,\n Affine( 64), relu,\n Affine( 10), softmax)\n\nmodel = tf(m)We can try this out on our data already:julia> model(data[1][1])\n10-element Array{Float64,1}:\n 0.10614 \n 0.0850447\n 0.101474\n ...The model gives a probability of about 0.1 to each class – which is a way of saying, \"I have no idea\". This isn't too surprising as we haven't shown it any data yet. This is easy to fix:Flux.train!(model, train, test, η = 1e-4)The training step takes about 5 minutes (to make it faster we can do smarter things like batching). If you run this code in Juno, you'll see a progress meter, which you can hover over to see the remaining computation time.Towards the end of the training process, Flux will have reported that the accuracy of the model is now about 90%. We can try it on our data again:10-element Array{Float32,1}:\n ...\n 5.11423f-7\n 0.9354 \n 3.1033f-5 \n 0.000127077\n ...Notice the class at 93%, suggesting our model is very confident about this image. We can use onecold to compare the true and predicted classes:julia> onecold(data[1][2], 0:9)\n5\n\njulia> onecold(model(data[1][1]), 0:9)\n5Success!"
},

{
"location": "examples/char-rnn.html#",
"page": "Char RNN",
"title": "Char RNN",
"category": "page",
"text": ""
},

{
"location": "examples/char-rnn.html#Char-RNN-1",
"page": "Char RNN",
"title": "Char RNN",
"category": "section",
"text": "This walkthrough will take you through a model like that used in Karpathy's 2015 blog post, which can learn to generate text in the style of Shakespeare (or whatever else you may use as input). shakespeare_input.txt is here.using Flux\nimport StatsBase: wsampleFirstly, we define up front how many steps we want to unroll the RNN, and the number of data points to batch together. Then we create some functions to prepare our data, using Flux's built-in utilities.nunroll = 50\nnbatch = 50\n\ngetseqs(chars, alphabet) = sequences((onehot(Float32, char, alphabet) for char in chars), nunroll)\ngetbatches(chars, alphabet) = batches((getseqs(part, alphabet) for part in chunk(chars, nbatch))...)Because we want the RNN to predict the next letter at each iteration, our target data is simply our input data offset by one. For example, if the input is \"The quick brown fox\", the target will be \"he quick brown fox \". Each letter is one-hot encoded and sequences are batched together to create the training data.input = readstring(\"shakespeare_input.txt\")\nalphabet = unique(input)\nN = length(alphabet)\n\nXs, Ys = getbatches(input, alphabet), getbatches(input[2:end], alphabet)Creating the model and training it is straightforward:model = Chain(\n Input(N),\n LSTM(N, 256),\n LSTM(256, 256),\n Affine(256, N),\n softmax)\n\nm = tf(unroll(model, nunroll))\n\n@time Flux.train!(m, Xs, Ys, η = 0.1, epoch = 1)Finally, we can sample the model. For sampling we remove the softmax from the end of the chain so that we can \"sharpen\" the resulting probabilities.function sample(model, n, temp = 1)\n s = [rand(alphabet)]\n m = tf(unroll(model, 1))\n for i = 1:n\n push!(s, wsample(alphabet, softmax(m(Seq((onehot(Float32, s[end], alphabet),)))[1]./temp)))\n end\n return string(s...)\nend\n\nsample(model[1:end-1], 100)sample then produces a string of Shakespeare-like text. 
This won't produce great results after only a single epoch (though they will be recognisably different from the untrained model). Going for 30 epochs or so produces good results.Trained on a dataset from base Julia, the network can produce code like:function show(io::IO, md::Githompty)\n Buffer(jowerTriangular(inals[i], initabs_indices), characters, side, nextfloat(typeof(x)))\n isnull(r) && return\n start::I!\n for j = 1:length(b,1)\n a = s->cosvect(code)\n return\n end\n indsERenv | maximum(func,lsg))\n for i = 1:last(Abjelar) && fname (=== nothing)\n throw(ArgumentError(\"read is declave non-fast-a/remaining of not descride method names\"))\n end\n if e.ht === Int\n # update file to a stroducative, but is decould.\n # xna i -GB =# [unsafe_color <c *has may num 20<11E 16/s\n tuple | Expr(:(UnitLowerTriangular(transpose,(repl.ptr)))\n dims = pipe_read(s,Int(a)...)\n ex,0 + y.uilid_func & find_finwprevend(msg,:2)\n ex = stage(c)\n # uvvalue begin\n end\nend"
},

{
"location": "contributing.html#",
"page": "Contributing & Help",
"title": "Contributing & Help",
"category": "page",
"text": ""
},

{
"location": "contributing.html#Contributing-1",
"page": "Contributing & Help",
"title": "Contributing",
"category": "section",
"text": "If you need help, please ask on the Julia forum or on Flux's Gitter.Right now, the best way to help out is to try out the examples and report any issues or missing features as you find them. The second best way is to help us spread the word, perhaps by starring the repo.If you're interested in hacking on Flux, most of the code is pretty straightforward. Adding new layer definitions or cost functions is simple using the Flux DSL itself, and things like data utilities and training processes are all plain Julia code. The compiler directory is a bit more involved and is documented in internals, but most changes won't need to touch that.If you get stuck or need anything, let us know!"
},

{
"location": "internals.html#",
"page": "Internals",
"title": "Internals",
"category": "page",
"text": ""
},

{
"location": "internals.html#Internals-1",
"page": "Internals",
"title": "Internals",
"category": "section",
"text": "[WIP]"
},

]}
var DOC_VERSIONS = [
"stable",
"latest",
"release-0.1",
"v0.1.1",
"v0.1.0",
];