diff --git a/latest/contributing.html b/latest/contributing.html
index 98840478..7f80ade0 100644
--- a/latest/contributing.html
+++ b/latest/contributing.html
@@ -104,7 +104,7 @@ Contributing & Help
diff --git a/latest/examples/logreg.html b/latest/examples/logreg.html
index 4b36d4f4..20f162b1 100644
--- a/latest/examples/logreg.html
+++ b/latest/examples/logreg.html
@@ -107,7 +107,7 @@ Logistic Regression
diff --git a/latest/index.html b/latest/index.html
index 32f2a9ab..79dd75c3 100644
--- a/latest/index.html
+++ b/latest/index.html
@@ -46,7 +46,13 @@ Version Home
+Pkg.clone("https://github.com/MikeInnes/DataFlow.jl")
+Pkg.clone("https://github.com/MikeInnes/Flux.jl")
+using Flux
+You'll also need a backend to run real training, if you don't have one already. Choose from MXNet or TensorFlow (MXNet is the recommended option if you're not sure):
+Pkg.add("MXNet") # or "TensorFlow"
diff --git a/latest/models/basics.html b/latest/models/basics.html
index 6dfc673e..6c73e805 100644
--- a/latest/models/basics.html
+++ b/latest/models/basics.html
@@ -57,11 +57,6 @@ Building Models First Steps
@@ -143,23 +138,11 @@ First Steps
-Pkg.clone("https://github.com/MikeInnes/DataFlow.jl")
-Pkg.clone("https://github.com/MikeInnes/Flux.jl")
-using Flux
W = randn(3,5)
b = randn(3)
-affine(x) = W*x + b
+affine(x) = W * x + b
x1 = rand(5) # [0.581466,0.606507,0.981732,0.488618,0.415414]
y1 = softmax(affine(x1)) # [0.32676,0.0974173,0.575823]
@@ -232,7 +215,7 @@ The Template
-), but Flux provides Affine out of the box.
+), but Flux provides Affine out of the box, so we'll use that for now.
-We noted above that a "model" is just a function with some trainable parameters. This goes both ways; a normal Julia function like exp is really just a model with 0 parameters. Flux doesn't care, and anywhere that you use one, you can use the other. For example, Chain will happily work with regular functions:
+We noted above that a "model" is a function with some number of trainable parameters. This goes both ways; a normal Julia function like exp is effectively a model with 0 parameters. Flux doesn't care, and anywhere that you use one, you can use the other. For example, Chain will happily work with regular functions:
foo = Chain(exp, sum, log)
foo([1,2,3]) == 3.408 == log(sum(exp([1,2,3])))
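Because layers and ordinary functions are interchangeable, you can also mix the two inside one Chain. The following is only an illustrative sketch (it assumes the Affine layer and the softmax function used earlier on this page):
mixed = Chain(Affine(5, 3), softmax)  # a trainable layer followed by a plain function
mixed(rand(5))                        # a 3-element vector of probabilities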
-This unification opens up the floor for some powerful features, which we'll discuss later in the guide.
-[WIP]
+So how does the Affine template work? We don't want to duplicate the code above whenever we need more than one affine layer:
+
+W₁, b₁ = randn(...)
+affine₁(x) = W₁*x + b₁
+W₂, b₂ = randn(...)
+affine₂(x) = W₂*x + b₂
+model = Chain(affine₁, affine₂)
+Here's one way we could solve this: just keep the parameters in a Julia type, and define how that type acts as a function:
+type MyAffine
+ W
+ b
+end
+
+# Use the `MyAffine` layer as a model
+(l::MyAffine)(x) = l.W * x + l.b
+
+# Convenience constructor
+MyAffine(in::Integer, out::Integer) =
+ MyAffine(randn(out, in), randn(out))
+
+model = Chain(MyAffine(5, 5), MyAffine(5, 5))
+
+model(x1) # [-1.54458,0.492025,0.88687,1.93834,-4.70062]
+
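Since MyAffine is an ordinary Julia type, its parameters are just fields, and nothing stops you from inspecting or tweaking them by hand. A small sketch using the definitions above (the shapes follow from the convenience constructor):
a = MyAffine(5, 3)        # W is 3×5, b has length 3
size(a.W), length(a.b)    # ((3, 5), 3)
a.b -= 0.1 * randn(3)     # the parameters are plain arrays, so ad-hoc updates work too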
+This is much better: we can now make as many affine layers as we want. This is a very common pattern, so to make it more convenient we can use the @net macro:
+
+@net type MyAffine
+ W
+ b
+ x -> W * x + b
+end
+
+The function provided, x -> W * x + b, will be used when MyAffine is used as a model; it's just a shorter way of defining the (::MyAffine)(x) method above.
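The forward function can be any expression of the type's fields, so variations on the template are easy to sketch. For instance, a hypothetical sigmoid-activated layer (not a layer Flux itself ships, and assuming Flux's σ activation function) would look much the same:
@net type SigmoidAffine
  W
  b
  x -> σ(W * x + b)
end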
+
+However, @net does not simply save us some keystrokes; it's the secret sauce that makes everything else in Flux go. For example, it analyses the code for the forward function so that it can differentiate it or convert it to a TensorFlow graph.
+
+The above code is almost exactly how Affine is defined in Flux itself! There's no difference between "library-level" and "user-level" models, so making your code reusable doesn't involve a lot of extra complexity. Moreover, much more complex models than Affine are equally simple to define, and equally close to the mathematical notation; read on to find out how.