diff --git a/latest/contributing.html b/latest/contributing.html
index 74bddff2..9a9454ff 100644
--- a/latest/contributing.html
+++ b/latest/contributing.html
@@ -97,7 +97,7 @@ Contributing & Help
-
+
diff --git a/latest/examples/logreg.html b/latest/examples/logreg.html
index 9424772f..7c12f1ab 100644
--- a/latest/examples/logreg.html
+++ b/latest/examples/logreg.html
@@ -100,7 +100,7 @@ Logistic Regression
-
+
diff --git a/latest/index.html b/latest/index.html
index 786be3fa..904c655a 100644
--- a/latest/index.html
+++ b/latest/index.html
@@ -97,7 +97,7 @@ Home
-
+
diff --git a/latest/internals.html b/latest/internals.html
index dc94a649..c73eeae6 100644
--- a/latest/internals.html
+++ b/latest/internals.html
@@ -97,7 +97,7 @@ Internals
-
+
diff --git a/latest/manual/basics.html b/latest/manual/basics.html
index da09dc22..bc25f778 100644
--- a/latest/manual/basics.html
+++ b/latest/manual/basics.html
@@ -63,8 +63,18 @@ The Model
  • -An MNIST Example
    +Combining Models
  • +A Function in Model's Clothing
  • +The Template
  •
@@ -113,7 +123,7 @@ First Steps
-
+
@@ -123,8 +133,8 @@ First Steps

-Basic Usage
+First Steps

Installation

+... Charging Ion Capacitors ...

Pkg.clone("https://github.com/MikeInnes/DataFlow.jl")
-Pkg.clone("https://github.com/MikeInnes/Flux.jl")
+Pkg.clone("https://github.com/MikeInnes/Flux.jl")
+using Flux
    The Model @@ -141,31 +157,146 @@ The Model

-Charging Ion Capacitors...
+... Initialising Photon Beams ...

-The core concept in Flux is that of the model. A model is simply a function with parameters. In Julia, we might define the following function:
+The core concept in Flux is the model. A model (or "layer") is simply a function with parameters. For example, in plain Julia code, we could define the following function to represent a logistic regression (or simple neural network):

W = randn(3,5)
b = randn(3)
affine(x) = W*x + b

-x1 = randn(5)
-affine(x1)
-> 3-element Array{Float64,1}:
-   -0.0215644
-   -4.07343
-    0.312591
+x1 = rand(5) # [0.581466, 0.606507, 0.981732, 0.488618, 0.415414]
+y1 = softmax(affine(x1)) # [0.32676, 0.0974173, 0.575823]

+affine is simply a function which takes some vector x1 and outputs a new one y1. For example, x1 could be data from an image and y1 could be predictions about the content of that image. However, affine isn't static. It has parameters W and b, and if we tweak those parameters we'll tweak the result – hopefully to make the predictions more accurate.
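To make that concrete, here's a small illustrative sketch (an addition for clarity, not part of the docs' own example). Since affine closes over the globals W and b, nudging them changes the prediction for the same input:

W += 0.1 * randn(3, 5)  # nudge the weights a little
b += 0.1 * randn(3)     # nudge the bias too
softmax(affine(x1))     # same input x1, a (slightly) different prediction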

+This is all well and good, but we usually want to have more than one affine layer in our network; writing out the above definition to create new sets of parameters every time would quickly become tedious. For that reason, we want to use a template which creates these functions for us:

+affine1 = Affine(5, 5)
+affine2 = Affine(5, 5)
+
+softmax(affine1(x1)) # [0.167952, 0.186325, 0.176683, 0.238571, 0.23047]
+softmax(affine2(x1)) # [0.125361, 0.246448, 0.21966, 0.124596, 0.283935]

+We just created two separate Affine layers, and each contains its own version of W and b, leading to a different result when called with our data. It's easy to define templates like Affine ourselves (see The Template), but Flux provides Affine out of the box.
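As a rough sketch of what such a template might expand to (a hypothetical MyAffine for illustration; Flux's real Affine is generated for you), a layer is just a type that carries its own parameters and can be called like a function:

# Hypothetical sketch, not Flux's actual implementation.
struct MyAffine
  W
  b
end

# Construct with freshly initialised parameters, like Affine(in, out).
MyAffine(in::Integer, out::Integer) = MyAffine(randn(out, in), randn(out))

# Calling the layer applies the affine transform to its input.
(a::MyAffine)(x) = a.W * x + a.b

a = MyAffine(5, 5)
a(x1) # behaves like an Affine(5, 5) layer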

-An MNIST Example
+Combining Models

+... Inflating Graviton Zeppelins ...

+A more complex model usually involves many basic layers like affine, where we use the output of one layer as the input to the next:

+mymodel1(x) = softmax(affine2(σ(affine1(x))))
+mymodel1(x1) # [0.187935, 0.232237, 0.169824, 0.230589, 0.179414]

+This syntax is again a little unwieldy for larger networks, so Flux provides another template of sorts to create the function for us:

+mymodel2 = Chain(affine1, σ, affine2, softmax)
+mymodel2(x1) # [0.187935, 0.232237, 0.169824, 0.230589, 0.179414]

+mymodel2 is exactly equivalent to mymodel1 because it simply calls the provided functions in sequence. We don't have to predefine the affine layers and can also write this as:

+mymodel3 = Chain(
+  Affine(5, 5), σ,
+  Affine(5, 5), softmax)
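In case it's not obvious how Chain can be "exactly equivalent" to the hand-written version, here's a sketch of the underlying idea (a hypothetical mychain, not Flux's actual code): calling the chain simply threads the input through each function in turn.

# Sketch of the idea behind Chain: fold the input through the functions.
mychain(fs...) = x -> foldl((acc, f) -> f(acc), fs; init = x)

mymodel2b = mychain(affine1, σ, affine2, softmax)
mymodel2b(x1) # same result as mymodel2(x1)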

+You now understand enough to take a look at the logistic regression example, if you haven't already.

+A Function in Model's Clothing

+... Booting Dark Matter Transmogrifiers ...

+We noted above that a "model" is just a function with some trainable parameters. This goes both ways; a normal Julia function like exp is really just a model with 0 parameters. Flux doesn't care, and anywhere that you use one, you can use the other. For example, Chain will happily work with regular functions:

+foo = Chain(exp, sum, log)
+foo([1,2,3]) ≈ log(sum(exp([1,2,3]))) # ≈ 3.408
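The reverse composition also works; as an illustrative sketch (a hypothetical mymodel4, reusing x1 from above), a parameterless function can sit happily between trainable layers:

# Hypothetical example: a plain function as a "layer" inside a model.
mymodel4 = Chain(Affine(5, 5), x -> x.^2, softmax)
mymodel4(x1)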

+This unification opens up the floor for some powerful features, which we'll discuss later in the guide.

+The Template

+... Calculating Tax Expenses ...

+[WIP]