diff --git a/latest/apis/backends.html b/latest/apis/backends.html
index 1ce58c3d..e8c81278 100644
--- a/latest/apis/backends.html
+++ b/latest/apis/backends.html
@@ -150,7 +150,7 @@ Backends
-
+
diff --git a/latest/apis/batching.html b/latest/apis/batching.html
index c3fa3a8d..343ae5a9 100644
--- a/latest/apis/batching.html
+++ b/latest/apis/batching.html
@@ -155,7 +155,7 @@ Batching
-
+
diff --git a/latest/apis/storage.html b/latest/apis/storage.html
index 5976cc5c..ba2918c3 100644
--- a/latest/apis/storage.html
+++ b/latest/apis/storage.html
@@ -139,7 +139,7 @@ Storing Models
-
+
diff --git a/latest/contributing.html b/latest/contributing.html
index f4e1b2be..9669ead0 100644
--- a/latest/contributing.html
+++ b/latest/contributing.html
@@ -136,7 +136,7 @@ Contributing & Help
-
+
diff --git a/latest/examples/char-rnn.html b/latest/examples/char-rnn.html
index 8081cb24..4b1a0f7b 100644
--- a/latest/examples/char-rnn.html
+++ b/latest/examples/char-rnn.html
@@ -139,7 +139,7 @@ Char RNN
-
+
diff --git a/latest/examples/logreg.html b/latest/examples/logreg.html
index 16775fe3..b2aaa2f9 100644
--- a/latest/examples/logreg.html
+++ b/latest/examples/logreg.html
@@ -139,7 +139,7 @@ Simple MNIST
-
+
diff --git a/latest/index.html b/latest/index.html
index a8eea678..426df9f5 100644
--- a/latest/index.html
+++ b/latest/index.html
@@ -147,7 +147,7 @@ Home
-
+
@@ -162,13 +162,17 @@ Flux
-Flux is a high-level interface for machine learning, implemented in Julia.
+Flux is a machine learning library, implemented in Julia. In a nutshell, it simply lets you run normal Julia code on a backend like TensorFlow. It also provides many conveniences for doing deep learning in particular.
-Flux aims to be an intuitive and powerful notation, close to the mathematics, that provides advanced features like auto-unrolling and closures. Simple models are trivial, while the most complex architectures are tractable, taking orders of magnitude less code than in other frameworks. Meanwhile, the Flux compiler provides excellent error messages and tools for debugging when things go wrong.
+This gives you great flexibility. You can use a convenient Keras-like API if you want something simple, but you can also drop down to straight mathematics, or build your own abstractions. You can even use Flux's utilities (like optimisers) with a completely different backend (like Knet) or mix and match approaches.
-So what's the catch? Flux is at an early "working prototype" stage; many things work but the API is still in a state of... well, it might change. If you're interested to find out what works, read on!
+Note that Flux is in alpha. Many things work but the API is still in a state of... well, it might change.
@@ -190,7 +194,7 @@ The examples
- are the best way to get a feel for how Flux looks. This a great way to start if you're a relative newbie to machine learning or neural networks; you should be able to get the examples running fairly easily.
+ give a feel for high-level usage. This is a great way to start if you're a relative newbie to machine learning or neural networks; you can get up and running easily.
If you have more experience with ML, or you just don't want to see
@@ -201,22 +205,7 @@ those digits
 model building guide
- instead. The Guide attempts to motivate Flux's programming model and approach with examples. However, it also gets into advanced usage very quickly; it's not necessary to memorise all the details to use Flux effectively.
+ instead. The guide attempts to show how Flux's abstractions are built up and why it's powerful, but it's not all necessary to get started.
-The sections on Recurrence, Debugging and Batching best illustrate what makes Flux unique.
+Flux's core feature is the @net macro, which adds some superpowers to regular ol' Julia functions. Consider this simple function with the @net annotation applied:
+
+@net f(x) = x .* x
+f([1,2,3]) == [1,4,9]
+
+This behaves as expected, but we have some extra features. For example, we can convert the function to run on TensorFlow or MXNet:
+f_mxnet = mxnet(f)
+f_mxnet([1,2,3]) == [1.0, 4.0, 9.0]
+
+Simples! Flux took care of a lot of boilerplate for us and just ran the multiplication on MXNet. MXNet can optimise this code for us, taking advantage of parallelism or running the code on a GPU.
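+
+The TensorFlow conversion works the same way; as a rough sketch (assuming the TensorFlow backend exposes a tf constructor mirroring mxnet, which this page doesn't show):
+
+f_tf = tf(f)                     # hypothetical, by analogy with mxnet(f) above
+f_tf([1,2,3]) == [1.0, 4.0, 9.0]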
+
+Using MXNet, we can get the gradient of the function, too:
+back!(f_mxnet, [1,1,1], [1,2,3]) == ([2.0, 4.0, 6.0])
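+
+As a quick sanity check on that result: for f(x) = x .* x the elementwise derivative is 2x, so seeding back! with a gradient of ones returns 2 .* x:
+
+2 .* [1,2,3] == [2, 4, 6]        # matches the [2.0, 4.0, 6.0] returned above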
+
+At first glance, this may seem broadly similar to building a graph in TensorFlow. The difference is that the Julia code still behaves like Julia code. Error messages continue to give you helpful stacktraces that pinpoint mistakes. You can step through the code in the debugger. The code only runs once when it's called, as usual, rather than once to build the graph and once to execute it.
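+
+For instance, because f is still an ordinary Julia function, it composes with plain Julia code (double_square here is just an illustrative helper, not part of Flux):
+
+double_square(x) = 2 .* f(x)     # a plain function calling the @net function
+double_square([1,2,3]) == [2, 8, 18]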
b, and if we tweak those parameters we'll tweak the result – hopefully to make the predictions more accurate.
+
 This is all well and good, but we usually want to have more than one affine layer in our network; writing out the above definition to create new sets of parameters every time would quickly become tedious. For that reason, we want to use a
@@ -244,8 +296,8 @@ templates out of the box, so we'll use that for now.
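+
+For reference, the template usage that paragraph leads into looks like this (mirroring the snippet from the basics page quoted in the search index below; Affine(5, 5) builds a fresh layer with its own W and b):
+
+affine1 = Affine(5, 5)
+affine2 = Affine(5, 5)
+
+softmax(affine1(x1))             # each layer gives a different result for the same input
+softmax(affine2(x1))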
diff --git a/latest/models/debugging.html b/latest/models/debugging.html
index 811f6b53..72ee259e 100644
--- a/latest/models/debugging.html
+++ b/latest/models/debugging.html
@@ -139,7 +139,7 @@ Debugging
-
+
diff --git a/latest/models/recurrent.html b/latest/models/recurrent.html
index 6a493dda..bc3b6a8d 100644
--- a/latest/models/recurrent.html
+++ b/latest/models/recurrent.html
@@ -139,7 +139,7 @@ Recurrence
-
+
diff --git a/latest/models/templates.html b/latest/models/templates.html
index d6a6e814..4f0337c2 100644
--- a/latest/models/templates.html
+++ b/latest/models/templates.html
@@ -155,7 +155,7 @@ Model Templates
-
+
diff --git a/latest/search_index.js b/latest/search_index.js
index 81d62e4e..d0986e47 100644
--- a/latest/search_index.js
+++ b/latest/search_index.js
@@ -13,7 +13,7 @@ var documenterSearchIndex = {"docs": [
     "page": "Home",
     "title": "Flux",
     "category": "section",
-    "text": "Flux is a high-level interface for machine learning, implemented in Julia.Flux aims to be an intuitive and powerful notation, close to the mathematics, that provides advanced features like auto-unrolling and closures. Simple models are trivial, while the most complex architectures are tractable, taking orders of magnitude less code than in other frameworks. Meanwhile, the Flux compiler provides excellent error messages and tools for debugging when things go wrong.So what's the catch? Flux is at an early \"working prototype\" stage; many things work but the API is still in a state of... well, it might change. If you're interested to find out what works, read on!Note: If you're using Julia v0.5 please see this version of the docs instead."
+    "text": "Flux is a machine learning library, implemented in Julia. In a nutshell, it simply lets you run normal Julia code on a backend like TensorFlow. It also provides many conveniences for doing deep learning in particular.This gives you great flexibility. You can use a convenient Keras-like API if you want something simple, but you can also drop down to straight mathematics, or build your own abstractions. You can even use Flux's utilities (like optimisers) with a completely different backend (like Knet) or mix and match approaches.Note that Flux is in alpha. Many things work but the API is still in a state of... well, it might change.Note: If you're using Julia v0.5 please see this version of the docs instead."
 },
 
 {
@@ -21,7 +21,7 @@ var documenterSearchIndex = {"docs": [
     "page": "Home",
     "title": "Where do I start?",
     "category": "section",
-    "text": "The examples are the best way to get a feel for how Flux looks. This a great way to start if you're a relative newbie to machine learning or neural networks; you should be able to get the examples running fairly easily.If you have more experience with ML, or you just don't want to see those digits again, check out the model building guide instead. The Guide attempts to motivate Flux's programming model and approach with examples. However, it also gets into advanced usage very quickly; it's not necessary to memorise all the details to use Flux effectively.The sections on Recurrence, Debugging and Batching best illustrate what makes Flux unique."
+    "text": "The examples give a feel for high-level usage. This is a great way to start if you're a relative newbie to machine learning or neural networks; you can get up and running easily.If you have more experience with ML, or you just don't want to see those digits again, check out the model building guide instead. The guide attempts to show how Flux's abstractions are built up and why it's powerful, but it's not all necessary to get started."
 },
 
 {
@@ -48,18 +48,34 @@ var documenterSearchIndex = {"docs": [
     "text": ""
 },
 
+{
+    "location": "models/basics.html#Functions-1",
+    "page": "Model Building Basics",
+    "title": "Functions",
+    "category": "section",
+    "text": "Flux's core feature is the @net macro, which adds some superpowers to regular ol' Julia functions. Consider this simple function with the @net annotation applied:@net f(x) = x .* x\nf([1,2,3]) == [1,4,9]This behaves as expected, but we have some extra features. For example, we can convert the function to run on TensorFlow or MXNet:f_mxnet = mxnet(f)\nf_mxnet([1,2,3]) == [1.0, 4.0, 9.0]Simples! Flux took care of a lot of boilerplate for us and just ran the multiplication on MXNet. MXNet can optimise this code for us, taking advantage of parallelism or running the code on a GPU.Using MXNet, we can get the gradient of the function, too:back!(f_mxnet, [1,1,1], [1,2,3]) == ([2.0, 4.0, 6.0])At first glance, this may seem broadly similar to building a graph in TensorFlow. The difference is that the Julia code still behaves like Julia code. Error messages continue to give you helpful stacktraces that pinpoint mistakes. You can step through the code in the debugger. The code only runs once when it's called, as usual, rather than once to build the graph and once to execute it."
+},
+
 {
     "location": "models/basics.html#The-Model-1",
     "page": "Model Building Basics",
     "title": "The Model",
     "category": "section",
-    "text": "... Initialising Photon Beams ...The core concept in Flux is the model. A model (or \"layer\") is simply a function with parameters. For example, in plain Julia code, we could define the following function to represent a logistic regression (or simple neural network):W = randn(3,5)\nb = randn(3)\naffine(x) = W * x + b\n\nx1 = rand(5) # [0.581466,0.606507,0.981732,0.488618,0.415414]\ny1 = softmax(affine(x1)) # [0.32676,0.0974173,0.575823]affine is simply a function which takes some vector x1 and outputs a new one y1. For example, x1 could be data from an image and y1 could be predictions about the content of that image. However, affine isn't static. It has parameters W and b, and if we tweak those parameters we'll tweak the result – hopefully to make the predictions more accurate.This is all well and good, but we usually want to have more than one affine layer in our network; writing out the above definition to create new sets of parameters every time would quickly become tedious. For that reason, we want to use a template which creates these functions for us:affine1 = Affine(5, 5)\naffine2 = Affine(5, 5)\n\nsoftmax(affine1(x1)) # [0.167952, 0.186325, 0.176683, 0.238571, 0.23047]\nsoftmax(affine2(x1)) # [0.125361, 0.246448, 0.21966, 0.124596, 0.283935]We just created two separate Affine layers, and each contains its own (randomly initialised) version of W and b, leading to a different result when called with our data. It's easy to define templates like Affine ourselves (see templates), but Flux provides Affine out of the box, so we'll use that for now."
+    "text": "... Initialising Photon Beams ...The core concept in Flux is the model. A model (or \"layer\") is simply a function with parameters. For example, in plain Julia code, we could define the following function to represent a logistic regression (or simple neural network):W = randn(3,5)\nb = randn(3)\naffine(x) = W * x + b\n\nx1 = rand(5) # [0.581466,0.606507,0.981732,0.488618,0.415414]\ny1 = softmax(affine(x1)) # [0.32676,0.0974173,0.575823]affine is simply a function which takes some vector x1 and outputs a new one y1. For example, x1 could be data from an image and y1 could be predictions about the content of that image. However, affine isn't static. It has parameters W and b, and if we tweak those parameters we'll tweak the result – hopefully to make the predictions more accurate."
 },
 
 {
-    "location": "models/basics.html#Combining-Models-1",
+    "location": "models/basics.html#Layers-1",
     "page": "Model Building Basics",
-    "title": "Combining Models",
+    "title": "Layers",
+    "category": "section",
+    "text": "This is all well and good, but we usually want to have more than one affine layer in our network; writing out the above definition to create new sets of parameters every time would quickly become tedious. For that reason, we want to use a template which creates these functions for us:affine1 = Affine(5, 5)\naffine2 = Affine(5, 5)\n\nsoftmax(affine1(x1)) # [0.167952, 0.186325, 0.176683, 0.238571, 0.23047]\nsoftmax(affine2(x1)) # [0.125361, 0.246448, 0.21966, 0.124596, 0.283935]We just created two separate Affine layers, and each contains its own (randomly initialised) version of W and b, leading to a different result when called with our data. It's easy to define templates like Affine ourselves (see templates), but Flux provides Affine out of the box, so we'll use that for now."
+},
+
+{
+    "location": "models/basics.html#Combining-Layers-1",
+    "page": "Model Building Basics",
+    "title": "Combining Layers",
     "category": "section",
     "text": "... Inflating Graviton Zeppelins ...A more complex model usually involves many basic layers like affine, where we use the output of one layer as the input to the next:mymodel1(x) = softmax(affine2(σ(affine1(x))))\nmymodel1(x1) # [0.187935, 0.232237, 0.169824, 0.230589, 0.179414]This syntax is again a little unwieldy for larger networks, so Flux provides another template of sorts to create the function for us:mymodel2 = Chain(affine1, σ, affine2, softmax)\nmymodel2(x2) # [0.187935, 0.232237, 0.169824, 0.230589, 0.179414]mymodel2 is exactly equivalent to mymodel1 because it simply calls the provided functions in sequence. We don't have to predefine the affine layers and can also write this as:mymodel3 = Chain(\n  Affine(5, 5), σ,\n  Affine(5, 5), softmax)You now know enough to take a look at the logistic regression example, if you haven't already."
 },