diff --git a/latest/apis/batching.html b/latest/apis/batching.html
index 9d985349..711162b9 100644
--- a/latest/apis/batching.html
+++ b/latest/apis/batching.html
@@ -298,17 +298,12 @@ Right now, the
+

+Here's a more detailed illustration of how it might look for code to be "generic across batching". Take for example a weight matrix `W` times a vector `x`, as used in logistic regression or a simple neural network:

+
+   W    *   x  =>   y
+(10×28) * (28) => (10)
+

+If we want to work with a batch of 50 `x`s, one option is to stack the data into a matrix of size `28 × 50`.

+
+   W    *    x    =>    y
+(10×28) * (28×50) => (10×50)
+
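
+As a quick sanity check, here's how these shapes look in plain Julia, with random data standing in for real weights and inputs:

+
+W = rand(10, 28)   # weight matrix
+x = rand(28)       # a single input vector
+size(W * x)        # (10,)
+
+X = rand(28, 50)   # a batch of 50 inputs, stacked as columns
+size(W * X)        # (10, 50)
+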

+This works, but we may find that it's slow or doesn't fit well with the rest of the model, which batches on the first dimension. For that reason we may instead want to put the data in a `50 × 28` matrix and alter the code as follows:

+
+   x    *    W'   =>    y
+(50×28) * (28×10) => (50×10)
+

+to make the shapes work out. This code change is not ideal; in more complex cases it can become fiddly and error-prone, and it ties the code to a particular implementation strategy, making it less reusable.
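
+As a quick sketch in plain Julia (again with random data), we can check that the two layouts compute exactly the same values, just transposed:

+
+W  = rand(10, 28)
+X  = rand(28, 50)   # 50 inputs stacked as columns
+Xr = Matrix(X')     # the same 50 inputs stacked as rows (50×28)
+
+Yr = Xr * W'        # (50×10)
+Yr ≈ (W * X)'       # true: same result, transposed layout
+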

+

+There's an alternative. We keep the same code, but represent the batched `x`s as either a `Batch{Vector,1}` or a `Batch{Vector,2}`, depending on how the data is stacked. Then we can simply overload `*` as follows:

+
+*(W::Matrix, x::Batch{Vector,1}) = x * W'
+*(W::Matrix, x::Batch{Vector,2}) = W * x
+
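
+The overloads above assume a Batch type; a minimal, hypothetical stand-in (not Flux's actual implementation) might look like this, carrying the stacking dimension in the type so that dispatch stays static:

+
+# Hypothetical wrapper: D records the dimension samples are stacked
+# along, so dispatch can pick the right multiplication at compile time.
+struct Batch{T,D}
+    data::Matrix{Float64}
+end
+
+Base.:*(W::Matrix, x::Batch{Vector,1}) = x.data * W'   # rows are samples
+Base.:*(W::Matrix, x::Batch{Vector,2}) = W * x.data    # columns are samples
+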

+This means that we can always write `W*x`, and the code is reusable in a larger network regardless of the overall batching approach. Moreover, Julia's type system ensures there's no runtime cost to doing this, and we can compile the code appropriately for backends like TensorFlow as well.
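
+For instance, with the hypothetical wrapper sketched above, both layouts go through the very same `W*x` call:

+
+W  = rand(10, 28)
+x1 = Batch{Vector,1}(rand(50, 28))   # batched along rows
+x2 = Batch{Vector,2}(rand(28, 50))   # batched along columns
+
+size(W * x1)   # (50, 10)
+size(W * x2)   # (10, 50)
+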